Task 2: Restaurant Recommendation

Objective: Create a restaurant recommendation system based on user preferences.

Steps:
1. Preprocess the dataset by handling missing values and encoding categorical variables.
2. Determine the criteria for restaurant recommendations (e.g., cuisine preference, price range).
3. Implement a content-based filtering approach where users are recommended restaurants similar to their preferred criteria.
4. Test the recommendation system by providing sample user preferences and evaluating the quality of recommendations.

In [386]:
import pandas as pd
import numpy as np
from scipy.spatial.distance import pdist, squareform

In [387]:
import warnings
warnings.filterwarnings("ignore")

In [388]:
df=pd.read_csv("C:\\Users\\Mpatt\\OneDrive\\Desktop\\AssignmentDATASet\\Dataset.csv")
print(df)

      Restaurant ID           Restaurant Name  Country Code              City  \
0           6317637          Le Petit Souffle           162       Makati City   
1           6304287          Izakaya Kikufuji           162       Makati City   
2           6300002    Heat - Edsa Shangri-La           162  Mandaluyong City   
3           6318506                      Ooma           162  Mandaluyong City   
4           6314302               Sambo Kojin           162  Mandaluyong City   
...             ...                       ...           ...               ...   
9546        5915730               Naml۱ Gurme           208         ��stanbul   
9547        5908749              Ceviz A��ac۱           208         ��stanbul   
9548        5915807                     Huqqa           208         ��stanbul   
9549        5916112               A���k Kahve           208         ��stanbul   
9550        5927402  Walter's Coffee Roastery           208         ��stanbul   

                           

In [389]:
df.isnull().sum()

Restaurant ID           0
Restaurant Name         0
Country Code            0
City                    0
Address                 0
Locality                0
Locality Verbose        0
Longitude               0
Latitude                0
Cuisines                9
Average Cost for two    0
Currency                0
Has Table booking       0
Has Online delivery     0
Is delivering now       0
Switch to order menu    0
Price range             0
Aggregate rating        0
Rating color            0
Rating text             0
Votes                   0
dtype: int64

In [390]:
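# Handle missing values: forward-fill the 9 missing Cuisines entries.
# df.ffill() is the current form of fillna(method='ffill'), which recent pandas deprecates.
# Note: each gap inherits the cuisines of the row above it, i.e. of another restaurant.
df = df.ffill()
# Strip the U+FFFD replacement characters left by the file's encoding. This removes
# the bad bytes but does not restore the lost letters (the city is left as 'stanbul').
df = df.replace("�", "", regex=True)
print(df)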

      Restaurant ID           Restaurant Name  Country Code              City  \
0           6317637          Le Petit Souffle           162       Makati City   
1           6304287          Izakaya Kikufuji           162       Makati City   
2           6300002    Heat - Edsa Shangri-La           162  Mandaluyong City   
3           6318506                      Ooma           162  Mandaluyong City   
4           6314302               Sambo Kojin           162  Mandaluyong City   
...             ...                       ...           ...               ...   
9546        5915730               Naml۱ Gurme           208           stanbul   
9547        5908749                Ceviz Aac۱           208           stanbul   
9548        5915807                     Huqqa           208           stanbul   
9549        5916112                  Ak Kahve           208           stanbul   
9550        5927402  Walter's Coffee Roastery           208           stanbul   
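Forward filling suits ordered data such as time series, but here consecutive rows are unrelated restaurants, so each of the 9 gaps receives a neighbouring restaurant's cuisines. A minimal sketch on a hypothetical three-row frame contrasts `ffill()` with `dropna(subset=...)`; dropping the incomplete rows is one alternative, not what this notebook does.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one missing Cuisines value
toy = pd.DataFrame({
    "Restaurant Name": ["A", "B", "C"],
    "Cuisines": ["Japanese, Sushi", np.nan, "Italian"],
})

# Forward fill: restaurant B silently inherits restaurant A's cuisines
filled = toy.ffill()

# Dropping the incomplete row keeps only observed cuisine labels
dropped = toy.dropna(subset=["Cuisines"])

print(filled["Cuisines"].tolist())
print(dropped["Restaurant Name"].tolist())
```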

                           

In [391]:
df.columns

Index(['Restaurant ID', 'Restaurant Name', 'Country Code', 'City', 'Address',
       'Locality', 'Locality Verbose', 'Longitude', 'Latitude', 'Cuisines',
       'Average Cost for two', 'Currency', 'Has Table booking',
       'Has Online delivery', 'Is delivering now', 'Switch to order menu',
       'Price range', 'Aggregate rating', 'Rating color', 'Rating text',
       'Votes'],
      dtype='object')

# Select Relevant Columns

In [392]:
# Keep only the columns used by the recommender
df = df[['Restaurant ID','Restaurant Name','Cuisines','Aggregate rating','Votes']]
df

,Restaurant ID,Restaurant Name,Cuisines,Aggregate rating,Votes
0,6317637,Le Petit Souffle,"French, Japanese, Desserts",4.8,314
1,6304287,Izakaya Kikufuji,Japanese,4.5,591
2,6300002,Heat - Edsa Shangri-La,"Seafood, Asian, Filipino, Indian",4.4,270
3,6318506,Ooma,"Japanese, Sushi",4.9,365
4,6314302,Sambo Kojin,"Japanese, Korean",4.8,229
...,...,...,...,...,...
9546,5915730,Naml۱ Gurme,Turkish,4.1,788
9547,5908749,Ceviz Aac۱,"World Cuisine, Patisserie, Cafe",4.2,1034
9548,5915807,Huqqa,"Italian, World Cuisine",3.7,661
9549,5916112,Ak Kahve,Restaurant Cafe,4.0,901


# Cleaning

In [393]:
descData = pd.DataFrame({
    'Column': df.columns,
    'Data Type': df.dtypes.values,
    'Missing Value': df.isna().sum().values,
    'Pct Missing Value': (df.isna().mean() * 100).round(2).values,
    'Num Unique': df.nunique().values,
    'Unique Sample': [df[col].dropna().sample(2, random_state=42).tolist() if df[col].nunique() > 1 else df[col].dropna().unique().tolist() for col in df.columns]
})

print(descData)

             Column Data Type  Missing Value  Pct Missing Value  Num Unique  \
0     Restaurant ID     int64              0                0.0        9551   
1   Restaurant Name    object              0                0.0        7446   
2          Cuisines    object              0                0.0        1825   
3  Aggregate rating   float64              0                0.0          33   
4             Votes     int64              0                0.0        1012   

                         Unique Sample  
0                     [3918, 18408054]  
1    [Wah Ji Wah, 19 Flavours Biryani]  
2  [North Indian, Mughlai, Hyderabadi]  
3                           [2.1, 4.1]  
4                             [54, 84]  


In [394]:
df = df.dropna()  # no-op here: the forward fill above already filled every NaN
df

,Restaurant ID,Restaurant Name,Cuisines,Aggregate rating,Votes
0,6317637,Le Petit Souffle,"French, Japanese, Desserts",4.8,314
1,6304287,Izakaya Kikufuji,Japanese,4.5,591
2,6300002,Heat - Edsa Shangri-La,"Seafood, Asian, Filipino, Indian",4.4,270
3,6318506,Ooma,"Japanese, Sushi",4.9,365
4,6314302,Sambo Kojin,"Japanese, Korean",4.8,229
...,...,...,...,...,...
9546,5915730,Naml۱ Gurme,Turkish,4.1,788
9547,5908749,Ceviz Aac۱,"World Cuisine, Patisserie, Cafe",4.2,1034
9548,5915807,Huqqa,"Italian, World Cuisine",3.7,661
9549,5916112,Ak Kahve,Restaurant Cafe,4.0,901


In [395]:
print(df.duplicated().sum())                     # fully duplicated rows
print(df['Restaurant Name'].duplicated().sum())  # rows that repeat an earlier name
df['Restaurant Name'].value_counts()

0
2105
Restaurant Name
Cafe Coffee Day             83
Domino's Pizza              79
Subway                      63
Green Chick Chop            51
McDonald's                  48
                            ..
Odeon Social                 1
Johnny Rockets               1
House of Commons             1
HotMess                      1
Walter's Coffee Roastery     1
Name: count, Length: 7446, dtype: int64

In [396]:
# Sort by name and rating (both descending) so each name's highest-rated branch comes first
df = df.sort_values(by=['Restaurant Name','Aggregate rating'],ascending=False)


In [397]:
df[df['Restaurant Name']=="Domino's Pizza"].head()

,Restaurant ID,Restaurant Name,Cuisines,Aggregate rating,Votes
3031,143,Domino's Pizza,"Pizza, Fast Food",3.7,336
1844,5065,Domino's Pizza,"Pizza, Fast Food",3.6,146
2448,15078,Domino's Pizza,"Pizza, Fast Food",3.6,86
7618,18263236,Domino's Pizza,"Pizza, Fast Food",3.6,24
8437,384,Domino's Pizza,"Pizza, Fast Food",3.6,547


In [398]:
# Keep one row per restaurant name: its highest-rated branch
df = df.drop_duplicates('Restaurant Name',keep='first')
df

,Restaurant ID,Restaurant Name,Cuisines,Aggregate rating,Votes
3120,18222559,{Niche} - Cafe & Bar,"North Indian, Chinese, Italian, Continental",4.1,492
9334,7100938,wagamama,"Japanese, Asian",3.7,131
9523,6000871,ukuraa Sofras۱,"Kebab, Izgara",4.4,296
9454,6401789,tashas,"Cafe, Mediterranean",4.1,374
4659,18361747,t Lounge by Dilmah,"Cafe, Tea, Desserts",3.6,34
...,...,...,...,...,...
8692,18317511,#Urban Caf,"North Indian, Chinese, Italian",3.3,49
6998,18336489,#OFF Campus,"Cafe, Continental, Italian, Fast Food",3.7,216
2613,18311951,#InstaFreeze,Ice Cream,0.0,2
9148,18378803,#Dilliwaala6,North Indian,3.7,124
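Sorting by name and rating in descending order and then keeping the first row per name retains each chain's highest-rated branch (for Domino's Pizza, the 3.7 branch). The sketch below uses hypothetical data to show that pattern next to a `groupby(...).idxmax()` alternative that selects the same rows without depending on the sort order.

```python
import pandas as pd

# Hypothetical data: three branches of one chain plus an independent restaurant
branches = pd.DataFrame({
    "Restaurant Name": ["Chain X", "Chain X", "Solo", "Chain X"],
    "Aggregate rating": [3.6, 3.9, 4.2, 3.1],
})

# Pattern used in the notebook: sort descending, keep the first row per name
via_sort = (
    branches.sort_values(["Restaurant Name", "Aggregate rating"], ascending=False)
    .drop_duplicates("Restaurant Name", keep="first")
)

# Alternative: select the row holding the maximum rating within each name group
via_idxmax = branches.loc[branches.groupby("Restaurant Name")["Aggregate rating"].idxmax()]

print(via_sort)
print(via_idxmax)
```

Either way only the best branch's rating survives, so a chain with one highly rated branch passes the later 4.0 filter even if its other branches score lower.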


In [399]:
df['Restaurant Name'].value_counts()

Restaurant Name
{Niche} - Cafe & Bar        1
French Toast                1
Fourteen Eleven Tea Cafe    1
Fozzie's Pizzaiolo          1
Frasers                     1
                           ..
Pizza Point                 1
Pizza Station               1
Pizza Street                1
Pizza Treat                 1
#45                         1
Name: count, Length: 7446, dtype: int64

In [400]:
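# Keep restaurants rated 4.0 or higher; .copy() avoids a SettingWithCopyWarning
# when the Cuisines column is modified in the next cell
df = df[df['Aggregate rating']>=4.0].copy()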
df

,Restaurant ID,Restaurant Name,Cuisines,Aggregate rating,Votes
3120,18222559,{Niche} - Cafe & Bar,"North Indian, Chinese, Italian, Continental",4.1,492
9523,6000871,ukuraa Sofras۱,"Kebab, Izgara",4.4,296
9454,6401789,tashas,"Cafe, Mediterranean",4.1,374
9385,6113857,sketch Gallery,"British, Contemporary",4.5,148
1837,18418247,feel ALIVE,"North Indian, American, Asian, Biryani",4.7,69
...,...,...,...,...,...
1468,18408054,19 Flavours Biryani,"Mughlai, Hyderabadi",4.1,84
2484,18233317,145 Kala Ghoda,"Fast Food, Beverages, Desserts",4.2,1606
2292,2100784,11th Avenue Cafe Bistro,"Cafe, American, Italian, Continental",4.1,377
751,2600031,10 Downing Street,"North Indian, Chinese",4.0,257


In [401]:
# Split Cuisines into list
df['Cuisines'] = df['Cuisines'].str.split(', ')
df

,Restaurant ID,Restaurant Name,Cuisines,Aggregate rating,Votes
3120,18222559,{Niche} - Cafe & Bar,"[North Indian, Chinese, Italian, Continental]",4.1,492
9523,6000871,ukuraa Sofras۱,"[Kebab, Izgara]",4.4,296
9454,6401789,tashas,"[Cafe, Mediterranean]",4.1,374
9385,6113857,sketch Gallery,"[British, Contemporary]",4.5,148
1837,18418247,feel ALIVE,"[North Indian, American, Asian, Biryani]",4.7,69
...,...,...,...,...,...
1468,18408054,19 Flavours Biryani,"[Mughlai, Hyderabadi]",4.1,84
2484,18233317,145 Kala Ghoda,"[Fast Food, Beverages, Desserts]",4.2,1606
2292,2100784,11th Avenue Cafe Bistro,"[Cafe, American, Italian, Continental]",4.1,377
751,2600031,10 Downing Street,"[North Indian, Chinese]",4.0,257


In [402]:
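# Reshape to one row per (restaurant, cuisine) pair by repeating the scalar
# columns once per cuisine and flattening the cuisine lists
lens = df['Cuisines'].str.len()  # number of cuisines listed for each restaurant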
df = pd.DataFrame({
    'Aggregate rating':np.repeat(df['Aggregate rating'].values, lens),
    'Restaurant Name': np.repeat(df['Restaurant Name'].values, lens),
    'Cuisines': np.concatenate(df['Cuisines'].values)
})

print(df)


      Aggregate rating          Restaurant Name      Cuisines
0                  4.1     {Niche} - Cafe & Bar  North Indian
1                  4.1     {Niche} - Cafe & Bar       Chinese
2                  4.1     {Niche} - Cafe & Bar       Italian
3                  4.1     {Niche} - Cafe & Bar   Continental
4                  4.4           ukuraa Sofras۱         Kebab
...                ...                      ...           ...
2972               4.1  11th Avenue Cafe Bistro       Italian
2973               4.1  11th Avenue Cafe Bistro   Continental
2974               4.0        10 Downing Street  North Indian
2975               4.0        10 Downing Street       Chinese
2976               4.5                   'Ohana      Hawaiian

[2977 rows x 3 columns]
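The `np.repeat`/`np.concatenate` reshaping turns one row per restaurant into one row per (restaurant, cuisine) pair. pandas' built-in `DataFrame.explode` performs the same reshaping; the sketch below compares the two on a hypothetical two-restaurant frame. One difference to keep in mind: `explode` turns an empty list into a single NaN row, whereas `np.repeat` with a count of 0 drops that restaurant entirely.

```python
import numpy as np
import pandas as pd

# Hypothetical frame after the ', ' split: one list of cuisines per restaurant
wide = pd.DataFrame({
    "Aggregate rating": [4.1, 4.0],
    "Restaurant Name": ["R1", "R2"],
    "Cuisines": [["North Indian", "Chinese", "Italian"], ["Chinese"]],
})

# Manual approach from the notebook: repeat scalars by list length, flatten the lists
lens = wide["Cuisines"].str.len()
manual = pd.DataFrame({
    "Aggregate rating": np.repeat(wide["Aggregate rating"].values, lens),
    "Restaurant Name": np.repeat(wide["Restaurant Name"].values, lens),
    "Cuisines": np.concatenate(wide["Cuisines"].values),
})

# Built-in alternative: one output row per list element, fresh 0..n-1 index
exploded = wide.explode("Cuisines").reset_index(drop=True)

print(exploded)
```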


In [403]:
print(df.columns)


Index(['Aggregate rating', 'Restaurant Name', 'Cuisines'], dtype='object')


In [404]:
df['Cuisines'].value_counts()

Cuisines
North Indian     270
Italian          237
Chinese          200
Continental      199
Cafe             177
                ... 
D_ner              1
Awadhi             1
Irish              1
Maharashtrian      1
Asian Fusion       1
Name: count, Length: 128, dtype: int64

In [405]:
# Cross Tabulate Restaurant Name and Cuisines
TabRestoCuisines = pd.crosstab(df['Restaurant Name'],
                                df['Cuisines'])
TabRestoCuisines

Cuisines,Afghani,African,American,Andhra,Arabian,Argentine,Asian,Asian Fusion,Australian,Awadhi,...,Teriyaki,Tex-Mex,Thai,Tibetan,Turkish,Turkish Pizza,Vegetarian,Vietnamese,Western,World Cuisine
Restaurant Name,,,,,,,,,,,,,,,,,,,,,
'Ohana,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
10 Downing Street,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
11th Avenue Cafe Bistro,0,0,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
145 Kala Ghoda,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
19 Flavours Biryani,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
feel ALIVE,0,0,1,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0
sketch Gallery,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
tashas,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
ukuraa Sofras۱,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
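The crosstab is effectively a multi-hot encoding of the Cuisines column, which covers the task's "encoding categorical variables" step. pandas can build the same kind of matrix directly from the comma-separated strings with `Series.str.get_dummies`; the sketch below compares both routes on hypothetical data. One difference: `crosstab` counts occurrences, so a cuisine repeated within one restaurant's list would produce a 2, whereas `get_dummies` always yields 0/1.

```python
import pandas as pd

# Hypothetical raw cuisine strings, one row per restaurant
raw = pd.Series(
    ["Japanese, Sushi", "Japanese", "Italian, Pizza"],
    index=pd.Index(["Ooma", "Nagai", "Luigi"], name="Restaurant Name"),
)

# One step: split on ", " and one-hot encode every label (columns sorted alphabetically)
onehot = raw.str.get_dummies(sep=", ")

# Notebook-style route: explode to long form, then cross-tabulate
long = raw.str.split(", ").explode()
ct = pd.crosstab(long.index, long.values)

print(onehot)
```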



In [407]:
TabRestoCuisines.loc['feel ALIVE'].values

array([0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int64)

In [408]:
# Extract the values for the two rows
olive_bistro = TabRestoCuisines.loc["Olive Bistro"].values
rose_cafe = TabRestoCuisines.loc["Rose Cafe"].values

# Calculate the intersection and union
intersection = np.sum(np.minimum(olive_bistro, rose_cafe))
union = np.sum(np.maximum(olive_bistro, rose_cafe))

# Compute the Jaccard score
jaccard_score_custom = intersection / union

print(jaccard_score_custom)

0.3333333333333333
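This cell implements the Jaccard index, J(A, B) = |A ∩ B| / |A ∪ B|. On 0/1 indicator vectors, `np.minimum` acts as a logical AND and `np.maximum` as a logical OR, so the two sums are the sizes of the intersection and the union; a score of 0.333 means one third of the distinct cuisines offered by Olive Bistro and Rose Cafe combined are shared. The sketch below checks that equivalence on two hypothetical cuisine sets.

```python
import numpy as np

# Hypothetical cuisine sets for two restaurants
a = {"Cafe", "Italian", "Continental"}
b = {"Cafe", "Desserts"}

# Set definition: |A & B| / |A | B| = 1 / 4
set_jaccard = len(a & b) / len(a | b)

# Same value from 0/1 indicator vectors over a shared vocabulary,
# mirroring the np.minimum / np.maximum computation above
vocab = sorted(a | b)
va = np.array([int(c in a) for c in vocab])
vb = np.array([int(c in b) for c in vocab])
vec_jaccard = np.minimum(va, vb).sum() / np.maximum(va, vb).sum()

print(set_jaccard, vec_jaccard)  # 0.25 0.25
```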


In [409]:
# Pairwise Jaccard similarity between restaurants (similarity = 1 - Jaccard distance)
jaccardDist = pdist(TabRestoCuisines.values, metric='jaccard')
jaccardMatrix = squareform(jaccardDist)
jaccardSim = 1 - jaccardMatrix
dfJaccard = pd.DataFrame(
    jaccardSim,
    index=TabRestoCuisines.index,
    columns=TabRestoCuisines.index)

dfJaccard

Restaurant Name,'Ohana,10 Downing Street,11th Avenue Cafe Bistro,145 Kala Ghoda,19 Flavours Biryani,1918 Bistro & Grill,2 Dog,22nd Parallel,3 Wise Monkeys,38 Barracks,...,Zoeys Pizzeria,Zolocrust - Hotel Clarks Amer,Zombie Burger + Drink Lab,Zuka Choco-la,Zunzi's,feel ALIVE,sketch Gallery,tashas,ukuraa Sofras۱,{Niche} - Cafe & Bar
Restaurant Name,,,,,,,,,,,,,,,,,,,,,
'Ohana,1.0,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.000000,0.00,0.000000,0.0,0.0,0.0,0.000000
10 Downing Street,0.0,1.0,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.200000,...,0.0,0.0,0.0,0.000000,0.00,0.200000,0.0,0.0,0.0,0.500000
11th Avenue Cafe Bistro,0.0,0.0,1.000000,0.0,0.0,0.0,0.166667,0.0,0.0,0.333333,...,0.0,0.4,0.0,0.000000,0.00,0.142857,0.0,0.2,0.0,0.333333
145 Kala Ghoda,0.0,0.0,0.000000,1.0,0.0,0.0,0.000000,0.0,0.0,0.000000,...,0.0,0.0,0.2,0.333333,0.00,0.000000,0.0,0.0,0.0,0.000000
19 Flavours Biryani,0.0,0.0,0.000000,0.0,1.0,0.0,0.000000,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.000000,0.00,0.000000,0.0,0.0,0.0,0.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
feel ALIVE,0.0,0.2,0.142857,0.0,0.0,0.0,0.166667,0.0,0.0,0.600000,...,0.0,0.0,0.0,0.000000,0.00,1.000000,0.0,0.0,0.0,0.142857
sketch Gallery,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.000000,0.00,0.000000,1.0,0.0,0.0,0.000000
tashas,0.0,0.0,0.200000,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.000000,0.25,0.000000,0.0,1.0,0.0,0.000000
ukuraa Sofras۱,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.000000,0.00,0.000000,0.0,0.0,1.0,0.000000
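`pdist` returns only the condensed upper triangle of pairwise distances; `squareform` expands it into a symmetric N x N matrix with zeros on the diagonal, so `1 - ...` yields similarities with ones on the diagonal. The toy check below (a hypothetical 3 x 4 indicator matrix) confirms the result matches the intersection-over-union formula used for Olive Bistro and Rose Cafe. Passing a boolean matrix avoids relying on how SciPy treats non-boolean input for this metric. The dense matrix has N² entries, so memory grows quadratically with the number of restaurants.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Hypothetical 3-restaurant x 4-cuisine indicator matrix
X = np.array([
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
], dtype=bool)

# Condensed pairwise Jaccard distances -> square matrix -> similarities
sim = 1 - squareform(pdist(X, metric="jaccard"))

def jaccard(u, v):
    # Intersection over union on boolean vectors
    return np.logical_and(u, v).sum() / np.logical_or(u, v).sum()

print(sim.round(3))
```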



In [411]:
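# Restaurant whose cuisine profile the recommendations should match
resto = 'Ooma'

# Rank all restaurants by similarity to it
sim = dfJaccard.loc[resto].sort_values(ascending=False)
sim = pd.DataFrame({'Restaurant Name': sim.index, 'simScore': sim.values})
# Exclude the query itself, keep similarity >= 0.7, and take the first 5
sim = sim[(sim['Restaurant Name']!= resto) & (sim['simScore']>=0.7)].head(5)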



In [412]:
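# Merge the ratings. df has one row per (restaurant, cuisine) pair, so the merge
# repeats each restaurant once per cuisine; drop_duplicates removes the repeats.
RestoRec = pd.merge(sim, df[['Restaurant Name','Aggregate rating']], how='inner', on='Restaurant Name')
FinalRestoRec = RestoRec.sort_values('Aggregate rating', ascending=False).drop_duplicates('Restaurant Name', keep='first')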
FinalRestoRec

,Restaurant Name,simScore,Aggregate rating
0,Sushi Masa,1.0,4.9
2,Nobu,1.0,4.4
4,Ichiban,1.0,4.3
6,Nagai,1.0,4.3
8,Guppy,1.0,4.1
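Wrapping the two recommendation cells in a function makes it easier to test the recommender with several query restaurants. The sketch below is a hypothetical helper (`recommend` is not part of the original notebook) with two deliberate changes: it deduplicates ratings before merging, and it sorts by similarity and then rating before taking the top k, so when more candidates tie on similarity than fit in k, the higher-rated ones are kept. The notebook instead takes the first five rows after sorting by similarity alone and only then orders them by rating.

```python
import pandas as pd

def recommend(resto, sim_df, ratings, threshold=0.7, k=5):
    """Hypothetical helper combining the two recommendation cells.

    sim_df  -- square similarity DataFrame indexed and labelled by restaurant name
    ratings -- DataFrame with 'Restaurant Name' and 'Aggregate rating' columns
    """
    # Similarity of every other restaurant to the query, at or above the threshold
    scores = sim_df.loc[resto].drop(resto)
    scores = scores[scores >= threshold]
    recs = scores.rename("simScore").rename_axis("Restaurant Name").reset_index()
    # Deduplicate ratings first so the merge cannot repeat restaurants
    uniq = ratings[["Restaurant Name", "Aggregate rating"]].drop_duplicates("Restaurant Name")
    recs = recs.merge(uniq, on="Restaurant Name", how="inner")
    # Break similarity ties by rating before truncating to k
    recs = recs.sort_values(["simScore", "Aggregate rating"], ascending=False)
    return recs.head(k).reset_index(drop=True)


# Usage on hypothetical data
names = ["Ooma", "Nobu", "Guppy", "Luigi"]
demo_sim = pd.DataFrame(
    [[1.0, 1.0, 1.0, 0.0],
     [1.0, 1.0, 1.0, 0.0],
     [1.0, 1.0, 1.0, 0.0],
     [0.0, 0.0, 0.0, 1.0]],
    index=names, columns=names,
)
demo_ratings = pd.DataFrame({
    "Restaurant Name": ["Ooma", "Nobu", "Nobu", "Guppy", "Luigi"],
    "Aggregate rating": [4.9, 4.4, 4.4, 4.1, 4.5],
})
print(recommend("Ooma", demo_sim, demo_ratings))
```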


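For 'Ooma' (Japanese, Sushi), the system returns five restaurants, each with a similarity score of 1.0, meaning each lists exactly the same cuisines as Ooma. Because restaurants rated below 4.0 were filtered out before the similarity matrix was built, every recommendation also has an aggregate rating of at least 4.0 (here 4.1 to 4.9). Similarity is based on cuisine overlap alone; price range and location are not part of the criteria.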
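The task calls for testing with sample user preferences, whereas the notebook queries by an existing restaurant. A minimal sketch of the preference-based variant follows, assuming the user supplies a list of cuisine names: it scores each restaurant with the same Jaccard index against a user vector built over the cuisine columns. `recommend_for_preferences` and the sample data are hypothetical, not part of the original notebook.

```python
import numpy as np
import pandas as pd

def recommend_for_preferences(preferred, onehot, ratings, k=5):
    """Hypothetical helper: rank restaurants by Jaccard similarity between
    their cuisines and a user's preferred cuisines.

    onehot  -- restaurant x cuisine 0/1 DataFrame (shaped like TabRestoCuisines)
    ratings -- Series mapping restaurant name to aggregate rating
    """
    pref = set(preferred)
    X = onehot.to_numpy() > 0                  # restaurant x cuisine booleans
    u = onehot.columns.isin(list(pref))        # user's preference vector
    inter = (X & u).sum(axis=1)
    # Preferred cuisines missing from the vocabulary still count toward the union
    union = (X | u).sum(axis=1) + len(pref - set(onehot.columns))
    scores = pd.Series(inter / np.maximum(union, 1), index=onehot.index, name="simScore")
    recs = scores.to_frame().join(ratings.rename("Aggregate rating"))
    recs = recs.sort_values(["simScore", "Aggregate rating"], ascending=False)
    return recs.head(k)


# Sample user preferences on hypothetical data
prefs_onehot = pd.DataFrame(
    [[0, 1, 0, 1], [0, 1, 0, 0], [1, 0, 1, 0]],
    index=["Ooma", "Nagai", "Luigi"],
    columns=["Italian", "Japanese", "Pizza", "Sushi"],
)
prefs_ratings = pd.Series({"Ooma": 4.9, "Nagai": 4.3, "Luigi": 4.2})
print(recommend_for_preferences(["Japanese", "Sushi"], prefs_onehot, prefs_ratings, k=2))
```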