## Task 2: Restaurant Recommendation



Objective: Create a restaurant recommendation system based on user preferences.


Steps:
- Preprocess the dataset by handling missing values and encoding categorical variables.
- Determine the criteria for restaurant recommendations (e.g., cuisine preference, price range).
- Implement a content-based filtering approach where users are recommended restaurants similar to their preferred criteria.
- Test the recommendation system by providing sample user preferences and evaluating the quality of recommendations.
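The steps above match a user's preferred criteria (cuisine preference, price range) against each restaurant's attributes. The cells below build a restaurant-to-restaurant variant of this. The sketch here is a hypothetical helper, `recommend_for_user`, that scores restaurants directly against a user profile. It assumes Jaccard similarity over cuisine sets plus a hard price-range filter, with ties broken by rating. The toy catalogue is invented for illustration and only reuses the dataset's column names.

```python
import pandas as pd

# Hypothetical toy catalogue with the same column names as the dataset below;
# the names and values are invented for illustration.
restaurants = pd.DataFrame({
    "Restaurant Name": ["Resto A", "Resto B", "Resto C", "Resto D", "Resto E"],
    "Cuisines": ["Japanese, Sushi", "Japanese, Sushi", "Italian, Pizza",
                 "Chinese, Japanese, Thai", "Italian"],
    "Price range": [4, 3, 3, 4, 2],
    "Aggregate rating": [4.9, 4.2, 4.8, 4.3, 3.9],
})


def recommend_for_user(df, liked_cuisines, price_ranges, n=3):
    """Rank restaurants against a user profile (hypothetical helper).

    Score = Jaccard similarity between the user's liked cuisines and the
    restaurant's cuisines; only restaurants in an allowed price range are kept.
    Ties in score are broken by aggregate rating.
    """
    liked = set(liked_cuisines)
    candidates = df[df["Price range"].isin(price_ranges)].copy()

    def jaccard(cuisine_str):
        offered = set(cuisine_str.split(", "))
        return len(liked & offered) / len(liked | offered)

    candidates["score"] = candidates["Cuisines"].map(jaccard)
    candidates = candidates[candidates["score"] > 0]  # drop restaurants with no overlap
    ranked = candidates.sort_values(["score", "Aggregate rating"], ascending=False)
    return ranked.head(n).reset_index(drop=True)


# Sample user: likes Japanese food and sushi, price range 3 or 4
result = recommend_for_user(restaurants, ["Japanese", "Sushi"], [3, 4])
print(result[["Restaurant Name", "score", "Aggregate rating"]])
```

Resto E is excluded by the price filter and Resto C by the zero cuisine overlap. Resto D shares one of four distinct cuisines with the profile, so its score is 0.25.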

In [8]:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns


In [9]:
# Load the dataset into a DataFrame
df = pd.read_csv('https://raw.githubusercontent.com/laxmi-narayan-87/cognifyz_technologies/main/Dataset%20.csv')
df.head()

,Restaurant ID,Restaurant Name,Country Code,City,Address,Locality,Locality Verbose,Longitude,Latitude,Cuisines,...,Currency,Has Table booking,Has Online delivery,Is delivering now,Switch to order menu,Price range,Aggregate rating,Rating color,Rating text,Votes
0,6317637,Le Petit Souffle,162,Makati City,"Third Floor, Century City Mall, Kalayaan Avenu...","Century City Mall, Poblacion, Makati City","Century City Mall, Poblacion, Makati City, Mak...",121.027535,14.565443,"French, Japanese, Desserts",...,Botswana Pula(P),Yes,No,No,No,3,4.8,Dark Green,Excellent,314
1,6304287,Izakaya Kikufuji,162,Makati City,"Little Tokyo, 2277 Chino Roces Avenue, Legaspi...","Little Tokyo, Legaspi Village, Makati City","Little Tokyo, Legaspi Village, Makati City, Ma...",121.014101,14.553708,Japanese,...,Botswana Pula(P),Yes,No,No,No,3,4.5,Dark Green,Excellent,591
2,6300002,Heat - Edsa Shangri-La,162,Mandaluyong City,"Edsa Shangri-La, 1 Garden Way, Ortigas, Mandal...","Edsa Shangri-La, Ortigas, Mandaluyong City","Edsa Shangri-La, Ortigas, Mandaluyong City, Ma...",121.056831,14.581404,"Seafood, Asian, Filipino, Indian",...,Botswana Pula(P),Yes,No,No,No,4,4.4,Green,Very Good,270
3,6318506,Ooma,162,Mandaluyong City,"Third Floor, Mega Fashion Hall, SM Megamall, O...","SM Megamall, Ortigas, Mandaluyong City","SM Megamall, Ortigas, Mandaluyong City, Mandal...",121.056475,14.585318,"Japanese, Sushi",...,Botswana Pula(P),No,No,No,No,4,4.9,Dark Green,Excellent,365
4,6314302,Sambo Kojin,162,Mandaluyong City,"Third Floor, Mega Atrium, SM Megamall, Ortigas...","SM Megamall, Ortigas, Mandaluyong City","SM Megamall, Ortigas, Mandaluyong City, Mandal...",121.057508,14.58445,"Japanese, Korean",...,Botswana Pula(P),Yes,No,No,No,4,4.8,Dark Green,Excellent,229


In [10]:
# Keep only the columns used by the recommender
dfR = df[['Restaurant ID','Restaurant Name','Cuisines','Price range','Aggregate rating','Votes']]
dfR


,Restaurant ID,Restaurant Name,Cuisines,Price range,Aggregate rating,Votes
0,6317637,Le Petit Souffle,"French, Japanese, Desserts",3,4.8,314
1,6304287,Izakaya Kikufuji,Japanese,3,4.5,591
2,6300002,Heat - Edsa Shangri-La,"Seafood, Asian, Filipino, Indian",4,4.4,270
3,6318506,Ooma,"Japanese, Sushi",4,4.9,365
4,6314302,Sambo Kojin,"Japanese, Korean",4,4.8,229
...,...,...,...,...,...,...
9546,5915730,Naml۱ Gurme,Turkish,3,4.1,788
9547,5908749,Ceviz A��ac۱,"World Cuisine, Patisserie, Cafe",3,4.2,1034
9548,5915807,Huqqa,"Italian, World Cuisine",4,3.7,661
9549,5916112,A���k Kahve,Restaurant Cafe,4,4.0,901


In [11]:
# Count missing values in each column
dfR.isna().sum()


Restaurant ID       0
Restaurant Name     0
Cuisines            9
Price range         0
Aggregate rating    0
Votes               0
dtype: int64

In [12]:
# Drop the 9 rows with no cuisine information (Cuisines is the only column with gaps)
dfR = dfR.dropna()
dfR.isna().sum()

Restaurant ID       0
Restaurant Name     0
Cuisines            0
Price range         0
Aggregate rating    0
Votes               0
dtype: int64
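Only `Cuisines` has missing values here, so `dropna()` and `dropna(subset=['Cuisines'])` remove the same 9 rows. The `subset` form states the intent explicitly and would not silently drop rows if some unrelated column later had gaps. The toy frame below is hypothetical and only illustrates the difference.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: "Cuisines" has gaps, and so does an unrelated column.
toy = pd.DataFrame({
    "Restaurant Name": ["A", "B", "C", "D"],
    "Cuisines": ["Cafe", np.nan, "Italian, Pizza", None],
    "Votes": [10, 5, np.nan, 7],
})

# dropna() with no arguments removes a row if ANY column is missing.
any_missing_dropped = toy.dropna()

# subset= limits the check to the column the recommender actually needs.
cuisine_missing_dropped = toy.dropna(subset=["Cuisines"])

print(any_missing_dropped["Restaurant Name"].tolist())      # ['A']
print(cuisine_missing_dropped["Restaurant Name"].tolist())  # ['A', 'C']
```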

In [13]:
# Count rows that are exact duplicates across all columns
dfR.duplicated().sum()


np.int64(0)

In [14]:
# Count repeated names (chains with several branches)
dfR['Restaurant Name'].duplicated().sum()



np.int64(2105)

In [15]:
dfR['Restaurant Name'].value_counts()

Restaurant Name
Cafe Coffee Day      83
Domino's Pizza       79
Subway               63
Green Chick Chop     51
McDonald's           48
                     ..
�ukura��a Sofras۱     1
Gaga Manjero          1
Cafemiz               1
Nusr-Et               1
Maori                 1
Name: count, Length: 7437, dtype: int64

In [16]:
# Sort by name and rating (both descending) so each name's highest-rated branch comes first
dfR = dfR.sort_values(by=['Restaurant Name','Aggregate rating'],ascending=False)
dfR.head()



,Restaurant ID,Restaurant Name,Cuisines,Price range,Aggregate rating,Votes
9523,6000871,�ukura��a Sofras۱,"Kebab, Izgara",3,4.4,296
3120,18222559,{Niche} - Cafe & Bar,"North Indian, Chinese, Italian, Continental",3,4.1,492
9334,7100938,wagamama,"Japanese, Asian",4,3.7,131
9454,6401789,tashas,"Cafe, Mediterranean",4,4.1,374
4659,18361747,t Lounge by Dilmah,"Cafe, Tea, Desserts",2,3.6,34


In [17]:
dfR[dfR["Restaurant Name"]=="Cafe Coffee Day"].head()


,Restaurant ID,Restaurant Name,Cuisines,Price range,Aggregate rating,Votes
6430,5595,Cafe Coffee Day,Cafe,1,3.6,58
8432,594,Cafe Coffee Day,Cafe,1,3.6,125
3946,305736,Cafe Coffee Day,Cafe,1,3.5,35
5877,8828,Cafe Coffee Day,Cafe,1,3.5,50
3001,596,Cafe Coffee Day,Cafe,1,3.4,277


In [18]:
# Keep one row per restaurant name: the highest-rated branch, since rows are sorted by rating
dfR = dfR.drop_duplicates('Restaurant Name',keep='first')
dfR


,Restaurant ID,Restaurant Name,Cuisines,Price range,Aggregate rating,Votes
9523,6000871,�ukura��a Sofras۱,"Kebab, Izgara",3,4.4,296
3120,18222559,{Niche} - Cafe & Bar,"North Indian, Chinese, Italian, Continental",3,4.1,492
9334,7100938,wagamama,"Japanese, Asian",4,3.7,131
9454,6401789,tashas,"Cafe, Mediterranean",4,4.1,374
4659,18361747,t Lounge by Dilmah,"Cafe, Tea, Desserts",2,3.6,34
...,...,...,...,...,...,...
8692,18317511,#Urban Caf��,"North Indian, Chinese, Italian",2,3.3,49
6998,18336489,#OFF Campus,"Cafe, Continental, Italian, Fast Food",2,3.7,216
2613,18311951,#InstaFreeze,Ice Cream,1,0.0,2
9148,18378803,#Dilliwaala6,North Indian,3,3.7,124
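The sort-then-`drop_duplicates(keep='first')` pattern above keeps the highest-rated row for each name. A minimal sketch on hypothetical toy branches shows that pattern next to an equivalent `groupby(...).idxmax()` formulation. The `groupby` version does not depend on the order the rows happen to be in.

```python
import pandas as pd

# Hypothetical branches of two chains (toy values, not from the dataset).
branches = pd.DataFrame({
    "Restaurant ID": [1, 2, 3, 4, 5],
    "Restaurant Name": ["Chain X", "Chain X", "Chain Y", "Chain X", "Chain Y"],
    "Aggregate rating": [3.5, 4.1, 3.9, 3.8, 4.4],
})

# Pattern used in the notebook: sort descending, keep the first row per name.
by_sort = (branches
           .sort_values(["Restaurant Name", "Aggregate rating"], ascending=False)
           .drop_duplicates("Restaurant Name", keep="first"))

# Alternative: select the row label of the maximum rating within each name.
best_rows = branches.groupby("Restaurant Name")["Aggregate rating"].idxmax()
by_idxmax = branches.loc[best_rows]

print(by_sort)
print(by_idxmax)
```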


In [19]:
dfR['Restaurant Name'].value_counts()


Restaurant Name
#45                       1
�ukura��a Sofras۱         1
{Niche} - Cafe & Bar      1
wagamama                  1
tashas                    1
                         ..
Zune - Piccadily Hotel    1
Zunzi's                   1
Zustt Yummy               1
Zync - Rosewood Hotel     1
bu��no                    1
Name: count, Length: 7437, dtype: int64

In [20]:
# Keep well-rated restaurants only (ratings have one decimal, so > 3.9 means 4.0 and above)
dfR = dfR[dfR['Aggregate rating']>3.9]
dfR

,Restaurant ID,Restaurant Name,Cuisines,Price range,Aggregate rating,Votes
9523,6000871,�ukura��a Sofras۱,"Kebab, Izgara",3,4.4,296
3120,18222559,{Niche} - Cafe & Bar,"North Indian, Chinese, Italian, Continental",3,4.1,492
9454,6401789,tashas,"Cafe, Mediterranean",4,4.1,374
9385,6113857,sketch Gallery,"British, Contemporary",4,4.5,148
1837,18418247,feel ALIVE,"North Indian, American, Asian, Biryani",3,4.7,69
...,...,...,...,...,...,...
1468,18408054,19 Flavours Biryani,"Mughlai, Hyderabadi",2,4.1,84
2484,18233317,145 Kala Ghoda,"Fast Food, Beverages, Desserts",3,4.2,1606
2292,2100784,11th Avenue Cafe Bistro,"Cafe, American, Italian, Continental",2,4.1,377
751,2600031,10 Downing Street,"North Indian, Chinese",3,4.0,257


In [21]:
# Split the comma-separated cuisines into lists; assign() returns a new DataFrame,
# which avoids the SettingWithCopyWarning raised by assigning into a filtered slice
dfR = dfR.assign(Cuisines=dfR['Cuisines'].str.split(', '))
dfR


,Restaurant ID,Restaurant Name,Cuisines,Price range,Aggregate rating,Votes
9523,6000871,�ukura��a Sofras۱,"[Kebab, Izgara]",3,4.4,296
3120,18222559,{Niche} - Cafe & Bar,"[North Indian, Chinese, Italian, Continental]",3,4.1,492
9454,6401789,tashas,"[Cafe, Mediterranean]",4,4.1,374
9385,6113857,sketch Gallery,"[British, Contemporary]",4,4.5,148
1837,18418247,feel ALIVE,"[North Indian, American, Asian, Biryani]",3,4.7,69
...,...,...,...,...,...,...
1468,18408054,19 Flavours Biryani,"[Mughlai, Hyderabadi]",2,4.1,84
2484,18233317,145 Kala Ghoda,"[Fast Food, Beverages, Desserts]",3,4.2,1606
2292,2100784,11th Avenue Cafe Bistro,"[Cafe, American, Italian, Continental]",2,4.1,377
751,2600031,10 Downing Street,"[North Indian, Chinese]",3,4.0,257


In [22]:
# One row per (restaurant, cuisine) pair
dfR = dfR.explode('Cuisines')
dfR


,Restaurant ID,Restaurant Name,Cuisines,Price range,Aggregate rating,Votes
9523,6000871,�ukura��a Sofras۱,Kebab,3,4.4,296
9523,6000871,�ukura��a Sofras۱,Izgara,3,4.4,296
3120,18222559,{Niche} - Cafe & Bar,North Indian,3,4.1,492
3120,18222559,{Niche} - Cafe & Bar,Chinese,3,4.1,492
3120,18222559,{Niche} - Cafe & Bar,Italian,3,4.1,492
...,...,...,...,...,...,...
2292,2100784,11th Avenue Cafe Bistro,Italian,2,4.1,377
2292,2100784,11th Avenue Cafe Bistro,Continental,2,4.1,377
751,2600031,10 Downing Street,North Indian,3,4.0,257
751,2600031,10 Downing Street,Chinese,3,4.0,257
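Splitting and exploding turn each restaurant into one row per cuisine and repeat the original index label, which is why index 9523 appears twice above. Every later step that merges against `dfR` therefore sees each restaurant once per cuisine. A minimal sketch on hypothetical data, assuming cuisines are always separated by exactly `", "` as the split above does:

```python
import pandas as pd

# Hypothetical two-restaurant example of the split + explode steps.
toy = pd.DataFrame(
    {"Restaurant Name": ["R1", "R2"],
     "Cuisines": ["North Indian, Chinese", "Cafe"]},
    index=[10, 20],
)

# str.split turns each string into a list; explode gives one row per list item
# and repeats the original index label for every item.
long = toy.assign(Cuisines=toy["Cuisines"].str.split(", ")).explode("Cuisines")
print(long)
```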


In [23]:
dfR['Cuisines'].value_counts()


Cuisines
North Indian      270
Italian           237
Chinese           200
Continental       199
Cafe              177
                 ... 
Fish and Chips      1
Cuban               1
Mangalorean         1
New American        1
Bubble Tea          1
Name: count, Length: 128, dtype: int64

In [24]:
# Restaurant x cuisine indicator matrix: 1 if the restaurant lists that cuisine, else 0
restoXcuisines = pd.crosstab(dfR['Restaurant Name'], dfR['Cuisines'])



In [25]:
restoXcuisines


Cuisines,Afghani,African,American,Andhra,Arabian,Argentine,Asian,Asian Fusion,Australian,Awadhi,...,Teriyaki,Tex-Mex,Thai,Tibetan,Turkish,Turkish Pizza,Vegetarian,Vietnamese,Western,World Cuisine
Restaurant Name
'Ohana,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
10 Downing Street,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
11th Avenue Cafe Bistro,0,0,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
145 Kala Ghoda,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
19 Flavours Biryani,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
feel ALIVE,0,0,1,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0
sketch Gallery,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
tashas,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
{Niche} - Cafe & Bar,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
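Because restaurant names are unique after deduplication, `pd.crosstab` on the exploded frame yields a 0/1 indicator matrix. It would show a count of 2 only if a restaurant listed the same cuisine twice. A sketch on hypothetical data shows that `Series.str.get_dummies(sep=', ')` builds the same matrix directly from the comma-separated strings, without the explode step.

```python
import pandas as pd

# Hypothetical toy data in the original comma-separated form.
toy = pd.DataFrame({
    "Restaurant Name": ["R1", "R2", "R3"],
    "Cuisines": ["Japanese, Sushi", "Italian", "Japanese, Korean"],
})

# Route used in the notebook: split, explode, then count (restaurant, cuisine) pairs.
long = toy.assign(Cuisines=toy["Cuisines"].str.split(", ")).explode("Cuisines")
via_crosstab = pd.crosstab(long["Restaurant Name"], long["Cuisines"])

# Shortcut: the same 0/1 indicator matrix straight from the strings.
via_dummies = toy.set_index("Restaurant Name")["Cuisines"].str.get_dummies(sep=", ")

print(via_crosstab)
print(via_dummies)
```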


In [26]:
# Sample names to use as test inputs (dfR is exploded, so a restaurant can appear more than once)
dfR['Restaurant Name'].sample(20, random_state=194)

7729                           Indian Saffron Co.
3013                           Naturals Ice Cream
422         Sansei Seafood Restaurant & Sushi Bar
81                       Sainte Marie Gastronomia
3310                                Spezia Bistro
7849                              Cafeteria & Co.
2407                             India Restaurant
637                               Sheroes Hangout
1858                           Boombox Brewstreet
2393    Aangan - Downtown Multicuisine Restaurant
9339                                         Bank
7067                                     Pa Pa Ya
728                                          Toit
6447                                 Bakerz Lodge
9469                           The Belgian Triple
7727                                 Cafe Connect
9532                                    Masaba��۱
734                            ECHOES Koramangala
3703                   Sakley's The Mountain Cafe
6714                                  Showstopper


In [27]:
from sklearn.metrics import jaccard_score


In [28]:
# Jaccard similarity between the cuisine indicator vectors of two restaurants
print(jaccard_score(restoXcuisines.loc["Olive Bistro"].values,
                    restoXcuisines.loc["Rose Cafe"].values))


0.3333333333333333
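Jaccard similarity is the size of the intersection of two cuisine sets divided by the size of their union. On 0/1 vectors, `jaccard_score` computes the same quantity, treating 1 as the positive label. A sketch with hypothetical cuisine sets (not the actual cuisines of Olive Bistro or Rose Cafe) checks the two against each other:

```python
import numpy as np
from sklearn.metrics import jaccard_score

# Hypothetical cuisine sets for two restaurants (not taken from the dataset).
cuisines = ["Cafe", "Continental", "Italian", "Pizza"]
a = {"Cafe", "Italian"}
b = {"Cafe", "Continental", "Pizza"}

# By hand: |A intersect B| / |A union B| = 1 / 4
by_hand = len(a & b) / len(a | b)

# Same value from 0/1 indicator vectors, as built row-wise in restoXcuisines.
vec_a = np.array([int(c in a) for c in cuisines])
vec_b = np.array([int(c in b) for c in cuisines])
by_sklearn = jaccard_score(vec_a, vec_b)

print(by_hand, by_sklearn)
```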


In [29]:
from scipy.spatial.distance import pdist, squareform

In [30]:
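# pdist returns condensed pairwise Jaccard distances; squareform expands them
# into a square matrix, and 1 - distance gives the Jaccard similarity
jaccardDist = pdist(restoXcuisines.values, metric='jaccard')
jaccardMatrix = squareform(jaccardDist)
jaccardSim = 1 - jaccardMatrix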
dfJaccard = pd.DataFrame(
    jaccardSim,
    index=restoXcuisines.index,
    columns=restoXcuisines.index)

dfJaccard


Restaurant Name,'Ohana,10 Downing Street,11th Avenue Cafe Bistro,145 Kala Ghoda,19 Flavours Biryani,1918 Bistro & Grill,2 Dog,22nd Parallel,3 Wise Monkeys,38 Barracks,...,Zoeys Pizzeria,Zolocrust - Hotel Clarks Amer,Zombie Burger + Drink Lab,Zuka Choco-la,Zunzi's,feel ALIVE,sketch Gallery,tashas,{Niche} - Cafe & Bar,�ukura��a Sofras۱
Restaurant Name
'Ohana,1.0,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.000000,0.00,0.000000,0.0,0.0,0.000000,0.0
10 Downing Street,0.0,1.0,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.200000,...,0.0,0.0,0.0,0.000000,0.00,0.200000,0.0,0.0,0.500000,0.0
11th Avenue Cafe Bistro,0.0,0.0,1.000000,0.0,0.0,0.0,0.166667,0.0,0.0,0.333333,...,0.0,0.4,0.0,0.000000,0.00,0.142857,0.0,0.2,0.333333,0.0
145 Kala Ghoda,0.0,0.0,0.000000,1.0,0.0,0.0,0.000000,0.0,0.0,0.000000,...,0.0,0.0,0.2,0.333333,0.00,0.000000,0.0,0.0,0.000000,0.0
19 Flavours Biryani,0.0,0.0,0.000000,0.0,1.0,0.0,0.000000,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.000000,0.00,0.000000,0.0,0.0,0.000000,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
feel ALIVE,0.0,0.2,0.142857,0.0,0.0,0.0,0.166667,0.0,0.0,0.600000,...,0.0,0.0,0.0,0.000000,0.00,1.000000,0.0,0.0,0.142857,0.0
sketch Gallery,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.000000,0.00,0.000000,1.0,0.0,0.000000,0.0
tashas,0.0,0.0,0.200000,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.000000,0.25,0.000000,0.0,1.0,0.000000,0.0
{Niche} - Cafe & Bar,0.0,0.5,0.333333,0.0,0.0,0.0,0.000000,0.0,0.0,0.333333,...,0.0,0.4,0.0,0.000000,0.00,0.142857,0.0,0.0,1.000000,0.0
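`pdist` returns only the upper triangle of pairwise distances in condensed form. `squareform` fills in the symmetric matrix with zeros on the diagonal, so `1 - distance` puts 1.0 on the diagonal, as seen above. A sketch on a hypothetical 3-restaurant indicator matrix checks these properties. Casting to `bool` is an assumption that matches the 0/1 data. The full matrix holds n² floats for n restaurants. If only one query is needed, `scipy.spatial.distance.cdist` on a single row would avoid building it.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist, squareform

# Hypothetical 3-restaurant indicator matrix.
X = pd.DataFrame(
    [[0, 1, 0, 1],   # R1: Japanese, Sushi
     [1, 0, 0, 0],   # R2: Italian
     [0, 1, 1, 0]],  # R3: Japanese, Korean
    index=["R1", "R2", "R3"],
    columns=["Italian", "Japanese", "Korean", "Sushi"],
)

# Condensed Jaccard distances -> square matrix -> similarity.
sim = pd.DataFrame(1 - squareform(pdist(X.values.astype(bool), metric="jaccard")),
                   index=X.index, columns=X.index)
print(sim.round(3))
```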


In [31]:
# Another random sample (no random_state, so the output differs on every run)
dfR['Restaurant Name'].sample(20)


3589         The Flying Saucer Cafe
664                 Turquoise Villa
3116            United Coffee House
262              Tursi's Latin King
2395                     Cocoa Tree
4631                        Zaffran
1221                        Pier 38
7043       TBH - The Big House Cafe
9141                   Baker Street
376     Tu-Do Vietnamese Restaurant
771                  Taste Of China
7729             Indian Saffron Co.
660                  Nini's Kitchen
855              Eddie's Patisserie
8209     New Town Cafe - Park Plaza
1438             7 Degrees Brauhaus
5207                     Vero Gusto
2099              Indian Grill Room
5899                  Eleven Course
2684                  Jom Jom Malay
Name: Restaurant Name, dtype: object

In [32]:
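resto = 'Ooma'

# Similarity of every restaurant to the query, highest first
sim = dfJaccard.loc[resto].sort_values(ascending=False)
sim = pd.DataFrame({'Restaurant Name': sim.index, 'simScore': sim.values})
# Exclude the query itself, keep close matches (>= 0.7), and take the top 5
sim = sim[(sim['Restaurant Name']!= resto) & (sim['simScore']>=0.7)].head(5)
# Attach ratings; dfR has one row per cuisine, so the merge repeats each restaurant
RestoRec = pd.merge(sim,dfR[['Restaurant Name','Aggregate rating']],how='inner',on='Restaurant Name')
# Rank by rating and collapse the repeated rows back to one per restaurant
FinalRestoRec = RestoRec.sort_values('Aggregate rating',ascending=False).drop_duplicates('Restaurant Name',keep='first')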


In [33]:
FinalRestoRec



,Restaurant Name,simScore,Aggregate rating
3,Sushi Masa,1.0,4.9
7,Miyabi 9,1.0,4.8
8,Nagai,1.0,4.3
1,Osaka,1.0,4.2
5,Guppy,1.0,4.1
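In the cell above, `.head(5)` runs before ratings are attached. If more than five restaurants tie at `simScore` 1.0, which five survive therefore depends on how `sort_values` orders the ties, not on their ratings. Below is a minimal sketch of a reusable helper, `recommend_similar` (a hypothetical name). It attaches ratings first and then ranks by similarity, rating and name, so results are reproducible. The toy similarity matrix and ratings are invented.

```python
import pandas as pd


def recommend_similar(name, sim_matrix, ratings, n=5, min_sim=0.7):
    """Return up to n restaurants most similar to `name` (hypothetical helper).

    sim_matrix: square DataFrame of similarities indexed by restaurant name.
    ratings:    Series mapping restaurant name -> aggregate rating.
    Ties in similarity are broken by rating, then by name.
    """
    if name not in sim_matrix.index:
        raise KeyError(f"Unknown restaurant: {name!r}")
    scores = sim_matrix.loc[name].drop(name)  # exclude the query itself
    recs = pd.DataFrame({"simScore": scores,
                         "Aggregate rating": ratings.reindex(scores.index)})
    recs = recs[recs["simScore"] >= min_sim]
    recs = recs.rename_axis("Restaurant Name").reset_index()
    recs = recs.sort_values(["simScore", "Aggregate rating", "Restaurant Name"],
                            ascending=[False, False, True])
    return recs.head(n).reset_index(drop=True)


# Hypothetical toy inputs: A and B tie with the query Q; C is less similar.
names = ["Q", "A", "B", "C"]
toy_sim = pd.DataFrame(
    [[1.0, 1.0, 1.0, 0.5],
     [1.0, 1.0, 1.0, 0.5],
     [1.0, 1.0, 1.0, 0.5],
     [0.5, 0.5, 0.5, 1.0]],
    index=names, columns=names)
toy_ratings = pd.Series({"Q": 4.9, "A": 4.1, "B": 4.8, "C": 4.7})

top = recommend_similar("Q", toy_sim, toy_ratings, n=2)
print(top)
```

With the notebook's objects, something like `ratings = dfR.drop_duplicates('Restaurant Name').set_index('Restaurant Name')['Aggregate rating']` followed by `recommend_similar('Ooma', dfJaccard, ratings)` should produce a comparable table. This is untested against the real data.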
