## Task: Restaurant Recommendation
###### Objective: Create a restaurant recommendation system based on user preferences.

 Steps:
 
 Preprocess the dataset by handling missing values and encoding categorical variables.
 
 Determine the criteria for restaurant recommendations (e.g., cuisine preference, price range).
 
 Implement a content-based filtering approach where users are recommended restaurants similar to their preferred criteria.
 
 Test the recommendation system by providing sample user preferences and evaluating the quality of recommendations.
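
For the last step, one simple way to put a number on "quality of recommendations" is precision at k: the share of the top k recommended rows that actually satisfy the stated preferences. The sketch below is only an illustration. The helper name `precision_at_k` and the toy rows are assumptions, not part of the notebook; the column names follow the dataset.

```python
import pandas as pd

def precision_at_k(recommended: pd.DataFrame, preferences: dict, k: int = 5) -> float:
    """Fraction of the top-k rows that satisfy every stated preference."""
    top_k = recommended.head(k)
    if top_k.empty:
        return 0.0
    hits = pd.Series(True, index=top_k.index)
    # A cuisine matches when it appears anywhere in the comma-separated list
    if 'Cuisines' in preferences:
        hits &= top_k['Cuisines'].str.contains(
            preferences['Cuisines'], case=False, regex=False
        )
    if 'Price range' in preferences:
        hits &= top_k['Price range'] == preferences['Price range']
    return float(hits.mean())

# Toy data, not taken from the real dataset
sample = pd.DataFrame({
    'Restaurant Name': ['A', 'B', 'C', 'D'],
    'Cuisines': ['Italian, Pizza', 'Japanese', 'Italian', 'Italian, Cafe'],
    'Price range': [3, 3, 2, 3],
})
score = precision_at_k(sample, {'Cuisines': 'Italian', 'Price range': 3}, k=4)
print(score)
```

In this toy case two of the four rows match both preferences, so the score is 0.5. A score like this only checks the stated criteria; it says nothing about whether a user would actually enjoy the restaurant.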


In [1]:
# Import necessary libraries
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics.pairwise import cosine_similarity

In [2]:
# Load the dataset
df = pd.read_csv('dataset.csv')
df.shape

(9551, 21)

# Preprocessing

## Handle missing values

In [3]:

df.isnull().sum()

Restaurant ID           0
Restaurant Name         0
Country Code            0
City                    0
Address                 0
Locality                0
Locality Verbose        0
Longitude               0
Latitude                0
Cuisines                9
Average Cost for two    0
Currency                0
Has Table booking       0
Has Online delivery     0
Is delivering now       0
Switch to order menu    0
Price range             0
Aggregate rating        0
Rating color            0
Rating text             0
Votes                   0
dtype: int64

In [11]:
# Separate the data into rows with known Cuisines and unknown Cuisines
data = df.copy()  # copy so that changes to data do not modify the original DataFrame
known_cuisines = data[data['Cuisines'].notna()]
unknown_cuisines = data[data['Cuisines'].isna()]

In [12]:
unknown_cuisines

Unnamed: 0,Restaurant ID,Restaurant Name,Country Code,City,Address,Locality,Locality Verbose,Longitude,Latitude,Cuisines,...,Currency,Has Table booking,Has Online delivery,Is delivering now,Switch to order menu,Price range,Aggregate rating,Rating color,Rating text,Votes


In [7]:
# Inspect the cuisines of restaurants in Savannah (one of them is missing), sorted by votes
cuisines_in_savannah = df[df['City'] == 'Savannah'].sort_values(by='Votes', ascending=False)

# Print the sorted cuisines in Savannah
print(cuisines_in_savannah['Cuisines'])


452               American, Seafood, Southern
436                                  Southern
439                        American, Southern
437                                    Burger
454                                   Seafood
438             Desserts, Sandwich, Ice Cream
447                          Breakfast, Cajun
440    International, Mediterranean, Sandwich
441              American, Bar Food, Sandwich
451                  Cafe, Sandwich, Southern
448                Breakfast, Diner, Sandwich
444                        American, Bar Food
450                     Pizza, Seafood, Steak
443                       American, Breakfast
445                     Breakfast, Vegetarian
442                  American, Seafood, Steak
453                                  American
449                  Coffee and Tea, Desserts
455                                       NaN
446                       American, Breakfast
Name: Cuisines, dtype: object


In [8]:
# Fill in the missing cuisines for restaurants in Albany (values entered by hand)
df.loc[df['Restaurant Name'] == 'Cookie Shoppe', 'Cuisines'] = 'Coffee, Tea, cookie'
df.loc[df['Restaurant Name'] == "Pearly's Famous Country Cookng", 'Cuisines'] = 'American, Breakfast, Diner'
df.loc[df['Restaurant Name'] == "Jimmie's Hot Dogs", 'Cuisines'] = 'Hot Dogs'

In [9]:
df.loc[df['Restaurant Name'] == "Corkscrew Cafe", 'Cuisines'] = 'Coffee and Tea, Sandwich'
df.loc[df['Restaurant Name'] == 'Dovetail', 'Cuisines'] = 'Italian'
df.loc[df['Restaurant Name'] == 'HI Lite Bar & Lounge', 'Cuisines'] = 'American, Breakfast, Diner'
df.loc[df['Restaurant Name'] == 'Hillstone', 'Cuisines'] = 'American, BBQ, Sandwich'
df.loc[df['Restaurant Name'] == "Leonard's Bakery", 'Cuisines'] = 'Breakfast, Burger'
df.loc[df['Restaurant Name'] == 'Tybee Island Social Club', 'Cuisines'] = 'American, Seafood, Southern'
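
The manual fixes above depend on knowing each restaurant by name. As a rough alternative, and not the approach used in this notebook, missing values could be filled with the most common cuisine string in the same city. The sketch below assumes a hypothetical helper `fill_cuisines_by_city` and toy rows; it is less accurate than looking each restaurant up, but it needs no hand-written entries.

```python
import pandas as pd

def fill_cuisines_by_city(frame: pd.DataFrame, fallback: str = 'Unknown') -> pd.DataFrame:
    """Fill missing 'Cuisines' with the most frequent cuisine string in the same city."""
    result = frame.copy()
    # Most common non-missing cuisine string per city
    city_mode = (
        result.dropna(subset=['Cuisines'])
        .groupby('City')['Cuisines']
        .agg(lambda values: values.mode().iloc[0])
    )
    missing = result['Cuisines'].isna()
    # Cities with no known cuisines at all receive the fallback label
    result.loc[missing, 'Cuisines'] = (
        result.loc[missing, 'City'].map(city_mode).fillna(fallback)
    )
    return result

# Toy rows for illustration only
toy = pd.DataFrame({
    'City': ['Savannah', 'Savannah', 'Savannah', 'Albany'],
    'Cuisines': ['American, Breakfast', 'American, Breakfast', None, None],
})
filled = fill_cuisines_by_city(toy)
print(filled['Cuisines'].tolist())
```

The input frame is left untouched, so the imputed result can be compared with the original before deciding whether to keep it.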


In [16]:

df.isnull().sum()

Restaurant ID           0
Restaurant Name         0
Country Code            0
City                    0
Address                 0
Locality                0
Locality Verbose        0
Longitude               0
Latitude                0
Cuisines                0
Average Cost for two    0
Currency                0
Has Table booking       0
Has Online delivery     0
Is delivering now       0
Switch to order menu    0
Price range             0
Aggregate rating        0
Rating color            0
Rating text             0
Votes                   0
dtype: int64

### Fixing Restaurant Names

In [36]:
restaurant_names = df['Restaurant Name']
# List out the strings that contain "���", "��", or "�"
error_restaurant_names = [name for name in restaurant_names if "�" in name]

# Print the filtered list
print("Error Restaurant Names: \n", error_restaurant_names)


Error Restaurant Names: 
 ['Terra�_o It��lia', 'Divino Fog��o', 'Caf�� Tu Tu Tango', "Hollerbach's Willow Tree Caf��", 'MoMo Caf�� - Courtyard By Marriott', 'Caf�� Bogchi', 'Caf�� Kitchen', 'Gallery Caf�� - Hyatt Place', 'bu��no', 'Queens Caf��', 'Caf�� Gramophone', 'M Cr��me', 'West��ross', 'bu��no', 'Caf�� Burger BC', 'The Basement Caf��', 'Caf�� Riverrun', 'Sahib��s Barbeque by Ohri��s', 'Hobnob Gourmet Caf��bar', 'Chemistry Caf��', 'It��s Sinful', 'Caf�� MRP', 'Caf�� Healthilicious', 'LaBont��', 'The Walled City - Caf�� & Lounge', 'Bon App��tit', 'Cottage Caf�� by Smoothie factory', 'Dialogue Lounge & Caf��', "Ping's Caf�� Orient", 'Delhite P��tisserie', 'Arabian & Turkish Caf��', 'Die B�_ckerei', 'High Street Caf��', 'The Junkyard Caf��', 'AMPM Caf�� & Bar', 'Caf�� Foreground', 'Rosart�� Chocolate', 'Caf�� Befikre', 'Caf�� 101', 'Tin Town Caf��', 'Superstar Caf��', 'The Fashion Street Caf��', 'Hearken Caf��', 'Remember Me Caf��', '4 Barrels Caf�� & Lounge', 'Caf�� Knosh - The Leel

In [35]:

# Replace incorrect names with correct ones
df['Restaurant Name'] = df['Restaurant Name'].replace({
   'Caf� Daniel Briand': 'Café Daniel Briand','Caf�� Daniel Briand':'Café Daniel Briand',
   'Pizza � Bessa': 'Pizza à Bessa','Pizza �� Bessa':'Pizza à Bessa',
   'Sandubas Caf�': 'Sandubas Café','Sandubas Caf��':'Sandubas Café',
   'Tayp�': 'Taypé','Tayp��': 'Taypé',
   'Manzu�': 'Manzú','Manzu��': 'Manzú',
   'Braseiro da G�vea': 'Braseiro da Gávea','Braseiro da G��vea': 'Braseiro da Gávea',
   'Zaz� Bistr� Tropical': 'Zazá Bistrô Tropical','Zaz�� Bistr�� Tropical': 'Zazá Bistrô Tropical',
   'Fil� de Ouro': 'Filé de Ouro','Fil�� de Ouro': 'Filé de Ouro',
   'Apraz�_vel': 'Aprazível',
   'Terra�_o It�lia': 'Terraço Itália',
   'Divino Fog�o': 'Divino Fogão',
   'Esquina Mocot�_': 'Esquina Mocotó',
   'Cev�_che Tapas Bar & Restaurant': 'Cevíche Tapas Bar & Restaurant',
   'Caf� Tu Tu Tango': 'Café Tu Tu Tango',
   "Hollerbach's Willow Tree Caf�": "Hollerbach's Willow Tree Café",
   'MoMo Caf� - Courtyard By Marriott': 'MoMo Café - Courtyard By Marriott',
   'Caf� Bogchi': 'Café Bogchi',
   'Caf� Kitchen': 'Café Kitchen',
   'Gallery Caf� - Hyatt Place': 'Gallery Café - Hyatt Place',
   "Longitude 77��03' Bar - Le Meridien Gurgaon": "Longitude 77°03' Bar - Le Meridien Gurgaon",
   'bu�no': 'buñno',
   'Queens Caf�': 'Queens Café',
   'Caf� Gramophone': 'Café Gramophone',
   'M Cr�me': 'M Crème',
   'West�ross': 'Westcross',
   "Chawla's�_": "Chawla's",
   'Caf� Burger BC': 'Café Burger BC',
   'The Basement Caf�': 'The Basement Café',
   'Caf� Riverrun': 'Café Riverrun',
   'Con�_u': 'Conçu',
   'Sahib�s Barbeque by Ohri�s': "Sahib's Barbeque by Ohri's",
   'Hobnob Gourmet Caf�bar': 'Hobnob Gourmet Cafébar',
   'Chemistry Caf�': 'Chemistry Café',
   'NESCAF� Illusions': 'NESCAFÉ Illusions',
   'It�s Sinful': "It's Sinful", 
   'Caf� MRP': 'Café MRP',
   'D�_ner Grill': 'Dîner Grill',
   'Caf� Healthilicious': 'Café Healthilicious',
   'LaBont�': 'LaBonté',
   'Chhalava - �__Lava': 'Chhalava - Lava',
   'The Walled City - Caf� & Lounge': 'The Walled City - Café & Lounge',
   'Bon App�tit': 'Bon Appétit',
   'Cottage Caf� by Smoothie factory': 'Cottage Café by Smoothie factory',
   'Dialogue Lounge & Caf�': 'Dialogue Lounge & Café',
   "Ping's Caf� Orient": "Ping's Café Orient",
   'Delhite P�tisserie': 'Delhite Pâtisserie',
   'Arabian & Turkish Caf�': 'Arabian & Turkish Café',
   'Die B�ckerei': 'Die Bäckerei',
   'High Street Caf�': 'High Street Café',
   'The Junkyard Caf�': 'The Junkyard Café',
   'AMPM Caf� & Bar': 'AMPM Café & Bar',
   'Caf� Foreground': 'Café Foreground',
   'Rosart� Chocolate': 'Rosarté Chocolate',
   'Caf� Befikre': 'Café Befikre',
   'Caf� 101': 'Café 101',
   'Tin Town Caf�': 'Tin Town Café',
   'Superstar Caf�': 'Superstar Café',
   'H�_agen-Dazs': 'Häagen-Dazs',
   'The Fashion Street Caf�': 'The Fashion Street Café',
   'Hearken Caf�': 'Hearken Café',
   'Remember Me Caf�': 'Remember Me Café',
   '4 Barrels Caf� & Lounge': '4 Barrels Café & Lounge',
   'Caf� Knosh - The Leela Ambience Convention Hotel': 'Café Knosh - The Leela Ambience Convention Hotel',
   'The Village Caf�': 'The Village Café',
   'Phonebooth Caf�': 'Phonebooth Café',
   'Caf� Doo Ghoont': 'Café Doo Ghoont',
   'TBH �� To Be Healthy': 'TBH – To Be Healthy',
   'They�_����': 'They',
   'Caff� La Poya': 'Caffè La Poya',
   'More Than Caf�': 'More Than Café',
   'Elixir Health Caf�': 'Elixir Health Café',
   '#Urban Caf�': '#Urban Café',
   'The Chickmunks Caf�': 'The Chickmunks Café',
   'KBC�_': 'KBC',
   "Chef's Basket Pop Up Caf�": "Chef's Basket Pop Up Café",
   'Saut�ed Stories': 'Sautéed Stories',
   'Freshco - The Health Caf�': 'Freshco - The Health Café',
   'Eden Noodles Cafe �__·�_��_��_��': 'Eden Noodles Cafe',
   'Grand Caf� & Beach': 'Grand Café & Beach',
   'Masaba��۱ Kebap�_۱s۱': 'Masabaşı Kebapçısı',
   'Me��hur Tavac۱ Recep Usta': 'Meşhur Tavacı Recep Usta',
   '�ukura��a Sofras۱': 'Çukurağa Sofrası',
   'Me��hur �_z�_elik Aspava':'Meşhur Özçelik Aspava',
    'Masaba��۱':'Masabaşı',
    'D�_vero��lu':'Döveroğlu', 
    'Pizza ��l Forno':'Pizza ël Forno', 
    'Emirgan S�_ti��':'Emirgan Sütiş', 
    'Leman K�_lt�_r':'Leman Kültür', 
    'Dem Karak�_y':'Dem Karaköy', 
    'Karak�_y G�_ll�_o��lu':'Karaköy Güllüoğlu', 
    'Ceviz A��ac۱':'Ceviz Ağacı', 
    'A���k Kahve':'Açık Kahve'
})
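
The dictionary above matches whole names exactly, so an entry only applies when the garbled run in the data has the same length as in the key. The names printed earlier contain `Caf` followed by two replacement characters, while many keys contain only one, so those rows would be left unchanged. A pattern-based replacement is one way to tolerate that. The sketch below is an assumption-laden illustration: `PATTERN_FIXES` and `repair_names` are hypothetical names, only two patterns are shown, and it assumes the garbled character is U+FFFD.

```python
import pandas as pd

BAD = '\ufffd'  # Unicode replacement character, assumed to be what the names contain

# Hypothetical pattern table: each regex tolerates one or more replacement characters
PATTERN_FIXES = {
    rf'Caf{BAD}+': 'Café',
    rf'App{BAD}+tit': 'Appétit',
}

def repair_names(names: pd.Series) -> pd.Series:
    """Apply each pattern in turn and return the repaired names."""
    repaired = names
    for pattern, replacement in PATTERN_FIXES.items():
        repaired = repaired.str.replace(pattern, replacement, regex=True)
    return repaired

# Toy names built from the same kinds of damage seen in the output above
sample = pd.Series([
    f'Caf{BAD} Kitchen',
    f'Queens Caf{BAD}{BAD}',
    f'Bon App{BAD}{BAD}tit',
    'Ooma',
])
fixed = repair_names(sample)
print(fixed.tolist())

# Any name still containing the replacement character needs a manual entry
still_broken = fixed[fixed.str.contains(BAD, regex=False)]
print(len(still_broken))
```

Patterns handle the frequent cases such as "Café"; names whose original spelling cannot be inferred from the surrounding letters still need a hand-written mapping.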

### Fixing City Names

In [20]:

city_names = df['City']
# List out the strings that contain "���", "��", or "�"
error_city_names = [name for name in city_names if "�" in name]

# Print the filtered list
print("Error City Names: \n", error_city_names)


Error City Names: 
 []


In [17]:
df['City'] = df['City'].replace({
    'Bras�_lia':'Brasília',
    'S��o Paulo':'São Paulo',
    '��stanbul': 'İstanbul'
})

In [19]:
df

Unnamed: 0,Restaurant ID,Restaurant Name,Country Code,City,Address,Locality,Locality Verbose,Longitude,Latitude,Cuisines,...,Currency,Has Table booking,Has Online delivery,Is delivering now,Switch to order menu,Price range,Aggregate rating,Rating color,Rating text,Votes
0,6317637,Le Petit Souffle,162,Makati City,"Third Floor, Century City Mall, Kalayaan Avenu...","Century City Mall, Poblacion, Makati City","Century City Mall, Poblacion, Makati City, Mak...",121.027535,14.565443,"French, Japanese, Desserts",...,Botswana Pula(P),Yes,No,No,No,3,4.8,Dark Green,Excellent,314
1,6304287,Izakaya Kikufuji,162,Makati City,"Little Tokyo, 2277 Chino Roces Avenue, Legaspi...","Little Tokyo, Legaspi Village, Makati City","Little Tokyo, Legaspi Village, Makati City, Ma...",121.014101,14.553708,Japanese,...,Botswana Pula(P),Yes,No,No,No,3,4.5,Dark Green,Excellent,591
2,6300002,Heat - Edsa Shangri-La,162,Mandaluyong City,"Edsa Shangri-La, 1 Garden Way, Ortigas, Mandal...","Edsa Shangri-La, Ortigas, Mandaluyong City","Edsa Shangri-La, Ortigas, Mandaluyong City, Ma...",121.056831,14.581404,"Seafood, Asian, Filipino, Indian",...,Botswana Pula(P),Yes,No,No,No,4,4.4,Green,Very Good,270
3,6318506,Ooma,162,Mandaluyong City,"Third Floor, Mega Fashion Hall, SM Megamall, O...","SM Megamall, Ortigas, Mandaluyong City","SM Megamall, Ortigas, Mandaluyong City, Mandal...",121.056475,14.585318,"Japanese, Sushi",...,Botswana Pula(P),No,No,No,No,4,4.9,Dark Green,Excellent,365
4,6314302,Sambo Kojin,162,Mandaluyong City,"Third Floor, Mega Atrium, SM Megamall, Ortigas...","SM Megamall, Ortigas, Mandaluyong City","SM Megamall, Ortigas, Mandaluyong City, Mandal...",121.057508,14.584450,"Japanese, Korean",...,Botswana Pula(P),Yes,No,No,No,4,4.8,Dark Green,Excellent,229
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9546,5915730,Naml۱ Gurme,208,İstanbul,"Kemanke�� Karamustafa Pa��a Mahallesi, R۱ht۱m ...",Karak�_y,"Karak�_y, ��stanbul",28.977392,41.022793,Turkish,...,Turkish Lira(TL),No,No,No,No,3,4.1,Green,Very Good,788
9547,5908749,Ceviz Ağacɪ,208,İstanbul,"Ko��uyolu Mahallesi, Muhittin ��st�_nda�� Cadd...",Ko��uyolu,"Ko��uyolu, ��stanbul",29.041297,41.009847,"World Cuisine, Patisserie, Cafe",...,Turkish Lira(TL),No,No,No,No,3,4.2,Green,Very Good,1034
9548,5915807,Huqqa,208,İstanbul,"Kuru�_e��me Mahallesi, Muallim Naci Caddesi, N...",Kuru�_e��me,"Kuru�_e��me, ��stanbul",29.034640,41.055817,"Italian, World Cuisine",...,Turkish Lira(TL),No,No,No,No,4,3.7,Yellow,Good,661
9549,5916112,Açık Kahve,208,İstanbul,"Kuru�_e��me Mahallesi, Muallim Naci Caddesi, N...",Kuru�_e��me,"Kuru�_e��me, ��stanbul",29.036019,41.057979,Restaurant Cafe,...,Turkish Lira(TL),No,No,No,No,4,4.0,Green,Very Good,901


## Step 3: Encoding and content-based filtering

In [None]:
## Encoding categorical variables
# Column names follow the dataset; add other categorical features here
categorical_features = ['Cuisines', 'Price range']
encoded = df[categorical_features].copy()  # keep df itself human-readable
encoders = {}  # one fitted encoder per feature, so each keeps its own classes
for feature in categorical_features:
    encoders[feature] = LabelEncoder()
    encoded[feature] = encoders[feature].fit_transform(encoded[feature])

# Determine the criteria for restaurant recommendations
## This will depend on the user preferences. For example:
user_preferences = {
    'Cuisines': 'Italian',  # must match a value seen when the encoder was fitted
    'Price range': 3        # stored as an integer level in this dataset
}
# Convert user preferences to encoded form, using the encoder fitted for each feature
user_vector = [encoders[feature].transform([user_preferences[feature]])[0]
               for feature in categorical_features]

# Implement a content-based filtering approach
## Compute the cosine similarity between user preferences and restaurants
restaurant_vectors = encoded.values
similarities = cosine_similarity([user_vector], restaurant_vectors)

# Get the top 5 recommended restaurants
top_5_index = similarities[0].argsort()[-5:][::-1]
recommended_restaurants = df.iloc[top_5_index]

print("Recommended Restaurants:")
print(recommended_restaurants)
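
Label codes are arbitrary integers, so cosine similarity computed over them says little about how much two restaurants actually have in common. One possible alternative, shown here only as a sketch on toy rows, is to give every cuisine and every price level its own 0/1 column and compare vectors in that space. The `recommend` function and the toy data are assumptions for illustration; `Series.str.get_dummies` and `pd.get_dummies` are the pandas calls used to build the columns.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Toy rows for illustration only; the column names follow the dataset
toy = pd.DataFrame({
    'Restaurant Name': ['A', 'B', 'C', 'D'],
    'Cuisines': ['Italian, Pizza', 'Japanese, Sushi', 'Italian', 'Cafe, Italian'],
    'Price range': [3, 3, 1, 3],
})

# One column per cuisine and per price level, each holding 0 or 1
cuisine_flags = toy['Cuisines'].str.get_dummies(sep=', ')
price_flags = pd.get_dummies(toy['Price range'], prefix='price').astype(int)
features = pd.concat([cuisine_flags, price_flags], axis=1)

def recommend(preferred_cuisine: str, price_level: int, top_n: int = 3) -> pd.DataFrame:
    """Rank restaurants by cosine similarity to a user vector in the same feature space."""
    user = pd.Series(0, index=features.columns)
    for column in (preferred_cuisine, f'price_{price_level}'):
        if column in user.index:  # preferences not present in the data stay at 0
            user[column] = 1
    scores = cosine_similarity([user.values], features.values)[0]
    ranked = toy.assign(similarity=scores).sort_values('similarity', ascending=False)
    return ranked.head(top_n)

top = recommend('Italian', 3)
print(top[['Restaurant Name', 'similarity']])
```

With this encoding, the two Italian restaurants at price level 3 score highest, the Italian restaurant at a different price level comes next, and the Japanese restaurant ranks last because it shares only the price level.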
