# Task - 2: Restaurant Recommendation System

In this task, we will develop a restaurant recommendation system based on user preferences. The system will allow users to select their desired cuisine and city, and it will recommend the best restaurants based on these choices.

## Objectives:
- Filter and preprocess a dataset of restaurants.
- Encode categorical variables for better handling in our recommendation system.
- Provide options for users to select their country, city, and preferred cuisine.
- Display the top restaurant recommendations along with their ratings and votes.
- Visualize the restaurant locations on a map using Folium.

This task aims to create an interactive experience that helps users find suitable dining options that match their preferences.


In [58]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


## Step 1: Import Libraries and Load Dataset

In this step, we will import the required libraries, install Folium for map visualization, and load the restaurant dataset.



In [113]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [141]:
!pip install folium
import folium  # needed by visualize_restaurants_on_map (Step 8)




In [114]:
df = pd.read_csv("/content/drive/MyDrive/Internship/Dataset.csv")
df.head()

Unnamed: 0,Restaurant ID,Restaurant Name,Country Code,City,Address,Locality,Locality Verbose,Longitude,Latitude,Cuisines,...,Currency,Has Table booking,Has Online delivery,Is delivering now,Switch to order menu,Price range,Aggregate rating,Rating color,Rating text,Votes
0,6317637,Le Petit Souffle,162,Makati City,"Third Floor, Century City Mall, Kalayaan Avenu...","Century City Mall, Poblacion, Makati City","Century City Mall, Poblacion, Makati City, Mak...",121.027535,14.565443,"French, Japanese, Desserts",...,Botswana Pula(P),Yes,No,No,No,3,4.8,Dark Green,Excellent,314
1,6304287,Izakaya Kikufuji,162,Makati City,"Little Tokyo, 2277 Chino Roces Avenue, Legaspi...","Little Tokyo, Legaspi Village, Makati City","Little Tokyo, Legaspi Village, Makati City, Ma...",121.014101,14.553708,Japanese,...,Botswana Pula(P),Yes,No,No,No,3,4.5,Dark Green,Excellent,591
2,6300002,Heat - Edsa Shangri-La,162,Mandaluyong City,"Edsa Shangri-La, 1 Garden Way, Ortigas, Mandal...","Edsa Shangri-La, Ortigas, Mandaluyong City","Edsa Shangri-La, Ortigas, Mandaluyong City, Ma...",121.056831,14.581404,"Seafood, Asian, Filipino, Indian",...,Botswana Pula(P),Yes,No,No,No,4,4.4,Green,Very Good,270
3,6318506,Ooma,162,Mandaluyong City,"Third Floor, Mega Fashion Hall, SM Megamall, O...","SM Megamall, Ortigas, Mandaluyong City","SM Megamall, Ortigas, Mandaluyong City, Mandal...",121.056475,14.585318,"Japanese, Sushi",...,Botswana Pula(P),No,No,No,No,4,4.9,Dark Green,Excellent,365
4,6314302,Sambo Kojin,162,Mandaluyong City,"Third Floor, Mega Atrium, SM Megamall, Ortigas...","SM Megamall, Ortigas, Mandaluyong City","SM Megamall, Ortigas, Mandaluyong City, Mandal...",121.057508,14.58445,"Japanese, Korean",...,Botswana Pula(P),Yes,No,No,No,4,4.8,Dark Green,Excellent,229


## Step 2: Data Cleaning

In this step, we will clean the dataset to ensure that our restaurant recommendation system operates effectively. This involves the following tasks:

1. **Checking for Missing Values**:
   We will check for any missing values (NaN) in the dataset. Missing values can lead to inaccuracies in our recommendations, so it's important to address them.

2. **Removing Rows with Missing Values**:
   After identifying columns with missing values, we will remove any rows that contain these NaN values. This ensures that our dataset is complete and ready for analysis.

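3. **Checking for Duplicate Entries**:
   Duplicates can skew the results of our recommendation system. To maintain data integrity, we treat rows that share the same `Latitude` and `Longitude` as duplicates.

4. **Removing Duplicate Rows**:
   We remove these rows with `drop_duplicates(subset=['Latitude', 'Longitude'], keep=False)` to avoid bias in our recommendations. Because of `keep=False`, every row whose coordinates occur more than once is dropped, including the first occurrence, so distinct restaurants that share a location are removed too.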

5. **Displaying the Cleaned Dataset**:
   Finally, we will display the first few rows of the cleaned dataset to verify that the data has been processed correctly.
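The duplicate check in the cells below calls `drop_duplicates` on the coordinate columns with `keep=False`. The following sketch uses an invented four-row frame (names and coordinates are made up) to show how this differs from the default `keep='first'`.

```python
import pandas as pd

# Toy data: restaurants A and B share the same coordinates (e.g. the same mall)
toy = pd.DataFrame({
    "Restaurant Name": ["A", "B", "C", "D"],
    "Latitude": [14.58, 14.58, 14.55, 28.61],
    "Longitude": [121.05, 121.05, 121.01, 77.20],
})

# Number of rows whose coordinates repeat an earlier row
n_repeats = toy.duplicated(subset=["Latitude", "Longitude"]).sum()

# keep='first' (the default): keep one row per coordinate pair
first_kept = toy.drop_duplicates(subset=["Latitude", "Longitude"], keep="first")

# keep=False: drop every row that shares its coordinates with another row
none_kept = toy.drop_duplicates(subset=["Latitude", "Longitude"], keep=False)

print(n_repeats)                               # 1
print(first_kept["Restaurant Name"].tolist())  # ['A', 'C', 'D']
print(none_kept["Restaurant Name"].tolist())   # ['C', 'D']
```

If two genuinely different restaurants in the same building should stay in the data, `keep='first'` or a wider subset (for example name plus coordinates) may be the gentler choice.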


In [115]:
# Print the columns of the DataFrame
print(df.columns)

Index(['Restaurant ID', 'Restaurant Name', 'Country Code', 'City', 'Address',
       'Locality', 'Locality Verbose', 'Longitude', 'Latitude', 'Cuisines',
       'Average Cost for two', 'Currency', 'Has Table booking',
       'Has Online delivery', 'Is delivering now', 'Switch to order menu',
       'Price range', 'Aggregate rating', 'Rating color', 'Rating text',
       'Votes'],
      dtype='object')


In [116]:
df.dtypes

Unnamed: 0,0
Restaurant ID,int64
Restaurant Name,object
Country Code,int64
City,object
Address,object
Locality,object
Locality Verbose,object
Longitude,float64
Latitude,float64
Cuisines,object


### Handling Missing Values

In this part of the cleaning step, we handle missing values, which is a crucial part of data preprocessing.

- **Checking for null values**: We first count the missing (null) values in each column with `isnull().sum()`. This gives a quick overview of how much data is missing. In the output below, only `Cuisines` has missing values (9 of them).
  
- **Dropping rows with null values**: We then drop every row that contains a missing value with `dropna(axis=0, how='any')`. Removing rows rather than columns keeps the `Cuisines` column, which the recommender filters on, and discards only the incomplete records.

Finally, we display the remaining columns to verify that the dataset is ready for further analysis.
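Because the `axis` argument decides whether `dropna` removes rows or columns, a toy comparison may help. The three-row frame below is made up and mirrors the single missing-cuisine situation.

```python
import numpy as np
import pandas as pd

# Toy frame with one missing cuisine
toy = pd.DataFrame({
    "City": ["Agra", "Pune", "Goa"],
    "Cuisines": ["Mughlai", np.nan, "Seafood"],
    "Votes": [10, 20, 30],
})

rows_dropped = toy.dropna(axis=0, how="any")  # removes the Pune row, keeps all columns
cols_dropped = toy.dropna(axis=1, how="any")  # removes the Cuisines column, keeps all rows

print(rows_dropped.shape)             # (2, 3)
print(cols_dropped.columns.tolist())  # ['City', 'Votes']
```

Dropping rows fits this dataset better. Only a handful of rows lack a cuisine, whereas dropping the column would remove the feature the recommender relies on.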


In [126]:
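# Remove rows that share coordinates. keep=False drops every row whose
# (Latitude, Longitude) pair appears more than once, not just the extra copies.
df = df.drop_duplicates(subset=['Latitude', 'Longitude'], keep=False)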


In [117]:
# Check for total null values in each column
null_values = df.isnull().sum()

# Display the total null values for each column
print(null_values)


Restaurant ID           0
Restaurant Name         0
Country Code            0
City                    0
Address                 0
Locality                0
Locality Verbose        0
Longitude               0
Latitude                0
Cuisines                9
Average Cost for two    0
Currency                0
Has Table booking       0
Has Online delivery     0
Is delivering now       0
Switch to order menu    0
Price range             0
Aggregate rating        0
Rating color            0
Rating text             0
Votes                   0
dtype: int64


In [118]:
# Drop rows that contain any null value (axis=0 means rows)
df = df.dropna(axis=0, how='any')

# Confirm that all columns are still present
print(df.columns)


Index(['Restaurant ID', 'Restaurant Name', 'Country Code', 'City', 'Address',
       'Locality', 'Locality Verbose', 'Longitude', 'Latitude', 'Cuisines',
       'Average Cost for two', 'Currency', 'Has Table booking',
       'Has Online delivery', 'Is delivering now', 'Switch to order menu',
       'Price range', 'Aggregate rating', 'Rating color', 'Rating text',
       'Votes'],
      dtype='object')


In [119]:
# Check for total null values in each column
null_values = df.isnull().sum()

# Display the total null values for each column
print(null_values)


Restaurant ID           0
Restaurant Name         0
Country Code            0
City                    0
Address                 0
Locality                0
Locality Verbose        0
Longitude               0
Latitude                0
Cuisines                0
Average Cost for two    0
Currency                0
Has Table booking       0
Has Online delivery     0
Is delivering now       0
Switch to order menu    0
Price range             0
Aggregate rating        0
Rating color            0
Rating text             0
Votes                   0
dtype: int64


## Step 3: Dropping Unnecessary Columns

In this step, we will remove columns from the dataset that are not relevant to our restaurant recommendation system. This helps simplify the dataset and focus on the most important features that will influence recommendations.

### Columns to be Dropped:
- **Restaurant ID**: This is a unique identifier for each restaurant and does not contribute to our analysis.
- **Address**: The full address is not required because we use latitude and longitude for mapping.
- **Locality** and **Locality Verbose**: These finer-grained location descriptions are not needed, since the system filters by city and maps restaurants by their coordinates.
- **Currency**: This is not directly relevant to our recommendation criteria.
- **Rating color**: This only mirrors the rating band already captured by `Aggregate rating` and `Rating text`.
- **Switch to order menu**: This is not necessary for the recommendation system.

`Restaurant Name` is kept because it appears in the recommendations and on the map. `Has Table booking` is also kept and is encoded as 0/1 further below.
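The drop cells below raise a `KeyError` if they are run a second time, because the columns are already gone. One way to make this step safe to re-run is sketched here. The list name `COLUMNS_TO_DROP` and the helper `drop_unused_columns` are hypothetical. The approach relies on `DataFrame.drop(..., errors="ignore")`, which skips labels that are not present.

```python
import pandas as pd

# Columns removed in this step (names taken from the drop cells below)
COLUMNS_TO_DROP = [
    "Restaurant ID", "Address", "Locality", "Locality Verbose",
    "Currency", "Switch to order menu", "Rating color",
]

def drop_unused_columns(frame: pd.DataFrame) -> pd.DataFrame:
    # errors="ignore" skips names that are already gone, so re-running
    # the cell does not raise a KeyError
    return frame.drop(columns=COLUMNS_TO_DROP, errors="ignore")

# Toy (empty) frame with a subset of the real column names
toy = pd.DataFrame(columns=["Restaurant ID", "Restaurant Name", "City", "Rating color", "Votes"])
once = drop_unused_columns(toy)
twice = drop_unused_columns(once)  # second call is a no-op instead of an error
print(once.columns.tolist())   # ['Restaurant Name', 'City', 'Votes']
print(twice.columns.tolist())  # ['Restaurant Name', 'City', 'Votes']
```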


In [120]:
df.columns

Index(['Restaurant ID', 'Restaurant Name', 'Country Code', 'City', 'Address',
       'Locality', 'Locality Verbose', 'Longitude', 'Latitude', 'Cuisines',
       'Average Cost for two', 'Currency', 'Has Table booking',
       'Has Online delivery', 'Is delivering now', 'Switch to order menu',
       'Price range', 'Aggregate rating', 'Rating color', 'Rating text',
       'Votes'],
      dtype='object')

In [121]:
# Dropping unnecessary columns
df = df.drop(columns=['Restaurant ID', 'Address', 'Locality', 'Locality Verbose', 'Currency', 'Switch to order menu'])
df.head()

Unnamed: 0,Restaurant Name,Country Code,City,Longitude,Latitude,Cuisines,Average Cost for two,Has Table booking,Has Online delivery,Is delivering now,Price range,Aggregate rating,Rating color,Rating text,Votes
0,Le Petit Souffle,162,Makati City,121.027535,14.565443,"French, Japanese, Desserts",1100,Yes,No,No,3,4.8,Dark Green,Excellent,314
1,Izakaya Kikufuji,162,Makati City,121.014101,14.553708,Japanese,1200,Yes,No,No,3,4.5,Dark Green,Excellent,591
2,Heat - Edsa Shangri-La,162,Mandaluyong City,121.056831,14.581404,"Seafood, Asian, Filipino, Indian",4000,Yes,No,No,4,4.4,Green,Very Good,270
3,Ooma,162,Mandaluyong City,121.056475,14.585318,"Japanese, Sushi",1500,No,No,No,4,4.9,Dark Green,Excellent,365
4,Sambo Kojin,162,Mandaluyong City,121.057508,14.58445,"Japanese, Korean",1500,Yes,No,No,4,4.8,Dark Green,Excellent,229


In [122]:
# Drop the Rating color column
df = df.drop(columns=['Rating color'])

# Check the remaining columns
print(df.columns)


Index(['Restaurant Name', 'Country Code', 'City', 'Longitude', 'Latitude',
       'Cuisines', 'Average Cost for two', 'Has Table booking',
       'Has Online delivery', 'Is delivering now', 'Price range',
       'Aggregate rating', 'Rating text', 'Votes'],
      dtype='object')


In [123]:
df.dtypes


Unnamed: 0,0
Restaurant Name,object
Country Code,int64
City,object
Longitude,float64
Latitude,float64
Cuisines,object
Average Cost for two,int64
Has Table booking,object
Has Online delivery,object
Is delivering now,object


## Step 4: Encoding Binary Columns

The three Yes/No columns are converted to 1/0 so they can be used as numeric features.

In [124]:
# Convert Yes/No columns to 1/0.
# Series.map avoids the FutureWarning that Series.replace emits in recent pandas
# versions when it downcasts the result from object to int.
for col in ['Has Table booking', 'Has Online delivery', 'Is delivering now']:
    df[col] = df[col].map({'Yes': 1, 'No': 0})

# Check the resulting DataFrame
df.head()




Unnamed: 0,Restaurant Name,Country Code,City,Longitude,Latitude,Cuisines,Average Cost for two,Has Table booking,Has Online delivery,Is delivering now,Price range,Aggregate rating,Rating text,Votes
0,Le Petit Souffle,162,Makati City,121.027535,14.565443,"French, Japanese, Desserts",1100,1,0,0,3,4.8,Excellent,314
1,Izakaya Kikufuji,162,Makati City,121.014101,14.553708,Japanese,1200,1,0,0,3,4.5,Excellent,591
2,Heat - Edsa Shangri-La,162,Mandaluyong City,121.056831,14.581404,"Seafood, Asian, Filipino, Indian",4000,1,0,0,4,4.4,Very Good,270
3,Ooma,162,Mandaluyong City,121.056475,14.585318,"Japanese, Sushi",1500,0,0,0,4,4.9,Excellent,365
4,Sambo Kojin,162,Mandaluyong City,121.057508,14.58445,"Japanese, Korean",1500,1,0,0,4,4.8,Excellent,229
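When the Yes/No columns are encoded with `Series.map` instead of `replace`, any value missing from the dictionary silently becomes NaN. The following sketch adds a stricter check. The helper name `encode_yes_no` is hypothetical, and the toy column uses made-up values.

```python
import pandas as pd

def encode_yes_no(series: pd.Series) -> pd.Series:
    """Map 'Yes'/'No' to 1/0 and fail loudly on any other value."""
    encoded = series.map({"Yes": 1, "No": 0})
    # Values that did not map (including real NaNs) show up as NaN here
    unexpected = series[encoded.isna()].unique()
    if len(unexpected) > 0:
        raise ValueError(f"Unexpected values: {list(unexpected)}")
    return encoded.astype(int)

toy = pd.DataFrame({"Has Online delivery": ["Yes", "No", "No"]})
toy["Has Online delivery"] = encode_yes_no(toy["Has Online delivery"])
print(toy["Has Online delivery"].tolist())  # [1, 0, 0]
```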


In [127]:
df.shape

(8467, 14)

## Step 5: Function to Get User Preferences

In this step, we will create a function named `get_user_preferences(df)` that will interactively ask the user to select their preferences for the restaurant recommendation system. This function will guide the user through the selection process of the country, city, and cuisine.

### Function Explanation

**Function Name**: `get_user_preferences(df)`

**Parameters**:
- `df`: The DataFrame containing the restaurant dataset.

### Function Steps:
1. **Display Available Countries**:
   - The function first retrieves the unique country codes from the DataFrame.
   - It prints a list of available country codes for the user to choose from, indexed for easy selection.

2. **User Selects Country**:
   - The user is prompted to enter the index of their preferred country code.
   - The selected country code is then stored for filtering subsequent options.

3. **Filter Cities Based on Country**:
   - The function filters the DataFrame to find cities that belong to the selected country.
   - It displays the available cities for the user to choose from, again indexed for easy selection.

4. **User Selects City**:
   - The user inputs the index corresponding to their preferred city.
   - The selected city is stored for further filtering.

5. **Filter Cuisines Based on City**:
   - The function then filters the DataFrame again to find unique cuisines available in the selected city.
   - It displays these cuisines to the user in a numbered list.

6. **User Selects Cuisine**:
   - The user selects their desired cuisine by entering the corresponding index.
   - The selected cuisine is returned alongside the selected city and country code.

### Return Value:
The function returns three values: `selected_cuisine`, `selected_city`, and `selected_country`, which can be used to filter the restaurant dataset for recommendations.
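As written, each selection in `get_user_preferences` calls `int(input(...))` directly. A typo therefore raises `ValueError`, an out-of-range number raises `IndexError`, and a negative number silently selects from the end of the list. The following sketch shows a retrying helper that could replace each of the three lookups. The name `choose_option` and its `read` parameter are hypothetical. `read` exists only so the prompt can be exercised without a keyboard.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")

def choose_option(options: Sequence[T], prompt: str,
                  read: Callable[[str], str] = input) -> T:
    """Ask for an index until the user enters a valid one, then return that option."""
    if len(options) == 0:
        raise ValueError("No options to choose from.")
    while True:
        raw = read(prompt)
        try:
            idx = int(raw)
        except ValueError:
            print(f"'{raw}' is not a number, please try again.")
            continue
        if 0 <= idx < len(options):
            return options[idx]
        print(f"Index must be between 0 and {len(options) - 1}.")

# Simulated answers instead of keyboard input: a typo, an out-of-range index, then 1
answers = iter(["abc", "7", "1"])
picked = choose_option(["Agra", "Pune", "Goa"], "Select a city (enter index): ",
                       read=lambda _prompt: next(answers))
print(picked)  # Pune
```

Inside the function, a call such as `selected_city = choose_option(cities_in_country, "Select a city (enter index): ")` would then stand in for the two lines that read and index the city.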


In [135]:
def get_user_preferences(df):
    # Print available country codes and names
    print("Available Countries:")
    country_options = df['Country Code'].unique()
    for idx, code in enumerate(country_options):
        print(f"{idx}: {code}")  # Display country code options

    # User selects country code
    country_selection = int(input("Select a country code (enter index): "))
    selected_country = country_options[country_selection]

    # Filter cities based on the selected country
    cities_in_country = df[df['Country Code'] == selected_country]['City'].unique()
    print(f"\nAvailable Cities in {selected_country}:")
    for idx, city in enumerate(cities_in_country):
        print(f"{idx}: {city}")  # Display city options

    city_selection = int(input("Select a city (enter index): "))
    selected_city = cities_in_country[city_selection]

    # Filter cuisines based on the selected city
    cuisines_in_city = df[df['City'] == selected_city]['Cuisines'].unique()
    print(f"\nAvailable Cuisines in {selected_city}:")
    for idx, cuisine in enumerate(cuisines_in_city):
        print(f"{idx}: {cuisine}")  # Display cuisine options

    cuisine_selection = int(input("Select a cuisine (enter index): "))
    selected_cuisine = cuisines_in_city[cuisine_selection]

    return selected_cuisine, selected_city, selected_country


## Step 6: Function to Filter Restaurants

In this step, we will create a function named `filter_restaurants(df, cuisine, city)` that filters the restaurant dataset based on the user's selected cuisine and city. This function is crucial for narrowing down the list of restaurants to only those that match the user's preferences.

### Function Explanation

**Function Name**: `filter_restaurants(df, cuisine, city)`

**Parameters**:
- `df`: The DataFrame containing the restaurant dataset.
- `cuisine`: The cuisine type selected by the user.
- `city`: The city selected by the user.

### Function Steps:
1. **Filtering the DataFrame**:
   - The function uses boolean indexing to filter the DataFrame `df`.
   - It checks two conditions:
     - The `Cuisines` column must exactly equal the selected `cuisine` string (the full comma-separated combination).
     - The `City` column must match the selected `city`.
   - Both conditions are combined using the `&` operator to ensure that only restaurants that meet both criteria are included.

2. **Return Value**:
   - The function returns a new DataFrame, `filtered_df`, which contains only the restaurants that match the user's selected cuisine and city.
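The `Cuisines` column stores comma-separated combinations such as `"North Indian, Chinese"`. An exact `==` comparison therefore only matches restaurants with precisely the same combination string, so choosing `Chinese` misses a restaurant listed as `North Indian, Chinese`. A possible alternative is to offer individual cuisines in the menu and match them as list entries. It is sketched below with the hypothetical helpers `split_cuisines` and `filter_by_cuisine_token` on invented data.

```python
import pandas as pd

def split_cuisines(cuisines: pd.Series) -> list:
    """Sorted individual cuisine names, e.g. 'North Indian, Chinese' -> 'Chinese', 'North Indian'."""
    return sorted(cuisines.str.split(",").explode().str.strip().dropna().unique())

def filter_by_cuisine_token(df: pd.DataFrame, cuisine: str, city: str) -> pd.DataFrame:
    """Keep restaurants in `city` whose cuisine list contains `cuisine` as one entry."""
    in_city = df[df["City"] == city]
    has_cuisine = in_city["Cuisines"].fillna("").str.split(",").apply(
        lambda items: cuisine in [item.strip() for item in items]
    ).astype(bool)  # explicit bool dtype so an empty result still works as a mask
    return in_city[has_cuisine]

toy = pd.DataFrame({
    "Restaurant Name": ["A", "B", "C"],
    "City": ["New Delhi", "New Delhi", "Pune"],
    "Cuisines": ["North Indian, Chinese", "Chinese", "Chinese, Thai"],
})

exact = toy[(toy["Cuisines"] == "Chinese") & (toy["City"] == "New Delhi")]
token = filter_by_cuisine_token(toy, "Chinese", "New Delhi")
print(split_cuisines(toy["Cuisines"]))    # ['Chinese', 'North Indian', 'Thai']
print(exact["Restaurant Name"].tolist())  # ['B']
print(token["Restaurant Name"].tolist())  # ['A', 'B']
```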


In [136]:
# Step 6: Filter the dataset by the user's cuisine and city
def filter_restaurants(df, cuisine, city):
    filtered_df = df[
        (df['Cuisines'] == cuisine) &
        (df['City'] == city)
    ]
    return filtered_df

## Step 7: Function to Display Top Restaurants

In this step, we will create a function named `display_top_restaurants(filtered_df)` that prints the five highest-ranked restaurants among those matching the user's selected criteria.

### Function Explanation

**Function Name**: `display_top_restaurants(filtered_df)`

**Parameters**:
- `filtered_df`: A DataFrame containing the filtered restaurant data based on the user’s selected cuisine and city.

### Function Steps:
1. **Check for Empty DataFrame**:
   - The function first checks if `filtered_df` is empty.
   - If it is empty, a message indicating that no restaurants were found is printed, and the function exits early.

2. **Ranking the Restaurants**:
   - If the DataFrame is not empty, the function ranks the restaurants by two columns:
     - `Aggregate rating`: the primary key, so higher-rated restaurants come first.
     - `Votes`: a tie-breaker among restaurants with the same rating, treating more votes as a sign of popularity.
   - The function uses `nlargest(5, ['Aggregate rating', 'Votes'])` to retrieve the top five restaurants in that order.

3. **Displaying Restaurant Details**:
   - The function iterates through the top five restaurants and prints relevant details for each:
     - Restaurant Name
     - Rating
     - Votes
     - Location (latitude and longitude)
   - A separator line is printed between each restaurant's details for better readability.
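In `nlargest(5, ['Aggregate rating', 'Votes'])`, the columns are compared in order. `Votes` matters only when two ratings are equal. The small invented example below shows that a heavily voted but lower-rated restaurant does not make the cut.

```python
import pandas as pd

toy = pd.DataFrame({
    "Restaurant Name": ["A", "B", "C", "D"],
    "Aggregate rating": [4.5, 4.9, 4.5, 4.0],
    "Votes": [120, 50, 800, 5000],
})

# Rating is compared first; Votes only orders rows with equal ratings
top = toy.nlargest(3, ["Aggregate rating", "Votes"])
print(top["Restaurant Name"].tolist())  # ['B', 'C', 'A']
```

If popularity should weigh more heavily, a combined score would be needed. Column ordering alone cannot express that.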


In [137]:
# Step 7: Rank and display the top five restaurants
def display_top_restaurants(filtered_df):
    if filtered_df.empty:
        print("No restaurants found matching your criteria.")
        return

    # Sort by Aggregate Rating and Votes
    top_restaurants = filtered_df.nlargest(5, ['Aggregate rating', 'Votes'])

    for idx, row in top_restaurants.iterrows():
        print(f"Restaurant Name: {row['Restaurant Name']}")
        print(f"Rating: {row['Aggregate rating']}")
        print(f"Votes: {row['Votes']}")
        print(f"Location: {row['Latitude']}, {row['Longitude']}")
        print("------------------------------")

## Step 8: Function to Visualize Restaurants on a Map

In this step, we will create a function named `visualize_restaurants_on_map(filtered_df)` that plots every restaurant matching the user's selection on a map. This visualization provides geographical context for the options and makes the restaurants easier to locate.

### Function Explanation

**Function Name**: `visualize_restaurants_on_map(filtered_df)`

**Parameters**:
- `filtered_df`: A DataFrame containing the filtered restaurant data based on the user’s selected cuisine and city.

### Function Steps:
1. **Check for Empty DataFrame**:
   - The function first checks if `filtered_df` is empty.
   - If it is empty, a message indicating that no restaurants are available to display on the map is printed, and the function exits early.

2. **Create a Base Map**:
   - If the DataFrame is not empty, the function creates a base map using the Folium library.
   - The map is centered around the average latitude and longitude of the restaurants in `filtered_df`, with an initial zoom level of 12.

3. **Adding Markers for Each Restaurant**:
   - The function iterates through each restaurant in `filtered_df`.
   - For each restaurant, a marker is added to the map at its corresponding latitude and longitude.
   - Each marker includes a popup that displays:
     - Restaurant Name
     - Rating
     - Votes
   - The markers are colored blue for visibility.

4. **Return the Map**:
   - Finally, the function returns the created map object for visualization.
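A fixed `zoom_start=12` suits a single city, but it may crop the view or zoom out too far when markers are widely spread or tightly clustered. One option is to compute the bounds of the markers and pass them to folium's `Map.fit_bounds`, which takes `[[south, west], [north, east]]`. The sketch below uses a hypothetical helper `map_view`. It is plain pandas, so it can be checked without folium, and the coordinates are invented.

```python
import pandas as pd

def map_view(filtered_df: pd.DataFrame):
    """Return (center, bounds) for a set of restaurant coordinates.

    center: [mean latitude, mean longitude], usable as folium.Map(location=...)
    bounds: [[south, west], [north, east]], usable with Map.fit_bounds(...)
    """
    lat, lon = filtered_df["Latitude"], filtered_df["Longitude"]
    center = [lat.mean(), lon.mean()]
    bounds = [[lat.min(), lon.min()], [lat.max(), lon.max()]]
    return center, bounds

toy = pd.DataFrame({"Latitude": [14.0, 15.0], "Longitude": [121.0, 122.0]})
center, bounds = map_view(toy)
print(center)  # [14.5, 121.5] (as NumPy floats)
print(bounds)  # [[14.0, 121.0], [15.0, 122.0]] (as NumPy floats)
```

Inside `visualize_restaurants_on_map`, this could be used as `center, bounds = map_view(filtered_df)`, then `m = folium.Map(location=center)`, then `m.fit_bounds(bounds)`.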

In [138]:

# Function to visualize the filtered restaurants on a map
def visualize_restaurants_on_map(filtered_df):
    if filtered_df.empty:
        print("No restaurants found to display on the map.")
        return

    # Create a base map centered on the mean location of the filtered restaurants
    m = folium.Map(location=[filtered_df['Latitude'].mean(), filtered_df['Longitude'].mean()], zoom_start=12)

    # Add markers for each restaurant
    for idx, row in filtered_df.iterrows():
        folium.Marker(
            location=[row['Latitude'], row['Longitude']],
            popup=f"{row['Restaurant Name']}<br>Rating: {row['Aggregate rating']}<br>Votes: {row['Votes']}",
            icon=folium.Icon(color='blue')
        ).add_to(m)

    # Display the map
    return m


## Step 9: Main Flow for User Interaction and Visualization

In this final step, we will create the main flow of the application that allows users to select their preferences for country, city, and cuisine. Based on the user’s input, the program will filter the dataset, display the top restaurants, and visualize them on a map.
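Because the interactive flow depends on `input()`, it is awkward to re-run or test. The sketch below is a non-interactive alternative. The function name `recommend` is hypothetical. It chains the same exact-match filter and the same `nlargest` ranking, then returns a DataFrame instead of printing. The toy data is invented.

```python
import pandas as pd

def recommend(df: pd.DataFrame, cuisine: str, city: str, n: int = 5) -> pd.DataFrame:
    """Return the top-n restaurants for an exact cuisine string in a city."""
    matches = df[(df["Cuisines"] == cuisine) & (df["City"] == city)]
    columns = ["Restaurant Name", "Aggregate rating", "Votes", "Latitude", "Longitude"]
    return matches.nlargest(n, ["Aggregate rating", "Votes"])[columns]

toy = pd.DataFrame({
    "Restaurant Name": ["A", "B", "C", "D"],
    "City": ["Goa", "Goa", "Goa", "Pune"],
    "Cuisines": ["Seafood", "Seafood", "Goan", "Seafood"],
    "Aggregate rating": [4.1, 4.6, 4.9, 4.8],
    "Votes": [30, 12, 99, 50],
    "Latitude": [15.5, 15.6, 15.4, 18.5],
    "Longitude": [73.8, 73.7, 73.9, 73.8],
})

result = recommend(toy, "Seafood", "Goa")
print(result["Restaurant Name"].tolist())  # ['B', 'A']
```

The returned frame holds the rows that `display_top_restaurants` would print. It could also be passed to `visualize_restaurants_on_map` to map only the top picks.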


In [142]:
from IPython.display import display

# Main flow with country, city, cuisine selection and map visualization
selected_cuisine, selected_city, selected_country = get_user_preferences(df)
if selected_cuisine and selected_city:
    filtered_restaurants = filter_restaurants(df, selected_cuisine, selected_city)
    display_top_restaurants(filtered_restaurants)

    # Visualize all matching restaurants on the map
    restaurant_map = visualize_restaurants_on_map(filtered_restaurants)
    if restaurant_map is not None:  # the function returns None when nothing matched
        display(restaurant_map)  # Use display() to render the Folium map in the notebook

Available Countries:
0: 162
1: 30
2: 216
3: 14
4: 37
5: 184
6: 214
7: 1
8: 94
9: 148
10: 215
11: 166
12: 189
13: 191
14: 208
Select a country code (enter index): 7

Available Cities in 1:
0: Agra
1: Ahmedabad
2: Allahabad
3: Amritsar
4: Aurangabad
5: Bangalore
6: Bhopal
7: Bhubaneshwar
8: Chandigarh
9: Chennai
10: Coimbatore
11: Dehradun
12: Faridabad
13: Ghaziabad
14: Goa
15: Gurgaon
16: Guwahati
17: Hyderabad
18: Indore
19: Jaipur
20: Kanpur
21: Kochi
22: Kolkata
23: Lucknow
24: Ludhiana
25: Mangalore
26: Mohali
27: Mumbai
28: Mysore
29: Nagpur
30: Nashik
31: New Delhi
32: Noida
33: Panchkula
34: Patna
35: Puducherry
36: Pune
37: Ranchi
38: Secunderabad
39: Surat
40: Vadodara
41: Varanasi
42: Vizag
Select a city (enter index): 31

Available Cuisines in New Delhi:
0: Fast Food
1: North Indian, Seafood, Continental
2: South Indian, North Indian
3: South Indian, North Indian, Chinese
4: Mughlai
5: South Indian
6: Pizza
7: North Indian, Chinese
8: Chinese, North Indian
9: Mediterranean, 

## Conclusion

In this task, we developed a restaurant recommendation system that enables users to find top dining options based on their preferences for cuisine and location. The process involved several key steps, including data preprocessing, user input handling, filtering the dataset, and visualizing results on a map.

### Key Features:

- **User-Centric Design**: The system lets users select their country, city, and desired cuisine from numbered lists at text prompts.
  
- **Data Filtering and Sorting**: We efficiently filtered and sorted restaurant data based on user selections, ensuring that only relevant options are presented.

- **Visual Mapping**: Using the Folium library, we plotted the restaurants that match the user's selection on an interactive map, making it easier for users to explore their options.

### Future Improvements:

While the current implementation provides a solid foundation, there are several potential improvements for future iterations:

1. **Enhanced Filtering Options**: Implement additional filters such as price range, dietary restrictions, and restaurant features (e.g., outdoor seating, pet-friendly).

2. **User Feedback System**: Allow users to rate and review restaurants after their visit, providing more data for refining recommendations.

3. **Integration with External APIs**: Explore integrating with restaurant review platforms or delivery services to provide real-time data and enhance user engagement.
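The first improvement needs no new data, because the cleaned frame already has a `Price range` column. The sketch below uses a hypothetical function `filter_by_price`. It assumes that a higher value means more expensive, as the sample rows suggest: Heat, with an average cost for two of 4000, is in range 4. The toy data is invented.

```python
import pandas as pd

def filter_by_price(df: pd.DataFrame, max_price_range: int) -> pd.DataFrame:
    """Keep restaurants whose 'Price range' is at most max_price_range
    (assumes higher values mean more expensive)."""
    return df[df["Price range"] <= max_price_range]

toy = pd.DataFrame({
    "Restaurant Name": ["A", "B", "C", "D"],
    "Price range": [1, 3, 4, 2],
})
print(filter_by_price(toy, 2)["Restaurant Name"].tolist())  # ['A', 'D']
```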

Overall, this project successfully demonstrates how to combine data processing, user interaction, and visualization techniques to create a functional and engaging restaurant recommendation system. The approach taken can be adapted and expanded upon for various applications in data science and machine learning.
