## Zomato Restaurant
##### Project Description
Zomato data analysis is useful for food lovers who want to find the best cuisines from every
part of the world within their budget. It also helps those who want to find value-for-money
restaurants serving particular cuisines in various parts of a country. Additionally, the analysis
serves people who want to know a country's best cuisine and which locality of that country has
the largest number of restaurants serving it.

##### Data Storage:
The problem statement comes with two datasets: zomato.csv and Country-Code.xlsx.
Country-Code.xlsx contains two variables:
* Country code
* Country name

The collected data is stored in the comma-separated values file zomato.csv. Each
restaurant in the dataset is uniquely identified by its Restaurant ID. Every restaurant record contains the following variables:
* Restaurant ID: Unique ID of every restaurant across various cities of the world
* Restaurant Name: Name of the restaurant
* Country Code: Country in which restaurant is located
* City: City in which restaurant is located
* Address: Address of the restaurant
* Locality: Location in the city
* Locality Verbose: Detailed description of the locality
* Longitude: Longitude coordinate of the restaurant's location
* Latitude: Latitude coordinate of the restaurant's location
* Cuisines: Cuisines offered by the restaurant
* Average Cost for two: Cost for two people, expressed in the local currency of each country
* Currency: Currency of the country
* Has Table booking: yes/no
* Has Online delivery: yes/no
* Is delivering now: yes/no
* Switch to order menu: yes/no
* Price range: Range of food prices
* Aggregate Rating: Average rating out of 5
* Rating color: Color assigned according to the aggregate rating
* Rating text: Text label based on the aggregate rating
* Votes: Number of ratings cast by people

Problem statement: Predict two things from this dataset:
1) Average Cost for two
2) Price range


Hint: Use the pandas merge operation, pd.merge(df1, df2), to combine the two datasets.
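
The merge mentioned in the hint can be sketched on two toy frames. This is only an illustration: the rows below are made up, and the column names "Country Code" and "Country" are assumed to match the real files. A left join keeps every restaurant row, and the validate argument raises an error if the lookup table ever contains duplicate country codes.

```python
import pandas as pd

# Toy stand-ins for the two datasets; the rows are made up for illustration.
restaurants = pd.DataFrame({
    "Restaurant ID": [101, 102, 103],
    "Country Code": [1, 216, 999],  # 999 deliberately has no match in the lookup table
})
countries = pd.DataFrame({
    "Country Code": [1, 216],
    "Country": ["India", "United States"],
})

# Left join: every restaurant row is kept, matched or not.
# validate="many_to_one" checks that each country code appears once in the lookup table.
merged = pd.merge(restaurants, countries, on="Country Code",
                  how="left", validate="many_to_one")

# Restaurants whose code is missing from the lookup table get NaN in "Country".
unmatched = int(merged["Country"].isna().sum())
print(merged)
print("Unmatched rows:", unmatched)
```

Counting the unmatched rows right after the merge is a cheap way to notice country codes that are absent from the lookup table before they are silently filled or encoded later.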


Dataset links:
* https://github.com/dsrscientist/dataset4/blob/main/Country-Code.xlsx
* https://github.com/dsrscientist/dataset4/blob/main/zomato.csv


In [1]:
#Importing necessary libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

In [None]:
#loading datasets
zomato_data = pd.read_csv("https://raw.githubusercontent.com/dsrscientist/dataset4/main/zomato.csv")
country_code = pd.read_excel("https://github.com/dsrscientist/dataset4/raw/main/Country-Code.xlsx")

In [None]:
#merging datasets on 'Country Code' to get full country names
zomato_data = pd.merge(zomato_data, country_code, on='Country Code', how='left')

In [None]:
#preprocessing data, dropping unnecessary columns
zomato_data.drop(['Restaurant ID', 'Restaurant Name', 'Address', 'Locality', 'Locality Verbose', 'Currency'], axis=1, inplace=True)

In [None]:
#encoding categorical variables
#City, Cuisines and Rating color are text columns too and must be encoded before model fitting
categorical_cols = ['City', 'Cuisines', 'Rating color', 'Rating text', 'Country',
                    'Has Table booking', 'Has Online delivery', 'Is delivering now', 'Switch to order menu']
for col in categorical_cols:
    #missing entries are labelled 'Unknown' so each column holds only strings; a fresh encoder is fitted per column
    zomato_data[col] = LabelEncoder().fit_transform(zomato_data[col].fillna('Unknown').astype(str))

In [None]:
#handling missing values
zomato_data.fillna(0, inplace=True)
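
Filling every gap with 0 treats a missing value as a real measurement. As an alternative sketch (not the approach used in this notebook), missing values can be counted first and then filled per column type: the median for numeric columns and a placeholder label for text columns. The toy frame and the "Unknown" label are assumptions for illustration, and this variant would have to run before the categorical columns are encoded.

```python
import numpy as np
import pandas as pd

# Toy frame with one gap in a text column and one in a numeric column (made-up values).
df = pd.DataFrame({
    "Cuisines": ["Italian", None, "Chinese"],
    "Votes": [10.0, np.nan, 30.0],
})

# Count missing values per column before changing anything.
missing_before = df.isna().sum()
print(missing_before)

# Split the columns by type.
num_cols = df.select_dtypes(include="number").columns
other_cols = df.columns.difference(num_cols)

# Numeric gaps get the column median; text gaps get a placeholder label.
df[num_cols] = df[num_cols].fillna(df[num_cols].median())
df[other_cols] = df[other_cols].fillna("Unknown")
print(df)
```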

In [None]:
#splitting data into features and target variables
X = zomato_data.drop(['Average Cost for two', 'Price range'], axis=1)
y_cost = zomato_data['Average Cost for two']
y_price_range = zomato_data['Price range']

In [None]:
#splitting data into training and testing sets
#a single call splits the features and both targets with the same row assignment
X_train, X_test, y_cost_train, y_cost_test, y_price_range_train, y_price_range_test = train_test_split(
    X, y_cost, y_price_range, test_size=0.2, random_state=42)

In [None]:
#Model Training, Random Forest Regressor for predicting Average Cost for two
rf_cost = RandomForestRegressor(n_estimators=100, random_state=42)
rf_cost.fit(X_train, y_cost_train)

In [None]:
#Random Forest Regressor for predicting Price range
rf_price_range = RandomForestRegressor(n_estimators=100, random_state=42)
rf_price_range.fit(X_train, y_price_range_train)
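
If Price range takes only a handful of discrete values, it can also be treated as a classification target instead of a regression target. The sketch below shows that alternative on synthetic data; the features, the four labels 1 to 4, and the thresholds that generate them are all assumptions made for illustration, not properties confirmed by the dataset description above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Synthetic stand-in for the feature matrix: 400 rows, 3 numeric features.
X_demo = rng.normal(size=(400, 3))
# Hypothetical price-range label in {1, 2, 3, 4}, derived from the first feature.
y_demo = np.digitize(X_demo[:, 0], bins=[-0.7, 0.0, 0.7]) + 1

# stratify keeps the class proportions similar in the training and test sets.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_tr, y_tr)

# A classifier always predicts one of the observed labels, never a value such as 2.4.
y_hat = clf.predict(X_te)
acc = accuracy_score(y_te, y_hat)
print("Accuracy:", acc)
```

A regressor can return fractional values that would need rounding to a valid range, whereas a classifier is scored directly on the labels, for example with accuracy.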

In [None]:
#predictions
y_cost_pred = rf_cost.predict(X_test)
y_price_range_pred = rf_price_range.predict(X_test)

In [None]:
#Model Evaluation
mse_cost = mean_squared_error(y_cost_test, y_cost_pred)
mse_price_range = mean_squared_error(y_price_range_test, y_price_range_pred)

print("Mean Squared Error for Average Cost for two:", mse_cost)
print("Mean Squared Error for Price range:", mse_price_range)
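
Mean squared error is expressed in squared units, and here Average Cost for two mixes several currencies, so the raw number is hard to interpret. As a small sketch on made-up values, the root mean squared error (same units as the target) and the R² score (share of variance explained) can be reported alongside it.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up true and predicted costs, for illustration only.
y_true = np.array([500.0, 1200.0, 300.0, 800.0])
y_pred = np.array([450.0, 1300.0, 350.0, 700.0])

mse = mean_squared_error(y_true, y_pred)  # average squared error
rmse = np.sqrt(mse)                       # back in the units of the target
r2 = r2_score(y_true, y_pred)             # 1.0 would be a perfect fit

print("MSE:", mse)
print("RMSE:", rmse)
print("R^2:", r2)
```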