# Task: Predict Restaurant Ratings

##### Objective: Build a machine learning model to predict the aggregate rating of a restaurant based on other features.

Steps:

1. Preprocess the dataset by handling missing values, encoding categorical variables, and splitting the data into training and testing sets.

2. Select a regression algorithm (e.g., linear regression, decision tree regression) and train it on the training data.

3. Evaluate the model's performance using appropriate regression metrics (e.g., mean squared error, R-squared) on the testing data.

4. Interpret the model's results and analyze the most influential features affecting restaurant ratings.

### Importing Libraries

In [30]:
import numpy as np
import pandas as pd
from sklearn import metrics
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.tree import DecisionTreeRegressor

### Loading the Dataset

In [3]:
df = pd.read_csv('Dataset.csv')
df

Unnamed: 0,Restaurant ID,Restaurant Name,Country Code,City,Address,Locality,Locality Verbose,Longitude,Latitude,Cuisines,...,Currency,Has Table booking,Has Online delivery,Is delivering now,Switch to order menu,Price range,Aggregate rating,Rating color,Rating text,Votes
0,6317637,Le Petit Souffle,162,Makati City,"Third Floor, Century City Mall, Kalayaan Avenu...","Century City Mall, Poblacion, Makati City","Century City Mall, Poblacion, Makati City, Mak...",121.027535,14.565443,"French, Japanese, Desserts",...,Botswana Pula(P),Yes,No,No,No,3,4.8,Dark Green,Excellent,314
1,6304287,Izakaya Kikufuji,162,Makati City,"Little Tokyo, 2277 Chino Roces Avenue, Legaspi...","Little Tokyo, Legaspi Village, Makati City","Little Tokyo, Legaspi Village, Makati City, Ma...",121.014101,14.553708,Japanese,...,Botswana Pula(P),Yes,No,No,No,3,4.5,Dark Green,Excellent,591
2,6300002,Heat - Edsa Shangri-La,162,Mandaluyong City,"Edsa Shangri-La, 1 Garden Way, Ortigas, Mandal...","Edsa Shangri-La, Ortigas, Mandaluyong City","Edsa Shangri-La, Ortigas, Mandaluyong City, Ma...",121.056831,14.581404,"Seafood, Asian, Filipino, Indian",...,Botswana Pula(P),Yes,No,No,No,4,4.4,Green,Very Good,270
3,6318506,Ooma,162,Mandaluyong City,"Third Floor, Mega Fashion Hall, SM Megamall, O...","SM Megamall, Ortigas, Mandaluyong City","SM Megamall, Ortigas, Mandaluyong City, Mandal...",121.056475,14.585318,"Japanese, Sushi",...,Botswana Pula(P),No,No,No,No,4,4.9,Dark Green,Excellent,365
4,6314302,Sambo Kojin,162,Mandaluyong City,"Third Floor, Mega Atrium, SM Megamall, Ortigas...","SM Megamall, Ortigas, Mandaluyong City","SM Megamall, Ortigas, Mandaluyong City, Mandal...",121.057508,14.584450,"Japanese, Korean",...,Botswana Pula(P),Yes,No,No,No,4,4.8,Dark Green,Excellent,229
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9546,5915730,Namlı Gurme,208,İstanbul,"Kemankeş Karamustafa Paşa Mahallesi, Rıhtım ...",Karaköy,"Karaköy, İstanbul",28.977392,41.022793,Turkish,...,Turkish Lira(TL),No,No,No,No,3,4.1,Green,Very Good,788
9547,5908749,Ceviz Ağacı,208,İstanbul,"Koşuyolu Mahallesi, Muhittin Üstündağ Cadd...",Koşuyolu,"Koşuyolu, İstanbul",29.041297,41.009847,"World Cuisine, Patisserie, Cafe",...,Turkish Lira(TL),No,No,No,No,3,4.2,Green,Very Good,1034
9548,5915807,Huqqa,208,İstanbul,"Kuruçeşme Mahallesi, Muallim Naci Caddesi, N...",Kuruçeşme,"Kuruçeşme, İstanbul",29.034640,41.055817,"Italian, World Cuisine",...,Turkish Lira(TL),No,No,No,No,4,3.7,Yellow,Good,661
9549,5916112,Aşşk Kahve,208,İstanbul,"Kuruçeşme Mahallesi, Muallim Naci Caddesi, N...",Kuruçeşme,"Kuruçeşme, İstanbul",29.036019,41.057979,Restaurant Cafe,...,Turkish Lira(TL),No,No,No,No,4,4.0,Green,Very Good,901


## Step 1: Data Cleaning and Preprocessing

###### Handling missing values

In [4]:
df.isnull().sum()  # count the null values in each column

Restaurant ID           0
Restaurant Name         0
Country Code            0
City                    0
Address                 0
Locality                0
Locality Verbose        0
Longitude               0
Latitude                0
Cuisines                9
Average Cost for two    0
Currency                0
Has Table booking       0
Has Online delivery     0
Is delivering now       0
Switch to order menu    0
Price range             0
Aggregate rating        0
Rating color            0
Rating text             0
Votes                   0
dtype: int64

In [4]:
# This will return a DataFrame with only the rows where 'Cuisines' is null
df_with_null_cuisines = df[df['Cuisines'].isnull()]
df_with_null_cuisines

Unnamed: 0,Restaurant ID,Restaurant Name,Country Code,City,Address,Locality,Locality Verbose,Longitude,Latitude,Cuisines,...,Currency,Has Table booking,Has Online delivery,Is delivering now,Switch to order menu,Price range,Aggregate rating,Rating color,Rating text,Votes
84,17284105,Cookie Shoppe,216,Albany,"115 N Jackson St, Albany, GA 31701",Albany,"Albany, Albany",-84.154,31.5772,,...,Dollar($),No,No,No,No,1,3.4,Orange,Average,34
87,17284211,Pearly's Famous Country Cookng,216,Albany,"814 N Slappey Blvd, Albany, GA 31701",Albany,"Albany, Albany",-84.1759,31.5882,,...,Dollar($),No,No,No,No,1,3.4,Orange,Average,36
94,17284158,Jimmie's Hot Dogs,216,Albany,"204 S Jackson St, Albany, GA 31701",Albany,"Albany, Albany",-84.1534,31.5751,,...,Dollar($),No,No,No,No,1,3.9,Yellow,Good,160
297,17374552,Corkscrew Cafe,216,Gainesville,"51 W Main St, Dahlonega, GA 30533",Dahlonega,"Dahlonega, Gainesville",-83.9858,34.5318,,...,Dollar($),No,No,No,No,3,3.9,Yellow,Good,209
328,17501439,Dovetail,216,Macon,"543 Cherry St, Macon, GA 31201",Macon,"Macon, Macon",-83.627979,32.83641,,...,Dollar($),No,No,No,No,3,3.8,Yellow,Good,102
346,17606621,HI Lite Bar & Lounge,216,Miller,"109 N Broadway Ave, Miller, SD 57362",Miller,"Miller, Miller",-98.9891,44.5158,,...,Dollar($),No,No,No,No,1,3.4,Orange,Average,11
368,17059060,Hillstone,216,Orlando,"215 South Orlando Avenue, Winter Park, FL 32789",Winter Park,"Winter Park, Orlando",-81.36526,28.596682,,...,Dollar($),No,No,No,No,3,4.4,Green,Very Good,1158
418,17142698,Leonard's Bakery,216,Rest of Hawaii,"933 Kapahulu Ave, Honolulu, HI 96816",Kaimuki,"Kaimuki, Rest of Hawaii",-157.813432,21.284586,,...,Dollar($),No,No,No,No,1,4.7,Dark Green,Excellent,707
455,17616465,Tybee Island Social Club,216,Savannah,"1311 Butler Ave, Tybee Island, GA 31328",Tybee Island,"Tybee Island, Savannah",-80.848297,31.99581,,...,Dollar($),No,No,No,No,1,3.9,Yellow,Good,309


In [5]:
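# Fill the null values with 'Unknown'; assigning the result back avoids
# relying on an in-place update of a column selection
df['Cuisines'] = df['Cuisines'].fillna('Unknown')

# Verify that no null values remain
df.isnull().sum()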

Restaurant ID           0
Restaurant Name         0
Country Code            0
City                    0
Address                 0
Locality                0
Locality Verbose        0
Longitude               0
Latitude                0
Cuisines                0
Average Cost for two    0
Currency                0
Has Table booking       0
Has Online delivery     0
Is delivering now       0
Switch to order menu    0
Price range             0
Aggregate rating        0
Rating color            0
Rating text             0
Votes                   0
dtype: int64
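
The notebook imports `SimpleImputer` but fills the gap with `fillna`. As a minimal sketch of the equivalent imputer-based approach, assuming a small made-up frame (`toy` and its values are illustrative, not taken from `Dataset.csv`):

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Toy stand-in for the real data; the values are invented for illustration.
toy = pd.DataFrame({"Cuisines": ["Japanese", np.nan, "Turkish", np.nan]})

# strategy="constant" replaces every missing entry with fill_value.
imputer = SimpleImputer(strategy="constant", fill_value="Unknown")

# fit_transform returns a 2-D array, so flatten it back into one column.
toy["Cuisines"] = imputer.fit_transform(toy[["Cuisines"]]).ravel()

print(toy["Cuisines"].tolist())
```

An imputer object can be fitted on the training split and reused on the test split, which is the main reason to prefer it over a one-off `fillna` once the split happens before imputation.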

###### Encoding categorical variables

In [6]:
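# Encode categorical (object-dtype) columns as integer codes
label_encoder = LabelEncoder()
for column in df.columns:
    if df[column].dtype == object:
        # fit_transform refits the encoder on every column,
        # so the mappings of earlier columns are not kept
        df[column] = label_encoder.fit_transform(df[column])

In [7]:
df  # all categorical columns are now converted into numerical values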

Unnamed: 0,Restaurant ID,Restaurant Name,Country Code,City,Address,Locality,Locality Verbose,Longitude,Latitude,Cuisines,...,Currency,Has Table booking,Has Online delivery,Is delivering now,Switch to order menu,Price range,Aggregate rating,Rating color,Rating text,Votes
0,6317637,3748,162,73,8685,171,172,121.027535,14.565443,920,...,0,1,0,0,0,3,4.8,0,1,314
1,6304287,3172,162,73,6055,593,601,121.014101,14.553708,1111,...,0,1,0,0,0,3,4.5,0,1,591
2,6300002,2896,162,75,4684,308,314,121.056831,14.581404,1671,...,0,1,0,0,0,4,4.4,1,5,270
3,6318506,4707,162,75,8690,862,875,121.056475,14.585318,1126,...,0,0,0,0,0,4,4.9,0,1,365
4,6314302,5523,162,75,8689,862,875,121.057508,14.584450,1122,...,0,1,0,0,0,4,4.8,0,1,229
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9546,5915730,4443,208,140,5926,517,523,28.977392,41.022793,1813,...,11,0,0,0,0,3,4.1,1,5,788
9547,5908749,1310,208,140,5962,552,558,29.041297,41.009847,1825,...,11,0,0,0,0,3,4.2,1,5,1034
9548,5915807,3068,208,140,5966,554,561,29.034640,41.055817,1110,...,11,0,0,0,0,4,3.7,5,2,661
9549,5916112,512,208,140,5967,554,561,29.036019,41.057979,1657,...,11,0,0,0,0,4,4.0,1,5,901


## Step 2: Select a regression algorithm (e.g., linear regression, decision tree regression) and train it on the training data

###### Split the data into training and testing sets


In [8]:
X = df.drop('Aggregate rating', axis=1)
y = df['Aggregate rating']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
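
In the rows displayed earlier, `Rating color` and `Rating text` move in step with `Aggregate rating` (4.8 is Dark Green / Excellent, 3.4 is Orange / Average), so they appear to be derived from the target, and `Restaurant ID` is an identifier. A hedged sketch of excluding such columns before the split, on an invented frame (`toy`, `leaky_or_id`, and the encoded values are assumptions for illustration):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in; the numbers are illustrative, not the real encoded dataset.
toy = pd.DataFrame({
    "Votes": [314, 591, 270, 365, 229, 34, 36, 160, 209, 102],
    "Price range": [3, 3, 4, 4, 4, 1, 1, 1, 3, 3],
    "Rating color": [0, 0, 1, 0, 0, 3, 3, 5, 5, 5],
    "Rating text": [1, 1, 5, 1, 1, 0, 0, 2, 2, 2],
    "Restaurant ID": range(10),
    "Aggregate rating": [4.8, 4.5, 4.4, 4.9, 4.8, 3.4, 3.4, 3.9, 3.9, 3.8],
})

# Columns assumed to leak the target or to carry no generalizable signal.
leaky_or_id = ["Rating color", "Rating text", "Restaurant ID"]

X_safe = toy.drop(columns=["Aggregate rating"] + leaky_or_id)
y_safe = toy["Aggregate rating"]

X_tr, X_te, y_tr, y_te = train_test_split(
    X_safe, y_safe, test_size=0.2, random_state=42
)
print(list(X_safe.columns), len(X_tr), len(X_te))
```

Whether these columns should be dropped depends on the goal: if the rating label would not be available at prediction time, keeping them inflates the scores.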

###### Using Linear Regression

In [34]:

model = LinearRegression()
model.fit(X_train, y_train)

# Predict on the test set. y_pred is reused by the decision tree cell,
# so run the linear regression evaluation (In [36]) before that cell.
y_pred = model.predict(X_test)

###### Using Decision Tree Regressor

In [37]:
# Create a Decision Tree Regressor and train it
regressor = DecisionTreeRegressor()
regressor.fit(X_train, y_train)
# Make predictions on the test set (this overwrites the linear regression y_pred)
y_pred = regressor.predict(X_test)
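
Because both models write to the same `y_pred` variable, the evaluation results depend on the order in which cells are run. One way to avoid that, sketched here on synthetic data (`X_demo`, `y_demo`, `models`, and `predictions` are hypothetical names, not part of the notebook), is to keep each model's predictions under its own key:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data standing in for the restaurant features.
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(200, 3))
y_demo = 0.3 * X_demo[:, 0] + np.sin(X_demo[:, 1]) + rng.normal(0, 0.1, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

models = {
    "linear": LinearRegression(),
    # A fixed random_state makes the fitted tree reproducible across runs.
    "tree": DecisionTreeRegressor(random_state=42),
}

# Fit every model and store its test-set predictions under the model's name.
predictions = {name: m.fit(X_tr, y_tr).predict(X_te) for name, m in models.items()}

for name, pred in predictions.items():
    print(name, pred[:3])
```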

## Step 3: Evaluate the model's performance using appropriate regression metrics (e.g., mean squared error, R-squared) on the testing data

###### Evaluate the Linear Regression model's performance

In [36]:


# Mean Absolute Error (MAE)
mae = metrics.mean_absolute_error(y_test, y_pred)
print(f"Mean Absolute Error (MAE): {mae}")

# Mean Squared Error (MSE)
mse = metrics.mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error (MSE): {mse}")

# Root Mean Squared Error (RMSE)
rmse = np.sqrt(mse)
print(f"Root Mean Squared Error (RMSE): {rmse}")

# R-squared (Coefficient of Determination)
r2 = metrics.r2_score(y_test, y_pred)
print(f"R-squared (Coefficient of Determination): {r2}")

# Adjusted R-squared
n = len(y_test)  # number of samples
# NOTE: p should be the number of predictors, i.e. X_test.shape[1];
# the value printed below was computed with p = 1.
p = 1
adjusted_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(f"Adjusted R-squared: {adjusted_r2}")

# Mean Squared Logarithmic Error (MSLE); requires non-negative targets and predictions
msle = metrics.mean_squared_log_error(y_test, y_pred)
print(f"Mean Squared Logarithmic Error (MSLE): {msle}")



Mean Absolute Error (MAE): 0.9228204978205212
Mean Squared Error (MSE): 1.2188260580475843
Root Mean Squared Error (RMSE): 1.1040045552657762
R-squared (Coefficient of Determination): 0.4645133149437365
Adjusted R-squared: 0.4642328085607841
Mean Squared Logarithmic Error (MSLE): 0.20670733644526326


###### Evaluate the Decision Tree Regressor's performance

In [38]:
# Mean Absolute Error (MAE)
mae = metrics.mean_absolute_error(y_test, y_pred)
print(f"Mean Absolute Error (MAE): {mae}")

# Mean Squared Error (MSE)
mse = metrics.mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error (MSE): {mse}")

# Root Mean Squared Error (RMSE)
rmse = np.sqrt(mse)
print(f"Root Mean Squared Error (RMSE): {rmse}")

# R-squared (Coefficient of Determination)
r2 = metrics.r2_score(y_test, y_pred)
print(f"R-squared (Coefficient of Determination): {r2}")

# Adjusted R-squared
n = len(y_test)  # number of samples
# NOTE: p should be the number of predictors, i.e. X_test.shape[1];
# the value printed below was computed with p = 1.
p = 1
adjusted_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(f"Adjusted R-squared: {adjusted_r2}")

# Mean Squared Logarithmic Error (MSLE); requires non-negative targets and predictions
msle = metrics.mean_squared_log_error(y_test, y_pred)
print(f"Mean Squared Logarithmic Error (MSLE): {msle}")

Mean Absolute Error (MAE): 0.14934589220303504
Mean Squared Error (MSE): 0.055771847200418605
Root Mean Squared Error (RMSE): 0.23616063855015848
R-squared (Coefficient of Determination): 0.9754968468391159
Adjusted R-squared: 0.975484011242908
Mean Squared Logarithmic Error (MSLE): 0.0032708160379335107
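
The two evaluation cells repeat the same code and hard-code `p = 1`. A possible refactoring, offered as a sketch rather than the notebook's own method, wraps the metrics in a helper (`regression_report` is a hypothetical name) that takes the number of features explicitly and skips MSLE when any value is negative:

```python
import numpy as np
from sklearn import metrics


def regression_report(y_true, y_pred, n_features):
    """Return common regression metrics as a dict."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    n = len(y_true)

    mse = metrics.mean_squared_error(y_true, y_pred)
    r2 = metrics.r2_score(y_true, y_pred)
    report = {
        "mae": metrics.mean_absolute_error(y_true, y_pred),
        "mse": mse,
        "rmse": float(np.sqrt(mse)),
        "r2": r2,
        # Adjusted R-squared penalizes R-squared by the number of predictors.
        "adjusted_r2": 1 - (1 - r2) * (n - 1) / (n - n_features - 1),
    }
    # MSLE is only defined for non-negative targets and predictions.
    if (y_true >= 0).all() and (y_pred >= 0).all():
        report["msle"] = metrics.mean_squared_log_error(y_true, y_pred)
    return report


# Small hand-checkable example (invented numbers).
demo = regression_report(
    [3.0, 4.0, 5.0, 4.5, 2.5], [2.5, 4.0, 5.5, 4.0, 3.0], n_features=2
)
for name, value in demo.items():
    print(f"{name}: {value:.4f}")
```

In the notebook this could be called as `regression_report(y_test, y_pred, X_test.shape[1])` once for each model.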


## Step 4: The most influential features affecting restaurant ratings

###### Influential features affecting restaurant ratings for Linear Regression

In [39]:
# Step 4: Interpret the model's results and analyze the most influential features affecting restaurant ratings
# These are raw coefficients on unscaled, label-encoded features, so their
# magnitudes are not directly comparable across columns.
feature_importances = pd.DataFrame(model.coef_, index=X.columns, columns=['importance']).sort_values('importance', ascending=False)
print(feature_importances)

                        importance
Has Online delivery   6.761687e-01
Price range           4.956010e-01
Country Code          6.650737e-03
Longitude             8.784570e-04
Votes                 4.865499e-04
Locality Verbose      7.755834e-05
Restaurant Name       2.195503e-06
Average Cost for two  1.392884e-06
Switch to order menu -1.110223e-16
Restaurant ID        -3.726905e-08
Address              -2.235520e-05
Cuisines             -2.069269e-04
Locality             -3.731570e-04
Latitude             -1.243771e-03
City                 -3.035794e-03
Has Table booking    -6.483569e-03
Is delivering now    -3.570934e-02
Currency             -8.426026e-02
Rating color         -1.951402e-01
Rating text          -1.984704e-01
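
Raw coefficients depend on each feature's units, which may explain why `Votes` has a tiny coefficient here while dominating the tree's importances. A sketch of how standardizing the inputs changes the picture, on synthetic data (`demo`, `target`, and the column names are made up for illustration):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "small_scale": rng.normal(0, 1, 500),     # values around -3..3
    "large_scale": rng.normal(0, 1000, 500),  # values in the thousands
})
# large_scale has the bigger real effect despite its tiny coefficient.
target = (
    0.5 * demo["small_scale"]
    + 0.002 * demo["large_scale"]
    + rng.normal(0, 0.1, 500)
)

raw = LinearRegression().fit(demo, target)

# Standardize each feature to zero mean and unit variance before fitting.
pipe = make_pipeline(StandardScaler(), LinearRegression()).fit(demo, target)

comparison = pd.DataFrame(
    {"raw": raw.coef_, "standardized": pipe[-1].coef_}, index=demo.columns
)
print(comparison)
```

With standardized inputs, each coefficient is the change in the prediction per one standard deviation of the feature, so the magnitudes can be ranked against one another.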


###### Influential features affecting restaurant ratings for Decision Tree Regressor

In [20]:
# Get feature importances
importances = regressor.feature_importances_

# Create a DataFrame to display feature importances
feature_importances = pd.DataFrame(importances, index=X.columns, columns=['importance']).sort_values('importance', ascending=False)

print(feature_importances)

                      importance
Votes                   0.898996
Rating color            0.089388
Restaurant ID           0.002336
Address                 0.001410
Longitude               0.001408
Restaurant Name         0.001371
Cuisines                0.001238
Latitude                0.001106
Average Cost for two    0.000918
Locality                0.000561
Locality Verbose        0.000540
Has Online delivery     0.000240
City                    0.000238
Price range             0.000137
Has Table booking       0.000093
Is delivering now       0.000013
Currency                0.000006
Country Code            0.000002
Switch to order menu    0.000000
Rating text             0.000000
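
The `feature_importances_` values above are impurity-based and computed on the training data. Permutation importance on the held-out set is a common cross-check. The following is a sketch on synthetic data where only the first column carries signal (`X_demo`, `y_demo`, and `tree` are hypothetical names):

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data: the target depends on column 0 only.
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 1, size=(400, 3))
y_demo = 5 * X_demo[:, 0] + rng.normal(0, 0.05, size=400)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=42
)
tree = DecisionTreeRegressor(random_state=42).fit(X_tr, y_tr)

# Shuffle one column at a time on the test set and measure the drop in score.
result = permutation_importance(tree, X_te, y_te, n_repeats=10, random_state=42)

# Column indices ordered from most to least important.
ranking = np.argsort(result.importances_mean)[::-1]
print(ranking, result.importances_mean)
```

Applied to the notebook, the call would take `regressor`, `X_test`, and `y_test`, and the means could be indexed by `X.columns` in the same way as the table above.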
