Task: Predictive Modeling
Build a regression model to predict the aggregate rating of a restaurant based on available features.
Split the dataset into training and testing sets and evaluate the model's performance using appropriate metrics.
Experiment with different algorithms (e.g., linear regression, decision trees, random forest) and compare their performance.

1. Data Preprocessing

In [39]:
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import StandardScaler
df=pd.read_csv('Dataset .csv')
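#filling missing values in the Cuisines column with the most common value
most_common_cuisine=df['Cuisines'].mode()[0]
#fillna returns a new Series, so the result must be assigned back
df['Cuisines']=df['Cuisines'].fillna(most_common_cuisine)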

#Encoding categorical columns
#Label Encoding currency
label_encoder=LabelEncoder()
df['Currency']=label_encoder.fit_transform(df['Currency'])
#One-hot encoding for Cuisine
df_encoded = pd.get_dummies(df, columns=['Cuisines'], drop_first=True)

#Standardizing the Numerical Features
scaler = StandardScaler()
numerical_columns = ['Average Cost for two', 'Longitude', 'Latitude']
df_encoded[numerical_columns] = scaler.fit_transform(df_encoded[numerical_columns])

#dropping unnecessary columns
df_encoded.drop(columns=['Restaurant ID', 'Restaurant Name'], inplace=True)
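
Note: the one-hot encoding above treats each full `Cuisines` string (for example `Western, Asian, Cafe`) as its own category, so every distinct combination becomes a separate column. One possible alternative, sketched here on a small made-up frame, is to split the comma-separated string and create one indicator per individual cuisine with `Series.str.get_dummies`. The toy data and the `Cuisine_` prefix are assumptions for illustration and are not part of the original notebook.

```python
import pandas as pd

# Hypothetical toy data standing in for the real 'Cuisines' column
toy = pd.DataFrame({'Cuisines': ['Italian, Cafe', 'Cafe', 'Italian, Mexican']})

# One indicator column per individual cuisine instead of per combination
cuisine_flags = toy['Cuisines'].str.get_dummies(sep=', ').add_prefix('Cuisine_')

# Replace the raw text column with the indicator columns
toy_encoded = pd.concat([toy.drop(columns=['Cuisines']), cuisine_flags], axis=1)
print(toy_encoded.columns.tolist())
```

With this encoding a restaurant serving two cuisines has two indicators set to 1, and the number of columns grows with the number of distinct cuisines rather than the number of distinct combinations.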

2. Splitting the Dataset (80% training, 20% testing)

In [49]:
from sklearn.model_selection import train_test_split
X=df_encoded.drop(columns=['Aggregate rating'])
y=df_encoded['Aggregate rating']
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.2,random_state=42)
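
Note: in the preprocessing cell the `StandardScaler` is fitted on the full dataset before this split, so the test rows influence the scaling statistics. A common alternative is to fit the scaler on the training split only and reuse it on the test split. A minimal sketch on synthetic data follows; the values are randomly generated and the `_demo` names are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the restaurant data (values are made up)
rng = np.random.default_rng(42)
demo = pd.DataFrame({
    'Average Cost for two': rng.uniform(100, 2000, size=100),
    'Aggregate rating': rng.uniform(0, 5, size=100),
})

X_demo = demo[['Average Cost for two']]
y_demo = demo['Aggregate rating']
X_demo_train, X_demo_test, y_demo_train, y_demo_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Fit on the training rows only, then apply the same transform to the test rows
scaler_demo = StandardScaler()
X_demo_train_scaled = scaler_demo.fit_transform(X_demo_train)
X_demo_test_scaled = scaler_demo.transform(X_demo_test)

print(X_demo_train_scaled.mean(), X_demo_train_scaled.std())
```

The training features end up with mean 0 and standard deviation 1; the test features are scaled with the same parameters and will generally be close to, but not exactly, those values.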

3. Model Building

i) Convert Boolean Columns to Integers

In [63]:
X_train = X_train.astype({col: 'int32' for col in X_train.select_dtypes(include='bool').columns})
X_test = X_test.astype({col: 'int32' for col in X_test.select_dtypes(include='bool').columns})
print(X_train.dtypes)
print(X_test.dtypes)

Country Code                                 int32
City                                        object
Address                                     object
Locality                                    object
Locality Verbose                            object
                                             ...  
Cuisines_Western, Asian, Cafe                int32
Cuisines_Western, Fusion, Fast Food          int32
Cuisines_World Cuisine                       int32
Cuisines_World Cuisine, Mexican, Italian     int32
Cuisines_World Cuisine, Patisserie, Cafe     int32
Length: 1841, dtype: object
Country Code                                 int32
City                                        object
Address                                     object
Locality                                    object
Locality Verbose                            object
                                             ...  
Cuisines_Western, Asian, Cafe                int32
Cuisines_Western, Fusion, Fast Food          int32
Cui

ii) Select Only Numerical Columns

In [65]:
X_train = X_train.select_dtypes(include=['int32', 'float64'])
X_test = X_test.select_dtypes(include=['int32', 'float64'])

# Check the shape of the training and testing datasets
print(X_train.shape)
print(X_test.shape)

(7640, 1831)
(1911, 1831)
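
Note: filtering on `['int32', 'float64']` keeps only columns with exactly those dtypes. On platforms where pandas stores integers as `int64`, integer columns would be dropped by this filter. A hedged sketch of a more portable selection using `include='number'`, on a made-up frame whose column names are only illustrative:

```python
import pandas as pd

# Hypothetical frame with an int64 column, a float64 column and a text column
toy = pd.DataFrame({
    'Votes': pd.Series([10, 20], dtype='int64'),
    'Longitude': [77.1, 77.2],
    'City': ['A', 'B'],
})

# Exact dtype list: the int64 column is not matched
narrow = toy.select_dtypes(include=['int32', 'float64'])

# 'number' matches every integer and float dtype
broad = toy.select_dtypes(include='number')

print(narrow.columns.tolist())
print(broad.columns.tolist())
```

Either way, the text columns (`City`, `Address`, `Locality` and so on) are excluded from the features.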


a. Linear Regression

In [73]:
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error,mean_squared_error,r2_score,root_mean_squared_error

#initialize and fit linear regression
lin_reg=LinearRegression()
lin_reg.fit(X_train,y_train)

#predict on test set
y_pred=lin_reg.predict(X_test)

#Evaluating the model
mae=mean_absolute_error(y_test,y_pred)
mse=mean_squared_error(y_test,y_pred)
rmse=root_mean_squared_error(y_test,y_pred)
r2=r2_score(y_test,y_pred)

print("Linear Regression Performance:")
print(f"MAE:{mae}")
print(f"MSE:{mse}")
print(f"RMSE:{rmse}")
print(f"R2:{r2}")


Linear Regression Performance:
MAE:1.096367703447295
MSE:1.7005770832314193
RMSE:1.3040617635800151
R2:0.2528577979035217


b. Decision Tree Regression

In [85]:
from sklearn.tree import DecisionTreeRegressor

decision_tree=DecisionTreeRegressor(random_state=42)
decision_tree.fit(X_train,y_train)

y_pred=decision_tree.predict(X_test)

#evaluate the model
mae=mean_absolute_error(y_test,y_pred)
mse=mean_squared_error(y_test,y_pred)
rmse=root_mean_squared_error(y_test,y_pred)
r2=r2_score(y_test,y_pred)
print("Decision Tree Regression Performance:")
print(f"MAE:{mae}")
print(f"MSE:{mse}")
print(f"RMSE:{rmse}")
print(f"R2:{r2}")

Decision Tree Regression Performance:
MAE:0.25802544176563014
MSE:0.16346177291308037
RMSE:0.40430405997600416
R2:0.928183679424396


c. Random Forest Regression

In [87]:
from sklearn.ensemble import RandomForestRegressor

ran_reg=RandomForestRegressor(random_state=42)
ran_reg.fit(X_train,y_train)

y_pred=ran_reg.predict(X_test)

mae=mean_absolute_error(y_test,y_pred)
mse=mean_squared_error(y_test,y_pred)
rmse=root_mean_squared_error(y_test,y_pred)
r2=r2_score(y_test,y_pred)
print("Random Forest Regression Performance:")
print(f"MAE:{mae}")
print(f"MSE:{mse}")
print(f"RMSE:{rmse}")
print(f"R2:{r2}")

Random Forest Regression Performance:
MAE:0.2213488705517616
MSE:0.11845429427762023
RMSE:0.34417189640878615
R2:0.9479575473837419
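
Note: the three sets of results above can also be gathered in one table so the models are compared side by side. The sketch below shows one way to do that, using a synthetic regression problem from `make_regression` in place of the restaurant data. The `evaluate` helper and the `_syn` variable names are hypothetical, and RMSE is computed as the square root of MSE so the snippet also runs on scikit-learn versions without `root_mean_squared_error`.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data standing in for the restaurant features and ratings
X_syn, y_syn = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=42)
X_syn_train, X_syn_test, y_syn_train, y_syn_test = train_test_split(
    X_syn, y_syn, test_size=0.2, random_state=42
)

def evaluate(model, X_tr, y_tr, X_te, y_te):
    """Fit the model and return the four metrics used in this notebook."""
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    mse = mean_squared_error(y_te, pred)
    return {
        'MAE': mean_absolute_error(y_te, pred),
        'MSE': mse,
        'RMSE': float(np.sqrt(mse)),
        'R2': r2_score(y_te, pred),
    }

models = {
    'Linear Regression': LinearRegression(),
    'Decision Tree': DecisionTreeRegressor(random_state=42),
    'Random Forest': RandomForestRegressor(random_state=42),
}

# One row per model, one column per metric
results = pd.DataFrame({
    name: evaluate(model, X_syn_train, y_syn_train, X_syn_test, y_syn_test)
    for name, model in models.items()
}).T

print(results.sort_values('RMSE'))
```

Passing the notebook's own `X_train`, `X_test`, `y_train` and `y_test` to `evaluate` instead of the synthetic arrays would produce the same kind of table for the restaurant data.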
