### Level 3

**Task 1: Predictive Modeling**

- Build a regression model to predict the aggregate rating of a restaurant based on available features.

- Split the dataset into training and testing sets and evaluate the model's performance using appropriate metrics.

- Experiment with different algorithms (e.g., linear regression, decision trees, random forest) and compare their performance.
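
The two metrics reported later in this notebook, RMSE and R2, can be written out by hand to make clear what they measure. The following is a minimal sketch using NumPy only; the helper names `rmse` and `r2` and the four example ratings are made up for illustration and do not come from the dataset.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error: typical prediction error, in rating units."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # error left by the model
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # error left by predicting the mean
    return float(1.0 - ss_res / ss_tot)

# Made-up ratings for illustration only (not taken from Dataset.csv)
y_true = np.array([3.0, 4.0, 2.0, 5.0])
y_pred = np.array([2.5, 4.0, 2.5, 4.5])

# Baseline: always predict the mean of the evaluated targets
baseline = np.full_like(y_true, y_true.mean())

print("model    RMSE:", rmse(y_true, y_pred), "R2:", r2(y_true, y_pred))
print("baseline RMSE:", rmse(y_true, baseline), "R2:", r2(y_true, baseline))
```

A constant prediction equal to the mean of the evaluated targets scores an R2 of exactly 0, so R2 can be read as the fraction of that baseline's squared error that a model removes.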

In [19]:
import pandas as pd
import numpy as np

df = pd.read_csv("/Users/pranay/Downloads/Cognifyz DS Internship/Dataset.csv")

In [21]:
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor

In [23]:
print(df.columns)

Index(['Restaurant ID', 'Restaurant Name', 'Country Code', 'City', 'Address',
       'Locality', 'Locality Verbose', 'Longitude', 'Latitude', 'Cuisines',
       'Average Cost for two', 'Currency', 'Has Table booking',
       'Has Online delivery', 'Is delivering now', 'Switch to order menu',
       'Price range', 'Aggregate rating', 'Rating color', 'Rating text',
       'Votes'],
      dtype='object')


In [25]:
# Target variable: the restaurant's aggregate rating

y = df["Aggregate rating"]

In [27]:
df["Restaurant Name Length"] = df["Restaurant Name"].astype(str).apply(len)
df["Address Length"] = df["Address"].astype(str).apply(len)

print(df[["Restaurant Name Length", "Address Length"]].head())

   Restaurant Name Length  Address Length
0                      16              71
1                      16              67
2                      22              56
3                       4              70
4                      11              64
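
The length features above use `.apply(len)`, which calls Python's `len` once per row. A possible alternative is the vectorized `.str.len()` accessor. The sketch below compares the two on a small made-up frame; `demo` and the restaurant names in it are hypothetical stand-ins for `df`.

```python
import pandas as pd

# Hypothetical stand-in rows; the notebook itself works on df from Dataset.csv
demo = pd.DataFrame({"Restaurant Name": ["Cafe A", "The Long Name Diner", "Bistro"]})

# Approach used in the notebook: Python-level len() applied row by row
via_apply = demo["Restaurant Name"].astype(str).apply(len)

# Vectorized string accessor; gives the same lengths for string values
via_str = demo["Restaurant Name"].astype(str).str.len()

# Worth checking before relying on length features: how astype(str) treats
# missing values can differ between pandas versions
n_missing = int(demo["Restaurant Name"].isna().sum())

print(via_apply.tolist(), via_str.tolist(), n_missing)
```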


In [29]:
# Select features: numeric columns only, so no categorical encoding is needed

X = df[["Average Cost for two", "Price range", "Votes", "Restaurant Name Length", "Address Length"]].copy()

In [31]:
# Fill any missing values with the column median (a precaution)

X = X.fillna(X.median())
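
The cell above computes medians over all rows before the train/test split, so test rows contribute to the fill values. A stricter variant computes the medians on the training rows only and reuses them for the test rows. The sketch below shows this on a tiny synthetic frame; `X_demo`, the manual three-row/two-row split, and the values are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the feature matrix (values are made up)
X_demo = pd.DataFrame({
    "Votes": [10.0, np.nan, 30.0, np.nan, 1000.0],
    "Price range": [1.0, 2.0, np.nan, 4.0, 3.0],
})

# Hypothetical manual split: first three rows train, last two rows test
X_train_demo = X_demo.iloc[:3]
X_test_demo = X_demo.iloc[3:]

# Medians learned from the training rows only
train_medians = X_train_demo.median()

# The same training medians fill gaps in both subsets
X_train_filled = X_train_demo.fillna(train_medians)
X_test_filled = X_test_demo.fillna(train_medians)

print("train-only medians:\n", train_medians)
print("full-data medians:\n", X_demo.median())
print(X_test_filled)
```

In this toy example the training median of `Votes` differs from the full-data median because the large test value no longer influences it. If the real data has no missing values in these columns, both approaches give identical results.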

In [33]:
# Train/test split: hold out 20% of the rows for testing

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(random_state=42),
    "Random Forest": RandomForestRegressor(random_state=42, n_estimators=200)
}

for name, model in models.items():
    model.fit(X_train, y_train)
    preds = model.predict(X_test)

    rmse = np.sqrt(mean_squared_error(y_test, preds))
    r2 = r2_score(y_test, preds)

    print(f"\n{name} Results:")
    print("RMSE:", rmse)
    print("R2 Score:", r2)


Linear Regression Results:
RMSE: 1.3231205076360941
R2 Score: 0.23085938099600445

Decision Tree Results:
RMSE: 0.47831713762359274
R2 Score: 0.8994831595857286

Random Forest Results:
RMSE: 0.3427541082578168
R2 Score: 0.9483854335500703
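
The scores above come from a single 80/20 split, so they depend to some degree on which rows landed in the test set. One way to check how stable a comparison is would be k-fold cross-validation. The sketch below runs it on synthetic data rather than the restaurant dataset; `X_demo`, `y_demo`, the fold count, and the smaller forest size are assumptions chosen to keep the example fast and self-contained.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic nonlinear regression problem (not the restaurant data)
rng = np.random.default_rng(42)
X_demo = rng.uniform(0, 1, size=(200, 3))
y_demo = np.sin(4 * X_demo[:, 0]) + 0.5 * X_demo[:, 1] + rng.normal(0, 0.05, 200)

# Five shuffled folds; every row is used for testing exactly once
cv = KFold(n_splits=5, shuffle=True, random_state=42)

candidates = {
    "Linear Regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=42),
}

cv_rmse = {}
for name, model in candidates.items():
    # scikit-learn reports negated RMSE so that higher is always better
    scores = cross_val_score(
        model, X_demo, y_demo, cv=cv, scoring="neg_root_mean_squared_error"
    )
    cv_rmse[name] = -scores
    print(f"{name}: RMSE {cv_rmse[name].mean():.3f} +/- {cv_rmse[name].std():.3f}")
```

Applied to the notebook's own `X` and `y`, the same loop would report a mean and spread of RMSE per model instead of a single number.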
