 ## Task: Cuisine Classification
 #### Objective: Develop a machine learning model to classify restaurants based on their cuisines.
 Steps:
 - Preprocess the dataset by handling missing values and encoding categorical variables.
 - Split the data into training and testing sets.
 - Select a classification algorithm (e.g., logistic regression, random forest) and train it on the training data.
 - Evaluate the model's performance using appropriate classification metrics (e.g., accuracy, precision, recall) on the testing data.
 - Analyze the model's performance across different cuisines and identify any challenges or biases.


In [2]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, MultiLabelBinarizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, accuracy_score



In [6]:
# Load the dataset
df = pd.read_csv('Dataset.csv')

# Display the dataset (pandas shows the first and last rows and elides the middle)
df

Unnamed: 0,Restaurant ID,Restaurant Name,Country Code,City,Address,Locality,Locality Verbose,Longitude,Latitude,Cuisines,...,Currency,Has Table booking,Has Online delivery,Is delivering now,Switch to order menu,Price range,Aggregate rating,Rating color,Rating text,Votes
0,6317637,Le Petit Souffle,162,Makati City,"Third Floor, Century City Mall, Kalayaan Avenu...","Century City Mall, Poblacion, Makati City","Century City Mall, Poblacion, Makati City, Mak...",121.027535,14.565443,"French, Japanese, Desserts",...,Botswana Pula(P),Yes,No,No,No,3,4.8,Dark Green,Excellent,314
1,6304287,Izakaya Kikufuji,162,Makati City,"Little Tokyo, 2277 Chino Roces Avenue, Legaspi...","Little Tokyo, Legaspi Village, Makati City","Little Tokyo, Legaspi Village, Makati City, Ma...",121.014101,14.553708,Japanese,...,Botswana Pula(P),Yes,No,No,No,3,4.5,Dark Green,Excellent,591
2,6300002,Heat - Edsa Shangri-La,162,Mandaluyong City,"Edsa Shangri-La, 1 Garden Way, Ortigas, Mandal...","Edsa Shangri-La, Ortigas, Mandaluyong City","Edsa Shangri-La, Ortigas, Mandaluyong City, Ma...",121.056831,14.581404,"Seafood, Asian, Filipino, Indian",...,Botswana Pula(P),Yes,No,No,No,4,4.4,Green,Very Good,270
3,6318506,Ooma,162,Mandaluyong City,"Third Floor, Mega Fashion Hall, SM Megamall, O...","SM Megamall, Ortigas, Mandaluyong City","SM Megamall, Ortigas, Mandaluyong City, Mandal...",121.056475,14.585318,"Japanese, Sushi",...,Botswana Pula(P),No,No,No,No,4,4.9,Dark Green,Excellent,365
4,6314302,Sambo Kojin,162,Mandaluyong City,"Third Floor, Mega Atrium, SM Megamall, Ortigas...","SM Megamall, Ortigas, Mandaluyong City","SM Megamall, Ortigas, Mandaluyong City, Mandal...",121.057508,14.584450,"Japanese, Korean",...,Botswana Pula(P),Yes,No,No,No,4,4.8,Dark Green,Excellent,229
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9546,5915730,Namlı Gurme,208,İstanbul,"Kemankeş Karamustafa Paşa Mahallesi, Rıhtım ...",Karaköy,"Karaköy, İstanbul",28.977392,41.022793,Turkish,...,Turkish Lira(TL),No,No,No,No,3,4.1,Green,Very Good,788
9547,5908749,Ceviz Ağacı,208,İstanbul,"Koşuyolu Mahallesi, Muhittin Üstündağ Cadd...",Koşuyolu,"Koşuyolu, İstanbul",29.041297,41.009847,"World Cuisine, Patisserie, Cafe",...,Turkish Lira(TL),No,No,No,No,3,4.2,Green,Very Good,1034
9548,5915807,Huqqa,208,İstanbul,"Kuruçeşme Mahallesi, Muallim Naci Caddesi, N...",Kuruçeşme,"Kuruçeşme, İstanbul",29.034640,41.055817,"Italian, World Cuisine",...,Turkish Lira(TL),No,No,No,No,4,3.7,Yellow,Good,661
9549,5916112,Aşşk Kahve,208,İstanbul,"Kuruçeşme Mahallesi, Muallim Naci Caddesi, N...",Kuruçeşme,"Kuruçeşme, İstanbul",29.036019,41.057979,Restaurant Cafe,...,Turkish Lira(TL),No,No,No,No,4,4.0,Green,Very Good,901


In [7]:
# Check for missing values
df.isnull().sum()

Restaurant ID           0
Restaurant Name         0
Country Code            0
City                    0
Address                 0
Locality                0
Locality Verbose        0
Longitude               0
Latitude                0
Cuisines                9
Average Cost for two    0
Currency                0
Has Table booking       0
Has Online delivery     0
Is delivering now       0
Switch to order menu    0
Price range             0
Aggregate rating        0
Rating color            0
Rating text             0
Votes                   0
dtype: int64
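
Before filling anything, it can help to look at which rows are actually affected. The following is a minimal sketch on a toy frame; the rows are invented stand-ins rather than values from `Dataset.csv`, and the names `toy`, `missing_mask`, and `missing_rows` are illustrative only.

```python
import pandas as pd

# Toy stand-in for the real dataset; values are illustrative only.
toy = pd.DataFrame({
    "Restaurant Name": ["A", "B", "C"],
    "City": ["X", "Y", "X"],
    "Cuisines": ["Italian", None, "Japanese, Sushi"],
})

# Boolean mask marking the rows whose 'Cuisines' value is missing
missing_mask = toy["Cuisines"].isnull()

# Keep only the identifying columns of the affected rows
missing_rows = toy.loc[missing_mask, ["Restaurant Name", "City"]]
print(missing_rows)
```

Applied to the real `df`, the same mask would list the nine restaurants that need a cuisine value.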

In [8]:
# Fill the missing 'Cuisines' values by hand, matching each restaurant by name
df.loc[df['Restaurant Name'] == 'Cookie Shoppe', 'Cuisines'] = 'Coffee, Tea, cookie'
df.loc[df['Restaurant Name'] == "Pearly's Famous Country Cookng", 'Cuisines'] = 'American, Breakfast, Diner'
df.loc[df['Restaurant Name'] == "Jimmie's Hot Dogs", 'Cuisines'] = 'Hot Dogs'
df.loc[df['Restaurant Name'] == "Corkscrew Cafe", 'Cuisines'] = 'Coffee and Tea, Sandwich'
df.loc[df['Restaurant Name'] == 'Dovetail', 'Cuisines'] = 'Italian'
df.loc[df['Restaurant Name'] == 'HI Lite Bar & Lounge', 'Cuisines'] = 'American, Breakfast, Diner'
df.loc[df['Restaurant Name'] == 'Hillstone', 'Cuisines'] = 'American, BBQ, Sandwich'
df.loc[df['Restaurant Name'] == "Leonard's Bakery", 'Cuisines'] = 'Breakfast, Burger'
df.loc[df['Restaurant Name'] == 'Tybee Island Social Club', 'Cuisines'] = 'American, Seafood, Southern'
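
One thing to keep in mind with name-based `.loc` assignment is that it overwrites every row with that name, including rows that already had a cuisine. A possible alternative, sketched here on toy data, is to keep the manual values in a dictionary and apply them only where `Cuisines` is missing. The frame `toy` and the dictionary `manual_cuisines` are hypothetical names; the two cuisine strings are copied from the cell above.

```python
import pandas as pd

# Toy frame: 'Dovetail' appears twice, once with a cuisine already present
toy = pd.DataFrame({
    "Restaurant Name": ["Dovetail", "Dovetail", "Hillstone"],
    "Cuisines": [None, "American", None],
})

# Hypothetical name -> cuisines mapping (values taken from the manual fill)
manual_cuisines = {
    "Dovetail": "Italian",
    "Hillstone": "American, BBQ, Sandwich",
}

# Look up a fallback value per row, then use it only where 'Cuisines' is missing
fallback = toy["Restaurant Name"].map(manual_cuisines)
toy["Cuisines"] = toy["Cuisines"].fillna(fallback)
print(toy)
```

The second `Dovetail` row keeps its existing value because `fillna` only touches missing entries.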


In [9]:
# Re-check for missing values after the manual fill
df.isnull().sum()

Restaurant ID           0
Restaurant Name         0
Country Code            0
City                    0
Address                 0
Locality                0
Locality Verbose        0
Longitude               0
Latitude                0
Cuisines                0
Average Cost for two    0
Currency                0
Has Table booking       0
Has Online delivery     0
Is delivering now       0
Switch to order menu    0
Price range             0
Aggregate rating        0
Rating color            0
Rating text             0
Votes                   0
dtype: int64

In [11]:
# The missing 'Cuisines' values were filled above, so no further imputation is needed here

# Encode the 'Cuisines' column using MultiLabelBinarizer (as it contains multiple cuisines per restaurant)
mlb = MultiLabelBinarizer()
cuisines_encoded = mlb.fit_transform(df['Cuisines'].str.split(', '))

# Convert the encoded cuisines back to a DataFrame
cuisines_df = pd.DataFrame(cuisines_encoded, columns=mlb.classes_)

# Combine the original data with the encoded cuisines
df = pd.concat([df, cuisines_df], axis=1)

# Drop the original 'Cuisines' column as it's now encoded
df.drop(['Cuisines'], axis=1, inplace=True)

# Display the first few rows of the preprocessed dataset
df.head()


Unnamed: 0,Restaurant ID,Restaurant Name,Country Code,City,Address,Locality,Locality Verbose,Longitude,Latitude,Average Cost for two,...,Tex-Mex,Thai,Tibetan,Turkish,Turkish Pizza,Vegetarian,Vietnamese,Western,World Cuisine,cookie
0,6317637,Le Petit Souffle,162,Makati City,"Third Floor, Century City Mall, Kalayaan Avenu...","Century City Mall, Poblacion, Makati City","Century City Mall, Poblacion, Makati City, Mak...",121.027535,14.565443,1100,...,0,0,0,0,0,0,0,0,0,0
1,6304287,Izakaya Kikufuji,162,Makati City,"Little Tokyo, 2277 Chino Roces Avenue, Legaspi...","Little Tokyo, Legaspi Village, Makati City","Little Tokyo, Legaspi Village, Makati City, Ma...",121.014101,14.553708,1200,...,0,0,0,0,0,0,0,0,0,0
2,6300002,Heat - Edsa Shangri-La,162,Mandaluyong City,"Edsa Shangri-La, 1 Garden Way, Ortigas, Mandal...","Edsa Shangri-La, Ortigas, Mandaluyong City","Edsa Shangri-La, Ortigas, Mandaluyong City, Ma...",121.056831,14.581404,4000,...,0,0,0,0,0,0,0,0,0,0
3,6318506,Ooma,162,Mandaluyong City,"Third Floor, Mega Fashion Hall, SM Megamall, O...","SM Megamall, Ortigas, Mandaluyong City","SM Megamall, Ortigas, Mandaluyong City, Mandal...",121.056475,14.585318,1500,...,0,0,0,0,0,0,0,0,0,0
4,6314302,Sambo Kojin,162,Mandaluyong City,"Third Floor, Mega Atrium, SM Megamall, Ortigas...","SM Megamall, Ortigas, Mandaluyong City","SM Megamall, Ortigas, Mandaluyong City, Mandal...",121.057508,14.58445,1500,...,0,0,0,0,0,0,0,0,0,0
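
To make the encoding step concrete, here is a small sketch of what `MultiLabelBinarizer` produces for three of the cuisine strings shown earlier. The variable names `cuisines`, `mlb_demo`, `encoded`, and `demo_df` are illustrative and separate from the notebook's own `mlb`.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Three cuisine strings as they appear in the 'Cuisines' column
cuisines = pd.Series([
    "French, Japanese, Desserts",
    "Japanese",
    "Japanese, Sushi",
])

# Split each string into a list of labels, then one-hot encode the lists
mlb_demo = MultiLabelBinarizer()
encoded = mlb_demo.fit_transform(cuisines.str.split(", "))

# One column per distinct cuisine, one row per restaurant
demo_df = pd.DataFrame(encoded, columns=mlb_demo.classes_)
print(demo_df)
```

Each row has a 1 in every column whose cuisine the restaurant serves, so a restaurant can belong to several classes at once. The columns follow the sorted order of the labels.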


In [12]:
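# Define the features and target
# NOTE: after this drop, X still holds 'Average Cost for two' plus every one-hot
# cuisine column, i.e. the same values as y. The target therefore leaks into the
# features, so the scores reported further down are inflated.
X = df.drop(['Restaurant ID', 'Restaurant Name', 'Country Code', 'City', 'Address', 'Locality', 'Locality Verbose', 'Longitude', 'Latitude', 'Currency', 'Has Table booking', 'Has Online delivery', 'Is delivering now', 'Switch to order menu', 'Price range', 'Aggregate rating', 'Rating color', 'Rating text', 'Votes'], axis=1)
y = cuisines_encoded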

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
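
Because the one-hot cuisine columns were concatenated onto `df`, any feature matrix built by dropping columns from `df` will contain the target unless those columns are dropped too. The sketch below shows one way to build features from non-cuisine attributes only. The choice of feature columns is an assumption made for illustration, and the toy values are invented.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Toy stand-in using column names from the dataset; values are illustrative
toy = pd.DataFrame({
    "Average Cost for two": [1100, 1200, 1500, 300],
    "Price range": [3, 3, 4, 1],
    "Has Table booking": ["Yes", "Yes", "No", "No"],
    "Votes": [314, 591, 365, 12],
    "Cuisines": ["French, Japanese", "Japanese", "Japanese, Sushi", "Cafe"],
})

# Target: multilabel indicator matrix built from the cuisine strings
mlb_demo = MultiLabelBinarizer()
y_demo = mlb_demo.fit_transform(toy["Cuisines"].str.split(", "))

# Features: an explicit allow-list of non-cuisine columns (an assumed choice)
feature_cols = ["Average Cost for two", "Price range", "Has Table booking", "Votes"]
X_demo = toy[feature_cols].copy()

# Encode the Yes/No flag as 1/0 so the classifier receives numeric input
X_demo["Has Table booking"] = (X_demo["Has Table booking"] == "Yes").astype(int)

# Sanity check: no cuisine label may appear among the feature columns
leaked = set(X_demo.columns) & set(mlb_demo.classes_)
assert not leaked, f"target columns found in features: {leaked}"
print(X_demo.shape, y_demo.shape)
```

Selecting features with an allow-list rather than a drop-list makes it harder for newly added columns to slip into `X` unnoticed.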


In [13]:
# Initialize the classifier
clf = RandomForestClassifier(n_estimators=100, random_state=42)

# Train the classifier
clf.fit(X_train, y_train)


In [14]:
# Predict on the test data
y_pred = clf.predict(X_test)

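# Calculate subset accuracy: for multilabel targets, accuracy_score counts a
# sample as correct only if every one of its labels is predicted correctly
accuracy = accuracy_score(y_test, y_pred)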

# Generate classification report
report = classification_report(y_test, y_pred, target_names=mlb.classes_)

accuracy, report




(0.9089481946624803,
 '                   precision    recall  f1-score   support\n\n          Afghani       0.00      0.00      0.00         1\n          African       0.00      0.00      0.00         0\n         American       1.00      0.99      0.99        67\n           Andhra       0.00      0.00      0.00         2\n          Arabian       1.00      1.00      1.00         3\n        Argentine       0.00      0.00      0.00         0\n         Armenian       0.00      0.00      0.00         0\n            Asian       1.00      0.98      0.99        45\n     Asian Fusion       0.00      0.00      0.00         0\n         Assamese       0.00      0.00      0.00         0\n       Australian       0.00      0.00      0.00         0\n           Awadhi       1.00      1.00      1.00         1\n              BBQ       1.00      0.86      0.92         7\n           Bakery       1.00      0.98      0.99       138\n         Bar Food       1.00      0.67      0.80        12\n          Belgi
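
For a multilabel target, the single accuracy figure is the exact-match rate, which is only one way to summarise the result. The sketch below compares it with Hamming loss and micro-averaged F1 on toy indicator arrays. The arrays are invented, and `zero_division=0` is passed on the assumption that labels with no predicted samples should score 0 without raising a warning.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, hamming_loss

# Toy multilabel indicator arrays: 3 samples, 3 labels (illustrative only)
y_true = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 1, 0]])
y_hat = np.array([[1, 0, 1],
                  [0, 1, 0],
                  [1, 0, 0]])  # last sample misses one of its two labels

# Exact-match rate: a sample counts only if all of its labels are right
subset_acc = accuracy_score(y_true, y_hat)

# Fraction of individual label cells that are wrong
h_loss = hamming_loss(y_true, y_hat)

# F1 computed over all label cells pooled together
micro_f1 = f1_score(y_true, y_hat, average="micro", zero_division=0)

print(subset_acc, h_loss, micro_f1)
```

Here a single missed label makes the whole third sample count as wrong for subset accuracy, while Hamming loss and micro F1 still give credit for the labels that were predicted correctly.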

In [15]:
# Display the classification report
print(report)


                   precision    recall  f1-score   support

          Afghani       0.00      0.00      0.00         1
          African       0.00      0.00      0.00         0
         American       1.00      0.99      0.99        67
           Andhra       0.00      0.00      0.00         2
          Arabian       1.00      1.00      1.00         3
        Argentine       0.00      0.00      0.00         0
         Armenian       0.00      0.00      0.00         0
            Asian       1.00      0.98      0.99        45
     Asian Fusion       0.00      0.00      0.00         0
         Assamese       0.00      0.00      0.00         0
       Australian       0.00      0.00      0.00         0
           Awadhi       1.00      1.00      1.00         1
              BBQ       1.00      0.86      0.92         7
           Bakery       1.00      0.98      0.99       138
         Bar Food       1.00      0.67      0.80        12
          Belgian       0.00      0.00      0.00       