
# Task 3: Cuisine Classification
Objective: Develop a machine learning model to classify restaurants based on their cuisines.

# Importing the libraries



In [1]:
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report



# Loading the dataset

In [2]:
data = pd.read_csv('Dataset .csv')


# Dropping columns that are not required

In [3]:
data.drop(['Restaurant ID', 'Country Code', 'City', 'Address', 'Locality', 'Locality Verbose', 'Longitude', 'Latitude', 'Currency', 'Has Table booking', 'Has Online delivery', 'Is delivering now', 'Switch to order menu', 'Price range', 'Aggregate rating', 'Rating color', 'Rating text', 'Votes'], axis=1, inplace=True)
print(data)

               Restaurant Name                          Cuisines  \
0             Le Petit Souffle        French, Japanese, Desserts   
1             Izakaya Kikufuji                          Japanese   
2       Heat - Edsa Shangri-La  Seafood, Asian, Filipino, Indian   
3                         Ooma                   Japanese, Sushi   
4                  Sambo Kojin                  Japanese, Korean   
...                        ...                               ...   
9546               Namlı Gurme                           Turkish   
9547               Ceviz Ağacı   World Cuisine, Patisserie, Cafe   
9548                     Huqqa            Italian, World Cuisine   
9549                Aşşk Kahve                   Restaurant Cafe   
9550  Walter's Coffee Roastery                              Cafe   

      Average Cost for two  
0                     1100  
1                     1200  
2                     4000  
3                     1500  
4                     1500  
...      

# Handling missing values

In [4]:
data.isnull().sum()

Restaurant Name         0
Cuisines                9
Average Cost for two    0
dtype: int64

In [6]:
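# Drop rows with a missing value (only 'Cuisines' has any, per the counts above).
# Note: the result is stored in data_cleaned, but the cells below keep using `data`.
data_cleaned = data.dropna(subset=['Restaurant Name', 'Cuisines', 'Average Cost for two'])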
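
`dropna` returns a new DataFrame and leaves the original untouched, so the cleaned frame only takes effect if later cells actually use it. A minimal sketch on a toy frame (the three rows and the names `toy` and `toy_cleaned` are made up for illustration, not taken from the dataset):

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real dataset; values are illustrative only.
toy = pd.DataFrame({
    "Restaurant Name": ["A", "B", "C"],
    "Cuisines": ["Japanese", np.nan, "Cafe"],
    "Average Cost for two": [1200, 300, 55],
})

# dropna returns a copy; `toy` itself still contains the row with the NaN.
toy_cleaned = toy.dropna(subset=["Cuisines"])

print(len(toy), len(toy_cleaned))               # row counts before and after
print(toy_cleaned["Cuisines"].isnull().sum())   # missing values left in the copy
```

Passing `toy_cleaned` (rather than `toy`) to the following steps is what would exclude the incomplete row.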

# Encoding categorical data



In [7]:
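encoder = LabelEncoder()
# Map each distinct restaurant name to an integer ID
data['Restaurant Name'] = encoder.fit_transform(data['Restaurant Name'])
# Refit the same encoder on the target: each distinct 'Cuisines' string becomes one class label
data['Cuisines'] = encoder.fit_transform(data['Cuisines'])
print(data)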

      Restaurant Name  Cuisines  Average Cost for two
0                3748       920                  1100
1                3172      1111                  1200
2                2896      1671                  4000
3                4707      1126                  1500
4                5523      1122                  1500
...               ...       ...                   ...
9546             4443      1813                    80
9547             1310      1824                   105
9548             3068      1110                   170
9549              512      1657                   120
9550             7240       331                    55

[9551 rows x 3 columns]
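
Label-encoding the full `Cuisines` string treats every distinct combination (for example `Japanese, Sushi` and `Japanese, Korean`) as a separate class, which is why the encoded labels above run into the thousands. One possible alternative, which this notebook does not use, is to keep only the first listed cuisine as the target. A sketch under that assumption, with `primary` as a hypothetical name and sample strings taken from the output printed earlier:

```python
from sklearn.preprocessing import LabelEncoder

cuisines = [
    "French, Japanese, Desserts",
    "Japanese",
    "Japanese, Sushi",
    "Japanese, Korean",
    "Cafe",
]

# Full strings: every combination becomes its own class.
full_classes = LabelEncoder().fit(cuisines).classes_

# Assumption: the first listed cuisine is a reasonable "primary" label.
primary = [c.split(",")[0].strip() for c in cuisines]
primary_classes = LabelEncoder().fit(primary).classes_

print(len(full_classes), len(primary_classes))
print(list(primary_classes))
```

Whether the first entry is really the dominant cuisine is not something the dataset guarantees, so this should be read as one way to shrink the label space, not as the correct target definition.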


# Building Random Forest Model

In [8]:
X = data[['Restaurant Name', 'Average Cost for two']]
y = data['Cuisines']


Splitting dataset into training set and test set

In [9]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)


Feature Scaling

In [10]:
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

Training the model

In [11]:
# Random forest with 10 trees, using entropy (information gain) as the split criterion
classifier = RandomForestClassifier(n_estimators=10, criterion='entropy', random_state=0)
classifier.fit(X_train, y_train)

Predicting the results

In [12]:
y_pred = classifier.predict(X_test)
print(list(zip(y_test, y_pred)))

[(1329, 110), (1416, 741), (953, 953), (1266, 788), (1306, 1306), (1306, 1329), (1406, 896), (1038, 390), (1520, 834), (331, 1723), (58, 58), (1667, 1743), (1288, 1288), (1306, 1514), (984, 586), (1778, 1306), (58, 58), (1615, 919), (518, 828), (1306, 1329), (1749, 828), (1514, 1425), (623, 186), (54, 54), (1306, 1329), (497, 1306), (1306, 1494), (331, 201), (660, 154), (1711, 1514), (331, 331), (1709, 1111), (1665, 306), (1520, 1514), (1765, 186), (1306, 1306), (1514, 186), (986, 986), (1749, 1749), (1765, 1262), (842, 758), (1761, 1597), (1306, 1306), (1514, 1520), (300, 1031), (1306, 1554), (1105, 1098), (892, 920), (1329, 1329), (497, 905), (996, 1236), (1617, 1306), (1794, 670), (878, 1102), (828, 1699), (828, 828), (1723, 837), (1008, 1212), (1306, 1306), (201, 1262), (1554, 841), (833, 689), (331, 1381), (1275, 1765), (1329, 741), (1792, 1792), (1394, 358), (1723, 1723), (186, 1306), (1626, 1626), (18, 1795), (177, 186), (58, 58), (497, 1587), (105, 1090), (1373, 1313), (934, 54

# Evaluating Model's Performance

In [14]:
cm = confusion_matrix(y_test, y_pred)
print("Confusion Matrix: ")
print(cm)

print("Classification Report: ")
print(classification_report(y_test, y_pred))

accuracy = accuracy_score(y_test, y_pred)
print("Accuracy: ",accuracy)

Confusion Matrix: 
[[0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 ...
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]]
Classification Report: 
              precision    recall  f1-score   support

           3       0.00      0.00      0.00         1
           4       0.00      0.00      0.00         1
           6       0.00      0.00      0.00         5
           7       0.00      0.00      0.00         0
          11       0.00      0.00      0.00         1
          14       0.00      0.00      0.00         1
          18       0.00      0.00      0.00         2
          19       0.00      0.00      0.00         0
          20       0.00      0.00      0.00         0
          21       0.00      0.00      0.00         1
          22       0.00      0.00      0.00         1
          27       0.00      0.00      0.00         0
          29       0.00      0.00      0.00         0
          30       0.00      0.00      0.00         1
          31       0.00   

  _warn_prf(average, modifier, msg_start, len(result))
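
The `_warn_prf` output above appears to come from scikit-learn warning that precision or recall is undefined for classes with no predicted or no true samples. As a hedged sketch on made-up labels (not the notebook's data), `zero_division=0` makes the zero explicit and suppresses the warning, and `output_dict=True` gives direct access to the macro and weighted averages:

```python
from sklearn.metrics import accuracy_score, classification_report

# Made-up labels for illustration; class 2 is never predicted.
y_true = [0, 0, 1, 1, 2]
y_hat = [0, 0, 1, 0, 1]

# zero_division=0 reports 0.0 for undefined precision instead of warning.
report = classification_report(y_true, y_hat, zero_division=0, output_dict=True)

macro_f1 = report["macro avg"]["f1-score"]        # unweighted mean over classes
weighted_f1 = report["weighted avg"]["f1-score"]  # mean weighted by class support
acc = accuracy_score(y_true, y_hat)

print(f"accuracy={acc:.2f} macro_f1={macro_f1:.2f} weighted_f1={weighted_f1:.2f}")
```

The macro average counts every class equally, so rare classes that are never predicted pull it down more than they pull down the weighted average.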


# Conclusion
The model's accuracy is 21.87%, which is poor. The low macro-average and weighted-average scores suggest that the model performs poorly across cuisines in general rather than on just a few. Several factors could contribute. The dataset is imbalanced, with some cuisines having far fewer samples than others (many classes in the report have a support of 0 or 1). Every distinct `Cuisines` string is treated as its own class, so the model has to separate well over a thousand labels. Finally, the two features, an encoded restaurant name and the average cost for two, may be too noisy or uninformative to tell those classes apart.
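
To check the imbalance hypothesis, one could count how many samples each class has. A minimal sketch on a made-up label list (on the real data, `data['Cuisines'].value_counts()` in pandas would give the equivalent counts):

```python
from collections import Counter

# Made-up labels for illustration only.
labels = ["North Indian", "North Indian", "North Indian",
          "Cafe", "Cafe", "Turkish", "Sushi"]

counts = Counter(labels)

# Classes with a single sample cannot appear in both the training and test split.
singletons = [name for name, n in counts.items() if n == 1]

print(counts.most_common(2))
print(f"{len(singletons)} of {len(counts)} classes have a single sample")
```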