## Task: Restaurant Recommendation
###### Objective: Create a restaurant recommendation system based on user preferences.

Steps:

1. Preprocess the dataset by handling missing values and encoding categorical variables.
2. Determine the criteria for restaurant recommendations (e.g., cuisine preference, price range).
3. Implement a content-based filtering approach that recommends the restaurants most similar to the user's preferred criteria.
4. Test the recommendation system by providing sample user preferences and evaluating the quality of the recommendations.
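The steps above can be sketched end to end on a tiny, hypothetical table before touching the real dataset. This is a minimal sketch, not the notebook's final code: the restaurants are made up, and `recommend` is an illustrative helper. Cuisines are multi-hot encoded (a restaurant can list several), the price level is one-hot encoded, and restaurants are ranked by cosine similarity to the user's preference vector.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy data; column names mirror the real dataset
restaurants = pd.DataFrame({
    'Restaurant Name': ['Roma', 'Sakura', 'Trattoria', 'Curry House', 'Pizza Point'],
    'Cuisines': ['Italian, Pizza', 'Japanese, Sushi', 'Italian', 'Indian', 'Pizza, Fast Food'],
    'Price range': [3, 4, 2, 1, 1],
})

def recommend(restaurants, cuisine, price_range, top_n=3):
    """Hypothetical helper: rank restaurants by cosine similarity to a preference."""
    # Multi-hot cuisines plus one-hot price level (columns price_1, price_2, ...)
    cuisines = restaurants['Cuisines'].str.get_dummies(sep=', ')
    prices = pd.get_dummies(restaurants['Price range'], prefix='price').astype(int)
    features = pd.concat([cuisines, prices], axis=1)

    # Build the user's vector in the same feature space; unknown criteria are ignored
    user = pd.Series(0, index=features.columns)
    for column in (cuisine, f'price_{price_range}'):
        if column in user.index:
            user[column] = 1

    scores = cosine_similarity([user.values], features.values)[0]
    # Highest score first; the stable sort keeps ties in their original row order
    order = (-scores).argsort(kind='stable')[:top_n]
    return restaurants.iloc[order].assign(similarity=scores[order])

print(recommend(restaurants, 'Italian', 2))
```

Multi-hot encoding is used instead of integer label codes because cosine similarity on arbitrary integer codes would treat, say, cuisine 920 as "closer" to 921 than to 10, which has no meaning for categories.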


In [2]:
# Import necessary libraries
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics.pairwise import cosine_similarity

In [18]:
# Load the dataset
df = pd.read_csv('dataset.csv')

In [4]:
# Preprocessing
## Count the missing values in each column (how to handle them depends on the dataset)
df.isnull().sum()

Restaurant ID           0
Restaurant Name         0
Country Code            0
City                    0
Address                 0
Locality                0
Locality Verbose        0
Longitude               0
Latitude                0
Cuisines                9
Average Cost for two    0
Currency                0
Has Table booking       0
Has Online delivery     0
Is delivering now       0
Switch to order menu    0
Price range             0
Aggregate rating        0
Rating color            0
Rating text             0
Votes                   0
dtype: int64
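Only `Cuisines` has missing values (9 rows). Besides predicting them with a model, as the following cells attempt, two simpler options are dropping those rows or filling them with an explicit placeholder. A minimal sketch on a hypothetical toy frame (the placeholder `'Unknown'` is an arbitrary choice):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with a gap in 'Cuisines', like the dataset above
sample = pd.DataFrame({
    'Restaurant Name': ['A', 'B', 'C'],
    'Cuisines': ['Italian', np.nan, 'Indian, Chinese'],
})

# Option 1: drop rows whose cuisine is unknown
dropped = sample.dropna(subset=['Cuisines'])

# Option 2: keep the rows and mark the cuisine explicitly as unknown
filled = sample.fillna({'Cuisines': 'Unknown'})

print(dropped)
print(filled)
```

With only 9 of roughly 9,500 rows affected, dropping them is usually harmless for a recommender; the placeholder keeps the restaurants recommendable on price alone.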

In [6]:
# Separate the data into rows with known Cuisines and unknown Cuisines
data = df.copy()  # an independent copy, so later edits do not modify df
known_cuisines = data[data['Cuisines'].notna()].copy()
unknown_cuisines = data[data['Cuisines'].isna()].copy()
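Assigning `data = df` only binds a second name to the same DataFrame; `.copy()` is what creates an independent one. Likewise, taking an explicit `.copy()` of a filtered subset lets later assignments to its columns proceed without pandas' `SettingWithCopyWarning`. A quick check on a hypothetical one-column frame:

```python
import pandas as pd

original = pd.DataFrame({'Cuisines': ['Italian', None]})
alias = original               # a second name for the same object, no copy made
independent = original.copy()  # a separate DataFrame with its own data

alias.loc[1, 'Cuisines'] = 'Thai'
print(original.loc[1, 'Cuisines'])     # the edit made through `alias` is visible here
print(independent.loc[1, 'Cuisines'])  # still the original missing value
```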

In [8]:
unknown_cuisines['Cuisines']

84     NaN
87     NaN
94     NaN
297    NaN
328    NaN
346    NaN
368    NaN
418    NaN
455    NaN
Name: Cuisines, dtype: object

In [7]:
filtered_known_cuisines = known_cuisines[['City', 'Locality', 'Aggregate rating', 'Votes', 'Price range']]

In [10]:
known_cuisines.columns

Index(['Restaurant ID', 'Restaurant Name', 'Country Code', 'City', 'Address',
       'Locality', 'Locality Verbose', 'Longitude', 'Latitude', 'Cuisines',
       'Average Cost for two', 'Currency', 'Has Table booking',
       'Has Online delivery', 'Is delivering now', 'Switch to order menu',
       'Price range', 'Aggregate rating', 'Rating color', 'Rating text',
       'Votes'],
      dtype='object')

In [11]:
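# Encode categorical (object-dtype) columns as integer codes.
# The same LabelEncoder is refitted for every column, so after the loop it only
# holds the classes of the last encoded column ('Rating text'), not 'Cuisines'.
label_encoder = LabelEncoder()
for column in known_cuisines.columns:
    if known_cuisines[column].dtype == object:
        known_cuisines[column] = label_encoder.fit_transform(known_cuisines[column])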


In [12]:
known_cuisines

| | Restaurant ID | Restaurant Name | Country Code | City | Address | Locality | Locality Verbose | Longitude | Latitude | Cuisines | ... | Currency | Has Table booking | Has Online delivery | Is delivering now | Switch to order menu | Price range | Aggregate rating | Rating color | Rating text | Votes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 6317637 | 3742 | 162 | 73 | 8677 | 171 | 172 | 121.027535 | 14.565443 | 920 | ... | 0 | 1 | 0 | 0 | 0 | 3 | 4.8 | 0 | 1 | 314 |
| 1 | 6304287 | 3167 | 162 | 73 | 6047 | 592 | 600 | 121.014101 | 14.553708 | 1111 | ... | 0 | 1 | 0 | 0 | 0 | 3 | 4.5 | 0 | 1 | 591 |
| 2 | 6300002 | 2892 | 162 | 75 | 4676 | 308 | 314 | 121.056831 | 14.581404 | 1671 | ... | 0 | 1 | 0 | 0 | 0 | 4 | 4.4 | 1 | 5 | 270 |
| 3 | 6318506 | 4700 | 162 | 75 | 8682 | 860 | 873 | 121.056475 | 14.585318 | 1126 | ... | 0 | 0 | 0 | 0 | 0 | 4 | 4.9 | 0 | 1 | 365 |
| 4 | 6314302 | 5515 | 162 | 75 | 8681 | 860 | 873 | 121.057508 | 14.584450 | 1122 | ... | 0 | 1 | 0 | 0 | 0 | 4 | 4.8 | 0 | 1 | 229 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9546 | 5915730 | 4436 | 208 | 139 | 5918 | 516 | 522 | 28.977392 | 41.022793 | 1813 | ... | 11 | 0 | 0 | 0 | 0 | 3 | 4.1 | 1 | 5 | 788 |
| 9547 | 5908749 | 1310 | 208 | 139 | 5954 | 551 | 557 | 29.041297 | 41.009847 | 1824 | ... | 11 | 0 | 0 | 0 | 0 | 3 | 4.2 | 1 | 5 | 1034 |
| 9548 | 5915807 | 3063 | 208 | 139 | 5958 | 553 | 560 | 29.034640 | 41.055817 | 1110 | ... | 11 | 0 | 0 | 0 | 0 | 4 | 3.7 | 5 | 2 | 661 |
| 9549 | 5916112 | 512 | 208 | 139 | 5959 | 553 | 560 | 29.036019 | 41.057979 | 1657 | ... | 11 | 0 | 0 | 0 | 0 | 4 | 4.0 | 1 | 5 | 901 |
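The table shows every text column replaced by integer codes. Because the encoding loop reuses a single `LabelEncoder`, the codes of earlier columns (including `Cuisines`) cannot be decoded with it afterwards. One way to keep decoding possible is to store one fitted encoder per column. A minimal sketch on a hypothetical toy frame (the `encoders` dict is an illustrative name):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy frame with two text columns and one numeric column
frame = pd.DataFrame({
    'City': ['Manila', 'Istanbul', 'Manila'],
    'Cuisines': ['Italian', 'Turkish', 'Japanese'],
    'Votes': [314, 788, 229],
})

encoders = {}  # one fitted LabelEncoder per categorical column
for column in frame.columns:
    if not pd.api.types.is_numeric_dtype(frame[column]):
        encoders[column] = LabelEncoder()
        frame[column] = encoders[column].fit_transform(frame[column])

print(frame)
# Decode integer codes back to the original strings of a specific column
print(encoders['Cuisines'].inverse_transform([0, 2]))
```

`LabelEncoder` assigns codes in sorted order of the classes, so here `Italian` is 0, `Japanese` is 1 and `Turkish` is 2.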


### Classifier Model

In [17]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, r2_score


In [19]:
# Select the features for the prediction model: every column except the target.
# A narrower alternative: features = ['Average Cost for two', 'Has Table booking', 'Has Online delivery', 'Price range']

X = known_cuisines.drop('Cuisines', axis = 1)
y = known_cuisines['Cuisines']


In [None]:
from sklearn import svm
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Assuming X is your feature set and y are the labels
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a SVM Classifier
clf = svm.SVC(kernel='linear') # Linear Kernel

# Train the model using the training sets
clf.fit(X_train, y_train)

# Predict the response for test dataset
y_pred = clf.predict(X_test)

# Model Accuracy
print("Accuracy:", accuracy_score(y_test, y_pred))
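A linear SVM is sensitive to feature scale, and `X` mixes columns such as `Restaurant ID` (values in the millions) with 0/1 flags. Standardizing the features before fitting is the usual remedy. A minimal sketch on synthetic data standing in for `X` and `y`; the synthetic dataset and the inflated first column are assumptions made only for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for X, y; one feature is inflated to mimic an ID-sized column
X_demo, y_demo = make_classification(n_samples=300, n_features=5, n_informative=3,
                                     n_redundant=0, n_classes=3, random_state=42)
X_demo[:, 0] *= 1_000_000

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

# The pipeline fits the scaler on the training split only, then trains the linear SVM
model = make_pipeline(StandardScaler(), SVC(kernel='linear'))
model.fit(X_tr, y_tr)
accuracy = accuracy_score(y_te, model.predict(X_te))
print("Accuracy:", accuracy)
```

Wrapping the scaler and classifier in one pipeline ensures the test split is transformed with statistics learned from the training split only.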


In [30]:

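# r2_score treats the encoded cuisine labels as continuous numbers, so this value is
# not a meaningful quality measure for a classifier; the accuracy above is the relevant score
r2 = r2_score(y_test, y_pred)
print("R_squared :", r2)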

R_squared : -0.38867607761542455
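Why R² does not fit here: label codes are arbitrary names, so renaming the classes should not change how good a classifier is. Accuracy respects that; R² does not. A small demonstration with hypothetical labels and an arbitrary relabeling:

```python
from sklearn.metrics import accuracy_score, r2_score

# Hypothetical encoded cuisine labels and predictions
y_true = [0, 1, 2, 2]
y_pred = [0, 2, 2, 1]

# Rename the classes with an arbitrary permutation, as a different encoder might
relabel = {0: 2, 1: 0, 2: 1}
y_true_b = [relabel[c] for c in y_true]
y_pred_b = [relabel[c] for c in y_pred]

# Accuracy only checks equality, so it is unchanged by renaming
print(accuracy_score(y_true, y_pred), accuracy_score(y_true_b, y_pred_b))
# R² treats the codes as magnitudes, so the same predictions score differently
print(r2_score(y_true, y_pred), r2_score(y_true_b, y_pred_b))
```

Both accuracies are 0.5, while R² drops from about 0.27 to 0.0 for exactly the same predictions.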


In [None]:
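# Predict the missing Cuisines with the trained classifier `clf`. The feature columns of
# those rows are encoded with the same sorted-class mapping LabelEncoder built on
# known_cuisines (values never seen there become -1).
known_mask = df['Cuisines'].notna()
unknown_features = df.loc[~known_mask, X.columns].copy()
for column in unknown_features.columns:
    if unknown_features[column].dtype == object:
        codes = {value: code for code, value in enumerate(sorted(df.loc[known_mask, column].unique()))}
        unknown_features[column] = unknown_features[column].map(codes).fillna(-1).astype(int)
predicted_codes = clf.predict(unknown_features)

# Transform the predicted codes back to cuisine names and fill the gaps in a copy of the
# original data, so every other column keeps its raw (unencoded) values
cuisine_encoder = LabelEncoder().fit(df.loc[known_mask, 'Cuisines'])
data = df.copy()
data.loc[~known_mask, 'Cuisines'] = cuisine_encoder.inverse_transform(predicted_codes)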
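The imputation follows a common pattern: train on the rows where the target is known, predict it where it is missing, then decode the predicted codes. A self-contained toy version, assuming a `DecisionTreeClassifier` as the model instead of the SVM and using made-up rows:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: price level and table booking flag predict the cuisine
frame = pd.DataFrame({
    'Price range': [1, 1, 4, 4, 1, 4],
    'Has Table booking': [0, 0, 1, 1, 0, 1],
    'Cuisines': ['Fast Food', 'Fast Food', 'French', 'French', np.nan, np.nan],
})
features = ['Price range', 'Has Table booking']
missing = frame['Cuisines'].isna()

# Encode the target on the known rows only, then train on those rows
cuisine_encoder = LabelEncoder()
y_known = cuisine_encoder.fit_transform(frame.loc[~missing, 'Cuisines'])
dt = DecisionTreeClassifier(random_state=0)
dt.fit(frame.loc[~missing, features], y_known)

# Predict codes for the missing rows and write the decoded names back
predicted = dt.predict(frame.loc[missing, features])
frame.loc[missing, 'Cuisines'] = cuisine_encoder.inverse_transform(predicted)
print(frame)
```

The encoder used for decoding must be the one fitted on the target column; an encoder last fitted on some other column would map the codes to the wrong strings or fail.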


## Step 3: Content-based recommendations

In [None]:
## Encoding categorical variables
## Represent each restaurant by its cuisines (multi-hot, since one restaurant can list
## several, e.g. "Italian, Pizza") and its price range (one-hot)
restaurants = data.dropna(subset=['Cuisines']).reset_index(drop=True)
cuisine_matrix = restaurants['Cuisines'].str.get_dummies(sep=', ')
price_matrix = pd.get_dummies(restaurants['Price range'], prefix='price').astype(int)
restaurant_vectors = pd.concat([cuisine_matrix, price_matrix], axis=1)

# Determine the criteria for restaurant recommendations
## This will depend on the user preferences. For example
## ('Price range' is stored as an integer level, not a label such as 'Medium'):
user_preferences = {
    'cuisine': 'Italian',
    'price_range': 2
}
# Convert user preferences into a vector over the same columns as the restaurants
user_vector = pd.Series(0, index=restaurant_vectors.columns)
for column in (user_preferences['cuisine'], f"price_{user_preferences['price_range']}"):
    if column in user_vector.index:
        user_vector[column] = 1

# Implement a content-based filtering approach
## Compute the cosine similarity between user preferences and restaurants
similarities = cosine_similarity([user_vector.values], restaurant_vectors.values)

# Get the top 5 recommended restaurants
top_5_index = similarities[0].argsort()[-5:][::-1]
recommended_restaurants = restaurants.iloc[top_5_index]

print("Recommended Restaurants:")
print(recommended_restaurants[['Restaurant Name', 'City', 'Cuisines', 'Price range', 'Aggregate rating']])
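Step 4 asks for an evaluation of the recommendations. The dataset has no per-user feedback, so one simple proxy is the share of recommended restaurants that actually satisfy each requested criterion. A minimal sketch, assuming a recommendation table that keeps the raw `Cuisines` and `Price range` columns; `criteria_hit_rate` is a hypothetical helper and the rows are made up:

```python
import pandas as pd

def criteria_hit_rate(recommended, cuisine, price_range):
    """Hypothetical helper: share of recommended rows matching each criterion."""
    # Split the comma-separated cuisine lists and test for an exact match
    has_cuisine = recommended['Cuisines'].str.split(', ').apply(lambda items: cuisine in items)
    has_price = recommended['Price range'] == price_range
    return {'cuisine': has_cuisine.mean(), 'price': has_price.mean()}

# Hypothetical top-3 recommendations for a user asking for Italian at price level 2
recommended = pd.DataFrame({
    'Restaurant Name': ['Trattoria', 'Roma', 'Pizza Point'],
    'Cuisines': ['Italian', 'Italian, Pizza', 'Pizza, Fast Food'],
    'Price range': [2, 3, 2],
})
print(criteria_hit_rate(recommended, 'Italian', 2))
```

Running this for several sample preferences gives a quick sanity check; it measures only criteria matching, not whether users would actually like the restaurants, which would need ratings or click data.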
