# Recommendation System GPT-4-Turbo

Building a content-based recommender system. When a restaurant name is provided, the system will analyze the reviews of other restaurants and recommend options with similar reviews. The recommendations will be sorted by rating, prioritizing the highest-rated restaurants.



### Notebook Overview

1. **Loading the Dataset:**
   - Import necessary libraries and load the data.
   
2. **Data Cleaning:**
   - Remove redundant columns.
   - Rename columns for better clarity.
   - Eliminate duplicate entries.
   - Clean individual columns as required.
   - Remove any NaN values from the dataset.
   - Apply additional transformations as needed.

3. **Text Preprocessing:**
   - Remove unnecessary words from reviews.
   - Strip out links and other extraneous elements.
   - Eliminate unwanted symbols.

4. **Recommendation System:**
   - Develop and implement the recommendation algorithm.

5. **LLM GPT-4_Turbo :**
   - Enhancing Recommendations with GPT-4.

### Importing Libraries

In [None]:
# Importing Libraries
import numpy as np
import pandas as pd
import seaborn as sb
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LogisticRegression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.metrics import r2_score
import warnings
warnings.filterwarnings('always')
warnings.filterwarnings('ignore')
import re
import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords
from sklearn.metrics.pairwise import linear_kernel
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfVectorizer

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


In [None]:
# Mount Google Drive.
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


### Loading the dataset

In [None]:
# Reading the dataset
zomato_real=pd.read_csv("/content/drive/MyDrive/Assignments/Customer Analytics GPT-4/data.csv")

# Prints the first N rows of a DataFrame
zomato_real.head()

Unnamed: 0,url,address,name,online_order,book_table,rate,votes,phone,location,rest_type,dish_liked,cuisines,approx_cost(for two people),reviews_list,menu_item,listed_in(type),listed_in(city)
0,https://www.zomato.com/bangalore/jalsa-banasha...,"942, 21st Main Road, 2nd Stage, Banashankari, ...",Jalsa,Yes,Yes,4.1/5,775,080 42297555\r\n+91 9743772233,Banashankari,Casual Dining,"Pasta, Lunch Buffet, Masala Papad, Paneer Laja...","North Indian, Mughlai, Chinese",800,"[('Rated 4.0', 'RATED\n A beautiful place to ...",[],Buffet,Banashankari
1,https://www.zomato.com/bangalore/spice-elephan...,"2nd Floor, 80 Feet Road, Near Big Bazaar, 6th ...",Spice Elephant,Yes,No,4.1/5,787,080 41714161,Banashankari,Casual Dining,"Momos, Lunch Buffet, Chocolate Nirvana, Thai G...","Chinese, North Indian, Thai",800,"[('Rated 4.0', 'RATED\n Had been here for din...",[],Buffet,Banashankari
2,https://www.zomato.com/SanchurroBangalore?cont...,"1112, Next to KIMS Medical College, 17th Cross...",San Churro Cafe,Yes,No,3.8/5,918,+91 9663487993,Banashankari,"Cafe, Casual Dining","Churros, Cannelloni, Minestrone Soup, Hot Choc...","Cafe, Mexican, Italian",800,"[('Rated 3.0', ""RATED\n Ambience is not that ...",[],Buffet,Banashankari
3,https://www.zomato.com/bangalore/addhuri-udupi...,"1st Floor, Annakuteera, 3rd Stage, Banashankar...",Addhuri Udupi Bhojana,No,No,3.7/5,88,+91 9620009302,Banashankari,Quick Bites,Masala Dosa,"South Indian, North Indian",300,"[('Rated 4.0', ""RATED\n Great food and proper...",[],Buffet,Banashankari
4,https://www.zomato.com/bangalore/grand-village...,"10, 3rd Floor, Lakshmi Associates, Gandhi Baza...",Grand Village,No,No,3.8/5,166,+91 8026612447\r\n+91 9901210005,Basavanagudi,Casual Dining,"Panipuri, Gol Gappe","North Indian, Rajasthani",600,"[('Rated 4.0', 'RATED\n Very good restaurant ...",[],Buffet,Banashankari


In [None]:
zomato_real.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 51717 entries, 0 to 51716
Data columns (total 17 columns):
 #   Column                       Non-Null Count  Dtype 
---  ------                       --------------  ----- 
 0   url                          51717 non-null  object
 1   address                      51717 non-null  object
 2   name                         51717 non-null  object
 3   online_order                 51717 non-null  object
 4   book_table                   51717 non-null  object
 5   rate                         43942 non-null  object
 6   votes                        51717 non-null  int64 
 7   phone                        50509 non-null  object
 8   location                     51696 non-null  object
 9   rest_type                    51490 non-null  object
 10  dish_liked                   23639 non-null  object
 11  cuisines                     51672 non-null  object
 12  approx_cost(for two people)  51371 non-null  object
 13  reviews_list                 51

### Data Cleaning and Feature Engineering

In [None]:
# Deleting Unnnecessary Columns
zomato = zomato_real.drop(['url','dish_liked','phone'],axis=1) # Dropping the column "dish_liked", "phone", "url" and saving the new dataset as "zomato"

In [None]:
# Removing the Duplicates
zomato.duplicated().sum()
zomato.drop_duplicates(inplace=True)

In [None]:
# Remove the NaN values from the dataset
zomato.isnull().sum()
zomato.dropna(how='any',inplace=True)
zomato.info() # .info() function is used to get a concise summary of the dataframe

<class 'pandas.core.frame.DataFrame'>
Index: 43499 entries, 0 to 51716
Data columns (total 14 columns):
 #   Column                       Non-Null Count  Dtype 
---  ------                       --------------  ----- 
 0   address                      43499 non-null  object
 1   name                         43499 non-null  object
 2   online_order                 43499 non-null  object
 3   book_table                   43499 non-null  object
 4   rate                         43499 non-null  object
 5   votes                        43499 non-null  int64 
 6   location                     43499 non-null  object
 7   rest_type                    43499 non-null  object
 8   cuisines                     43499 non-null  object
 9   approx_cost(for two people)  43499 non-null  object
 10  reviews_list                 43499 non-null  object
 11  menu_item                    43499 non-null  object
 12  listed_in(type)              43499 non-null  object
 13  listed_in(city)              43499 n

In [None]:
# Reading Column Names
zomato.columns

Index(['address', 'name', 'online_order', 'book_table', 'rate', 'votes',
       'location', 'rest_type', 'cuisines', 'approx_cost(for two people)',
       'reviews_list', 'menu_item', 'listed_in(type)', 'listed_in(city)'],
      dtype='object')

In [None]:
# Changing the column names
zomato = zomato.rename(columns={'approx_cost(for two people)':'cost','listed_in(type)':'type',
                                  'listed_in(city)':'city'})
zomato.columns

Index(['address', 'name', 'online_order', 'book_table', 'rate', 'votes',
       'location', 'rest_type', 'cuisines', 'cost', 'reviews_list',
       'menu_item', 'type', 'city'],
      dtype='object')

In [None]:
# Some Transformations
zomato['cost'] = zomato['cost'].astype(str) # Changing the cost to string
zomato['cost'] = zomato['cost'].apply(lambda x: x.replace(',','.')) # Using lambda function to replace ',' from cost
zomato['cost'] = zomato['cost'].astype(float) # Changing the cost to Float
zomato.info()

<class 'pandas.core.frame.DataFrame'>
Index: 43499 entries, 0 to 51716
Data columns (total 14 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   address       43499 non-null  object 
 1   name          43499 non-null  object 
 2   online_order  43499 non-null  object 
 3   book_table    43499 non-null  object 
 4   rate          43499 non-null  object 
 5   votes         43499 non-null  int64  
 6   location      43499 non-null  object 
 7   rest_type     43499 non-null  object 
 8   cuisines      43499 non-null  object 
 9   cost          43499 non-null  float64
 10  reviews_list  43499 non-null  object 
 11  menu_item     43499 non-null  object 
 12  type          43499 non-null  object 
 13  city          43499 non-null  object 
dtypes: float64(1), int64(1), object(12)
memory usage: 5.0+ MB


In [None]:
# Reading Rate of dataset
zomato['rate'].unique()

array(['4.1/5', '3.8/5', '3.7/5', '3.6/5', '4.6/5', '4.0/5', '4.2/5',
       '3.9/5', '3.1/5', '3.0/5', '3.2/5', '3.3/5', '2.8/5', '4.4/5',
       '4.3/5', 'NEW', '2.9/5', '3.5/5', '2.6/5', '3.8 /5', '3.4/5',
       '4.5/5', '2.5/5', '2.7/5', '4.7/5', '2.4/5', '2.2/5', '2.3/5',
       '3.4 /5', '-', '3.6 /5', '4.8/5', '3.9 /5', '4.2 /5', '4.0 /5',
       '4.1 /5', '3.7 /5', '3.1 /5', '2.9 /5', '3.3 /5', '2.8 /5',
       '3.5 /5', '2.7 /5', '2.5 /5', '3.2 /5', '2.6 /5', '4.5 /5',
       '4.3 /5', '4.4 /5', '4.9/5', '2.1/5', '2.0/5', '1.8/5', '4.6 /5',
       '4.9 /5', '3.0 /5', '4.8 /5', '2.3 /5', '4.7 /5', '2.4 /5',
       '2.1 /5', '2.2 /5', '2.0 /5', '1.8 /5'], dtype=object)

In [None]:
# Removing '/5' from Rates
# Filter out rows where 'rate' is 'NEW' or '-'
zomato = zomato.loc[zomato.rate != 'NEW']
zomato = zomato.loc[zomato.rate != '-'].reset_index(drop=True)

# Update the lambda function to use built-in str
remove_slash = lambda x: x.replace('/5', '') if isinstance(x, str) else x
zomato.rate = zomato.rate.apply(remove_slash).str.strip().astype('float')

# Display the first few entries of the 'rate' column
print(zomato['rate'].head())

0    4.1
1    4.1
2    3.8
3    3.7
4    3.8
Name: rate, dtype: float64


In [None]:
# Adjust the column names
zomato.name = zomato.name.apply(lambda x:x.title())
zomato.online_order.replace(('Yes','No'),(True, False),inplace=True)
zomato.book_table.replace(('Yes','No'),(True, False),inplace=True)
zomato.cost.unique()

array([800.  , 300.  , 600.  , 700.  , 550.  , 500.  , 450.  , 650.  ,
       400.  , 900.  , 200.  , 750.  , 150.  , 850.  , 100.  ,   1.2 ,
       350.  , 250.  , 950.  ,   1.  ,   1.5 ,   1.3 , 199.  ,   1.1 ,
         1.6 , 230.  , 130.  ,   1.7 ,   1.35,   2.2 ,   1.4 ,   2.  ,
         1.8 ,   1.9 , 180.  , 330.  ,   2.5 ,   2.1 ,   3.  ,   2.8 ,
         3.4 ,  50.  ,  40.  ,   1.25,   3.5 ,   4.  ,   2.4 ,   2.6 ,
         1.45,  70.  ,   3.2 , 240.  ,   6.  ,   1.05,   2.3 ,   4.1 ,
       120.  ,   5.  ,   3.7 ,   1.65,   2.7 ,   4.5 ,  80.  ])

In [None]:
zomato.head()

Unnamed: 0,address,name,online_order,book_table,rate,votes,location,rest_type,cuisines,cost,reviews_list,menu_item,type,city
0,"942, 21st Main Road, 2nd Stage, Banashankari, ...",Jalsa,True,True,4.1,775,Banashankari,Casual Dining,"North Indian, Mughlai, Chinese",800.0,"[('Rated 4.0', 'RATED\n A beautiful place to ...",[],Buffet,Banashankari
1,"2nd Floor, 80 Feet Road, Near Big Bazaar, 6th ...",Spice Elephant,True,False,4.1,787,Banashankari,Casual Dining,"Chinese, North Indian, Thai",800.0,"[('Rated 4.0', 'RATED\n Had been here for din...",[],Buffet,Banashankari
2,"1112, Next to KIMS Medical College, 17th Cross...",San Churro Cafe,True,False,3.8,918,Banashankari,"Cafe, Casual Dining","Cafe, Mexican, Italian",800.0,"[('Rated 3.0', ""RATED\n Ambience is not that ...",[],Buffet,Banashankari
3,"1st Floor, Annakuteera, 3rd Stage, Banashankar...",Addhuri Udupi Bhojana,False,False,3.7,88,Banashankari,Quick Bites,"South Indian, North Indian",300.0,"[('Rated 4.0', ""RATED\n Great food and proper...",[],Buffet,Banashankari
4,"10, 3rd Floor, Lakshmi Associates, Gandhi Baza...",Grand Village,False,False,3.8,166,Basavanagudi,Casual Dining,"North Indian, Rajasthani",600.0,"[('Rated 4.0', 'RATED\n Very good restaurant ...",[],Buffet,Banashankari


In [None]:
zomato['city'].unique()

array(['Banashankari', 'Bannerghatta Road', 'Basavanagudi', 'Bellandur',
       'Brigade Road', 'Brookefield', 'BTM', 'Church Street',
       'Electronic City', 'Frazer Town', 'HSR', 'Indiranagar',
       'Jayanagar', 'JP Nagar', 'Kalyan Nagar', 'Kammanahalli',
       'Koramangala 4th Block', 'Koramangala 5th Block',
       'Koramangala 6th Block', 'Koramangala 7th Block', 'Lavelle Road',
       'Malleshwaram', 'Marathahalli', 'MG Road', 'New BEL Road',
       'Old Airport Road', 'Rajajinagar', 'Residency Road',
       'Sarjapur Road', 'Whitefield'], dtype=object)

In [None]:
zomato.head()

Unnamed: 0,address,name,online_order,book_table,rate,votes,location,rest_type,cuisines,cost,reviews_list,menu_item,type,city
0,"942, 21st Main Road, 2nd Stage, Banashankari, ...",Jalsa,True,True,4.1,775,Banashankari,Casual Dining,"North Indian, Mughlai, Chinese",800.0,"[('Rated 4.0', 'RATED\n A beautiful place to ...",[],Buffet,Banashankari
1,"2nd Floor, 80 Feet Road, Near Big Bazaar, 6th ...",Spice Elephant,True,False,4.1,787,Banashankari,Casual Dining,"Chinese, North Indian, Thai",800.0,"[('Rated 4.0', 'RATED\n Had been here for din...",[],Buffet,Banashankari
2,"1112, Next to KIMS Medical College, 17th Cross...",San Churro Cafe,True,False,3.8,918,Banashankari,"Cafe, Casual Dining","Cafe, Mexican, Italian",800.0,"[('Rated 3.0', ""RATED\n Ambience is not that ...",[],Buffet,Banashankari
3,"1st Floor, Annakuteera, 3rd Stage, Banashankar...",Addhuri Udupi Bhojana,False,False,3.7,88,Banashankari,Quick Bites,"South Indian, North Indian",300.0,"[('Rated 4.0', ""RATED\n Great food and proper...",[],Buffet,Banashankari
4,"10, 3rd Floor, Lakshmi Associates, Gandhi Baza...",Grand Village,False,False,3.8,166,Basavanagudi,Casual Dining,"North Indian, Rajasthani",600.0,"[('Rated 4.0', 'RATED\n Very good restaurant ...",[],Buffet,Banashankari


In [None]:
# Checking Null values
zomato.isnull().sum()

Unnamed: 0,0
address,0
name,0
online_order,0
book_table,0
rate,0
votes,0
location,0
rest_type,0
cuisines,0
cost,0


In [None]:
# Computing Mean Rating
restaurants = list(zomato['name'].unique())
zomato['Mean Rating'] = 0

for i in range(len(restaurants)):
    zomato['Mean Rating'][zomato['name'] == restaurants[i]] = zomato['rate'][zomato['name'] == restaurants[i]].mean()

In [None]:
zomato.head()

Unnamed: 0,address,name,online_order,book_table,rate,votes,location,rest_type,cuisines,cost,reviews_list,menu_item,type,city,Mean Rating
0,"942, 21st Main Road, 2nd Stage, Banashankari, ...",Jalsa,True,True,4.1,775,Banashankari,Casual Dining,"North Indian, Mughlai, Chinese",800.0,"[('Rated 4.0', 'RATED\n A beautiful place to ...",[],Buffet,Banashankari,4.118182
1,"2nd Floor, 80 Feet Road, Near Big Bazaar, 6th ...",Spice Elephant,True,False,4.1,787,Banashankari,Casual Dining,"Chinese, North Indian, Thai",800.0,"[('Rated 4.0', 'RATED\n Had been here for din...",[],Buffet,Banashankari,4.1
2,"1112, Next to KIMS Medical College, 17th Cross...",San Churro Cafe,True,False,3.8,918,Banashankari,"Cafe, Casual Dining","Cafe, Mexican, Italian",800.0,"[('Rated 3.0', ""RATED\n Ambience is not that ...",[],Buffet,Banashankari,3.8
3,"1st Floor, Annakuteera, 3rd Stage, Banashankar...",Addhuri Udupi Bhojana,False,False,3.7,88,Banashankari,Quick Bites,"South Indian, North Indian",300.0,"[('Rated 4.0', ""RATED\n Great food and proper...",[],Buffet,Banashankari,3.7
4,"10, 3rd Floor, Lakshmi Associates, Gandhi Baza...",Grand Village,False,False,3.8,166,Basavanagudi,Casual Dining,"North Indian, Rajasthani",600.0,"[('Rated 4.0', 'RATED\n Very good restaurant ...",[],Buffet,Banashankari,3.8


In [None]:
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler(feature_range = (1,5))

zomato[['Mean Rating']] = scaler.fit_transform(zomato[['Mean Rating']]).round(2)

zomato.sample(3)

Unnamed: 0,address,name,online_order,book_table,rate,votes,location,rest_type,cuisines,cost,reviews_list,menu_item,type,city,Mean Rating
12385,"12, Pragathi Mansion, 1st Cross, Koramangala 5...",Cafe Jezve,True,False,3.8,18,Koramangala 5th Block,Cafe,Cafe,500.0,"[('Rated 4.0', ""RATED\n Quint Little Arabian ...",[],Dine-out,Frazer Town,3.65
7440,"130, 1st Cross, Jwoti Nivas College Road, Kora...",Bonsouth,True,True,4.2,2616,Koramangala 5th Block,Casual Dining,"Chettinad, Andhra, Kerala",1.3,"[('Rated 4.0', 'RATED\n WeÃ\x83Ã\x83Ã\x82Ã...",[],Delivery,BTM,4.1
37125,"404, 11th Cross, 1st N Block, Opposite Corpora...",Goli Vada Pav No. 1,False,False,3.6,42,Rajajinagar,Quick Bites,Street Food,200.0,"[('Rated 3.0', ""RATED\n I've tried almost eve...",[],Delivery,Rajajinagar,3.24


In [None]:
zomato.head()

Unnamed: 0,address,name,online_order,book_table,rate,votes,location,rest_type,cuisines,cost,reviews_list,menu_item,type,city,Mean Rating
0,"942, 21st Main Road, 2nd Stage, Banashankari, ...",Jalsa,True,True,4.1,775,Banashankari,Casual Dining,"North Indian, Mughlai, Chinese",800.0,"[('Rated 4.0', 'RATED\n A beautiful place to ...",[],Buffet,Banashankari,3.99
1,"2nd Floor, 80 Feet Road, Near Big Bazaar, 6th ...",Spice Elephant,True,False,4.1,787,Banashankari,Casual Dining,"Chinese, North Indian, Thai",800.0,"[('Rated 4.0', 'RATED\n Had been here for din...",[],Buffet,Banashankari,3.97
2,"1112, Next to KIMS Medical College, 17th Cross...",San Churro Cafe,True,False,3.8,918,Banashankari,"Cafe, Casual Dining","Cafe, Mexican, Italian",800.0,"[('Rated 3.0', ""RATED\n Ambience is not that ...",[],Buffet,Banashankari,3.58
3,"1st Floor, Annakuteera, 3rd Stage, Banashankar...",Addhuri Udupi Bhojana,False,False,3.7,88,Banashankari,Quick Bites,"South Indian, North Indian",300.0,"[('Rated 4.0', ""RATED\n Great food and proper...",[],Buffet,Banashankari,3.45
4,"10, 3rd Floor, Lakshmi Associates, Gandhi Baza...",Grand Village,False,False,3.8,166,Basavanagudi,Casual Dining,"North Indian, Rajasthani",600.0,"[('Rated 4.0', 'RATED\n Very good restaurant ...",[],Buffet,Banashankari,3.58


### Text Preprocessing

Some of the common text preprocessing / cleaning steps are:

 - Lower casing
 - Removal of Punctuations
 - Removal of Stopwords
 - Removal of URLs
 - Spelling correction

In [None]:
# 5 examples of these columns before text processing:
zomato[['reviews_list', 'cuisines']].sample(5)

Unnamed: 0,reviews_list,cuisines
20769,"[('Rated 3.0', ""RATED\n Finding this place is...",Chinese
18847,"[('Rated 5.0', ""RATED\n Fantastic place but I...","North Indian, Asian, Seafood, Chinese"
8917,"[('Rated 5.0', 'RATED\n Access in decent area...",North Indian
14538,"[('Rated 1.0', ""RATED\n ridiculous management...","Biryani, Mughlai, Chinese"
29487,"[('Rated 4.0', 'RATED\n A nice place to have ...","North Indian, Chinese"


In [None]:
# Lower Casing
zomato["reviews_list"] = zomato["reviews_list"].str.lower()
zomato[['reviews_list', 'cuisines']].sample(5)

Unnamed: 0,reviews_list,cuisines
36022,"[('rated 5.0', 'rated\n one afternoon we visi...","Beverages, Fast Food"
33478,"[('rated 3.5', 'rated\n friday evening.. me n...","Desserts, Cafe, Beverages, Burger, Fast Food"
11425,"[('rated 5.0', 'rated\n food quality is excel...",Kerala
38737,"[('rated 4.0', ""rated\n tasty food reasonably...","Seafood, Mangalorean"
35389,"[('rated 5.0', ""rated\n churn !!! the name sa...","Desserts, Ice Cream"


In [None]:
# Removal of Puctuations
import string
PUNCT_TO_REMOVE = string.punctuation
def remove_punctuation(text):
    """custom function to remove the punctuation"""
    return text.translate(str.maketrans('', '', PUNCT_TO_REMOVE))

zomato["reviews_list"] = zomato["reviews_list"].apply(lambda text: remove_punctuation(text))
zomato[['reviews_list', 'cuisines']].sample(5)

Unnamed: 0,reviews_list,cuisines
9421,rated 40 ratedn fanoos since the 1950s so coo...,"Arabian, Biryani, Rolls, Kebab"
3566,rated 10 ratedn bakwaas food i ordered zomato...,"North Indian, South Indian, Chinese"
1201,rated 40 ratedn ambiance is cute very neat co...,"Biryani, Fast Food, North Indian"
29485,rated 40 ratedn tried their seafood butter ga...,"Chinese, Thai"
17165,rated 40 ratedn overall 45nnby chance i was j...,"North Indian, Biryani"


In [None]:
# Removal of Stopwords
from nltk.corpus import stopwords
STOPWORDS = set(stopwords.words('english'))
def remove_stopwords(text):
    """custom function to remove the stopwords"""
    return " ".join([word for word in str(text).split() if word not in STOPWORDS])

zomato["reviews_list"] = zomato["reviews_list"].apply(lambda text: remove_stopwords(text))

In [None]:
# Removal of URLS
def remove_urls(text):
    url_pattern = re.compile(r'https?://\S+|www\.\S+')
    return url_pattern.sub(r'', text)

zomato["reviews_list"] = zomato["reviews_list"].apply(lambda text: remove_urls(text))

In [None]:
zomato[['reviews_list', 'cuisines']].sample(5)

Unnamed: 0,reviews_list,cuisines
40306,rated 40 ratedn wonderful experience thulpnnth...,"Cafe, Burger, Italian, Salad"
34373,rated 30 ratedn oknvisually appealingndefinite...,Bakery
2851,rated 10 ratedn packed full meal meal received...,South Indian
40277,rated 40 ratedn ordered water maramari juice m...,Beverages
25823,rated 50 ratedn really fond thick shakes outle...,"Desserts, Beverages, Ice Cream"


In [None]:
# RESTAURANT NAMES:
restaurant_names = list(zomato['name'].unique())
restaurant_names

['Jalsa',
 'Spice Elephant',
 'San Churro Cafe',
 'Addhuri Udupi Bhojana',
 'Grand Village',
 'Timepass Dinner',
 'Rosewood International Hotel - Bar & Restaurant',
 'Onesta',
 'Penthouse Cafe',
 'Smacznego',
 'Cafã\x83Â\x83Ã\x82Â\x83Ã\x83Â\x82Ã\x82Â\x83Ã\x83Â\x83Ã\x82Â\x82Ã\x83Â\x82Ã\x82Â© Down The Alley',
 'Cafe Shuffle',
 'The Coffee Shack',
 'Caf-Eleven',
 'Cafe Vivacity',
 'Catch-Up-Ino',
 "Kirthi'S Biryani",
 'T3H Cafe',
 '360 Atoms Restaurant And Cafe',
 'The Vintage Cafe',
 'Woodee Pizza',
 'Cafe Coffee Day',
 'My Tea House',
 'Hide Out Cafe',
 'Cafe Nova',
 'Coffee Tindi',
 'Sea Green Cafe',
 'Cuppa',
 "Srinathji'S Cafe",
 'Redberrys',
 'Foodiction',
 'Sweet Truth',
 'Ovenstory Pizza',
 'Faasos',
 'Behrouz Biryani',
 'Fast And Fresh',
 'Szechuan Dragon',
 'Empire Restaurant',
 'Maruthi Davangere Benne Dosa',
 'Chaatimes',
 'Havyaka Mess',
 "Mcdonald'S",
 "Domino'S Pizza",
 'Hotboxit',
 'Kitchen Garden',
 'Recipe',
 'Beijing Bites',
 'Tasty Bytes',
 'Petoo',
 'Shree Cool Point'

In [None]:
def get_top_words(column, top_nu_of_words, nu_of_word):

    vec = CountVectorizer(ngram_range= nu_of_word, stop_words='english')

    bag_of_words = vec.fit_transform(column)

    sum_words = bag_of_words.sum(axis=0)

    words_freq = [(word, sum_words[0, idx]) for word, idx in vec.vocabulary_.items()]

    words_freq =sorted(words_freq, key = lambda x: x[1], reverse=True)

    return words_freq[:top_nu_of_words]

In [None]:
zomato.head()

Unnamed: 0,address,name,online_order,book_table,rate,votes,location,rest_type,cuisines,cost,reviews_list,menu_item,type,city,Mean Rating
0,"942, 21st Main Road, 2nd Stage, Banashankari, ...",Jalsa,True,True,4.1,775,Banashankari,Casual Dining,"North Indian, Mughlai, Chinese",800.0,rated 40 ratedn beautiful place dine inthe int...,[],Buffet,Banashankari,3.99
1,"2nd Floor, 80 Feet Road, Near Big Bazaar, 6th ...",Spice Elephant,True,False,4.1,787,Banashankari,Casual Dining,"Chinese, North Indian, Thai",800.0,rated 40 ratedn dinner family turned good choo...,[],Buffet,Banashankari,3.97
2,"1112, Next to KIMS Medical College, 17th Cross...",San Churro Cafe,True,False,3.8,918,Banashankari,"Cafe, Casual Dining","Cafe, Mexican, Italian",800.0,rated 30 ratedn ambience good enough pocket fr...,[],Buffet,Banashankari,3.58
3,"1st Floor, Annakuteera, 3rd Stage, Banashankar...",Addhuri Udupi Bhojana,False,False,3.7,88,Banashankari,Quick Bites,"South Indian, North Indian",300.0,rated 40 ratedn great food proper karnataka st...,[],Buffet,Banashankari,3.45
4,"10, 3rd Floor, Lakshmi Associates, Gandhi Baza...",Grand Village,False,False,3.8,166,Basavanagudi,Casual Dining,"North Indian, Rajasthani",600.0,rated 40 ratedn good restaurant neighbourhood ...,[],Buffet,Banashankari,3.58


In [None]:
zomato.sample(5)

Unnamed: 0,address,name,online_order,book_table,rate,votes,location,rest_type,cuisines,cost,reviews_list,menu_item,type,city,Mean Rating
24726,"1, SJR Primus, Adjacent Raheja Arcade, Koraman...",Punjab Bistro,True,True,4.4,459,Koramangala 7th Block,"Casual Dining, Bar",North Indian,1.5,rated 40 ratedn first time bfc get together fo...,[],Dine-out,Koramangala 5th Block,4.16
36796,"2010, 2nd floor, 100 Feet Road, HAL Second Sta...",Head O State,False,True,3.8,200,Indiranagar,"Casual Dining, Bar","Continental, Chinese, North Indian",1.0,rated 40 ratedn monday night walked head state...,[],Pubs and bars,Old Airport Road,3.58
5658,"Mahadevapura Outer Ring Road, Doddanakundi, Ma...",Ovenstory Pizza,True,False,4.1,74,Marathahalli,Delivery,Pizza,750.0,rated 50 ratedn overstory known delicious pizz...,"['Farmfresh Supreme', 'Middle Eastern Supreme'...",Delivery,Brookefield,3.78
16155,"4, 10th Cross, Byrasandra, Jayanagar, Bangalore",Trendz Corner,False,False,3.6,31,Jayanagar,Quick Bites,"South Indian, North Indian",200.0,rated 50 ratedn best biriyani forever trendz c...,[],Delivery,Jayanagar,3.32
2935,"28/29, 9th Main Road, 3rd Block, Jayanagar, Ba...",Wahab,True,False,3.7,97,Jayanagar,Quick Bites,"North Indian, Mughlai, Chinese",350.0,rated 30 ratedn ordered chicken kababit overpr...,[],Dine-out,Basavanagudi,3.45


In [None]:
zomato.shape

(41237, 15)

In [None]:
zomato.columns

Index(['address', 'name', 'online_order', 'book_table', 'rate', 'votes',
       'location', 'rest_type', 'cuisines', 'cost', 'reviews_list',
       'menu_item', 'type', 'city', 'Mean Rating'],
      dtype='object')

In [None]:
zomato = zomato.drop(['address','rest_type', 'type', 'menu_item', 'votes'],axis=1)

In [None]:
# Randomly sample 60% of your dataframe
df_percent = zomato.sample(frac=0.5)

In [None]:
df_percent.shape

(20618, 10)

### Term Frequency-Inverse Document Frequency
Term Frequency-Inverse Document Frequency (TF-IDF) vectors for each document. This will give you a matrix where each column represents a word in the overview vocabulary (all the words that appear in at least one document) and each column represents a restaurant, as before.

TF-IDF is the statistical method of evaluating the significance of a word in a given document.

TF — Term frequency(tf) refers to how many times a given term appears in a document.

IDF — Inverse document frequency(idf) measures the weight of the word in the document, i.e if the word is common or rare in the entire document.
The TF-IDF intuition follows that the terms that appear frequently in a document are less important than terms that rarely appear.
Fortunately, scikit-learn gives you a built-in TfIdfVectorizer class that produces the TF-IDF matrix quite easily.

In [None]:
df_percent.set_index('name', inplace=True)

In [None]:
indices = pd.Series(df_percent.index)

In [None]:
# Creating tf-idf matrix
tfidf = TfidfVectorizer(analyzer='word', ngram_range=(1, 2), min_df=1, stop_words='english')
tfidf_matrix = tfidf.fit_transform(df_percent['reviews_list'])

In [None]:
cosine_similarities = linear_kernel(tfidf_matrix, tfidf_matrix)

In [None]:
def recommend(name, cosine_similarities=cosine_similarities):
    # Create a list to store top restaurants
    recommend_restaurant = []

    # Find the index of the restaurant entered
    idx = indices[indices == name].index[0]

    # Find restaurants with a similar cosine similarity value and order them
    score_series = pd.Series(cosine_similarities[idx]).sort_values(ascending=False)

    # Extract top 30 restaurant indexes with a similar cosine similarity value
    top30_indexes = list(score_series.iloc[1:31].index)  # Exclude the first index as it will be the restaurant itself

    # Names of the top 30 restaurants
    for each in top30_indexes:
        recommend_restaurant.append(list(df_percent.index)[each])

    # Creating the new DataFrame to show similar restaurants
    df_new = pd.DataFrame(columns=['cuisines', 'Mean Rating', 'cost'])

    # Create the top 30 similar restaurants with selected columns
    for each in recommend_restaurant:
        temp_df = df_percent[['cuisines', 'Mean Rating', 'cost']][df_percent.index == each]
        df_new = pd.concat([df_new, temp_df.sample()])  # Use pd.concat instead of append

    # Drop duplicate entries and sort by the highest rating
    df_new = df_new.drop_duplicates(subset=['cuisines', 'Mean Rating', 'cost'], keep=False)
    df_new = df_new.sort_values(by='Mean Rating', ascending=False).head(10)

    print('TOP %s RESTAURANTS LIKE %s WITH SIMILAR REVIEWS: ' % (str(len(df_new)), name))

    return df_new

In [None]:
# HERE IS A RANDOM RESTAURANT. LET'S SEE THE DETAILS ABOUT THIS RESTAURANT:
df_percent[df_percent.index == 'Madeena Hotel'].head()

Unnamed: 0_level_0,online_order,book_table,rate,location,cuisines,cost,reviews_list,city,Mean Rating
name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
Madeena Hotel,True,False,3.9,HSR,"North Indian, Mughlai, Biryani",400.0,rated 30 ratedn must try place hardcore non ve...,Koramangala 5th Block,3.75
Madeena Hotel,True,False,4.0,Koramangala 5th Block,"North Indian, Mughlai, Biryani",400.0,rated 40 ratedn must try place hardcore non ve...,Koramangala 5th Block,3.75
Madeena Hotel,True,False,3.9,HSR,"North Indian, Mughlai, Biryani",400.0,rated 50 ratedn fast easy access rated 30 rate...,HSR,3.75
Madeena Hotel,True,False,3.9,Bannerghatta Road,"North Indian, Mughlai, Biryani",400.0,rated 40 ratedn anybody want taste muslim sout...,Bannerghatta Road,3.75
Madeena Hotel,True,False,3.9,Bannerghatta Road,"North Indian, Mughlai, Biryani",400.0,rated 40 ratedn anybody want taste muslim sout...,Basavanagudi,3.75


In [None]:
recommend('Madeena Hotel')

TOP 8 RESTAURANTS LIKE Madeena Hotel WITH SIMILAR REVIEWS: 


Unnamed: 0,cuisines,Mean Rating,cost
Hotel Tom'S Restaurant,"Mangalorean, Seafood, Chinese, North Indian",4.15,1.0
Parrattha Ssinghh,North Indian,4.01,250.0
Altaf'S Chillies Restaurant,"North Indian, Chinese",3.61,500.0
Kollapuri'S,Maharashtrian,3.52,600.0
Paratha Plaza,North Indian,3.41,200.0
Beijing Bites,"Chinese, Thai",3.36,850.0
Beijing Bites,"Chinese, Thai",3.36,600.0
Tandoor Garden,"North Indian, Chinese, Kebab",3.32,350.0


### Collaborative Model

In [None]:
zomato.head(2)

Unnamed: 0,name,online_order,book_table,rate,location,cuisines,cost,reviews_list,city,Mean Rating
0,Jalsa,True,True,4.1,Banashankari,"North Indian, Mughlai, Chinese",800.0,rated 40 ratedn beautiful place dine inthe int...,Banashankari,3.99
1,Spice Elephant,True,False,4.1,Banashankari,"Chinese, North Indian, Thai",800.0,rated 40 ratedn dinner family turned good choo...,Banashankari,3.97


In [None]:
!pip install implicit

Collecting implicit
  Downloading implicit-0.7.2-cp310-cp310-manylinux2014_x86_64.whl.metadata (6.1 kB)
Downloading implicit-0.7.2-cp310-cp310-manylinux2014_x86_64.whl (8.9 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m8.9/8.9 MB[0m [31m39.8 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: implicit
Successfully installed implicit-0.7.2


In [None]:
import pandas as pd
from scipy.sparse import csr_matrix
from implicit.als import AlternatingLeastSquares

In [None]:
# Create a new column 'numeric_id' and use it as the identifier
zomato['numeric_id'] = pd.factorize(zomato['name'])[0]

# Create a sparse matrix using the numeric_id column
sparse_matrix = pd.pivot_table(zomato, values='Mean Rating', index='numeric_id', columns='name', fill_value=0)

# Convert the dataframe to a CSR sparse matrix
sparse_matrix_csr = csr_matrix(sparse_matrix.values)

# Initialize and train the collaborative filtering model
model = AlternatingLeastSquares(factors=50, regularization=0.01, iterations=50)
model.fit(sparse_matrix_csr)

# Use the head of the zomato dataframe as the test set
test_set = zomato.head()[['numeric_id', 'Mean Rating']]

# Create a list to store recommended items
recommended_items = []

# For each user in the test set, recommend items
for user_id in test_set['numeric_id'].unique():
    similar_items = model.similar_items(user_id, N=10)
    recommended_items.extend([item[0] for item in similar_items])

# Convert the recommended items IDs to unique values
recommended_items = list(set(recommended_items))

# Optionally, map back to original item names
recommended_item_names = zomato.loc[zomato['numeric_id'].isin(recommended_items), 'name'].unique()

  0%|          | 0/50 [00:00<?, ?it/s]

In [None]:
# print the recommenede restaurants
print("Top 10 recommended Restaurants:")
print(zomato.loc[zomato['numeric_id'].isin(recommended_items)][['name','Mean Rating']])

Top 10 recommended Restaurants:
                        name  Mean Rating
0                      Jalsa         3.99
1             Spice Elephant         3.97
2            San Churro Cafe         3.58
3      Addhuri Udupi Bhojana         3.45
4              Grand Village         3.58
14           San Churro Cafe         3.58
256           Spice Elephant         3.97
400                    Jalsa         3.99
414          San Churro Cafe         3.58
441          San Churro Cafe         3.58
485                    Jalsa         3.99
490           Spice Elephant         3.97
502          San Churro Cafe         3.58
573    Addhuri Udupi Bhojana         3.45
648            Grand Village         3.58
1936           Grand Village         3.58
1942                   Jalsa         3.99
1944         San Churro Cafe         3.58
1973         San Churro Cafe         3.58
2158         San Churro Cafe         3.58
2367          Spice Elephant         3.97
2385                   Jalsa         3.99
25

### Enhancing Recommendations with GPT-4

This initiative focuses on integrating GPT-4 into our existing recommender system to elevate recommendation quality. By harnessing GPT-4's advanced natural language processing capabilities, we can better interpret user preferences, analyze contextual information, and generate more personalized suggestions. The goal is to create a dynamic recommendation engine that not only relies on historical data but also understands user intent and context, ultimately enhancing user engagement and satisfaction. This collaboration aims to transform the way users discover relevant content and products, leading to improved outcomes and experiences.

In [None]:
!pip install openai==0.28

Collecting openai==0.28
  Downloading openai-0.28.0-py3-none-any.whl.metadata (13 kB)
Downloading openai-0.28.0-py3-none-any.whl (76 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m76.5/76.5 kB[0m [31m2.6 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: openai
Successfully installed openai-0.28.0


In [None]:
from google.colab import userdata
import pandas as pd
import openai

In [None]:
# Access secret OpenAI key
OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')
openai.api_key = OPENAI_API_KEY

In [None]:
# LLM model
def generate_description(restaurant_name):
    prompt = f"Describe {restaurant_name} in a concise manner."
    response = openai.ChatCompletion.create(
        model="gpt-4-turbo",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=30,
        temperature=0.7,
    )
    return response.choices[0].message['content'].strip(), response.usage

# Function to generate a food menu description
def generate_menu_description(restaurant_name):
    prompt = f"Generate a brief food menu for {restaurant_name}."
    response = openai.ChatCompletion.create(
        model="gpt-4-turbo",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=30,
        temperature=0.7,
    )
    return response.choices[0].message['content'].strip(), response.usage

In [None]:
# Function to recommend similar restaurants
def recommend(name, cosine_similarities, indices, df_percent):
    recommend_restaurant = []
    total_prompt_tokens = 0
    total_completion_tokens = 0

    idx = indices[indices == name].index[0]  # Find the index of the restaurant entered

    # Find similar restaurants based on cosine similarity
    scores_series = pd.Series(cosine_similarities[idx]).sort_values(ascending=False)
    top_indexes = scores_series.iloc[1:10].index  # Limit to top 3 similar restaurants

    for each in top_indexes:
        recommend_restaurant.append(df_percent.index[each])

    # Creating a new DataFrame to show similar restaurants
    df_new = pd.DataFrame(columns=['Restaurant Name', 'Cuisines', 'Mean Rating', 'Cost', 'Description', 'Food Menu'])

    for each in recommend_restaurant:
        description, description_usage = generate_description(each)
        menu_description, menu_usage = generate_menu_description(each)

        total_prompt_tokens += description_usage['prompt_tokens'] + menu_usage['prompt_tokens']
        total_completion_tokens += description_usage['completion_tokens'] + menu_usage['completion_tokens']

        # Create a temporary DataFrame for the current restaurant
        temp_df = df_percent[['cuisines', 'Mean Rating', 'cost']].loc[[each]].copy()
        temp_df['Description'] = description
        temp_df['Food Menu'] = menu_description
        temp_df['Restaurant Name'] = each  # Add restaurant name

        # Concatenate the temporary DataFrame to the main DataFrame
        df_new = pd.concat([df_new, temp_df], ignore_index=True)

    # Drop duplicates and keep top 3 by highest Mean Rating
    df_new = df_new.drop_duplicates(subset=['Cuisines'])
    df_new = df_new.sort_values(by='Mean Rating', ascending=False).head(3)  # Limit to top 3

    # Clean and format output
    print('TOP 3 RESTAURANTS LIKE & WITH SIMILAR REVIEWS:')
    for index, row in df_new.iterrows():
        print(f"\nRestaurant Name: {row['Restaurant Name']}")
        print(f"Cuisines: {row['Cuisines']}")
        print(f"Mean Rating: {row['Mean Rating']}")
        print(f"Cost: {row['Cost']}")
        print(f"Description: {row['Description']}")
        print(f"Menu Description: {row['Food Menu']}")

    # Calculate total tokens and cost
    total_tokens = total_prompt_tokens + total_completion_tokens
    total_cost = total_tokens * 0.00001  # Cost per token ($10.00 / 1M tokens)

    print(f"\nTokens Used: {total_tokens}")
    print(f"Prompt Tokens: {total_prompt_tokens}")
    print(f"Completion Tokens: {total_completion_tokens}")
    print(f"Total Cost (USD): ${total_cost:.10f}")

    return df_new

In [None]:
# Example call to the recommend function (assuming cosine_similarities, indices, and df_percent are defined)
recommendations = recommend('Madeena Hotel', cosine_similarities, indices, df_percent)

TOP 3 RESTAURANTS LIKE & WITH SIMILAR REVIEWS:

Restaurant Name: Madeena Hotel
Cuisines: nan
Mean Rating: 3.75
Cost: nan
Description: Madeena Hotel is a hospitality establishment known for offering comfortable accommodations, essential amenities, and attentive service. Typically, it caters to both business and leisure
Menu Description: **Madeena Hotel - Dining Menu**

**Starters**
- Hummus with Pita Bread
- Falafel Bites with Tahini Sauce

Tokens Used: 847
Prompt Tokens: 307
Completion Tokens: 540
Total Cost (USD): $0.0084700000


#### References
 - [How to build a Restaurant Recommendation Engine](https://medium.com/analytics-vidhya/how-to-build-a-restaurant-recommendation-engine-part-1-21aadb5dac6e)