Lala Yamazaki, Hailey Wilson, Kirstin Tretter

## Business Objective
 - Increase customer satisfaction and service quality by enhancing the dining experience based on customer feedback. 
 - What do customers appreciate most about a restaurant and what are areas where improvements are needed? 
 - Identify common positive and negative themes to understand the emotions and opinions of customers.

## Technical Objective
 - Analyze the data to understand trends in the reviews using Natural Language Processing:
 - Use NLTK techniques to analyze the reviews
 - Bag of words
 - Discovering common words
 - Tokenization, Lemmatizing, and Stopwords
 - Tagging text and n-grams
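
As a rough illustration of the techniques listed above, the following minimal sketch tokenizes a review, removes stop words, builds a bag-of-words count, and forms bigrams. It uses only the standard library; `DEMO_STOPWORDS`, `tokenize`, `bag_of_words`, `ngrams`, and the sample review are hypothetical names and data chosen for illustration, not part of the notebook, which relies on NLTK for these steps.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only;
# the notebook itself uses NLTK's English stop words.
DEMO_STOPWORDS = {"the", "is", "and", "was", "a", "here"}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def bag_of_words(text, stopwords=DEMO_STOPWORDS):
    """Count how often each non-stop-word token occurs."""
    return Counter(tok for tok in tokenize(text) if tok not in stopwords)

def ngrams(tokens, n=2):
    """Return the list of consecutive n-token tuples."""
    return list(zip(*(tokens[i:] for i in range(n))))

demo_review = "The cheese steak is great here and the service was great"
demo_counts = bag_of_words(demo_review)
demo_bigrams = ngrams(tokenize(demo_review), n=2)

print(demo_counts.most_common(3))
print(demo_bigrams[:3])
```
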

## Section 1: Import Libraries

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline 
import seaborn as sns

pd.set_option('display.max_columns',500)
#allows for up to 500 columns to be displayed when viewing a dataframe
pd.set_option('display.max_rows', 500)
# up to 500 rows

import warnings
warnings.simplefilter("ignore")
# hide all warnings, including deprecation warnings

from IPython.display import display, HTML
# widen the notebook container so wide dataframes fit on screen
display(HTML("<style>.container { width:95% !important; }</style>"))

from nltk.stem import WordNetLemmatizer
wordnet_lemmatizer = WordNetLemmatizer()

## Section 2: Import Data

In [2]:
df_restaurants = pd.read_csv("data/Restaurants.csv", index_col = None, header = 0)
df_restaurants.info()
df_restaurants.head()
# import data from a csv file

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 63944 entries, 0 to 63943
Data columns (total 15 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Unnamed: 0    63944 non-null  int64  
 1   business_id   63944 non-null  object 
 2   name          63944 non-null  object 
 3   address       63474 non-null  object 
 4   city          63944 non-null  object 
 5   state         63944 non-null  object 
 6   postal_code   63850 non-null  object 
 7   latitude      63944 non-null  float64
 8   longitude     63944 non-null  float64
 9   stars         63944 non-null  float64
 10  review_count  63944 non-null  int64  
 11  is_open       63944 non-null  int64  
 12  attributes    62310 non-null  object 
 13  categories    63944 non-null  object 
 14  hours         50742 non-null  object 
dtypes: float64(3), int64(3), object(9)
memory usage: 7.3+ MB


Unnamed: 0.1,Unnamed: 0,business_id,name,address,city,state,postal_code,latitude,longitude,stars,review_count,is_open,attributes,categories,hours
0,8,pQeaRpvuhoEqudo3uymHIQ,The Empanadas House,404 E Green St,Champaign,IL,61820,40.110446,-88.233073,4.5,5,1,"{'RestaurantsAttire': ""u'casual'"", 'Restaurant...","Ethnic Food, Food Trucks, Specialty Food, Impo...","{'Monday': '11:30-14:30', 'Tuesday': '11:30-14..."
1,20,CsLQLiRoafpJPJSkNX2h5Q,Middle East Deli,4508 E Independence Blvd,Charlotte,NC,28205,35.194894,-80.767442,3.0,5,0,"{'RestaurantsGoodForGroups': 'True', 'OutdoorS...","Food, Restaurants, Grocery, Middle Eastern",
2,24,eBEfgOPG7pvFhb2wcG9I7w,Philthy Phillys,"15480 Bayview Avenue, unit D0110",Aurora,ON,L4G 7J1,44.010962,-79.448677,4.5,4,1,"{'RestaurantsTableService': 'False', 'Restaura...","Restaurants, Cheesesteaks, Poutineries","{'Monday': '11:0-22:0', 'Tuesday': '11:0-22:0'..."
3,25,lu7vtrp_bE9PnxWfA8g4Pg,Banzai Sushi,300 John Street,Thornhill,ON,L3T 5W4,43.820492,-79.398466,4.5,7,1,"{'GoodForKids': 'True', 'RestaurantsTakeOut': ...","Japanese, Fast Food, Food Court, Restaurants",
4,30,9sRGfSVEfLhN_km60YruTA,Apadana Restaurant,13071 Yonge Street,Richmond Hill,ON,L4E 1A5,43.947011,-79.454862,3.0,3,1,"{'Ambience': ""{'touristy': False, 'hipster': F...","Persian/Iranian, Turkish, Middle Eastern, Rest...","{'Tuesday': '12:0-21:0', 'Wednesday': '12:0-21..."


In [3]:
df_users = pd.read_csv("data/Full_User.csv", index_col = None, header = 0)
df_users.info()
df_users.head()
# import data from a csv file

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1399953 entries, 0 to 1399952
Data columns (total 22 columns):
 #   Column              Non-Null Count    Dtype  
---  ------              --------------    -----  
 0   Unnamed: 0          1399953 non-null  int64  
 1   user_id             1399953 non-null  object 
 2   name                1399926 non-null  object 
 3   review_count        1399953 non-null  int64  
 4   yelping_since       1399953 non-null  object 
 5   useful              1399953 non-null  int64  
 6   funny               1399953 non-null  int64  
 7   cool                1399953 non-null  int64  
 8   elite               70213 non-null    object 
 9   fans                1399953 non-null  int64  
 10  average_stars       1399953 non-null  float64
 11  compliment_hot      1399953 non-null  int64  
 12  compliment_more     1399953 non-null  int64  
 13  compliment_profile  1399953 non-null  int64  
 14  compliment_cute     1399953 non-null  int64  
 15  compliment_list

Unnamed: 0.1,Unnamed: 0,user_id,name,review_count,yelping_since,useful,funny,cool,elite,fans,average_stars,compliment_hot,compliment_more,compliment_profile,compliment_cute,compliment_list,compliment_note,compliment_plain,compliment_cool,compliment_funny,compliment_writer,compliment_photos
0,0,---1lKK3aKOuomHnwAkAow,Monera,263,2007-06-04 01:37:45,500,180,201,201020112012.0,17,3.93,2,3,2,1,0,5,9,9,9,9,0
1,1,---94vtJ_5o_nikEs6hUjg,Joe,5,2016-05-27 04:50:39,3,0,1,,0,5.0,0,0,0,0,0,0,0,0,0,0,0
2,2,---PLwSf5gKdIoVnyRHgBA,Rae,3,2015-07-31 00:53:27,0,0,0,,0,4.33,0,0,0,0,0,0,0,0,0,0,0
3,3,---RfKzBwQ8t3wu-LXvx3w,Jason,1,2015-11-23 14:19:04,0,0,0,,0,5.0,0,0,0,0,0,0,0,0,0,0,0
4,4,---cu1hq55BP9DWVXXKHZg,Jack,66,2009-04-18 23:10:01,134,54,24,,0,3.7,0,0,0,0,0,3,2,0,0,0,0


In [4]:
df_reviews = pd.read_csv("data/Ontario_Reviews.csv", index_col = None, header = 0)
df_reviews.info()
df_reviews.head()
# import data from a csv file

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 668875 entries, 0 to 668874
Data columns (total 11 columns):
 #   Column       Non-Null Count   Dtype  
---  ------       --------------   -----  
 0   Unnamed: 0   668875 non-null  int64  
 1   business_id  668875 non-null  object 
 2   name         668875 non-null  object 
 3   review_id    668875 non-null  object 
 4   user_id      668875 non-null  object 
 5   stars        668875 non-null  float64
 6   useful       668875 non-null  float64
 7   funny        668875 non-null  float64
 8   cool         668875 non-null  float64
 9   text         668875 non-null  object 
 10  date         668875 non-null  object 
dtypes: float64(4), int64(1), object(6)
memory usage: 56.1+ MB


Unnamed: 0.1,Unnamed: 0,business_id,name,review_id,user_id,stars,useful,funny,cool,text,date
0,0,eBEfgOPG7pvFhb2wcG9I7w,Philthy Phillys,gwU0Brt7bYIgFJeBhrmmmA,coSXdeklwuZWjjS0n38Tcg,5.0,0.0,0.0,0.0,The cheese steak is great here. The guys cooki...,2019-01-26 14:59:53
1,1,eBEfgOPG7pvFhb2wcG9I7w,Philthy Phillys,wt8t7boUicBVutypSIFBfg,pCYmjT_-KrBvFfiazMlaLQ,4.0,0.0,0.0,0.0,I've been here 4 times now and the Philly chee...,2019-11-16 17:02:55
2,2,eBEfgOPG7pvFhb2wcG9I7w,Philthy Phillys,tZ5DuKIfUuuhxGYV7ywhvw,8U9jNGWvX1kZR-So3KNPiA,5.0,0.0,0.0,0.0,Excellent Excellent customer service!!\nWill d...,2019-10-11 19:59:57
3,3,eBEfgOPG7pvFhb2wcG9I7w,Philthy Phillys,rqa4ZB6yz-68G5GEXuZJyA,xo8HykGB7Ekm_QKrMRg3Zw,4.0,0.0,0.0,0.0,Came here on a Friday night and only 1 table w...,2019-07-26 13:24:49
4,4,lu7vtrp_bE9PnxWfA8g4Pg,Banzai Sushi,Q7iupWCt3UpRQSMKp4zO9A,dSTRQSeCqMTbs7l8KF_xJg,4.0,1.0,0.0,0.0,Been coming here since I was in grade 9 so abo...,2015-04-16 05:23:15


## Section 3: Data preparation

### Section 3a: Data cleaning for restaurants

In [5]:
df_restaurants = df_restaurants.drop(['Unnamed: 0'], axis=1)
# drop Unnamed column

In [6]:
df_restaurants['address'] = df_restaurants['address'].fillna('Unknown')
df_restaurants['postal_code'] = df_restaurants['postal_code'].fillna('Unknown')
df_restaurants['attributes'] = df_restaurants['attributes'].fillna('{}')
df_restaurants['hours'] = df_restaurants['hours'].fillna('{}')
# fill missing address and postal_code with 'Unknown'; fill missing attributes and hours with an empty-dict string

In [7]:
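import ast
df_restaurants['hours'] = df_restaurants['hours'].apply(ast.literal_eval)
# ast.literal_eval only parses Python literals, so it is safer than eval on file contents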
# convert hours column to a dictionary
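
Parsing dictionary-like strings can fail when a row is missing or malformed. The sketch below shows one defensive way to handle that with `ast.literal_eval`; the helper name `parse_dict_string` and the sample values are hypothetical, and falling back to an empty dictionary is an assumption about how bad rows should be treated.

```python
import ast

def parse_dict_string(value):
    """Parse a dict-like string; return {} for missing or malformed values."""
    if not isinstance(value, str):
        return {}
    try:
        parsed = ast.literal_eval(value)
    except (ValueError, SyntaxError):
        return {}
    # Guard against strings that parse to something other than a dict
    return parsed if isinstance(parsed, dict) else {}

demo_hours = [
    "{'Monday': '11:30-14:30', 'Tuesday': '11:30-14:30'}",
    "{}",
    None,
    "not a dict",
]
parsed_hours = [parse_dict_string(v) for v in demo_hours]
print(parsed_hours)
```

With a DataFrame, the same helper could be passed to `Series.apply`, for example `df_restaurants['hours'].apply(parse_dict_string)`.
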

In [8]:
df_restaurants = df_restaurants.rename(columns = {'stars':'restaurant_stars'})
df_restaurants.info()
# rename stars to restaurant_stars before merging

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 63944 entries, 0 to 63943
Data columns (total 14 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   business_id       63944 non-null  object 
 1   name              63944 non-null  object 
 2   address           63944 non-null  object 
 3   city              63944 non-null  object 
 4   state             63944 non-null  object 
 5   postal_code       63944 non-null  object 
 6   latitude          63944 non-null  float64
 7   longitude         63944 non-null  float64
 8   restaurant_stars  63944 non-null  float64
 9   review_count      63944 non-null  int64  
 10  is_open           63944 non-null  int64  
 11  attributes        63944 non-null  object 
 12  categories        63944 non-null  object 
 13  hours             63944 non-null  object 
dtypes: float64(3), int64(2), object(9)
memory usage: 6.8+ MB


In [9]:
df_restaurants = df_restaurants.rename(columns = {'review_count':'rest_review_count'})
df_restaurants.info()
# rename review count to rest review count before merging

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 63944 entries, 0 to 63943
Data columns (total 14 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   business_id        63944 non-null  object 
 1   name               63944 non-null  object 
 2   address            63944 non-null  object 
 3   city               63944 non-null  object 
 4   state              63944 non-null  object 
 5   postal_code        63944 non-null  object 
 6   latitude           63944 non-null  float64
 7   longitude          63944 non-null  float64
 8   restaurant_stars   63944 non-null  float64
 9   rest_review_count  63944 non-null  int64  
 10  is_open            63944 non-null  int64  
 11  attributes         63944 non-null  object 
 12  categories         63944 non-null  object 
 13  hours              63944 non-null  object 
dtypes: float64(3), int64(2), object(9)
memory usage: 6.8+ MB


In [10]:
df_restaurants.head()
# check data

Unnamed: 0,business_id,name,address,city,state,postal_code,latitude,longitude,restaurant_stars,rest_review_count,is_open,attributes,categories,hours
0,pQeaRpvuhoEqudo3uymHIQ,The Empanadas House,404 E Green St,Champaign,IL,61820,40.110446,-88.233073,4.5,5,1,"{'RestaurantsAttire': ""u'casual'"", 'Restaurant...","Ethnic Food, Food Trucks, Specialty Food, Impo...","{'Monday': '11:30-14:30', 'Tuesday': '11:30-14..."
1,CsLQLiRoafpJPJSkNX2h5Q,Middle East Deli,4508 E Independence Blvd,Charlotte,NC,28205,35.194894,-80.767442,3.0,5,0,"{'RestaurantsGoodForGroups': 'True', 'OutdoorS...","Food, Restaurants, Grocery, Middle Eastern",{}
2,eBEfgOPG7pvFhb2wcG9I7w,Philthy Phillys,"15480 Bayview Avenue, unit D0110",Aurora,ON,L4G 7J1,44.010962,-79.448677,4.5,4,1,"{'RestaurantsTableService': 'False', 'Restaura...","Restaurants, Cheesesteaks, Poutineries","{'Monday': '11:0-22:0', 'Tuesday': '11:0-22:0'..."
3,lu7vtrp_bE9PnxWfA8g4Pg,Banzai Sushi,300 John Street,Thornhill,ON,L3T 5W4,43.820492,-79.398466,4.5,7,1,"{'GoodForKids': 'True', 'RestaurantsTakeOut': ...","Japanese, Fast Food, Food Court, Restaurants",{}
4,9sRGfSVEfLhN_km60YruTA,Apadana Restaurant,13071 Yonge Street,Richmond Hill,ON,L4E 1A5,43.947011,-79.454862,3.0,3,1,"{'Ambience': ""{'touristy': False, 'hipster': F...","Persian/Iranian, Turkish, Middle Eastern, Rest...","{'Tuesday': '12:0-21:0', 'Wednesday': '12:0-21..."


### Section 3b: Data cleaning for users

In [11]:
df_users = df_users.drop(['Unnamed: 0'], axis=1)
# drop unnecessary columns

In [12]:
df_users['name'] = df_users['name'].fillna('Unknown')
df_users['elite'] = df_users['elite'].fillna(0)
# handle missing values

In [13]:
df_users['yelping_since'] = pd.to_datetime(df_users['yelping_since'])
# convert 'yelping_since' column to datetime

In [14]:
df_users = df_users.rename(columns = {'name':'user_name'})
df_users
# rename name to user_name before merging

Unnamed: 0,user_id,user_name,review_count,yelping_since,useful,funny,cool,elite,fans,average_stars,compliment_hot,compliment_more,compliment_profile,compliment_cute,compliment_list,compliment_note,compliment_plain,compliment_cool,compliment_funny,compliment_writer,compliment_photos
0,---1lKK3aKOuomHnwAkAow,Monera,263,2007-06-04 01:37:45,500,180,201,201020112012,17,3.93,2,3,2,1,0,5,9,9,9,9,0
1,---94vtJ_5o_nikEs6hUjg,Joe,5,2016-05-27 04:50:39,3,0,1,0,0,5.00,0,0,0,0,0,0,0,0,0,0,0
2,---PLwSf5gKdIoVnyRHgBA,Rae,3,2015-07-31 00:53:27,0,0,0,0,0,4.33,0,0,0,0,0,0,0,0,0,0,0
3,---RfKzBwQ8t3wu-LXvx3w,Jason,1,2015-11-23 14:19:04,0,0,0,0,0,5.00,0,0,0,0,0,0,0,0,0,0,0
4,---cu1hq55BP9DWVXXKHZg,Jack,66,2009-04-18 23:10:01,134,54,24,0,0,3.70,0,0,0,0,0,3,2,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1399948,zzzDGuK7upfx0W31A4gfCA,Leticia,4,2017-02-01 20:45:31,4,0,3,0,0,4.00,0,0,0,0,0,0,0,0,0,0,0
1399949,zzzPVqSxSvjzlLR3Q7wsUw,Alex,12,2010-05-01 08:12:02,4,1,3,0,0,4.17,0,0,0,0,0,0,0,1,1,0,0
1399950,zzzTkKLFo9CaeZnfO4TvzA,Alfonso,13,2014-02-25 21:03:55,13,18,8,0,0,2.31,0,0,0,0,0,1,0,0,0,0,0
1399951,zzzmshdEWLFCApxETl1TGQ,Dave,3,2012-10-08 22:47:56,0,0,0,0,0,5.00,0,0,0,0,0,0,0,0,0,0,0


In [15]:
df_users = df_users.rename(columns = {'review_count':'user_review_count'})
df_users
# rename review_count to user_review_count before merging

Unnamed: 0,user_id,user_name,user_review_count,yelping_since,useful,funny,cool,elite,fans,average_stars,compliment_hot,compliment_more,compliment_profile,compliment_cute,compliment_list,compliment_note,compliment_plain,compliment_cool,compliment_funny,compliment_writer,compliment_photos
0,---1lKK3aKOuomHnwAkAow,Monera,263,2007-06-04 01:37:45,500,180,201,201020112012,17,3.93,2,3,2,1,0,5,9,9,9,9,0
1,---94vtJ_5o_nikEs6hUjg,Joe,5,2016-05-27 04:50:39,3,0,1,0,0,5.00,0,0,0,0,0,0,0,0,0,0,0
2,---PLwSf5gKdIoVnyRHgBA,Rae,3,2015-07-31 00:53:27,0,0,0,0,0,4.33,0,0,0,0,0,0,0,0,0,0,0
3,---RfKzBwQ8t3wu-LXvx3w,Jason,1,2015-11-23 14:19:04,0,0,0,0,0,5.00,0,0,0,0,0,0,0,0,0,0,0
4,---cu1hq55BP9DWVXXKHZg,Jack,66,2009-04-18 23:10:01,134,54,24,0,0,3.70,0,0,0,0,0,3,2,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1399948,zzzDGuK7upfx0W31A4gfCA,Leticia,4,2017-02-01 20:45:31,4,0,3,0,0,4.00,0,0,0,0,0,0,0,0,0,0,0
1399949,zzzPVqSxSvjzlLR3Q7wsUw,Alex,12,2010-05-01 08:12:02,4,1,3,0,0,4.17,0,0,0,0,0,0,0,1,1,0,0
1399950,zzzTkKLFo9CaeZnfO4TvzA,Alfonso,13,2014-02-25 21:03:55,13,18,8,0,0,2.31,0,0,0,0,0,1,0,0,0,0,0
1399951,zzzmshdEWLFCApxETl1TGQ,Dave,3,2012-10-08 22:47:56,0,0,0,0,0,5.00,0,0,0,0,0,0,0,0,0,0,0


In [16]:
df_users.info()
df_users.head()
# check the data

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1399953 entries, 0 to 1399952
Data columns (total 21 columns):
 #   Column              Non-Null Count    Dtype         
---  ------              --------------    -----         
 0   user_id             1399953 non-null  object        
 1   user_name           1399953 non-null  object        
 2   user_review_count   1399953 non-null  int64         
 3   yelping_since       1399953 non-null  datetime64[ns]
 4   useful              1399953 non-null  int64         
 5   funny               1399953 non-null  int64         
 6   cool                1399953 non-null  int64         
 7   elite               1399953 non-null  object        
 8   fans                1399953 non-null  int64         
 9   average_stars       1399953 non-null  float64       
 10  compliment_hot      1399953 non-null  int64         
 11  compliment_more     1399953 non-null  int64         
 12  compliment_profile  1399953 non-null  int64         
 13  compliment_c

Unnamed: 0,user_id,user_name,user_review_count,yelping_since,useful,funny,cool,elite,fans,average_stars,compliment_hot,compliment_more,compliment_profile,compliment_cute,compliment_list,compliment_note,compliment_plain,compliment_cool,compliment_funny,compliment_writer,compliment_photos
0,---1lKK3aKOuomHnwAkAow,Monera,263,2007-06-04 01:37:45,500,180,201,201020112012,17,3.93,2,3,2,1,0,5,9,9,9,9,0
1,---94vtJ_5o_nikEs6hUjg,Joe,5,2016-05-27 04:50:39,3,0,1,0,0,5.0,0,0,0,0,0,0,0,0,0,0,0
2,---PLwSf5gKdIoVnyRHgBA,Rae,3,2015-07-31 00:53:27,0,0,0,0,0,4.33,0,0,0,0,0,0,0,0,0,0,0
3,---RfKzBwQ8t3wu-LXvx3w,Jason,1,2015-11-23 14:19:04,0,0,0,0,0,5.0,0,0,0,0,0,0,0,0,0,0,0
4,---cu1hq55BP9DWVXXKHZg,Jack,66,2009-04-18 23:10:01,134,54,24,0,0,3.7,0,0,0,0,0,3,2,0,0,0,0


### Section 3c: Data cleaning for reviews

In [17]:
df_reviews = df_reviews.drop(['Unnamed: 0'], axis=1)
# drop unnecessary columns

In [18]:
df_reviews['date'] = pd.to_datetime(df_reviews['date'])
# convert date column to datetime

In [19]:
df_reviews = df_reviews.rename(columns = {'stars':'review_stars'})
df_reviews
# rename stars to review_stars before merging

Unnamed: 0,business_id,name,review_id,user_id,review_stars,useful,funny,cool,text,date
0,eBEfgOPG7pvFhb2wcG9I7w,Philthy Phillys,gwU0Brt7bYIgFJeBhrmmmA,coSXdeklwuZWjjS0n38Tcg,5.0,0.0,0.0,0.0,The cheese steak is great here. The guys cooki...,2019-01-26 14:59:53
1,eBEfgOPG7pvFhb2wcG9I7w,Philthy Phillys,wt8t7boUicBVutypSIFBfg,pCYmjT_-KrBvFfiazMlaLQ,4.0,0.0,0.0,0.0,I've been here 4 times now and the Philly chee...,2019-11-16 17:02:55
2,eBEfgOPG7pvFhb2wcG9I7w,Philthy Phillys,tZ5DuKIfUuuhxGYV7ywhvw,8U9jNGWvX1kZR-So3KNPiA,5.0,0.0,0.0,0.0,Excellent Excellent customer service!!\nWill d...,2019-10-11 19:59:57
3,eBEfgOPG7pvFhb2wcG9I7w,Philthy Phillys,rqa4ZB6yz-68G5GEXuZJyA,xo8HykGB7Ekm_QKrMRg3Zw,4.0,0.0,0.0,0.0,Came here on a Friday night and only 1 table w...,2019-07-26 13:24:49
4,lu7vtrp_bE9PnxWfA8g4Pg,Banzai Sushi,Q7iupWCt3UpRQSMKp4zO9A,dSTRQSeCqMTbs7l8KF_xJg,4.0,1.0,0.0,0.0,Been coming here since I was in grade 9 so abo...,2015-04-16 05:23:15
...,...,...,...,...,...,...,...,...,...,...
668870,9Q0fPWAjUweoFDk0kafuzQ,Nishi Sushi,UflF294ggjTR7OXJZyX14Q,tRZAC_H5RHrjvyvtufcNXQ,4.0,0.0,0.0,0.0,Went: 7:30 pm Thu May 21 2015 (6th couple date...,2015-07-08 22:49:17
668871,9Q0fPWAjUweoFDk0kafuzQ,Nishi Sushi,olpGCrvMKZjsFpYPwvwLbQ,2CALR5iCk-ZkyFcKJ27DUA,4.0,2.0,1.0,1.0,This is one of our regular spots to get take o...,2014-10-24 00:41:31
668872,9Q0fPWAjUweoFDk0kafuzQ,Nishi Sushi,PfdVcAzoTaJ-AeGSQXCPzQ,i7mBC7m7k2FLrdVUx0UqUg,3.0,3.0,2.0,3.0,"For food I would give them a 3, for service I ...",2014-07-22 17:58:25
668873,9Q0fPWAjUweoFDk0kafuzQ,Nishi Sushi,_NDUdjPh1llvsyUELmEPSg,2uiwzYXk8xVcv6U_ds5f0Q,5.0,0.0,0.0,0.0,"Food was fresh, tasted great and was well pres...",2012-10-28 16:11:39


In [20]:
df_reviews.info()
df_reviews.head()
# Check the dataframe

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 668875 entries, 0 to 668874
Data columns (total 10 columns):
 #   Column        Non-Null Count   Dtype         
---  ------        --------------   -----         
 0   business_id   668875 non-null  object        
 1   name          668875 non-null  object        
 2   review_id     668875 non-null  object        
 3   user_id       668875 non-null  object        
 4   review_stars  668875 non-null  float64       
 5   useful        668875 non-null  float64       
 6   funny         668875 non-null  float64       
 7   cool          668875 non-null  float64       
 8   text          668875 non-null  object        
 9   date          668875 non-null  datetime64[ns]
dtypes: datetime64[ns](1), float64(4), object(5)
memory usage: 51.0+ MB


Unnamed: 0,business_id,name,review_id,user_id,review_stars,useful,funny,cool,text,date
0,eBEfgOPG7pvFhb2wcG9I7w,Philthy Phillys,gwU0Brt7bYIgFJeBhrmmmA,coSXdeklwuZWjjS0n38Tcg,5.0,0.0,0.0,0.0,The cheese steak is great here. The guys cooki...,2019-01-26 14:59:53
1,eBEfgOPG7pvFhb2wcG9I7w,Philthy Phillys,wt8t7boUicBVutypSIFBfg,pCYmjT_-KrBvFfiazMlaLQ,4.0,0.0,0.0,0.0,I've been here 4 times now and the Philly chee...,2019-11-16 17:02:55
2,eBEfgOPG7pvFhb2wcG9I7w,Philthy Phillys,tZ5DuKIfUuuhxGYV7ywhvw,8U9jNGWvX1kZR-So3KNPiA,5.0,0.0,0.0,0.0,Excellent Excellent customer service!!\nWill d...,2019-10-11 19:59:57
3,eBEfgOPG7pvFhb2wcG9I7w,Philthy Phillys,rqa4ZB6yz-68G5GEXuZJyA,xo8HykGB7Ekm_QKrMRg3Zw,4.0,0.0,0.0,0.0,Came here on a Friday night and only 1 table w...,2019-07-26 13:24:49
4,lu7vtrp_bE9PnxWfA8g4Pg,Banzai Sushi,Q7iupWCt3UpRQSMKp4zO9A,dSTRQSeCqMTbs7l8KF_xJg,4.0,1.0,0.0,0.0,Been coming here since I was in grade 9 so abo...,2015-04-16 05:23:15


### Section 3d: Prepping the data

In [21]:
df_rest_reviews_2merged = pd.merge(df_reviews, df_restaurants, how='left', on='business_id')
df_rest_reviews_2merged.info()
# left merge: keep every review and attach restaurant columns on business_id

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 668875 entries, 0 to 668874
Data columns (total 23 columns):
 #   Column             Non-Null Count   Dtype         
---  ------             --------------   -----         
 0   business_id        668875 non-null  object        
 1   name_x             668875 non-null  object        
 2   review_id          668875 non-null  object        
 3   user_id            668875 non-null  object        
 4   review_stars       668875 non-null  float64       
 5   useful             668875 non-null  float64       
 6   funny              668875 non-null  float64       
 7   cool               668875 non-null  float64       
 8   text               668875 non-null  object        
 9   date               668875 non-null  datetime64[ns]
 10  name_y             668862 non-null  object        
 11  address            668862 non-null  object        
 12  city               668862 non-null  object        
 13  state              668862 non-null  object  
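
The output above shows 668862 non-null restaurant values against 668875 reviews, so a few reviews found no matching restaurant. One way to surface such rows is the `indicator` and `validate` options of `pd.merge`. This is a minimal sketch on toy data; `demo_reviews`, `demo_restaurants`, and their values are made up for illustration.

```python
import pandas as pd

# Toy frames standing in for df_reviews and df_restaurants (values are made up)
demo_reviews = pd.DataFrame({
    "business_id": ["a", "a", "b", "z"],
    "review_stars": [5.0, 4.0, 1.0, 3.0],
})
demo_restaurants = pd.DataFrame({
    "business_id": ["a", "b", "c"],
    "city": ["Aurora", "Thornhill", "Toronto"],
})

demo_merged = pd.merge(
    demo_reviews,
    demo_restaurants,
    how="left",
    on="business_id",
    validate="many_to_one",  # raise if business_id repeats on the right side
    indicator=True,          # adds a _merge column: 'both' or 'left_only'
)

# Reviews whose business_id has no matching restaurant row
unmatched = demo_merged[demo_merged["_merge"] == "left_only"]
print(unmatched)
```
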

In [22]:
df_reviews_on = pd.merge(df_rest_reviews_2merged, df_users, how='left', on='user_id')
df_reviews_on.info()
df_reviews_on.head()
# second left merge: attach user columns to the merged review + restaurant dataframe on user_id
df_reviews_on['state'].value_counts()
# show value counts of state

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 668875 entries, 0 to 668874
Data columns (total 43 columns):
 #   Column              Non-Null Count   Dtype         
---  ------              --------------   -----         
 0   business_id         668875 non-null  object        
 1   name_x              668875 non-null  object        
 2   review_id           668875 non-null  object        
 3   user_id             668875 non-null  object        
 4   review_stars        668875 non-null  float64       
 5   useful_x            668875 non-null  float64       
 6   funny_x             668875 non-null  float64       
 7   cool_x              668875 non-null  float64       
 8   text                668875 non-null  object        
 9   date                668875 non-null  datetime64[ns]
 10  name_y              668862 non-null  object        
 11  address             668862 non-null  object        
 12  city                668862 non-null  object        
 13  state               668862 no

state
ON    668862
Name: count, dtype: int64

In [23]:
df_reviews_on.head()
# preview the merged data (all matched rows are in ON, per the value counts above)

Unnamed: 0,business_id,name_x,review_id,user_id,review_stars,useful_x,funny_x,cool_x,text,date,name_y,address,city,state,postal_code,latitude,longitude,restaurant_stars,rest_review_count,is_open,attributes,categories,hours,user_name,user_review_count,yelping_since,useful_y,funny_y,cool_y,elite,fans,average_stars,compliment_hot,compliment_more,compliment_profile,compliment_cute,compliment_list,compliment_note,compliment_plain,compliment_cool,compliment_funny,compliment_writer,compliment_photos
0,eBEfgOPG7pvFhb2wcG9I7w,Philthy Phillys,gwU0Brt7bYIgFJeBhrmmmA,coSXdeklwuZWjjS0n38Tcg,5.0,0.0,0.0,0.0,The cheese steak is great here. The guys cooki...,2019-01-26 14:59:53,Philthy Phillys,"15480 Bayview Avenue, unit D0110",Aurora,ON,L4G 7J1,44.010962,-79.448677,4.5,4.0,1.0,"{'RestaurantsTableService': 'False', 'Restaura...","Restaurants, Cheesesteaks, Poutineries","{'Monday': '11:0-22:0', 'Tuesday': '11:0-22:0'...",Jay,5,2017-10-30 11:37:05,1,1,0,0,0,3.6,0,0,0,0,0,0,0,0,0,0,0
1,eBEfgOPG7pvFhb2wcG9I7w,Philthy Phillys,wt8t7boUicBVutypSIFBfg,pCYmjT_-KrBvFfiazMlaLQ,4.0,0.0,0.0,0.0,I've been here 4 times now and the Philly chee...,2019-11-16 17:02:55,Philthy Phillys,"15480 Bayview Avenue, unit D0110",Aurora,ON,L4G 7J1,44.010962,-79.448677,4.5,4.0,1.0,"{'RestaurantsTableService': 'False', 'Restaura...","Restaurants, Cheesesteaks, Poutineries","{'Monday': '11:0-22:0', 'Tuesday': '11:0-22:0'...",Jeff,20,2017-01-06 03:02:46,0,0,0,0,0,3.86,0,0,0,0,0,0,0,0,0,0,0
2,eBEfgOPG7pvFhb2wcG9I7w,Philthy Phillys,tZ5DuKIfUuuhxGYV7ywhvw,8U9jNGWvX1kZR-So3KNPiA,5.0,0.0,0.0,0.0,Excellent Excellent customer service!!\nWill d...,2019-10-11 19:59:57,Philthy Phillys,"15480 Bayview Avenue, unit D0110",Aurora,ON,L4G 7J1,44.010962,-79.448677,4.5,4.0,1.0,"{'RestaurantsTableService': 'False', 'Restaura...","Restaurants, Cheesesteaks, Poutineries","{'Monday': '11:0-22:0', 'Tuesday': '11:0-22:0'...",Shoaib Tanvir Accounting,7,2018-09-12 14:15:38,1,0,0,0,0,4.14,0,0,0,0,0,0,0,0,0,0,0
3,eBEfgOPG7pvFhb2wcG9I7w,Philthy Phillys,rqa4ZB6yz-68G5GEXuZJyA,xo8HykGB7Ekm_QKrMRg3Zw,4.0,0.0,0.0,0.0,Came here on a Friday night and only 1 table w...,2019-07-26 13:24:49,Philthy Phillys,"15480 Bayview Avenue, unit D0110",Aurora,ON,L4G 7J1,44.010962,-79.448677,4.5,4.0,1.0,"{'RestaurantsTableService': 'False', 'Restaura...","Restaurants, Cheesesteaks, Poutineries","{'Monday': '11:0-22:0', 'Tuesday': '11:0-22:0'...",Terry,34,2014-02-11 23:38:56,16,8,11,0,0,3.09,0,0,0,0,0,2,2,2,2,1,0
4,lu7vtrp_bE9PnxWfA8g4Pg,Banzai Sushi,Q7iupWCt3UpRQSMKp4zO9A,dSTRQSeCqMTbs7l8KF_xJg,4.0,1.0,0.0,0.0,Been coming here since I was in grade 9 so abo...,2015-04-16 05:23:15,Banzai Sushi,300 John Street,Thornhill,ON,L3T 5W4,43.820492,-79.398466,4.5,7.0,1.0,"{'GoodForKids': 'True', 'RestaurantsTakeOut': ...","Japanese, Fast Food, Food Court, Restaurants",{},Arielle,195,2015-01-04 14:19:45,336,98,207,2015201620172018,13,3.86,6,2,0,1,0,4,12,10,10,14,4


In [24]:
df_reviews_on['name_x'].value_counts()
# show value counts of name_x (restaurant name from the reviews file)

name_x
Pai Northern Thai Kitchen     2834
McDonald's                    2145
Jack Astor's Bar & Grill      1905
KINTON RAMEN                  1819
Banh Mi Boys                  1747
                              ... 
Island Pot                       3
Main Street Diner                3
The Mexican Burrito              3
Red Dice                         3
Mikasa Japanese Restaurant       3
Name: count, Length: 12048, dtype: int64

In [25]:
df_reviews_on['name_y'].value_counts()
# show value counts of name_y (restaurant name from the restaurants file)

name_y
Pai Northern Thai Kitchen    2834
McDonald's                   2145
Jack Astor's Bar & Grill     1905
KINTON RAMEN                 1819
Banh Mi Boys                 1747
                             ... 
Shahi Nan Kabab                 3
Island Pot                      3
Main Street Diner               3
The Mexican Burrito             3
Bar10der                        3
Name: count, Length: 12046, dtype: int64
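
The two counts differ slightly (12048 distinct names in `name_x`, 12046 in `name_y`). A quick way to inspect where two such columns disagree is sketched below on made-up rows; the frame `demo_names` is hypothetical.

```python
import pandas as pd

# Made-up rows standing in for the name_x / name_y columns
demo_names = pd.DataFrame({
    "name_x": ["Banzai Sushi", "Island Pot", "Red Dice"],
    "name_y": ["Banzai Sushi", None, "Red Dice"],
})

# Rows where the restaurant name is missing on one side or differs between sources
mismatch = demo_names[
    demo_names["name_y"].isna() | (demo_names["name_x"] != demo_names["name_y"])
]
print(mismatch)
```
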

In [26]:
df_reviews_on = df_reviews_on.drop(['name_y'], axis=1)
df_reviews_on.info()
# drop unnecessary column

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 668875 entries, 0 to 668874
Data columns (total 42 columns):
 #   Column              Non-Null Count   Dtype         
---  ------              --------------   -----         
 0   business_id         668875 non-null  object        
 1   name_x              668875 non-null  object        
 2   review_id           668875 non-null  object        
 3   user_id             668875 non-null  object        
 4   review_stars        668875 non-null  float64       
 5   useful_x            668875 non-null  float64       
 6   funny_x             668875 non-null  float64       
 7   cool_x              668875 non-null  float64       
 8   text                668875 non-null  object        
 9   date                668875 non-null  datetime64[ns]
 10  address             668862 non-null  object        
 11  city                668862 non-null  object        
 12  state               668862 non-null  object        
 13  postal_code         668862 no

In [27]:
df_reviews_on = df_reviews_on.rename(columns = {'name_x':'name'})
df_reviews_on.info()
# rename name_x back to name after dropping name_y

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 668875 entries, 0 to 668874
Data columns (total 42 columns):
 #   Column              Non-Null Count   Dtype         
---  ------              --------------   -----         
 0   business_id         668875 non-null  object        
 1   name                668875 non-null  object        
 2   review_id           668875 non-null  object        
 3   user_id             668875 non-null  object        
 4   review_stars        668875 non-null  float64       
 5   useful_x            668875 non-null  float64       
 6   funny_x             668875 non-null  float64       
 7   cool_x              668875 non-null  float64       
 8   text                668875 non-null  object        
 9   date                668875 non-null  datetime64[ns]
 10  address             668862 non-null  object        
 11  city                668862 non-null  object        
 12  state               668862 non-null  object        
 13  postal_code         668862 no

In [28]:
df_reviews_on.shape

(668875, 42)

In [29]:
df_reviews_on = df_reviews_on.drop_duplicates('text')
df_reviews_on.shape
# drop reviews with identical text, keeping the first occurrence

(667448, 42)
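
Before dropping duplicates it can help to count how many rows would be removed. A minimal sketch on made-up data, using `DataFrame.duplicated` with the same `subset` as `drop_duplicates` (the frame `demo_dupes` is hypothetical):

```python
import pandas as pd

demo_dupes = pd.DataFrame({
    "review_id": ["r1", "r2", "r3", "r4"],
    "text": ["Great food", "Great food", "Slow service", "Great food"],
})

# Rows flagged True are repeats of an earlier text and would be dropped
n_duplicates = int(demo_dupes.duplicated(subset="text").sum())
deduped = demo_dupes.drop_duplicates(subset="text", keep="first")

print(n_duplicates, deduped.shape)
```
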

## Section 4: Create df_reviews_on with 5 stars and 1 star

### Section 4a: Create df_reviews_on with 5 stars and 1 star
 - Create a DataFrame of reviews with 5 stars
 - Create a DataFrame of reviews with 1 star
 - Convert reviews to strings for both dfs

In [30]:
df_reviews_on_5 = df_reviews_on[df_reviews_on['review_stars']== 5.0]
df_reviews_on_5.info()
# create df to include reviews with exactly 5 stars

<class 'pandas.core.frame.DataFrame'>
Index: 192949 entries, 0 to 668874
Data columns (total 42 columns):
 #   Column              Non-Null Count   Dtype         
---  ------              --------------   -----         
 0   business_id         192949 non-null  object        
 1   name                192949 non-null  object        
 2   review_id           192949 non-null  object        
 3   user_id             192949 non-null  object        
 4   review_stars        192949 non-null  float64       
 5   useful_x            192949 non-null  float64       
 6   funny_x             192949 non-null  float64       
 7   cool_x              192949 non-null  float64       
 8   text                192949 non-null  object        
 9   date                192949 non-null  datetime64[ns]
 10  address             192948 non-null  object        
 11  city                192948 non-null  object        
 12  state               192948 non-null  object        
 13  postal_code         192948 non-nul

In [31]:
df_reviews_on_1 = df_reviews_on[df_reviews_on['review_stars']== 1.0]
df_reviews_on_1.info()
# create df to include review with 1 star

<class 'pandas.core.frame.DataFrame'>
Index: 74712 entries, 18 to 668866
Data columns (total 42 columns):
 #   Column              Non-Null Count  Dtype         
---  ------              --------------  -----         
 0   business_id         74712 non-null  object        
 1   name                74712 non-null  object        
 2   review_id           74712 non-null  object        
 3   user_id             74712 non-null  object        
 4   review_stars        74712 non-null  float64       
 5   useful_x            74712 non-null  float64       
 6   funny_x             74712 non-null  float64       
 7   cool_x              74712 non-null  float64       
 8   text                74712 non-null  object        
 9   date                74712 non-null  datetime64[ns]
 10  address             74705 non-null  object        
 11  city                74705 non-null  object        
 12  state               74705 non-null  object        
 13  postal_code         74705 non-null  object       

### Section 4b: Convert reviews to strings for both dfs

In [32]:
text_reviews_5 = df_reviews_on_5['text'].to_string(index= False)
display(type(text_reviews_5))
#text_reviews_5
# convert reviews to string

str

In [33]:
text_reviews_1 = df_reviews_on_1['text'].to_string(index= False)
display(type(text_reviews_1))
#text_reviews_1
# convert reviews to string

str
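
One caveat with `to_string`: it returns the display rendering of the Series, so depending on the pandas version and the `display.max_colwidth` option, long reviews may be cut short and suffixed with `...` before they are tokenized. A minimal sketch of an alternative, assuming the goal is simply one long string containing the full review texts. The helper name `join_reviews` and the sample data are hypothetical and not part of the original notebook:

```python
from typing import Iterable


def join_reviews(texts: Iterable) -> str:
    """Concatenate review texts into one string, skipping non-string values such as NaN."""
    return "\n".join(t for t in texts if isinstance(t, str))


# Works on any iterable of strings, including a pandas Series such as df_reviews_on_5['text'].
sample_reviews = ["Great sushi, and unbeatable prices!", float("nan"), "Very friendly staff."]
joined = join_reviews(sample_reviews)
print(joined)
```

If this route is taken, the word and n-gram counts in the following sections would likely change, since they would then be computed over complete reviews.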

## Section 5: Create n-grams for reviews with 5 stars


### Section 5a: Create a counter for reviews with 5 stars
 - Create a counter for reviews with 5 stars
 - Import stopwords from nltk.corpus
 - Tokenize text_reviews_5 and remove stop words
 - Show the 20 most common words and add between 5 and 10 words to exclude from the list
 - Rerun the stop-word filter
 - Show the 20 most common words and create bigrams for the 20 most common two-word phrases

In [34]:
import nltk
#nltk.download()
from collections import Counter

from nltk.corpus import stopwords
nltk.download('stopwords')
# import nltk and stopwords from nltk

[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/lalayamazaki/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

In [35]:
cachedStopWords = set(stopwords.words("english"))
cachedStopWords
# set a var to english stopwords

{'a',
 'about',
 'above',
 'after',
 'again',
 'against',
 'ain',
 'all',
 'am',
 'an',
 'and',
 'any',
 'are',
 'aren',
 "aren't",
 'as',
 'at',
 'be',
 'because',
 'been',
 'before',
 'being',
 'below',
 'between',
 'both',
 'but',
 'by',
 'can',
 'couldn',
 "couldn't",
 'd',
 'did',
 'didn',
 "didn't",
 'do',
 'does',
 'doesn',
 "doesn't",
 'doing',
 'don',
 "don't",
 'down',
 'during',
 'each',
 'few',
 'for',
 'from',
 'further',
 'had',
 'hadn',
 "hadn't",
 'has',
 'hasn',
 "hasn't",
 'have',
 'haven',
 "haven't",
 'having',
 'he',
 'her',
 'here',
 'hers',
 'herself',
 'him',
 'himself',
 'his',
 'how',
 'i',
 'if',
 'in',
 'into',
 'is',
 'isn',
 "isn't",
 'it',
 "it's",
 'its',
 'itself',
 'just',
 'll',
 'm',
 'ma',
 'me',
 'mightn',
 "mightn't",
 'more',
 'most',
 'mustn',
 "mustn't",
 'my',
 'myself',
 'needn',
 "needn't",
 'no',
 'nor',
 'not',
 'now',
 'o',
 'of',
 'off',
 'on',
 'once',
 'only',
 'or',
 'other',
 'our',
 'ours',
 'ourselves',
 'out',
 'over',
 'own',
 'r

In [36]:
tokens_5 = [w for w in nltk.word_tokenize(text_reviews_5.lower())
        if w.isalpha()]
# lowercase the reviews, tokenize them, and keep alphabetic tokens only

In [37]:
no_stops_5 = [t for t in tokens_5
        if t not in cachedStopWords]
# remove stop words with a list comprehension

In [38]:
Counter(no_stops_5).most_common(20)
# show 20 most common words

[('place', 33571),
 ('food', 24907),
 ('great', 21914),
 ('best', 17121),
 ('love', 12934),
 ('amazing', 11158),
 ('good', 10429),
 ('service', 9846),
 ('restaurant', 9323),
 ('one', 8579),
 ('came', 6858),
 ('went', 6664),
 ('time', 6520),
 ('delicious', 6461),
 ('really', 5832),
 ('first', 5687),
 ('toronto', 5656),
 ('favourite', 5193),
 ('go', 4799),
 ('excellent', 4617)]

In [39]:
cachedStopWords.update(('place', 'really', 'one', 'great', 'went', 'could', 'would', 'love', 'good', 'first', 'last', 'time', 'yum', 'ever', 'yes'))
# add more stopwords

In [40]:
no_stops_5 = [t for t in tokens_5
        if t not in cachedStopWords]
# remove stop words again, now using the extended stop word set

In [41]:
Counter(no_stops_5).most_common(20)
# show 20 most common words

[('food', 24907),
 ('best', 17121),
 ('amazing', 11158),
 ('service', 9846),
 ('restaurant', 9323),
 ('came', 6858),
 ('delicious', 6461),
 ('toronto', 5656),
 ('favourite', 5193),
 ('go', 4799),
 ('excellent', 4617),
 ('sushi', 4557),
 ('always', 3961),
 ('chicken', 3727),
 ('lunch', 3634),
 ('spot', 3538),
 ('dinner', 3410),
 ('pizza', 3316),
 ('friendly', 3299),
 ('nice', 3284)]

In [42]:
from nltk.util import ngrams
# import lib for ngrams

In [43]:
bigrams = ngrams(no_stops_5, 2)
Counter(bigrams).most_common(20)
# create bigrams from no_stops_5
# count the 20 most common two-word phrases

[(('food', 'service'), 1737),
 (('amazing', 'food'), 1736),
 (('hidden', 'gem'), 1170),
 (('food', 'amazing'), 1056),
 (('delicious', 'food'), 941),
 (('service', 'food'), 923),
 (('excellent', 'food'), 864),
 (('friendly', 'staff'), 801),
 (('hands', 'best'), 762),
 (('best', 'sushi'), 760),
 (('thai', 'food'), 715),
 (('excellent', 'service'), 625),
 (('amazing', 'service'), 617),
 (('food', 'delicious'), 604),
 (('dim', 'sum'), 596),
 (('far', 'best'), 594),
 (('probably', 'best'), 571),
 (('best', 'thai'), 541),
 (('indian', 'food'), 541),
 (('friendly', 'service'), 540)]
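
Because all reviews are joined into a single token list before pairing, some counted bigrams may combine the last word of one review with the first word of the next. A minimal sketch of counting n-grams review by review instead, assuming a simple regex tokenizer rather than `nltk.word_tokenize`. The function `review_ngrams` and the sample data are hypothetical. The sketch addresses only the review-boundary issue; removing stop words before pairing can still join words that were not adjacent in the original sentence.

```python
import re
from collections import Counter
from typing import Iterable, Set


def review_ngrams(reviews: Iterable[str], stop_words: Set[str], n: int = 2) -> Counter:
    """Count n-grams review by review so that no n-gram spans two reviews."""
    counts: Counter = Counter()
    for review in reviews:
        # Simple alphabetic tokenizer (an assumption; the notebook uses nltk.word_tokenize).
        tokens = [t for t in re.findall(r"[a-z]+", review.lower()) if t not in stop_words]
        # Zipping n shifted copies of the token list yields consecutive n-tuples.
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts


sample = ["Hidden gem with friendly staff", "Friendly staff and a hidden gem"]
bigram_counts = review_ngrams(sample, stop_words={"with", "and", "a"}, n=2)
print(bigram_counts.most_common(3))
```

The same function covers trigrams and quadgrams by passing `n=3` or `n=4`.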

### Section 5b: Create a counter using lemmatizing for reviews with 5 stars 
 - Create n-grams from the lemmatized tokens
 - Create bi-gram, tri-gram, and quad-gram versions

In [44]:
from nltk.stem import WordNetLemmatizer
# import library wordnetlemmatizer

In [45]:
wordnet_lemmatizer = WordNetLemmatizer()
# instantiate the WordNetLemmatizer

In [46]:
lemmatized_5 = [wordnet_lemmatizer.lemmatize(t) for t in no_stops_5]
# lemmatize each word in no_stops_5

In [47]:
text_bow_5 = Counter(lemmatized_5)
text_bow_5.most_common(20)
# create a counter

[('food', 25152),
 ('best', 17122),
 ('amazing', 11158),
 ('restaurant', 10733),
 ('service', 9910),
 ('came', 6858),
 ('delicious', 6464),
 ('toronto', 5658),
 ('favourite', 5279),
 ('go', 4911),
 ('excellent', 4617),
 ('sushi', 4566),
 ('spot', 3970),
 ('always', 3961),
 ('friend', 3875),
 ('chicken', 3735),
 ('lunch', 3665),
 ('pizza', 3531),
 ('dinner', 3460),
 ('friendly', 3299)]

In [48]:
bigrams_lem_5 = ngrams(lemmatized_5, 2)
Counter(bigrams_lem_5).most_common(20)
# create a bi-gram

[(('food', 'service'), 1757),
 (('amazing', 'food'), 1738),
 (('hidden', 'gem'), 1225),
 (('food', 'amazing'), 1060),
 (('delicious', 'food'), 946),
 (('service', 'food'), 936),
 (('excellent', 'food'), 867),
 (('friendly', 'staff'), 815),
 (('hand', 'best'), 768),
 (('best', 'sushi'), 762),
 (('thai', 'food'), 719),
 (('excellent', 'service'), 629),
 (('amazing', 'service'), 620),
 (('food', 'delicious'), 610),
 (('dim', 'sum'), 600),
 (('far', 'best'), 594),
 (('probably', 'best'), 571),
 (('best', 'pizza'), 562),
 (('indian', 'food'), 546),
 (('friendly', 'service'), 542)]

In [49]:
trigrams_lem_5 = ngrams(lemmatized_5, 3)
Counter(trigrams_lem_5).most_common(20)
# create a trigram

[(('best', 'thai', 'food'), 263),
 (('amazing', 'food', 'service'), 181),
 (('best', 'fish', 'chip'), 170),
 (('excellent', 'food', 'service'), 164),
 (('food', 'amazing', 'service'), 162),
 (('best', 'indian', 'food'), 159),
 (('food', 'excellent', 'service'), 156),
 (('ca', 'say', 'enough'), 143),
 (('amazing', 'food', 'amazing'), 129),
 (('food', 'friendly', 'staff'), 113),
 (('service', 'amazing', 'food'), 112),
 (('best', 'thai', 'restaurant'), 110),
 (('best', 'dim', 'sum'), 107),
 (('say', 'enough', 'thing'), 102),
 (('food', 'friendly', 'service'), 99),
 (('let', 'start', 'saying'), 96),
 (('ca', 'go', 'wrong'), 95),
 (('food', 'reasonable', 'price'), 95),
 (('service', 'delicious', 'food'), 89),
 (('best', 'chinese', 'food'), 86)]
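
The token 'ca' in trigrams such as ('ca', 'say', 'enough') is most likely a side effect of tokenization: `nltk.word_tokenize` typically splits "can't" into "ca" and "n't", and the `isalpha()` filter then drops "n't" while keeping "ca". A minimal sketch of the effect and one possible workaround, using a hard-coded token list as an assumption in place of a real tokenizer call:

```python
# Tokens as nltk.word_tokenize would typically split "can't say enough" (assumed, hard-coded here).
tokens = ["ca", "n't", "say", "enough"]

# The notebook's filter keeps alphabetic tokens only, which drops "n't" but keeps "ca".
alpha_only = [t for t in tokens if t.isalpha()]
print(alpha_only)

# One possible workaround: rejoin the contraction pieces before filtering.
rejoined = []
for t in tokens:
    if t == "n't" and rejoined:
        rejoined[-1] += t  # "ca" + "n't" -> "can't"
    else:
        rejoined.append(t)
print(rejoined)
```

Whether to rejoin contractions, or to add 'ca' to the stop words, depends on whether negation should be preserved in the n-grams.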

In [50]:
quadgrams_lem_5 = ngrams(lemmatized_5, 4)
Counter(quadgrams_lem_5).most_common(20)
# create a quadgram

[(('ca', 'say', 'enough', 'thing'), 65),
 (('amazing', 'food', 'amazing', 'service'), 49),
 (('best', 'thai', 'food', 'toronto'), 44),
 (('excellent', 'food', 'excellent', 'service'), 38),
 (('food', 'even', 'better', 'service'), 34),
 (('amazing', 'service', 'amazing', 'food'), 27),
 (('finally', 'got', 'chance', 'try'), 26),
 (('best', 'middle', 'eastern', 'food'), 23),
 (('best', 'fish', 'chip', 'toronto'), 21),
 (('best', 'thai', 'food', 'city'), 19),
 (('best', 'indian', 'food', 'toronto'), 18),
 (('delicious', 'food', 'friendly', 'staff'), 17),
 (('delicious', 'food', 'friendly', 'service'), 16),
 (('ca', 'wait', 'go', 'back'), 16),
 (('amazing', 'food', 'even', 'better'), 16),
 (('excellent', 'service', 'delicious', 'food'), 16),
 (('best', 'thai', 'restaurant', 'toronto'), 16),
 (('hidden', 'gem', 'tucked', 'away'), 16),
 (('best', 'pork', 'bone', 'soup'), 15),
 (('hand', 'best', 'thai', 'food'), 14)]

## Section 6: Create n-grams for reviews with 1 star

### Section 6a: Create n-grams for reviews with 1 star

 - Create a counter for reviews with 1 star
 - Create bigrams from no_stops_1

In [51]:
tokens_1 = [w for w in nltk.word_tokenize(text_reviews_1.lower())
        if w.isalpha()]
# lowercase the reviews, tokenize them, and keep alphabetic tokens only

In [52]:
no_stops_1 = [t for t in tokens_1
        if t not in cachedStopWords]
# remove stop words with a list comprehension

In [53]:
Counter(no_stops_1).most_common(20)
# show 20 most common words

[('food', 8698),
 ('service', 7523),
 ('worst', 4835),
 ('ordered', 3797),
 ('restaurant', 3161),
 ('came', 3024),
 ('terrible', 2600),
 ('bad', 2454),
 ('horrible', 2332),
 ('experience', 2277),
 ('go', 1938),
 ('give', 1768),
 ('like', 1757),
 ('location', 1589),
 ('order', 1585),
 ('got', 1520),
 ('never', 1499),
 ('review', 1386),
 ('customer', 1379),
 ('used', 1362)]

In [54]:
cachedStopWords.update(('would', 'could', 'ever','probably', 'got', 'give', 'restaurant', 'horrible', 'let', 'customer', 'first', 'last', 'time', 'never', 'terrible', 'twice'))
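
Note that `cachedStopWords` is a single set that was already extended in In [39] with words chosen for the 5-star reviews (such as 'great', 'good', and 'love'), so those words are also filtered out of the 1-star tokens here. If that is not intended, one option is to leave the base list untouched and build one set per rating group. A minimal sketch, with a tiny stand-in for the NLTK list and hypothetical variable names:

```python
# Stand-in for set(stopwords.words("english")); the real list is much longer.
base_stop_words = frozenset({"a", "the", "and", "was"})

# Extra words chosen after inspecting each group's most common tokens.
extra_5_star = {"place", "really", "great", "love", "good"}
extra_1_star = {"restaurant", "horrible", "terrible", "never"}

# Set union builds a new set, so neither group's additions leak into the other.
stop_words_5 = base_stop_words | extra_5_star
stop_words_1 = base_stop_words | extra_1_star

sample_tokens = ["the", "food", "was", "not", "good", "never", "again"]
kept = [t for t in sample_tokens if t not in stop_words_1]
print(kept)
```

With separate sets, a word like 'good' stays available in the 1-star analysis, where phrases such as "not good" may carry meaning.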

In [55]:
no_stops_1 = [t for t in tokens_1
        if t not in cachedStopWords]
# remove stop words again, now using the extended stop word set

In [56]:
Counter(no_stops_1).most_common(20)
# show 20 most common words

[('food', 8698),
 ('service', 7523),
 ('worst', 4835),
 ('ordered', 3797),
 ('came', 3024),
 ('bad', 2454),
 ('experience', 2277),
 ('go', 1938),
 ('like', 1757),
 ('location', 1589),
 ('order', 1585),
 ('review', 1386),
 ('used', 1362),
 ('star', 1336),
 ('lunch', 1331),
 ('chicken', 1312),
 ('th', 1206),
 ('night', 1204),
 ('dinner', 1191),
 ('w', 1157)]

In [57]:
bigrams = ngrams(no_stops_1, 2)
Counter(bigrams).most_common(20)
# create bigrams from no_stops_1
# count the 20 most common two-word phrases

[(('worst', 'service'), 928),
 (('bad', 'service'), 535),
 (('food', 'service'), 441),
 (('worst', 'experience'), 435),
 (('poor', 'service'), 387),
 (('service', 'food'), 372),
 (('zero', 'stars'), 292),
 (('slow', 'service'), 289),
 (('worst', 'food'), 229),
 (('bad', 'experience'), 220),
 (('service', 'bad'), 218),
 (('food', 'poisoning'), 209),
 (('dim', 'sum'), 207),
 (('food', 'bad'), 197),
 (('food', 'ok'), 193),
 (('quality', 'food'), 188),
 (('bad', 'food'), 187),
 (('many', 'times'), 172),
 (('stay', 'away'), 166),
 (('ordered', 'delivery'), 166)]

### Section 6b: Create a counter using lemmatizing for reviews with 1 star
 - Create n-grams from the lemmatized tokens
 - Create bi-gram, tri-gram, and quad-gram versions

In [58]:
lemmatized_1 = [wordnet_lemmatizer.lemmatize(t) for t in no_stops_1]
# lemmatize each word in no_stops_1

In [59]:
text_bow_1 = Counter(lemmatized_1)
text_bow_1.most_common(20)

[('food', 8776),
 ('service', 7603),
 ('worst', 4835),
 ('ordered', 3797),
 ('came', 3024),
 ('bad', 2455),
 ('experience', 2454),
 ('review', 2408),
 ('star', 2391),
 ('go', 1980),
 ('like', 1778),
 ('location', 1689),
 ('order', 1669),
 ('friend', 1597),
 ('used', 1362),
 ('lunch', 1338),
 ('chicken', 1318),
 ('night', 1227),
 ('th', 1206),
 ('dinner', 1203)]

In [60]:
text_chart_1 = pd.DataFrame(text_bow_1.most_common(20), columns = ['word', 'freq'])
text_chart_1

Unnamed: 0,word,freq
0,food,8776
1,service,7603
2,worst,4835
3,ordered,3797
4,came,3024
5,bad,2455
6,experience,2454
7,review,2408
8,star,2391
9,go,1980


In [61]:
bigrams_lem_1 = ngrams(lemmatized_1, 2)
Counter(bigrams_lem_1).most_common(20)
# create a bi-gram

[(('worst', 'service'), 940),
 (('bad', 'service'), 549),
 (('worst', 'experience'), 491),
 (('food', 'service'), 447),
 (('poor', 'service'), 394),
 (('zero', 'star'), 391),
 (('service', 'food'), 381),
 (('slow', 'service'), 289),
 (('bad', 'experience'), 233),
 (('worst', 'food'), 229),
 (('service', 'bad'), 221),
 (('food', 'poisoning'), 209),
 (('dim', 'sum'), 208),
 (('write', 'review'), 206),
 (('food', 'bad'), 200),
 (('food', 'ok'), 194),
 (('quality', 'food'), 190),
 (('bad', 'food'), 188),
 (('year', 'ago'), 187),
 (('made', 'reservation'), 173)]

In [62]:
trigrams_lem_1 = ngrams(lemmatized_1, 3)
Counter(trigrams_lem_1).most_common(20)
# create a trigram

[(('worst', 'dining', 'experience'), 94),
 (('food', 'bad', 'service'), 58),
 (('worst', 'service', 'experienced'), 55),
 (('ordered', 'uber', 'eats'), 53),
 (('service', 'bad', 'food'), 47),
 (('bad', 'service', 'food'), 41),
 (('food', 'okay', 'service'), 40),
 (('worst', 'chinese', 'food'), 40),
 (('bad', 'service', 'bad'), 38),
 (('worst', 'service', 'food'), 36),
 (('wish', 'zero', 'star'), 35),
 (('worst', 'service', 'received'), 35),
 (('food', 'ok', 'service'), 34),
 (('extremely', 'slow', 'service'), 30),
 (('worst', 'service', 'experience'), 30),
 (('usually', 'write', 'review'), 30),
 (('poor', 'quality', 'food'), 29),
 (('normally', 'write', 'review'), 28),
 (('food', 'poor', 'service'), 27),
 (('bad', 'food', 'bad'), 27)]

In [63]:
quadgrams_lem_1 = ngrams(lemmatized_1, 4)
Counter(quadgrams_lem_1).most_common(20)
# create a quadgram

[(('bad', 'service', 'bad', 'food'), 17),
 (('bad', 'food', 'bad', 'service'), 16),
 (('worst', 'dining', 'experience', 'life'), 15),
 (('hate', 'leave', 'bad', 'review'), 8),
 (('ordered', 'uber', 'eats', 'food'), 6),
 (('took', 'hour', 'get', 'food'), 6),
 (('food', 'mediocre', 'best', 'service'), 6),
 (('food', 'even', 'worse', 'service'), 6),
 (('ordered', 'delivery', 'took', 'hour'), 6),
 (('food', 'took', 'way', 'long'), 5),
 (('food', 'ok', 'nothing', 'special'), 5),
 (('food', 'okay', 'nothing', 'special'), 5),
 (('came', 'friend', 'birthday', 'dinner'), 5),
 (('service', 'took', 'hour', 'get'), 5),
 (('slow', 'service', 'waited', 'min'), 5),
 (('hate', 'leaving', 'bad', 'review'), 5),
 (('poor', 'service', 'long', 'wait'), 5),
 (('ordered', 'via', 'uber', 'eats'), 5),
 (('service', 'took', 'min', 'get'), 5),
 (('slow', 'service', 'waited', 'hour'), 5)]
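
The 5-star and 1-star groups differ in size (192,949 versus 74,712 reviews according to the `.info()` outputs), so raw counts are not directly comparable between them. A minimal sketch of normalizing counts to occurrences per 1,000 reviews, using a few counts copied from the lemmatized unigram outputs above. The function name `per_thousand_reviews` is hypothetical:

```python
from collections import Counter


def per_thousand_reviews(counts: Counter, n_reviews: int) -> dict:
    """Convert raw word counts to occurrences per 1,000 reviews."""
    return {word: 1000 * c / n_reviews for word, c in counts.items()}


# Counts copied from the lemmatized unigram outputs (In [47] and In [59]).
bow_5 = Counter({"food": 25152, "service": 9910})
bow_1 = Counter({"food": 8776, "service": 7603})

# Review totals taken from the .info() outputs for each DataFrame.
rate_5 = per_thousand_reviews(bow_5, 192949)
rate_1 = per_thousand_reviews(bow_1, 74712)

for word in ("food", "service"):
    print(f"{word}: 5-star {rate_5[word]:.1f}, 1-star {rate_1[word]:.1f} per 1,000 reviews")
```

On these numbers, 'service' appears roughly twice as often per review in the 1-star group, while 'food' is mentioned at a similar rate in both.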

## Section 7: Analysis of the results for 5-star and 1-star reviews with specific words

### Section 7a: Analysis of the results for 5-star with specific words
 - Create a new df and show the results with selected features
 - Filter df to see reviews containing the phrase 'friendly staff' in 5-star reviews
 - Explain the results

In [64]:
pd.set_option('display.max_colwidth', None)
df_reviews_on_content_5 = df_reviews_on_5[['name', 'categories', 'review_stars', 'text']].copy()
df_reviews_on_content_5
#create df with selected features

Unnamed: 0,name,categories,review_stars,text
0,Philthy Phillys,"Restaurants, Cheesesteaks, Poutineries",5.0,The cheese steak is great here. The guys cooking it are great and manager runs a tight ship.
2,Philthy Phillys,"Restaurants, Cheesesteaks, Poutineries",5.0,"Excellent Excellent customer service!!\nWill def go back and try more items. I had the philly steak original , great size n great toppings! They also have burgers and poutines.\nGreat experience!"
5,Banzai Sushi,"Japanese, Fast Food, Food Court, Restaurants",5.0,"Great Sushi, and unbeatable prices! Only downfall is that they are cash only, and close by 7pm"
7,Banzai Sushi,"Japanese, Fast Food, Food Court, Restaurants",5.0,"I have been had takeout sushi here quite often for several years. Good taste, good price, friendly people!"
9,Banzai Sushi,"Japanese, Fast Food, Food Court, Restaurants",5.0,"I have been getting sushi from Banzai Sushi since it opened about 10 years ago. Sushi is always very consistent and fresh. The owners (who seem to be a couple) are always prepared to make a type of sushi that they may not on display. Just ask! The pricing is very reasonable, $3.50 for 6 pieces of any type of sushi. The sushi pieces are also quite large, which gives you true value for money. \n\nThe owner also sells imported soda from the US, such as Vanilla Coke or Cherry Coke which I love. Overall, I have never had a bad experience here. Very low key, but great, it is always great tasting and FRESH! \n\nIf you are in the area, go in an pick up some sushi, soup or a bento box. You can take it home or eat it at the tables in the area that was once a mall."
...,...,...,...,...
668841,Q's Shawarma,"Restaurants, Mediterranean",5.0,"On paper, this place has NOTHING going for it - middle of no where in an industrial area, part of the same lot as a gas station and auto body shops, on a highway that attracts 18 ton trailers way more than cars. I mean who visits a place like that?\n\nWell, I did - and I totally LOVED it. This place is very hole in the wall, CASH only, minimal furniture and no decor. They serve up plates veyr quickly and have a process for it - be ready with your order when the guy asks - he goes fast through the toppings - you snooze, you lose! \n\nWe had the chicken shawarma plate - absolute best ever! Rice was tasty, chicken was tender, flavorful and moist - loved every bite of it!\n\nWe ended up using american money bec I am the hightailing chick who decides every place must use a credit card, and the guy charged us like Canadian money - which means I lost a few dollars, but hey - the food was delicious, so no complains!"
668845,Q's Shawarma,"Restaurants, Mediterranean",5.0,"Amazing food, I got the hockey shawarma plate, it is huge they give really good portions and the quality is amazing. The staff here is also really friendly, I highly recommend this place.\n\n\nPs this is also halal / kosher"
668863,Steak & Cheese & Quick Pita Restaurant,"Sandwiches, Restaurants",5.0,"I FREAKING LOVE THE CHEESE & STEAK 8-INCH SUB!! AND IT'S 5.99 ONLY AFTER TAX!\n\nI rarely eat out, but the cheese & steak sub is definitely the best food in downtown. They are very generous with the meat portion and the meat is tasty as hell. Could've done with more cheese tho. Nevertheless, this a great go-to place when you're hungry. I wish I'm still hungry so that I can eat 2nd offering :("
668873,Nishi Sushi,"Japanese, Sushi Bars, Restaurants",5.0,"Food was fresh, tasted great and was well presented. Very cozy and clean environment, and very friendly service. Also, I like the privacy of the booths. :)"


In [65]:
df_reviews_on_content_5['text'] = df_reviews_on_content_5['text'].fillna('')
# replace NaN values with an empty string so that str.contains returns a boolean for every row

In [66]:
df_reviews_on_content_5[df_reviews_on_content_5['text'].str.contains('friendly staff')].sample(10)
# filter df to reviews containing 'friendly staff' and show a random sample of 10

Unnamed: 0,name,categories,review_stars,text
499805,Rag Doll Eatery,"Restaurants, Comfort Food, Burgers, Vegan, Soul Food, Pizza, Vegetarian, Salad",5.0,"Discovered this cute eatery in my area and quickly fell in love with it. It had a fun rock vibe, very friendly staff, quick service and most importantly delicious food! There is music playing in the background, loud enough to notice but quiet enough that you can still talk easily without yelling. I got the Rock Candy Drink (Malibu Rum, Peach Schnapps, Mango & Passion Fruit Juices). Next up were appetizers, sweet & sour meatballs and ravioli bites! Both were delicious! The meatballs themselves were flavourful and not dry at all, they also came in a yummy sauce. The ravioli bites were the perfect blend of cheese and crunch and were delicious. The main course: Fleetwood MAC 'n cheese tater tots. The Mac and cheese was creamy and flavourful, the tater tots had the perfect amount of crunch. Would definitely go back again!"
239805,Donburi,"Japanese, Asian Fusion, Restaurants, Chinese",5.0,"I really do love this place! \n\nRight at the moment from entering, I felt welcomed, all the way until the time I left. Where you get a huge Japanese goodbye! And it really felt genuine too! \n\nService is outstanding as other Yelpers have said. The waitress even apologized for the long wait on the dessert which took me by surprise because I think only a minute passed by since I had ordered my dessert! \n\nI got the curry Katsu, and I loved it! (I've had a lot of curry katsu's in my time). Perfect crispy katsu, with rich thick curry sauce! \n\nMy boyfriend got the hamburger don, which was so so for him. But I really liked it! \n\nHe then got the green tea bruele and I had the Japanese special cake and they were both superb! \n\nI think it's a combination of the extremely friendly staff with a great atmosphere and fantastic food that has tied it all together for me. \n\nDefinitely recommended, and I will be back in the near future! I'm really happy I found this place while looking for places to eat at midnight! Haha xD"
383164,New Kalyani,"Indian, Sri Lankan, Restaurants",5.0,We accidentally stumbled across this place while looking for a place to eat visiting Canada. Awesome food. Awesome service. Truly authentic and traditional. Very friendly staff too. No place to dine in.
487433,Barque Smokehouse,"Food, Smokehouse, American (Traditional), Bars, Hookah Bars, Barbeque, Restaurants, Breakfast & Brunch, Nightlife, Chicken Wings",5.0,"It's a casual atmosphere so ditch the fancy gear and put on your most comfortable sweatpants and be prepared to EAT. \nI've been here twice so far and both times the service was top notch with friendly staff (from the front desk to wait staff to the kitchen), a small and cozy atmosphere and plenty of character. \nBut we come here for the food. \nBarque's has been a magnet for Roncesvalles. Thanks to this restaurant it was my first time visiting the neighbourhood.\nPlatter and individual options available. Best bet for two people is the current menu special where you can pick 1-2-3 : three meats, two sides, one appetizer. \nDo yourself a favour and indulge in the Barque Obama (haha) ribs. Not too sticky, smoky and a hint of sweetness. \nThe pulled pork (or was it chicken) was equally impressive. Brisket - never been a fan so I'll just rate it ""good"".\nAlso do this for your tastebuds and a happy belly - the Cajun mac and cheese. Don't ask me no questions. Just order and enjoy. Thank me later. \nIt's not Memphis BBQ but it's Toronto BBQ and it's done right."
424602,Rick's Cafe,"Coffee & Tea, Food, Cafes, Restaurants",5.0,"Solid espresso, friendly staff, great views & people watching if Kensington, free fast wifi. What else would one want from a coffee shop?"
81575,Soos,"Restaurants, Malaysian",5.0,"Best place going around. Food is full of flavor and spice. Great atmosphere, friendly staff and the Asian tacos and chicken wings are amazing!!"
436645,Black Rock Coffee,"Internet Cafes, Coffee & Tea, Restaurants, Food, Cafes",5.0,"Delicious coffe, fantastic atmosphere and friendly staff. Perfect for meeting friends or just taking a break."
302609,JOEY Eaton Centre,"American (New), Sports Bars, Restaurants, Canadian (New), Lounges, Nightlife, Bars",5.0,"The restaurant itself is gorgeous, paired with friendly staff and great ambience. I came here as a group with some of my peers for lunch and we all had a good time. I ordered the Calamari Fritti which tasted great, probably my favourite calamari next to the one at the Hazelton in Yorkville. The only complain I can think of regarding the food was that it was a bit too salty for my liking. But other than that it was fine.\nThere was a little issue that came up during my visit with a staff member, but the manager (At least I think it was the manager) was very prompt to call me back and do whatever he could to fix the issue, so I commend him for that! Will definitely be coming back soon."
424600,Hub Coffee,"Restaurants, Sandwiches, Coffee & Tea, Breakfast & Brunch, Food",5.0,"Really great coffee shop with friendly staff. They make a great latte and Americano. They also have a great brunch and lunch menu with smoothies made with yogurt and apple juice. They also have lots of baked goods. \n\nThere are lots of tables, including a section in the back which is great because it means you can usually walk in and find somewhere to sit without waiting. \n\nThey offer drip coffee to go in a cool way. If you are in a rush you can fill up yourself pay with change without having to wait in line. \n\nThe crowd is a mix of young people, creative types and families with young kids. It's a great community gathering point with a chill atmosphere."
77400,Teriyaki Experience,"Restaurants, Asian Fusion, Japanese, Fast Food",5.0,"My family are the Teriyaki Experience Brampton regular guests, they always making good and fresh food, the restaurant is very clean, friendly staffs : )"


Analysis: The n-grams from 5-star reviews point to a consistently positive customer experience. 
The bigram "friendly staff" (815 occurrences after lemmatizing) and quadgrams such as "amazing food amazing service" and "excellent food excellent service" indicate satisfaction with both food quality and service. 
Specific cuisines such as Thai, Indian, and Middle Eastern appear among the top n-grams, which shows that praise is spread across a range of cuisines. 
The word "best" recurs throughout (for example "best thai food" and "best fish chip"), suggesting that reviewers regard these restaurants as standouts rather than merely satisfactory.
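
The sampled rows show what such reviews look like, but not how common the phrase is. A minimal sketch of measuring the share of reviews that mention a phrase, written in plain Python with a hypothetical function name (`phrase_share`) and made-up sample reviews. On the real DataFrame, taking the mean of `str.contains('friendly staff', case=False, na=False)` should give a comparable case-insensitive figure:

```python
from typing import Iterable


def phrase_share(reviews: Iterable[str], phrase: str) -> float:
    """Return the fraction of reviews containing the phrase, ignoring case."""
    reviews = list(reviews)
    if not reviews:
        return 0.0
    needle = phrase.lower()
    hits = sum(needle in review.lower() for review in reviews)
    return hits / len(reviews)


sample_reviews = [
    "Solid espresso, friendly staff, great views.",
    "Friendly staff and delicious food!",
    "The pizza was excellent.",
    "Quick service, will come back.",
]
share = phrase_share(sample_reviews, "friendly staff")
print(f"{share:.0%} of sample reviews mention 'friendly staff'")
```

Ignoring case matters here because a case-sensitive match on 'friendly staff' would miss reviews that begin with "Friendly staff".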

### Section 7b: Analysis of the results for 1-star with specific words
 - Create a new df and show the results with selected features
 - Filter df to see reviews containing the word 'waited' in 1-star reviews
 - Explain the results

In [67]:
pd.set_option('display.max_colwidth', None)
df_reviews_on_content_1 = df_reviews_on_1[['name', 'categories', 'review_stars', 'text']].copy()
df_reviews_on_content_1
#create df with selected features

Unnamed: 0,name,categories,review_stars,text
18,Gourmet Burger Company,"Restaurants, Burgers, Food",1.0,"What happened? Your burgers use to be the best around. We tried your new Angus Beef burgers which were dry, overcooked and tasteless. It was like you squeezed all the juice and flavour out of the burgers. Fries were overcooked and dry as well. I know good fries are double-fried but these were more than triple-fried. Your service is always friendly but food very disappointing this time."
25,Gourmet Burger Company,"Restaurants, Burgers, Food",1.0,There is a reason there is no one here! I have been to this location one other time and it was much better. I will never go back again. Hamburgers were like sawdust. Staff was rude. not worth the price. I can't wait for burger Priest to open on Kipling and queensway. I am sure gourmet burger will end up closing
50,Mi Mi Restaurant,"Vietnamese, Restaurants",1.0,"The food was good/decent.\nThe low review is for the LACK of signage in front that this is CASH ONLY.\nIt's 2017 if you don't have debit, you are both lazy and most likely 'particular' about what gets claimed at the end of the year... When I have to walk down the street, pay a debit fee and banking fee to get your 'blessed' cash, it's annoying. There are a lot of us 'cashless' folks these days, get on-board. Ultimately I would go back to the other PHO places in the area, easier/cheaper. \nPlus what is it with Vietnamese restaurant/daycare?! Wandering toddler switching between crying or hollering ain't endearing!"
101,Mi Mi Restaurant,"Vietnamese, Restaurants",1.0,This pho was so fucking bad I couldn't even find a rating for how shit it was. Constructive criticism don't dilute the broth
143,Pizzaville,"Restaurants, Pizza",1.0,Terrible rude people!! Will never go there again and ripped me off too!!! Dough was hard and cheese tasted terrible. Skimpy on the pepperoni.
...,...,...,...,...
668858,Steak & Cheese & Quick Pita Restaurant,"Sandwiches, Restaurants",1.0,"This place is overpriced for the quality of food. I ordered a shawarma plate and the shawarma was too dry and didn't taste fresh. The salad included was too small that it felt like I had next to none, and it did not taste fresh either. Customer service is slow and should be more efficient. I was not satisfied for spending $10 (including tax) for it."
668859,Steak & Cheese & Quick Pita Restaurant,"Sandwiches, Restaurants",1.0,"I decided to give this place another try. The gyros were horrible the first time so I tried the falafels this time. Big mistake. Bland and stale seem to be the main ingredients for this place. It also took more than 10 minutes for them to make it too. I thought it was a fast food type place, not a sit-down type place judging by the name but I guess I was wrong. It's a shame because the owner seems quite nice. I just can't bring myself to give this place more than 1 star because the food is just so bad."
668860,Steak & Cheese & Quick Pita Restaurant,"Sandwiches, Restaurants",1.0,"I'm sorry but this place doesn't deserve me spending time writing a review about it... just DON'T GO THERE and hopefully something better will appear there soon. Man I feel like an ass, I apologize to the owners who are very nice people but COME ON, serve some proper food!!\n\nEverything tastes bad, but it's super cheap especially on their daily specials (usually prices $3-4). I've never had such bland, tasteless food in my life."
668865,Steak & Cheese & Quick Pita Restaurant,"Sandwiches, Restaurants",1.0,"I am sorry, I tried. At least three times.\n\nIt's been awhile since I've been here, so I don't want to condemn them without a retrial, but I was left deeply dissatisfied with what I got every single time, back when they moved into this area. As I mentioned with Queenslice, I find their lack of focus represented in the signage- ""Toronto's #1 shawarma house!""/""the original philly style sandwich!""/""juice bar!"" to be indicative of greater problems in the kitchen. But, it has managed to stick around for probably close to two years, so you never know.\n\nAll I know: \n-the falafel balls were dry, bordering on stale,\n-the shawarma was gristly, with too many of those *hard* chunks that never fail to disturb me,\n-i was charged extra for tabouli and pickles...I think the only included toppings were the straight, base veggies of everyplace (tomatoes, lettuce, onion). This was probably what bothered me the most.\nNot cool, guys."


In [68]:
df_reviews_on_content_1['text'] = df_reviews_on_content_1['text'].fillna('')
# replace NaN values with an empty string so that str.contains returns a boolean for every row

In [69]:
df_reviews_on_content_1[df_reviews_on_content_1['text'].str.contains('waited')].sample(20)
# filter df to reviews containing 'waited' and show a random sample of 20

Unnamed: 0,name,categories,review_stars,text
18333,Milestones Restaurants,"American (Traditional), American (New), Comfort Food, Restaurants, Nightlife, Canadian (New), Bars",1.0,"I would give NEGATIVE STARS if it's possible. WORST DINING EXPERIENCE EVER. Came here on a Friday night and the service was extremely slow. Our server was new and she got our order wrong and we waited extra time for our order to be correctly brought to the table. This place was so loud, you can barely hear when you talk to your guests. \n\nThe service was horrible. The server literally came over twice the whole dining experience. If she can't handle so many tables at once, maybe the management is to blame.\n\nSpoke to the manager in the purple/blue shirt and he was a joke. He was arrogant and refused to admit the restaurant server had any issues or the fact service was at a snail pace. \n\nThe restaurant is racist and does not care about anyone!\n\nWorst dining experience for overpriced food that was not even what we ordered! Pathetic that's all I can say!\n\nService 0/5\nManagement 0/5\nFood 1/5\n\nAvoid this place and don't ruin your night !"
460723,Matisse Restaurant + Bar,"American (New), Nightlife, Restaurants, Bars, Cocktail Bars, Canadian (New)",1.0,"The service is very bad for a place like this. They were chatting with each other/ playing on their phones instead of serving us. We waited at the entrance for more than5 min, there are 5 tables max in the whole restaurants. Only time they were fast was to bring us the bill which I just started my dessert, and then I waited for forever to call some one for the machine for my card. The food is very average for the price. Not recommend this place!"
145112,"On The Curve, Hot Stove & Wine Bar & Patio","Restaurants, Mediterranean",1.0,"A very limited menu option which has not been updated on there wedsite. The food it self tastes was if it were reheated, and absolutely no tast, it was hard to eat half of my meal. Services was horrible, I waited an hour and a half for my drinks. All of meals came out a different times. When wanting to speak with a supervisor they told me she was in a meeting and continued to be in the meeting for the remainder of the evening there. On the website it states live Dj at 9:30pm. The Dj didn't come in until 11pm. And didnt get started until 11:30 And by that time I had enough! \nWouldn't recommend this place at all. Sorry"
46951,Red Lobster,"Restaurants, American (Traditional), Seafood",1.0,"Just came here early dinner with my family. The service sucks, waited our food for more than 30 minutes and staff are not friendly. The foods are very salty especially shrimps, lobster taste weird and sirloin is tender but not tasty! It's okey to be costly if the food is good. Very expensive food, not good food! We are all disappointed! Definitely, we're not coming back here at all!!!"
506828,Hooters,"American (Traditional), Nightlife, Bars, Chicken Wings, Restaurants, Sports Bars",1.0,"Our experience at Hooters today was less than mediocre. We waited 25 minutes before a server came over to tell us ""it will be a few more minutes"" we had to go over to a random table and ask them to use their menus because we never got any. After 30 mins we got our drinks and then after another 30 mins we got our food. Well not all of us. There were 7 girls at our table and only 5 ordered food- wings and onion rings (doesn't take long in the deep fryer). We were missing a platter and a plate of onion rings- which came another 30 mins later. At this point we asked to see a manager- who honestly did not care at all that we were waiting for this long and that we were basically ignored the whole night. The manager said ""sorry she's kinda new - we can take the platter off the bill and can I make anything else faster for you?"" Well no - that was the last of food and drinks and we had 3 servers at the table how were they all new? he just said ""ok"". \nWe were a table full of servers so we get it's busy sometimes \n BUT WORST SERVICE EVER. Not even great ""hooters"" could have fixed this experience. 0.5 star."
180102,Tinuno,"Filipino, Restaurants",1.0,"We booked a table for 4 at 7:30 pm, sat down at 7:40....waited for over an hr...it is now 8:45 and the food still hasn't come...definitely won't come back."
428630,Globe Bistro,"Local Flavor, American (New), Restaurants, French, Canadian (New)",1.0,"I came here for a birthday brunch. There was a group of approximately 12-15 of us. My friend heard good things about this restaurant, so she booked the restaurant.\n\nWhen we walked in, we were greeted immediately and seated. The person serving coffee was really kind and came by every 15 minutes to top up coffee.\n\nWhen we placed our order, we were surprised that the food was taking a really long time. Our waiter had done a really poor job of serving our table because after we ordered our food, it took the restaurant over 1.5 hours to serve us. They had screwed up our order and put us on the bottom of the list. The last 2 people to join our table got their food and the rest of us waited and waited while other tables who had arrived half hour or more after us, got served their food. \n\nWhen I finally got my order of eggs benedict, the eggs were overcooked and the pork was overdone. It would've been a much more pleasant experience had the waiter brought bread or something to help carry us over until the food arrived when we had asked, but he had completely forgotten about us.\n\nThe manager at least comped us for the trouble and bad service. It's unfortunate because the waiter really should have pushed the kitchen to make our food ahead of the other tables who arrived and ordered an hour after us. Incredibly disappointing, but glad the manager took the steps to rectify the situation."
156068,El Furniture Warehouse - Bloor St,"Nightlife, American (Traditional), Restaurants, Bars, Pubs, Comfort Food, Gastropubs",1.0,"I live in the area and was incredibly excited to see the opening of El Furniture Warehouse. My first time going, I was pleased with the service and my food - I got the beet & goat cheese salad. The next time I went, I ordered the chicken tacos - they were absolutely tasteless. I found the service to be slower, but I was willing to forgive that.\n\nWilling to give it another try, I went again this past weekend with a friend. The host at the front door was not all that friendly. He sat us with some other individuals at a table because it was busy (no problem, I don't mind meeting people!). It was about 9:15 and after a long day, we were only just ready for dinner. Not that I should have to justify whether or not I was ordering a drink, but after a long day in the sun, we just ordered food. Our server had attitude when we told her we didn't want a drink at that time. We waited approximately 30 minutes and watched everyone else around us get served. There was no acknowledgment of our waiting (for two salads). \n\nMy negative experience is not the only negative one that I have heard about EFW. Needless to say, I won't be back. Sad, because I am so close and it had a lot of potential."
8672,Pizza Pizza,"Pizza, Italian, Chicken Wings, Restaurants",1.0,"I went in and bought an X-Large walk in pizza, and I told the guy I wanted half pepperoni, half cheese. He told me "" Left side cheese, right side pepperoni. "" I said fine, and waited about 15 min for the order. To my surprise. When I opened the box. The right side had the cheese and the left side had pepperoni. I was extremely disappointed, but I took the pizza home and eat it anyway."
570286,Karahi Point,"Halal, Pakistani, Indian, Restaurants",1.0,"Just went to this location with my family. We were served salad and a jug of water as we waited for our order to arrive. The salad had a hair in it. We all drank the water and realized at the bottom of the jug there were TWO DEAD FLIES. \n\nWe told the staff and all they said was sorry. Couldn't bring myself to eat their food, we just got up and left. Wish I took a picture of it at that time, but I was so traumatized with what had just happened and the waitress quickly came and took the jug away.\n\nIf this is what you see in the water and salad, imagine what goes on behind the scenes when they make your food. Hope we don't get sick. Never going there again."


Analysis: The most frequent quadgrams in the 1-star reviews clearly express dissatisfaction. 
Phrases like "bad service bad food" and "worst dining experience life" show strong negative sentiment toward both the service and the food quality. 
Negative words such as "bad," "worst," and "hate" recur across different quadgrams, although in the top quadgrams "hate" appears in "hate leave bad review," where it signals reluctance to post a negative review rather than dislike of the restaurant. 
Timing is another recurring theme: phrases like "took hour get food" and "slow service waited hour" suggest that slow service is a significant factor in negative reviews.
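
To put a rough number on the timing theme, the sketch below flags each of the top 20 1-star quadgrams (copied from the `Counter(...).most_common(20)` output) that contains a timing-related word. The `TIMING_WORDS` set is hand-picked for this illustration and is an assumption, not part of the original analysis.

```python
# Top 20 1-star quadgrams and their counts, copied from the notebook output.
top_quadgrams_1 = [
    (("bad", "service", "bad", "food"), 17),
    (("bad", "food", "bad", "service"), 16),
    (("worst", "dining", "experience", "life"), 15),
    (("hate", "leave", "bad", "review"), 8),
    (("ordered", "uber", "eats", "food"), 6),
    (("took", "hour", "get", "food"), 6),
    (("food", "mediocre", "best", "service"), 6),
    (("food", "even", "worse", "service"), 6),
    (("ordered", "delivery", "took", "hour"), 6),
    (("food", "took", "way", "long"), 5),
    (("food", "ok", "nothing", "special"), 5),
    (("food", "okay", "nothing", "special"), 5),
    (("came", "friend", "birthday", "dinner"), 5),
    (("service", "took", "hour", "get"), 5),
    (("slow", "service", "waited", "min"), 5),
    (("hate", "leaving", "bad", "review"), 5),
    (("poor", "service", "long", "wait"), 5),
    (("ordered", "via", "uber", "eats"), 5),
    (("service", "took", "min", "get"), 5),
    (("slow", "service", "waited", "hour"), 5),
]

# Hand-picked timing vocabulary (an assumption made for this sketch).
TIMING_WORDS = {"took", "hour", "min", "wait", "waited", "slow", "long"}

# Keep the quadgrams that share at least one word with the timing vocabulary.
timing = [(q, n) for q, n in top_quadgrams_1 if TIMING_WORDS & set(q)]
timing_count = sum(n for _, n in timing)
total_count = sum(n for _, n in top_quadgrams_1)

print(f"{len(timing)} of {len(top_quadgrams_1)} top quadgrams mention timing")
print(f"share of occurrences: {timing_count / total_count:.0%}")
```

With this word list, 8 of the 20 quadgrams are flagged. A different vocabulary would give a different share, so the figure is indicative only.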

### Section 7c: Create a DataFrame of quadgrams for 5-star and 1-star reviews
 - Create a DataFrame of the top quadgrams for the 5-star reviews and another for the 1-star reviews
 - Concatenate the two DataFrames into one
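
For readers new to n-grams, here is a minimal pure-Python sketch of what `ngrams(tokens, 4)` followed by `Counter(...).most_common()` computes. The `make_ngrams` helper and the toy token list are hypothetical stand-ins for illustration; the notebook itself uses NLTK on `lemmatized_5` and `lemmatized_1`.

```python
from collections import Counter

def make_ngrams(tokens, n=4):
    """Return an iterator over every run of n consecutive tokens, as tuples."""
    return zip(*(tokens[i:] for i in range(n)))

# Toy token list (hypothetical, not taken from the dataset).
tokens = ["amazing", "food", "amazing", "service",
          "amazing", "food", "amazing", "service"]

# Count how often each quadgram occurs, then show the most frequent one.
counts = Counter(make_ngrams(tokens, 4))
print(counts.most_common(1))
```

A list of 8 tokens yields 5 overlapping quadgrams, so the window slides one token at a time rather than jumping in blocks of four.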

In [70]:
quadgram_lem_5 = ngrams(lemmatized_5, 4)
Counter(quadgram_lem_5).most_common(20)
# show the 20 most common quadgrams in the 5-star reviews

[(('ca', 'say', 'enough', 'thing'), 65),
 (('amazing', 'food', 'amazing', 'service'), 49),
 (('best', 'thai', 'food', 'toronto'), 44),
 (('excellent', 'food', 'excellent', 'service'), 38),
 (('food', 'even', 'better', 'service'), 34),
 (('amazing', 'service', 'amazing', 'food'), 27),
 (('finally', 'got', 'chance', 'try'), 26),
 (('best', 'middle', 'eastern', 'food'), 23),
 (('best', 'fish', 'chip', 'toronto'), 21),
 (('best', 'thai', 'food', 'city'), 19),
 (('best', 'indian', 'food', 'toronto'), 18),
 (('delicious', 'food', 'friendly', 'staff'), 17),
 (('delicious', 'food', 'friendly', 'service'), 16),
 (('ca', 'wait', 'go', 'back'), 16),
 (('amazing', 'food', 'even', 'better'), 16),
 (('excellent', 'service', 'delicious', 'food'), 16),
 (('best', 'thai', 'restaurant', 'toronto'), 16),
 (('hidden', 'gem', 'tucked', 'away'), 16),
 (('best', 'pork', 'bone', 'soup'), 15),
 (('hand', 'best', 'thai', 'food'), 14)]

In [71]:
quadgram_lem_1 = ngrams(lemmatized_1, 4)
Counter(quadgram_lem_1).most_common(20)
# show the 20 most common quadgrams in the 1-star reviews

[(('bad', 'service', 'bad', 'food'), 17),
 (('bad', 'food', 'bad', 'service'), 16),
 (('worst', 'dining', 'experience', 'life'), 15),
 (('hate', 'leave', 'bad', 'review'), 8),
 (('ordered', 'uber', 'eats', 'food'), 6),
 (('took', 'hour', 'get', 'food'), 6),
 (('food', 'mediocre', 'best', 'service'), 6),
 (('food', 'even', 'worse', 'service'), 6),
 (('ordered', 'delivery', 'took', 'hour'), 6),
 (('food', 'took', 'way', 'long'), 5),
 (('food', 'ok', 'nothing', 'special'), 5),
 (('food', 'okay', 'nothing', 'special'), 5),
 (('came', 'friend', 'birthday', 'dinner'), 5),
 (('service', 'took', 'hour', 'get'), 5),
 (('slow', 'service', 'waited', 'min'), 5),
 (('hate', 'leaving', 'bad', 'review'), 5),
 (('poor', 'service', 'long', 'wait'), 5),
 (('ordered', 'via', 'uber', 'eats'), 5),
 (('service', 'took', 'min', 'get'), 5),
 (('slow', 'service', 'waited', 'hour'), 5)]

In [72]:
df_quadgram_5 = (pd.Series(nltk.ngrams(lemmatized_5, 4)).value_counts())
df_quadgram_5 = pd.DataFrame(df_quadgram_5).reset_index()
df_quadgram_5.rename(columns={'index': 'word_5', 'count':'freq_5'},inplace=True)
df_quadgram_5 = df_quadgram_5.head(20)
df_quadgram_5
# count the quadgrams, convert to a DataFrame, reset the index, rename the columns, and keep the top 20 rows

Unnamed: 0,word_5,freq_5
0,"(ca, say, enough, thing)",65
1,"(amazing, food, amazing, service)",49
2,"(best, thai, food, toronto)",44
3,"(excellent, food, excellent, service)",38
4,"(food, even, better, service)",34
5,"(amazing, service, amazing, food)",27
6,"(finally, got, chance, try)",26
7,"(best, middle, eastern, food)",23
8,"(best, fish, chip, toronto)",21
9,"(best, thai, food, city)",19


In [73]:
df_quadgram_1 = (pd.Series(nltk.ngrams(lemmatized_1, 4)).value_counts())
df_quadgram_1 = pd.DataFrame(df_quadgram_1).reset_index()
df_quadgram_1.rename(columns={'index': 'word_1', 'count':'freq_1'},inplace=True)
df_quadgram_1 = df_quadgram_1.head(20)
df_quadgram_1
# count the quadgrams, convert to a DataFrame, reset the index, rename the columns, and keep the top 20 rows

Unnamed: 0,word_1,freq_1
0,"(bad, service, bad, food)",17
1,"(bad, food, bad, service)",16
2,"(worst, dining, experience, life)",15
3,"(hate, leave, bad, review)",8
4,"(took, hour, get, food)",6
5,"(ordered, uber, eats, food)",6
6,"(food, mediocre, best, service)",6
7,"(ordered, delivery, took, hour)",6
8,"(food, even, worse, service)",6
9,"(ordered, via, uber, eats)",5


In [74]:
df_quadgram = pd.concat((df_quadgram_5, df_quadgram_1), axis = 1)
df_quadgram
# concatenate the two DataFrames side by side (axis=1), aligned on the row index

Unnamed: 0,word_5,freq_5,word_1,freq_1
0,"(ca, say, enough, thing)",65,"(bad, service, bad, food)",17
1,"(amazing, food, amazing, service)",49,"(bad, food, bad, service)",16
2,"(best, thai, food, toronto)",44,"(worst, dining, experience, life)",15
3,"(excellent, food, excellent, service)",38,"(hate, leave, bad, review)",8
4,"(food, even, better, service)",34,"(took, hour, get, food)",6
5,"(amazing, service, amazing, food)",27,"(ordered, uber, eats, food)",6
6,"(finally, got, chance, try)",26,"(food, mediocre, best, service)",6
7,"(best, middle, eastern, food)",23,"(ordered, delivery, took, hour)",6
8,"(best, fish, chip, toronto)",21,"(food, even, worse, service)",6
9,"(best, thai, food, city)",19,"(ordered, via, uber, eats)",5


## Section 8: Final analysis with quadgrams of 5-star and 1-star reviews

Why the Bag of Words (BoW) Technique?

BoW is computationally efficient and scales to large datasets such as this one.
BoW turns each review into simple features, such as word counts, which makes patterns in customer feedback easy to detect.
BoW makes reviews directly comparable word by word, which helps to identify common themes efficiently. On its own it ignores word order; the n-grams used above recover some of that context.
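
As a small illustration of the idea, a bag of words can be built with nothing more than a counter over tokens. The two sample strings (borrowed from the top quadgrams) and the whitespace tokenizer are simplifying assumptions; the notebook relies on NLTK tokenization, stopword removal, and lemmatization instead.

```python
from collections import Counter

# Two toy "reviews" (illustrative strings, not full reviews from the dataset).
reviews = [
    "bad service bad food",
    "amazing food amazing service",
]

# One bag of words per review: word -> number of occurrences.
bags = [Counter(review.lower().split()) for review in reviews]

# Words that appear in both bags point to shared topics.
shared = set(bags[0]) & set(bags[1])
print(bags[0])
print(sorted(shared))
```

Both bags contain "food" and "service", which mirrors the pattern in the real output: positive and negative reviews talk about the same topics and differ mainly in the adjectives attached to them.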

Lemmatizing or Stemming?

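Lemmatizing reduces words to their dictionary base form (lemma), so that variants of a word are counted together; for example, "chips" becomes "chip" in "best fish chip toronto". 
Unlike stemming, which strips suffixes by rule and can produce non-words, lemmatizing returns real words, so the resulting n-grams stay readable. 
This makes lemmatizing the better fit for interpreting the sentiment of customer reviews.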
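
One caveat is visible in the 1-star output: "hate leave bad review" (8) and "hate leaving bad review" (5) are listed as separate quadgrams, so the two verb forms were not merged. NLTK's `WordNetLemmatizer.lemmatize` treats a word as a noun unless a `pos` argument is supplied, which may be the cause if the lemmatizer was called without part-of-speech tags. The sketch below uses a hand-written `VERB_LEMMAS` mapping (a hypothetical stand-in for a real lemmatizer) to show how the counts would combine if both forms mapped to "leave".

```python
from collections import Counter

# Counts copied from the 1-star quadgram output.
quadgram_counts = {
    ("hate", "leave", "bad", "review"): 8,
    ("hate", "leaving", "bad", "review"): 5,
}

# Hypothetical stand-in for verb-aware lemmatization.
VERB_LEMMAS = {"leaving": "leave"}

# Normalize each word, then add up the counts of quadgrams that now coincide.
merged = Counter()
for quadgram, n in quadgram_counts.items():
    normalized = tuple(VERB_LEMMAS.get(word, word) for word in quadgram)
    merged[normalized] += n

print(merged)
```

Merged, the phrase would count 13 occurrences, which would place it just below "worst dining experience life" (15) in the 1-star ranking.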


Final Thoughts:

Using BoW, lemmatization, and n-grams, we extracted key themes from the customer reviews: 5-star reviews repeatedly pair praise for the food with praise for the service, while 1-star reviews center on poor service and long waits.
Restaurants can use these insights to target specific improvements, such as wait times and service quality, and so raise customer satisfaction. 
This is data-driven decision-making!