## Problem Statements

The F&B space has witnessed massive growth in the past few years. And with the augment of high competition, restaurants owners are striving hard to be ahead of the other competitors. 

The goal is to conduct sentimental analysis on customers reviews to identify the factors that contributes to the success of a restaurant. And use classification system to determine whether or not a restaurant will be successful.


In [2]:
# import libraries
import pandas as pd    
import matplotlib.pyplot as plt
%matplotlib inline
import matplotlib.dates as mdates
import numpy as np

import nltk
from nltk.tokenize import sent_tokenize, word_tokenize, RegexpTokenizer
from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()
from nltk.stem.porter import PorterStemmer
from nltk.corpus import stopwords
from nltk.sentiment.vader import SentimentIntensityAnalyzer

import re
from sklearn.metrics import accuracy_score

from datetime import datetime

import pickle

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

## Load data

In [6]:
user_reviews = pd.read_pickle('./data/portland_reviews.pkl')

In [7]:
user_reviews.head()

NameError: name 'user_reviews' is not defined

In [8]:
# reset index
user_reviews = user_reviews.reset_index()

In [9]:
# Check missing value
user_reviews.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 753055 entries, 0 to 753054
Data columns (total 10 columns):
 #   Column       Non-Null Count   Dtype  
---  ------       --------------   -----  
 0   index        753055 non-null  int64  
 1   review_id    753055 non-null  object 
 2   user_id      753055 non-null  object 
 3   business_id  753055 non-null  object 
 4   stars        753055 non-null  float64
 5   useful       753055 non-null  int64  
 6   funny        753055 non-null  int64  
 7   cool         753055 non-null  int64  
 8   text         753055 non-null  object 
 9   date         753055 non-null  object 
dtypes: float64(1), int64(4), object(5)
memory usage: 57.5+ MB


### Add on data. Sentiment score for state 'MA' & 'OR' excluding Portland

### Load data

In [6]:
user_reviews_state = pd.read_pickle('./data/state_user_reviews.pkl')

In [8]:
user_reviews_state.head()

NameError: name 'user_reviews_state' is not defined

In [9]:
user_reviews_state.shape

NameError: name 'user_reviews_state' is not defined

In [9]:
# reset index
user_reviews_state = user_reviews_state.reset_index()

In [10]:
# Check missing value
user_reviews_state.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1529250 entries, 0 to 1529249
Data columns (total 10 columns):
 #   Column       Non-Null Count    Dtype  
---  ------       --------------    -----  
 0   index        1529250 non-null  int64  
 1   review_id    1529250 non-null  object 
 2   user_id      1529250 non-null  object 
 3   business_id  1529250 non-null  object 
 4   stars        1529250 non-null  float64
 5   useful       1529250 non-null  int64  
 6   funny        1529250 non-null  int64  
 7   cool         1529250 non-null  int64  
 8   text         1529250 non-null  object 
 9   date         1529250 non-null  object 
dtypes: float64(1), int64(4), object(5)
memory usage: 116.7+ MB


## Preprocessing text

1. Removing punctuations 
2. Lower casing
3. Removing stopword
4. Tokenization
5. Lemmatization


In [10]:
import string 
string.punctuation

'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'

In [11]:
stopwords = nltk.corpus.stopwords.words('english')

In [12]:
# Declare functions for preprocessing
def preprocessing(text):
    punctuationfree= "".join([i for i in text if i not in string.punctuation]) ## remove punctuation
    sentence = punctuationfree.lower() #convert uppercase to lowercase
    nostopword = " ".join([word for word in sentence.split() if word not in (stopwords)])
    tokenizer = RegexpTokenizer(r'[a-zA-Z0-9]+') # extract the sentence that contain alphabat and numbers only.
    tokens = tokenizer.tokenize(nostopword) #tokenize the word
    lemma_words=[lemmatizer.lemmatize(w) for w in tokens]
    new_text = ' '.join(lemma_words) #rejoin the sentence
    return nostopword

In [6]:
user_reviews['clean_msg']= user_reviews['text'].apply(lambda x:preprocessing(x))
user_reviews.head()

NameError: name 'user_reviews' is not defined

In [11]:
%%time
user_reviews_state['clean_msg']= user_reviews_state['text'].apply(lambda x:preprocessing(x))
user_reviews_state.head()

Wall time: 9min 33s


Unnamed: 0,index,review_id,user_id,business_id,stars,useful,funny,cool,text,date,clean_msg
0,0,lWC-xP3rd6obsecCYsGZRg,ak0TdVmGKo4pwqdJSTLwWw,buF9druCkbuXLX526sGELQ,4.0,3,1,1,Apparently Prides Osteria had a rough summer a...,2014-10-11 03:34:02,apparently prides osteria rough summer evidenc...
1,5,J4a2TuhDasjn2k3wWtHZnQ,RNm_RWkcd02Li2mKPRe7Eg,xGXzsc-hzam-VArK6eTvtw,1.0,2,0,0,"This place used to be a cool, chill place. Now...",2018-01-21 04:41:03,place used cool chill place bunch neanderthal ...
2,6,28gGfkLs3igtjVy61lh77Q,Q8c91v7luItVB0cMFF_mRA,EXOsmAB1s71WePlQk0WZrA,2.0,0,0,0,"The setting is perfectly adequate, and the foo...",2006-04-16 02:58:44,setting perfectly adequate food comes close di...
3,9,KKVFopqzcVfcubIBxmIjVA,99RsBrARhhx60UnAC4yDoA,EEHhKSxUvJkoPSzeGKkpVg,5.0,0,0,0,I work in the Pru and this is the most afforda...,2014-05-07 18:10:21,work pru affordable tasty place food court dea...
4,18,btNWW2kdJYfwpTDyzJO3Iw,DECuRZwkUw8ELQZfNGef2Q,zmZ3HkVCeZPBefJJxzdJ7A,4.0,0,0,0,Nothing special but good enough. I like anoth...,2012-12-04 04:29:47,nothing special good enough like another one m...


In [71]:
# Calculate numbers of words
user_reviews['words'] = [len(x.split()) for x in user_reviews['clean_msg'].tolist()]

In [12]:
# Calculate numbers of words
user_reviews_state['words'] = [len(x.split()) for x in user_reviews_state['clean_msg'].tolist()]

In [49]:
# drop rows where words count more than 512 so that no-error for running sentiment analysis models from hugging face.
#user_reviews.drop( user_reviews[ user_reviews['words'] >= 512].index , inplace=True)

In [50]:
user_reviews = user_reviews.reset_index()

In [13]:
user_reviews_state = user_reviews_state.reset_index()

In [83]:
user_reviews.shape

(753055, 12)

In [14]:
user_reviews_state.shape

(1529250, 13)

### Sentiment Analysis using hugging face

In [84]:
#!pip install --upgrade tensorflow
#!pip install transformers

In [7]:
from transformers import pipeline

In [8]:
sentiment = pipeline('sentiment-analysis')

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english)
All model checkpoint layers were used when initializing TFDistilBertForSequenceClassification.

All the layers of TFDistilBertForSequenceClassification were initialized from the model checkpoint at distilbert-base-uncased-finetuned-sst-2-english.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFDistilBertForSequenceClassification for predictions without further training.


In [87]:
user_reviews['sentiment_dict'] = user_reviews['clean_msg'].apply(lambda review: sentiment(review, truncation=True))

user_reviews

Unnamed: 0,index,review_id,user_id,business_id,stars,useful,funny,cool,text,date,clean_msg,words,sentiment_dict
0,35,nJTSr-EGNhhA5o146THkPg,T9O5pkKKlNvr-qqfefDlbA,luOZQ9YBrWwP8mYrS4rNoA,4.0,2,0,0,This place has some of the BEST chinese take-o...,2008-12-03 04:39:26,place best chinese takeout portland dont let i...,49,"[{'label': 'POSITIVE', 'score': 0.997427761554..."
1,42,lJ7rzbvT-l8KO8lHfEsXsg,LV1ME-ibA2h0IGyFUUWhaQ,H_RM2u1WWGU1HkKZrYq2Ow,5.0,0,0,0,Incredible donuts. Sometimes you have to go ea...,2017-08-07 19:34:13,incredible donuts sometimes go early full sele...,10,"[{'label': 'POSITIVE', 'score': 0.999814927577..."
2,45,m-9DK7NwYedIPj1RQ_sXdw,JuM-lH05m6Ln8OPUTg8p0g,H_RM2u1WWGU1HkKZrYq2Ow,5.0,0,0,0,"Dont bother going to voodoo, just come here in...",2016-09-18 17:06:01,dont bother going voodoo come instead unique f...,14,"[{'label': 'NEGATIVE', 'score': 0.877250492572..."
3,49,EO5rALvJMkK8QEvUNs1gxg,u2xPfv6_wcKt-lW-C1cV8A,9P-lp3AWDXGayDqJz9VPwQ,2.0,0,0,0,The ramen here is less than great. It came out...,2018-02-11 03:30:12,ramen less great came luke warm oily service r...,11,"[{'label': 'POSITIVE', 'score': 0.629967510700..."
4,67,OH9E5SaGBQsPX3IktM30mg,7mWnNVk2n99JxkvV3PW0nA,Un6u2cECyV4nZb_HGZ-uTA,4.0,1,0,1,It's crazy how establishments on the west coas...,2011-02-13 16:38:09,crazy establishments west coast many yelp revi...,101,"[{'label': 'POSITIVE', 'score': 0.958464384078..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...
753050,8635355,CCv8DP1zZyoI0FvgEdeTgg,7pGvyqTBe5vcmKG29Gcz3Q,d69y3CN9_SQKrEnXXqQr8g,5.0,1,0,0,King burrito is my go to Burritos when I'm nea...,2020-10-05 18:30:46,king burrito go burritos im near enough area b...,79,"[{'label': 'POSITIVE', 'score': 0.995506882667..."
753051,8635371,pevp4H0U6Q7UKl-J9PBLLQ,yDeqOLo8pp1xpzHWBKFEfA,2JKien1H998FluEYs0xIrg,5.0,3,0,1,Where do I start with Buranko? Everything I tr...,2020-12-06 23:19:24,start buranko everything tried matcha cocktail...,18,"[{'label': 'NEGATIVE', 'score': 0.976583719253..."
753052,8635391,XpVatkv32ZiY3Mv_cR26Mw,UvlvbgQaADuIoE2bEQYJ1A,_VF1CWhsQWv77Yi92ORo1w,5.0,0,0,0,Wonderful! We stopped in twice while taking ou...,2021-01-20 20:07:19,wonderful stopped twice taking son bacon nonba...,60,"[{'label': 'POSITIVE', 'score': 0.994185268878..."
753053,8635397,FfhmA0G0zrRjHskp-7O8UQ,IlxM3NGJOtNXPz5cupqNDQ,dmkDZKPsK8lmwFuLiFQ0Zw,5.0,0,0,0,Yes please! We had the Arepas falafel style an...,2021-01-25 14:53:13,yes please arepas falafel style maccurles fres...,15,"[{'label': 'POSITIVE', 'score': 0.999734461307..."


In [16]:
# Split MA and OR user reviews into 6 parts to pass into huggingface transformer separately.
x = np.array_split(user_reviews_state, 6)

In [17]:

user_reviews_state_b1 = x[0]

user_reviews_state_b2 = x[1]

user_reviews_state_b3 = x[2]


user_reviews_state_b4 = x[3]

user_reviews_state_b5 = x[4]

user_reviews_state_b6 = x[5]

In [22]:
%%time
user_reviews_state_b1['sentiment_dict'] = user_reviews_state_b1['clean_msg'].apply(lambda review: sentiment(review, truncation=True))
user_reviews_state_b1

Wall time: 7h 25min 29s


Unnamed: 0,level_0,index,review_id,user_id,business_id,stars,useful,funny,cool,text,date,clean_msg,words,sentiment_dict
0,0,0,lWC-xP3rd6obsecCYsGZRg,ak0TdVmGKo4pwqdJSTLwWw,buF9druCkbuXLX526sGELQ,4.0,3,1,1,Apparently Prides Osteria had a rough summer a...,2014-10-11 03:34:02,apparently prides osteria rough summer evidenc...,174,"[{'label': 'POSITIVE', 'score': 0.941629588603..."
1,1,5,J4a2TuhDasjn2k3wWtHZnQ,RNm_RWkcd02Li2mKPRe7Eg,xGXzsc-hzam-VArK6eTvtw,1.0,2,0,0,"This place used to be a cool, chill place. Now...",2018-01-21 04:41:03,place used cool chill place bunch neanderthal ...,27,"[{'label': 'NEGATIVE', 'score': 0.998412013053..."
2,2,6,28gGfkLs3igtjVy61lh77Q,Q8c91v7luItVB0cMFF_mRA,EXOsmAB1s71WePlQk0WZrA,2.0,0,0,0,"The setting is perfectly adequate, and the foo...",2006-04-16 02:58:44,setting perfectly adequate food comes close di...,20,"[{'label': 'POSITIVE', 'score': 0.975937008857..."
3,3,9,KKVFopqzcVfcubIBxmIjVA,99RsBrARhhx60UnAC4yDoA,EEHhKSxUvJkoPSzeGKkpVg,5.0,0,0,0,I work in the Pru and this is the most afforda...,2014-05-07 18:10:21,work pru affordable tasty place food court dea...,32,"[{'label': 'POSITIVE', 'score': 0.992754995822..."
4,4,18,btNWW2kdJYfwpTDyzJO3Iw,DECuRZwkUw8ELQZfNGef2Q,zmZ3HkVCeZPBefJJxzdJ7A,4.0,0,0,0,Nothing special but good enough. I like anoth...,2012-12-04 04:29:47,nothing special good enough like another one m...,17,"[{'label': 'NEGATIVE', 'score': 0.998852014541..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
254870,254870,1365416,dSjc_GhOmeyXTZp_AuY7ow,wWjhYtHyA_q4EI692RT16A,EmZep9cupsMffqCAvjr5fg,4.0,0,0,1,Healthy fast food? What a concept. I'm a huge ...,2007-10-18 00:48:29,healthy fast food concept im huge fan veggie b...,24,"[{'label': 'POSITIVE', 'score': 0.999549329280..."
254871,254871,1365421,hCeBIEV8-55ptdp9spfzzA,k-SEvnJluFQjAsaGSgE9Sg,c67rQbz3CEXyI0nd5kG-Uw,3.0,0,0,1,"Ahh, Starbucks' rival situated right across th...",2010-01-05 09:41:21,ahh starbucks rival situated right across stre...,81,"[{'label': 'NEGATIVE', 'score': 0.984495878219..."
254872,254872,1365432,Qg2G4RgisZ7as-rm7Zd9ow,ld5H3Nf3xhUuvybL1uMFng,-hdRA7xgZPnDozuYhkFycg,1.0,1,0,0,Go here if you (or your kids) have a hankering...,2018-01-19 22:08:01,go kids hankering games otherwise worst possib...,39,"[{'label': 'NEGATIVE', 'score': 0.999691128730..."
254873,254873,1365434,cirIXjD3m8hBo8ULygVKGQ,c3uVgOugNz-7IrAZHEIFsg,06feX4qEHFMcPZsiWtvYfw,4.0,0,0,0,Excellent coffee. I have had their Iced Coffee...,2016-06-05 20:01:28,excellent coffee iced coffee cold brew previou...,34,"[{'label': 'POSITIVE', 'score': 0.997478663921..."


In [23]:
user_reviews_state_b1.to_pickle('./data/user_reviews_state_b1.pkl')

In [24]:
%%time
user_reviews_state_b2['sentiment_dict'] = user_reviews_state_b2['clean_msg'].apply(lambda review: sentiment(review, truncation=True))
user_reviews_state_b2

Wall time: 7h 26min 44s


Unnamed: 0,level_0,index,review_id,user_id,business_id,stars,useful,funny,cool,text,date,clean_msg,words,sentiment_dict
254875,254875,1365439,vXXDRPPe_DWXpchRrtzULg,OsZ_5G7zbXycL6Evf0ElBA,TvR5gt1ZiPx5Hnouz_i0pA,4.0,0,0,0,Two words: Nutella. Latte. Even if you're not ...,2018-01-09 01:19:43,two words nutella latte even youre particularl...,98,"[{'label': 'POSITIVE', 'score': 0.989696919918..."
254876,254876,1365441,BfE_9K_kiG02L4U_K2r1CQ,QzTXHL2oFRghfKLabzkXzA,VnuD2cojPTWd3nIHQjnL8w,5.0,1,0,0,"I was in Boston for a week on business, and I ...",2011-06-25 18:53:58,boston week business excited would opportunity...,129,"[{'label': 'POSITIVE', 'score': 0.980813443660..."
254877,254877,1365447,wgLyWOkV1xFnE-pFp9_rQg,eEKPzAEdgeEoy19o0Yq4Ng,yNYyM-xdZDh_rITFDMp2tg,2.0,0,0,0,Went there for All-you-can-eat dinner on Sunda...,2016-10-12 20:53:24,went allyoucaneat dinner sunday friend didnt w...,133,"[{'label': 'NEGATIVE', 'score': 0.995622098445..."
254878,254878,1365448,VLa5AGGtCJI5pWUVPjt9cA,kMpSSx0VEiLFiA18sGbEpQ,tfdIB3AlviYKVYguaT1S8g,5.0,2,0,2,Just came here tonight after they opened yeste...,2009-09-27 00:41:54,came tonight opened yesterday say wow half doz...,236,"[{'label': 'NEGATIVE', 'score': 0.994411051273..."
254879,254879,1365451,asxCQ4KCe12qu0XT7mtwXA,NJOspTE_Ms-b8ACPKY73Ww,WYnXpXym2DMtwchDcbilmg,4.0,2,1,2,Great dough means great Pizza\n\nFor whatever ...,2018-01-30 19:23:49,great dough means great pizza whatever reason ...,93,"[{'label': 'POSITIVE', 'score': 0.972228586673..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
509745,509745,2841025,TqjIKMZghsihYuDjLZMRvA,ZM6XDLkChp6X_jYylpEjZQ,3kUWjyRGWc_RUGDw9X3qUw,4.0,1,0,0,its great food in cleveland circle. the break...,2013-08-23 17:53:48,great food cleveland circle breakfast sandwich...,45,"[{'label': 'POSITIVE', 'score': 0.998139023780..."
509746,509746,2841026,xpNAQCfx-35eAfu65DqrQA,kUdX6itbIh0Fjh41UXgxiQ,kt2mUzQEBX5oYngareeyQA,5.0,1,0,0,I went out with my high school friends and the...,2017-10-29 22:42:11,went high school friends wives last night rose...,32,"[{'label': 'POSITIVE', 'score': 0.999428212642..."
509747,509747,2841028,wWvXHveDv1JpyP6k3lwhcQ,4QWiI4dWxxd2wU6vEeTiSA,Tbq4hgX7uUzgFUWDlmqnBw,3.0,0,0,0,"So, not like this affected my opinion of this ...",2011-03-30 00:03:45,like affected opinion place found boston doesn...,62,"[{'label': 'NEGATIVE', 'score': 0.981866538524..."
509748,509748,2841029,HgVM_QB-3xTbS_edJYJ4Fw,UytHAPnIKow1gmab3NfGzg,dbljEYM1md67eqYbK2LHNA,4.0,0,1,0,"For those not in the know of the North Shore, ...",2015-12-29 23:04:57,know north shore say imo chinese restaurant ot...,169,"[{'label': 'POSITIVE', 'score': 0.951515376567..."


In [25]:
user_reviews_state_b2.to_pickle('./data/user_reviews_state_b2.pkl')

In [26]:
%%time
user_reviews_state_b3['sentiment_dict'] = user_reviews_state_b3['clean_msg'].apply(lambda review: sentiment(review, truncation=True))
user_reviews_state_b3

Wall time: 6h 58min 14s


Unnamed: 0,level_0,index,review_id,user_id,business_id,stars,useful,funny,cool,text,date,clean_msg,words,sentiment_dict
509750,509750,2841047,tSGXfr7IeeooBtx1na8fBg,qqhx6Z0PGTs1Z6JxxyKD6Q,UAPC8w0gfnay_2bjo6Oxlw,5.0,0,0,0,"Delicious, delicious, delicious! Stop thinkin...",2012-12-17 17:16:50,delicious delicious delicious stop thinking ea...,16,"[{'label': 'POSITIVE', 'score': 0.999845743179..."
509751,509751,2841050,01qIQR_UJGvBb-nyxAtgiQ,QVT7iO4asnO4iv2kCYFzaQ,3DjBQEw27_cMm_0H5I93CQ,3.0,2,1,0,I've been here for breakfast a few times and h...,2015-02-23 14:16:59,ive breakfast times generally good experience ...,84,"[{'label': 'NEGATIVE', 'score': 0.993640840053..."
509752,509752,2841061,LD57M-CFpSg-IwolAVAsdA,IKpOC19QI_I96vdJhrMEaA,TJ2TqRQPi4V-QqVSyG8fcA,3.0,3,3,4,"If it's a cold winter day, or you're just look...",2014-07-21 15:33:59,cold winter day youre looking betterthanramen ...,99,"[{'label': 'POSITIVE', 'score': 0.967772424221..."
509753,509753,2841068,rUH0kkGHcmpPyIBe6JBIeg,Y7ihQ8cVQTxO4LTYVuyd2Q,63DvXSks1tHIDajOGvwnRQ,4.0,3,2,4,This place has good dessert. I have been comin...,2011-09-05 22:22:31,place good dessert coming years never disappoi...,70,"[{'label': 'POSITIVE', 'score': 0.993344426155..."
509754,509754,2841069,AlKvgIw-xhGc-5vHzVbnqQ,UCiPaQcM_UgCMbJqtrVduQ,mtG5TI0nPbrty2zf8Gz7Jg,3.0,0,0,0,"this place is right next to my office, and see...",2014-08-18 20:50:37,place right next office seemed like good spot ...,46,"[{'label': 'NEGATIVE', 'score': 0.980192840099..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
764620,764620,4383303,GANvZNlQnBcNYcnhm6m3aA,5dZ_waKhnA3VKAjoauasfw,WeAMq_YGgAnuayjJHi8gQg,5.0,1,0,0,This place is amazing!! Earth Wind and Fire sa...,2018-05-20 13:37:20,place amazing earth wind fire sandwich incredi...,25,"[{'label': 'POSITIVE', 'score': 0.999782741069..."
764621,764621,4383306,1FXODyjS4KaX6F44y0eL9Q,GMough92VMnVOqIDau0DTw,E595x5S7SkVXePXRb1zfeg,5.0,0,0,0,Practically ate my fingers off! So delicious a...,2016-04-25 00:37:39,practically ate fingers delicious flavorful ne...,52,"[{'label': 'POSITIVE', 'score': 0.998603403568..."
764622,764622,4383307,o8fYNhBoVrePyi_RxzgJUw,EmJWWbJiqOU6N2vpO1-v6Q,dmLrPLWGLGkI3qBixFDEvw,4.0,0,0,0,"For their food, this is my favourite bakery in...",2013-10-27 02:29:02,food favourite bakery chinatown make best pork...,113,"[{'label': 'NEGATIVE', 'score': 0.995082020759..."
764623,764623,4383313,03aDdj-oi4NtPj0OJgYhow,0EFQQB40D24-WuIFH6-rpQ,bt25E3D09r6rNSwUwYr1pw,3.0,1,1,1,One of the first things I want to find to be c...,2007-06-17 01:25:15,one first things want find comfortable new tow...,96,"[{'label': 'NEGATIVE', 'score': 0.873414993286..."


In [27]:
user_reviews_state_b3.to_pickle('./data/user_reviews_state_b3.pkl')

In [28]:
%%time
user_reviews_state_b4['sentiment_dict'] = user_reviews_state_b4['clean_msg'].apply(lambda review: sentiment(review, truncation=True))
user_reviews_state_b4

Wall time: 7h 9min 4s


Unnamed: 0,level_0,index,review_id,user_id,business_id,stars,useful,funny,cool,text,date,clean_msg,words,sentiment_dict
764625,764625,4383319,65nvHVPpLa5__iGasT2umA,faE9EeR2Ju5gC4LbvL13UQ,j_fhXdgayrGKD1xTr2bJJQ,4.0,1,0,0,It's a small cozy place with exotic decor and ...,2012-10-11 20:18:28,small cozy place exotic decor tasty food tried...,23,"[{'label': 'POSITIVE', 'score': 0.995652914047..."
764626,764626,4383327,5idTgJPdpLayFwUZy7z2PQ,Ow0U1QnwvrQK0Wcig5fWkg,4gclbVl9p3st4vjBTTNN4Q,5.0,0,0,0,I was in the mood for something quick so I sto...,2009-10-25 18:12:29,mood something quick stopped new pats trattori...,66,"[{'label': 'POSITIVE', 'score': 0.997915804386..."
764627,764627,4383328,MAaEO8yWYkp8QgpZHhnMqA,u6WOVeQaV-Psk2lO_5sVMA,3hW1nSUX8BsNSJGxGqcZCw,4.0,0,1,0,good. fresh. great service. good prices. prett...,2014-02-27 01:09:45,good fresh great service good prices pretty am...,13,"[{'label': 'POSITIVE', 'score': 0.999777019023..."
764628,764628,4383329,5BUGffUjkc5wX85cWYOAYA,bpC9Gt5fcN_1OjDEgjuvpw,pXXrBOdooMzg12bW57xWDw,1.0,2,0,0,I was craving a Five Guys hot dog and went her...,2013-06-25 15:15:51,craving five guys hot dog went really sloppy b...,44,"[{'label': 'NEGATIVE', 'score': 0.989550054073..."
764629,764629,4383331,I-buQfYH7H5h4EYUQAazEw,89yr7wNIk8TE03yKfTRLuQ,SspVW7z1OvbojeFs6InjJA,4.0,3,0,1,"Save this restaurant, please!\n\nOn a recent F...",2008-05-10 23:42:39,save restaurant please recent friday night tw ...,174,"[{'label': 'NEGATIVE', 'score': 0.569564104080..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1019495,1019495,5712377,EqcwHtdg2Dt1821c7QUunQ,gfQqQYI5_hCAGEHlHXIz2Q,hzSyQBWeoX94WOpUtxuWVg,3.0,3,0,2,"After many failed attempts, we finally made it...",2008-07-24 13:44:15,many failed attempts finally made cottage orde...,59,"[{'label': 'NEGATIVE', 'score': 0.992341160774..."
1019496,1019496,5712382,5uWdsQhVmNjHr8uAW4_MTg,8NQfyo7Q4fkI8wvJ_VjKPg,6SMj1FNYRUxINR1smZMcrw,4.0,0,0,0,It was a beautiful day for a picnic in the par...,2009-05-07 17:04:15,beautiful day picnic parkso decided give strip...,50,"[{'label': 'POSITIVE', 'score': 0.998371183872..."
1019497,1019497,5712386,CJnYnhXBf-jvD8nmEDgU6Q,raDxhVz-c1sjlKNc3Mitiw,TR5H53J3ASZAswpDxgEMtg,5.0,3,2,2,"Good burritos, nice atmosphere. We sat there f...",2014-07-24 16:29:00,good burritos nice atmosphere sat chatting own...,13,"[{'label': 'POSITIVE', 'score': 0.998202204704..."
1019498,1019498,5712388,goPtIHfQleZt7xzj1KKwmw,3Bv-fauoFlnSkcydjUnInQ,bOsi1qC5rkfZ5-JKuP38Bg,1.0,3,0,0,Food was very bad. Mussels tasted very off (as...,2016-08-25 02:15:04,food bad mussels tasted bad items felt like ma...,28,"[{'label': 'NEGATIVE', 'score': 0.999747574329..."


In [29]:
user_reviews_state_b4.to_pickle('./data/user_reviews_state_b4.pkl')

In [18]:
%%time
user_reviews_state_b5['sentiment_dict'] = user_reviews_state_b5['clean_msg'].apply(lambda review: sentiment(review, truncation=True))
user_reviews_state_b5

Wall time: 7h 17min 17s


Unnamed: 0,level_0,index,review_id,user_id,business_id,stars,useful,funny,cool,text,date,clean_msg,words,sentiment_dict
1019500,1019500,5712400,WExK1XiQIQBTwXKkm0vs7A,-aOZfLtxyWvc8ywDS7AIKQ,kEu0XqDrQNbzkBF8BZta3w,4.0,0,0,0,My friend and I swung by Sibling Rivalry just ...,2009-02-27 08:38:22,friend swung sibling rivalry dessert dinner ad...,55,"[{'label': 'POSITIVE', 'score': 0.997495472431..."
1019501,1019501,5712414,Im3YZ7c338buIh8BUafX_A,8GFaImY3Z4h83uP8JjWiBQ,89vuE7Q_TiEjsBbrznoFlA,2.0,0,0,0,"boy, what an average, expensive meal. $30 for...",2016-11-09 00:42:38,boy average expensive meal 30 crab cake shrimp...,16,"[{'label': 'NEGATIVE', 'score': 0.939373910427..."
1019502,1019502,5712417,UqVOAIAqpRrZaGjcxLzPlA,UAvHIpbXzQqJAXePDTJkZw,s9c3ufuhJ3phG1XI0zey0Q,4.0,1,0,0,Excellent place to grab breakfast! It was a ho...,2016-01-16 18:46:29,excellent place grab breakfast horribly rainy ...,70,"[{'label': 'POSITIVE', 'score': 0.509173989295..."
1019503,1019503,5712419,mD0zoCYxiIW5BBtMxmH7og,U2HffzKY5zOxEWrAhNijZQ,58cDJU4cub1o0o0B4h9GrA,4.0,0,0,0,We went to Craigie on Main for my graduation d...,2012-05-26 19:23:10,went craigie main graduation dinner initially ...,380,"[{'label': 'POSITIVE', 'score': 0.660971760749..."
1019504,1019504,5712420,sh57JUNzNlsWBR_uNotUMA,_f2gLN3JSI-WYpgakE66aw,DuT5BBmKRtcGYQzKkoEiZw,2.0,0,0,0,Come here if you want to see a replica of the ...,2011-05-21 17:36:05,come want see replica bar tv show purchase sou...,42,"[{'label': 'NEGATIVE', 'score': 0.997037887573..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1274370,1274370,7209211,biswCr00Fog-w31YfYOX7g,QdESTLJtOtjEL3LvRwzUPQ,CkuBBI7ROUFNNF8mSQl8tQ,4.0,1,1,1,"Falafel that's actually fresh, soft, and hot! ...",2012-05-10 18:03:14,falafel thats actually fresh soft hot youre ne...,31,"[{'label': 'POSITIVE', 'score': 0.997953057289..."
1274371,1274371,7209214,vDi4k3Sw8x0TxF-cBtWCNA,B13luJQsN3P0yIR93DBxbg,9RH1xBIMMAZXg9s_XW_sLg,5.0,0,0,0,Best Italian in town period. You just cannot ...,2017-09-27 20:15:22,best italian town period cannot go wrong place...,27,"[{'label': 'POSITIVE', 'score': 0.964645028114..."
1274372,1274372,7209215,jerpldZeJC_KvoioThtJIA,Ha12xaINJ0-aEbjnLoAAyQ,WH7YE7GN_7nKcpuoHh03eA,5.0,1,1,0,"Amazing, filling donuts that are cheaper than ...",2014-11-23 03:12:52,amazing filling donuts cheaper one dunkin donu...,41,"[{'label': 'POSITIVE', 'score': 0.961159706115..."
1274373,1274373,7209232,NaoNuWI9OhwpTNgBuYhLsA,nIg5Gb-mpG3NS9n2zi5d9A,MgmTEMoY2RgZS1Q02HacGQ,4.0,0,0,0,"Good coffee, chill vibe, no free internet. P.S...",2012-04-18 23:53:09,good coffee chill vibe free internet ps super ...,14,"[{'label': 'POSITIVE', 'score': 0.996456444263..."


In [19]:
user_reviews_state_b5.to_pickle('./data/user_reviews_state_b5.pkl')

In [20]:
%%time
user_reviews_state_b6['sentiment_dict'] = user_reviews_state_b6['clean_msg'].apply(lambda review: sentiment(review, truncation=True))
user_reviews_state_b6

Wall time: 7h 16min 43s


Unnamed: 0,level_0,index,review_id,user_id,business_id,stars,useful,funny,cool,text,date,clean_msg,words,sentiment_dict
1274375,1274375,7209234,KWSAYwWRXxbEs7eDXnZASw,Bc4GZfIUJu9363QQVACZVA,udMDXtKnJMQecjnqipfbSQ,5.0,0,0,0,Go-to place for cheap sushi! I always come her...,2013-01-18 05:46:24,goto place cheap sushi always come friendfamil...,79,"[{'label': 'NEGATIVE', 'score': 0.983477294445..."
1274376,1274376,7209235,_6DifNONvHahJP_K9re_ww,6Mw5RwKjVaBSyP1nR0j9iw,IOpXmCLtQ3OHvvMzOXaPbg,4.0,0,0,0,I had a late lunch there twice and both times ...,2011-06-12 20:05:24,late lunch twice times pretty quiet food came ...,26,"[{'label': 'POSITIVE', 'score': 0.998069226741..."
1274377,1274377,7209245,eMD3HPImmUjs7BD3d0UyQQ,1R5Q9RYWyLA25oc-jVgpCQ,8p7gjoSXCqFga82Ei7LTVg,4.0,0,0,0,I've ordered plenty of delivery from this plac...,2018-03-30 16:07:00,ive ordered plenty delivery place major mistak...,19,"[{'label': 'POSITIVE', 'score': 0.755156755447..."
1274378,1274378,7209246,ftxfFrl_AJK-vsYl3UXaMg,dHgttjMtwU2yLuzpagnxKg,pD3l4wyuebOvbrxBrXmXxQ,4.0,0,1,0,We went there last night. It was pretty good. ...,2014-10-19 20:33:51,went last night pretty good 2 kinds oysters pe...,34,"[{'label': 'POSITIVE', 'score': 0.999579370021..."
1274379,1274379,7209253,A8UR12Xse6SNNDXMdyfYEw,v9PVFhU_SvJwVlF_iv1s7w,ZKJVjVofznYh1x2kSTyfGw,4.0,0,0,0,This place has the best nachos in Southie...fo...,2011-11-19 01:31:34,place best nachos southiefor whatever thats wo...,22,"[{'label': 'NEGATIVE', 'score': 0.999425411224..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1529245,1529245,8635381,DwGTdEcpVg9n7juq5rRFyA,9U30DANobkrn3Zlu6T9p6Q,biYgFkftCPue7g3upflUOg,4.0,2,0,0,Trade's 3 star average is baffling. I get some...,2012-01-04 01:40:36,trades 3 star average baffling get people whin...,154,"[{'label': 'NEGATIVE', 'score': 0.958492338657..."
1529246,1529246,8635386,Kc9bVQvMwUjehqhPkDBEvQ,wbhaPKJIYryGrCBVWJ7Nmg,cgNDiWCaSlqqxx1A6r65bA,5.0,0,0,0,Please support this place! They are trying to ...,2021-01-17 02:04:29,please support place trying everything right p...,21,"[{'label': 'POSITIVE', 'score': 0.974796473979..."
1529247,1529247,8635388,czSnN2gwIyfqOVt2qcfi-g,1WGPvc_cDXt-IPVNqg5BOA,FxveeHL_B0Kkz1KjPKyF3A,5.0,0,0,0,They make a really great burger for the league...,2021-01-18 02:55:18,make really great burger league theyre playing...,23,"[{'label': 'NEGATIVE', 'score': 0.661733031272..."
1529248,1529248,8635395,1LrBZbLNfkBsFrHwEBlfSg,fYkURme6Piqxu4qUjQV3PQ,gEQxTJDoJYaW0l_6FYtf8g,5.0,1,0,0,Best pizza in the neighborhood!!! Love the thi...,2020-12-05 21:32:45,best pizza neighborhood love crust moderate am...,10,"[{'label': 'POSITIVE', 'score': 0.992046713829..."


In [21]:
user_reviews_state_b6.to_pickle('./data/user_reviews_state_b6.pkl')

## Merge all the user reviews for state MA and OR

In [3]:
user_reviews_portland = pd.read_pickle('./data/user_review_sentiment.pkl')

In [5]:
user_reviews_portland.drop(labels=['sentiment_label','sentiment_score','sentiment_final'], axis = 1, inplace =True)

In [15]:
user_reviews_b1 = pd.read_pickle('./data/user_reviews_state_b1.pkl')

In [16]:
user_reviews_b2 = pd.read_pickle('./data/user_reviews_state_b2.pkl')

In [17]:
user_reviews_b3 = pd.read_pickle('./data/user_reviews_state_b3.pkl')

In [18]:
user_reviews_b4 = pd.read_pickle('./data/user_reviews_state_b4.pkl')

In [19]:
user_reviews_b5 = pd.read_pickle('./data/user_reviews_state_b5.pkl')

In [20]:
user_reviews_b6 = pd.read_pickle('./data/user_reviews_state_b6.pkl')

In [21]:
user_reviews_final = pd.concat([user_reviews_portland, user_reviews_b1, user_reviews_b2, user_reviews_b3, user_reviews_b4, user_reviews_b5, user_reviews_b6], ignore_index=True)

In [25]:
user_reviews_final.drop(labels=['index','level_0'], axis=1, inplace = True) 

In [31]:
user_reviews_final.to_pickle('./data/user_reviews_final.pkl')

### Create New column for distilbert analysis 'label'

In [32]:
user_reviews_final['sentiment_label'] = user_reviews_final['sentiment_dict'].apply(lambda score_dict: score_dict[0]['label'])

In [33]:
## Label POSITIVE = 1 , NEGATIVE = -1

user_reviews_final['sentiment_label'].replace({'POSITIVE': 1, 'NEGATIVE': -1},inplace=True)

### Create New column for distilbert analysis 'score'

In [34]:
user_reviews_final['sentiment_score'] = user_reviews_final['sentiment_dict'].apply(lambda score_dict: score_dict[0]['score'])

In [35]:
# Multiply sentiment_label * sentiment_score to get compound score.
user_reviews_final['sentiment_final'] = user_reviews_final['sentiment_label'] * user_reviews_final['sentiment_score']

In [36]:
### Export user_reviews with sentiment score
user_reviews_final.to_pickle('./data/user_reviews_sentiment_final.pkl')

### Calculate the mean sentiment score for each restaurants groupby 'businessid'

In [37]:
user_reviews_final.groupby(by='business_id')['sentiment_final'].mean()

business_id
--6COJIAjkQwSUZci_4PJQ    0.594357
--UNNdnHRhsyFUbDgumdtQ    0.417455
-00d-Qb0q2TcWn-8LBHDZg   -0.273694
-0Gbsd7ztvTyFpl7jF0DIw    0.297091
-0iqnv7MjKrgh7Q7bYRlUQ   -0.000902
                            ...   
zyNQhunb1mcSUUbnqVcU1w   -0.148889
zyauuvAYdVweBK4L7wBRmw   -0.118130
zzO0rjxjVAutcqFnI4VvAg   -0.483112
zzcdycb7S42VnnZkwE4yNA   -0.728696
zzpmoTVq4yn86U7ArHyFBQ    0.380740
Name: sentiment_final, Length: 18093, dtype: float64

In [45]:
# Create dataframe for the mean sentiment score
sentiment_score = user_reviews_final.groupby(by='business_id')['sentiment_final'].mean().to_frame().reset_index()

In [46]:
sentiment_score.head()

Unnamed: 0,business_id,sentiment_final
0,--6COJIAjkQwSUZci_4PJQ,0.594357
1,--UNNdnHRhsyFUbDgumdtQ,0.417455
2,-00d-Qb0q2TcWn-8LBHDZg,-0.273694
3,-0Gbsd7ztvTyFpl7jF0DIw,0.297091
4,-0iqnv7MjKrgh7Q7bYRlUQ,-0.000902


In [47]:
sentiment_score.describe()

Unnamed: 0,sentiment_final
count,18093.0
mean,0.079334
std,0.383843
min,-0.999254
25%,-0.170261
50%,0.118951
75%,0.353336
max,0.999456


In [49]:
## Export sentiment_score
sentiment_score.to_csv('./data/sentiment_score.csv', index =False)