# Recalibrated Restaurant Rating Based On Sentiment Analysis - Part 2

## Analysis (continued)
### Clustering

In [1]:
# Import relevant packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans
from sklearn.cluster import AgglomerativeClustering
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, precision_score, recall_score

In [2]:
# Load the DataFrame saved by the Part 1 notebook, to avoid re-running the RoBERTa model every time
df = pd.read_csv('data/master_df.csv')

In [3]:
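# n_init=10 matches the previous scikit-learn default and silences its FutureWarning about n_init
km = KMeans(n_clusters=5, random_state=5, n_init=10)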

#### K-Means Clustering - Version 1

In [4]:
km.fit(df[['roberta_neg','roberta_neu','roberta_pos']])

In [5]:
df['Recalibrated_Star_Rating'] = km.labels_
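`km.labels_` holds arbitrary cluster ids from 0 to 4, so `Recalibrated_Star_Rating` stores cluster ids rather than stars; the comments in the cells below assign a star rating to each cluster by inspection. The sketch below automates that step under one stated assumption: clusters can be ranked by their centroid's net sentiment (`roberta_pos` minus `roberta_neg`), with the most negative cluster becoming 1 star and the most positive becoming 5 stars. The helper `cluster_to_star_map` and the toy scores are hypothetical and not part of the original notebook.

```python
import numpy as np
from sklearn.cluster import KMeans


def cluster_to_star_map(centers, pos_col, neg_col):
    """Map cluster ids to 1-5 stars by ranking centroids on net sentiment.

    Hypothetical helper: assumes a higher (pos - neg) centroid score
    should mean a higher star rating.
    """
    net = centers[:, pos_col] - centers[:, neg_col]
    order = np.argsort(net)  # cluster ids, most negative centroid first
    return {int(cluster): star for star, cluster in enumerate(order, start=1)}


# Toy [neg, neu, pos] scores standing in for the RoBERTa columns (made up)
rng = np.random.default_rng(0)
true_centers = np.array([
    [0.90, 0.08, 0.02],
    [0.60, 0.30, 0.10],
    [0.20, 0.60, 0.20],
    [0.10, 0.30, 0.60],
    [0.02, 0.08, 0.90],
])
X = np.vstack([c + rng.normal(0, 0.01, size=(50, 3)) for c in true_centers])

km_toy = KMeans(n_clusters=5, random_state=5, n_init=10).fit(X)
mapping = cluster_to_star_map(km_toy.cluster_centers_, pos_col=2, neg_col=0)
stars = np.array([mapping[label] for label in km_toy.labels_])
```

With the Version 1 feature order `['roberta_neg','roberta_neu','roberta_pos']`, the same idea on the real data would be `df['Recalibrated_Star_Rating'].map(cluster_to_star_map(km.cluster_centers_, pos_col=2, neg_col=0))`. Ranking by net sentiment is only one heuristic; clusters that differ mainly in neutrality may not order cleanly this way.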

In [6]:
# Distribution of recalibrated star rating for 5 star reviews
df[df['review_rating']==5]['Recalibrated_Star_Rating'].value_counts()

0    1756
4      58
2      23
1       6
3       1
Name: Recalibrated_Star_Rating, dtype: int64

In [7]:
# Distribution of recalibrated star rating for 4 star reviews
df[df['review_rating']==4]['Recalibrated_Star_Rating'].value_counts()

0    431
4     73
2     24
1     24
3      6
Name: Recalibrated_Star_Rating, dtype: int64

In [8]:
# Distribution of recalibrated star rating for 3 star reviews
df[df['review_rating']==3]['Recalibrated_Star_Rating'].value_counts()

0    44
4    43
3    43
1    39
2    30
Name: Recalibrated_Star_Rating, dtype: int64

In [9]:
# Distribution of recalibrated star rating for 2 star reviews
df[df['review_rating']==2]['Recalibrated_Star_Rating'].value_counts()

3    59
1    22
2    10
4     9
0     1
Name: Recalibrated_Star_Rating, dtype: int64

In [10]:
# Distribution of recalibrated star rating for 1 star reviews
df[df['review_rating']==1]['Recalibrated_Star_Rating'].value_counts()

3    154
1     27
2      8
4      5
0      2
Name: Recalibrated_Star_Rating, dtype: int64

In [11]:
# Evaluating reassigned stars
# Cluster 4 means 4 star rating
df[df['Recalibrated_Star_Rating']==4]['review_rating'].value_counts().sort_index()

1.0     5
2.0     9
3.0    43
4.0    73
5.0    58
Name: review_rating, dtype: int64

In [12]:
# Cluster 3 means 1 star rating
df[df['Recalibrated_Star_Rating']==3]['review_rating'].value_counts().sort_index()

1.0    154
2.0     59
3.0     43
4.0      6
5.0      1
Name: review_rating, dtype: int64

In [13]:
# Cluster 2 means 3 star rating
df[df['Recalibrated_Star_Rating']==2]['review_rating'].value_counts().sort_index()

1.0     8
2.0    10
3.0    30
4.0    24
5.0    23
Name: review_rating, dtype: int64

In [14]:
# Cluster 1 means 2 star rating (cluster 2 is already the 3 star bucket, and cluster 1 holds more 1 and 2 star reviews)
df[df['Recalibrated_Star_Rating']==1]['review_rating'].value_counts().sort_index()

1.0    27
2.0    22
3.0    39
4.0    24
5.0     6
Name: review_rating, dtype: int64

In [15]:
# Cluster 0 means 5 star rating
df[df['Recalibrated_Star_Rating']==0]['review_rating'].value_counts().sort_index()

1.0       2
2.0       1
3.0      44
4.0     431
5.0    1756
Name: review_rating, dtype: int64
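The ten `value_counts()` cells above read the same table from two directions. A minimal sketch using pandas' `pd.crosstab` builds the whole user-rating-by-cluster table in one call; `normalize="index"` rescales each row to shares, which helps because the ratings are very unbalanced (1,844 five-star reviews against 101 two-star reviews in the counts above). The toy DataFrame is made up for illustration.

```python
import pandas as pd

# Toy stand-in for df (values are made up for illustration)
toy = pd.DataFrame({
    "review_rating": [5, 5, 5, 4, 3, 1, 1],
    "Recalibrated_Star_Rating": [0, 0, 4, 0, 2, 3, 3],
})

# Rows: user star rating; columns: cluster id; cells: number of reviews
counts = pd.crosstab(toy["review_rating"], toy["Recalibrated_Star_Rating"])

# Same table with each row rescaled to sum to 1
shares = pd.crosstab(
    toy["review_rating"], toy["Recalibrated_Star_Rating"], normalize="index"
)
```

On the notebook's data this would be `pd.crosstab(df['review_rating'], df['Recalibrated_Star_Rating'])`; combinations that never occur show up as explicit zeros instead of being omitted.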

In [16]:
df[(df['Recalibrated_Star_Rating']==0) & (df['review_rating']==3)]['text']

106     Utopia has been in the neighborhood for as lon...
169     Came here before a concert at Lincoln Center b...
195     A lovely, hole-in-the-wall little cafe tucked ...
362     Allow me to articulate a meticulously consider...
403     Pros: the atmosphere is gorgeous and you can't...
544                       First time here. Good was okay.
548     Loved their pizza and calzones! We loved how t...
718     Classic NYC Italian restaurant with okay food....
764     First off this place is smaller than it looks....
1064    We went on a Thursday night and it was not bus...
1071    I got the taco salad, and it wasn’t bad. In fa...
1198    My boyfriend and I wanted to have the Olive Ga...
1223    We just wanted a quick bite to eat after a lon...
1261    Cool atmosphere. Food was ok, I would not reco...
1277    Came across this place in the food hall, the m...
1279    The lobster mac and cheese was top ten. The wi...
1326    Spacious place, big menu\nSchezuan mushroom du...
1353    Nah 

In [17]:
# Example of misclassification - reclassified as a 5-star review, but it reads more like a 3-4 star review
df[(df['Recalibrated_Star_Rating']==0) & (df['review_rating']==3)]['text'].iloc[1]

'Came here before a concert at Lincoln Center because of all the outstanding reviews.   The crust is definitely foccacia-like, well charred while still being soft inside.   I got a funghi slice and an eggplant parmigiana slice.   My son got a buffalo chicken ranch and a pepperoni, and my husband got a Calabrese and a pepperoni.  There was no line at around 6:30 which was great, though they were clearly doing business.  Tables inside were about half taken.  It was very warm inside so we opted for the nice "al fresco" option they offer outside.  Pizza was reheated quickly, cut into pieces and promptly served to our table.  That said, although the crust was perfectly cooked - most of the toppings were disappointing.   Not "bad" pizza- but definitely not the finest NY has to offer.  Would I come back?  Sure.  Will it be a priority?   Nah.  If you\'re looking for decent pizza, stop by.   If you\'re looking for traditional NY pizza, keep looking.'

In [18]:
# Example of misclassification - the user gave 3 stars, but the review seems to deserve 4+ stars
df[(df['Recalibrated_Star_Rating']==0) & (df['review_rating']==3)]['text'].iloc[-1]

'When I think about hip New York dining spots, Casino is the first thing that comes to mind.\n\nCasino’s interiors have this natural look with slopes and curves as if the guys who built it worked with the natural landscape. It’s dark and cozy inside. A really nice place to rest from the cold November wind.\n\nWe had oysters and the non-alcoholic drinks. Pretty good.\n\nBest to reserve a table if you plan on going as the place gets packed pretty quick. New York locals crowd.\n\nShout out to Quentin and the rest of the crew for the hospitality. He gave us a seat even though we just walked in.'

In [19]:
# Example of misclassification - reclassified as 5 stars, although the reviewer says they would leave 3.5 stars
# Judging by the text, definitely not a 5-star review
df[(df['Recalibrated_Star_Rating']==0) & (df['review_rating']==3)]['text'].iloc[-2]

'The vibe was great, very beautiful interior and feels like you go back in time. The espresso martini was great, not sure if it’s worth its price though. Food was a 3.5 out of 5, can’t say I’ll be coming back for a dinner, but for a drink and atmosphere I would. If I could leave 3.5 stars I’d do that. Overall not a bad experience, however the service could be better.\n\nUpdated: Don’t order the chicken dish.'

In [20]:
# Example of misclassification - the user gave 5 stars (recalibrated to 1 star via cluster 3), but the review reads as mediocre
df[(df['Recalibrated_Star_Rating']==3) & (df['review_rating']==5)]['text'].iloc[0]

'Made a reservation in advance to be here. The place is always crowded.\n\nGood food, more on the expensive side.\n\nFood is not spicy. The gravy of most of the dishes is delectable.'

In [21]:
# 4-star review recalibrated to 1 star (cluster 3)
df[(df['Recalibrated_Star_Rating']==3) & (df['review_rating']==4)]['text'].iloc[0]

'Nice place but not 5 stars, 3 kind of sandwich and at noon only 2 available.\nGot there on reviews but at least the lobster sandwich is not so fantastic as expected.'

##### Observation
The recalibrated star rating sometimes moves a review into a more appropriate bucket, as the examples above show. In many other cases, however, it mislabels the review.

The model is difficult to evaluate because no human has judged each review independently; the user's own star rating is the only reference, and the examples above show that it is sometimes off as well.
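Without human labels, one rough proxy is agreement with the user's own rating once clusters are mapped to stars (the notebook maps them by hand in the comments above). The notebook already imports `accuracy_score`, which can serve as an exact-match agreement rate. Because user ratings are themselves noisy, this measures agreement, not correctness. A minimal sketch with made-up arrays; the within-one-star rate is an added, hypothetical metric, not one the original analysis uses.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Made-up user ratings and recalibrated stars (after mapping clusters to stars)
user_stars = np.array([5, 5, 4, 3, 2, 1, 1, 3])
recalibrated = np.array([5, 4, 5, 5, 1, 1, 2, 3])

# Share of reviews whose recalibrated star equals the user's star
exact = accuracy_score(user_stars, recalibrated)

# Share of reviews where the two ratings differ by at most one star
within_one = np.mean(np.abs(user_stars - recalibrated) <= 1)
```

Here 3 of 8 ratings match exactly and 7 of 8 are within one star, so `exact` is 0.375 and `within_one` is 0.875.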

#### K-Means Clustering - Version 2

In [22]:
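# n_init=10 matches the previous scikit-learn default and silences its FutureWarning about n_init
km2 = KMeans(n_clusters=5, random_state=5, n_init=10)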

In [23]:
km2.fit(df[['roberta_pos','roberta_neg']])

In [24]:
df['Recalibrated_Star_Rating_v2'] = km2.labels_

In [25]:
df[df['review_rating']==5]['Recalibrated_Star_Rating_v2'].value_counts()

0    1759
4      58
2      22
1       4
3       1
Name: Recalibrated_Star_Rating_v2, dtype: int64

In [26]:
df[df['review_rating']==4]['Recalibrated_Star_Rating_v2'].value_counts()

0    434
4     68
2     28
1     22
3      6
Name: Recalibrated_Star_Rating_v2, dtype: int64

In [27]:
df[df['review_rating']==3]['Recalibrated_Star_Rating_v2'].value_counts()

0    45
3    43
4    41
1    38
2    32
Name: Recalibrated_Star_Rating_v2, dtype: int64

In [28]:
df[df['review_rating']==2]['Recalibrated_Star_Rating_v2'].value_counts()

3    58
1    21
2    13
4     8
0     1
Name: Recalibrated_Star_Rating_v2, dtype: int64

In [29]:
df[df['review_rating']==1]['Recalibrated_Star_Rating_v2'].value_counts()

3    155
1     26
2      9
4      4
0      2
Name: Recalibrated_Star_Rating_v2, dtype: int64

In [30]:
# 3-star review placed in cluster 0, the 5-star cluster in this version
df[(df['Recalibrated_Star_Rating_v2']==0) & (df['review_rating']==3)]['text'].iloc[3]

"Allow me to articulate a meticulously considered assessment of my dining experience at Bar 56, a venue exuding sophistication and meticulous design. Every facet, from the meticulously planned silverware to the elegantly crafted glasses, resonates with a meticulous investment of time and effort. The atmosphere exudes a palpable aura of class, making it an impeccable choice for those seeking to dazzle a date.\n\nUpon entering, the attention to detail becomes evident, as guests are welcomed at the door with a warm smile, and their coats are graciously attended to. The establishment boasts an impressive selection of 56 wines, complemented by a standby sommelier for those in need of guidance. The service, impeccably detail-oriented yet discreet enough not to disrupt conversations, attests to a refined dining experience. Crumbs vanish seamlessly, a testament to their attentiveness.\n\nRegrettably, the zenith of elegance encountered a nadir in the realm of cuisine. Opting for Potato Croquett

In [31]:
# 4-star review placed in cluster 3, the 1-star cluster in this version
df[(df['Recalibrated_Star_Rating_v2']==3) & (df['review_rating']==4)]['text'].iloc[2]

"I stopped for a quick bite here. Love they have vegan options, but the tortillas were not that great. It's a fusion taco spot, so def different than your normal tacos. I prob won't go back since they are not on their tortillas game 🤷🏻\u200d♀️ the service was okay too. I dined in but my food was served in plastic to go boxes."

##### Observation
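Clustering on `roberta_pos` and `roberta_neg` alone (dropping the neutral score) gave essentially the same result as Version 1: no cell in the per-rating distributions changes by more than five reviews (for example, 1,759 vs. 1,756 five-star reviews in cluster 0).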
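The similarity between Version 1 and Version 2 can be quantified rather than eyeballed. A sketch using scikit-learn's `adjusted_rand_score`, which ignores how clusters are numbered: two runs that find the same groups under different ids score 1.0, and values near 0 indicate chance-level agreement. The label lists are made up; on the notebook's data the call would be `adjusted_rand_score(df['Recalibrated_Star_Rating'], df['Recalibrated_Star_Rating_v2'])`.

```python
from sklearn.metrics import adjusted_rand_score

# Made-up labels: the same grouping under different cluster ids
v1 = [0, 0, 1, 1, 2, 2]
v2_relabelled = [2, 2, 0, 0, 1, 1]

# Made-up labels: one point moved into another cluster
v2_shifted = [2, 2, 0, 1, 1, 1]

same = adjusted_rand_score(v1, v2_relabelled)  # identical partitions
different = adjusted_rand_score(v1, v2_shifted)  # partial agreement, below 1.0
```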

#### Agglomerative Clustering

In [32]:
ac = AgglomerativeClustering(n_clusters=5)

In [33]:
ac.fit(df[['roberta_neg','roberta_neu','roberta_pos']])

In [34]:
df['Recalibrated_Star_Rating_v3'] = ac.labels_

In [35]:
df[df['review_rating']==5]['Recalibrated_Star_Rating_v3'].value_counts()

1    1745
3      64
0      30
4       3
2       2
Name: Recalibrated_Star_Rating_v3, dtype: int64

In [36]:
df[df['review_rating']==4]['Recalibrated_Star_Rating_v3'].value_counts()

1    418
3     69
0     43
4     21
2      7
Name: Recalibrated_Star_Rating_v3, dtype: int64

In [37]:
df[df['review_rating']==3]['Recalibrated_Star_Rating_v3'].value_counts()

2    52
1    42
0    39
4    33
3    33
Name: Recalibrated_Star_Rating_v3, dtype: int64

In [38]:
df[df['review_rating']==2]['Recalibrated_Star_Rating_v3'].value_counts()

2    66
0    16
4    14
3     4
1     1
Name: Recalibrated_Star_Rating_v3, dtype: int64

In [39]:
df[df['review_rating']==1]['Recalibrated_Star_Rating_v3'].value_counts()

2    162
4     19
0      9
3      4
1      2
Name: Recalibrated_Star_Rating_v3, dtype: int64

In [40]:
# 4-star review placed in cluster 2, the 1-star cluster in this version
df[(df['Recalibrated_Star_Rating_v3']==2) & (df['review_rating']==4)]['text'].iloc[1]

'Good not great.  Not what it used to be.  Food didn’t seem to have the same presentation and flavor as it used to be during the peak 10+ years ago.  Service was good and attentive but pasta took very long to come out.  Place is nice looking but I miss it being a stuffier crowd.'

In [41]:
df[(df['Recalibrated_Star_Rating_v3']==2) & (df['review_rating']==4)]['text'].iloc[5]

"I have never had a less than great meal at Davids but on this 1 day my steak didn't taste as great as it usually does... The taste was just off this day and I cant put my finger on it why. Other than that I do love Davids and will continue to go."

In [42]:
# 1-star review placed in cluster 2, the 1-star cluster in this version
df[(df['Recalibrated_Star_Rating_v3']==2) & (df['review_rating']==1)]['text'].iloc[5]

"I ordered just a cappuccino and waited for more than 15 minutes, after which I got an answer that the machine wasn't working. They didn’t ask to provide me with any “excuse gift.” If I remember right, the concept of McDonald’s is their speed in providing drinks and food. But not anymore…\nOther day ordered the French fries here - it was awful. I got the feeling that it was fried in a car oil."

In [43]:
# 1-star review placed in cluster 1, the 5-star cluster in this version
df[(df['Recalibrated_Star_Rating_v3']==1) & (df['review_rating']==1)]['text'].iloc[1]

'Delicious dumplings were only surpassed by the warm beer.  Limit your ordering to that.'

In [44]:
df[(df['Recalibrated_Star_Rating_v3']==1) & (df['review_rating']==1)]['text'].iloc[0]

'I never left any place without a tip.\nThat was my first time.\nI was surprised at how fast they brought the food; it was really nice.\nAlso, the food that was served before our order, such as olives and bread  was better than my pizza, which I’ve ordered.\n\nHalf of the pizza was burnt.\nI didn’t eat that, as thin-dough Italian pizza, which is burnt, tastes awful.\nNone of the waiters care about it.\nOne guy took everything from our table, except the burnt pizza.\nThe guy who took our order and came with the bill completely ignored my complaint.\nHe was more interested in the tip than their service.\nWhen I said that pizza was burnt, he said, ‘That’s fine, that’s fine.’\n\nNobody apologized; they didn’t care at all.\nHe didn’t even say goodbye when we said thank you.\nAwful, poor service.\nI will definitely never be back.'

In [45]:
# 5-star review placed in cluster 1, the 5-star cluster in this version
df[(df['Recalibrated_Star_Rating_v3']==1) & (df['review_rating']==5)]['text'].iloc[1]

'The smells of pizza draw you in. When you walk in the front door, see the pizza and prices you know you are in the right place.\n\nThe pizza was delicious. A++\n\nThe counter service happy and prompt. A++\n\nThe atmosphere is a pizza joint styled place. Grab a seat. Eat your pizza. Enjoy.'

##### Observations
Again, I do not see a meaningful improvement in clustering performance. Good and bad reviews still land in the same cluster: cluster 1, which holds 1,745 of the 5-star reviews, also contains the 1-star complaints shown above.
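To back a comparison like this with a number, the competing clusterings could also be scored on an internal metric that needs no labels. A sketch using scikit-learn's `silhouette_score` (range -1 to 1; higher means denser, better-separated clusters), fitted on made-up, well-separated toy points rather than the review data. Note that silhouette measures geometric separation in sentiment-score space, not whether the clusters match star ratings.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.metrics import silhouette_score

# Made-up [neg, neu, pos] scores in three tight groups
rng = np.random.default_rng(1)
centers = np.array([[0.8, 0.15, 0.05], [0.1, 0.8, 0.1], [0.05, 0.15, 0.8]])
X = np.vstack([c + rng.normal(0, 0.01, size=(40, 3)) for c in centers])

# Fit each clustering method and score its labels on the same features
scores = {}
for name, model in [
    ("kmeans", KMeans(n_clusters=3, random_state=5, n_init=10)),
    ("agglomerative", AgglomerativeClustering(n_clusters=3)),
]:
    labels = model.fit_predict(X)
    scores[name] = silhouette_score(X, labels)
```

On the notebook's data the equivalent call would be, for example, `silhouette_score(df[['roberta_neg','roberta_neu','roberta_pos']], df['Recalibrated_Star_Rating_v3'])`.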

## Conclusion
