# Recalibrated Restaurant Rating Based On Sentiment Analysis - Part 2

<img src="https://static1.squarespace.com/static/5b1590a93c3a53e49c6d280d/t/5fd058bf5efba8153b21ad7f/1607489731350/restaurant-reviews-16x9.jpg?format=1500w" alt="Restaurant reviews">

## Analysis (continued)
### Clustering
With a sentiment score for each text review, we next cluster the reviews on those scores and use the clusters to assign a recalibrated star rating.

Two clustering methods will be used:
1. K-Means
2. Agglomerative

```python
# Import relevant packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans
from sklearn.cluster import AgglomerativeClustering
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, precision_score, recall_score
```

```python
# Reading dataframe - output from Part 1 notebook; trying to avoid running the Roberta model every time
df = pd.read_csv('data/master_df.csv')
```

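```python
# n_init=10 (the pre-1.4 scikit-learn default) is set explicitly to silence the n_init FutureWarning
km = KMeans(n_clusters=5, n_init=10, random_state=5)
```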

#### K-Means Clustering - Version 1

```python
# Skipping scaling of the data as Roberta model output is already on the same scale 0-1
km.fit(df[['roberta_neg','roberta_neu','roberta_pos']])
```


```python
df['km_cluster'] = km.labels_
```

```python
km.cluster_centers_
```

```
array([[0.00572318, 0.02753872, 0.9667381 ],
       [0.60364954, 0.30573502, 0.09061544],
       [0.19034219, 0.52838321, 0.2812746 ],
       [0.88076173, 0.10425816, 0.01498011],
       [0.06183021, 0.23258002, 0.70558977]])
```

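Reading the cluster centers (columns: `roberta_neg`, `roberta_neu`, `roberta_pos`), the clusters map to star ratings as follows:
- Cluster 0 (highly positive) is 5 star
- Cluster 4 (mostly positive) is 4 star
- Cluster 2 (mostly neutral) is 3 star
- Cluster 1 (mostly negative) is 2 star
- Cluster 3 (highly negative) is 1 star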
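The mapping above was read off the centers by eye. As a hedged sketch (not the notebook's own method), the same mapping can be derived by ranking each center on a simple polarity score, positive minus negative probability. The helper name `map_clusters_to_stars` and the polarity rule are assumptions; the centers are copied from the `km.cluster_centers_` output above.

```python
import numpy as np

def map_clusters_to_stars(centers, pos_idx, neg_idx):
    """Return {cluster_label: star}, ranking centers by (positive - negative) score.

    Hypothetical helper: the most negative center gets 1 star, the most positive 5.
    """
    polarity = centers[:, pos_idx] - centers[:, neg_idx]
    order = np.argsort(polarity)  # cluster labels, most negative first
    return {int(label): star for star, label in enumerate(order, start=1)}

# Columns: roberta_neg, roberta_neu, roberta_pos (copied from km.cluster_centers_)
centers = np.array([
    [0.00572318, 0.02753872, 0.9667381],
    [0.60364954, 0.30573502, 0.09061544],
    [0.19034219, 0.52838321, 0.2812746],
    [0.88076173, 0.10425816, 0.01498011],
    [0.06183021, 0.23258002, 0.70558977],
])
star_map = map_clusters_to_stars(centers, pos_idx=2, neg_idx=0)
print(star_map)  # {3: 1, 1: 2, 2: 3, 4: 4, 0: 5}
```

With the notebook's objects, `df['km_cluster'].map(map_clusters_to_stars(km.cluster_centers_, pos_idx=2, neg_idx=0))` would give a recalibrated star per review. Ranking on polarity keeps the mapping correct even if a re-run with different data or seeds reorders the cluster labels.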

```python
# New clustering for 5 star reviews
df[df['review_rating']==5]['km_cluster'].value_counts()
```

```
0    1756
4      58
2      23
1       6
3       1
Name: km_cluster, dtype: int64
```

```python
# New clustering for 4 star reviews
df[df['review_rating']==4]['km_cluster'].value_counts()
```

```
0    431
4     73
2     24
1     24
3      6
Name: km_cluster, dtype: int64
```

```python
# New clustering for 3 star reviews
df[df['review_rating']==3]['km_cluster'].value_counts()
```

```
0    44
4    43
3    43
1    39
2    30
Name: km_cluster, dtype: int64
```

```python
# New clustering for 2 star reviews
df[df['review_rating']==2]['km_cluster'].value_counts()
```

```
3    59
1    22
2    10
4     9
0     1
Name: km_cluster, dtype: int64
```

```python
# New clustering for 1 star reviews
df[df['review_rating']==1]['km_cluster'].value_counts()
```

```
3    154
1     27
2      8
4      5
0      2
Name: km_cluster, dtype: int64
```

```python
# Note how many reviews were reclassified according to the new clusters
df[df['km_cluster']==4]['review_rating'].value_counts().sort_index()
```

```
1.0     5
2.0     9
3.0    43
4.0    73
5.0    58
Name: review_rating, dtype: int64
```

```python
df[df['km_cluster']==3]['review_rating'].value_counts().sort_index()
```

```
1.0    154
2.0     59
3.0     43
4.0      6
5.0      1
Name: review_rating, dtype: int64
```

```python
df[df['km_cluster']==2]['review_rating'].value_counts().sort_index()
```

```
1.0     8
2.0    10
3.0    30
4.0    24
5.0    23
Name: review_rating, dtype: int64
```

```python
df[df['km_cluster']==1]['review_rating'].value_counts().sort_index()
```

```
1.0    27
2.0    22
3.0    39
4.0    24
5.0     6
Name: review_rating, dtype: int64
```

```python
df[df['km_cluster']==0]['review_rating'].value_counts().sort_index()
```

```
1.0       2
2.0       1
3.0      44
4.0     431
5.0    1756
Name: review_rating, dtype: int64
```

```python
# Reviews that reviewers rated 3 stars but that landed in the 5-star cluster (cluster 0)
df[(df['km_cluster']==0) & (df['review_rating']==3)]['text']
```

```
106     Utopia has been in the neighborhood for as lon...
169     Came here before a concert at Lincoln Center b...
195     A lovely, hole-in-the-wall little cafe tucked ...
362     Allow me to articulate a meticulously consider...
403     Pros: the atmosphere is gorgeous and you can't...
544                       First time here. Good was okay.
548     Loved their pizza and calzones! We loved how t...
718     Classic NYC Italian restaurant with okay food....
764     First off this place is smaller than it looks....
1064    We went on a Thursday night and it was not bus...
1071    I got the taco salad, and it wasn’t bad. In fa...
1198    My boyfriend and I wanted to have the Olive Ga...
1223    We just wanted a quick bite to eat after a lon...
1261    Cool atmosphere. Food was ok, I would not reco...
1277    Came across this place in the food hall, the m...
1279    The lobster mac and cheese was top ten. The wi...
1326    Spacious place, big menu\nSchezuan mushroom du...
1353    Nah fo
```

```python
# Example of good re-classification - User gave 3 star review, but seems to deserve 4+ stars
df[(df['km_cluster']==0) & (df['review_rating']==3)]['text'].iloc[-1]
```

```
'When I think about hip New York dining spots, Casino is the first thing that comes to mind.\n\nCasino’s interiors have this natural look with slopes and curves as if the guys who built it worked with the natural landscape. It’s dark and cozy inside. A really nice place to rest from the cold November wind.\n\nWe had oysters and the non-alcoholic drinks. Pretty good.\n\nBest to reserve a table if you plan on going as the place gets packed pretty quick. New York locals crowd.\n\nShout out to Quentin and the rest of the crew for the hospitality. He gave us a seat even though we just walked in.'
```

```python
# Example of good re-classification - User gave 3 star review, but seems to deserve 4+ stars
df[(df['km_cluster']==0) & (df['review_rating']==3)]['text'].iloc[-5]
```

```
"Picked up my daughter after school and went to grab a bite to eat we decided on tacos and ended up here.. Cute little corner spot in the  Lower East side  With table and chairs for outdoor seating. We ordered chicken tacos and steak tacos tacos and in order of guacamole and chips. The tacos were very well seasoned and tasty the but my favorite was the chips in guacamole and the green sauce I really don't know the name for it but we put it on our tacos we even put it on our nachos so it was pretty good I do recommend this place for a quick bite if you really don't have any thoughts on where to eat This would be a good spot to grab a couple of tacos😁"
```

```python
# Example of misclassification - Reclassified as 5 star review, but sounds more like a 3-4 star review
df[(df['km_cluster']==0) & (df['review_rating']==3)]['text'].iloc[1]
```

```
'Came here before a concert at Lincoln Center because of all the outstanding reviews.   The crust is definitely foccacia-like, well charred while still being soft inside.   I got a funghi slice and an eggplant parmigiana slice.   My son got a buffalo chicken ranch and a pepperoni, and my husband got a Calabrese and a pepperoni.  There was no line at around 6:30 which was great, though they were clearly doing business.  Tables inside were about half taken.  It was very warm inside so we opted for the nice "al fresco" option they offer outside.  Pizza was reheated quickly, cut into pieces and promptly served to our table.  That said, although the crust was perfectly cooked - most of the toppings were disappointing.   Not "bad" pizza- but definitely not the finest NY has to offer.  Would I come back?  Sure.  Will it be a priority?   Nah.  If you\'re looking for decent pizza, stop by.   If you\'re looking for traditional NY pizza, keep looking.'
```

```python
# Example of misclassification - Reclassified as 5 stars although the reviewer says they would leave 3.5 stars.
# Per the review text, this is definitely not a 5 star review
df[(df['km_cluster']==0) & (df['review_rating']==3)]['text'].iloc[-2]
```

```
'The vibe was great, very beautiful interior and feels like you go back in time. The espresso martini was great, not sure if it’s worth its price though. Food was a 3.5 out of 5, can’t say I’ll be coming back for a dinner, but for a drink and atmosphere I would. If I could leave 3.5 stars I’d do that. Overall not a bad experience, however the service could be better.\n\nUpdated: Don’t order the chicken dish.'
```

```python
# Example of misclassification - User gave 4 stars and the review reads as mediocre, but it was reclassified as 1 star (cluster 3)
df[(df['km_cluster']==3) & (df['review_rating']==4)]['text'].iloc[5]
```

```
"It's a very long wait on a Sunday. I arrived around 11.15 and the wait for one person was an hour. For two people or more it was a 3 hour wait.\n\nThe famous pancakes were delicious, even as a single stack was quite large. The home fries were just broken up hash browns so a little disappointed.\n\nI would go back if I was super hungry, but because of the long wait I had eaten something small beforehand."
```

```python
# Example of misclassification - User gave 4 stars and the review reads more like 3 stars, but it was reclassified as 1 star (cluster 3)
df[(df['km_cluster']==3) & (df['review_rating']==4)]['text'].iloc[0]
```

```
'Nice place but not 5 stars, 3 kind of sandwich and at noon only 2 available.\nGot there on reviews but at least the lobster sandwich is not so fantastic as expected.'
```

#####  Observation
As the examples above show, the recalibrated star rating system sometimes moves a review into the "right" bucket. However, there are also many cases where it labels a review incorrectly.

Unfortunately, it is difficult to evaluate the reclassification model, because there is no ground-truth star rating (for example, a human re-evaluation of each review) to compare against.
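Without ground truth, one rough proxy is how often the recalibrated star agrees with the user's own rating. The sketch below is an assumption, not part of the original analysis: user ratings serve only as a reference point (they are exactly what the recalibration questions), so low agreement is not automatically bad. `STAR_MAP` is copied from the K-Means Version 1 mapping; `agreement_report` is a hypothetical helper.

```python
import pandas as pd

# Cluster label -> star, read off the K-Means (Version 1) cluster centers
STAR_MAP = {0: 5, 4: 4, 2: 3, 1: 2, 3: 1}

def agreement_report(df, cluster_col, rating_col="review_rating", star_map=STAR_MAP):
    """Cross-tabulate user stars against cluster-implied stars.

    User ratings are NOT ground truth; these numbers only show how far the
    recalibration moves reviews away from what users entered.
    """
    sub = df[[cluster_col, rating_col]].dropna()       # skip reviews with no rating
    recal = sub[cluster_col].map(star_map)             # recalibrated star per review
    user = sub[rating_col]
    table = pd.crosstab(user, recal, rownames=["user_star"], colnames=["recalibrated_star"])
    exact = (user == recal).mean()                     # share of ratings left unchanged
    within_one = ((user - recal).abs() <= 1).mean()    # share moved by at most one star
    return table, exact, within_one

# Toy data standing in for the notebook's df
toy = pd.DataFrame({
    "review_rating": [5, 5, 4, 3, 1, 2],
    "km_cluster":    [0, 4, 0, 0, 3, 1],
})
table, exact, within_one = agreement_report(toy, "km_cluster")
print(table)
print(exact, within_one)
```

On the real data, `agreement_report(df, 'km_cluster')` would condense the per-rating `value_counts()` cells into a single table plus two summary rates.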

#### K-Means Clustering - Version 2

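```python
# n_init=10 (the pre-1.4 scikit-learn default) is set explicitly to silence the n_init FutureWarning
km2 = KMeans(n_clusters=5, n_init=10, random_state=5)
```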

```python
# Using different features: positive and negative scores only (neutral score dropped)
km2.fit(df[['roberta_pos','roberta_neg']])
```


```python
df['km_cluster_v2'] = km2.labels_
```

```python
# Cluster centers barely changed from the original k-means result (columns are now roberta_pos, roberta_neg)
km2.cluster_centers_
```

```
array([[0.96633443, 0.00569745],
       [0.0785209 , 0.61483279],
       [0.2869981 , 0.21553158],
       [0.01446248, 0.88076899],
       [0.70265119, 0.05749273]])
```

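Reading the cluster centers (columns: `roberta_pos`, `roberta_neg`), the clusters map to star ratings the same way as in Version 1:
- Cluster 0 is 5 star
- Cluster 4 is 4 star
- Cluster 2 is 3 star
- Cluster 1 is 2 star
- Cluster 3 is 1 star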

```python
df[df['review_rating']==5]['km_cluster_v2'].value_counts()
```

```
0    1759
4      58
2      22
1       4
3       1
Name: km_cluster_v2, dtype: int64
```

```python
df[df['review_rating']==4]['km_cluster_v2'].value_counts()
```

```
0    434
4     68
2     28
1     22
3      6
Name: km_cluster_v2, dtype: int64
```

```python
df[df['review_rating']==3]['km_cluster_v2'].value_counts()
```

```
0    45
3    43
4    41
1    38
2    32
Name: km_cluster_v2, dtype: int64
```

```python
df[df['review_rating']==2]['km_cluster_v2'].value_counts()
```

```
3    58
1    21
2    13
4     8
0     1
Name: km_cluster_v2, dtype: int64
```

```python
df[df['review_rating']==1]['km_cluster_v2'].value_counts()
```

```
3    155
1     26
2      9
4      4
0      2
Name: km_cluster_v2, dtype: int64
```

```python
# Example of good re-classification - user was harsh on the star rating for a very good review
df[(df['km_cluster_v2']==0) & (df['review_rating']==3)]['text'].iloc[6]
```

```
"Loved their pizza and calzones! We loved how the pizza bread were chewy and had some garlicky flavor, slices were big and affordable too. Their menu options is wide and freshly made! Will definitely go back when I'm in the city."
```

```python
# Example of misclassification - classified as 5 star review, but it reads like a 3 star review
df[(df['km_cluster_v2']==0) & (df['review_rating']==3)]['text'].iloc[8]
```

```
'First off this place is smaller than it looks.  The tables are tiny.  The pizza is OK.  It is not amazing but not awful.  The ingredients are fresh and tasty.  The service is great here. Prices are good.  The place is pretty and stylish.'
```

##### Observation
Dropping the neutral score from the k-means features produced nearly the same clusters, and therefore essentially the same performance, as the original k-means model.
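To put a number on "nearly the same", one hedged option (not used in the original notebook) is scikit-learn's adjusted Rand index, which scores how similarly two clusterings partition the same rows; 1.0 means identical partitions regardless of label names. The helper `compare_clusterings` is hypothetical.

```python
import pandas as pd
from sklearn.metrics import adjusted_rand_score

def compare_clusterings(df, col_a, col_b):
    """Return the adjusted Rand index and a contingency table for two label columns."""
    ari = adjusted_rand_score(df[col_a], df[col_b])  # invariant to how labels are numbered
    table = pd.crosstab(df[col_a], df[col_b])         # rows: col_a labels, columns: col_b labels
    return ari, table

# Toy example: same grouping, different label names
toy = pd.DataFrame({
    "km_cluster":    [0, 0, 1, 1, 2, 2],
    "km_cluster_v2": [2, 2, 0, 0, 1, 1],
})
ari, table = compare_clusterings(toy, "km_cluster", "km_cluster_v2")
print(ari)  # 1.0
print(table)
```

On the notebook's data this would be `compare_clusterings(df, 'km_cluster', 'km_cluster_v2')`; an ARI close to 1 would confirm that the two versions nearly coincide. The same call works for `ac_cluster`.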

#### Agglomerative Clustering

```python
# Using a different clustering method (scikit-learn defaults: Ward linkage, Euclidean distance)
ac = AgglomerativeClustering(n_clusters=5)
```

```python
ac.fit(df[['roberta_neg','roberta_neu','roberta_pos']])
```

```python
df['ac_cluster'] = ac.labels_
```

```python
# Cluster 1 appears to hold the very positive reviews
df[df['review_rating']==5]['ac_cluster'].value_counts()
```

```
1    1745
3      64
0      30
4       3
2       2
Name: ac_cluster, dtype: int64
```

```python
df[df['review_rating']==4]['ac_cluster'].value_counts()
```

```
1    418
3     69
0     43
4     21
2      7
Name: ac_cluster, dtype: int64
```

```python
df[df['review_rating']==3]['ac_cluster'].value_counts()
```

```
2    52
1    42
0    39
4    33
3    33
Name: ac_cluster, dtype: int64
```

```python
# Cluster 2 appears to hold the quite negative reviews
df[df['review_rating']==2]['ac_cluster'].value_counts()
```

```
2    66
0    16
4    14
3     4
1     1
Name: ac_cluster, dtype: int64
```

```python
df[df['review_rating']==1]['ac_cluster'].value_counts()
```

```
2    162
4     19
0      9
3      4
1      2
Name: ac_cluster, dtype: int64
```

```python
# Example of misclassification - User gave 4 star rating and the review looks decent, but model gave poor rating
df[(df['ac_cluster']==2) & (df['review_rating']==4)]['text'].iloc[1]
```

```
'Good not great.  Not what it used to be.  Food didn’t seem to have the same presentation and flavor as it used to be during the peak 10+ years ago.  Service was good and attentive but pasta took very long to come out.  Place is nice looking but I miss it being a stuffier crowd.'
```

```python
# Example of correct classification - bad rating and bad review
df[(df['ac_cluster']==2) & (df['review_rating']==1)]['text'].iloc[1]
```

```
'Food was not what I expected I asked for a mix player and I got left smal chunks for chicken and lab. And the portions are very small compared to other trucks. Also got some of the fried food, it was straight out of the freezer and tasted bad\nCould use a lot of improvement. Definitely not worth the time to go there.'
```

```python
# Example of good re-classification - User gave 3 star review, but seems to deserve 4+ stars
df[(df['ac_cluster']==1) & (df['review_rating']==3)]['text'].iloc[2]
```

```
"A lovely, hole-in-the-wall little cafe tucked in the corner of a boba shop with tasty food, including an impressively vast selection of vegan and vegetarian options. There's no seating inside the shop, but the Eleanor Roosevelt Memorial Park is just a few blocks away if you're alright with enjoying your food outside! The food was tasty, but a bit heavy and oversaturated with oil for my taste -- I opted for the egg white omelette, which didn't include any cheese (yay for me and my lactose intolerant friends), but did have a boat load of fresh veggies! It was incredibly tasty, and honestly way bigger than I was expecting, but did make me feel pretty greasy afterwards. The potatoes were even more saturated in oil and a bit oversalted, but still delicious. The meal also came with toast, which was just white bread toasted in what smelled like butter, so I didn't end up partaking, but the friend that I had gone to brunch with said that it was oddly chewy (potentially because it had to be wr
```

##### Observations
Again, I do not see a meaningful improvement in clustering performance: the distribution of clusters within each user star-rating bucket does not differ significantly from that of the k-means methods. As the examples above show, good and bad reviews still end up in the same cluster, and some good reviews still carry mediocre user star ratings.
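Unlike `KMeans`, scikit-learn's `AgglomerativeClustering` exposes no `cluster_centers_` attribute, which is why the agglomerative cluster meanings were inferred from per-rating counts. A hedged sketch of an alternative: average the RoBERTa scores within each cluster to get centroids, then rank clusters by positive minus negative score to assign stars. The helper names and the ranking rule are assumptions, not the notebook's method.

```python
import pandas as pd

SENTIMENT_COLS = ["roberta_neg", "roberta_neu", "roberta_pos"]

def cluster_centroids(df, cluster_col, cols=SENTIMENT_COLS):
    """Mean sentiment scores per cluster, standing in for cluster_centers_."""
    return df.groupby(cluster_col)[cols].mean()

def stars_from_centroids(centroids):
    """Map cluster label -> star, ranking clusters from most negative to most positive."""
    polarity = centroids["roberta_pos"] - centroids["roberta_neg"]
    ranked = polarity.sort_values().index  # most negative cluster first
    return {int(label): star for star, label in enumerate(ranked, start=1)}

# Toy data with three clusters (the notebook uses five)
toy = pd.DataFrame({
    "ac_cluster":  [0, 0, 1, 1, 2, 2],
    "roberta_neg": [0.05, 0.10, 0.80, 0.70, 0.20, 0.30],
    "roberta_neu": [0.05, 0.10, 0.10, 0.10, 0.30, 0.30],
    "roberta_pos": [0.90, 0.80, 0.10, 0.20, 0.50, 0.40],
})
centroids = cluster_centroids(toy, "ac_cluster")
print(stars_from_centroids(centroids))  # {1: 1, 2: 2, 0: 3}
```

On the notebook's data, `stars_from_centroids(cluster_centroids(df, 'ac_cluster'))` would yield a 1 to 5 mapping for the five agglomerative clusters, making them directly comparable with the k-means star mapping.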

## Conclusion
### Final Observations
The sentiment analysis models (i.e., VADER and RoBERTa) did a good job of capturing the general sentiment of a collection of reviews. In fact, they distinguished the overall sentiment of each star-rating bucket reasonably well at an aggregate level. However, for individual reviews the models often produced incorrect sentiment scores, especially when the text was sarcastic or contained nuanced criticism.
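The gap between aggregate and individual accuracy can be made concrete by looking at the spread of a sentiment score within each star bucket, not just its mean. A hedged sketch with toy data; the helper name is hypothetical, while `review_rating` and `roberta_pos` are the notebook's columns.

```python
import pandas as pd

def sentiment_by_rating(df, rating_col="review_rating", score_col="roberta_pos"):
    """Mean, spread, and size of a sentiment score within each star-rating bucket."""
    return df.groupby(rating_col)[score_col].agg(["mean", "std", "count"])

# Toy data: bucket means separate clearly, yet individual reviews overlap
toy = pd.DataFrame({
    "review_rating": [5, 5, 5, 1, 1, 1],
    "roberta_pos":   [0.90, 0.95, 0.20, 0.05, 0.10, 0.60],
})
summary = sentiment_by_rating(toy)
print(summary)
```

A large `std` relative to the gap between bucket means is what allows aggregate results to look good while individual reviews still get misplaced.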

Although the reclassification model often placed individual reviews in the wrong bucket, on the bright side it sometimes correctly recalibrated a star rating from the review text, because some users are overly harsh or lenient with their star ratings. Hence, we do see an opportunity, and a benefit, in deploying such a model.

### Recommendations:
From this project, the following are my recommendations to the PM:
1. Post example reviews for each star-rating bucket to establish general criteria/guidelines for what each star rating means (to discourage very harsh or lenient star ratings)
2. Beta-test an ML+NLP/LLM-based restaurant star rating system (i.e., one based on a sentiment analysis model) alongside the traditional star rating system, and gather user feedback
3. Beta-test a star rating assistant feature: when there appears to be a large mismatch between a text review's sentiment score and the user's star rating, the assistant suggests a star rating, which the user can override
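Recommendation 3 could work roughly as sketched below. This is an illustration under assumptions, not a spec: the cluster-to-star mapping is copied from the K-Means Version 1 results, and `suggest_rating` and its `threshold` (the minimum gap in stars before the assistant speaks up) are hypothetical. In practice, the cluster label for a new review would come from scoring it with RoBERTa and calling `km.predict` on those scores.

```python
from dataclasses import dataclass

# Cluster label -> star, read off the K-Means (Version 1) cluster centers
STAR_MAP = {0: 5, 4: 4, 2: 3, 1: 2, 3: 1}

@dataclass
class Suggestion:
    user_star: int
    suggested_star: int
    show_prompt: bool  # True when the assistant should offer its suggestion

def suggest_rating(user_star: int, cluster_label: int, threshold: int = 2) -> Suggestion:
    """Compare the user's stars with the sentiment-implied stars (hypothetical helper).

    The user always keeps the final say; this only decides whether to show a suggestion.
    """
    suggested = STAR_MAP[cluster_label]
    mismatch = abs(user_star - suggested)
    return Suggestion(user_star, suggested, mismatch >= threshold)

# A 3-star review whose text lands in the 5-star cluster triggers a prompt;
# a 4-star review in the same cluster does not.
print(suggest_rating(3, cluster_label=0))
print(suggest_rating(4, cluster_label=0))
```

A threshold of 2 stars leaves small disagreements alone, which matters given how often the clusters misplaced individual reviews in the examples above.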

## Next Steps
1. Re-run the analysis and re-train the model with a bigger dataset
2. Research more advanced sentiment analysis models that can better pick up nuance (perhaps using the GPT-4 API)