## Predicting Whether Starbucks Reviews Are 5-Star or Below

#### Introduction

We're looking through a dataset of 749 Starbucks customer reviews. Each review has an associated rating on a 1-5 scale.
Using natural language processing, we will fit a logistic regression model to predict whether a review is 5-star or not, based on the words used in the review.
This type of prediction could be useful for estimating a rating for reviews that have none.

#### Imports

In [1]:
import nltk
import pandas as pd
import numpy as np
import string
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix

#### Prepare the Data

We keep only the `Rating` and `Review` columns, since we're only interested in the text connected with the rating, and drop rows where either value is missing.

In [2]:
starbucks = pd.read_csv('reviews_data.csv')
starbucks

Unnamed: 0,name,location,Date,Rating,Review,Image_Links
0,Helen,"Wichita Falls, TX","Reviewed Sept. 13, 2023",5.0,Amber and LaDonna at the Starbucks on Southwes...,['No Images']
1,Courtney,"Apopka, FL","Reviewed July 16, 2023",5.0,** at the Starbucks by the fire station on 436...,['No Images']
2,Daynelle,"Cranberry Twp, PA","Reviewed July 5, 2023",5.0,I just wanted to go out of my way to recognize...,['https://media.consumeraffairs.com/files/cach...
3,Taylor,"Seattle, WA","Reviewed May 26, 2023",5.0,Me and my friend were at Starbucks and my card...,['No Images']
4,Tenessa,"Gresham, OR","Reviewed Jan. 22, 2023",5.0,I’m on this kick of drinking 5 cups of warm wa...,['https://media.consumeraffairs.com/files/cach...
...,...,...,...,...,...,...
845,Becky,"Agoura Hills, CA","Reviewed July 13, 2006",,I ordered two venti frappacino's without whipp...,['No Images']
846,Bob,"Goodrich, MI","Reviewed Jan. 3, 2005",,No Review Text,['No Images']
847,Erik,"Valley Village, CA","Reviewed Nov. 5, 2004",,"DEMANDED TIPS FROM ME, THEN MADE ME WAIT UNTIL...",['No Images']
848,Andrew,"Fallbrook, CA","Reviewed Oct. 20, 2004",,No Review Text,['No Images']


In [4]:
# Keep only the two columns we need and drop rows with a missing rating or review
starbucks_trim = starbucks[['Rating', 'Review']].dropna()

In [5]:
starbucks_trim

Unnamed: 0,Rating,Review
0,5.0,Amber and LaDonna at the Starbucks on Southwes...
1,5.0,** at the Starbucks by the fire station on 436...
2,5.0,I just wanted to go out of my way to recognize...
3,5.0,Me and my friend were at Starbucks and my card...
4,5.0,I’m on this kick of drinking 5 cups of warm wa...
...,...,...
700,1.0,I ordered Via Starbucks coffee online. I recei...
701,3.0,"My name is Ric **, I am journalist by professi..."
702,1.0,"The bagel was ice cold, not cut and not toasted."
703,1.0,"In the morning of Monday, August 15, 2011, at ..."


In [51]:
#pd.set_option('display.max_rows', None)
#pd.set_option('display.max_columns', None)


#### Preprocess text

First, we convert the ratings to a binary target: is this a 5-star rating or not?
Second, we tokenize each review and remove common "stopwords" and punctuation, which should have no bearing on the sentiment of the review.

In [6]:
X = starbucks_trim['Review']
y = (starbucks_trim['Rating'] == 5).astype(int)
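
Because the target collapses ratings 1 through 4 into a single class, it can help to check how balanced the two classes are before modelling. The following is a minimal sketch on a small made-up `ratings` series (hypothetical values, not the Starbucks data) that applies the same `== 5` conversion:

```python
import pandas as pd

# Hypothetical ratings for illustration only; not taken from reviews_data.csv.
ratings = pd.Series([5.0, 1.0, 3.0, 5.0, 2.0, 1.0])

# Same conversion as in the notebook: 1 for a 5-star review, 0 otherwise.
y_demo = (ratings == 5).astype(int)

# Share of each class; a heavily skewed split makes raw accuracy misleading.
class_share = y_demo.value_counts(normalize=True).sort_index()
print(class_share)
```

On the real data the same call would be `y.value_counts(normalize=True)`.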

In [7]:
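def cleanText(text):
    """Lowercase and tokenize each review, dropping stopwords and punctuation."""
    # Build the exclusion set once instead of once per token.
    # Assumes the NLTK tokenizer and stopwords resources are already downloaded.
    excluded = set(stopwords.words('english')) | set(string.punctuation) | {"“", "’", "--", "”"}
    clean_token_line = []
    for review in text:
        tokens = word_tokenize(review.lower())
        clean_token_line.append([word for word in tokens if word not in excluded])
    return clean_token_line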

In [8]:
cleanText(X)

[['amber',
  'ladonna',
  'starbucks',
  'southwest',
  'parkway',
  'always',
  'warm',
  'welcoming',
  'always',
  'smile',
  'voice',
  'greet',
  'drive-thru',
  'customer',
  'service',
  'always',
  'spot-on',
  'always',
  'get',
  'order',
  'right',
  'smile',
  'would',
  'actually',
  'give',
  '5',
  'stars',
  'available'],
 ['starbucks',
  'fire',
  'station',
  '436',
  'altamonte',
  'springs',
  'fl',
  'made',
  'day',
  'finally',
  'helped',
  'figure',
  'way',
  'make',
  'drink',
  'love',
  'took',
  'time',
  'talk',
  '2',
  'minutes',
  'make',
  'experience',
  'better',
  'used',
  'much',
  'appreciated',
  'bad',
  'experiences',
  'one',
  'another',
  'starbucks',
  'closest',
  'work',
  'building',
  'drinks',
  'great',
  'along',
  'great',
  'customer',
  'service',
  'specific',
  'baristas',
  'niko',
  'refreshing',
  'speak',
  'pleasant',
  'drink',
  'perfect',
  'store',
  '11956'],
 ['wanted',
  'go',
  'way',
  'recognize',
  'starbucks',
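
Note that the vectorizers later in this notebook are fit on the raw `X`, so the token lists returned by `cleanText` are not what the models are trained on (`CountVectorizer` lowercases and tokenizes on its own, but does not remove stopwords by default). One way to feed pre-cleaned tokens to a vectorizer is to pass a callable as `analyzer`. This is a sketch under assumptions: `demo_tokens` is made-up data standing in for the output of `cleanText`, and `identity` is a hypothetical helper, not part of the notebook.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Made-up token lists standing in for cleanText(X); not real review data.
demo_tokens = [
    ["amber", "always", "warm", "welcoming", "always"],
    ["bagel", "ice", "cold", "cut", "toasted"],
]

def identity(tokens):
    """Hypothetical helper: return the already-cleaned tokens unchanged."""
    return tokens

# A callable analyzer bypasses CountVectorizer's own preprocessing and
# tokenization, so each inner list is used as-is.
vectorizer_clean = CountVectorizer(analyzer=identity, max_features=1000)
X_clean = vectorizer_clean.fit_transform(demo_tokens)

print(sorted(vectorizer_clean.vocabulary_))
print(X_clean.toarray())
```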

### Bag of words method

In [9]:
vectorizer_bow = CountVectorizer(max_features=1000)
X_bow = vectorizer_bow.fit_transform(X)
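
To make the bag-of-words representation concrete, here is a small sketch on two made-up sentences (not from the dataset). Each column is a vocabulary word and each cell counts how often that word occurs in the document; word order is discarded.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Two made-up reviews for illustration only.
demo_reviews = ["Great coffee, great service", "Cold coffee"]

demo_vectorizer = CountVectorizer()
demo_counts = demo_vectorizer.fit_transform(demo_reviews)

# vocabulary_ maps each word to its column index; list the words in column order.
vocab = sorted(demo_vectorizer.vocabulary_, key=demo_vectorizer.vocabulary_.get)
print(vocab)                  # ['coffee', 'cold', 'great', 'service']
print(demo_counts.toarray())  # [[1 0 2 1]
                              #  [1 1 0 0]]
```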

#### Split the data

In [10]:
X_train_bow, X_test_bow, y_train_bow, y_test_bow = train_test_split(X_bow, y, test_size=0.2, random_state=42)

#### Train a classification model

In [11]:
model_bow = LogisticRegression()
model_bow.fit(X_train_bow, y_train_bow)

In [12]:
y_pred_bow = model_bow.predict(X_test_bow)
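
In the cells above, the vectorizer is fit on every review before the split, so the vocabulary is chosen with the test reviews in view. An alternative, sketched here on made-up data, splits the raw text first and wraps the vectorizer and classifier in a pipeline, so the vocabulary is learned from the training reviews only. The `stratify` argument and `max_iter=1000` are choices made for this sketch, not settings from the notebook.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Made-up reviews and labels (1 = five-star); not the Starbucks data.
texts = [
    "great friendly service", "love this store", "perfect drink every time",
    "amazing staff", "best coffee ever",
    "cold bagel", "rude barista", "wrong order again",
    "long wait and no apology", "terrible experience",
]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

# Split the raw text first; stratify keeps the class ratio in both parts.
text_train, text_test, label_train, label_test = train_test_split(
    texts, labels, test_size=0.2, random_state=42, stratify=labels
)

# The pipeline fits the vectorizer on the training text only, then
# reuses that vocabulary when transforming the test text.
pipeline = make_pipeline(
    CountVectorizer(max_features=1000),
    LogisticRegression(max_iter=1000),
)
pipeline.fit(text_train, label_train)
predictions = pipeline.predict(text_test)
print(predictions)
```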

#### Evaluate the model

In [13]:
print("Using bag of words our scores are:")
accuracy_bow = accuracy_score(y_test_bow, y_pred_bow)
print("Accuracy:", accuracy_bow)

precision_bow = precision_score(y_test_bow, y_pred_bow)
print("Precision:", precision_bow)

recall_bow = recall_score(y_test_bow, y_pred_bow)
print("Recall:", recall_bow)

f1_bow = f1_score(y_test_bow, y_pred_bow)
print("F1 Score:", f1_bow)

tn_bow, fp_bow, fn_bow, tp_bow = confusion_matrix(y_test_bow, y_pred_bow).ravel()
specificity_bow = tn_bow / (tn_bow + fp_bow)
print("Specificity:", specificity_bow)

conf_matrix_bow = confusion_matrix(y_test_bow, y_pred_bow)
print("Confusion Matrix:")
print(conf_matrix_bow)
# [[TN, FP]
#  [FN, TP]]

Using bag of words our scores are:
Accuracy: 0.8936170212765957
Precision: 0.8181818181818182
Recall: 0.4090909090909091
F1 Score: 0.5454545454545455
Specificity: 0.9831932773109243
Confusion Matrix:
[[117   2]
 [ 13   9]]
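
As a check on how the scores relate to one another, each of them can be recomputed by hand from the four confusion-matrix counts printed above. The sketch below uses only those counts; the variable names are chosen for this example.

```python
# Counts from the bag-of-words confusion matrix: [[TN, FP], [FN, TP]].
tn, fp, fn, tp = 117, 2, 13, 9

accuracy = (tp + tn) / (tp + tn + fp + fn)  # share of all reviews classified correctly
precision = tp / (tp + fp)                  # of predicted 5-star, how many really are
recall = tp / (tp + fn)                     # of real 5-star, how many were found
specificity = tn / (tn + fp)                # of real non-5-star, how many were found
f1 = 2 * precision * recall / (precision + recall)

print(round(accuracy, 4), round(precision, 4), round(recall, 4),
      round(specificity, 4), round(f1, 4))
# prints: 0.8936 0.8182 0.4091 0.9832 0.5455
```

The low recall comes from the 13 five-star reviews that were predicted as "not 5", against only 9 that were found.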


### Bigram method

In [14]:
vectorizer_bigram = CountVectorizer(ngram_range=(1, 2), max_features=1000)
X_bigram = vectorizer_bigram.fit_transform(X)
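
With `ngram_range=(1, 2)` the vectorizer keeps single words and also adds every pair of adjacent words as a feature, which lets the model see short phrases such as a negation next to the word it modifies. A minimal sketch on one made-up review (not from the dataset):

```python
from sklearn.feature_extraction.text import CountVectorizer

# One made-up review; illustration only.
demo_review = ["not good coffee"]

unigram_vec = CountVectorizer(ngram_range=(1, 1)).fit(demo_review)
bigram_vec = CountVectorizer(ngram_range=(1, 2)).fit(demo_review)

print(sorted(unigram_vec.vocabulary_))  # ['coffee', 'good', 'not']
print(sorted(bigram_vec.vocabulary_))   # ['coffee', 'good', 'good coffee', 'not', 'not good']
```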

In [15]:
X_train_bigram, X_test_bigram, y_train_bigram, y_test_bigram = train_test_split(X_bigram, y, test_size=0.2, random_state=42)

In [16]:
model_bigram = LogisticRegression()
model_bigram.fit(X_train_bigram, y_train_bigram)

In [17]:
y_pred_bigram = model_bigram.predict(X_test_bigram)

In [18]:
print("Using bigrams our scores are:")
accuracy_bigram = accuracy_score(y_test_bigram, y_pred_bigram)
print("Accuracy:", accuracy_bigram)

precision_bigram = precision_score(y_test_bigram, y_pred_bigram)
print("Precision:", precision_bigram)

recall_bigram = recall_score(y_test_bigram, y_pred_bigram)
print("Recall:", recall_bigram)

f1_bigram = f1_score(y_test_bigram, y_pred_bigram)
print("F1 Score:", f1_bigram)

tn_bigram, fp_bigram, fn_bigram, tp_bigram = confusion_matrix(y_test_bigram, y_pred_bigram).ravel()
specificity_bigram = tn_bigram / (tn_bigram + fp_bigram)
print("Specificity:", specificity_bigram)

conf_matrix_bigram = confusion_matrix(y_test_bigram, y_pred_bigram)
print("Confusion Matrix:")
print(conf_matrix_bigram)


Using bigrams our scores are:
Accuracy: 0.9078014184397163
Precision: 0.8461538461538461
Recall: 0.5
F1 Score: 0.6285714285714286
Specificity: 0.9831932773109243
Confusion Matrix:
[[117   2]
 [ 11  11]]


### Trigram method

In [19]:
vectorizer_trigram = CountVectorizer(ngram_range=(1, 3), max_features=1000)
X_trigram = vectorizer_trigram.fit_transform(X)
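
Because `max_features=1000` is kept fixed, unigrams, bigrams, and trigrams all compete for the same 1,000 slots, and only the most frequent terms across the corpus are kept. The sketch below, on made-up text, shows the cap at work; whether this affects the trigram scores here is not something the notebook tests.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Made-up reviews for illustration only.
demo_reviews = ["good coffee", "good service", "good coffee shop"]

# Corpus counts with unigrams and bigrams:
#   good=3, coffee=2, "good coffee"=2, every other term=1.
capped = CountVectorizer(ngram_range=(1, 2), max_features=3).fit(demo_reviews)
uncapped = CountVectorizer(ngram_range=(1, 2)).fit(demo_reviews)

print(sorted(capped.vocabulary_))  # ['coffee', 'good', 'good coffee']
print(len(uncapped.vocabulary_))   # 7
```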

In [20]:
X_train_trigram, X_test_trigram, y_train_trigram, y_test_trigram = train_test_split(X_trigram, y, test_size=0.2, random_state=42)

In [21]:
model_trigram = LogisticRegression()
model_trigram.fit(X_train_trigram, y_train_trigram)

In [22]:
y_pred_trigram = model_trigram.predict(X_test_trigram)

In [23]:
print("Using trigrams our scores are:")
accuracy_trigram = accuracy_score(y_test_trigram, y_pred_trigram)
print("Accuracy:", accuracy_trigram)

precision_trigram = precision_score(y_test_trigram, y_pred_trigram)
print("Precision:", precision_trigram)

recall_trigram = recall_score(y_test_trigram, y_pred_trigram)
print("Recall:", recall_trigram)

f1_trigram = f1_score(y_test_trigram, y_pred_trigram)
print("F1 Score:", f1_trigram)

tn_trigram, fp_trigram, fn_trigram, tp_trigram = confusion_matrix(y_test_trigram, y_pred_trigram).ravel()
specificity_trigram = tn_trigram / (tn_trigram + fp_trigram)
print("Specificity:", specificity_trigram)

conf_matrix_trigram = confusion_matrix(y_test_trigram, y_pred_trigram)
print("Confusion Matrix:")
print(conf_matrix_trigram)


Using trigrams our scores are:
Accuracy: 0.900709219858156
Precision: 0.8333333333333334
Recall: 0.45454545454545453
F1 Score: 0.5882352941176471
Specificity: 0.9831932773109243
Confusion Matrix:
[[117   2]
 [ 12  10]]


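All three models reached about 89-91% accuracy on the 141-review test set, with the bigram model scoring highest (accuracy 0.908, F1 0.629). Accuracy alone overstates performance, however: only 22 of the 141 test reviews are 5-star, so always predicting "not 5" would already score about 84%. Specificity is 0.983 for every model, but recall ranges from 0.41 to 0.50, meaning half or more of the true 5-star reviews are missed.
Considering that the sentiment in a review is not always reflected in the rating, the models identify non-5-star reviews well and 5-star reviews less reliably.
This prediction could be helpful for filling in missing ratings in datasets like this one, or where only a review is given without a rating.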
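
To put the scores in context, the confusion matrices reported in this notebook can be compared against a majority-class baseline, that is, a classifier that always predicts "not 5-star". This sketch only reuses the counts printed in the outputs above; the baseline comparison itself is an addition, not something the notebook computes.

```python
# Confusion matrices reported in the notebook, as (TN, FP, FN, TP).
results = {
    "bag of words": (117, 2, 13, 9),
    "bigram": (117, 2, 11, 11),
    "trigram": (117, 2, 12, 10),
}

summary = {}
for name, (tn, fp, fn, tp) in results.items():
    total = tn + fp + fn + tp
    accuracy = (tn + tp) / total
    # Accuracy of always predicting the larger class ("not 5-star" here).
    baseline = max(tn + fp, fn + tp) / total
    recall = tp / (tp + fn)
    summary[name] = (accuracy, baseline, recall)
    print(f"{name}: accuracy={accuracy:.3f} baseline={baseline:.3f} recall={recall:.3f}")
```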