# Goal :
Use NB to predict which reviews are high priority vs low priority


In [1]:
# import libraries
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer , TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, accuracy_score

In [2]:
import text_preprocessing

  from .autonotebook import tqdm as notebook_tqdm


In [3]:
pd.set_option('display.max_colwidth', None)

In [4]:
data =  pd.read_csv('popchip_reviews.csv')
data.head()

Unnamed: 0.1,Unnamed: 0,ID,userID,Rating,Priority,Title,Text
0,0,1,usr204,5,High,Love these snacks!,"Honestly didn't think I'd get addicted to these but here we areâ€¦ the BBQ flavour literally disappears in 5 minutes because I can't stop eating them. Crunchy but not oily, and they somehow feel 'light' even though I eat like 3 bags at once lol."
1,1,2,usr118,3,Medium,Decent but too salty,They're okay I guess? The texture is nice but omg the salt. After a few pieces I needed a whole bottle of water. My partner loves them but I'm kinda meh about it.
2,2,3,usr550,4,Low,Good for parties,Bought these for a movie night and everyone kept asking what brand they were. I wish the bag was bigger though. The sour cream flavor is ðŸ”¥ but a bit inconsistent between batches??
3,3,4,usr802,2,High,Not for me,"I really wanted to like these because people hype them up all over social media, but honestly they taste kinda stale to me. Maybe I got a bad batch but still disappointing, especially for the price ngl."
4,4,5,usr990,1,High,Terrible â€” never again,Idk what happened but the whole bag tasted like burnt cardboard ðŸ˜­. I even asked my sibling to try and they spit it out immediately. Super weird chemical smell too??? yeah no thanks.


In [None]:
# Remove all rows where Priority == "Medium"
data = data[data["Priority"] != "Medium"].reset_index(drop=True)

print(data["Priority"].value_counts())
print("Remaining rows:", len(data))

Priority
Low     21
High    20
Name: count, dtype: int64
Remaining rows: 41


In [6]:
# quick EDA
data.shape

(41, 7)

In [7]:
data.Priority.value_counts()

Priority
Low     21
High    20
Name: count, dtype: int64

In [8]:
data['clean_text']  = text_preprocessing.clean_normalize(data.Text)
data.head()

Unnamed: 0.1,Unnamed: 0,ID,userID,Rating,Priority,Title,Text,clean_text
0,0,1,usr204,5,High,Love these snacks!,"Honestly didn't think I'd get addicted to these but here we areâ€¦ the BBQ flavour literally disappears in 5 minutes because I can't stop eating them. Crunchy but not oily, and they somehow feel 'light' even though I eat like 3 bags at once lol.",honestly not think d addicted BBQ flavour literally disappear 5 minute not stop eat crunchy oily feel light eat like 3 bag lol
1,2,3,usr550,4,Low,Good for parties,Bought these for a movie night and everyone kept asking what brand they were. I wish the bag was bigger though. The sour cream flavor is ðŸ”¥ but a bit inconsistent between batches??,buy movie night keep ask brand wish bag big sour cream flavor bit inconsistent batch
2,3,4,usr802,2,High,Not for me,"I really wanted to like these because people hype them up all over social media, but honestly they taste kinda stale to me. Maybe I got a bad batch but still disappointing, especially for the price ngl.",want like people hype social medium honestly taste kinda stale maybe get bad batch disappoint especially price ngl
3,4,5,usr990,1,High,Terrible â€” never again,Idk what happened but the whole bag tasted like burnt cardboard ðŸ˜­. I even asked my sibling to try and they spit it out immediately. Super weird chemical smell too??? yeah no thanks.,idk happen bag taste like burn cardboard ask sibling try spit immediately super weird chemical smell yeah thank
4,6,7,usr221,5,Low,My fav snack rn,I keep a bag of these in my desk drawer at work and they're literally saving my life during long meetings. Perfect balance of flavor and crunch. They're basically my fuel ðŸ« .,bag desk drawer work literally save life long meeting Perfect balance flavor crunch basically fuel


In [9]:
cv = CountVectorizer(stop_words='english', ngram_range=(1,2), min_df =0.2, )
X = cv.fit_transform(data.clean_text)

In [10]:
X_df = pd.DataFrame(X.toarray(), columns= cv.get_feature_names_out())
X_df

Unnamed: 0,bad,bad day,bag,buy,buy complain,complain,complain lol,day,day idk,expect,...,ngl,ngl expect,place,say,say thing,texture,texture place,thing,thing maybe,weird
0,0,0,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,1,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,1,0,0,0,0,0,0,0,0,0,...,1,0,0,0,0,0,0,0,0,0
3,0,0,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
4,0,0,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5,0,0,1,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
6,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
7,1,1,0,1,1,1,1,1,1,1,...,1,1,0,0,0,0,0,0,0,0
8,0,0,1,0,0,0,0,0,0,0,...,0,0,0,1,1,0,0,1,1,1
9,1,1,0,1,1,1,1,1,1,0,...,0,0,0,0,0,0,0,0,0,0


In [11]:
X_df.sum()

bad              11
bad day          10
bag              12
buy              12
buy complain     10
complain         10
complain lol     10
day              10
day idk          10
expect           12
expect flavor    12
factory          10
factory bad      10
flavor           14
friend           16
friend say       16
good             23
good kinda       16
idk              11
kinda            33
kinda good       16
kinda weird      16
lol              11
maybe            27
maybe factory    10
ngl              13
ngl expect       12
place            14
say              16
say thing        16
texture          14
texture place    14
thing            16
thing maybe      16
weird            17
dtype: int64

In [12]:
# define Priority col as output
y = data.Priority
y.head()

0    High
1     Low
2    High
3    High
4     Low
Name: Priority, dtype: object

In [13]:
# to get the code you need, ask Chatgpt "give me boiler plate naive bayes python code"

X_train, X_test, y_train, y_test =  train_test_split(X_df, y, test_size=0.2, random_state=42)

model = MultinomialNB()
model.fit(X_train, y_train)

# predict
y_pred = model.predict(X_test)
print('Accuracy :', accuracy_score(y_test, y_pred))
print(classification_report(y_test,y_pred))

Accuracy : 0.6666666666666666
              precision    recall  f1-score   support

        High       0.50      0.67      0.57         3
         Low       0.80      0.67      0.73         6

    accuracy                           0.67         9
   macro avg       0.65      0.67      0.65         9
weighted avg       0.70      0.67      0.68         9



In [14]:
new_reviews = pd.Series([
    "I swear the seasoning changes every bag", 
    "felt a bit stale but still finished it lol", 
    "my partner thinks I'm exaggerating but idk", 
    "honestly could eat these everyday ngl", 
    "super messy crumbs everywhere but tasty", 
    "the bag looked bigger online smh",
    "kinda tastes like something burnt??",
    "ate the whole thing even tho I said I wouldn't",
    "not sure if it's the oil or the flavouring but weird",
    "they slap late at night fr"
])

In [15]:
new_reviews

0                 I swear the seasoning changes every bag
1              felt a bit stale but still finished it lol
2              my partner thinks I'm exaggerating but idk
3                   honestly could eat these everyday ngl
4                 super messy crumbs everywhere but tasty
5                        the bag looked bigger online smh
6                     kinda tastes like something burnt??
7          ate the whole thing even tho I said I wouldn't
8    not sure if it's the oil or the flavouring but weird
9                              they slap late at night fr
dtype: object

Use Naive Bayes to tell us which of these reviews are low and high priority

In [16]:
# clean and normalize raw data
new_reviews_clean = text_preprocessing.clean_normalize(new_reviews)
new_reviews_clean

0        swear seasoning change bag
1         feel bit stale finish lol
2    partner think m exaggerate idk
3         honestly eat everyday ngl
4           super messy crumb tasty
5           bag look big online smh
6             kinda taste like burn
7             eat thing tho say not
8         sure oil flavouring weird
9                slap late night fr
dtype: object

In [19]:
# vectorize the data
# here we are not going to say fit part in the cv.fit_transform because fit creates columns and transform calculates the values
# since we want to have the same columns we won't be using fit part , just transform is enough
new_reviews_df = pd.DataFrame(cv.transform(new_reviews_clean).toarray(), columns=cv.get_feature_names_out())
new_reviews_df

Unnamed: 0,bad,bad day,bag,buy,buy complain,complain,complain lol,day,day idk,expect,...,ngl,ngl expect,place,say,say thing,texture,texture place,thing,thing maybe,weird
0,0,0,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,1,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5,0,0,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
6,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
7,0,0,0,0,0,0,0,0,0,0,...,0,0,0,1,0,0,0,1,0,0
8,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
9,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Now make a prediction

In [20]:
model.predict(new_reviews_df)

array(['Low', 'High', 'High', 'Low', 'High', 'Low', 'High', 'High',
       'High', 'High'], dtype='<U4')

# Compare models
use tfidf and logistic regression

In [None]:
tv = TfidfVectorizer(stop_words='english', ngram_range=(1,2), min_df = .2)
Xt = tv.fit_transform(data.clean_text)
Xt_df = pd.DataFrame(Xt.toarray(), columns= tv.get_feature_names_out())
Xt_df # input

In [23]:
y.head() 

0    High
1     Low
2    High
3    High
4     Low
Name: Priority, dtype: object

In [24]:
# fit a model
# train_test split
Xt_train, Xt_test, yt_train, yt_test = train_test_split(Xt_df, y, test_size=0.2, random_state=42)

# Model - Logistic Regression
model_lr = LogisticRegression()
model_lr.fit(Xt_train, yt_train)

# Predict 
y_pred_lr = model_lr.predict(Xt_test)

# Evaluate
print('Accuracy: ', accuracy_score(yt_test, y_pred_lr))
print(classification_report(yt_test, y_pred_lr))

Accuracy:  0.5555555555555556
              precision    recall  f1-score   support

        High       0.43      1.00      0.60         3
         Low       1.00      0.33      0.50         6

    accuracy                           0.56         9
   macro avg       0.71      0.67      0.55         9
weighted avg       0.81      0.56      0.53         9



Another way of comparing models is looking at their prediction scores

In [27]:
data['Predictions_NB'] = model.predict_proba(X_df)[:,0]

In [28]:
data['Predictions_LR'] = model_lr.predict_proba(X_df)[:,0]

In [30]:
data.head(2)

Unnamed: 0.1,Unnamed: 0,ID,userID,Rating,Priority,Title,Text,clean_text,Predictions_NB,Predictions_LR
0,0,1,usr204,5,High,Love these snacks!,"Honestly didn't think I'd get addicted to these but here we areâ€¦ the BBQ flavour literally disappears in 5 minutes because I can't stop eating them. Crunchy but not oily, and they somehow feel 'light' even though I eat like 3 bags at once lol.",honestly not think d addicted BBQ flavour literally disappear 5 minute not stop eat crunchy oily feel light eat like 3 bag lol,0.602984,0.660209
1,2,3,usr550,4,Low,Good for parties,Bought these for a movie night and everyone kept asking what brand they were. I wish the bag was bigger though. The sour cream flavor is ðŸ”¥ but a bit inconsistent between batches??,buy movie night keep ask brand wish bag big sour cream flavor bit inconsistent batch,0.331271,0.354337


In [31]:
# sort probability columns to see if the models think the reviews are low vs high priority ones
data.sort_values(by='Predictions_NB', ascending=False)

Unnamed: 0.1,Unnamed: 0,ID,userID,Rating,Priority,Title,Text,clean_text,Predictions_NB,Predictions_LR
28,43,44,usr370,2,Low,My go-to chips,kinda good but also kinda weird??. kinda good but also kinda weird??. my friend said the same thing so maybe it's not just me.,kinda good kinda weird kinda good kinda weird friend say thing maybe,0.999643,0.972764
14,22,23,usr206,4,High,Too crunchy lol,my friend said the same thing so maybe it's not just me. smelled funny but tasted good?? confusing. kinda good but also kinda weird??.,friend say thing maybe smell funny taste good confusing kinda good kinda weird,0.996273,0.932935
36,54,55,usr966,4,High,My go-to chips,honestly not sure what happened but. kinda good but also kinda weird??. my friend said the same thing so maybe it's not just me.,honestly sure happen kinda good kinda weird friend say thing maybe,0.994105,0.901628
37,55,56,usr705,3,High,Not my fav,smelled funny but tasted good?? confusing. my friend said the same thing so maybe it's not just me. my friend said the same thing so maybe it's not just me.,smell funny taste good confuse friend say thing maybe friend say thing maybe,0.993112,0.868571
8,12,13,usr649,3,High,Tastes amazing,my friend said the same thing so maybe it's not just me. kinda good but also kinda weird??. I finished the whole bag accidentally.,friend say thing maybe kinda good kinda weird finish bag accidentally,0.993034,0.909505
25,35,36,usr754,3,Low,Pretty solid snack,kinda good but also kinda weird??. smelled funny but tasted good?? confusing. I keep buying them even though I complain lol.,kinda good kinda weird smell funny taste good confusing buy complain lol,0.989691,0.883924
27,39,40,usr346,2,Low,I buy these weekly,my friend said the same thing so maybe it's not just me. my friend said the same thing so maybe it's not just me. honestly not sure what happened but.,friend say thing maybe friend say thing maybe honestly sure happen,0.989126,0.813235
21,30,31,usr822,2,Low,Not worth the price,kinda good but also kinda weird??. my friend said the same thing so maybe it's not just me. maybe the factory was having a bad day idk.,kinda good kinda weird friend say thing maybe maybe factory have bad day idk,0.966995,0.721997
20,29,30,usr781,1,High,Actually impressed,ngl I expected more from this flavor. my friend said the same thing so maybe it's not just me. kinda good but also kinda weird??.,ngl expect flavor friend say thing maybe kinda good kinda weird,0.957641,0.79974
22,32,33,usr523,1,High,Nope. Just nope.,maybe the factory was having a bad day idk. I keep buying them even though I complain lol. kinda good but also kinda weird??.,maybe factory have bad day idk buy complain lol kinda good kinda weird,0.913215,0.587067
