### Sentiment analysis on sample data (Buffalo Joes, Chicago) using NLTK (Natural Language Toolkit) and the VADER (Valence Aware Dictionary and sEntiment Reasoner) model

Add the necessary imports.

In [13]:
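import pandas as pd
import numpy as np
# SentimentIntensityAnalyzer needs the VADER lexicon; if it is missing,
# run nltk.download('vader_lexicon') once beforehand
from nltk.sentiment import SentimentIntensityAnalyzer
from tqdm.notebook import tqdm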

Read in the reviews from the CSV file using pandas. The reviews were scraped from Yelp using requests and BeautifulSoup.

In [3]:
df = pd.read_csv('../data/buffalo_joes_reviews.csv')
print(df)

                                                review
0    My favorite wings around.  I found this place ...
1    I thought it was pretty good. It's been awhile...
2    I used to love Buffjoes in my teens and 20's, ...
3    A friend of mind introduced me Buffalo Joes! I...
4    I just came back to buff joes after a several ...
..                                                 ...
315  For starters, I admit to being a wing snob. I ...
316  I've eaten here so many times, but only now ge...
317  I'm sorry to say.....after 30 plus years of go...
318  I visited here late Thursday night. I was happ...
319  Horrible customer service. I ordered the food ...

[320 rows x 1 columns]


Initialize the sentiment analyzer using NLTK's VADER model.

In [4]:
sia = SentimentIntensityAnalyzer()
print(sia)

<nltk.sentiment.vader.SentimentIntensityAnalyzer object at 0x00000232B2592350>


Sample test using only one review from the DataFrame. The SentimentIntensityAnalyzer polarity score uses a bag-of-words approach: each word is scored separately against a sentiment lexicon, with only a few local rules (such as negation and intensifiers) and no understanding of wider context. The output includes 'neg', 'neu', 'pos', and 'compound'. The 'neg', 'neu', and 'pos' values stand for negative, neutral, and positive, respectively, and give the proportion of the review that falls into each category. In this example, the review is scored as 5.3% negative, 64.1% neutral, and 30.6% positive, summing to 100%. The 'compound' value ranges from -1 (most negative) to 1 (most positive) and summarizes the overall sentiment of the review. This review was given 0.8775, meaning the sentiment was quite positive overall.

In [5]:
# Pull a single review (row 23) to use as an example
example = df['review'][23]
print(example)
print(sia.polarity_scores(example))

Stopped here to grab a bit to eat on our way to Glacier National Park and were very impressed with the delicious food, western decor and fantastic service!
{'neg': 0.053, 'neu': 0.641, 'pos': 0.306, 'compound': 0.8775}
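
To make the score fields above concrete, here is a minimal sketch that checks the three proportions and turns a compound score into a coarse label. The ±0.05 cutoffs are a commonly used convention for VADER rather than something taken from this notebook, and `label_from_compound` is a hypothetical helper name.

```python
import math

# Scores copied from the example output above
scores = {'neg': 0.053, 'neu': 0.641, 'pos': 0.306, 'compound': 0.8775}

# The three proportions should add up to (approximately) 1
total = scores['neg'] + scores['neu'] + scores['pos']
proportions_sum_to_one = math.isclose(total, 1.0, abs_tol=1e-3)


def label_from_compound(compound, threshold=0.05):
    """Map a compound score in [-1, 1] to a coarse label (assumed cutoffs)."""
    if compound >= threshold:
        return 'positive'
    if compound <= -threshold:
        return 'negative'
    return 'neutral'


label = label_from_compound(scores['compound'])
print(proportions_sum_to_one, label)
```

With these assumed cutoffs, any compound score between -0.05 and 0.05 is treated as neutral, so the threshold is worth tuning if many reviews land near zero.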


### Looping through all of the reviews, using tqdm to show a progress bar. Adapted from Rob Mulla's Sentiment Analysis

In [14]:
# Score every review; res maps each row index to its VADER score dict
res = {}
for i, row in tqdm(df.iterrows(), total=len(df)):
    text = row['review']
    res[i] = sia.polarity_scores(text)

  0%|          | 0/320 [00:00<?, ?it/s]

Convert the scores to a DataFrame and merge it with the original reviews.

In [35]:
vaders = pd.DataFrame(res).T
vaders = pd.merge(df, vaders, left_index=True, right_index=True, how='left')
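
As a small illustration of what the two lines above do, the sketch below uses hypothetical stand-in reviews and made-up scores (not values from this dataset). A dict of per-row score dicts becomes a DataFrame with one column per row index, so `.T` is needed to get one row per review before merging on the index.

```python
import pandas as pd

# Hypothetical stand-in data: two short reviews with made-up scores
df = pd.DataFrame({'review': ['Great wings', 'Soggy burger']})
res = {
    0: {'neg': 0.0, 'neu': 0.4, 'pos': 0.6, 'compound': 0.7},
    1: {'neg': 0.5, 'neu': 0.5, 'pos': 0.0, 'compound': -0.4},
}

# Outer keys (row indices) become columns, so transpose to one row per review
scores = pd.DataFrame(res).T

# Join on the shared row index, keeping every review from the left side
merged = pd.merge(df, scores, left_index=True, right_index=True, how='left')
print(merged)
```

`pd.DataFrame.from_dict(res, orient='index')` should give the same layout without the transpose.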

Some more sample sentiments

In [49]:
print(vaders['review'][54])
print(vaders['compound'][54])
print(vaders['review'][136])
print(vaders['compound'][136])
print(vaders['review'][257])
print(vaders['compound'][257])

# 136 and 257


This is a quiet, unassuming local joint. Came here with a friend and ordered a Buffalo chicken sandwich. I enjoyed the sandwich - chicken was nice and crispy with a tasty bun (5/5). However, I felt that the interior of the restaurant could use some TLC. Next time I will probably get my food to go.
0.8519
How do you even write a review for the greatest wing joint on gods green earth? If there ain't jalepenos on your wings and a large RC cola next to your tray, you did it wrong. I've been going here since I was a kid. My only complaint is that buff joes has very effectively ruined wings for me as nobody can ever compare.
-0.0735
Everyone seems to rave about their wings and honestly, I thought they were pretty decent. But their burger (or at least the one I had- their bacon cheddar burger) was far from being a decent eat. It was soggy, and the bun fell apart in the middle of my first bite. My burger patty slid out along with majority of the content and that is a huge no-no for me! If wing