# Sentiment Analysis Demonstration

This notebook follows a simple sentiment analysis demonstration published on Medium (Analytics Vidhya):

https://medium.com/analytics-vidhya/simple-sentiment-analysis-python-bf9de2d75d0

## Imports & Downloads

In [2]:
import nltk

In [3]:
nltk.download('vader_lexicon')

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     /home/olivia/nltk_data...


True

In [5]:
import pandas as pd
from nltk.sentiment.vader import SentimentIntensityAnalyzer

## Read in the data

If you need the data, download it from Kaggle:
https://www.kaggle.com/datasets/arushchillar/disneyland-reviews

In [6]:
data = pd.read_csv('DisneylandReviews.csv', encoding='latin-1')

In [9]:
data.head(5)

,Review_ID,Rating,Year_Month,Reviewer_Location,Review_Text,Branch
0,670772142,4,2019-4,Australia,If you've ever been to Disneyland anywhere you...,Disneyland_HongKong
1,670682799,4,2019-5,Philippines,Its been a while since d last time we visit HK...,Disneyland_HongKong
2,670623270,4,2019-4,United Arab Emirates,Thanks God it wasn t too hot or too humid wh...,Disneyland_HongKong
3,670607911,4,2019-4,Australia,HK Disneyland is a great compact park. Unfortu...,Disneyland_HongKong
4,670607296,4,2019-4,United Kingdom,"the location is not in the city, took around 1...",Disneyland_HongKong


In [7]:
data.shape

(42656, 6)

In [8]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 42656 entries, 0 to 42655
Data columns (total 6 columns):
Review_ID            42656 non-null int64
Rating               42656 non-null int64
Year_Month           42656 non-null object
Reviewer_Location    42656 non-null object
Review_Text          42656 non-null object
Branch               42656 non-null object
dtypes: int64(2), object(4)
memory usage: 2.0+ MB


## Prepare the data

In [10]:
reviews = data[['Review_ID', 'Review_Text']]

In [11]:
reviews.head(5)

,Review_ID,Review_Text
0,670772142,If you've ever been to Disneyland anywhere you...
1,670682799,Its been a while since d last time we visit HK...
2,670623270,Thanks God it wasn t too hot or too humid wh...
3,670607911,HK Disneyland is a great compact park. Unfortu...
4,670607296,"the location is not in the city, took around 1..."


## How to apply VADER to one review

In [14]:
rev = reviews['Review_Text'][10]
rev

"Disneyland never cease to amaze me! I've been to Disneyland florida and I thought I have exhausted the kid in me but nope! I still had so much fun in disneyland hong kong. 2 DL off my bucketlist and more to come!     "

In [16]:
analyzer = SentimentIntensityAnalyzer()
analyzer.polarity_scores(rev)

{'neg': 0.083, 'neu': 0.788, 'pos': 0.129, 'compound': 0.621}
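
The `compound` value is VADER's normalized overall score, ranging from -1 (most negative) to +1 (most positive), while `neg`, `neu` and `pos` are the proportions of the text that fall into each category. A minimal sketch of turning `compound` into a label, assuming the conventional cut-offs of ±0.05 suggested in the VADER project's documentation. The function name `label_from_compound` and its `threshold` parameter are hypothetical and not part of NLTK:

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a coarse sentiment label.

    The 0.05 default is an assumed, conventional cut-off; tune it for your data.
    """
    if compound >= threshold:
        return 'Positive'
    if compound <= -threshold:
        return 'Negative'
    return 'Neutral'


# 0.621 is the compound score of the review above; the other two values are made up.
for score in (0.621, 0.0, -0.4):
    print(score, label_from_compound(score))
```

Unlike a comparison of `neg` against `pos`, this leaves room for a `Neutral` label when the overall score is close to zero.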

## How to apply VADER to every review

In [21]:
body = reviews.Review_Text
neg, neu, pos, compound = [],[],[],[]
for review in body:
    res = analyzer.polarity_scores(review)
    neg.append(res['neg'])
    neu.append(res['neu'])
    pos.append(res['pos'])
    compound.append(res['compound'])

In [22]:
len(compound)

42656
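
The loop above fills four parallel lists by hand. As a rough sketch, the same bookkeeping can be wrapped in a small helper that accepts any scoring function. The names `score_texts` and `toy_scorer` are hypothetical; `toy_scorer` is only a stand-in for `analyzer.polarity_scores` so that the example runs without NLTK data:

```python
def score_texts(texts, scorer, keys=('neg', 'neu', 'pos', 'compound')):
    """Apply `scorer` to every text and return one list of values per key."""
    columns = {key: [] for key in keys}
    for text in texts:
        result = scorer(text)
        for key in keys:
            columns[key].append(result[key])
    return columns


def toy_scorer(text):
    """Hypothetical stand-in for analyzer.polarity_scores (not a real sentiment model)."""
    pos = 1.0 if 'fun' in text else 0.0
    return {'neg': 0.0, 'neu': 1.0 - pos, 'pos': pos, 'compound': pos}


scores = score_texts(['so much fun', 'took around 1 hour'], toy_scorer)
print(scores['compound'])  # one entry per input text
```

In the notebook, passing `reviews.Review_Text` and `analyzer.polarity_scores` to such a helper should yield the same values as the `neg`, `neu`, `pos` and `compound` lists built above.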

In [23]:
reviews.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 42656 entries, 0 to 42655
Data columns (total 2 columns):
Review_ID      42656 non-null int64
Review_Text    42656 non-null object
dtypes: int64(1), object(1)
memory usage: 666.6+ KB


In [24]:
# assign() returns a new DataFrame, so the new columns are not written
# into a slice of `data` (which is what triggers SettingWithCopyWarning).
reviews = reviews.assign(Negative=neg, Neutral=neu, Positive=pos, Compound=compound)
reviews.head(5)


,Review_ID,Review_Text,Negative,Neutral,Positive,Compound
0,670772142,If you've ever been to Disneyland anywhere you...,0.0,0.887,0.113,0.7069
1,670682799,Its been a while since d last time we visit HK...,0.04,0.73,0.231,0.9901
2,670623270,Thanks God it wasn t too hot or too humid wh...,0.024,0.742,0.235,0.992
3,670607911,HK Disneyland is a great compact park. Unfortu...,0.08,0.76,0.16,0.8489
4,670607296,"the location is not in the city, took around 1...",0.0,0.899,0.101,0.2846


In [29]:
tags = []
for n, p in zip(neg, pos):
    # Tag each review by the larger of its negative and positive scores.
    # Ties (including reviews where both scores are 0) are tagged 'Negative'.
    tags.append('Negative' if n >= p else 'Positive')

In [30]:
reviews['Sentiment_Tag'] = tags
reviews.head()

,Review_ID,Review_Text,Negative,Neutral,Positive,Compound,Sentiment_Tag
0,670772142,If you've ever been to Disneyland anywhere you...,0.0,0.887,0.113,0.7069,Positive
1,670682799,Its been a while since d last time we visit HK...,0.04,0.73,0.231,0.9901,Positive
2,670623270,Thanks God it wasn t too hot or too humid wh...,0.024,0.742,0.235,0.992,Positive
3,670607911,HK Disneyland is a great compact park. Unfortu...,0.08,0.76,0.16,0.8489,Positive
4,670607296,"the location is not in the city, took around 1...",0.0,0.899,0.101,0.2846,Positive
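
The first five rows are all tagged `Positive`, so it helps to look at how the tags are distributed overall. A minimal sketch using only the standard library; `example_tags` is made-up data standing in for the notebook's `tags` list:

```python
from collections import Counter

# Made-up tags for illustration; in the notebook this would be the `tags` list.
example_tags = ['Positive', 'Positive', 'Negative', 'Positive']

counts = Counter(example_tags)
total = sum(counts.values())

# Share of reviews carrying each tag.
shares = {tag: count / total for tag, count in counts.items()}

print(counts.most_common())
print(shares)
```

A heavily skewed distribution would suggest checking the tags against the `Rating` column in `data` before relying on them.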
