
# Dataset and Library
This notebook uses the [Sentiment Labelled Sentences](https://archive-beta.ics.uci.edu/ml/datasets/sentiment+labelled+sentences) dataset from the UCI Machine Learning Repository as sample data. The sentiment library is [VADER](https://pypi.org/project/vaderSentiment/) (Valence Aware Dictionary and sEntiment Reasoner), a lexicon and rule-based sentiment analysis tool.

# Dataset download and preprocessing

In [1]:
%%capture
!wget https://archive.ics.uci.edu/ml/machine-learning-databases/00331/sentiment%20labelled%20sentences.zip

In [2]:
!unzip '/content/sentiment labelled sentences.zip'

Archive:  /content/sentiment labelled sentences.zip
   creating: sentiment labelled sentences/
  inflating: sentiment labelled sentences/.DS_Store  
   creating: __MACOSX/
   creating: __MACOSX/sentiment labelled sentences/
  inflating: __MACOSX/sentiment labelled sentences/._.DS_Store  
  inflating: sentiment labelled sentences/amazon_cells_labelled.txt  
  inflating: sentiment labelled sentences/imdb_labelled.txt  
  inflating: __MACOSX/sentiment labelled sentences/._imdb_labelled.txt  
  inflating: sentiment labelled sentences/readme.txt  
  inflating: __MACOSX/sentiment labelled sentences/._readme.txt  
  inflating: sentiment labelled sentences/yelp_labelled.txt  
  inflating: __MACOSX/._sentiment labelled sentences  


In [3]:
import pandas as pd

In [4]:
# Each file is tab-separated with no header row: review text, then a 0/1 sentiment label
df1 = pd.read_csv('/content/sentiment labelled sentences/amazon_cells_labelled.txt', delimiter='\t', names=['review', 'labelled_sentiment'])
df2 = pd.read_csv('/content/sentiment labelled sentences/imdb_labelled.txt', delimiter='\t', names=['review', 'labelled_sentiment'])
df3 = pd.read_csv('/content/sentiment labelled sentences/yelp_labelled.txt', delimiter='\t', names=['review', 'labelled_sentiment'])

In [5]:
df = pd.concat([df1,df2,df3],axis=0,ignore_index=True)

In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2748 entries, 0 to 2747
Data columns (total 2 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   review              2748 non-null   object
 1   labelled_sentiment  2748 non-null   int64 
dtypes: int64(1), object(1)
memory usage: 43.1+ KB
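
The 2748 rows reported above are worth a second look: the dataset is generally described as holding 1,000 sentences per source, which would give 3,000 rows. One plausible explanation, stated here as an assumption rather than something verified in this notebook, is that double-quote characters inside some reviews make the parser treat several lines as one quoted field. The sketch below uses only Python's standard `csv` module and three made-up sample lines (not taken from the dataset) to show the effect.

```python
import csv
import io

# Three hypothetical tab-separated lines; the first starts with an unbalanced quote.
sample = (
    '"The plot was thin.\t0\n'
    'Loved it.\t1\n'
    'He said "wow" twice.\t1\n'
)

# Default quoting: a leading double quote opens a quoted field that may span lines.
default_rows = list(csv.reader(io.StringIO(sample), delimiter="\t"))

# QUOTE_NONE: double quotes are ordinary characters, so each line stays one row.
raw_rows = list(
    csv.reader(io.StringIO(sample), delimiter="\t", quoting=csv.QUOTE_NONE)
)

print(len(default_rows), len(raw_rows))
```

If the same thing is happening here, passing `quoting=csv.QUOTE_NONE` (the integer 3) to `pd.read_csv` is one option to try. Comparing `len(df1)`, `len(df2)` and `len(df3)` before and after would show whether any file was affected.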


In [7]:
df

Unnamed: 0,review,labelled_sentiment
0,So there is no way for me to plug it in here i...,0
1,"Good case, Excellent value.",1
2,Great for the jawbone.,1
3,Tied to charger for conversations lasting more...,0
4,The mic is great.,1
...,...,...
2743,I think food should have flavor and texture an...,0
2744,Appetite instantly gone.,0
2745,Overall I was not impressed and would not go b...,0
2746,"The whole experience was underwhelming, and I ...",0


# Sentiment score calculation and labelling with VADER

In [8]:
%%capture 
!pip install vaderSentiment

In [9]:
# Only the analyzer class is needed; it loads the VADER lexicon when constructed
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
analyzer = SentimentIntensityAnalyzer()

In [10]:
df['vader_sentiment_score'] = df['review'].apply(lambda x: analyzer.polarity_scores(x)['compound'])

In [11]:
df

Unnamed: 0,review,labelled_sentiment,vader_sentiment_score
0,So there is no way for me to plug it in here i...,0,-0.3535
1,"Good case, Excellent value.",1,0.8402
2,Great for the jawbone.,1,0.6249
3,Tied to charger for conversations lasting more...,0,-0.6145
4,The mic is great.,1,0.6249
...,...,...,...
2743,I think food should have flavor and texture an...,0,0.0000
2744,Appetite instantly gone.,0,0.0000
2745,Overall I was not impressed and would not go b...,0,-0.3724
2746,"The whole experience was underwhelming, and I ...",0,0.0000
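
The `compound` value stored in `vader_sentiment_score` lies between -1 and 1. As far as the vaderSentiment source describes it, the summed word valences x are squashed with x / sqrt(x² + α), where α = 15. The sketch below reproduces only that final step. The valence of 3.1 for "great" is an assumption about the lexicon entry, and the real analyzer applies further rules (negation, punctuation, capitalisation) before this point.

```python
import math


def normalize(valence_sum, alpha=15):
    """Squash an unbounded valence sum into the open interval (-1, 1)."""
    return valence_sum / math.sqrt(valence_sum * valence_sum + alpha)


# Assumed lexicon valence for "great"; it is not read from the library here.
assumed_valence_great = 3.1
print(round(normalize(assumed_valence_great), 4))
```

Under that assumption the result rounds to 0.6249, the same score shown above for "The mic is great." and "Great for the jawbone.".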


In [12]:
# Class codes: 1 = positive (score >= 0.05), 0 = negative (score <= -0.05), 2 = neutral
df['vader_sentiment'] = df['vader_sentiment_score'].map(lambda x: 1 if x >= 0.05 else 0 if x <= -0.05 else 2)

In [13]:
df

Unnamed: 0,review,labelled_sentiment,vader_sentiment_score,vader_sentiment
0,So there is no way for me to plug it in here i...,0,-0.3535,0
1,"Good case, Excellent value.",1,0.8402,1
2,Great for the jawbone.,1,0.6249,1
3,Tied to charger for conversations lasting more...,0,-0.6145,0
4,The mic is great.,1,0.6249,1
...,...,...,...,...
2743,I think food should have flavor and texture an...,0,0.0000,2
2744,Appetite instantly gone.,0,0.0000,2
2745,Overall I was not impressed and would not go b...,0,-0.3724,0
2746,"The whole experience was underwhelming, and I ...",0,0.0000,2
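
The thresholding rule used for `vader_sentiment` can also be written as a small named function, which makes the boundaries easier to read and check. This is a minimal sketch: the function name `compound_to_class` and its `threshold` parameter are illustrative and not part of vaderSentiment, while the ±0.05 cut-offs and the 1/0/2 codes are the ones used in the cell above.

```python
def compound_to_class(score, threshold=0.05):
    """Map a compound score to 1 (positive), 0 (negative) or 2 (neutral)."""
    if score >= threshold:
        return 1
    if score <= -threshold:
        return 0
    return 2


# Scores copied from the rows displayed above.
for score in (-0.3535, 0.8402, 0.0):
    print(score, compound_to_class(score))
```

In the notebook it could be applied with `df['vader_sentiment_score'].map(compound_to_class)`, which should produce the same column as the lambda.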


In [14]:
df.loc[df['vader_sentiment'] == 1, 'vader_sentiment_label'] = 'positive'
df.loc[df['vader_sentiment'] == 2, 'vader_sentiment_label'] = 'neutral'
df.loc[df['vader_sentiment'] == 0, 'vader_sentiment_label'] = 'negative'

In [15]:
df

Unnamed: 0,review,labelled_sentiment,vader_sentiment_score,vader_sentiment,vader_sentiment_label
0,So there is no way for me to plug it in here i...,0,-0.3535,0,negative
1,"Good case, Excellent value.",1,0.8402,1,positive
2,Great for the jawbone.,1,0.6249,1,positive
3,Tied to charger for conversations lasting more...,0,-0.6145,0,negative
4,The mic is great.,1,0.6249,1,positive
...,...,...,...,...,...
2743,I think food should have flavor and texture an...,0,0.0000,2,neutral
2744,Appetite instantly gone.,0,0.0000,2,neutral
2745,Overall I was not impressed and would not go b...,0,-0.3724,0,negative
2746,"The whole experience was underwhelming, and I ...",0,0.0000,2,neutral
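
With both `labelled_sentiment` and `vader_sentiment` in the frame, one possible next step is to count how often they agree. The sketch below is plain Python and uses only the nine rows displayed above, so it says nothing about the full dataset. The helper name `summarize_agreement` is hypothetical. Because the original labels are binary, rows that VADER marks neutral (code 2) are counted separately rather than as errors; that is a design choice, not the only reasonable one.

```python
from collections import Counter

# (labelled_sentiment, vader_sentiment) pairs copied from the displayed rows.
pairs = [
    (0, 0), (1, 1), (1, 1), (0, 0), (1, 1),  # rows 0-4
    (0, 2), (0, 2), (0, 0), (0, 2),          # rows 2743-2746
]


def summarize_agreement(pairs, neutral_code=2):
    """Count rows where VADER agrees, disagrees, or abstains as neutral."""
    counts = Counter()
    for labelled, predicted in pairs:
        if predicted == neutral_code:
            counts["neutral"] += 1
        elif predicted == labelled:
            counts["agree"] += 1
        else:
            counts["disagree"] += 1
    return counts


summary = summarize_agreement(pairs)
print(dict(summary))
```

On the full frame, the same pairs could be produced with `zip(df['labelled_sentiment'], df['vader_sentiment'])`.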
