
# Introduction to sentiment analysis

In [1]:
# Install vaderSentiment from a notebook cell (in the Anaconda prompt, drop the leading "!")
!pip install vaderSentiment

Collecting vaderSentiment
  Downloading vaderSentiment-3.3.2-py2.py3-none-any.whl (125kB)

In [2]:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
import pandas as pd

## Create the sentiment analyzer object

In [3]:
analyser = SentimentIntensityAnalyzer()

In [4]:
score1 = analyser.polarity_scores("Good evening, and thank you. We are happy to welcome you to Milwaukee")

In [5]:
score1

{'compound': 0.9022, 'neg': 0.0, 'neu': 0.427, 'pos': 0.573}
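The `compound` value is the overall score, normalized to lie between -1 and 1, while `neg`, `neu` and `pos` are proportions of the text that fall into each category. As a minimal sketch, the helper below turns a compound score into a coarse label using the ±0.05 cutoffs suggested in the VADER documentation. The function name `label_from_compound` is hypothetical, and the cutoffs are a convention rather than something this notebook relies on.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score to a coarse sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

# Scores copied from the output above
score1 = {'neg': 0.0, 'neu': 0.427, 'pos': 0.573, 'compound': 0.9022}
print(label_from_compound(score1['compound']))  # positive
```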

In [6]:
pd.DataFrame.from_dict(score1, orient='index').T

Unnamed: 0,neg,neu,pos,compound
0,0.0,0.427,0.573,0.9022


In [7]:
df = pd.DataFrame.from_dict(score1, orient='index').T

In [8]:
df['sent']="Good evening, and thank you. We are happy to welcome you to Milwaukee"

In [9]:
df.head()

Unnamed: 0,neg,neu,pos,compound,sent
0,0.0,0.427,0.573,0.9022,"Good evening, and thank you. We are happy to w..."


### Function to return the polarity scores of a sentence as a dataframe

In [10]:
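def sentiment_analyzer_scores(sentence):
    """Return a one-row dataframe holding the sentence and its VADER scores."""
    score = analyser.polarity_scores(sentence)
    df = pd.DataFrame.from_dict(score, orient='index').T
    df['sent'] = sentence
    # Put the sentence first, followed by the four scores
    df = df.reindex(columns=['sent', 'neg', 'neu', 'pos', 'compound'])
    return df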

In [11]:
sentiment_analyzer_scores("Good evening, and thank you. We are happy to welcome you to Milwaukee")

Unnamed: 0,sent,neg,neu,pos,compound
0,"Good evening, and thank you. We are happy to w...",0.0,0.427,0.573,0.9022


## Impact of punctuation on sentiment scores

In [12]:
sent1 = 'the movie was horrible'
sent2 = 'the movie was horrible!'
sent3 = 'the movie was horrible!!'

In [13]:
sentiment_analyzer_scores(sent1)

Unnamed: 0,sent,neg,neu,pos,compound
0,the movie was horrible,0.538,0.462,0.0,-0.5423


In [14]:
sentiment_analyzer_scores(sent2)

Unnamed: 0,sent,neg,neu,pos,compound
0,the movie was horrible!,0.558,0.442,0.0,-0.5848


In [15]:
sentiment_analyzer_scores(sent3)

Unnamed: 0,sent,neg,neu,pos,compound
0,the movie was horrible!!,0.577,0.423,0.0,-0.6229


**The magnitude of the compound score increases with the number of exclamation marks: here it becomes more negative, moving from -0.5423 to -0.5848 to -0.6229**
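The three compound scores above can be reproduced by hand. The sketch below assumes values taken from VADER's reference implementation rather than from this notebook: a lexicon valence of -2.5 for "horrible", a boost of 0.292 per exclamation mark (capped at four marks), and a normalization constant of 15. The helper names are hypothetical.

```python
import math

ALPHA = 15                 # normalization constant (assumed)
EXCLAMATION_STEP = 0.292   # boost per '!' (assumed)
HORRIBLE = -2.5            # lexicon valence of "horrible" (assumed)

def normalize(score, alpha=ALPHA):
    """Squash a raw valence sum into the range (-1, 1)."""
    return score / math.sqrt(score * score + alpha)

def compound_with_exclamations(valence, n_marks):
    """Push the valence away from zero for each '!', then normalize."""
    boost = min(n_marks, 4) * EXCLAMATION_STEP
    if valence > 0:
        valence += boost
    elif valence < 0:
        valence -= boost
    return round(normalize(valence), 4)

for n in range(3):
    print(n, compound_with_exclamations(HORRIBLE, n))
# 0 -0.5423
# 1 -0.5848
# 2 -0.6229
```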

## Impact of upper case on sentiment scores

In [16]:
sent1 = 'the product quality was horrible'
sent2 = 'the product quality was HORRIBLE'

In [17]:
sentiment_analyzer_scores(sent1)

Unnamed: 0,sent,neg,neu,pos,compound
0,the product quality was horrible,0.467,0.533,0.0,-0.5423


In [18]:
sentiment_analyzer_scores(sent2)

Unnamed: 0,sent,neg,neu,pos,compound
0,the product quality was HORRIBLE,0.514,0.486,0.0,-0.6408


**Writing a sentiment-relevant word in upper case while the other words are not capitalized increases the magnitude of the sentiment intensity: the compound score moves from -0.5423 to -0.6408**
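The size of the capitalization effect can be checked with a little arithmetic. This sketch assumes, based on VADER's reference implementation and not on anything stated in this notebook, that an all-caps sentiment word has 0.733 added to the magnitude of its valence, that "horrible" has a valence of -2.5, and that the normalization constant is 15. The helper names are hypothetical.

```python
import math

CAPS_INCREMENT = 0.733   # emphasis added for an ALL-CAPS word (assumed)
HORRIBLE = -2.5          # lexicon valence of "horrible" (assumed)

def normalize(score, alpha=15):
    """Squash a raw valence sum into the range (-1, 1)."""
    return score / math.sqrt(score * score + alpha)

def emphasize_caps(valence, increment=CAPS_INCREMENT):
    """Move the valence away from zero, keeping its sign."""
    if valence > 0:
        return valence + increment
    if valence < 0:
        return valence - increment
    return valence

print(round(normalize(HORRIBLE), 4))                  # -0.5423
print(round(normalize(emphasize_caps(HORRIBLE)), 4))  # -0.6408
```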

## Impact of degree modifiers (adjectives, adverbs) on sentiment

In [19]:
sent1 = "the image quality was good"
sent2 = "the image quality was extremely good"
sent3 = "the image quality was reasonably good"
sent4 = "the image quality was marginally good"

In [20]:
sentiment_analyzer_scores(sent1)

Unnamed: 0,sent,neg,neu,pos,compound
0,the image quality was good,0.0,0.58,0.42,0.4404


In [21]:
sentiment_analyzer_scores(sent2)

Unnamed: 0,sent,neg,neu,pos,compound
0,the image quality was extremely good,0.0,0.61,0.39,0.4927


In [22]:
sentiment_analyzer_scores(sent3)

Unnamed: 0,sent,neg,neu,pos,compound
0,the image quality was reasonably good,0.0,0.633,0.367,0.4404


In [23]:
sentiment_analyzer_scores(sent4)

Unnamed: 0,sent,neg,neu,pos,compound
0,the image quality was marginally good,0.0,0.657,0.343,0.3832
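The four scores show that "extremely" raises the compound score, "marginally" lowers it, and "reasonably" leaves it unchanged, which suggests that "reasonably" is not treated as a degree modifier. A rough sketch of the arithmetic follows. It assumes values from VADER's reference implementation that do not appear in this notebook: a valence of 1.9 for "good", a shift of ±0.293 for a booster or dampener directly before the sentiment word, and a normalization constant of 15. The `MODIFIERS` table and function names are hypothetical.

```python
import math

GOOD = 1.9   # lexicon valence of "good" (assumed)
# Shift applied by a modifier placed directly before the sentiment word (assumed)
MODIFIERS = {"extremely": 0.293, "marginally": -0.293}

def normalize(score, alpha=15):
    """Squash a raw valence sum into the range (-1, 1)."""
    return score / math.sqrt(score * score + alpha)

def compound_with_modifier(valence, modifier=None):
    """Apply the modifier's shift in the direction of the valence, then normalize."""
    shift = MODIFIERS.get(modifier, 0.0)
    if valence < 0:
        shift = -shift   # boosters make negative words more negative
    return round(normalize(valence + shift), 4)

for word in (None, "extremely", "reasonably", "marginally"):
    print(word, compound_with_modifier(GOOD, word))
# None 0.4404
# extremely 0.4927
# reasonably 0.4404
# marginally 0.3832
```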


## Impact of conjunctions

In [24]:
sent1 = "the movie was good, but the background score was horrible"

In [25]:
sentiment_analyzer_scores(sent1)

Unnamed: 0,sent,neg,neu,pos,compound
0,"the movie was good, but the background score w...",0.323,0.544,0.133,-0.5859
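The sentence contains one positive and one negative word, yet the compound score is clearly negative. This is consistent with a rule in VADER's reference implementation that halves the weight of sentiment before "but" and multiplies the sentiment after it by 1.5. The sketch below reproduces the score under that assumption, together with assumed lexicon valences of 1.9 for "good" and -2.5 for "horrible". The function name `compound_with_but` is hypothetical.

```python
import math

def normalize(score, alpha=15):
    """Squash a raw valence sum into the range (-1, 1)."""
    return score / math.sqrt(score * score + alpha)

def compound_with_but(before, after):
    """Down-weight valences before 'but', up-weight those after it."""
    total = sum(0.5 * v for v in before) + sum(1.5 * v for v in after)
    return round(normalize(total), 4)

# "good" (1.9, assumed) comes before "but"; "horrible" (-2.5, assumed) after it
print(compound_with_but([1.9], [-2.5]))  # -0.5859
```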


## Does VADER capture emojis?

In [26]:
sentiment_analyzer_scores('😊')

Unnamed: 0,sent,neg,neu,pos,compound
0,😊,0.0,0.333,0.667,0.7184


In [27]:
sentiment_analyzer_scores('😥')

Unnamed: 0,sent,neg,neu,pos,compound
0,😥,0.275,0.268,0.456,0.3291
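The second emoji looks sad but receives a positive compound score. One plausible explanation, offered here as an assumption, is that VADER replaces each emoji with a text description and scores the words. Python's `unicodedata` names are used below as a stand-in for VADER's own emoji lexicon. If the description contains "but", the same contrast weighting seen with conjunctions would apply. The valences of -2.1 for "disappointed" and 1.6 for "relieved" are assumed, not taken from this notebook.

```python
import math
import unicodedata

def normalize(score, alpha=15):
    """Squash a raw valence sum into the range (-1, 1)."""
    return score / math.sqrt(score * score + alpha)

# Unicode names used as a stand-in for VADER's emoji descriptions (assumption)
for emoji in ("😊", "😥"):
    print(emoji, unicodedata.name(emoji).lower())

# Assumed valences: "disappointed" before "but", "relieved" after it
DISAPPOINTED, RELIEVED = -2.1, 1.6
total = 0.5 * DISAPPOINTED + 1.5 * RELIEVED
compound = round(normalize(total), 4)
print(compound)  # 0.3291
```

Under these assumptions the result matches the compound score in the output above, so the positive score reflects the word "relieved" outweighing "disappointed" rather than a direct reading of the emoji.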
