## <font color ='green'> VADER, or the Valence Aware Dictionary and sEntiment Reasoner </font>

#### There are two types of sentiment analysis approaches: polarity-based and valence-based.
#### VADER is a VALENCE-based sentiment analyzer.

#### A valence-based approach takes into consideration the "intensity" of a word as opposed to only its polarity (+ve or -ve). For example, "Great" is treated as more +ve than "Good".
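
The difference between the two views can be sketched with a toy lexicon. The word scores and the names `TOY_VALENCE`, `polarity` and `valence` below are made up for illustration; they are not taken from the VADER lexicon or its API.

```python
# Toy lexicon with made-up intensity values (hypothetical, NOT the VADER lexicon).
TOY_VALENCE = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0}


def valence(word):
    """Valence view: signed intensity of the word (0.0 for unknown words)."""
    return TOY_VALENCE.get(word.lower(), 0.0)


def polarity(word):
    """Polarity view: only the sign of the word (+1, -1, or 0)."""
    v = valence(word)
    return (v > 0) - (v < 0)


for w in ("good", "great", "bad", "terrible"):
    print(w, "polarity:", polarity(w), "valence:", valence(w))
```

Under the polarity view "good" and "great" are indistinguishable, while the valence view ranks "great" above "good".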

#### Sentiment analysis is well suited to sifting through and responding to unstructured data such as social media posts and product reviews.

### Ideal scale for classification based on compound value:
   #### 1. Neutral: -0.5 < compound <= 0.5
   #### 2. Positive: compound > 0.5
   #### 3. Negative: compound <= -0.5
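
The scale above can be written as a small helper. This is a minimal sketch: the function name `classify_compound` and its parameter names are hypothetical, and the default cut-offs simply mirror the values listed above.

```python
def classify_compound(compound, pos_threshold=0.5, neg_threshold=-0.5):
    """Map a compound score in [-1, 1] to a sentiment label.

    Defaults follow the scale listed above:
    positive if compound > 0.5, negative if compound <= -0.5, else neutral.
    """
    if compound > pos_threshold:
        return "positive"
    if compound <= neg_threshold:
        return "negative"
    return "neutral"


for score in (0.8, 0.5, 0.0, -0.5, -0.9):
    print(score, classify_compound(score))
```

Keeping the cut-offs as parameters makes it easy to try other thresholds without touching the logic.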


#### References:
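#### Hutto, C.J. and Gilbert, E. (2014). VADER: A Parsimonious Rule-based Model for Sentiment Analysis of Social Media Text. ICWSM 2014. http://comp.social.gatech.edu/papers/icwsm14.vader.hutto.pdf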

In [14]:
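import pandas as pd  # pandas for tabular data handling
import numpy as np  # NumPy for array operations
%matplotlib inline

# Alternative: the standalone vaderSentiment package provides the same class
# from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# nltk.download('vader_lexicon')  # uncomment if the VADER lexicon is not installed yet

analyser = SentimentIntensityAnalyzer()  # create the VADER analyzer instance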

### Sourcing Twitter data from Kaggle

In [15]:
#https://www.kaggle.com/crowdflower/twitter-airline-sentiment

sentences = pd.read_csv('../input/Tweets.csv')

len(sentences)

In [16]:
sentences.columns  # I don't need all the columns for this demo

In [17]:
sentences.head()

### How does United stack up against its competitors (based on human scoring)?

In [18]:
sentences.groupby(['airline', 'airline_sentiment']).size().unstack().plot(kind='bar',figsize=(11, 5))

In [19]:
sentences = sentences[['airline_sentiment', 'airline','text' ]] #this is all I need
sentences.head()

In [20]:
sentences = sentences[sentences['airline']=='United'] #filtering dataset for United
print(len(sentences))
sentences = sentences.reset_index(drop = True)
sentences.head(10)

In [21]:
sentences.groupby('airline_sentiment').size().plot(kind='bar')

### Quick example of our sentiment engine: VADER (a rule-based, lexicon-driven Python package)

In [22]:
def print_sentiment_scores(sentence):
    snt = analyser.polarity_scores(sentence)  #Calling the polarity analyzer
    print("{:-<40} {}".format(sentence, str(snt)))

In [23]:
print_sentiment_scores("United flight was a bad experience") #Compound value scale = -1 to 1 (-ve to +ve)
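
The compound value stays between -1 and 1 because the summed word valences are normalized. As far as I understand the reference implementation, the normalization is x / sqrt(x² + alpha) with alpha = 15; the sketch below reproduces that formula on its own, and the name `normalize_compound` is hypothetical.

```python
import math


def normalize_compound(raw_score, alpha=15):
    """Squash an unbounded sum of valences into the range [-1, 1].

    alpha=15 is assumed to match the default used by VADER.
    """
    norm = raw_score / math.sqrt(raw_score * raw_score + alpha)
    # Clip to guard against floating-point overshoot.
    return max(-1.0, min(1.0, norm))


for raw in (-10, -2.5, 0, 2.5, 10):
    print(raw, round(normalize_compound(raw), 4))
```

The function is monotonic, so a larger summed valence always gives a larger compound score, but the gain flattens out as the sum grows.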

### Calculating score for each tweet in the dataframe/dataset

In [24]:
%%time
# %%time must be the first line of the cell; it reports how long the whole cell takes to run

# Compute the 'compound' VADER score for every tweet
compval1 = [analyser.polarity_scores(text)['compound'] for text in sentences['text']]

# Convert the scores to a NumPy array for easier use
compval1 = np.array(compval1)

len(compval1)

In [25]:
sentences['VADER score'] = compval1

In [26]:
sentences.head(20)

In [27]:
%%time
# Map each VADER score to a sentiment category.
# Note: these cut-offs (0.7 and 0) differ from the +/-0.5 scale listed at the top of the notebook.

predicted_value = []  # list to hold the predicted labels

for score in sentences['VADER score']:
    if score >= 0.7:
        predicted_value.append('positive')
    elif score > 0:
        predicted_value.append('neutral')
    else:  # score <= 0
        predicted_value.append('negative')
        

In [28]:
sentences['predicted sentiment'] = predicted_value

In [29]:
len(sentences['predicted sentiment'])

In [30]:
sentences.head(20)

## Let's take a closer look at our results

In [31]:
madeit = sentences[sentences['airline_sentiment']== sentences['predicted sentiment']]

In [32]:
len(madeit)/len(sentences)
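
The single accuracy figure above hides how each class fares. A minimal sketch of a per-class breakdown follows, using only the standard library and toy labels (not results from the dataset); the helper names `confusion_counts` and `per_class_recall` are hypothetical.

```python
from collections import Counter


def confusion_counts(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(actual, predicted))


def per_class_recall(actual, predicted):
    """Fraction of each actual class that was predicted correctly."""
    pairs = confusion_counts(actual, predicted)
    totals = Counter(actual)
    return {label: pairs[(label, label)] / totals[label] for label in totals}


# Toy labels for illustration only
actual = ["negative", "negative", "neutral", "positive"]
predicted = ["negative", "neutral", "neutral", "negative"]

print(confusion_counts(actual, predicted))
print(per_class_recall(actual, predicted))
```

With the notebook's data, the two arguments would be `sentences['airline_sentiment']` and `sentences['predicted sentiment']`; `pd.crosstab` on the same two columns gives a comparable table.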

In [33]:
madeit.head(20)

In [34]:

sentences.groupby('predicted sentiment').size().plot(kind='bar')

In [35]:
didntmakeit = sentences[sentences['airline_sentiment'] != sentences['predicted sentiment']]

In [36]:
didntmakeit = didntmakeit.reset_index(drop=True)  # reassign rather than modify a filtered slice in place
didntmakeit.head(20)

## Examples of where our algorithm did not make the correct prediction

In [37]:
didntmakeit.iloc[8]

In [38]:
didntmakeit.iloc[8]['text']

### Recurring themes/words in the NEGATIVELY branded tweets

In [39]:
from wordcloud import WordCloud,STOPWORDS
import matplotlib.pyplot as plt 

In [40]:
df = madeit[madeit['predicted sentiment']=='negative']

words = ' '.join(df['text'])
cleaned_word = " ".join([word for word in words.split()
                            if 'http' not in word
                                and not word.startswith('@')
                                and word != 'RT'
                            ])

stopwords = set(STOPWORDS)
stopwords.add("amp")
stopwords.add("flight")
stopwords.add("united")
stopwords.add("plane")
stopwords.add("now")

wordcloud = WordCloud(stopwords=stopwords,
                      background_color='black',
                      width=3000,
                      height=2500
                     ).generate(cleaned_word)

In [41]:
type(cleaned_word)

In [42]:
plt.figure(1,figsize=(12, 12))
plt.imshow(wordcloud)
plt.axis('off')
plt.show()
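
The tweet-cleaning step is repeated for each word cloud, so it could be pulled into a helper. The sketch below also counts the most frequent words as a plain-text complement to the image. The names `clean_tweet_text` and `top_words` are hypothetical, and the tweets are toy examples rather than rows from the dataset.

```python
from collections import Counter


def clean_tweet_text(texts):
    """Join tweets and drop links, @mentions, and the 'RT' marker."""
    words = " ".join(texts).split()
    return " ".join(
        w for w in words
        if "http" not in w and not w.startswith("@") and w != "RT"
    )


def top_words(cleaned_text, stopwords, n=10):
    """Return the n most common lower-cased words, skipping stopwords."""
    counts = Counter(w.lower().strip(".,!?") for w in cleaned_text.split())
    for stop in list(stopwords) + [""]:
        counts.pop(stop, None)
    return counts.most_common(n)


# Toy tweets for illustration only
tweets = [
    "@united my bag is lost http://t.co/x",
    "RT @united lost bag again",
    "Delayed again",
]
cleaned = clean_tweet_text(tweets)
print(cleaned)
print(top_words(cleaned, stopwords={"my", "is"}))
```

In the notebook, `clean_tweet_text(df['text'])` would stand in for the inline list comprehension, and the result can be passed to `WordCloud(...).generate(...)` as before.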

### Recurring themes/words in the POSITIVELY branded tweets

In [43]:
df = madeit[madeit['predicted sentiment']=='positive']

words = ' '.join(df['text'])
cleaned_word = " ".join([word for word in words.split()
                            if 'http' not in word
                                and not word.startswith('@')
                                and word != 'RT'
                                and word !='&amp'
                            ])

stopwords = set(STOPWORDS)
stopwords.add("amp")
stopwords.add("flight")
stopwords.add("flights")
stopwords.add("united")
stopwords.add("plane")

wordcloud = WordCloud(stopwords=stopwords,
                      background_color='black',
                      width=3000,
                      height=2500
                     ).generate(cleaned_word)

In [44]:
plt.figure(1,figsize=(12, 12))
plt.imshow(wordcloud)
plt.axis('off')
plt.show()