## Sentiment Analysis using VADER - Twitter Pipeline

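[VADER](https://github.com/cjhutto/vaderSentiment) (Valence Aware Dictionary and Sentiment Reasoner) is a lexicon- and rule-based sentiment analysis model. It accounts not only for the polarity of a sentiment (positive vs. negative) but also for its intensity. It is specifically attuned to sentiments expressed in social media, which makes it a good match for tweets.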
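To make "polarity" and "intensity" concrete, here is a toy scorer. It is **not** VADER's implementation. It looks words up in a small hypothetical lexicon; the valences below are invented for illustration on a VADER-like -4 to +4 scale. The sign of the sum gives the polarity and its magnitude the intensity. One made-up rule then lets exclamation marks push the score further from zero. VADER itself applies many such rules (negation, degree adverbs, capitalization, "but", punctuation) over a much larger, human-rated lexicon.

```python
# Toy illustration only: NOT VADER's lexicon or rules.
# Hypothetical valences on a VADER-like scale (-4 to +4).
TOY_LEXICON = {"good": 1.9, "great": 3.1, "bad": -2.5, "awful": -2.0}


def toy_score(text: str) -> float:
    """Sum word valences; the sign is the polarity, the magnitude the intensity."""
    words = text.lower().replace("!", " ").replace(".", " ").split()
    total = sum(TOY_LEXICON.get(word, 0.0) for word in words)
    # Hypothetical intensity rule: each "!" (capped at 4) moves the
    # score 0.3 further away from zero, whichever direction it points.
    boost = 0.3 * min(text.count("!"), 4)
    if total > 0:
        return total + boost
    if total < 0:
        return total - boost
    return 0.0


for sentence in ["good", "good!!!", "bad", "bad!!!", "hello"]:
    print(f"{sentence!r:10} {toy_score(sentence):+.1f}")
```

Both "good" and "good!!!" are positive, but the second is more intense. Words missing from the lexicon contribute nothing.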

In [1]:
# To install, uncomment one of the following lines:

# !conda install -c conda-forge vadersentiment

# ---- or with pip ----
# !pip install vaderSentiment

In [2]:
import re

import pandas as pd
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

pd.set_option("display.max_colwidth", 100)  # show up to 100 characters per cell

In [3]:
analyser = SentimentIntensityAnalyzer()

### Get data

In [4]:
# explicit encoding: the tweets contain emoji and curly quotes
with open('takei_tweets.txt', 'r', encoding='utf-8') as f:
    text = f.read()

In [5]:
tweets = [line for line in text.splitlines() if line]  # split on newlines and drop empty lines
tweets[:5]

['RT @GeorgeTakei: Oh, Kai, what a bright light you are. 💕 Your poise and courage are far beyond your years. https://t.co/HB5smCZdNR',
 'RT @GeorgeTakei: What is wrong with him? https://t.co/QDOzOURktz',
 'RT @GeorgeTakei: Such a moving clip. 💕 Thank you @PressSec for these powerful words in solidarity with our trans youth and their families.…',
 'RT @GeorgeTakei: Poor snowflakes. https://t.co/6zV0m8YoF4',
 'RT @GeorgeTakei: Byyyeeee.👋 https://t.co/YAdc0M1H6t']

In [7]:
df = pd.DataFrame(tweets, columns=['tweet_text'])
df

|     | tweet_text |
|-----|------------|
| 0   | RT @GeorgeTakei: Oh, Kai, what a bright light you are. 💕 Your poise and courage are far beyond y... |
| 1   | RT @GeorgeTakei: What is wrong with him? https://t.co/QDOzOURktz |
| 2   | RT @GeorgeTakei: Such a moving clip. 💕 Thank you @PressSec for these powerful words in solidarit... |
| 3   | RT @GeorgeTakei: Poor snowflakes. https://t.co/6zV0m8YoF4 |
| 4   | RT @GeorgeTakei: Byyyeeee.👋 https://t.co/YAdc0M1H6t |
| ... | ... |
| 101 | They burnin’ math books as a last ditch effort to make folks think 74,216,154 &amp;gt; 81,268,924. |
| 102 | RT @GeorgeTakei: Just awful. https://t.co/C945YgiEH1 |
| 103 | RT @GeorgeTakei: What the Republican Party has become. https://t.co/jLQKlxaStc |
| 104 | RT @GeorgeTakei: This is absurd and offensive on its face. The Trumpification of America continu... |


### Clean data

In [8]:
mentions_regex = r'@[A-Za-z0-9_]+'  # "@" plus one or more ("+") handle characters; handles may contain "_"
url_regex = r'https?://\S+'         # catches most URLs; "?" means 0 or 1 time, "\S" is any non-whitespace character
hashtag_regex = r'#'
rt_regex = r'\bRT\s+'               # stand-alone "RT" only, so words such as "SMART" are left intact

def clean_tweets(tweet):
    tweet = re.sub(mentions_regex, '', tweet)  # removes @mentions
    tweet = re.sub(hashtag_regex, '', tweet)   # removes the hashtag symbol, keeps the word
    tweet = re.sub(rt_regex, '', tweet)        # removes the "RT" that marks a retweet
    tweet = re.sub(url_regex, '', tweet)       # removes most URLs
    tweet = tweet.replace(':', '')             # removes the ":" left after "@handle:" (and every other colon, e.g. in ":)")
    return tweet

In [9]:
print(tweets[0])
print('\nbecomes\n')
print(clean_tweets(tweets[0]))

RT @GeorgeTakei: Oh, Kai, what a bright light you are. 💕 Your poise and courage are far beyond your years. https://t.co/HB5smCZdNR

becomes

 Oh, Kai, what a bright light you are. 💕 Your poise and courage are far beyond your years. 
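It is worth checking the cleaning patterns on edge cases that the sample tweet does not contain. The sketch below uses a made-up retweet (`@some_user` is hypothetical) to show three problems. Twitter handles may contain underscores. An unanchored `RT\s` also matches inside words such as "SMART". Deleting every colon breaks emoticons such as `:)`, which VADER's lexicon includes.

```python
import re

# Hypothetical retweet used only to exercise the patterns.
tweet = "RT @some_user: SMART move :) https://t.co/abc123"

# Without "_" in the character class, the handle is only partly removed.
narrow = re.sub(r"@[A-Za-z0-9]+", "", tweet)
wide = re.sub(r"@[A-Za-z0-9_]+", "", tweet)

# Unanchored "RT\s" also deletes the "RT " at the end of "SMART ".
rt_anywhere = re.sub(r"RT\s", "", tweet)
# A word boundary limits the match to a stand-alone "RT".
rt_word = re.sub(r"\bRT\s+", "", tweet)

# Deleting every colon also turns the emoticon ":)" into ")".
no_colons = tweet.replace(":", "")

for name, value in [("narrow", narrow), ("wide", wide), ("rt_anywhere", rt_anywhere),
                    ("rt_word", rt_word), ("no_colons", no_colons)]:
    print(f"{name:12} {value}")
```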


In [10]:
df['tweet_text'] = df['tweet_text'].apply(clean_tweets)

In [11]:
df.head(5)

|     | tweet_text |
|-----|------------|
| 0   | Oh, Kai, what a bright light you are. 💕 Your poise and courage are far beyond your years. |
| 1   | What is wrong with him? |
| 2   | Such a moving clip. 💕 Thank you for these powerful words in solidarity with our trans youth an... |
| 3   | Poor snowflakes. |
| 4   | Byyyeeee.👋 |
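The cleaned text can still contain HTML entities. Row 101, for example, keeps `&gt;` because the raw tweet text is HTML-escaped. The sketch below shows one possible variant, a hypothetical `clean_tweet_v2` that is not part of the original pipeline. It decodes entities with the standard library's `html.unescape`, removes a colon only where it directly follows a handle, and collapses leftover whitespace.

```python
import html
import re

# Hypothetical variant of clean_tweets, not the notebook's original function.
RT_RE = re.compile(r"\bRT\s+")                # stand-alone retweet marker
MENTION_RE = re.compile(r"@[A-Za-z0-9_]+:?")  # @handle plus an optional trailing ":"
URL_RE = re.compile(r"https?://\S+")          # most http(s) URLs


def clean_tweet_v2(tweet: str) -> str:
    tweet = html.unescape(tweet)    # "&gt;" -> ">", "&amp;" -> "&"
    tweet = RT_RE.sub("", tweet)
    tweet = MENTION_RE.sub("", tweet)
    tweet = URL_RE.sub("", tweet)
    tweet = tweet.replace("#", "")  # keep the hashtag word, drop the symbol
    return " ".join(tweet.split())  # collapse whitespace runs and trim the ends


print(clean_tweet_v2("RT @GeorgeTakei: Just awful. https://t.co/C945YgiEH1"))
print(clean_tweet_v2("They burnin’ math books as a last ditch effort to make folks think 74,216,154 &gt; 81,268,924."))
```

To use it in the notebook, you would write `df['tweet_text'] = df['tweet_text'].apply(clean_tweet_v2)`. Because emoticons such as `:)` survive this variant, VADER can still score them.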


### Sentiment analysis with VADER

In [13]:
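# Make a DataFrame of polarity scores: polarity_scores() returns a dict with
# 'neg', 'neu', 'pos' and 'compound' keys, and apply(pd.Series) turns each dict into a row
pol_scores = df['tweet_text'].apply(analyser.polarity_scores).apply(pd.Series)
pol_scores.head(3)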


|   | neg   | neu   | pos   | compound |
|---|-------|-------|-------|----------|
| 0 | 0.0   | 0.606 | 0.394 | 0.886    |
| 1 | 0.437 | 0.563 | 0.0   | -0.4767  |
| 2 | 0.0   | 0.59  | 0.41  | 0.8957   |
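The `neg`, `neu` and `pos` columns are the proportions of the text that fall into each category, so each row sums to about 1. `compound` is a single score in [-1, 1]. In the vaderSentiment source, `compound` is the sum of the rule-adjusted word valences, passed through `x / sqrt(x**2 + alpha)` with `alpha = 15` and rounded to four decimals. Here is a stand-alone sketch of that normalization step:

```python
import math


def normalize(score: float, alpha: float = 15.0) -> float:
    """Squash an unbounded valence sum into [-1, 1] (VADER-style normalization)."""
    norm = score / math.sqrt(score * score + alpha)
    # |norm| < 1 whenever alpha > 0; the clamp is only a safety net.
    return max(-1.0, min(1.0, norm))


for raw in [-2.0, 0.0, 2.0, 5.0, 20.0]:
    print(raw, round(normalize(raw), 4))
```

The curve flattens as the raw sum grows, so each additional positive word raises `compound` by less than the one before. For example, a raw sum of -2.0 maps to -0.4588.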


In [14]:
# Add the compound score as a new column; both frames share the same index, so rows line up
df['compound'] = pol_scores['compound']
df

|     | tweet_text | compound |
|-----|------------|----------|
| 0   | Oh, Kai, what a bright light you are. 💕 Your poise and courage are far beyond your years. | 0.8860 |
| 1   | What is wrong with him? | -0.4767 |
| 2   | Such a moving clip. 💕 Thank you for these powerful words in solidarity with our trans youth an... | 0.8957 |
| 3   | Poor snowflakes. | -0.4767 |
| 4   | Byyyeeee.👋 | 0.4939 |
| ... | ... | ... |
| 101 | They burnin’ math books as a last ditch effort to make folks think 74,216,154 &amp;gt; 81,268,924. | 0.0000 |
| 102 | Just awful. | -0.4588 |
| 103 | What the Republican Party has become. | 0.4019 |
| 104 | This is absurd and offensive on its face. The Trumpification of America continues... | -0.4588 |
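Once the scores are in the DataFrame, a natural next step is to look at the extremes. The sketch below uses plain Python on a handful of `(tweet_text, compound)` pairs copied from the output above. On the full frame, pandas' `df.nsmallest(3, 'compound')` and `df.nlargest(3, 'compound')` do the same job.

```python
# (tweet_text, compound) pairs copied from the output above.
rows = [
    ("Oh, Kai, what a bright light you are. 💕 Your poise and courage are far beyond your years.", 0.8860),
    ("What is wrong with him?", -0.4767),
    ("Poor snowflakes.", -0.4767),
    ("Byyyeeee.👋", 0.4939),
    ("Just awful.", -0.4588),
    ("What the Republican Party has become.", 0.4019),
]

# sorted() is stable, so tied scores keep their original order.
ranked = sorted(rows, key=lambda row: row[1])
most_negative, most_positive = ranked[0], ranked[-1]

print("most negative:", most_negative)
print("most positive:", most_positive)
```

Reading the extremes is a quick sanity check on a lexicon-based model. Row 103 ("What the Republican Party has become.") reads as critical yet scores +0.4019, presumably because an individual word such as "party" carries positive valence in the lexicon.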


In [17]:
df.groupby('compound').count()  # number of tweets sharing each exact compound score

| compound | tweet_text |
|----------|------------|
| -0.9231  | 2 |
| -0.7717  | 1 |
| -0.7065  | 1 |
| -0.6557  | 1 |
| -0.5719  | 2 |
| -0.4767  | 4 |
| -0.4588  | 2 |
| -0.3612  | 1 |
| -0.2263  | 1 |
| -0.1513  | 2 |
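Grouping on exact float values only counts tweets that happen to share the same score. The vaderSentiment README instead suggests thresholds on `compound`: a score >= 0.05 is positive, a score <= -0.05 is negative, and anything in between is neutral. Here is a minimal sketch that applies those cut-offs to some of the compound values shown earlier:

```python
from collections import Counter


def label(compound: float, threshold: float = 0.05) -> str:
    """Classify a compound score with the +/-0.05 cut-offs from the VADER README."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Compound scores for rows 0-4 and 101-104 of the DataFrame above.
scores = [0.8860, -0.4767, 0.8957, -0.4767, 0.4939, 0.0000, -0.4588, 0.4019, -0.4588]
counts = Counter(label(score) for score in scores)
print(counts)
```

In the notebook itself, `df['sentiment'] = df['compound'].apply(label)` followed by `df['sentiment'].value_counts()` would give a class-level summary for all 105 tweets.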
