# Comparing VADER scores

Here we compare the VADER scores with the sentiment labels assigned in the original Sentiment140 data ('originalScore' in the data).

In [1]:
import pandas as pd

In [3]:
df = pd.read_csv('../data/vaderScores.csv')
df.head()

Unnamed: 0,originalScore,sentence,compoundScore,positiveScore,neutralScore,negativeScore
0,4,@stellargirl I loooooooovvvvvveee my Kindle2. ...,0.5695,0.268,0.614,0.117
1,4,Reading my kindle2... Love it... Lee childs i...,0.7964,0.47,0.53,0.0
2,4,"Ok, first assesment of the #kindle2 ...it fuck...",0.4724,0.278,0.722,0.0
3,4,@kenburbary You'll love your Kindle2. I've had...,0.7772,0.285,0.593,0.122
4,4,@mikefish Fair enough. But i have the Kindle2...,0.8402,0.5,0.5,0.0
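The `Unnamed: 0` column in the output above is what pandas typically produces when a CSV was written with its index and then read back without `index_col`. A minimal sketch of the difference, assuming a hypothetical two-row CSV (the sentences and scores below are made up for illustration and are not from vaderScores.csv):

```python
import io
import pandas as pd

# Hypothetical CSV text mimicking the layout of vaderScores.csv;
# the first header field is empty because it holds the saved index.
csv_text = (
    ",originalScore,sentence,compoundScore\n"
    "0,4,Love it,0.6\n"
    "1,0,Hate it,-0.6\n"
)

# Default read: the unnamed index column shows up as 'Unnamed: 0'
plain = pd.read_csv(io.StringIO(csv_text))

# Treat the first column as the index instead
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(plain.columns))
print(list(indexed.columns))
```

Whether to pass `index_col=0` here is a matter of taste; the rest of the notebook works either way.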


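We use a scoring rule modelled on the one in the VADER git repo (https://github.com/cjhutto/vaderSentiment). Note that the repo's README lists ±0.05 as its typical thresholds, whereas the cutoffs used here are ±0.5: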

* positive sentiment: compound score >= 0.5
* neutral sentiment: (compound score > -0.5) and (compound score < 0.5)
* negative sentiment: compound score <= -0.5

Let's assign a classification to each tweet:

In [13]:
df['vaderClassification'] = ['positive' if x >=  0.5 
                             else 'negative' if x <= -0.5
                             else 'neutral'
                             for x in df['compoundScore']]
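The same rule can also be written as a small reusable function, which makes it easy to try a different cutoff. This is only a sketch: `classify_compound` and its `threshold` parameter are hypothetical names, and the sample scores are chosen to exercise each branch rather than taken from the data.

```python
import numpy as np
import pandas as pd

def classify_compound(scores, threshold=0.5):
    """Label compound scores as 'positive', 'negative' or 'neutral'."""
    scores = pd.Series(scores, dtype=float)
    # Conditions are checked in order; anything matching neither is neutral
    conditions = [scores >= threshold, scores <= -threshold]
    labels = np.select(conditions, ['positive', 'negative'], default='neutral')
    return pd.Series(labels, index=scores.index)

# Hypothetical scores, including both boundary values
sample = pd.Series([0.7964, 0.4724, -0.5, 0.5, -0.8, 0.0])
print(classify_compound(sample).tolist())
```

With the default threshold this reproduces the list comprehension above: scores exactly at 0.5 or -0.5 fall into the positive and negative classes respectively.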


The mapping between originalScore and classification (from http://help.sentiment140.com/for-students/) is:

* 0 = negative
* 2 = neutral
* 4 = positive

So let's code that up too:

df['senti140Classification'] = ['positive' if x == 4
                                else 'negative' if x == 0
                                else 'neutral'
                                for x in df['originalScore']]
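An alternative sketch of the same mapping uses a dictionary with `Series.map`. The name `SENTI140_LABELS` and the sample values are assumptions for illustration; the value 3 is deliberately invalid to show how unexpected codes behave.

```python
import pandas as pd

# Label mapping taken from the Sentiment140 documentation cited above
SENTI140_LABELS = {0: 'negative', 2: 'neutral', 4: 'positive'}

# Hypothetical scores; 3 is not a valid Sentiment140 value
original = pd.Series([4, 0, 2, 3])
labels = original.map(SENTI140_LABELS)

print(labels.tolist())
print(labels.isna().sum())  # number of scores with no known label
```

The list comprehension labels every value other than 0 and 4 as neutral, whereas `map` leaves unknown codes as missing values, which makes data problems easier to spot.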

## Comparing VADER scores to Sentiment140 scores

#### For how many tweets do we get the same classification?
Now, let's see how often the two classifications agree.

In [23]:
df[df['vaderClassification'] == df['senti140Classification']].shape[0]

299

What fraction of the tweets is that?


In [24]:
df[df['vaderClassification'] == df['senti140Classification']].shape[0] / df.shape[0]

0.6004016064257028
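The count and the fraction can also be read off a boolean mask directly, and a cross-tabulation shows where the disagreements fall. A minimal sketch on a hypothetical five-row frame (the labels below are made up and only reuse the notebook's column names):

```python
import pandas as pd

# Hypothetical labels standing in for the two classification columns
toy = pd.DataFrame({
    'vaderClassification':    ['positive', 'neutral', 'negative', 'positive', 'neutral'],
    'senti140Classification': ['positive', 'positive', 'negative', 'negative', 'neutral'],
})

# True where both classifiers give the same label
agree = toy['vaderClassification'] == toy['senti140Classification']
n_agree = int(agree.sum())   # number of matching rows
rate = agree.mean()          # fraction of matching rows

# Rows: Sentiment140 label, columns: VADER label
confusion = pd.crosstab(toy['senti140Classification'], toy['vaderClassification'])

print(n_agree, rate)
print(confusion)
```

The diagonal of the cross-tabulation holds the agreements; the off-diagonal cells show which kinds of disagreement occur and how often.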

#### How many are polar opposites (the 'problematic' case)?

In [32]:
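# Keep only the tweets where one classifier says 'positive' and the other 'negative'
vaderPosSentiNeg = (df['vaderClassification'] == 'positive') & (df['senti140Classification'] == 'negative')
vaderNegSentiPos = (df['vaderClassification'] == 'negative') & (df['senti140Classification'] == 'positive')
polarOpposite = df[vaderPosSentiNeg | vaderNegSentiPos].copy()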

In [33]:
polarOpposite.shape[0]

21

In [34]:
polarOpposite.shape[0] / df.shape[0]

0.04216867469879518
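It can also help to split the polar opposites by direction, since VADER saying positive where Sentiment140 says negative is a different kind of error from the reverse. A sketch under the assumption of a hypothetical five-row frame; `polar_opposite_mask` is an illustrative helper, not part of the notebook:

```python
import pandas as pd

# Hypothetical labels reusing the notebook's column names
toy = pd.DataFrame({
    'vaderClassification':    ['positive', 'negative', 'positive', 'neutral', 'negative'],
    'senti140Classification': ['negative', 'positive', 'negative', 'positive', 'negative'],
})

def polar_opposite_mask(frame):
    """True where one classifier says 'positive' and the other 'negative'."""
    vader = frame['vaderClassification']
    senti = frame['senti140Classification']
    return (((vader == 'positive') & (senti == 'negative')) |
            ((vader == 'negative') & (senti == 'positive')))

opposites = toy[polar_opposite_mask(toy)]

# Count the polar opposites separately for each direction of disagreement
by_direction = opposites.groupby(
    ['vaderClassification', 'senti140Classification']).size()

print(len(opposites), len(opposites) / len(toy))
print(by_direction)
```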

Which tweets are they?

In [46]:
# Widen the column display so the full tweet text is visible
pd.options.display.max_colwidth = 140

# Show only the columns needed to compare the two classifications
polarOpposite[['sentence', 'vaderClassification', 'senti140Classification']]

Unnamed: 0,sentence,vaderClassification,senti140Classification
18,"@ludajuice Lebron is a Beast, but I'm still cheering 4 the A..til the end.",positive,negative
24,"good news, just had a call from the Visa office, saying everything is fine.....what a relief! I am sick of scams out there! Stealing!",negative,positive
136,Night at the Museum tonite instead of UP. :( oh well. that 4 yr old better enjoy it. LOL,positive,negative
139,Tell me again why we are giving more $$ to GM?? We should use that $ for all the programs that support the unemployed.,positive,negative
140,@jdreiss oh yes but if GM dies it will only be worth more boo hahaha,positive,negative
149,I hate Time Warner! Soooo wish I had Vios. Cant watch the fricken Mets game w/o buffering. I feel like im watching free internet porn.,positive,negative
150,Ahh...got rid of stupid time warner today &amp; now taking a nap while the roomies cook for me. Pretty good end for a monday :),positive,negative
162,"My wrist still hurts. I have to get it looked at. I HATE the dr/dentist/scary places. :( Time to watch Eagle eye. If you want to join, txt!",negative,positive
231,"@MMBarnhill yay, glad you got the phone! Still, damn you, AT&amp;T.",positive,negative
255,@Lou911 Lebron is MURDERING shit.,negative,positive
