# Predicting the Sentiment of a Tweet
This notebook explores two different machine learning models, a Naive Bayes classifier and an LSTM (long short-term memory) neural network, each trained to predict the sentiment of a given Tweet.

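The sentiment of a Tweet is expressed as a score from 0 to 1.0: scores close to 0 indicate negative sentiment, and scores close to 1.0 indicate positive sentiment.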

## Importing the Pre-Trained Models

In [16]:
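import os, sys
# Hide TensorFlow's C++ INFO, WARNING and ERROR log messages (must be set before TensorFlow is imported)
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
# Make the project's src/ directory importable
sys.path.append(os.path.join(os.getcwd(), 'src'))
from NaiveBayesTwitter import predict_tweet_sentiment as nb_predict_sentiment
from LstmTwitter import predict_tweet_sentiment as lstm_predict_sentiment
from LstmTwitter import data_dir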

# Naive Bayes Model

## Example Test Cases for Illustration

In [5]:
test_tweet = "I ordered just once from TerribleCo, they screwed up, never used the app again."
sentiment = nb_predict_sentiment(test_tweet)
print(f"Naive Bayes Sentiment Prediction : {sentiment:.2f}")

Naive Bayes Sentiment Prediction : 0.22


In [6]:
test_tweet = "I loved the show today! It was amazing."
sentiment = nb_predict_sentiment(test_tweet)
print(f"Naive Bayes Sentiment Prediction : {sentiment:.2f}")

Naive Bayes Sentiment Prediction : 0.76


In [7]:
test_tweet = "No idea."
sentiment = nb_predict_sentiment(test_tweet)
print(f"Naive Bayes Sentiment Prediction : {sentiment:.2f}")

Naive Bayes Sentiment Prediction : 0.57


In [8]:
test_tweet = "Good to see you yesterday. Let me know next time you're in town."
sentiment = nb_predict_sentiment(test_tweet)
print(f"Naive Bayes Sentiment Prediction : {sentiment:.2f}")

Naive Bayes Sentiment Prediction : 0.74


In [9]:
test_tweet = "You're a terrible pool player."
sentiment = nb_predict_sentiment(test_tweet)
print(f"Naive Bayes Sentiment Prediction : {sentiment:.2f}")

Naive Bayes Sentiment Prediction : 0.11


In [10]:
test_tweet = "I'm sick of waiting in line. I doubt this is going to be worth it."
sentiment = nb_predict_sentiment(test_tweet)
print(f"Naive Bayes Sentiment Prediction : {sentiment:.2f}")

Naive Bayes Sentiment Prediction : 0.13


## Shortcomings of the Naive Bayes Model
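The test case below shows one shortcoming of the Naive Bayes model. Naive Bayes treats the words of a Tweet as independent evidence, so the "not" in this example most likely does nothing to cancel the positive weight of "happy". As a result, a clearly negative Tweet receives a positive score.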

In [11]:
test_tweet = "I'm not happy to see you."
sentiment = nb_predict_sentiment(test_tweet)
print(f"Naive Bayes Sentiment Prediction : {sentiment:.2f}")

Naive Bayes Sentiment Prediction : 0.75



# Deep Learning Model - LSTM 

## Example Test Cases for Illustration

In [13]:
test_tweet = "Good to see you yesterday. Let me know next time you're in town."
sentiment = lstm_predict_sentiment(test_tweet)
print(f"LSTM Sentiment Prediction : {sentiment:.2f}")

LSTM Sentiment Prediction : 1.00


In [14]:
test_tweet = "I'm not happy to see you."
sentiment = lstm_predict_sentiment(test_tweet)
print(f"LSTM Sentiment Prediction : {sentiment:.2f}")

LSTM Sentiment Prediction : 0.01
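The LSTM scores the same negated Tweet at 0.01. One reason a recurrent model can do this is that it reads tokens in order, carrying a hidden state and a cell state from one word to the next. The contribution of "happy" can therefore depend on the "not" that came before it. The NumPy sketch below runs a single untrained LSTM cell with random weights over a sequence of token IDs. The vocabulary size, layer sizes, weights and token IDs are all hypothetical, and nothing here reflects the architecture actually used in `LstmTwitter`. The sketch only shows that, unlike a bag-of-words score, the output changes when the word order changes.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_sentiment(token_ids, embeddings, W, U, b, w_out, b_out):
    """Run a single-layer LSTM over token_ids and return a score in (0, 1).

    W: (4H, E) input weights, U: (4H, H) recurrent weights, b: (4H,) bias,
    stacked in the order [input gate, forget gate, candidate, output gate].
    """
    H = U.shape[1]
    h = np.zeros(H)  # hidden state
    c = np.zeros(H)  # cell state
    for t in token_ids:
        z = W @ embeddings[t] + U @ h + b
        i = sigmoid(z[0:H])          # input gate
        f = sigmoid(z[H:2 * H])      # forget gate
        g = np.tanh(z[2 * H:3 * H])  # candidate cell update
        o = sigmoid(z[3 * H:4 * H])  # output gate
        c = f * c + i * g
        h = o * np.tanh(c)
    # Map the final hidden state to the 0-to-1 sentiment scale
    return float(sigmoid(w_out @ h + b_out))

# Hypothetical sizes and random (untrained) parameters
rng = np.random.default_rng(0)
V, E, H = 10, 8, 16  # vocabulary, embedding and hidden sizes
embeddings = rng.normal(size=(V, E))
W = rng.normal(scale=0.5, size=(4 * H, E))
U = rng.normal(scale=0.5, size=(4 * H, H))
b = np.zeros(4 * H)
w_out = rng.normal(size=H)
b_out = 0.0

forward = lstm_sentiment([1, 2, 3], embeddings, W, U, b, w_out, b_out)
backward = lstm_sentiment([3, 2, 1], embeddings, W, U, b, w_out, b_out)
print(f"forward {forward:.4f}, reversed {backward:.4f}")
```

A trained model would learn the embeddings, `W`, `U`, `b` and the output weights from labelled Tweets. The sketch only illustrates the order-sensitive recurrence.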


## Testing with a Sample Data Set

In [17]:
import pandas as pd
# Path to the Sentiment140 training file
data_file = os.path.join(data_dir, 'sentiment140', 'training.1600000.processed.noemoticon.csv')
# The file has no header row and uses Latin-1 (ISO-8859-1) encoding
df_raw = pd.read_csv(data_file, encoding="ISO-8859-1", header=None)
# As the data has no column titles, we add our own (the second field is the Tweet ID)
df_raw.columns = ["label", "id", "date", "query", "username", "text"]
# Show the first 5 rows of the dataframe.
# You can specify the number of rows to be shown as follows: df_raw.head(10)
df_raw.head()

| | label | id | date | query | username | text |
|---|---|---|---|---|---|---|
| 0 | 0 | 1467810369 | Mon Apr 06 22:19:45 PDT 2009 | NO_QUERY | \_TheSpecialOne\_ | @switchfoot http://twitpic.com/2y1zl - Awww, t... |
| 1 | 0 | 1467810672 | Mon Apr 06 22:19:49 PDT 2009 | NO_QUERY | scotthamilton | is upset that he can't update his Facebook by ... |
| 2 | 0 | 1467810917 | Mon Apr 06 22:19:53 PDT 2009 | NO_QUERY | mattycus | @Kenichan I dived many times for the ball. Man... |
| 3 | 0 | 1467811184 | Mon Apr 06 22:19:57 PDT 2009 | NO_QUERY | ElleCTF | my whole body feels itchy and like its on fire |
| 4 | 0 | 1467811193 | Mon Apr 06 22:19:57 PDT 2009 | NO_QUERY | Karoli | @nationwideclass no, it's not behaving at all.... |


In [18]:
# Remove every column except for the text and the label
df_raw = df_raw[['label', 'text']]
df_raw.head()

| | label | text |
|---|---|---|
| 0 | 0 | @switchfoot http://twitpic.com/2y1zl - Awww, t... |
| 1 | 0 | is upset that he can't update his Facebook by ... |
| 2 | 0 | @Kenichan I dived many times for the ball. Man... |
| 3 | 0 | my whole body feels itchy and like its on fire |
| 4 | 0 | @nationwideclass no, it's not behaving at all.... |
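The two cells above read all six columns and then drop four of them. A sketch that does both steps in one `pandas.read_csv` call is shown here. It uses the `names` parameter to supply the column titles and `usecols` to keep only `label` and `text`, so the unused columns are never stored in the DataFrame. `load_sentiment140` and `SENTIMENT140_COLUMNS` are hypothetical names. The demo writes a tiny two-row file in the same layout instead of reading the real data set.

```python
import os
import tempfile

import pandas as pd

# Sentiment140 field order; the second field is the Tweet ID
SENTIMENT140_COLUMNS = ["label", "id", "date", "query", "username", "text"]

def load_sentiment140(path):
    """Read a header-less, Latin-1 encoded Sentiment140 CSV, keeping only label and text."""
    return pd.read_csv(
        path,
        encoding="ISO-8859-1",
        header=None,
        names=SENTIMENT140_COLUMNS,
        usecols=["label", "text"],
    )

# Demo: write a two-row Latin-1 file in the same layout, then load it
rows = (
    '0,1,Mon Apr 06 22:19:45 PDT 2009,NO_QUERY,user_a,"café closed, so sad"\n'
    "4,2,Mon Apr 06 22:19:49 PDT 2009,NO_QUERY,user_b,loving life\n"
)
with tempfile.NamedTemporaryFile("w", encoding="ISO-8859-1", suffix=".csv", delete=False) as f:
    f.write(rows)
    tmp_path = f.name
demo = load_sentiment140(tmp_path)
os.remove(tmp_path)
print(demo)
```

In the notebook, this would be called as `df_raw = load_sentiment140(data_file)`.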


This sample data set is huge (1,600,000 rows). For this demonstration, we trim the data to $\frac{1}{10{,}000}$th of its original size by keeping the first rows of each label group. That leaves 160 Tweets (80 positive and 80 negative), which is enough for this demonstration. In Sentiment140, the `label` column is 0 for a negative Tweet and 4 for a positive one. Because we take the same fraction from each label group, the trimmed data set keeps an equal number of positive and negative Tweets.

In [26]:
# Separate positive (label 4) and negative (label 0) rows
df_pos = df_raw[df_raw['label'] == 4]
df_neg = df_raw[df_raw['label'] == 0]
print(f"Original data set number of positive Tweets: {len(df_pos)}")
print(f"Original data set number of negative Tweets: {len(df_neg)}")
# Only retain the first 1/10,000th of the data from each label group
df_pos = df_pos.iloc[:len(df_pos) // 10000]
df_neg = df_neg.iloc[:len(df_neg) // 10000]
print(f"Sample data set number of positive Tweets: {len(df_pos)}")
print(f"Sample data set number of negative Tweets: {len(df_neg)}")
# Join the positive and negative groups back into a single dataframe
df = pd.concat([df_pos, df_neg])
df.head()

Original data set number of positive Tweets: 800000
Original data set number of negative Tweets: 800000
Sample data set number of positive Tweets: 80
Sample data set number of negative Tweets: 80


| | label | text |
|---|---|---|
| 800000 | 4 | I LOVE @Health4UandPets u guys r the best!! |
| 800001 | 4 | im meeting up with one of my besties tonight! ... |
| 800002 | 4 | @DaRealSunisaKim Thanks for the Twitter add, S... |
| 800003 | 4 | Being sick can be really cheap when it hurts t... |
| 800004 | 4 | @LovesBrooklyn2 he has that effect on everyone |
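The cell above keeps the first 80 rows of each label group, and those rows all come from the start of that group in the file, so they may cover a narrow slice of the data. A random balanced sample is one alternative. The sketch below uses `DataFrameGroupBy.sample` (available in pandas 1.1 and later) on a synthetic frame that stands in for `df_raw`. `balanced_sample` is a hypothetical helper name.

```python
import pandas as pd

def balanced_sample(df, label_col, n_per_label, seed=0):
    """Randomly draw n_per_label rows from every label group (pandas >= 1.1)."""
    return df.groupby(label_col).sample(n=n_per_label, random_state=seed)

# Synthetic stand-in for df_raw: 100 negative (0) and 100 positive (4) rows
demo = pd.DataFrame({
    "label": [0] * 100 + [4] * 100,
    "text": [f"tweet {i}" for i in range(200)],
})
sample = balanced_sample(demo, "label", n_per_label=10)
print(sample["label"].value_counts())
```

In the notebook, this would look something like `df = balanced_sample(df_raw, "label", n_per_label=80)`. The original row indices are kept, so each sampled Tweet can still be traced back to `df_raw`.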


In [45]:
df['nb_sentiment'] = df['text'].apply(nb_predict_sentiment)
df['lstm_sentiment'] = df['text'].apply(lstm_predict_sentiment)
df.head()

KeyboardInterrupt: 

### 10 Tweets with Highest Sentiment - NB

In [49]:
# Show the full Tweet text instead of truncating long strings
pd.set_option('display.max_colwidth', None)
top_ten_nb_tweets = df.sort_values(by='nb_sentiment', ascending=False).head(10)
top_ten_nb_tweets

| | label | text | lstm_sentiment | nb_sentiment |
|---|---|---|---|---|
| 800054 | 4 | @QuotableBuffy I got a bunch of Buffy songs too! One of my faves is &quot;Vivian&quot; by Nerf Herder, when Faith met Spike in Buffy's body. | 0.99729 | 0.999983 |
| 800035 | 4 | @hawaii808shellz hAhAHA!! omG! we wer bOth laughiN off d hOOk! cuz das hOW we roLLL...ryt sheLdawg? | 0.841972 | 0.999094 |
| 800061 | 4 | @fuzeb they are so serious too while singing like.. whoa hehe lsd maybe? j/k lolol | 0.953605 | 0.99869 |
| 800002 | 4 | @DaRealSunisaKim Thanks for the Twitter add, Sunisa! I got to meet you once at a HIN show here in the DC area and you were a sweetheart. | 0.972336 | 0.998607 |
| 800008 | 4 | @tommcfly ah, congrats mr fletcher for finally joining twitter | 0.999225 | 0.998022 |
| 800020 | 4 | Didn't place in the Peeps contest but thanks for voting anyways. | 0.839463 | 0.99575 |
| 800034 | 4 | @wisdomous you're welcome. glad you enjoyed it. | 0.999851 | 0.994929 |
| 800014 | 4 | @LutheranLucciol Make sure you DM me if you post a link to that video! &lt;LOL&gt;So I don't miss it Better get permission and blessing first? | 0.985767 | 0.994149 |
| 800016 | 4 | @michellardi i really don't know. i think its Globe! yeah! sana gumaling na ko para alam ko na din kung makakasama ako! ) | 0.268277 | 0.992661 |
| 800079 | 4 | @mattgalloway Thanks for the hook up with @CarlyRush and suggesting me again bro! you rock! | 0.667948 | 0.990486 |


### 10 Tweets with Highest Sentiment - LSTM

In [50]:
top_ten_lstm_tweets = df.sort_values(by='lstm_sentiment', ascending=False).head(10)
top_ten_lstm_tweets

| | label | text | lstm_sentiment | nb_sentiment |
|---|---|---|---|---|
| 800057 | 4 | uploading pictures on friendster | 0.99998 | 0.788164 |
| 800015 | 4 | Just added tweetie to my new iPhone | 0.999963 | 0.897971 |
| 800078 | 4 | had a good tech meeting at clubZone - dinner was sushi | 0.999863 | 0.882736 |
| 800034 | 4 | @wisdomous you're welcome. glad you enjoyed it. | 0.999851 | 0.994929 |
| 800062 | 4 | loving life... and loving you | 0.999789 | 0.72759 |
| 800017 | 4 | @nicolerichie: your picture is very sweet | 0.999703 | 0.711334 |
| 800053 | 4 | @ladygaga Can't wait to see ur hot ass in Austin! woot woot!!! annnd love the bob with purple, i went the royal color way as well | 0.999678 | 0.957558 |
| 800041 | 4 | @JonathanRKnight Hi Jon! Great to hear from you! See you on the cruise, I cannot wait! Hope all is well on the Knight bus! You are loved! | 0.999443 | 0.907231 |
| 800056 | 4 | Morning Tweetland, a long day ahead! Hope everyone has a great day | 0.99941 | 0.963194 |
| 800008 | 4 | @tommcfly ah, congrats mr fletcher for finally joining twitter | 0.999225 | 0.998022 |
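The two top-ten lists share only two Tweets (800034 and 800008), which suggests the models often rank Tweets differently. One way to surface the Tweets where the models disagree most is to sort by the absolute gap between their scores. The sketch below does this with `DataFrame.nlargest`. `biggest_disagreements` is a hypothetical helper, and the demo frame holds a few rows copied from the tables in this notebook.

```python
import pandas as pd

def biggest_disagreements(df, n=10, col_a="nb_sentiment", col_b="lstm_sentiment"):
    """Return the n rows where the two models' scores differ the most."""
    gap = (df[col_a] - df[col_b]).abs()
    return df.assign(score_gap=gap).nlargest(n, "score_gap")

# A few rows copied from the result tables (index = original row number)
demo = pd.DataFrame(
    {
        "label": [4, 4, 0, 0, 0],
        "nb_sentiment": [0.788164, 0.992661, 0.453582, 0.313808, 0.000178],
        "lstm_sentiment": [0.99998, 0.268277, 7.4e-05, 0.000276, 1.1e-05],
    },
    index=[800057, 800016, 75, 23, 42],
)
print(biggest_disagreements(demo, n=2))
```

On the full sample, `biggest_disagreements(df)` would list the ten Tweets the two models score most differently.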


### 10 Tweets with Lowest Sentiment - NB

In [51]:
bottom_ten_nb_tweets = df.sort_values(by='nb_sentiment', ascending=True).head(10)
bottom_ten_nb_tweets

| | label | text | lstm_sentiment | nb_sentiment |
|---|---|---|---|---|
| 44 | 0 | Falling asleep. Just heard about that Tracy girl's body being found. How sad My heart breaks for that family. | 0.100498 | 0.000105 |
| 42 | 0 | Sad, sad, sad. I don't know why but I hate this feeling I wanna sleep and I still can't! | 1.1e-05 | 0.000178 |
| 39 | 0 | Bed. Class 8-12. Work 12-3. Gym 3-5 or 6. Then class 6-10. Another day that's gonna fly by. I miss my girlfriend | 0.015455 | 0.000367 |
| 55 | 0 | @andywana Not sure what they are, only that they are PoS! As much as I want to, I dont think can trade away company assets sorry andy! | 0.010563 | 0.000494 |
| 16 | 0 | Hollis' death scene will hurt me severely to watch on film wry is directors cut not out now? | 0.044554 | 0.000991 |
| 65 | 0 | @Starrbby too bad I won't be around I lost my job and can't even pay my phone bill lmao aw shucks | 0.019793 | 0.001098 |
| 60 | 0 | @BatManYNG I miss my ps3, it's out of commission Wutcha playing? Have you copped 'Blood On The Sand'? | 0.014642 | 0.001463 |
| 35 | 0 | ok I'm sick and spent an hour sitting in the shower cause I was too sick to stand and held back the puke like a champ. BED now | 0.04463 | 0.001658 |
| 76 | 0 | @ashleyac My donkey is sensitive about such comments. Nevertheless, he'd (and me'd) be glad to see your mug asap. Charger is still awol. | 0.321593 | 0.00198 |
| 48 | 0 | is strangely sad about LiLo and SamRo breaking up. | 0.000133 | 0.002354 |


### 10 Tweets with Lowest Sentiment - LSTM

In [53]:
bottom_ten_lstm_tweets = df.sort_values(by='lstm_sentiment', ascending=True).head(10)
bottom_ten_lstm_tweets

| | label | text | lstm_sentiment | nb_sentiment |
|---|---|---|---|---|
| 42 | 0 | Sad, sad, sad. I don't know why but I hate this feeling I wanna sleep and I still can't! | 1.1e-05 | 0.000178 |
| 1 | 0 | is upset that he can't update his Facebook by texting it... and might cry as a result School today also. Blah! | 1.7e-05 | 0.005633 |
| 75 | 0 | No picnic my phone smells like citrus. | 7.4e-05 | 0.453582 |
| 48 | 0 | is strangely sad about LiLo and SamRo breaking up. | 0.000133 | 0.002354 |
| 23 | 0 | this week is not going as i had hoped | 0.000276 | 0.313808 |
| 66 | 0 | Damm back to school tomorrow | 0.000698 | 0.054309 |
| 58 | 0 | Ugh....92 degrees tomorrow | 0.001214 | 0.083526 |
| 43 | 0 | @JonathanRKnight Awww I soo wish I was there to see you finally comfortable! Im sad that I missed it | 0.001334 | 0.003522 |
| 79 | 0 | wonders why someone that u like so much can make you so unhappy in a split seccond . depressed . | 0.00137 | 0.004687 |
| 27 | 0 | im sad now Miss.Lilly | 0.002518 | 0.023385 |
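The tables above only show the extremes. To get a single summary number per model, one option is to threshold each score at 0.5 and compare the result with `label`, where 4 marks a positive Tweet and 0 a negative one. The 0.5 cutoff is an assumed value; the notebook does not define one. `accuracy_at_threshold` is a hypothetical helper. The demo uses four rows copied from the tables above, so its output is not an accuracy figure for the full 160-Tweet sample.

```python
import pandas as pd

def accuracy_at_threshold(df, score_col, label_col="label", threshold=0.5):
    """Fraction of rows where (score >= threshold) agrees with a positive label (4)."""
    predicted_positive = df[score_col] >= threshold
    actual_positive = df[label_col] == 4
    return float((predicted_positive == actual_positive).mean())

# Four rows copied from the result tables (index = original row number)
demo = pd.DataFrame(
    {
        "label": [4, 4, 0, 0],
        "nb_sentiment": [0.992661, 0.788164, 0.453582, 0.00198],
        "lstm_sentiment": [0.268277, 0.99998, 7.4e-05, 0.321593],
    },
    index=[800016, 800057, 75, 76],
)
for col in ["nb_sentiment", "lstm_sentiment"]:
    print(col, accuracy_at_threshold(demo, col))
# nb_sentiment 1.0
# lstm_sentiment 0.75
```

Applied to the full sample, `accuracy_at_threshold(df, "nb_sentiment")` and `accuracy_at_threshold(df, "lstm_sentiment")` would give one comparable number per model under the same cutoff.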
