## Sentiment Analysis of Trump Tweets

This notebook follows the Amazon Comprehend tutorial below to run an example sentiment analysis on Trump's tweets:
https://aws.amazon.com/blogs/machine-learning/detect-sentiment-from-customer-reviews-using-amazon-comprehend/

First, import the packages.

In [14]:
import pandas as pd
import numpy as np

import warnings
warnings.filterwarnings('ignore')

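# Show every column and the full tweet text when displaying DataFrames
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)  # None means no limit; -1 is deprecated in newer pandas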

Now load Trump's tweets.

In [15]:
df_tweets = pd.read_csv('data/trumptweets-1515775693.tweets.csv')
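The `Unnamed: 0` column in the output below suggests the CSV's first column is an unnamed row index. Assuming that is the case, passing `index_col=0` to `pd.read_csv` loads it as the index instead of as a data column. The snippet below is a small sketch on a hypothetical two-row stand-in for the file, not the real data:

```python
import io

import pandas as pd

# Hypothetical stand-in for the tweets CSV: the first header cell is empty,
# so pandas names that column "Unnamed: 0" unless it is used as the index.
raw = io.StringIO(
    ",status_id,text\n"
    "0,x1864367186,Read a great interview\n"
    "1,x9273573134835712,Congratulations to Evan Lysacek\n"
)

df_default = pd.read_csv(raw)
raw.seek(0)  # rewind the buffer before reading it a second time
df_indexed = pd.read_csv(raw, index_col=0)

print(list(df_default.columns))  # ['Unnamed: 0', 'status_id', 'text']
print(list(df_indexed.columns))  # ['status_id', 'text']
```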

In [58]:
df_tweets.head()

Unnamed: 0,status_id,created_at,user_id,screen_name,text,source,display_text_width,reply_to_status_id,reply_to_user_id,reply_to_screen_name,is_quote,is_retweet,favorite_count,retweet_count,hashtags,symbols,urls_url,urls_t.co,urls_expanded_url,media_url,media_t.co,media_expanded_url,media_type,ext_media_url,ext_media_t.co,ext_media_expanded_url,ext_media_type,mentions_user_id,mentions_screen_name,lang,quoted_status_id,quoted_text,quoted_created_at,quoted_source,quoted_favorite_count,quoted_retweet_count,quoted_user_id,quoted_screen_name,quoted_name,quoted_followers_count,quoted_friends_count,quoted_statuses_count,quoted_location,quoted_description,quoted_verified,retweet_status_id,retweet_text,retweet_created_at,retweet_source,retweet_favorite_count,retweet_user_id,retweet_screen_name,retweet_name,retweet_followers_count,retweet_friends_count,retweet_statuses_count,retweet_location,retweet_description,retweet_verified,place_url,place_name,place_full_name,place_type,country,country_code,geo_coords,coords_coords,bbox_coords
0,x1864367186,2009-05-20 22:29:47,x25073877,realDonaldTrump,Read a great interview with Donald Trump that appeared in The New York Times Magazine: http://tinyurl.com/qsx4o6,Twitter Web Client,112,,,,False,False,11,11,,,,,,,,,,,,,,,,en,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1,x9273573134835712,2010-11-29 15:52:46,x25073877,realDonaldTrump,"Congratulations to Evan Lysacek for being nominated SI sportsman of the year. He's a great guy, and he has my vote! #EvanForSI",Twitter Web Client,127,,,,False,False,7,32,EvanForSI,,,,,,,,,,,,,,,en,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
2,x29014512646,2010-10-28 18:53:40,x25073877,realDonaldTrump,"I was on The View this morning. We talked about The Apprentice. Tonight's episode is a great one--tough, exciting and surprising. 10 pm/NBC",Twitter Web Client,139,,,,False,False,6,36,,,,,,,,,,,,,,,,en,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
3,x7483813542232064,2010-11-24 17:20:54,x25073877,realDonaldTrump,Tomorrow night's episode of The Apprentice delivers excitement at QVC along with appearances by Isaac Mizrahi and Cathie Black. 10 pm on NBC,Twitter Web Client,140,,,,False,False,17,37,,,,,,,,,,,,,,,,en,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
4,x5775731054,2009-11-16 21:06:10,x25073877,realDonaldTrump,"Donald Trump Partners with TV1 on New Reality Series Entitled, Omarosa's Ultimate Merger: http://tinyurl.com/yk5m3lc",Twitter Web Client,116,,,,False,False,3,6,,,,,,,,,,,,,,,,en,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,


In [8]:
df_tweets.describe()

| | display_text_width | favorite_count | retweet_count | ext_media_type | quoted_source | quoted_favorite_count | quoted_retweet_count | quoted_followers_count | quoted_friends_count | quoted_statuses_count | retweet_source | retweet_favorite_count | retweet_followers_count | retweet_friends_count | retweet_statuses_count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 32826.0 | 32826.0 | 32826.0 | 0.0 | 0.0 | 276.0 | 276.0 | 276.0 | 276.0 | 276.0 | 0.0 | 494.0 | 494.0 | 494.0 | 494.0 |
| mean | 109.698257 | 9055.790806 | 2804.625053 | NaN | NaN | 7841.217391 | 4041.894928 | 5500690.0 | 3337.996377 | 49810.717391 | NaN | 29565.587045 | 3346368.0 | 6986.408907 | 44878.412955 |
| std | 35.857616 | 26286.393818 | 7946.08035 | NaN | NaN | 44475.708665 | 33723.741706 | 13387730.0 | 13728.875815 | 80816.639783 | NaN | 22907.405341 | 8771462.0 | 39068.233127 | 72935.437182 |
| min | 4.0 | 0.0 | 0.0 | NaN | NaN | 0.0 | 0.0 | 2.0 | 0.0 | 44.0 | NaN | 2155.0 | 696.0 | 2.0 | 128.0 |
| 25% | 89.0 | 22.0 | 18.0 | NaN | NaN | 92.0 | 69.75 | 6430.25 | 122.75 | 6822.0 | NaN | 13103.5 | 262323.2 | 45.0 | 5329.0 |
| 50% | 122.0 | 76.0 | 121.0 | NaN | NaN | 822.5 | 585.5 | 176469.0 | 593.0 | 19131.0 | NaN | 25364.0 | 1108517.0 | 586.0 | 13915.0 |
| 75% | 137.0 | 2125.75 | 1320.75 | NaN | NaN | 4359.0 | 2014.75 | 1742667.0 | 1800.5 | 43122.25 | NaN | 38698.5 | 2002899.0 | 1557.0 | 42848.5 |
| max | 296.0 | 617587.0 | 361835.0 | NaN | NaN | 714674.0 | 557891.0 | 46562570.0 | 194242.0 | 486261.0 | NaN | 171077.0 | 98903720.0 | 625524.0 | 353748.0 |


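Assign a sequential rec_id to each row. The rec_id is used in the file names below, so each sentiment result can be traced back to its tweet.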

In [9]:
df_tweets = df_tweets.assign(rec_id=np.arange(len(df_tweets))).reset_index(drop=True)

Next, write each of the first 1,000 tweets to its own .txt file. These files are then uploaded to an Amazon S3 bucket.

In [10]:
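import os

file_template = 'data/trumptweets/trumptweets_rec_id_{}.txt'
os.makedirs(os.path.dirname(file_template), exist_ok=True)  # create the output folder if needed

# Write the first 1,000 tweets, one file per tweet, named by rec_id
for _, row in df_tweets.head(1000).iterrows():
    with open(file_template.format(row['rec_id']), 'w', encoding='utf-8') as f:
        f.write(str(row['text']))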
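The notebook does not show the upload or the Comprehend call itself. Below is a minimal boto3 sketch of both steps, under stated assumptions: the helper names `upload_tweet_files` and `sentiment_row`, the bucket name, and the `trumptweets/` key prefix are hypothetical, and the output row is assumed to follow the column order of the Athena table below (location, timestamp, sentiment, then the four scores) with plain comma separation. `upload_file` and `detect_sentiment` are real boto3 client methods; the clients can be injected so the logic can be exercised without AWS credentials.

```python
import os
from datetime import datetime, timezone


def upload_tweet_files(directory, bucket, prefix="trumptweets/", s3=None):
    """Upload every .txt file in `directory` to s3://<bucket>/<prefix><filename>.

    Hypothetical helper; `prefix` is an assumed key layout.
    """
    if s3 is None:
        import boto3  # imported lazily so the helper can be tested with a stub client
        s3 = boto3.client("s3")
    keys = []
    for name in sorted(os.listdir(directory)):
        if name.endswith(".txt"):
            key = prefix + name
            s3.upload_file(os.path.join(directory, name), bucket, key)
            keys.append(key)
    return keys


def sentiment_row(location, text, comprehend=None):
    """Run Comprehend sentiment detection on one tweet and return one CSV line.

    Hypothetical helper; the column order is assumed to match the Athena table
    (ImageLocation, Timestamp, Sentiment, Positive, Negative, Neutral, Mixed).
    """
    if comprehend is None:
        import boto3
        comprehend = boto3.client("comprehend")
    result = comprehend.detect_sentiment(Text=text, LanguageCode="en")
    scores = result["SentimentScore"]
    fields = [
        location,
        datetime.now(timezone.utc).isoformat(),
        result["Sentiment"],
        scores["Positive"],
        scores["Negative"],
        scores["Neutral"],
        scores["Mixed"],
    ]
    # LazySimpleSerDe splits on the delimiter without handling quotes,
    # so the fields are joined with plain commas.
    return ",".join(str(f) for f in fields)
```

Writing one such line per tweet into files under the `sentiment/` prefix is what the Athena table below would then read; how those result files are written and triggered is left open here.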

Create a table in Amazon Athena over the sentiment-analysis results, which are stored as comma-separated files under `s3://<bucket_name>/sentiment/`. We can then query the table and sort the negative tweets by their negative score.

```sql
CREATE EXTERNAL TABLE IF NOT EXISTS default.ReviewSentimentAnalysis (
  `ImageLocation` string,
  `Timestamp` string,
  `Sentiment` string,
  `Positive` string,
  `Negative` string,
  `Neutral` string,
  `Mixed` string
  )
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'serialization.format' = ',',
  'field.delim' = ','
) LOCATION 's3://<bucket_name>/sentiment/'
;

-- The scores are stored as strings, so cast them before sorting numerically
SELECT *
FROM default.ReviewSentimentAnalysis
WHERE sentiment = 'NEGATIVE'
  AND imagelocation LIKE '%trump%'
ORDER BY CAST(negative AS DOUBLE) DESC;
```
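Because the table declares the score columns as `string`, ordering by them directly compares text, not numbers. Casting to a numeric type first (for example `CAST(negative AS DOUBLE)` in Athena) avoids this. The pandas sketch below illustrates the pitfall on hypothetical scores; it assumes the scores were written with Python's `str()`, which switches to scientific notation for very small values:

```python
import pandas as pd

# Hypothetical Comprehend scores stored as strings, as in the Athena table
results = pd.DataFrame({
    "imagelocation": [
        "trumptweets_rec_id_1.txt",
        "trumptweets_rec_id_2.txt",
        "trumptweets_rec_id_3.txt",
    ],
    "negative": [str(0.97), str(0.000095), str(0.42)],
})
print(results["negative"].tolist())  # ['0.97', '9.5e-05', '0.42']

# Sorting the raw strings compares characters, so '9.5e-05' ranks first
by_string = results.sort_values("negative", ascending=False)

# Converting to floats first gives the intended order: 0.97, 0.42, 9.5e-05
by_number = (
    results.assign(negative=results["negative"].astype(float))
    .sort_values("negative", ascending=False)
)
```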

The tweets returned by the query can be looked up in `df_tweets` by their rec_id.

In [11]:
df_tweets[df_tweets['rec_id']==806]['text']

806    Disappointed in GOP and Dems---Giving Obama power to raise the debt limit next year is  a mistake.
Name: text, dtype: object

In [12]:
df_tweets[df_tweets['rec_id']==505]['text']

505    Why is the UN condemning @Israel and doing nothing about Syria? What a disgrace.
Name: text, dtype: object

In [13]:
df_tweets[df_tweets['rec_id']==700]['text']

700    @BarackObama is so inept that I think he simply  made a mistake in originally scheduling the Joint Session on September 7th. Just sad.
Name: text, dtype: object
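Instead of looking up each rec_id by hand, the query results could be joined back to `df_tweets` in one step. The sketch below assumes the results have been loaded into a DataFrame with an `imagelocation` column that contains the `trumptweets_rec_id_<n>.txt` file name; the helper name `attach_tweet_text` is hypothetical:

```python
import pandas as pd


def attach_tweet_text(results, tweets):
    """Join sentiment results to tweets via the rec_id embedded in each file name.

    Hypothetical helper; rows whose location does not match the file-name
    pattern are dropped.
    """
    rec_ids = results["imagelocation"].str.extract(r"trumptweets_rec_id_(\d+)\.txt")[0]
    out = (
        results.assign(rec_id=rec_ids)
        .dropna(subset=["rec_id"])
        .astype({"rec_id": int})
    )
    # A left join keeps the order of `results`, e.g. sorted by negative score
    return out.merge(tweets[["rec_id", "text"]], on="rec_id", how="left")
```

Calling `attach_tweet_text(query_results, df_tweets)` on the Athena output would return each result row with its tweet text alongside the scores.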