# Sentiment Analysis of Trump Tweets Regarding Coronavirus

### Notebook Input:
* A JSON file of tweets pulled from [Trump Twitter Archive](http://www.trumptwitterarchive.com/archive) ranging from 12/31/2019 - 5/27/2020

### Notebook Output:
* A JSON file of 200 coronavirus-related tweets
* A JSON file of 200 non-coronavirus-related tweets

In [101]:
import json
import pandas as pd
import random

## Data Cleaning / Pre-Processing

### Filter out Retweets

In [102]:
tweets_with_retweets    = []
tweets_without_retweets = []

with open('Tweets_12-31-2019_05-27-2020_to_work.json', encoding="utf8") as f:
    tweets_with_retweets = json.load(f)

# skip tweets whose text starts with 'RT' (treated as retweets)
for tweet in tweets_with_retweets:
    if tweet['text'].startswith('RT'):
        continue

    tweets_without_retweets.append(tweet)

with open("Tweets_12-31-2019_05-27-2020_without-retweets.json", "w") as outfile:
    json.dump(tweets_without_retweets, outfile)
    
print(f'All Trump tweets ranging 12/31/2019 - 5/27/2020: {len(tweets_with_retweets)}')
print(f'Trump tweets, filtered out retweets, ranging 12/31/2019 - 5/27/2020: {len(tweets_without_retweets)}')

All Trump tweets ranging 12/31/2019 - 5/27/2020: 4412
Trump tweets, filtered out retweets, ranging 12/31/2019 - 5/27/2020: 2114


### Filter Tweets by Coronavirus-related keywords

In [103]:
tweets_ary                  = []
keyword_filtered_tweets_ary = []

with open('Tweets_12-31-2019_05-27-2020_without-retweets.json', encoding="utf8") as f:
    tweets_ary = json.load(f)

# keep a tweet if its lowercased text contains any keyword as a substring
keywords = ('china', 'covid', 'pandemic', 'virus', 'corona', 'hospital',
            'ventilator', 'reopen', 'vaccine', 'testing', 'tests')

for tweet in tweets_ary:
    lcase_tweet_txt = tweet['text'].lower()

    if any(keyword in lcase_tweet_txt for keyword in keywords):
        keyword_filtered_tweets_ary.append(tweet)
        
with open("tweets_keyword_filtered.json", "w") as outfile:
    json.dump(keyword_filtered_tweets_ary, outfile)
    
print(f'Total keyword-filtered tweets: {len(keyword_filtered_tweets_ary)}')

Total keyword-filtered tweets: 227
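
Because the filter above matches substrings, the keyword `tests` also matches words such as "protests", which appears to be one way unrelated tweets enter the filtered set. A minimal sketch of a stricter variant follows, assuming a leading word boundary is acceptable. The names `KEYWORD_PATTERN` and `mentions_keyword` are hypothetical and not part of the original notebook, and using this variant would change the counts reported here.

```python
import re

# Same keywords as the filter above; the compiled pattern and helper are illustrative
KEYWORDS = ('china', 'covid', 'pandemic', 'virus', 'corona', 'hospital',
            'ventilator', 'reopen', 'vaccine', 'testing', 'tests')

# \b requires the keyword to start a word, so 'protests' no longer matches 'tests',
# while 'reopening' still matches 'reopen'
KEYWORD_PATTERN = re.compile(r'\b(?:' + '|'.join(KEYWORDS) + r')', re.IGNORECASE)

def mentions_keyword(text):
    """Return True if any keyword starts a word in the tweet text."""
    return KEYWORD_PATTERN.search(text) is not None

print(mentions_keyword('We are following your protests closely'))
print(mentions_keyword('Ventilators, Testing, Medical Supply Distribution'))
```

Only the start of the word is anchored, so compound forms such as "coronavirus" and "Covid-19" are still caught by the `corona` and `covid` keywords.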


### Identify and remove tweets in which coronavirus was neither directly nor indirectly a component, or which are manually entered quotes

In [104]:
pd.set_option('max_colwidth', 300)
pd.set_option('display.max_rows', 10)
tweets_df = pd.DataFrame.from_dict(keyword_filtered_tweets_ary)
tweets_df

Unnamed: 0,text,created_at,id_str
0,"....One person lost to this invisible virus is too much, it should have been stopped at its source, China, but I acted very quickly, and made the right decisions. Many of the current political complainers thought, at the time, that I was moving far to fast, like Crazy Nancy!",Tue May 26 15:18:23 +0000 2020,1265301261903106054
1,"For all of the political hacks out there, if I hadn’t done my job well, &amp; early, we would have lost 1 1/2 to 2 Million People, as opposed to the 100,000 plus that looks like will be the number. That’s 15 to 20 times more than we will lose. I shut down entry from China very early!",Tue May 26 15:18:20 +0000 2020,1265301249630654467
2,"We made most Governors look very good, even great, by getting them the Ventilators, unlimited Testing, and supplies, all of which they should have had in their own stockpiles. So they look great, and I just keep rolling along, doing great things and getting Fake Lamestream News!",Tue May 26 15:01:26 +0000 2020,1265296994056183809
3,"Great reviews on our handling of Covid 19, sometimes referred to as the China Virus. Ventilators, Testing, Medical Supply Distribution, we made a lot of Governors look very good - And got no credit for so doing. Most importantly, we helped a lot of great people!",Mon May 25 20:16:06 +0000 2020,1265013797334507521
4,"Nobody in 50 years has been WEAKER on China than Sleepy Joe Biden. He was asleep at the wheel. He gave them EVERYTHING they wanted, including rip-off Trade Deals. I am getting it all back!",Mon May 25 20:05:34 +0000 2020,1265011145879977985
...,...,...,...
222,"One of the greatest trade deals ever made! Also good for China and our long term relationship. 250 Billion Dollars will be coming back to our Country, and we are now in a great position for a Phase Two start. There has never been anything like this in U.S. history! USMCA NEXT!",Thu Jan 16 13:41:21 +0000 2020,1217804029599920128
223,"National Security Adviser suggested today that sanctions &amp; protests have Iran “choked off”, will force them to negotiate. Actually, I couldn’t care less if they negotiate. Will be totally up to them but, no nuclear weapons and “don’t kill your protesters.”",Sun Jan 12 23:46:18 +0000 2020,1216506722153639942
224,"The government of Iran must allow human rights groups to monitor and report facts from the ground on the ongoing protests by the Iranian people. There can not be another massacre of peaceful protesters, nor an internet shutdown. The world is watching.",Sat Jan 11 22:11:03 +0000 2020,1216120362230067202
225,"To the brave, long-suffering people of Iran: I've stood with you since the beginning of my Presidency, and my Administration will continue to stand with you. We are following your protests closely, and are inspired by your courage.",Sat Jan 11 21:46:18 +0000 2020,1216114135529902081


### Identified Tweets to be Removed
**Duplicate Tweets likely due to typo fixes**
* 1254474709674143749
* 1253684917038452736
* 1249505062554144768
* 1249369344540377089
* 1249100658692648962
* 1243710532714192898
* 1229776851297558529


**Classified as unrelated to coronavirus, either directly or indirectly**
* 1251589681428520960
* 1249103831994118146
* 1230875946334318593
* 1229790100797739008
* 1224374908710420480
* 1220673252764332037
* 1220044230065655808
* 1218952496388956162
* 1217827468230434818
* 1217808535091916800
* 1217804029599920128
* 1216506722153639942
* 1216120362230067202
* 1216114135529902081
* 1212014713808273410


**Tweets are manually entered quotes**
* 1244320570315018240
* 1244320704826310665
* 1242756708902076417
* 1232565919043317761


**Tweet is mostly a quote apart from "Thank you Tom"**
* 1235594306297253889
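
The duplicates listed above were identified by manual review. As an aid to such a review, near-duplicate candidates could be flagged automatically. The sketch below uses `difflib.SequenceMatcher` from the standard library. The function name `find_near_duplicates`, the 0.9 threshold, and the sample tweets are assumptions for illustration, not the method used in this notebook.

```python
from difflib import SequenceMatcher
from itertools import combinations

def find_near_duplicates(tweets, threshold=0.9):
    """Return (id_a, id_b, ratio) for tweet pairs whose texts are nearly identical."""
    pairs = []
    for a, b in combinations(tweets, 2):
        ratio = SequenceMatcher(None, a['text'], b['text']).ratio()
        if ratio >= threshold:
            pairs.append((a['id_str'], b['id_str'], round(ratio, 3)))
    return pairs

# Made-up sample data with the same keys as the archive export
sample = [
    {'id_str': '1', 'text': 'Great meeting today, thank you everyone!'},
    {'id_str': '2', 'text': 'Great meetng today, thank you everyone!'},
    {'id_str': '3', 'text': 'Trade deal signed this morning.'},
]
for id_a, id_b, ratio in find_near_duplicates(sample):
    print(id_a, id_b, ratio)
```

The pairwise comparison is quadratic in the number of tweets, which should be acceptable for a few hundred tweets; flagged pairs would still need a manual decision about which copy to keep.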


In [105]:
print(f'Before count: {len(keyword_filtered_tweets_ary)}')
keyword_filtered_tweets_ary[:] = [tweet for tweet in keyword_filtered_tweets_ary if tweet.get('id_str') not in ['1254474709674143749', '1253684917038452736', '1249505062554144768', '1249369344540377089', '1249100658692648962', '1243710532714192898', '1229776851297558529', '1251589681428520960', '1249103831994118146', '1230875946334318593', '1229790100797739008', '1224374908710420480', '1220673252764332037', '1220044230065655808', '1218952496388956162', '1217827468230434818', '1217808535091916800', '1217804029599920128', '1216506722153639942', '1216120362230067202', '1216114135529902081', '1212014713808273410', '1244320570315018240', '1244320704826310665', '1242756708902076417', '1232565919043317761', '1235594306297253889']]
print(f'After count: {len(keyword_filtered_tweets_ary)}')

Before count: 227
After count: 200
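
The removal step above passes every ID in one long inline list. A sketch of an alternative layout keeps the IDs grouped by review reason and flattens them into a set for lookup. The names `REMOVAL_REASONS` and `remove_by_id` are hypothetical, and only a few of the IDs from the lists above are shown.

```python
# IDs copied from the review lists above (abbreviated); the grouping keys are illustrative
REMOVAL_REASONS = {
    'duplicate': ['1254474709674143749', '1253684917038452736'],
    'unrelated': ['1251589681428520960', '1249103831994118146'],
    'manual_quote': ['1244320570315018240', '1244320704826310665'],
}

def remove_by_id(tweets, reasons):
    """Return the tweets whose id_str is not listed under any removal reason."""
    ids_to_remove = {id_str for ids in reasons.values() for id_str in ids}
    return [tweet for tweet in tweets if tweet.get('id_str') not in ids_to_remove]

# Placeholder tweets: the first ID is in the removal lists, the second is not
sample = [
    {'id_str': '1254474709674143749', 'text': 'placeholder'},
    {'id_str': '1265301261903106054', 'text': 'placeholder'},
]
kept = remove_by_id(sample, REMOVAL_REASONS)
print(len(sample), len(kept))
```

Grouping by reason keeps the code in step with the markdown lists, so an ID added to a list during review has one obvious place to go in the code.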


In [106]:
# put the tweets in order from oldest to newest
keyword_filtered_tweets_ary.sort(key=lambda x: int(x['id_str']))

# save the cleaned coronavirus tweets into a JSON file
with open("corona_virus_tweets_cleaned.json", "w") as outfile:
    json.dump(keyword_filtered_tweets_ary, outfile)


### Make copy of non-coronavirus tweets

In [107]:
non_corona_virus_tweets = []

for tweet in tweets_without_retweets:
    if tweet in keyword_filtered_tweets_ary:
        continue
    else:
        non_corona_virus_tweets.append(tweet)

# save the non-coronavirus tweets into a JSON file
with open("non_corona_virus_tweets_all.json", "w") as outfile:
    json.dump(non_corona_virus_tweets, outfile)
        
len(non_corona_virus_tweets)

1914
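
The count is consistent with the earlier cells: 2114 tweets without retweets minus the 200 cleaned coronavirus tweets leaves 1914. The membership test `tweet in keyword_filtered_tweets_ary` compares whole dictionaries against a list. A minimal sketch of an equivalent split keyed on `id_str` follows, assuming every tweet has a unique `id_str`; the function name `split_by_ids` and the sample data are illustrative.

```python
def split_by_ids(all_tweets, selected_tweets):
    """Return tweets from all_tweets whose id_str is not among selected_tweets."""
    selected_ids = {tweet['id_str'] for tweet in selected_tweets}
    return [tweet for tweet in all_tweets if tweet['id_str'] not in selected_ids]

# Made-up tweets with sequential IDs
all_tweets = [{'id_str': str(i), 'text': f'tweet {i}'} for i in range(5)]
selected = [all_tweets[1], all_tweets[3]]
remaining = split_by_ids(all_tweets, selected)
print([tweet['id_str'] for tweet in remaining])
```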

In [108]:
#### NOTE #### 
# Your shuffle result will differ from the result I received
# For that reason, you may consider commenting out this entire cell
# The next cell will pick up the list of tweets I received when shuffling

# Randomly select 250 tweets from the non-coronavirus tweets
# The extra 50 tweets serve as replacement tweets during manual review process
# If any tweets are barely original tweets or are merely a link: delete them from the list
# Any remaining tweets above 200 will be popped from the end of the array
# If tweet list falls below 200 tweets: repeat
random.shuffle(non_corona_virus_tweets)
random_250_non_corona_virus_tweets = non_corona_virus_tweets[:250]

# save the random selection so that it can be referenced in the future
with open("random_250_non_corona_virus_tweets.json", "w") as outfile:
    json.dump(random_250_non_corona_virus_tweets, outfile)
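
The note above warns that the shuffle result differs between runs. One way to make a selection repeatable without relying on a saved file is to seed a dedicated random generator. The sketch below assumes an arbitrary seed of 42 and a hypothetical helper `sample_tweets`, neither of which comes from the original notebook, so it would not reproduce the saved selection used in the following cells.

```python
import random

def sample_tweets(tweets, k, seed=42):
    """Return k tweets chosen without replacement, repeatable for a given seed."""
    rng = random.Random(seed)  # local generator; leaves the global random state untouched
    return rng.sample(tweets, k)

# Placeholder tweets standing in for the 1914 non-coronavirus tweets
tweets = [{'id_str': str(i)} for i in range(1914)]
first = sample_tweets(tweets, 250)
second = sample_tweets(tweets, 250)
print(len(first), first == second)
```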

In [109]:
# For reproducibility, load the saved random shuffle results from the JSON file
random_250_non_corona_virus_tweets = []

with open('random_250_non_corona_virus_tweets.json', encoding="utf8") as f:
    random_250_non_corona_virus_tweets = json.load(f)

### Manually identify and remove tweets

In [111]:
pd.set_option('max_colwidth', 300)
pd.set_option('display.max_rows', 10)
tweets_df = pd.DataFrame.from_dict(random_250_non_corona_virus_tweets)
tweets_df

Unnamed: 0,text,created_at,id_str
0,"Nervous Nancy is an inherently “dumb” person. She wasted all of her time on the Impeachment Hoax. She will be overthrown, either by inside or out, just like her last time as “Speaker”. Wallace &amp; @FoxNews are on a bad path, watch! https://t.co/nkEj5YeRjb",Sun Apr 19 16:58:51 +0000 2020,1251918194639548417
1,"This is a total disgrace, but just another reason that I’m going to win Michigan again! https://t.co/XrqveOvYcG",Thu Jan 16 15:49:37 +0000 2020,1217836310657892353
2,"Incredible people, great Rally! https://t.co/3i6tgfqrRl",Sat Feb 22 04:17:45 +0000 2020,1231070547607511040
3,They are staging a coup against Bernie!,Mon Mar 02 21:32:54 +0000 2020,1234592543821705219
4,THANK YOU CALIFORNIA! #KAG2020 https://t.co/7BrkAKYWU0,Wed Mar 04 05:59:06 +0000 2020,1235082320161325057
...,...,...,...
245,"....had happened to a Presidential candidate, or President, who was a Democrat, everybody involved would long ago be in jail for treason (and more), and it would be considered the CRIME OF THE CENTURY, far bigger and more sinister than Watergate!",Thu Jan 02 13:58:02 +0000 2020,1212734798365626369
246,So nice to see this great honor. Thank you (but haven’t played golf in a long time)! https://t.co/FfJyUmRdGi,Sun May 03 14:36:19 +0000 2020,1256955754428448768
247,Great Alan. They are Fake News! https://t.co/n7zUY7mzIQ,Tue Apr 21 03:18:19 +0000 2020,1252436473866973185
248,She is a third rate reporter who has nothing going. A Fake News “journalist”. https://t.co/SopsC7uMMf,Sat Mar 28 01:05:07 +0000 2020,1243705647192997893


### Tweets to remove

**Nothing but a quote**
* 1218164681464000513
* 1220325126186598400
* 1220379537155948544
* 1242756708902076417

**Nothing but a link**
* 1229519051514355713
* 1256676153630363649
* 1250928193755926531
* 1263688831234183168
* 1233474782026436611
* 1253341461401096193
* 1255656645759246341
* 1256355575576834052
* 1222973244644315136
* 1230384588553211907
* 1244409814064775169
* 1235415690292662273
* 1253341575280656385
* 1232005756054069248
* 1247230728045236225
* 1263962362228457475
* 1263597356500451329
* 1220865941434699776
* 1242539178786725891
* 1231302822785953792
* 1221269233126060033
* 1262140755843448832
* 1244682364284014594

**Tweet not in English**
* 1216130169477439488

**Significantly a manual retweet:**
* 1235594306297253889
* 1215651567782780930
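
The tweets that are "nothing but a link" were identified by hand above. A sketch of how such candidates could be flagged automatically with a regular expression follows; the helper `is_link_only` and its pattern are assumptions for illustration and were not used to build the lists above.

```python
import re

# Matches text made up solely of one or more URLs separated by whitespace
LINK_ONLY_PATTERN = re.compile(r'\s*(?:https?://\S+\s*)+')

def is_link_only(text):
    """Return True if the tweet text contains nothing except URLs."""
    return LINK_ONLY_PATTERN.fullmatch(text) is not None

print(is_link_only('https://t.co/3i6tgfqrRl'))
print(is_link_only('Incredible people, great Rally! https://t.co/3i6tgfqrRl'))
```

Tweets that are mostly a quote or a manual retweet have no such simple signature, so those categories would still require manual review.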

In [112]:
# remove the tweets above from the array
random_250_non_corona_virus_tweets[:] = [tweet for tweet in random_250_non_corona_virus_tweets if tweet.get('id_str') not in ['1218164681464000513', '1220325126186598400', '1220379537155948544', '1242756708902076417', '1229519051514355713', '1256676153630363649', '1250928193755926531', '1263688831234183168', '1233474782026436611', '1253341461401096193', '1255656645759246341', '1256355575576834052', '1222973244644315136', '1230384588553211907', '1244409814064775169', '1235415690292662273', '1253341575280656385', '1232005756054069248', '1247230728045236225', '1263962362228457475', '1263597356500451329', '1220865941434699776', '1242539178786725891', '1231302822785953792', '1221269233126060033', '1262140755843448832', '1244682364284014594', '1216130169477439488', '1235594306297253889', '1215651567782780930']]
len(random_250_non_corona_virus_tweets)

220

In [113]:
# grab the first 200 tweets
non_corona_virus_tweets_cleaned = random_250_non_corona_virus_tweets[:200]
len(non_corona_virus_tweets_cleaned)

200

In [116]:
# put the tweets in order from oldest to newest
non_corona_virus_tweets_cleaned.sort(key=lambda x: int(x['id_str']))

# save the cleaned non-coronavirus tweets into a JSON file
with open("non_corona_virus_tweets_cleaned.json", "w") as outfile:
    json.dump(non_corona_virus_tweets_cleaned, outfile)
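
Sorting by `int(x['id_str'])` yields chronological order because tweet IDs from this period are Snowflake IDs, whose upper bits encode a millisecond timestamp relative to Twitter's epoch (1288834974657 ms after the Unix epoch). The sketch below decodes that timestamp. The function name `snowflake_to_datetime` is illustrative, and the ID is taken from the first row of the coronavirus table above.

```python
from datetime import datetime, timezone

TWITTER_EPOCH_MS = 1288834974657  # offset used by Snowflake tweet IDs

def snowflake_to_datetime(id_str):
    """Decode the UTC creation time embedded in a Snowflake tweet ID."""
    timestamp_ms = (int(id_str) >> 22) + TWITTER_EPOCH_MS
    return datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)

# created_at for this tweet is listed as Tue May 26 15:18:23 +0000 2020
print(snowflake_to_datetime('1265301261903106054'))
```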

## Summary
* This notebook started from a list of President Trump's tweets dated 12/31/2019 through 5/27/2020.
* After automated filtering and manual review, a set of 200 coronavirus-related tweets and a set of 200 non-coronavirus-related tweets were identified.
* Each set of tweets was saved to a JSON file.