# Sentiment Analysis of Trump Tweets Regarding Coronavirus

In [99]:
import json
import pandas as pd

## Data Cleaning / Pre-Processing

### Filter out the Retweets
The source file contains all tweets from 12/31/2019 to 5/27/2020.
Retweets are filtered out by dropping every tweet whose text starts with `RT`.


In [100]:
with open('Tweets_12-31-2019_05-27-2020_to_work.json', encoding="utf8") as f:
    tweets_with_retweets = json.load(f)

# Keep only tweets whose text does not start with the retweet marker 'RT'
tweets_without_retweets = [
    tweet for tweet in tweets_with_retweets
    if not tweet['text'].startswith('RT')
]

with open("Tweets_12-31-2019_05-27-2020_without-retweets.json", "w", encoding="utf8") as outfile:
    json.dump(tweets_without_retweets, outfile)
    
print(f'All Trump tweets ranging 12/31/2019 - 5/27/2020: {len(tweets_with_retweets)}')
print(f'Trump tweets, filtered out retweets, ranging 12/31/2019 - 5/27/2020: {len(tweets_without_retweets)}')

All Trump tweets ranging 12/31/2019 - 5/27/2020: 4412
Trump tweets, filtered out retweets, ranging 12/31/2019 - 5/27/2020: 2114
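
The prefix check above can be isolated into a small helper for spot-checking individual records. This is a minimal sketch: `is_retweet` is a hypothetical name, and the sample records are made up rather than taken from the dataset.

```python
def is_retweet(tweet: dict) -> bool:
    """Return True when the tweet text begins with the 'RT' marker.

    Mirrors the prefix check used in the cell above; the helper name is
    hypothetical and not part of the original notebook.
    """
    return tweet.get("text", "").startswith("RT")


# Hypothetical sample records, not taken from the dataset.
sample = [
    {"text": "RT @someone: an example retweet", "id_str": "1"},
    {"text": "An original tweet that mentions RT later", "id_str": "2"},
]

originals = [t for t in sample if not is_retweet(t)]
print([t["id_str"] for t in originals])  # prints ['2']
```

Note that a prefix test also drops any original tweet that happens to begin with the letters "RT". Checking for `'RT @'` would be stricter, but it might change the counts reported above.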


### Filter Tweets by Coronavirus-related keywords

In [101]:
import json

with open('Tweets_12-31-2019_05-27-2020_without-retweets.json', encoding="utf8") as f:
    tweets_ary = json.load(f)

# Substrings that mark a tweet as potentially coronavirus-related
keywords = ['china', 'covid', 'pandemic', 'virus', 'corona', 'hospital',
            'ventilator', 'reopen', 'vaccine', 'testing', 'tests']

# Keep a tweet if its lower-cased text contains at least one keyword
keyword_filtered_tweets_ary = [
    tweet for tweet in tweets_ary
    if any(keyword in tweet['text'].lower() for keyword in keywords)
]
        
with open("tweets_keyword_filtered.json", "w") as outfile:
    json.dump(keyword_filtered_tweets_ary, outfile)
    
print(f'Total keyword-filtered tweets: {len(keyword_filtered_tweets_ary)}')

Total keyword-filtered tweets: 227
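
Because the filter matches substrings, it can also catch unrelated words, which is one reason the manual inspection in the next step is needed. The sketch below reports which keywords matched a given text. `matched_keywords` is a hypothetical helper and the example texts are invented for illustration.

```python
KEYWORDS = ["china", "covid", "pandemic", "virus", "corona", "hospital",
            "ventilator", "reopen", "vaccine", "testing", "tests"]


def matched_keywords(text: str, keywords=KEYWORDS) -> list:
    """Return the keywords that occur as substrings of the lower-cased text."""
    lowered = text.lower()
    return [kw for kw in keywords if kw in lowered]


# Hypothetical example texts, not taken from the dataset.
print(matched_keywords("Ventilators and Testing are going well"))  # ['ventilator', 'testing']
print(matched_keywords("The protests continued downtown"))         # ['tests']
print(matched_keywords("Great rally tonight"))                     # []
```

The second example shows a false positive: "protests" contains "tests", so the tweet would pass the filter even though it is unrelated.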


### Identify and remove tweets in which coronavirus was neither directly nor indirectly a topic

In [102]:
# Show full tweet text and enough rows to inspect every filtered tweet
pd.set_option('display.max_colwidth', 300)
pd.set_option('display.max_rows', 250)
tweets_df = pd.DataFrame(keyword_filtered_tweets_ary)
tweets_df

Unnamed: 0,text,created_at,id_str
0,"....One person lost to this invisible virus is too much, it should have been stopped at its source, China, but I acted very quickly, and made the right decisions. Many of the current political complainers thought, at the time, that I was moving far to fast, like Crazy Nancy!",Tue May 26 15:18:23 +0000 2020,1265301261903106054
1,"For all of the political hacks out there, if I hadn’t done my job well, &amp; early, we would have lost 1 1/2 to 2 Million People, as opposed to the 100,000 plus that looks like will be the number. That’s 15 to 20 times more than we will lose. I shut down entry from China very early!",Tue May 26 15:18:20 +0000 2020,1265301249630654467
2,"We made most Governors look very good, even great, by getting them the Ventilators, unlimited Testing, and supplies, all of which they should have had in their own stockpiles. So they look great, and I just keep rolling along, doing great things and getting Fake Lamestream News!",Tue May 26 15:01:26 +0000 2020,1265296994056183809
3,"Great reviews on our handling of Covid 19, sometimes referred to as the China Virus. Ventilators, Testing, Medical Supply Distribution, we made a lot of Governors look very good - And got no credit for so doing. Most importantly, we helped a lot of great people!",Mon May 25 20:16:06 +0000 2020,1265013797334507521
4,"Nobody in 50 years has been WEAKER on China than Sleepy Joe Biden. He was asleep at the wheel. He gave them EVERYTHING they wanted, including rip-off Trade Deals. I am getting it all back!",Mon May 25 20:05:34 +0000 2020,1265011145879977985
5,"Sleepy Joe Biden (mostly his reps.) went crazy when I banned, in late January, people coming in from China. He called me “xenophobic” &amp; then went equally “nuts” when we let in 44,000 people - until he was told they were American citizens coming home. He later apologized!",Mon May 25 20:00:26 +0000 2020,1265009852516110336
6,"I give, and have given from the beginning, my entire yearly salary, $400,000 to $450,000, back to our government. Last check to HHS Covid relief. My great honor! https://t.co/Vv6Uyz9MkF",Mon May 25 01:57:23 +0000 2020,1264737293249794048
7,"The United States cannot have all Mail In Ballots. It will be the greatest Rigged Election in history. People grab them from mailboxes, print thousands of forgeries and “force” people to sign. Also, forge names. Some absentee OK, when necessary. Trying to use Covid for this Scam!",Sun May 24 14:08:36 +0000 2020,1264558926021959680
8,"The Wacky Do Nothing Attorney General of Michigan, Dana Nessel, is viciously threatening Ford Motor Company for the fact that I inspected a Ventilator plant without a mask. Not their fault, &amp; I did put on a mask. No wonder many auto companies left Michigan, until I came along!",Fri May 22 03:14:05 +0000 2020,1263669433366728704
9,I will be lowering the flags on all Federal Buildings and National Monuments to half-staff over the next three days in memory of the Americans we have lost to the CoronaVirus....,Thu May 21 22:41:20 +0000 2020,1263600794290417670
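
The `created_at` values in the table are plain strings. If they need to be sorted or checked against the 12/31/2019 - 5/27/2020 range, they can be parsed with the standard library. This is a sketch: `parse_created_at` and `TWITTER_TIME_FORMAT` are hypothetical names, and the `%a`/`%b` directives assume an English locale.

```python
from datetime import datetime

# Format of the created_at strings shown in the table above.
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def parse_created_at(value: str) -> datetime:
    """Parse a created_at string into a timezone-aware datetime."""
    return datetime.strptime(value, TWITTER_TIME_FORMAT)


ts = parse_created_at("Tue May 26 15:18:23 +0000 2020")
print(ts.isoformat())  # 2020-05-26T15:18:23+00:00
```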


### Identified Tweets to be Removed
**Duplicate Tweets likely due to typo fixes**
* 1254474709674143749
* 1253684917038452736
* 1249505062554144768
* 1249369344540377089
* 1249100658692648962
* 1243710532714192898
* 1229776851297558529

**Classified as unrelated to coronavirus, either directly or indirectly**
* 1251589681428520960
* 1249103831994118146
* 1230875946334318593
* 1229790100797739008
* 1224374908710420480
* 1220673252764332037
* 1220044230065655808
* 1218952496388956162
* 1217827468230434818
* 1217808535091916800
* 1217804029599920128
* 1216506722153639942
* 1216120362230067202
* 1216114135529902081
* 1212014713808273410

**Tweets that are manually entered quotes**
* 1244320570315018240
* 1244320704826310665
* 1242756708902076417
* 1232565919043317761

**Tweet is mostly a quote, apart from "Thank you Tom"**
* 1235594306297253889
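
The IDs above were identified by manual inspection. As an aid to that kind of review (not the method used here), near-duplicate tweets such as typo fixes could be flagged with `difflib.SequenceMatcher` from the standard library. The helper name `near_duplicates`, the 0.9 threshold, and the sample texts are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def near_duplicates(tweets, threshold=0.9):
    """Yield (id_a, id_b, ratio) for pairs of tweets with very similar text.

    `threshold` is an assumed cut-off; it would need tuning on real data.
    """
    for a, b in combinations(tweets, 2):
        ratio = SequenceMatcher(None, a["text"], b["text"]).ratio()
        if ratio >= threshold:
            yield a["id_str"], b["id_str"], round(ratio, 3)


# Hypothetical records, not taken from the dataset.
sample = [
    {"id_str": "1", "text": "We are doing a great job with testing accross the country!"},
    {"id_str": "2", "text": "We are doing a great job with testing across the country!"},
    {"id_str": "3", "text": "Totally unrelated message about trade."},
]

for pair in near_duplicates(sample):
    print(pair)
```

Each flagged pair would still need to be read by a person, since a high similarity score alone does not say which copy is the corrected one.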


In [103]:
print(f'Before count: {len(keyword_filtered_tweets_ary)}')
# IDs of the tweets identified for removal during manual inspection
ids_to_remove = {
    # Duplicate tweets, likely due to typo fixes
    '1254474709674143749', '1253684917038452736', '1249505062554144768',
    '1249369344540377089', '1249100658692648962', '1243710532714192898',
    '1229776851297558529',
    # Classified as unrelated to coronavirus
    '1251589681428520960', '1249103831994118146', '1230875946334318593',
    '1229790100797739008', '1224374908710420480', '1220673252764332037',
    '1220044230065655808', '1218952496388956162', '1217827468230434818',
    '1217808535091916800', '1217804029599920128', '1216506722153639942',
    '1216120362230067202', '1216114135529902081', '1212014713808273410',
    # Manually entered quotes
    '1244320570315018240', '1244320704826310665', '1242756708902076417',
    '1232565919043317761',
    # Mostly a quote
    '1235594306297253889',
}
keyword_filtered_tweets_ary[:] = [
    tweet for tweet in keyword_filtered_tweets_ary
    if tweet.get('id_str') not in ids_to_remove
]
print(f'After count: {len(keyword_filtered_tweets_ary)}')

Before count: 227
After count: 200
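
The counts drop by 27, which matches the 27 IDs listed for removal. A mistyped ID would fail silently in a filter like this, so one possible safeguard is to report any requested ID that is not present in the data. The sketch below uses a hypothetical helper, `remove_by_id`, with made-up records.

```python
def remove_by_id(tweets, ids_to_remove):
    """Return (kept_tweets, missing_ids).

    missing_ids lists requested IDs that were not found in `tweets`,
    which usually points to a typo in the removal list.
    """
    ids_to_remove = set(ids_to_remove)
    present_ids = {tweet.get("id_str") for tweet in tweets}
    missing_ids = sorted(ids_to_remove - present_ids)
    kept = [tweet for tweet in tweets if tweet.get("id_str") not in ids_to_remove]
    return kept, missing_ids


# Hypothetical records and IDs, not taken from the dataset.
sample = [
    {"id_str": "10", "text": "a"},
    {"id_str": "11", "text": "b"},
    {"id_str": "12", "text": "c"},
]
kept, missing = remove_by_id(sample, ["11", "99"])
print(len(kept), missing)  # 2 ['99']
```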


In [104]:
with open("tweets_cleaned.json", "w") as outfile:
    json.dump(keyword_filtered_tweets_ary, outfile)

### Final List of Tweets
* Dated 12/31/2019 - 5/27/2020
* Retweets excluded
* Contain at least one coronavirus-related keyword
* Additional tweets removed after manual inspection