In [1]:
import pandas as pd

## Objective

This analysis measures the sentiment of tweets to learn whether tweets affect the number of Covid-19 cases and deaths in the United States.

To create the dataset, I used the TWINT library to collect all tweets from January 1, 2020 through July 10, 2020. I then built several subsets of the tweets. For example, to measure the impact of tweets by public leaders viewed as polar opposites in their response to the pandemic, I collected tweets by President Trump and by the Governor of New York, Andrew Cuomo. Another subset, which I labeled the baseline, consists of tweets by the New York Times and the Washington Post, two of America's leading journalism outlets.

The baseline subset exists because tweets from these outlets can be treated as mainly factual. Although both outlets publish op-ed columnists, we can assume that most tweets from their news reporting divisions provide factual updates on the Covid response. Comparing the two polar opposites, Trump and Cuomo, lets us measure Covid outcomes, in terms of cases, after the public has consumed their tweets. Finally, the main Covid collection lets us see whether more individuals subscribed to the Trump or the Cuomo tweets and how Covid cases changed, for better or worse, in their region.

## Obtaining Data

The TWINT queries used to gather the tweets are in the Covid Data Queries notebook in the repo. The JSON files produced by those queries are loaded into DataFrames here.

In [2]:
# All Covid tweets (one JSON object per line, hence lines=True)
All_Covid_tweets = pd.read_json('Covid_tweets3.json', lines=True)

# All Trump tweets
Trump_Covid_tweets = pd.read_json('Trump_Covid_tweets3.json', lines=True)

# All Cuomo tweets
Cuomo_Covid_tweets = pd.read_json('Cuomo_Covid_tweets3.json', lines=True)

# Baseline tweets
NYTimes_tweets = pd.read_json('Nytimes_Covid_tweets3.json', lines=True)
# print(len(NYTimes_tweets))
WashingtonPost_tweets = pd.read_json('Washpost_tweets3.json', lines=True)
# print(len(WashingtonPost_tweets))

In [3]:
#combining NYTimes and Washington Post to get Baseline Tweets
Baseline_tweets = pd.concat([NYTimes_tweets,WashingtonPost_tweets],axis=0)
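
When two outlets are stacked like this, the row labels from each file repeat and the outlet of origin is no longer explicit. The sketch below shows one way to handle both, using toy frames and a hypothetical `outlet` column that is not part of the original data.

```python
import pandas as pd

# Toy stand-ins for the two outlet DataFrames (hypothetical rows, not real tweets).
nyt = pd.DataFrame({"id": [1, 2], "tweet": ["a", "b"]})
wapo = pd.DataFrame({"id": [3], "tweet": ["c"]})

# Tag each frame with its outlet (the `outlet` column is an assumption made
# for illustration), then stack the rows with a fresh 0..n-1 index.
baseline = pd.concat(
    [nyt.assign(outlet="nytimes"), wapo.assign(outlet="washingtonpost")],
    axis=0,
    ignore_index=True,
)
print(baseline)
```

Whether to reset the index depends on later steps; it mainly matters if rows are selected by label afterwards.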

In [4]:
# Covid confirmed-cases data set
covid_cases = pd.read_csv('time_series_covid_19_confirmed.csv')

# Keep only the US row - confirmed cases
covid_cases = covid_cases[covid_cases['Country/Region'] == 'US']

# Covid deaths data set
covid_deaths = pd.read_csv('time_series_covid_19_deaths.csv')

# Keep only the US row - deaths
covid_deaths = covid_deaths[covid_deaths['Country/Region'] == 'US']


In [5]:
# Covid cases and deaths; after the transpose the left column is cases and the right is deaths (columns still need renaming)
covid_data = pd.concat([covid_cases,covid_deaths],axis=0)
covid_data = covid_data.transpose()
covid_data.head()

Unnamed: 0,225,225.1
Province/State,,
Country/Region,US,US
Lat,37.0902,37.0902
Long,-95.7129,-95.7129
1/22/20,1,0


In [6]:
#Getting rid of unnecessary rows
covid_data = covid_data.drop(['Province/State','Country/Region','Lat','Long'])

In [7]:
covid_data.head()

Unnamed: 0,225,225.1
1/22/20,1,0
1/23/20,1,0
1/24/20,2,0
1/25/20,2,0
1/26/20,5,0
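
The two columns above share the same label and the index still holds date strings. A minimal sketch of the remaining cleanup follows, run on a toy frame shaped like `covid_data`. The names `cases`, `deaths`, and `new_cases` are assumptions, and the daily difference is only meaningful if the counts are cumulative totals.

```python
import pandas as pd

# Toy frame mimicking covid_data after the drop: date strings as the index
# and two columns with the same label (cases first, then deaths).
toy = pd.DataFrame(
    [[1, 0], [1, 0], [2, 0]],
    index=["1/22/20", "1/23/20", "1/24/20"],
    columns=[225, 225],
)

tidy = toy.copy()
# Rename positionally, because the duplicated labels cannot be told apart by name.
tidy.columns = ["cases", "deaths"]
# Parse the m/d/yy strings into real dates so they can be aligned with tweet dates.
tidy.index = pd.to_datetime(tidy.index, format="%m/%d/%y")
tidy.index.name = "date"
# The transpose leaves mixed object columns in the real data; cast to integers.
tidy = tidy.astype(int)
# Day-over-day change, assuming the counts are cumulative; the first row is NaN.
tidy["new_cases"] = tidy["cases"].diff()
print(tidy)
```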


In [8]:
All_Covid_tweets.head()

Unnamed: 0,id,conversation_id,created_at,date,time,timezone,user_id,username,name,place,...,geo,source,user_rt_id,user_rt,retweet_id,reply_to,retweet_date,translate,trans_src,trans_dest
0,1281376194600943616,1279120917134741504,2020-07-09 23:54:26,2020-07-09,19:54:26,EDT,1198522350305591297,roguemender,Rogue Mender,,...,,,,,,"[{'user_id': '1198522350305591297', 'username'...",,,,
1,1279496527812018176,1279496527812018176,2020-07-04 19:25:18,2020-07-04,15:25:18,EDT,720317280186290176,ryancummingsraw,Ryan Cummings,,...,,,,,,"[{'user_id': '720317280186290176', 'username':...",,,,
2,1279249277538230277,1279239739980181504,2020-07-04 03:02:49,2020-07-03,23:02:49,EDT,826479743885332480,anotherlattepls,Sheltering in Place in Mass.,,...,,,,,,"[{'user_id': '826479743885332480', 'username':...",,,,
3,1277602878329405442,1277602878329405440,2020-06-29 14:00:37,2020-06-29,10:00:37,EDT,1121060503567179776,ruth01467626,ruth,,...,,,,,,"[{'user_id': '1121060503567179776', 'username'...",,,,
4,1277602593880055808,1276576246567014400,2020-06-29 13:59:29,2020-06-29,09:59:29,EDT,1121060503567179776,ruth01467626,ruth,,...,,,,,,"[{'user_id': '1121060503567179776', 'username'...",,,,
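
Because each tweet carries a `date` column, tweet activity can be lined up against the case counts by calendar day. The sketch below uses toy data to show one possible alignment; the `n_tweets` name and all values are hypothetical, and it is not necessarily the approach taken later in this notebook.

```python
import pandas as pd

# Toy tweets with a `date` column like the one in the output above (hypothetical rows).
tweets = pd.DataFrame({
    "id": [10, 11, 12],
    "date": ["2020-06-29", "2020-06-29", "2020-07-04"],
})
# Toy date-indexed case counts (hypothetical numbers).
cases = pd.DataFrame(
    {"cases": [100, 110, 125]},
    index=pd.to_datetime(["2020-06-29", "2020-07-03", "2020-07-04"]),
)

# Count tweets per calendar day.
tweets["date"] = pd.to_datetime(tweets["date"])
daily_tweets = tweets.groupby("date").size().rename("n_tweets")

# Left-join onto the case series so every case date is kept;
# days without tweets get a count of 0.
merged = cases.join(daily_tweets, how="left")
merged["n_tweets"] = merged["n_tweets"].fillna(0).astype(int)
print(merged)
```

A left join keeps every day in the case series, which makes later lagged comparisons simpler than working with tweet days alone.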
