In [7]:
import pandas as pd

## Objective

In this analysis, I attempt to measure the sentiment of tweets to learn whether tweets affect the number of Covid-19 cases and deaths in the United States.

To create the dataset, I used the TWINT library to collect Covid-related tweets from January 1, 2020 until July 10, 2020. I then made several subsets of the tweets. For example, to measure the impact of tweets by public leaders viewed as polar opposites in their response to the pandemic, I collected tweets by President Trump and the Governor of New York, Andrew Cuomo. Another subset, which I labeled as the baseline, consists of tweets by the New York Times and the Washington Post, two of America's leading journalism outlets.

Each subset serves a different role. The baseline tweets can be treated as mainly factual: although both outlets employ op-ed columnists, we can assume that most tweets from their news reporting divisions provide factual updates on the Covid response. Comparing the two polar opposites, Trump and Cuomo, lets us measure Covid outcomes, in terms of cases, after the public has consumed their tweets. Finally, the main Covid collection lets us see whether more individuals subscribed to the Trump or the Cuomo tweets and how Covid cases changed, for better or worse, in their region.

## Obtaining Data

The TWINT queries used to gather the tweets are in the Covid Data Queries notebook in the repo. The JSON files produced by those queries were used to create DataFrames.

In [47]:
#All Covid tweets
All_Covid_tweets = pd.read_json('Covid_tweets3.json',lines=True)

#All Trump tweets
Trump_Covid_tweets = pd.read_json('Trump_Covid_tweets3.json', lines=True)

#All Cuomo tweets
Cuomo_Covid_tweets = pd.read_json('Cuomo_Covid_tweets3.json',lines=True)

#Baseline Tweets
NYTimes_tweets = pd.read_json('Nytimes_Covid_tweets3.json',lines=True)
#print(len(NYTimes_tweets))
WashingtonPost_tweets = pd.read_json('Washpost_tweets3.json',lines=True)
#print(len(WashingtonPost_tweets))

#Copy 'date' into a 'Date' column to use as the merge key later
All_Covid_tweets['Date'] = All_Covid_tweets['date']
All_Covid_tweets.head()

In [9]:
#combining NYTimes and Washington Post to get Baseline Tweets
Baseline_tweets = pd.concat([NYTimes_tweets,WashingtonPost_tweets],axis=0)

Data for Covid Cases and Deaths was collected from The COVID Tracking Project.

In [28]:
# Covid data set

covid_cases = pd.read_csv('time_series_covid_19_confirmed.csv')

#Getting US data - confirmed cases
covid_cases = covid_cases[covid_cases['Country/Region'] == 'US']
#covid_cases = covid_cases.transpose()

# Covid death data set

covid_deaths = pd.read_csv('time_series_covid_19_deaths.csv')


#Getting US data - deaths

#covid_deaths = covid_deaths.transpose()
covid_deaths = covid_deaths[covid_deaths['Country/Region'] == 'US']


In [29]:
#Covid cases and deaths stacked as two rows (cases first, then deaths); after the transpose below they become two columns that still need renaming
covid_data = pd.concat([covid_cases,covid_deaths],axis=0)
#covid_data = covid_data.transpose()
#covid_data.rename(index={225:'Cases'})
covid_data.head()

Unnamed: 0,Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,...,6/29/20,6/30/20,7/1/20,7/2/20,7/3/20,7/4/20,7/5/20,7/6/20,7/7/20,7/8/20
225,,US,37.0902,-95.7129,1,1,2,2,5,5,...,2590668,2636414,2687588,2742049,2795361,2841241,2891124,2936077,2996098,3054699
225,,US,37.0902,-95.7129,0,0,0,0,0,0,...,126711,127432,128105,128803,129442,129689,129960,130285,131480,132300


In [30]:
#transpose to get data in vertical orientation
#covid_data.rename(columns={225:'Cases'})
covid_data = covid_data.transpose()

#Getting rid of unnecessary rows
#covid_data = covid_data.drop(['Province/State','Country/Region','Lat','Long'])

In [31]:
covid_data = covid_data.drop(['Province/State','Country/Region','Lat','Long'])

In [32]:
covid_data.head()

Unnamed: 0,225,225.1
1/22/20,1,0
1/23/20,1,0
1/24/20,2,0
1/25/20,2,0
1/26/20,5,0
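
The counts shown above appear to be running totals rather than daily figures, so any comparison against daily tweet activity would probably need the day-over-day change. A minimal sketch of that step, using only the five rows displayed above; the column names `Cases` and `Deaths` and the variable names are illustrative assumptions, not part of the notebook:

```python
import pandas as pd

# Toy frame built from the five rows shown above. The labels 'Cases' and
# 'Deaths' are illustrative; the notebook's columns are still unnamed here.
cumulative = pd.DataFrame(
    {"Cases": [1, 1, 2, 2, 5], "Deaths": [0, 0, 0, 0, 0]},
    index=["1/22/20", "1/23/20", "1/24/20", "1/25/20", "1/26/20"],
)

# diff() subtracts the previous row from each row. The first row has no
# predecessor, so it is filled with the first cumulative value instead.
daily_new = cumulative.diff().fillna(cumulative.iloc[0]).astype(int)

print(daily_new)
```

Whether the first day should count its full cumulative value as "new" is a modelling choice; dropping that row is an equally reasonable alternative.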


In [33]:
# write the DataFrame to Excel; the context manager saves and closes the file
# (ExcelWriter.save() was removed in pandas 2.0)
with pd.ExcelWriter('covid_data.xlsx') as writer:
    covid_data.to_excel(writer)
print('DataFrame is written successfully to Excel File.')

DataFrame is written successfully to Excel File.


In [41]:
#Column names were edited by hand in Excel (Date, Cases, Deaths) before reading the file back for the merge
covid_data_formatted = pd.read_excel('covid_data.xlsx')
covid_data_formatted.head()

Unnamed: 0,Date,Cases,Deaths
0,1/22/20,1,0
1,1/23/20,1,0
2,1/24/20,2,0
3,1/25/20,2,0
4,1/26/20,5,0
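
The manual renaming in Excel could also be done directly in pandas, which keeps the whole pipeline reproducible. The sketch below is one possible approach on tiny stand-in frames that mimic the layout of the two wide CSVs; `to_long`, `meta_cols`, and `covid_long` are hypothetical names introduced for illustration:

```python
import pandas as pd

# Tiny stand-ins for the US rows of the two wide CSVs (first three dates only).
cases_wide = pd.DataFrame(
    {"Province/State": [None], "Country/Region": ["US"],
     "Lat": [37.0902], "Long": [-95.7129],
     "1/22/20": [1], "1/23/20": [1], "1/24/20": [2]},
    index=[225],
)
deaths_wide = pd.DataFrame(
    {"Province/State": [None], "Country/Region": ["US"],
     "Lat": [37.0902], "Long": [-95.7129],
     "1/22/20": [0], "1/23/20": [0], "1/24/20": [0]},
    index=[225],
)

meta_cols = ["Province/State", "Country/Region", "Lat", "Long"]

def to_long(wide, value_name):
    """Drop the metadata columns and return the single US row as a named Series."""
    return wide.drop(columns=meta_cols).iloc[0].rename(value_name)

# Place cases and deaths side by side, then turn the date index into a column.
covid_long = pd.concat(
    [to_long(cases_wide, "Cases"), to_long(deaths_wide, "Deaths")], axis=1
)
covid_long = covid_long.rename_axis("Date").reset_index()

# Parse the month/day/two-digit-year strings into real datetimes.
covid_long["Date"] = pd.to_datetime(covid_long["Date"], format="%m/%d/%y")

print(covid_long)
```

Naming each Series before concatenating avoids the duplicate `225` column labels that appear after the transpose.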


In [54]:
#Parse the month/day/two-digit-year strings (e.g. "1/22/20") into datetimes
covid_data_formatted['Date'] = pd.to_datetime(covid_data_formatted['Date'], format='%m/%d/%y')

In [52]:
#All Tweet Data with corresponding case/death information

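#Both 'Date' columns must share the datetime64 dtype before merging
All_Covid_tweets['Date'] = pd.to_datetime(All_Covid_tweets['Date']).dt.normalize()
covid_data_formatted['Date'] = pd.to_datetime(covid_data_formatted['Date'], format='%m/%d/%y')
All_Covid_tweets_case_data = pd.merge(All_Covid_tweets,covid_data_formatted,on='Date')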
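
Merging on `Date` requires both key columns to share a dtype, and tweet timestamps also need their time of day removed so they line up with the one-row-per-day case data. A minimal sketch on toy data, assuming the tweet frame has a datetime `date` column and the case frame stores dates as strings like `1/22/20`; the frame names and tweet text are placeholders:

```python
import pandas as pd

# Toy tweet frame: timestamps carry a time of day (placeholder tweet text).
tweets = pd.DataFrame({
    "date": pd.to_datetime([
        "2020-01-22 09:15:00",
        "2020-01-22 18:40:00",
        "2020-01-24 07:05:00",
    ]),
    "tweet": ["first", "second", "third"],
})

# Toy case frame: dates stored as month/day/two-digit-year strings.
covid = pd.DataFrame({
    "Date": ["1/22/20", "1/23/20", "1/24/20"],
    "Cases": [1, 1, 2],
    "Deaths": [0, 0, 0],
})

# Truncate tweet timestamps to midnight so they match one row per day.
tweets["Date"] = tweets["date"].dt.normalize()

# Convert the string dates to datetime64 so both merge keys share a dtype.
covid["Date"] = pd.to_datetime(covid["Date"], format="%m/%d/%y")

# A left merge keeps every tweet, even on days without case data.
merged = pd.merge(tweets, covid, on="Date", how="left")

print(merged[["tweet", "Date", "Cases", "Deaths"]])
```

A left merge is used here so that no tweets are silently dropped; the default inner merge would discard tweets from dates that are absent in the case data, such as early January.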