In [1]:
import pandas as pd

## Objective

In this analysis, I attempt to measure the sentiment of tweets to learn whether tweets affect the number of Covid-19 cases and deaths in the United States.

To create the dataset, I used the TWINT library to collect all Covid-related tweets from January 1, 2020 until July 10th. I then made several subsets of the tweets. For example, to measure the impact of tweets by public leaders viewed as polar opposites in their response to the pandemic, I collected tweets by President Trump and the Governor of New York, Andrew Cuomo. A further subset, which I labeled the baseline, consists of tweets by the New York Times and the Washington Post, two of America's leading news outlets.

The baseline subset is meant to represent tweets that communicate mainly fact. Although both outlets publish op-ed columnists, we can assume that most tweets from their news reporting divisions provide factual updates on the Covid response. By considering the two polar opposites, Trump and Cuomo, we can measure Covid outcomes, in terms of cases, after their tweets have been consumed by the public. Finally, the main Covid collection will allow us to see whether more individuals subscribed to the Trump or Cuomo tweets and how Covid cases changed, for better or worse, in their region.

## Obtaining Data

The TWINT queries used to gather the tweets are in the Covid Data Queries notebook in the repo. The JSON files produced by those queries were used to create the DataFrames.

In [2]:
#All Covid tweets
All_Covid_tweets = pd.read_json('Covid_tweets3.json',lines=True)

#All Trump tweets
Trump_Covid_tweets = pd.read_json('Trump_Covid_tweets3.json', lines=True)

#All Cuomo tweets
Cuomo_Covid_tweets = pd.read_json('Cuomo_Covid_tweets3.json',lines=True)

#Baseline Tweets
NYTimes_tweets = pd.read_json('Nytimes_Covid_tweets3.json',lines=True)
#print( len(NYTimes_tweets))
WashingtonPost_tweets = pd.read_json('Washpost_tweets3.json',lines=True)
#print(len(WashingtonPost_tweets))

#combining NYTimes and Washington Post to get Baseline Tweets
Baseline_tweets = pd.concat([NYTimes_tweets,WashingtonPost_tweets],axis=0)

#Copying 'date' into a 'Date' column for the later merge
All_Covid_tweets['Date'] = All_Covid_tweets['date']
Trump_Covid_tweets['Date'] = Trump_Covid_tweets['date']
Cuomo_Covid_tweets['Date'] = Cuomo_Covid_tweets['date']
Baseline_tweets['Date'] = Baseline_tweets['date']
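
The load-and-copy steps above repeat for each subset, so they could be wrapped in a small helper. The following is only a sketch: `load_tweets` and the `subset` column are hypothetical names, and two made-up records stand in for a real TWINT export file.

```python
import io
import pandas as pd

def load_tweets(source, label):
    """Read a JSON-lines tweet export and add a parsed 'Date' column.

    `source` may be a file path or a file-like object.
    `label` is a hypothetical tag naming the subset the tweets belong to.
    """
    df = pd.read_json(source, lines=True)
    # Copy the export's 'date' column into a datetime 'Date' column for merging.
    df["Date"] = pd.to_datetime(df["date"])
    df["subset"] = label
    return df

# Made-up records standing in for a file such as 'Nytimes_Covid_tweets3.json'.
sample = io.StringIO(
    '{"id": 1, "date": "2020-06-10", "tweet": "first"}\n'
    '{"id": 2, "date": "2020-06-09", "tweet": "second"}\n'
)
tweets = load_tweets(sample, "baseline")
print(tweets[["id", "Date", "subset"]])
```

Parsing the date at load time would also make a separate datetime conversion before the merge unnecessary.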

Data for Covid Cases and Deaths was collected from The COVID Tracking Project.

In [3]:
# Covid data set

covid_cases = pd.read_csv('time_series_covid_19_confirmed.csv')

#Getting US data - confirmed cases
covid_cases = covid_cases[covid_cases['Country/Region'] == 'US']
#covid_cases = covid_cases.transpose()

# Covid death data set

covid_deaths = pd.read_csv('time_series_covid_19_deaths.csv')


#Getting US data - deaths

#covid_deaths = covid_deaths.transpose()
covid_deaths = covid_deaths[covid_deaths['Country/Region'] == 'US']


In [4]:
#Covid cases and deaths (still need to rename columns, from left to right = cases then deaths)
covid_data = pd.concat([covid_cases,covid_deaths],axis=0)
covid_data = covid_data.transpose()

In [5]:
covid_data = covid_data.drop(['Province/State','Country/Region','Lat','Long'])

In [6]:
covid_data.head()

Unnamed: 0,225,225.1
1/22/20,1,0
1/23/20,1,0
1/24/20,2,0
1/25/20,2,0
1/26/20,5,0


### Adding Case/Death Data on Day of the Tweet

In [7]:
#Edited column names in Excel for Merge
covid_data_formatted = pd.read_excel('covid_data_date.xlsx')
covid_data_formatted.head()

Unnamed: 0,Date,Cases,Deaths
0,1/22/20,1,0
1,1/23/20,1,0
2,1/24/20,2,0
3,1/25/20,2,0
4,1/26/20,5,0
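
The renaming done in Excel could also be done in pandas, which keeps the whole pipeline inside the notebook. This is a sketch under assumptions: `us_series` and `META_COLS` are hypothetical names, and the small wide table below is made up to mimic the layout of the CSV files (the US values match the first three rows shown above).

```python
import pandas as pd

META_COLS = ["Province/State", "Country/Region", "Lat", "Long"]

def us_series(wide, value_name):
    """Return the US rows of a wide time-series table as one dated Series."""
    us = wide[wide["Country/Region"] == "US"].drop(columns=META_COLS)
    # Sum over rows in case the country is split across several rows.
    series = us.sum(axis=0)
    series.index = pd.to_datetime(series.index, format="%m/%d/%y")
    return series.rename(value_name)

# Made-up wide table: one row per region, one column per date.
cases_wide = pd.DataFrame({
    "Province/State": [None, None],
    "Country/Region": ["US", "Italy"],
    "Lat": [37.09, 43.0],
    "Long": [-95.71, 12.0],
    "1/22/20": [1, 0],
    "1/23/20": [1, 0],
    "1/24/20": [2, 0],
})
deaths_wide = cases_wide.copy()
deaths_wide[["1/22/20", "1/23/20", "1/24/20"]] = 0

# One row per date with named Cases and Deaths columns.
covid_long = (
    pd.concat([us_series(cases_wide, "Cases"),
               us_series(deaths_wide, "Deaths")], axis=1)
    .rename_axis("Date")
    .reset_index()
)
print(covid_long)
```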


In [8]:
#Converting all Date columns to datetime for Merge
covid_data_formatted['Date'] = pd.to_datetime(covid_data_formatted['Date'])
All_Covid_tweets['Date'] = pd.to_datetime(All_Covid_tweets['Date'])
Trump_Covid_tweets['Date'] = pd.to_datetime(Trump_Covid_tweets['Date'])
Cuomo_Covid_tweets['Date'] = pd.to_datetime(Cuomo_Covid_tweets['Date'])
Baseline_tweets['Date'] = pd.to_datetime(Baseline_tweets['Date'])

In [9]:
#All Tweet Data with corresponding case/death information
All_Covid_tweets_case_data = pd.merge(All_Covid_tweets,covid_data_formatted,on='Date')
#Trump Tweet Data with corresponding case/death information
Trump_Covid_tweets_case_data = pd.merge(Trump_Covid_tweets,covid_data_formatted,on='Date')
#Cuomo Tweet Data with corresponding case/death information
Cuomo_Covid_tweets_case_data = pd.merge(Cuomo_Covid_tweets,covid_data_formatted, on='Date')
#Baseline Tweet Data with corresponding case/death information
Baseline_tweets_case_data = pd.merge(Baseline_tweets,covid_data_formatted,on='Date')
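
`pd.merge` defaults to an inner join, so any tweet whose date is missing from the case data is dropped without warning. The sketch below, on made-up frames, shows one way to check for that: a left join keeps every tweet, `validate` guards against duplicate dates in the case table, and `indicator` marks the rows that found no match.

```python
import pandas as pd

# Made-up tweets; the third one predates the case data.
tweets = pd.DataFrame({
    "id": [1, 2, 3],
    "Date": pd.to_datetime(["2020-01-22", "2020-01-23", "2020-01-10"]),
})
covid = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-22", "2020-01-23"]),
    "Cases": [1, 1],
    "Deaths": [0, 0],
})

merged = pd.merge(
    tweets, covid, on="Date",
    how="left",              # keep tweets whose date has no case data
    validate="many_to_one",  # raise if the case table repeats a date
    indicator=True,          # adds a '_merge' column describing each row
)
unmatched = merged[merged["_merge"] == "left_only"]
print(len(merged), "tweets,", len(unmatched), "without case data")
```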

### Adding Case/Death Data for Two Weeks After Original Tweet

In [10]:
#Getting the date two weeks after each tweet to capture the Covid case/death reaction
from datetime import datetime,timedelta

N = 14
days_N_from_now = All_Covid_tweets['Date'] + timedelta(days=N)

All_Covid_tweets_case_data['14 days'] = (All_Covid_tweets_case_data['Date'] + timedelta(days=N))
Trump_Covid_tweets_case_data['14 days'] = (Trump_Covid_tweets_case_data['Date'] + timedelta(days=N))
Cuomo_Covid_tweets_case_data['14 days'] = (Cuomo_Covid_tweets_case_data['Date'] +timedelta(days=N))
Baseline_tweets_case_data['14 days'] = (Baseline_tweets_case_data['Date'] + timedelta(days=N))

In [11]:
covid_data_two_week = pd.read_excel('covid_data_14days.xlsx')
covid_data_two_week.head()

Unnamed: 0,14 days,Cases,Deaths
0,1/22/20,1,0
1,1/23/20,1,0
2,1/24/20,2,0
3,1/25/20,2,0
4,1/26/20,5,0


In [12]:
#Converting all Date columns to datetime for Merge
covid_data_two_week['14 days'] = pd.to_datetime(covid_data_two_week['14 days'])
All_Covid_tweets_case_data['14 days'] = pd.to_datetime(All_Covid_tweets_case_data['14 days'])
Trump_Covid_tweets_case_data['14 days'] = pd.to_datetime(Trump_Covid_tweets_case_data['14 days'])
Cuomo_Covid_tweets_case_data['14 days'] = pd.to_datetime(Cuomo_Covid_tweets_case_data['14 days'])
Baseline_tweets_case_data['14 days'] = pd.to_datetime(Baseline_tweets_case_data['14 days'])

In [13]:
#All Tweet Data with case/death information from 14 days later
All_Covid_tweets_case_data = pd.merge(All_Covid_tweets_case_data,covid_data_two_week,on='14 days')
#Trump Tweet Data with case/death information from 14 days later
Trump_Covid_tweets_case_data = pd.merge(Trump_Covid_tweets_case_data,covid_data_two_week,on='14 days')
#Cuomo Tweet Data with case/death information from 14 days later
Cuomo_Covid_tweets_case_data = pd.merge(Cuomo_Covid_tweets_case_data,covid_data_two_week, on='14 days')
#Baseline Tweet Data with case/death information from 14 days later
Baseline_tweets_case_data = pd.merge(Baseline_tweets_case_data,covid_data_two_week,on='14 days')
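
The look-ahead merge can also be written as one reusable function that works for any horizon, without a separate renamed Excel file per lag. This is a sketch: `add_lookahead` and the `Cases_14d`-style column names are hypothetical, and the figures are the ones that appear for June 10, June 24, and July 8 in this notebook's output.

```python
import pandas as pd

def add_lookahead(tweets, covid, days):
    """Attach the Cases/Deaths recorded `days` after each tweet's Date."""
    key = f"{days} days"
    # Rename the case table so its date lines up with the look-ahead date.
    future = covid.rename(columns={
        "Date": key,
        "Cases": f"Cases_{days}d",
        "Deaths": f"Deaths_{days}d",
    })
    out = tweets.copy()
    out[key] = out["Date"] + pd.Timedelta(days=days)
    return out.merge(future, on=key, how="left")

covid = pd.DataFrame({
    "Date": pd.to_datetime(["2020-06-10", "2020-06-24", "2020-07-08"]),
    "Cases": [2000702, 2382426, 3054699],
    "Deaths": [113631, 122604, 132300],
})
tweets = pd.DataFrame({"id": [1], "Date": pd.to_datetime(["2020-06-10"])})

with_future = add_lookahead(add_lookahead(tweets, covid, 14), covid, 28)
print(with_future.T)
```

Giving each horizon its own column names avoids the `_x`/`_y` suffixes that pandas adds when the same column names are merged more than once.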

### Adding Case/Death Data for Four Weeks After Original Tweet

In [14]:
covid_data_four_week = pd.read_excel('covid_data_28days.xlsx')
covid_data_four_week.head()

Unnamed: 0,28 days,Cases,Deaths
0,1/22/20,1,0
1,1/23/20,1,0
2,1/24/20,2,0
3,1/25/20,2,0
4,1/26/20,5,0


In [15]:
#Getting the date four weeks after each tweet to capture the Covid case/death reaction
from datetime import datetime,timedelta

N = 28
days_N_from_now = All_Covid_tweets['Date'] + timedelta(days=N)

All_Covid_tweets_case_data['28 days'] = (All_Covid_tweets_case_data['Date'] + timedelta(days=N))
Trump_Covid_tweets_case_data['28 days'] = (Trump_Covid_tweets_case_data['Date'] + timedelta(days=N))
Cuomo_Covid_tweets_case_data['28 days'] = (Cuomo_Covid_tweets_case_data['Date'] +timedelta(days=N))
Baseline_tweets_case_data['28 days'] = (Baseline_tweets_case_data['Date'] + timedelta(days=N))

In [16]:
#Converting all Date columns to datetime for Merge
covid_data_four_week['28 days'] = pd.to_datetime(covid_data_four_week['28 days'])
All_Covid_tweets_case_data['28 days'] = pd.to_datetime(All_Covid_tweets_case_data['28 days'])
Trump_Covid_tweets_case_data['28 days'] = pd.to_datetime(Trump_Covid_tweets_case_data['28 days'])
Cuomo_Covid_tweets_case_data['28 days'] = pd.to_datetime(Cuomo_Covid_tweets_case_data['28 days'])
Baseline_tweets_case_data['28 days'] = pd.to_datetime(Baseline_tweets_case_data['28 days'])

In [17]:
#All Tweet Data with case/death information from 28 days later
All_Covid_tweets_case_data = pd.merge(All_Covid_tweets_case_data,covid_data_four_week,on='28 days')
#Trump Tweet Data with case/death information from 28 days later
Trump_Covid_tweets_case_data = pd.merge(Trump_Covid_tweets_case_data,covid_data_four_week,on='28 days')
#Cuomo Tweet Data with case/death information from 28 days later
Cuomo_Covid_tweets_case_data = pd.merge(Cuomo_Covid_tweets_case_data,covid_data_four_week, on='28 days')
#Baseline Tweet Data with case/death information from 28 days later
Baseline_tweets_case_data = pd.merge(Baseline_tweets_case_data,covid_data_four_week,on='28 days')

In [18]:
Baseline_tweets_case_data.head()

Unnamed: 0,cashtags,conversation_id,created_at,date,geo,hashtags,id,likes_count,link,mentions,...,video,Date,Cases_x,Deaths_x,14 days,Cases_y,Deaths_y,28 days,Cases,Deaths
0,[],1270707306578264064,2020-06-10 13:20:05,2020-06-10,,[],1270707306578264065,209,https://twitter.com/nytimes/status/12707073065...,[],...,0,2020-06-10,2000702,113631,2020-06-24,2382426,122604,2020-07-08,3054699,132300
1,[],1270636833555308544,2020-06-10 08:40:03,2020-06-10,,[],1270636833555308544,857,https://twitter.com/nytimes/status/12706368335...,[],...,0,2020-06-10,2000702,113631,2020-06-24,2382426,122604,2020-07-08,3054699,132300
2,[],1270815442173603840,2020-06-10 20:29:46,2020-06-10,,[],1270815442173603841,159,https://twitter.com/washingtonpost/status/1270...,[],...,0,2020-06-10,2000702,113631,2020-06-24,2382426,122604,2020-07-08,3054699,132300
3,[],1270541216308957184,2020-06-10 02:20:06,2020-06-09,,[],1270541216308957184,382,https://twitter.com/nytimes/status/12705412163...,[nytmag],...,0,2020-06-09,1979908,112714,2020-06-23,2347491,121847,2020-07-07,2996098,131480
4,[],1270470755889840128,2020-06-09 21:40:07,2020-06-09,,[],1270470755889840134,404,https://twitter.com/nytimes/status/12704707558...,[nytmag],...,0,2020-06-09,1979908,112714,2020-06-23,2347491,121847,2020-07-07,2996098,131480
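
In the output above, `Cases_x`/`Deaths_x` hold the values on the day of the tweet, `Cases_y`/`Deaths_y` the values 14 days later, and the unsuffixed `Cases`/`Deaths` the values 28 days later. A sketch of how these could be renamed and turned into relative growth follows; the new column names are hypothetical, and the numbers are copied from two of the dates shown.

```python
import pandas as pd

# Case columns for two of the dates shown in the output above.
frame = pd.DataFrame({
    "Date": pd.to_datetime(["2020-06-10", "2020-06-09"]),
    "Cases_x": [2000702, 1979908],  # day of the tweet
    "Cases_y": [2382426, 2347491],  # 14 days later
    "Cases": [3054699, 2996098],    # 28 days later
})

# Hypothetical clearer names for the merge-suffixed columns.
frame = frame.rename(columns={
    "Cases_x": "cases_day0",
    "Cases_y": "cases_day14",
    "Cases": "cases_day28",
})

# Relative growth in cumulative cases after each tweet date.
frame["growth_14d"] = frame["cases_day14"] / frame["cases_day0"] - 1
frame["growth_28d"] = frame["cases_day28"] / frame["cases_day0"] - 1
print(frame[["Date", "growth_14d", "growth_28d"]])
```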


### Combined Tweet DataFrame

In [19]:
#Tweet dataframes combined

Master_Tweet_df = pd.concat([All_Covid_tweets_case_data,Trump_Covid_tweets_case_data,Cuomo_Covid_tweets_case_data,Baseline_tweets_case_data])
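
After a plain `pd.concat`, the rows no longer record which subset they came from. The sketch below, on made-up frames, labels each row by passing a dict to `pd.concat` and flags tweet ids that occur in more than one subset. Whether the main collection actually overlaps with the account-specific subsets is an assumption here, not something verified in this notebook.

```python
import pandas as pd

# Made-up stand-ins for the four merged DataFrames.
all_df = pd.DataFrame({"id": [1, 2, 3]})
trump_df = pd.DataFrame({"id": [2]})
cuomo_df = pd.DataFrame({"id": [3]})
baseline_df = pd.DataFrame({"id": [4]})

master = (
    pd.concat(
        {"all": all_df, "trump": trump_df,
         "cuomo": cuomo_df, "baseline": baseline_df},
        names=["subset", "row"],   # dict keys become the outer index level
    )
    .reset_index(level="subset")   # move the subset label into a column
    .reset_index(drop=True)
)

# Flag tweets that appear in more than one subset.
master["in_multiple_subsets"] = master.duplicated("id", keep=False)
print(master)
```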

In [23]:
pip install optimuspyspark

Collecting optimuspyspark
  Using cached https://files.pythonhosted.org/packages/53/bd/da629b92ece647f8a37525b298e085d5c2367ada2db2a5cfcf63e8af1beb/optimuspyspark-2.2.29-py3-none-any.whl
Collecting pypika==0.32.0 (from optimuspyspark)
Collecting psutil==5.6.3 (from optimuspyspark)
  Using cached https://files.pythonhosted.org/packages/1c/ca/5b8c1fe032a458c2c4bcbe509d1401dca9dda35c7fc46b36bb81c2834740/psutil-5.6.3.tar.gz
Collecting packaging==19.1 (from optimuspyspark)
  Using cached https://files.pythonhosted.org/packages/ec/22/630ac83e8f8a9566c4f88038447ed9e16e6f10582767a01f31c769d9a71e/packaging-19.1-py2.py3-none-any.whl
Collecting simplejson==3.16.0 (from optimuspyspark)
Collecting backoff==1.8.0 (from optimuspyspark)
  Using cached https://files.pythonhosted.org/packages/00/b9/b045f0fe02aa80cefc5a6921d5f7674db58c1658d0e4b888562c15ef6aba/backoff-1.8.0-py2.py3-none-any.whl


Building wheels for collected packages: psutil
  Building wheel for psutil (setup.py) ... error
  ERROR: Command errored out with exit status 1:
   command: /opt/anaconda3/envs/learn-env/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/24/g8m2m29n02ncczz28hys1_xw0000gn/T/pip-install-yklsus_p/psutil/setup.py'"'"'; __file__='"'"'/private/var/folders/24/g8m2m29n02ncczz28hys1_xw0000gn/T/pip-install-yklsus_p/psutil/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /private/var/folders/24/g8m2m29n02ncczz28hys1_xw0000gn/T/pip-wheel-4po3ho3u --python-tag cp36
       cwd: /private/var/folders/24/g8m2m29n02ncczz28hys1_xw0000gn/T/pip-install-yklsus_p/psutil/
  Complete output (50 lines):
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build/lib.macosx-10.7-x86_64-3.6
  

Failed to build psutil
ERROR: spyder 3.3.6 requires pyqt5<5.13; python_version >= "3", which is not installed.
ERROR: spyder 3.3.6 requires pyqtwebengine<5.13; python_version >= "3", which is not installed.
Installing collected packages: pypika, psutil, packaging, simplejson, backoff, optimuspyspark
  Found existing installation: psutil 5.7.0
    Uninstalling psutil-5.7.0:
      Successfully uninstalled psutil-5.7.0
  Running setup.py install for psutil ... error
    ERROR: Command errored out with exit status 1:
     command: /opt/anaconda3/envs/learn-env/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/24/g8m2m29n02ncczz28hys1_xw0000gn/T/pip-install-yklsus_p/psutil/setup.py'"'"'; __file__='"'"'/private/var/folders/24/g8m2m29n02ncczz28hys1_xw0000gn/T/pip-install-yklsus_p/psutil/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compi

In [26]:
from optimus import Optimus

ModuleNotFoundError: No module named 'optimus'

In [27]:
pip install pyspark

Note: you may need to restart the kernel to use updated packages.


In [28]:
pip install optimuspyspark

In [29]:
import optimus as op

ModuleNotFoundError: No module named 'optimus'

In [30]:
pip install optimuspyspark

In [31]:
from optimus import Optimus
op = Optimus()

ModuleNotFoundError: No module named 'optimus'
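
Since the install log ends with the psutil build failing, the `optimus` module was never installed, and the import cannot succeed. A small check like the one below could confirm that before importing. It is only a sketch: `is_importable` and `pip_command` are hypothetical helpers, and building the command from `sys.executable` simply ensures pip targets the same interpreter as the running kernel. It does not resolve the psutil build error itself.

```python
import importlib.util
import sys

def is_importable(module_name):
    """Return True if a top-level module can be found by this interpreter."""
    return importlib.util.find_spec(module_name) is not None

def pip_command(package):
    """Build a pip install command bound to the kernel's own interpreter."""
    return [sys.executable, "-m", "pip", "install", package]

print("optimus importable:", is_importable("optimus"))
print("install command:", pip_command("optimuspyspark"))
```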