## Analysis of COVID-19 in the US & Public Reactions on Social Media

"Coronavirus disease (COVID-19) is an infectious disease caused by a newly discovered coronavirus. Most people infected with the COVID-19 virus will experience mild to moderate respiratory illness and recover without requiring special treatment.  Older people, and those with underlying medical problems like cardiovascular disease, diabetes, chronic respiratory disease, and cancer are more likely to develop serious illness." - WHO

As of May 3, 2020, there are approximately 1.16 million confirmed cases and 67,067 deaths in the US. To better understand the COVID-19 situation, we are going to study reported coronavirus cases across the United States from January until now. Along with that, we will study how people react to the coronavirus on social media.

___
**Current Status of the Coronavirus in the US**
<br>We are going to use the US COVID-19 dataset provided by The New York Times (https://github.com/nytimes/covid-19-data).

In [None]:
import pandas as pd
import numpy as np
import plotly.graph_objects as go

In [168]:
# COVID-19 confirmed cases and deaths in the US (cumulative counts per day)
us_df = pd.read_csv('us.csv')
us_df

Unnamed: 0,date,cases,deaths
0,2020-01-21,1,0
1,2020-01-22,1,0
2,2020-01-23,1,0
3,2020-01-24,2,0
4,2020-01-25,3,0
...,...,...,...
97,2020-04-27,988143,50819
98,2020-04-28,1012572,53034
99,2020-04-29,1039166,55399
100,2020-04-30,1069559,57570


In [98]:
# plot time-series
# resources:
# - easiest: https://plotly.com/python/time-series/
# - https://matplotlib.org/3.1.0/gallery/text_labels_and_annotations/date.html
# - https://www.dataquest.io/blog/tutorial-time-series-analysis-with-pandas/
# add your code here
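
One way the time-series cell above could be filled in is sketched below. It assumes the `cases` and `deaths` columns are cumulative, as the output of `us.csv` suggests, and derives day-over-day increases from them. The helper names `add_daily_new` and `plot_cumulative` are hypothetical, and the sample rows are the first five rows of the output above, inlined so the sketch runs on its own. plotly is imported inside the plotting function so the data preparation can be tried without it.

```python
import pandas as pd

# First rows of the us.csv output shown above, inlined so the sketch is self-contained.
sample = pd.DataFrame({
    "date": ["2020-01-21", "2020-01-22", "2020-01-23", "2020-01-24", "2020-01-25"],
    "cases": [1, 1, 1, 2, 3],
    "deaths": [0, 0, 0, 0, 0],
})


def add_daily_new(df):
    """Return a copy with parsed dates and day-over-day increases (hypothetical helper)."""
    out = df.copy()
    out["date"] = pd.to_datetime(out["date"])
    out = out.sort_values("date").reset_index(drop=True)
    # Assumption: counts are cumulative, so the daily increase is the first difference.
    # The first row has no predecessor, so its own count is used as the increase.
    out["new_cases"] = out["cases"].diff().fillna(out["cases"]).astype(int)
    out["new_deaths"] = out["deaths"].diff().fillna(out["deaths"]).astype(int)
    return out


def plot_cumulative(df):
    """Line chart of cumulative cases and deaths over time."""
    import plotly.graph_objects as go  # imported lazily; only needed for plotting

    fig = go.Figure()
    fig.add_trace(go.Scatter(x=df["date"], y=df["cases"], mode="lines", name="Cases"))
    fig.add_trace(go.Scatter(x=df["date"], y=df["deaths"], mode="lines", name="Deaths"))
    fig.update_layout(
        title_text="Cumulative COVID-19 Cases and Deaths in the US",
        xaxis_title="Date",
        yaxis_title="Count",
    )
    return fig


daily = add_daily_new(sample)
print(daily[["date", "cases", "new_cases"]])
```

In the notebook, `plot_cumulative(add_daily_new(us_df)).show()` would draw the chart for the full dataset.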

**A Statewide Geographical Visualization** 

In [131]:
us_states_df = pd.read_csv('us-states.csv')
us_states_df.head(5)

Unnamed: 0,date,state,fips,cases,deaths
0,2020-01-21,Washington,53,1,0
1,2020-01-22,Washington,53,1,0
2,2020-01-23,Washington,53,1,0
3,2020-01-24,Illinois,17,1,0
4,2020-01-24,Washington,53,1,0


In [132]:
# ---- Convert FIPS codes to two-letter state codes ----
state_codes = {
    'WA': '53', 'DE': '10', 'DC': '11', 'WI': '55', 'WV': '54', 'HI': '15',
    'FL': '12', 'WY': '56', 'PR': '72', 'NJ': '34', 'NM': '35', 'TX': '48',
    'LA': '22', 'NC': '37', 'ND': '38', 'NE': '31', 'TN': '47', 'NY': '36',
    'PA': '42', 'AK': '2', 'NV': '32', 'NH': '33', 'VA': '51', 'CO': '8',
    'CA': '6', 'AL': '1', 'AR': '5', 'VT': '50', 'IL': '17', 'GA': '13',
    'IN': '18', 'IA': '19', 'MA': '25', 'AZ': '4', 'ID': '16', 'CT': '9',
    'ME': '23', 'MD': '24', 'OK': '40', 'OH': '39', 'UT': '49', 'MO': '29',
    'MN': '27', 'MI': '26', 'RI': '44', 'KS': '20', 'MT': '30', 'MS': '28',
    'SC': '45', 'KY': '21', 'OR': '41', 'SD': '46'
}

# Invert the mapping: FIPS string -> two-letter state code
fips2code = {value: key for key, value in state_codes.items()}
us_states_df['fips'] = us_states_df['fips'].astype(str).map(fips2code)
# Rows whose FIPS is not in state_codes map to NaN and are dropped here
us_states_df = us_states_df.dropna(subset=['fips']).rename(columns={'fips': 'code'})
us_states_df.head(5)

Unnamed: 0,date,state,code,cases,deaths
0,2020-01-21,Washington,WA,1,0
1,2020-01-22,Washington,WA,1,0
2,2020-01-23,Washington,WA,1,0
3,2020-01-24,Illinois,IL,1,0
4,2020-01-24,Washington,WA,1,0
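
Because the conversion above drops every row whose FIPS code is not in `state_codes`, it can help to see which areas are lost. The sketch below is a minimal illustration, not part of the original notebook: `map_fips` is a hypothetical helper, the mapping is a three-entry subset of `fips2code`, and the Guam row (FIPS 66) is an illustrative stand-in for any area missing from the mapping.

```python
import pandas as pd

# A small subset of the FIPS -> state code mapping, keyed by FIPS string as in fips2code.
fips2code = {"53": "WA", "17": "IL", "6": "CA"}

# Illustrative rows only; the Guam row stands in for any area absent from the mapping.
rows = pd.DataFrame({
    "state": ["Washington", "Illinois", "California", "Guam"],
    "fips": [53, 17, 6, 66],
    "cases": [1, 1, 1, 1],
})


def map_fips(df, mapping):
    """Map FIPS to state codes and report areas with no match (hypothetical helper)."""
    out = df.copy()
    out["code"] = out["fips"].astype(str).map(mapping)
    # Names of the areas whose FIPS code was not found in the mapping.
    unmapped = sorted(out.loc[out["code"].isna(), "state"].unique())
    # Drop only rows with a missing code, leaving NaNs in other columns alone.
    out = out.dropna(subset=["code"]).drop(columns="fips")
    return out, unmapped


mapped, unmapped = map_fips(rows, fips2code)
print("Dropped areas:", unmapped)
```

Printing the dropped areas before plotting makes it clear that the map covers only the areas present in the mapping.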


In [167]:
import plotly.express as px

fig = px.choropleth(us_states_df, 
                    locations="code", 
                    locationmode="USA-states",
                    color="cases", 
                    hover_name="state", 
                    animation_frame="date"
                   )
fig.update_layout(
    title_text='Coronavirus Confirmed Cases in the US (January-May)',
    geo = dict(
        scope='usa',
        projection=go.layout.geo.Projection(type = 'albers usa'),
        showlakes=True, # lakes
        lakecolor='rgb(255, 255, 255)',
        showframe = False,
        showcoastlines = False,)
)
fig.layout.updatemenus[0].buttons[0].args[1]["frame"]["duration"] = 500 # frame duration in milliseconds
fig.layout.updatemenus[0].x = 0.1
fig.layout.updatemenus[0].y = 0.1

fig.layout.sliders[0].pad.t = 0
fig.layout.sliders[0].currentvalue.font.size = 24
fig.layout.sliders[0].currentvalue.xanchor = 'right'
fig.show()
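
Cumulative case counts span several orders of magnitude, so a linear color scale tends to be dominated by the largest states. A possible variation, sketched below under our own assumptions, colors the map by log10 of the case count and passes a fixed `range_color` so the scale stays the same across animation frames. The helpers `add_log_cases`, `fixed_color_range`, and `make_choropleth` are hypothetical, and the sample values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up values in the same shape as us_states_df after the FIPS conversion.
sample = pd.DataFrame({
    "date": ["2020-03-01", "2020-03-01", "2020-04-01", "2020-04-01"],
    "state": ["Washington", "Illinois", "Washington", "Illinois"],
    "code": ["WA", "IL", "WA", "IL"],
    "cases": [10, 1, 1000, 100],
})


def add_log_cases(df):
    """Add a log10(cases) column; counts below 1 are clipped to 1 to avoid log(0)."""
    out = df.copy()
    out["log_cases"] = np.log10(out["cases"].clip(lower=1))
    return out


def fixed_color_range(df, column="log_cases"):
    """Color range computed over all dates, so every frame shares one scale."""
    return [float(df[column].min()), float(df[column].max())]


def make_choropleth(df):
    """Animated US choropleth colored by log10(cases)."""
    import plotly.express as px  # imported lazily; only needed for plotting

    return px.choropleth(
        df,
        locations="code",
        locationmode="USA-states",
        color="log_cases",
        hover_name="state",
        hover_data=["cases"],  # show the raw count alongside the log value
        animation_frame="date",
        scope="usa",
        range_color=fixed_color_range(df),
    )


logged = add_log_cases(sample)
color_range = fixed_color_range(logged)
print(color_range)
```

In the notebook, `make_choropleth(add_log_cases(us_states_df)).show()` would render the animated map.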

**Counting COVID-19-Related Tweets**
<br>We use the dataset from https://github.com/thepanacealab/covid19_twitter with some modifications.
More dates will be added soon!

Filter keywords: COVD19, CoronavirusPandemic, COVID-19, 2019nCoV, CoronaOutbreak, coronavirus, WuhanVirus, covid19, coronaviruspandemic, covid-19, 2019ncov, coronaoutbreak, wuhanvirus.
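
To make the keyword list concrete, the sketch below shows one way a tweet's text could be checked against it. This is an assumption for illustration only: the dataset's actual filtering pipeline is not shown in this notebook, and `matches_keywords` is a hypothetical helper that uses case-insensitive substring matching.

```python
# The filter keywords listed above, copied verbatim.
KEYWORDS = [
    "COVD19", "CoronavirusPandemic", "COVID-19", "2019nCoV", "CoronaOutbreak",
    "coronavirus", "WuhanVirus", "covid19", "coronaviruspandemic", "covid-19",
    "2019ncov", "coronaoutbreak", "wuhanvirus",
]

# Case-insensitive matching makes the upper- and lower-case variants equivalent,
# so the list is reduced to its distinct case-folded forms.
UNIQUE_KEYWORDS = sorted({keyword.casefold() for keyword in KEYWORDS})


def matches_keywords(text, keywords=UNIQUE_KEYWORDS):
    """Return True if any keyword occurs in the text, ignoring case (hypothetical helper)."""
    lowered = text.casefold()
    return any(keyword in lowered for keyword in keywords)


print(matches_keywords("Stay safe everyone #Covid19"))
print(matches_keywords("Nice weather today"))
```

Substring matching is deliberately loose here, so hashtags such as `#Covid19` match without any tokenization.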

In [204]:
tweetcounts_df = pd.read_csv('tweets_count.csv')
tweetcounts_df

Unnamed: 0,date,tweet_id
0,2020-01-01,1
1,2020-01-04,2
2,2020-01-06,2
3,2020-01-08,4
4,2020-01-09,15
...,...,...
70,2020-03-18,854546
71,2020-03-19,772241
72,2020-03-20,768545
73,2020-03-21,786495


In [None]:
# plot time-series
# resources:
# - easiest: https://plotly.com/python/time-series/
# - https://matplotlib.org/3.1.0/gallery/text_labels_and_annotations/date.html
# - https://www.dataquest.io/blog/tutorial-time-series-analysis-with-pandas/
# add your code here
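
The tweet counts above skip some dates (for example, 2020-01-02 and 2020-01-03 do not appear), which can distort a line chart. The sketch below is one way to fill in this cell: it reindexes the counts to a continuous daily range before plotting. It assumes that a missing date means zero matching tweets, which the source data does not confirm; if the gaps are days without collection, leaving them empty would be more accurate. The helpers `to_daily_counts` and `plot_tweet_counts` are hypothetical, and the sample rows are the first five rows of the output above.

```python
import pandas as pd

# First rows of the tweets_count.csv output shown above; tweet_id holds the daily count.
sample = pd.DataFrame({
    "date": ["2020-01-01", "2020-01-04", "2020-01-06", "2020-01-08", "2020-01-09"],
    "tweet_id": [1, 2, 2, 4, 15],
})


def to_daily_counts(df):
    """Return a daily Series of tweet counts with gaps filled (hypothetical helper)."""
    out = df.copy()
    out["date"] = pd.to_datetime(out["date"])
    series = out.set_index("date")["tweet_id"].sort_index().rename("tweets")
    # Assumption: a date absent from the file had zero matching tweets.
    return series.asfreq("D", fill_value=0)


def plot_tweet_counts(series):
    """Line chart of daily tweet counts."""
    import plotly.graph_objects as go  # imported lazily; only needed for plotting

    fig = go.Figure(go.Scatter(x=series.index, y=series.values, mode="lines", name="Tweets"))
    fig.update_layout(
        title_text="COVID-19 Related Tweets per Day",
        xaxis_title="Date",
        yaxis_title="Tweets",
    )
    return fig


daily_tweets = to_daily_counts(sample)
print(daily_tweets)
```

In the notebook, `plot_tweet_counts(to_daily_counts(tweetcounts_df)).show()` would draw the chart for the full dataset.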

**Tweet counts for each US state (coming soon)**