# COVID-19 in the USA

## Data Analysis, Data Visualization & Comparison

This notebook analyzes and visualizes COVID-19 (coronavirus) cases in the **United States**.

## About COVID-19
![CoronaVirus](https://cdn.pixabay.com/photo/2020/04/23/09/59/coronavirus-5081887_1280.jpg)
*Image by <a href="https://pixabay.com/users/iXimus-2352783/?utm_source=link-attribution&amp;utm_medium=referral&amp;utm_campaign=image&amp;utm_content=5081887">iXimus</a> from <a href="https://pixabay.com/?utm_source=link-attribution&amp;utm_medium=referral&amp;utm_campaign=image&amp;utm_content=5081887">Pixabay</a>*

[Coronavirus disease 2019 (COVID-19)](https://en.wikipedia.org/wiki/Coronavirus_disease_2019) is an infectious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).

* **First Identified:** December 2019 in Wuhan, the capital of Hubei province, China
* **Common Symptoms:** Fever, Cough, Fatigue, Shortness of Breath, and Loss of Smell
* **Concerning Symptoms:** Difficulty Breathing, Persistent Chest Pain, Confusion, Difficulty Waking, and Bluish Skin
* **Complications:** Pneumonia, Viral Sepsis, Acute Respiratory Distress Syndrome, Kidney Failure
* **Usual Onset:** 2–14 days (typically 5) from infection (time from exposure to onset of symptoms)
* **Risk Factors:** Travel, Viral Exposure
* **Prevention:** Hand Washing, Face Coverings, Quarantine, Social Distancing

### Useful Information on COVID-19
* [WHO](https://www.who.int/emergencies/diseases/novel-coronavirus-2019) - World Health Organization 
* [CDC](https://www.cdc.gov/coronavirus/2019-ncov) - Centers for Disease Control and Prevention

# Dataset

1. Git repository of the **Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE)**.

    * Master branch: https://github.com/CSSEGISandData/COVID-19
    * Web-data branch: https://github.com/CSSEGISandData/COVID-19/tree/web-data


2. Kaggle datasets:

    - https://www.kaggle.com/sudalairajkumar/novel-corona-virus-2019-dataset
    - https://www.kaggle.com/sudalairajkumar/covid19-in-usa

# Import Packages

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from matplotlib.dates import DateFormatter
#%matplotlib inline
import seaborn as sns
from datetime import datetime
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import folium
import json

# Get Data from Dataset

In [None]:
df_cases = pd.read_csv("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/web-data/data/cases.csv")
df_cases_country = pd.read_csv("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/web-data/data/cases_country.csv")
df_cases_state = pd.read_csv("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/web-data/data/cases_state.csv")
df_cases_time = pd.read_csv("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/web-data/data/cases_time.csv", parse_dates = ['Last_Update','Report_Date_String'])

In [None]:
print (df_cases.shape)
print ('Last Update: ' + str(df_cases.Last_Update.max()))
df_cases.head(1)

In [None]:
print (df_cases_country.shape)
print ('Last Update: ' + str(df_cases_country.Last_Update.max()))
df_cases_country.head(1)

In [None]:
print (df_cases_state.shape)
df_cases_state.head(1)

In [None]:
print (df_cases_time.shape)
df_cases_time.head(1)

In [None]:
df_data = pd.read_csv('/kaggle/input/novel-corona-virus-2019-dataset/covid_19_data.csv', parse_dates = ['ObservationDate','Last Update'])
print (df_data.shape)
print ('Last update: ' + str(df_data.ObservationDate.max()))
df_data.head(2)

In [None]:
# Clean data
df_data = df_data.drop(['SNo', 'Last Update'], axis=1)
df_data = df_data.rename(columns={
    'ObservationDate': 'Date', 
    'Country/Region': 'Country_Region', 
    'Province/State': 'Province_State'
})
df_data.head(2)

In [None]:
# Sort data
df_data = df_data.sort_values(['Date','Country_Region','Province_State'])
# Get first reported case date
df_data['first_date'] = df_data.groupby('Country_Region')['Date'].transform('min')
# Get days since first reported case date
df_data['days'] = (df_data['Date'] - df_data['first_date']).dt.days
print(df_data.shape)
df_data.head(2)
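
The "days since first reported case" step above relies on `groupby(...).transform('min')`, which returns one value per original row rather than one per group. A minimal sketch on made-up data (the `toy` frame and its values are illustrative assumptions, not taken from the dataset):

```python
import pandas as pd

# Toy data (made up): two regions with different first-report dates
toy = pd.DataFrame({
    "Country_Region": ["A", "A", "B", "B"],
    "Date": pd.to_datetime(["2020-01-22", "2020-01-25", "2020-02-01", "2020-02-03"]),
})

# transform('min') broadcasts each group's earliest date back onto every row
toy["first_date"] = toy.groupby("Country_Region")["Date"].transform("min")

# Subtracting two datetime columns gives a timedelta; .dt.days extracts whole days
toy["days"] = (toy["Date"] - toy["first_date"]).dt.days
print(toy)
```

Because the result is aligned with the original index, it can be assigned as a new column without a merge.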

In [None]:
data_path = "/kaggle/input/covid19-in-usa/"
df_us_test = pd.read_csv(data_path + "us_covid19_daily.csv")
df_us_states_test = pd.read_csv(data_path + "us_states_covid19_daily.csv")
df_us_test["date"] = pd.to_datetime(df_us_test["date"], format="%Y%m%d")
df_us_states_test = df_us_states_test.reindex(index=df_us_states_test.index[::-1])
df_us_states_test["date"] = pd.to_datetime(df_us_states_test["date"], format="%Y%m%d").dt.date.astype(str)

print (df_us_states_test.shape)
#df_us_states_test.query('state == "NY"').sort_values('date', ascending=False).head()
#print (df_us_states_test.columns)
df_us_states_test.groupby('state').max().fillna(0).sort_values('positive', ascending=False).head().style\
.background_gradient(cmap='Reds',subset=["positive"])\
.background_gradient(cmap='Blues',subset=["negative"])\
.background_gradient(cmap='Purples',subset=["pending"])\
.background_gradient(cmap='OrRd',subset=["hospitalizedCurrently"])\
.background_gradient(cmap='OrRd',subset=["hospitalizedCumulative"])

In [None]:
df_us_county_covid = pd.read_csv('/kaggle/input/covid19-us-county-jhu-data-demographics/' + "covid_us_county.csv")
df_us_county = pd.read_csv('/kaggle/input/covid19-us-county-jhu-data-demographics/' + "us_county.csv")

In [None]:
df_us_county.sort_values('population', ascending=False).head()

In [None]:
df_us_county_covid.sort_values('cases', ascending=False).head(2)

In [None]:
df_us_county_covid = df_us_county_covid.merge(\
                        df_us_county[['fips', 'population', 'male', 'female', 'female_percentage', 'median_age']],\
                                              on = ['fips'], how = "left"\
                    )
df_us_county_covid.sort_values('cases', ascending=False).head(2)

In [None]:
df_us_county_covid_latest = df_us_county_covid.sort_values(by = ['county', 'state', 'date'], ascending = [True, True, False])
df_us_county_covid_latest = df_us_county_covid_latest.drop_duplicates(subset = ['county', 'state'], keep = "first")
df_us_county_covid_latest.head(2)
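
The cell above keeps the most recent row per county by sorting dates in descending order and then dropping duplicates. A small sketch of the same pattern on made-up data, assuming the `date` column holds ISO-formatted strings (`YYYY-MM-DD`), which sort chronologically as plain text:

```python
import pandas as pd

# Toy data (made up): two counties, two report dates each
toy = pd.DataFrame({
    "county": ["X", "X", "Y", "Y"],
    "state": ["S1", "S1", "S2", "S2"],
    "date": ["2020-05-01", "2020-05-02", "2020-05-01", "2020-05-02"],
    "cases": [10, 15, 3, 4],
})

# Newest date first within each county, then keep the first row of each group
latest = (
    toy.sort_values(["county", "state", "date"], ascending=[True, True, False])
       .drop_duplicates(subset=["county", "state"], keep="first")
)
print(latest)
```

If the dates were stored in another text format (for example `MM/DD/YYYY`), they would need to be parsed with `pd.to_datetime` first for the sort to be chronological.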

In [None]:
df_us_state_covid = df_us_county_covid_latest.groupby(['state', 'date'], as_index=False)[['cases', 'deaths', 'population']].sum()
df_us_state_covid['population'] = df_us_state_covid['population'].astype(int)
df_us_state_covid = df_us_state_covid.rename(columns={"state": "Province_State", "population": "Population"})
df_us_state_covid.sort_values('cases', ascending=False).head().style.hide_index()

In [None]:
# remove counties with missing (zero) population
df_us_county_covid_latest = df_us_county_covid_latest[(df_us_county_covid_latest['population'] != 0)].copy()
# population in millions; cases/deaths per million residents
df_us_county_covid_latest['population (million)'] = round((df_us_county_covid_latest['population']/1000000), 2)
# divide by the unrounded population: small counties round to 0.00 million, which would give inf
df_us_county_covid_latest['cases per million'] = round((df_us_county_covid_latest['cases']/df_us_county_covid_latest['population']*1000000), 2)
df_us_county_covid_latest['deaths per million'] = round((df_us_county_covid_latest['deaths']/df_us_county_covid_latest['population']*1000000), 2)
#df_us_county_covid_latest.fillna(0, inplace=True)
df_us_county_covid_latest.head(2)
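
One thing to watch for with per-million figures at county level: a population rounded to two decimals of a million becomes `0.0` for any county under 5,000 residents, so the rounded value is only suitable for display. The sketch below uses made-up numbers (the `toy` frame is an illustrative assumption) to compare the rounded display column with a rate computed from the raw population:

```python
import pandas as pd

# Toy figures (made up): one very small county, one large county
toy = pd.DataFrame({"cases": [120, 45000], "population": [4000, 9000000]})

# Display column: the small county rounds down to 0.0 million
toy["population (million)"] = (toy["population"] / 1_000_000).round(2)

# Rate column: dividing by the raw population stays finite for every row
toy["cases per million"] = (toy["cases"] / toy["population"] * 1_000_000).round(2)
print(toy)
```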

# Total Cases

In [None]:
def get_total_cases(df_cases):
    total_confirmed = np.sum(df_cases['Confirmed'])
    total_deaths = np.sum(df_cases['Deaths'])
    total_recovered = np.sum(df_cases['Recovered'])
    total_active = np.sum(df_cases['Active'])
    total_mortality_rate = np.round((np.sum(df_cases['Deaths']) / np.sum(df_cases['Confirmed']) * 100), 2)
    total_recover_rate = np.round((np.sum(df_cases['Recovered']) / np.sum(df_cases['Confirmed']) * 100), 2)

    data = {
        'Confirmed': [total_confirmed],
        'Deaths': [total_deaths],
        'Recovered': [total_recovered],
        'Active': [total_active],
        'Mortality Rate %': [total_mortality_rate],
        'Recover Rate %': [total_recover_rate]
    }
    df_total = pd.DataFrame(data)
    return df_total

## Worldwide Total Cases

In [None]:
# colormaps: https://matplotlib.org/3.1.0/tutorials/colors/colormaps.html
df_total = get_total_cases(df_cases_country).style.hide_index().background_gradient(cmap='Wistia', axis=1)
df_total

## USA Total Cases

In [None]:
# colormaps: https://matplotlib.org/3.1.0/tutorials/colors/colormaps.html
df_total = get_total_cases(df_cases_country[df_cases_country.Country_Region == 'US']) \
            .style.hide_index().background_gradient(cmap='Wistia', axis=1)
df_total

# Cases in US States

In [None]:
# US state code to name mapping
# https://www.kaggle.com/sudalairajkumar/covid-19-analysis-of-usa
state_map_dict = {
 'AL': 'Alabama',
 'AK': 'Alaska',
 'AS': 'American Samoa',
 'AZ': 'Arizona',
 'AR': 'Arkansas',
 'CA': 'California',
 'CO': 'Colorado',
 'CT': 'Connecticut',
 'DE': 'Delaware',
 'DC': 'District of Columbia',
 'D.C.': 'District of Columbia',
 'FM': 'Federated States of Micronesia',
 'FL': 'Florida',
 'GA': 'Georgia',
 'GU': 'Guam',
 'HI': 'Hawaii',
 'ID': 'Idaho',
 'IL': 'Illinois',
 'IN': 'Indiana',
 'IA': 'Iowa',
 'KS': 'Kansas',
 'KY': 'Kentucky',
 'LA': 'Louisiana',
 'ME': 'Maine',
 'MH': 'Marshall Islands',
 'MD': 'Maryland',
 'MA': 'Massachusetts',
 'MI': 'Michigan',
 'MN': 'Minnesota',
 'MS': 'Mississippi',
 'MO': 'Missouri',
 'MT': 'Montana',
 'NE': 'Nebraska',
 'NV': 'Nevada',
 'NH': 'New Hampshire',
 'NJ': 'New Jersey',
 'NM': 'New Mexico',
 'NY': 'New York',
 'NC': 'North Carolina',
 'ND': 'North Dakota',
 'MP': 'Northern Mariana Islands',
 'OH': 'Ohio',
 'OK': 'Oklahoma',
 'OR': 'Oregon',
 'PW': 'Palau',
 'PA': 'Pennsylvania',
 'PR': 'Puerto Rico',
 'RI': 'Rhode Island',
 'SC': 'South Carolina',
 'SD': 'South Dakota',
 'TN': 'Tennessee',
 'TX': 'Texas',
 'UT': 'Utah',
 'VT': 'Vermont',
 'VI': 'Virgin Islands',
 'VA': 'Virginia',
 'WA': 'Washington',
 'WV': 'West Virginia',
 'WI': 'Wisconsin',
 'WY': 'Wyoming'
}

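# reverse mapping: state name -> two-letter code
state_code_dict = {v:k for k, v in state_map_dict.items()}
# 'DC' and 'D.C.' share one name; keep the standard postal code for the reverse lookup
state_code_dict["District of Columbia"] = 'DC'
# rows reported as "Chicago" belong to Illinois
state_code_dict["Chicago"] = 'IL'

def correct_state_names(x):
    # "Cook County, IL" -> "Illinois"; values without a known state code are returned stripped
    try:
        return state_map_dict[x.split(",")[-1].strip()]
    except KeyError:
        return x.strip()
    
def get_state_codes(x):
    # state name -> two-letter code, "Others" if unknown
    return state_code_dict.get(x, "Others")
    
def get_state_name(x):
    # two-letter code -> state name, "Others" if unknown
    return state_map_dict.get(x, "Others")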
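
To illustrate what the name-correction helper does with the mixed `Province_State` values found in early reports ("County, ST" alongside full state names), here is a self-contained sketch. It uses a two-entry subset of the mapping and a hypothetical function name, `to_state_name`, so it can run on its own:

```python
# Subset of the code -> name mapping, for illustration only
state_map_subset = {"IL": "Illinois", "NY": "New York"}

def to_state_name(value):
    # Take the text after the last comma and look it up as a state code;
    # fall back to the stripped input when it is not a known code
    suffix = value.split(",")[-1].strip()
    return state_map_subset.get(suffix, value.strip())

print(to_state_name("Cook County, IL"))
print(to_state_name(" New York "))
print(to_state_name("Diamond Princess"))
```

Values that do not end in a known code, such as cruise-ship entries, pass through unchanged and can be grouped under a catch-all code later.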

In [None]:
df_data_us = df_data[df_data['Country_Region'] == 'US'].copy()
df_data_us["Province_State"] = df_data_us["Province_State"].apply(correct_state_names)
df_data_us["State_Code"] = df_data_us["Province_State"].apply(lambda x: get_state_codes(x))

In [None]:
df_data_us.groupby('Province_State').max().sort_values('Confirmed', ascending=False).head()

In [None]:
df_us_states = df_cases_state[df_cases_state['Country_Region'] == 'US'].copy()
df_us_states["State_Code"] = df_us_states["Province_State"].apply(lambda x: get_state_codes(x))
df_us_states = df_us_states.merge(df_us_state_covid[['Province_State', 'Population']], on = ['Province_State'], how = "left")
# population/cases/deaths in million
df_us_states['Population (million)'] = round((df_us_states['Population']/1000000), 2)
df_us_states['Cases per million'] = round((df_us_states['Confirmed']/df_us_states['Population (million)']), 2)
df_us_states['Deaths per million'] = round((df_us_states['Deaths']/df_us_states['Population (million)']), 2)
df_us_states = df_us_states.replace([np.inf, -np.inf, np.nan], 0)
#df_us_states

In [None]:
df_us_states = df_cases_state[df_cases_state['Country_Region'] == 'US'].copy()
df_us_states["State_Code"] = df_us_states["Province_State"].apply(lambda x: get_state_codes(x))
df_us_states['Recovery_Rate'] = df_us_states['Recovered'] / df_us_states['Confirmed'] * 100

# merge US state population data
df_us_states = df_us_states.merge(df_us_state_covid[['Province_State', 'Population']], on = ['Province_State'], how = "left")
# population/cases/deaths in million
df_us_states['Population (million)'] = round((df_us_states['Population']/1000000), 2)
df_us_states['Cases per million'] = round((df_us_states['Confirmed']/df_us_states['Population (million)']), 2)
df_us_states['Deaths per million'] = round((df_us_states['Deaths']/df_us_states['Population (million)']), 2)
df_us_states = df_us_states.replace([np.inf, -np.inf, np.nan], 0)

df_us_states = df_us_states[[
                    'Province_State', 'Confirmed', 'Deaths', 'Recovered', 'Active', \
                    'Mortality_Rate', 'Recovery_Rate', 'Incident_Rate', \
                    'People_Tested', 'People_Hospitalized',  \
                    'Testing_Rate', 'Hospitalization_Rate',\
                    'Population (million)', 'Cases per million', 'Deaths per million',\
                    'Population', 'State_Code'
                ]]

# replace NaN values with zero
df_us_states = df_us_states.fillna(0)

# replace underscore with space in columns name
#df_us_states.rename(columns=lambda x: x.replace('_', ' '), inplace=True)
#df_us_states.columns

df_us_states.sort_values('Confirmed', ascending=False)\
.rename(columns=lambda x: x.replace('_', ' '))\
.style.hide_index()\
.background_gradient(cmap='Blues',subset=["Confirmed"])\
.background_gradient(cmap='Reds',subset=["Deaths"])\
.background_gradient(cmap='Greens',subset=["Recovered"])\
.background_gradient(cmap='Purples',subset=["Active"])\
.background_gradient(cmap='GnBu',subset=["Incident Rate"])\
.background_gradient(cmap='OrRd',subset=["Mortality Rate"])\
.background_gradient(cmap='PuBu',subset=["Recovery Rate"])\
.background_gradient(cmap='Greens',subset=["People Tested"])\
.background_gradient(cmap='OrRd',subset=["People Hospitalized"])\
.background_gradient(cmap='Greens',subset=["Testing Rate"])\
.background_gradient(cmap='OrRd',subset=["Hospitalization Rate"])\
.background_gradient(cmap='Purples',subset=["Population (million)"])\
.background_gradient(cmap='Blues',subset=["Cases per million"])\
.background_gradient(cmap='Reds',subset=["Deaths per million"])

## US State Counties with High number of Cases (Top 50)

In [None]:
df_us_county_covid_latest[['county', 'state', 'state_code', 'cases', 'deaths',\
                           'population (million)', 'median_age', 'cases per million', 'deaths per million']]\
.sort_values('cases', ascending=False)\
.head(50)\
.rename(columns=lambda x: x.replace('_', ' '))\
.style.hide_index()\
.background_gradient(cmap='Blues',subset=["cases"])\
.background_gradient(cmap='Reds',subset=["deaths"])\
.background_gradient(cmap='Greens',subset=["population (million)"])\
.background_gradient(cmap='Purples',subset=["median age"])\
.background_gradient(cmap='Blues',subset=["cases per million"])\
.background_gradient(cmap='Reds',subset=["deaths per million"])

In [None]:
fig = go.Figure(data=[
    go.Pie(labels=df_us_states['Province_State'], 
           values=df_us_states['Confirmed'], 
           hole=.35,
           textinfo='label+percent'
          )
])

fig.update_layout(
    title_text="US Confirmed Cases by State",
    # Add annotations in the center of the donut pies.
    annotations=[
        dict(text='Confirmed<br>Cases', showarrow=False),
    ]
)
fig.update_traces(textposition='inside')
fig.update_layout(margin={"r":0,"l":0,"b":0})
fig.show()

In [None]:
fig = go.Figure(data=[
    go.Pie(labels=df_us_states['Province_State'], 
           values=df_us_states['Deaths'], 
           hole=.35,
           textinfo='label+percent'
          )
])

fig.update_layout(
    title_text="US Deaths by State",
    # Add annotations in the center of the donut pies.
    annotations=[
        dict(text='Deaths<br>Cases', showarrow=False),
    ]
)
fig.update_traces(textposition='inside')
fig.update_layout(margin={"r":0,"l":0,"b":0})
fig.show()

In [None]:
fig = go.Figure(data=[
    go.Pie(labels=df_us_states['Province_State'], 
           values=df_us_states['People_Tested'], 
           hole=.35,
           textinfo='label+percent'
          )
])

fig.update_layout(
    title_text="US Share of People Tested by State",
    # Add annotations in the center of the donut pies.
    annotations=[
        dict(text='People<br>Tested', showarrow=False),
    ]
)
fig.update_traces(textposition='inside')
fig.update_layout(margin={"r":0,"l":0,"b":0})
fig.show()

In [None]:
df_confirmed_top = df_us_states.sort_values('Confirmed', ascending=False).head(10)
fig = go.Figure(data=[
    go.Bar(name='Confirmed', x=df_confirmed_top['Province_State'], y=df_confirmed_top['Confirmed'],
           text=df_confirmed_top['Confirmed'], texttemplate='%{text:.2s}', textposition='outside'),
    go.Bar(name='Deaths', x=df_confirmed_top['Province_State'], y=df_confirmed_top['Deaths'],
           text=df_confirmed_top['Deaths'], texttemplate='%{text:.2s}', textposition='outside'),
    go.Bar(name='Recovered', x=df_confirmed_top['Province_State'], y=df_confirmed_top['Recovered'],
           text=df_confirmed_top['Recovered'], texttemplate='%{text:.2s}', textposition='outside'),
    #go.Bar(name='People Tested', x=df_confirmed_top['Province_State'], y=df_confirmed_top['People_Tested'],
    #       text=df_confirmed_top['People_Tested'], texttemplate='%{text:.2s}', textposition='outside'),
])
# Change the bar mode
fig.update_layout(
    title_text="Top 10 US States by Confirmed Cases",
    barmode='group', 
    #legend_orientation="h",
    yaxis_type='log',
    yaxis_title='Cases Count in Log Scale'
)
fig.update_layout(legend_orientation="h", legend=dict(x=0, y=1.1))
fig.update_layout(margin={"r":0,"l":0,"b":0})
fig.show()

In [None]:
#df_confirmed_top = df_us_states.sort_values('Cases per million', ascending=False).head(10)
fig = go.Figure(data=[
    go.Bar(name='Confirmed', x=df_confirmed_top['Province_State'], y=df_confirmed_top['Confirmed'],
           text=df_confirmed_top['Confirmed'], texttemplate='%{text:.2s}', textposition='outside', marker_color='indianred'),
    go.Bar(name='Confirmed per million', x=df_confirmed_top['Province_State'], y=df_confirmed_top['Cases per million'],
           text=df_confirmed_top['Cases per million'], texttemplate='%{text:.2s}', textposition='outside', marker_color='lightsalmon'),
    go.Bar(name='Deaths per million', x=df_confirmed_top['Province_State'], y=df_confirmed_top['Deaths per million'],
           text=df_confirmed_top['Deaths per million'], texttemplate='%{text:.2s}', textposition='outside', marker_color='crimson'),
])
# Change the bar mode
fig.update_layout(
    title_text="Top 10 US States by Confirmed Cases: Totals vs. per Million",
    barmode='group', 
    #legend_orientation="h",
    yaxis_type='log',
    yaxis_title='Cases Count in Log Scale'
)
fig.update_layout(legend_orientation="h", legend=dict(x=0, y=1.1))
fig.update_layout(margin={"r":0,"l":0,"b":0})
fig.show()

In [None]:
df_confirmed_top = df_us_states.sort_values('Cases per million', ascending=False).head(10)
fig = go.Figure(data=[
    go.Bar(name='Population', x=df_confirmed_top['Province_State'], y=df_confirmed_top['Population'],
           text=df_confirmed_top['Population'], texttemplate='%{text:.2s}', textposition='outside'),
    go.Bar(name='Confirmed per million', x=df_confirmed_top['Province_State'], y=df_confirmed_top['Cases per million'],
           text=df_confirmed_top['Cases per million'], texttemplate='%{text:.2s}', textposition='outside', marker_color='lightsalmon'),
    go.Bar(name='Deaths per million', x=df_confirmed_top['Province_State'], y=df_confirmed_top['Deaths per million'],
           text=df_confirmed_top['Deaths per million'], texttemplate='%{text:.2s}', textposition='outside', marker_color='crimson'),
])
# Change the bar mode
fig.update_layout(
    title_text="Top 10 US States with highest Cases per million population",
    barmode='group',
    yaxis_type='log',
    yaxis_title='Cases Count in Log Scale'
)
fig.update_layout(legend_orientation="h", legend=dict(x=0, y=1.1))
fig.update_layout(margin={"r":0,"l":0,"b":0})
fig.show()

# Time Series of Cases in US

In [None]:
df_us_states_test.columns

In [None]:
df_us_states_test["state_name"] = df_us_states_test["state"].apply(lambda x: get_state_name(x))
df_us_states_test.head(2)

In [None]:
df_t = df_us_states_test.groupby('date').sum(numeric_only=True).reset_index()
df_t.head(2)

In [None]:
fig = go.Figure()
fig.add_trace(go.Scatter(x=df_t['date'], y=df_t['positive'], 
                         mode='lines', name='Confirmed'))
fig.add_trace(go.Scatter(x=df_t['date'], y=df_t['death'], 
                         mode='lines', name='Deaths'))
fig.add_trace(go.Scatter(x=df_t['date'], y=df_t['recovered'], 
                         mode='lines', name='Recovered'))
fig.add_trace(go.Scatter(x=df_t['date'], y=df_t['hospitalized'], 
                         mode='lines', name='Hospitalized'))
fig.add_trace(go.Scatter(x=df_t['date'], y=df_t['total'], 
                         mode='lines', name='Total Tested'))

fig.update_layout(
        xaxis_title="",
        yaxis_title="Cases Count in Log Scale",
        title = 'Time Series - Confirmed, Deaths & Recovered Cases in USA',
        yaxis_type='log'
    )
fig.update_layout(
    legend=dict(
        x=0,
        y=1,
        traceorder="normal",
        #bgcolor="LightSteelBlue",
        bordercolor="silver",
        borderwidth=1
    )
)
fig.update_layout(margin={"r":0,"l":0,"b":0})
fig.show()

In [None]:
fig = go.Figure()
fig.add_trace(go.Scatter(x=df_t['date'], y=df_t['positiveIncrease'], 
                         mode='lines', name='Confirmed'))
fig.add_trace(go.Scatter(x=df_t['date'], y=df_t['deathIncrease'], 
                         mode='lines', name='Deaths'))
fig.add_trace(go.Scatter(x=df_t['date'], y=df_t['hospitalizedIncrease'], 
                         mode='lines', name='Hospitalized'))
fig.add_trace(go.Scatter(x=df_t['date'], y=df_t['totalTestResultsIncrease'], 
                         mode='lines', name='Total Tests'))
fig.update_layout(
        xaxis_title="",
        yaxis_title="Cases Count in Log Scale",
        title = 'Time Series - Confirmed, Deaths, Hospitalized & Tests [Daily Increase] in USA',
        yaxis_type='log'
    )

fig.update_layout(
    legend=dict(
        x=0,
        y=1,
        traceorder="normal",
        #bgcolor="LightSteelBlue",
        bordercolor="silver",
        borderwidth=1
    )
)
fig.update_layout(margin={"r":0,"l":0,"b":0})
fig.show()

In [None]:
fig = go.Figure()
fig.add_trace(go.Scatter(x=df_t['date'], y=df_t['recovered']/df_t['positive']*100, 
                         mode='lines', name='Recovery Rate'))
fig.add_trace(go.Scatter(x=df_t['date'], y=df_t['death']/df_t['positive']*100, 
                         mode='lines', name='Death Rate'))

fig.update_layout(
        xaxis_title="",
        yaxis_title="Recovery/Death Rate Percentage (%)",
        title = 'Time Series - Recovery and Death Rate in USA',
        yaxis_type='log'
    )

fig.update_layout(
    legend=dict(
        x=0,
        y=1,
        traceorder="normal",
        #font=dict(
        #    family="sans-serif",
        #    size=12,
        #    color="black"
        #),
        #bgcolor="LightSteelBlue",
        bordercolor="silver",
        borderwidth=1
    )
)
fig.update_layout(margin={"r":0,"l":0,"b":0})
fig.show()

In [None]:
df_t = df_us_states_test.groupby(['state_name', 'date']).sum(numeric_only=True).reset_index()

top_count = 10
top_confirmed = df_us_states.sort_values('Confirmed', ascending=False).head(top_count)['Province_State'].to_list()
top_deaths = df_us_states.sort_values('Deaths', ascending=False).head(top_count)['Province_State'].to_list()
top_recovered = df_us_states.sort_values('Recovered', ascending=False).head(top_count)['Province_State'].to_list()
#print (top_confirmed)
#print (top_deaths)
#print (top_recovered)

df_top_confirmed = df_t[df_t.state_name.isin(top_confirmed)]
df_top_deaths = df_t[df_t.state_name.isin(top_deaths)]
df_top_recovered = df_t[df_t.state_name.isin(top_recovered)]

def get_top(state_list):
    # rows for the given states, aggregated per state and date
    df_top = df_t[df_t['state_name'].isin(state_list)]
    df_top = df_top.groupby(['state_name', 'date']).sum(numeric_only=True)
    return df_top.reset_index()

# Disabled draft: derive daily values from the cumulative counts per state.
# 1. The case counts are cumulative in the dataset, so the daily value is
#    the difference between consecutive rows.
# 2. Some rows were negative after diff() because the next day's entry was
#    lower than the current day's. abs() was used to make them positive,
#    in the hope that the dataset is corrected later on.
# df_top10 = pd.DataFrame()
# for state, df_new in df_top.groupby(level=0):
#     df_new = df_new.diff().fillna(df_new).abs()
#     df_top10 = df_top10.append(df_new, ignore_index=False)

#df_top10_confirmed_daily = get_top10(top10_confirmed_country_list)
#df_top10_deaths_daily = get_top10(top10_deaths_country_list)
#df_top10_recovered_daily = get_top10(top10_recovered_country_list)
#df_top10_confirmed_daily.head(2)

df_top_confirmed.head(2)
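
The disabled draft in the cell above describes turning cumulative counts into daily values with `diff()`. A minimal sketch of that idea on made-up data follows. Note one deliberate swap: where the draft applied `abs()` to negative differences, this sketch clips them to zero instead, which treats a downward correction as "no new cases" rather than as new cases. Both are workarounds, and neither is presented here as the notebook's method:

```python
import pandas as pd

# Toy cumulative counts (made up); state A has a downward correction on day 3
toy = pd.DataFrame({
    "state_name": ["A", "A", "A", "B", "B", "B"],
    "date": ["2020-04-01", "2020-04-02", "2020-04-03"] * 2,
    "positive": [10, 15, 14, 5, 5, 9],
})
toy = toy.sort_values(["state_name", "date"])

# Difference between consecutive days, computed separately for each state
daily = toy.groupby("state_name")["positive"].diff()

# The first day of each state has no previous row; use the cumulative value
daily = daily.fillna(toy["positive"])

toy["daily_raw"] = daily
toy["daily_clipped"] = daily.clip(lower=0)
print(toy)
```

Grouping before `diff()` matters: without it, the first row of state B would be compared against the last row of state A.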

In [None]:
fig = px.line(df_top_confirmed, x="date", y="positive", color="state_name")
fig.update_layout(
    title='Time Series - Confirmed Cases: Top '+str(top_count)+' States',
    xaxis_title='',
    yaxis_title='Confirmed Cases in Log Scale',
    yaxis_type='log'
)
fig.update_layout(margin={"r":0,"l":0,"b":0})
fig.show()

In [None]:
fig = px.line(df_top_confirmed, x="date", y="positiveIncrease", color="state_name")
fig.update_layout(
    title='Time Series - Confirmed Cases [Daily Increase]: Top '+str(top_count)+' States',
    xaxis_title='',
    yaxis_title='Confirmed Cases in Log Scale',
    yaxis_type='log'
)
fig.update_layout(margin={"r":0,"l":0,"b":0})
fig.show()

In [None]:
fig = px.line(df_top_confirmed, x="date", y="death", color="state_name")
fig.update_layout(
    title='Time Series - Death Cases: Top '+str(top_count)+' States',
    xaxis_title='',
    yaxis_title='Death Cases in Log Scale',
    yaxis_type='log'
)
fig.update_layout(margin={"r":0,"l":0,"b":0})
fig.show()

In [None]:
fig = px.line(df_top_confirmed, x="date", y="total", color="state_name")
fig.update_layout(
    title='Time Series - People Tested: Top '+str(top_count)+' States',
    xaxis_title='',
    yaxis_title='Number of People Tested in Log Scale',
    yaxis_type='log'
)
fig.update_layout(margin={"r":0,"l":0,"b":0})
fig.show()

In [None]:
fig = px.line(df_top_confirmed, x="date", y="totalTestResultsIncrease", color="state_name")
fig.update_layout(
    title='Time Series - People Tested [Daily Increase]: Top '+str(top_count)+' States',
    xaxis_title='',
    yaxis_title='Number of People Tested in Log Scale',
    yaxis_type='log'
)
fig.update_layout(margin={"r":0,"l":0,"b":0})
fig.show()

# Heat Map of US States

## Maps using Plotly Library

In [None]:
# https://plotly.com/python/choropleth-maps/
data = df_us_states.copy()
data['Confirmed_Log'] = np.log2(df_us_states['Confirmed']+1)
data['Mortality_Rate'] = np.round(data['Mortality_Rate'], 2)
fig = px.choropleth(data, 
                    locations='State_Code',
                    locationmode="USA-states",
                    scope="usa",
                    color='Confirmed', # a column in the dataset
                    hover_name='Province_State', # column to add to hover information
                    hover_data=['Confirmed', 'Deaths', 'Recovered', 'Mortality_Rate'],
                    color_continuous_scale=px.colors.sequential.Sunsetdark)
fig.update_layout(title_text="Heat Map - Confirmed Cases in US States")
fig.update_coloraxes(colorbar_title="<b>Color</b><br>Confirmed Cases")
#fig.update(layout_coloraxis_showscale=False)
fig.update_layout(margin={"r":0,"l":0,"b":0})
fig.show()

In [None]:
# https://plotly.com/python/choropleth-maps/
data = df_us_states.copy()
data['Cases_Log'] = np.log2(df_us_states['Cases per million']+1)
data['Mortality_Rate'] = np.round(data['Mortality_Rate'], 2)
fig = px.choropleth(data, 
                    locations='State_Code',
                    locationmode="USA-states",
                    scope="usa",
                    color='Cases per million', # a column in the dataset
                    hover_name='Province_State', # column to add to hover information
                    hover_data=['Confirmed', 'Deaths', 'Recovered', 'Mortality_Rate',\
                                'Population (million)', 'Cases per million', 'Deaths per million'],
                    color_continuous_scale=px.colors.sequential.Sunsetdark,
                   )
fig.update_layout(title_text="Heat Map - Confirmed Cases (per million) in US States")
fig.update_coloraxes(colorbar_title="<b>Color</b><br>Cases per million")
#fig.update(layout_coloraxis_showscale=False)
fig.update_layout(margin={"r":0,"l":0,"b":0})
fig.show()

In [None]:
df_us_county_covid_latest.head(2)

In [None]:
usa_counties_geo_json = '/kaggle/input/country-state-geo-location/us-counties.json'
with open(usa_counties_geo_json) as response:    
    usa_counties_geo = json.load(response)

df_us_county_covid_latest['cases_log'] = np.log(df_us_county_covid_latest['cases']+1)
fig = px.choropleth(df_us_county_covid_latest, 
                    geojson=usa_counties_geo, 
                    locations='fips', 
                    color='cases_log',
                    color_continuous_scale=px.colors.sequential.Sunsetdark,
                    #range_color=(0, 22),
                    scope="usa",
                    hover_name='county', # column to add to hover information
                    hover_data=['state', 'cases', 'deaths', 'population',\
                        'population (million)', 'cases per million', 'deaths per million'],
                          )
fig.update_layout(margin={"r":0,"l":0,"b":0})
fig.update_layout(title_text="Heat Map - Confirmed Cases in US Counties")
fig.update_coloraxes(colorbar_title="<b>Color</b><br>Cases<br>Log Scale")
fig.show()

## Maps using Folium & Leaflet.js Library

In [None]:
usa_geo_json = '/kaggle/input/country-state-geo-location/us-states.json'
with open(usa_geo_json) as f:
    usa_geo = json.load(f)

data = df_us_states.copy()
for index, item in enumerate(usa_geo['features']):
    row = data[data['State_Code'] == item['id']]
    if row.empty: continue # skip for states that are not present in the cases dataset
    usa_geo['features'][index]['properties']['Confirmed'] = str(row.iloc[0]['Confirmed'])
    usa_geo['features'][index]['properties']['Deaths'] = str(row.iloc[0]['Deaths'])
    usa_geo['features'][index]['properties']['Recovered'] = str(row.iloc[0]['Recovered'])
    usa_geo['features'][index]['properties']['Mortality Rate'] = str(np.round(row.iloc[0]['Mortality_Rate'],2)) + '%'
    usa_geo['features'][index]['properties']['Recovery Rate'] = str(np.round(row.iloc[0]['Recovered'] / row.iloc[0]['Confirmed'] * 100, 2)) + '%'

print (usa_geo['features'][0]['properties'])

In [None]:
# take the logarithm to reduce skew in the color scale,
# since New York's case count is far higher than that of the other states
data['Confirmed_Log'] = np.log(data['Confirmed']+1)

# create a plain usa map
usa_map = folium.Map(location=[37, -102], tiles="cartodbpositron", zoom_start=4, max_zoom=6, min_zoom=3)

# add tile layers to the map
tiles = ['stamenwatercolor', 'cartodbpositron', 'openstreetmap', 'stamenterrain']
for tile in tiles:
    folium.TileLayer(tile).add_to(usa_map)

choropleth = folium.Choropleth(
    geo_data=usa_geo,
    name='choropleth',
    data=data,
    columns=['State_Code', 'Confirmed_Log'],
    key_on='feature.id',
    fill_color='YlOrRd',
    fill_opacity=0.7,
    line_opacity=0.2,
    nan_fill_color='#fef0d9',
    nan_fill_opacity=0.2,
    legend_name='Confirmed Cases (Log Scale)',
    highlight=True,
    line_color='black'
).add_to(usa_map)

tooltip_style = "font-size: 15px; font-weight: bold"
choropleth.geojson.add_child(
    folium.features.GeoJsonTooltip(
        fields=['name', 'Confirmed', 'Deaths', 'Recovered', 'Mortality Rate', 'Recovery Rate'],
        aliases=['State', 'Confirmed', 'Deaths', 'Recovered', 'Mortality Rate', 'Recovery Rate'],
        labels=True,
        style=tooltip_style
    )
)

folium.LayerControl(collapsed=True).add_to(usa_map)
usa_map
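
To illustrate why the map colors states by `log(Confirmed + 1)` rather than by the raw count, here is a small sketch using only the standard library. The case counts are made-up illustrative numbers, not figures from the dataset.

```python
import math

# Hypothetical case counts: one outlier and several much smaller values.
confirmed = {"NY": 400_000, "CA": 50_000, "VT": 1_000, "XX": 0}

# log1p(x) computes log(1 + x), so a zero count maps to 0 instead of -infinity.
confirmed_log = {code: math.log1p(count) for code, count in confirmed.items()}

raw_ratio = confirmed["NY"] / confirmed["VT"]          # 400x apart on the raw scale
log_ratio = confirmed_log["NY"] / confirmed_log["VT"]  # less than 2x apart after the transform

for code, value in confirmed_log.items():
    print(f"{code}: {value:.2f}")
```

On the raw scale the outlier would use up almost the whole color range. After the transform the smaller states spread across it as well.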

In [None]:
df_us_county_covid_latest['fips_int'] = df_us_county_covid_latest['fips'].fillna(0).astype(int)

county_fields = ['fips', 'state', 'cases', 'deaths', 'population',
                 'population (million)', 'cases per million', 'deaths per million']
for feature in usa_counties_geo['features']:
    row = df_us_county_covid_latest[df_us_county_covid_latest['fips_int'] == int(feature['id'])]
    if row.empty:
        continue  # skip counties that are not present in the cases dataset
    for field in county_fields:
        feature['properties'][field] = str(row.iloc[0][field])

print(usa_counties_geo['features'][0]['properties'])

In [None]:
# a base-2 logarithm of the county case counts is used below to reduce skew,
# as a few counties have far more cases than the rest

# create a plain usa map
usa_map = folium.Map(location=[37, -102], tiles="cartodbpositron", zoom_start=4, max_zoom=6, min_zoom=3)

# add tile layers to the map
tiles = ['stamenwatercolor', 'cartodbpositron', 'openstreetmap', 'stamenterrain']
for tile in tiles:
    folium.TileLayer(tile).add_to(usa_map)

# +1 keeps counties with zero cases finite under the logarithm
df_us_county_covid_latest['cases_log'] = np.log2(df_us_county_covid_latest['cases'] + 1)

choropleth = folium.Choropleth(
    geo_data=usa_counties_geo,
    name='choropleth',
    data=df_us_county_covid_latest,
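    # NOTE: county names repeat across states, so matching on the name can
    # assign one county's value to another; a FIPS-based key would be unambiguous
    columns=['county', 'cases_log'],
    key_on='feature.properties.name',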
    fill_color='YlOrRd',
    fill_opacity=0.7,
    line_opacity=0.2,
    nan_fill_color='#fef0d9',
    nan_fill_opacity=0.2,
    legend_name='Confirmed Cases (Log Scale)',
    highlight=True,
    line_color='black'
).add_to(usa_map)

tooltip_style = "font-size: 15px; font-weight: bold"
choropleth.geojson.add_child(
    folium.features.GeoJsonTooltip(
        fields=['name', 'state', 'cases', 'deaths', 'population', 'population (million)', 'cases per million', 'deaths per million'],
        aliases=['County', 'State', 'Cases', 'Deaths', 'Population', 'Population (million)', 'Cases per million', 'Deaths per million'],
        labels=True,
        style=tooltip_style
    )
)

folium.LayerControl(collapsed=True).add_to(usa_map)
usa_map
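
County names are not unique across the country (many states have a Washington County, for example), so a join keyed on the name alone can mix counties up. One possible alternative, sketched below, is to key on a zero-padded five-digit FIPS string. The `normalize_fips` helper is hypothetical, and the sketch assumes the GeoJSON feature ids are five-character FIPS strings, which this notebook does not confirm.

```python
def normalize_fips(value):
    """Return a 5-character, zero-padded FIPS string, or None if the value is unusable."""
    if value is None:
        return None
    if isinstance(value, float):
        if value != value:  # NaN is the only float that is not equal to itself
            return None
        value = int(value)  # 1001.0 -> 1001
    text = str(value).strip()
    if not text.isdigit():
        return None
    return text.zfill(5)  # restore leading zeros lost by numeric parsing


# Leading zeros are lost when FIPS codes are read as numbers.
print(normalize_fips(1001.0))        # Autauga County, AL
print(normalize_fips(6037))          # Los Angeles County, CA
print(normalize_fips("36061"))       # New York County, NY
print(normalize_fips(float("nan")))  # missing value
```

With such a column in the DataFrame, the choropleth could in principle use it in `columns` together with `key_on='feature.id'`, provided the ids in the GeoJSON file use the same format.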

# Progression over Time

## Confirmed Cases - Animation over Time

In [None]:
'''df_temp = df_cases_time.groupby(['Last_Update', 'Country_Region'])['Confirmed', 'Deaths'].max().reset_index()
df_temp["Last_Update"] = pd.to_datetime(df_temp["Last_Update"]).dt.strftime('%m/%d/%Y')
df_temp['Confirmed'].fillna(0, inplace=True)
df_temp.sort_values('Confirmed', ascending=False).head()'''


df_t["date_reported"] = pd.to_datetime(df_t["date"]).dt.strftime('%m/%d/%Y')
df_t['state'] = df_t['state_name'].apply(get_state_codes)
# while calculating mortality rate, adding 1 to confirmed to avoid divide by zero
df_t['mortality_rate'] = df_t['death'] / (df_t['positive']+1) * 100

df_t2 = df_t.groupby(['date', 'state_name']).max().reset_index()
df_t2.sort_values('date', ascending=False).head()
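
Adding 1 to the denominator avoids a division by zero, but it also slightly lowers the rate when the counts are small. As a hedged alternative, the sketch below returns 0 when there are no confirmed cases and divides exactly otherwise. The function names are hypothetical and the numbers are illustrative only.

```python
def mortality_rate(deaths, confirmed):
    """Deaths as a percentage of confirmed cases; 0.0 when there are no confirmed cases."""
    if not confirmed:
        return 0.0
    return deaths / confirmed * 100


def mortality_rate_plus_one(deaths, confirmed):
    """The notebook's approach: add 1 to the denominator to avoid dividing by zero."""
    return deaths / (confirmed + 1) * 100


# With small counts the two approaches differ noticeably ...
print(round(mortality_rate(5, 9), 2), round(mortality_rate_plus_one(5, 9), 2))
# ... while with large counts the difference is negligible.
print(round(mortality_rate(500, 10_000), 2), round(mortality_rate_plus_one(500, 10_000), 2))
```

For state-level totals the difference is usually negligible. It matters mainly in early dates, when a state has only a handful of cases.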

In [None]:
# compress the case counts with a power transform so large states do not dominate
positive_scaled = np.power(df_t2["positive"] + 1, 0.3)

fig = px.scatter_geo(df_t2, locations="state",
                     locationmode="USA-states",
                     scope="usa",
                     hover_name="state_name", hover_data=["positive", "death", "recovered"], animation_frame="date",
                     color=positive_scaled - 1, size=positive_scaled,
                     range_color=[0, positive_scaled.max() + 1],
                     title="US COVID-19 Progression Animation Over Time",
                     color_continuous_scale=px.colors.sequential.Plasma,
                     #projection="natural earth"
                    )
#fig.update_coloraxes(colorscale="hot")
#fig.update(layout_coloraxis_showscale=False)
fig.update_coloraxes(colorbar_title="Color<br>Confirmed Cases<br>in Reduced Scale")
fig.show()
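
The "reduced scale" in the scatter plot is the power transform `(positive + 1) ** 0.3`. The short sketch below, using made-up counts, shows how it behaves: a count of zero still maps to a marker size of 1, and large counts are compressed while their order is preserved. The `reduced_scale` name is hypothetical.

```python
def reduced_scale(count, exponent=0.3):
    """Power transform used for marker size; adding 1 makes a zero count map to 1."""
    return (count + 1) ** exponent


# Illustrative counts spanning several orders of magnitude.
counts = [0, 10, 1_000, 100_000]
sizes = [reduced_scale(c) for c in counts]

for count, size in zip(counts, sizes):
    # The color value in the plot is the size minus 1, so zero cases map to color 0.
    print(f"{count:>7} cases -> size {size:.2f}, color {size - 1:.2f}")
```

Compared with a logarithm, an exponent of 0.3 compresses less aggressively, so differences between the largest states remain more visible in the marker sizes.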

In [None]:
fig = px.choropleth(df_t2, locations="state", locationmode='USA-states', scope="usa",
                     hover_name="state_name", hover_data=["positive", "death", "recovered"], animation_frame="date",
                     color=np.log(df_t2["positive"]+1), 
                     title="US COVID-19 Progression Animation Over Time",
                     color_continuous_scale=px.colors.sequential.Plasma,
                   )
#fig.update_coloraxes(colorscale="hot")
#fig.update(layout_coloraxis_showscale=False)
fig.update_coloraxes(colorbar_title="Color<br>Confirmed Cases<br>in Log Scale")
fig.show()