<a href="https://colab.research.google.com/github/jarrywei/Repo/blob/master/covid_dashboard.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# COVID-19 Analysis
I will be analyzing COVID-19 data from <a href="https://covidtracking.com/">The COVID Tracking Project</a>.

In [None]:
import os
import pandas as pd
import numpy as np
import datetime as dt
import plotly.graph_objects as go
import plotly.io as pio
import plotly.express as px

In [None]:
pd.options.display.max_columns = 75
pd.options.display.max_rows = 350
pd.options.display.max_colwidth = 50

# US National Data

I'll start this analysis by looking at the US national data. The COVID Tracking Project has a public API, and we can simply pass the URL to <code>pd.read_json()</code> to get the data into a DataFrame.

I want to calculate a few key metrics, so I will create a few variables to hold some dates: yesterday's date, the day before yesterday, and one week ago.

In [None]:
us_daily_df = pd.read_json('https://api.covidtracking.com/v1/us/daily.json')
 
us_daily_df['date'] = pd.to_datetime(us_daily_df['date'],format='%Y%m%d')

us_daily_df.set_index('date', inplace=True)
us_daily_df.sort_index(inplace=True)

In [None]:
us_daily_df_new_per_day = us_daily_df.fillna(0)[['positiveIncrease','hospitalizedCumulative','deathIncrease']]
 
us_daily_df_new_per_day_rolling = us_daily_df_new_per_day.rolling(window=7).mean()

In [None]:
#Here is a quick way to generate a date range to pull key stats for certain time ranges 
dates_days = pd.date_range(start='1/31/2020', end= dt.datetime.now())
 
yesterday = dates_days[-2]
two_days_ago = dates_days[-3]
one_week_ago = dates_days[-8]
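
The date variables above come from the wall clock, so a lookup such as `.loc[yesterday]` raises a `KeyError` whenever the feed has no row for that date, for example when it lags or has stopped updating. Here is a minimal sketch of an alternative that anchors on the last date actually present in the index. The toy frame stands in for `us_daily_df`, and the names `latest`, `day_before`, and `week_before` are hypothetical.

```python
import pandas as pd

# Toy stand-in for us_daily_df: ten consecutive days of made-up counts.
toy_df = pd.DataFrame(
    {"positiveIncrease": range(100, 110)},
    index=pd.date_range("2021-02-26", periods=10, freq="D"),
)

latest = toy_df.index.max()                  # last date present in the data
day_before = latest - pd.Timedelta(days=1)   # one day earlier
week_before = latest - pd.Timedelta(days=7)  # seven days earlier

print(latest.date(), day_before.date(), week_before.date())
print(toy_df.loc[latest, "positiveIncrease"])
```

Swapping these in for the clock-based dates is a design choice: the summary then describes the most recent day with data rather than the calendar's yesterday.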


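Taking a look at the data, we can see that columns such as <code>positive</code> and <code>death</code> are cumulative. Since I want to analyze the number of new cases and deaths each day, I need the difference from the previous row. <code>DataFrame.diff()</code> handles that calculation, and the API also provides the result precomputed in the <code>positiveIncrease</code> and <code>deathIncrease</code> columns, which are what the code here uses.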
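
As a small illustration of the subtraction described here, this sketch runs `diff()` on a made-up cumulative series. The numbers are invented for the example; the notebook itself reads the API's `positiveIncrease` and `deathIncrease` columns.

```python
import pandas as pd

# Made-up cumulative totals for five consecutive days.
cumulative = pd.Series(
    [100, 130, 130, 180, 260],
    index=pd.date_range("2020-04-01", periods=5, freq="D"),
    name="positive",
)

# diff() subtracts the previous row; the first row has no predecessor, so it is NaN.
new_per_day = cumulative.diff()
print(new_per_day.tolist())
```

The leading `NaN` is worth remembering when summing or plotting, since the first day has no daily value.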

The following functions add some formatting and context to the key stats.

In [None]:
def change_text(change):
    '''
    Return the trend label shown below a key stat, e.g. "↑ 12% 7-day Trend"
    '''
    if change > 0:
        return "↑ " + "{:.0%}".format(change) + ' 7-day Trend'
    elif change < 0:
        return "↓ " + "{:.0%}".format(change) + ' 7-day Trend'
    elif change == 0:
        return "No Change 7-day Trend"
    else:
        # NaN fails every comparison above, so there is no label to show
        return None
 
def change_color(change):
    '''
    Return the font color for the % change subtext in the top stats part
    '''
    if change > 0:
        return "red"
    elif change < 0:
        return "green"
    elif change == 0:
        return "grey"
    else:
        # NaN fails every comparison above, so there is no color to pick
        return None

Next, I will return the positive, death, and hospitalized stats for yesterday and then compare that to the 7 day trend.

In [None]:
#Here is where I will be dynamically calculating yesterday's stats along with the trends.
 
us_cases_yesterday = us_daily_df_new_per_day.loc[yesterday]['positiveIncrease']
us_cases_death = us_daily_df_new_per_day.loc[yesterday]['deathIncrease']
us_cases_hospitalized = us_daily_df_new_per_day.loc[yesterday]['hospitalizedCumulative']
 
us_cases_seven_day_trend = us_daily_df_new_per_day_rolling.loc[yesterday]['positiveIncrease'] / us_daily_df_new_per_day_rolling.loc[one_week_ago]['positiveIncrease'] - 1
us_death_seven_day_trend = us_daily_df_new_per_day_rolling.loc[yesterday]['deathIncrease'] / us_daily_df_new_per_day_rolling.loc[one_week_ago]['deathIncrease'] - 1
us_hospitalized_seven_day_trend = us_daily_df_new_per_day_rolling.loc[yesterday]['hospitalizedCumulative'] / us_daily_df_new_per_day_rolling.loc[one_week_ago]['hospitalizedCumulative'] - 1
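
To make the trend arithmetic concrete, here is a sketch of the same ratio as a small function run on made-up numbers. The helper name `seven_day_trend` is hypothetical, and taking the comparison point as exactly seven days before the reference date is an assumption of this sketch.

```python
import pandas as pd

def seven_day_trend(daily, as_of):
    """Relative change in the 7-day moving average versus seven days before `as_of`."""
    rolling = daily.rolling(window=7).mean()
    previous = rolling.loc[as_of - pd.Timedelta(days=7)]
    return rolling.loc[as_of] / previous - 1

# Made-up series: 100 cases a day for two weeks, then 150 a day for one week.
toy = pd.Series(
    [100] * 14 + [150] * 7,
    index=pd.date_range("2020-06-01", periods=21, freq="D"),
)
trend = seven_day_trend(toy, toy.index[-1])
print("{:.0%}".format(trend))
```

The moving average goes from 100 to 150, so the trend is a 50% increase, which `change_text` would label with an up arrow.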

Now, I will take the stats from yesterday and the trends and put them into an HTML table using Plotly.

In [None]:
fig0 = go.Figure(data=[go.Table(
    # Header values follow the same column order as the cells: cases, deaths, hospitalized
    header=dict(values=['{:,}'.format(int(us_cases_yesterday)), '{:,}'.format(int(us_cases_death)), '{:,}'.format(int(us_cases_hospitalized))],
                fill_color='white',
                align='center',
                font_size=20,
                height=25),
    cells=dict(values=[['New Cases Yesterday', change_text(us_cases_seven_day_trend)], # 1st column
                       ['Deaths Yesterday', change_text(us_death_seven_day_trend)], # 2nd column
                       ['Hospitalized (Cumulative)', change_text(us_hospitalized_seven_day_trend)]], # 3rd column
               fill_color='white',
               font_color=[['black',change_color(us_cases_seven_day_trend)],['black',change_color(us_death_seven_day_trend)],['black',change_color(us_hospitalized_seven_day_trend)]],
               align='center')),
])
 
fig0.update_layout(height=400, autosize=True, title={
        'text': "US Summary Stats<br>"+ yesterday.strftime("%B %d, %Y"),
        'y':0.9,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top',
        'font_size':20})
 
fig0.show()

Next, I will chart the new daily positive cases along with the seven-day moving average, and then repeat this for deaths and hospitalizations. Daily data is noisy, so the moving average is a better indicator of the trend. One example of the noise: cases dip over the weekend, which probably has more to do with reporting than with fewer people getting sick.

I am using <a href="https://plotly.com/">Plotly</a>, an interactive charting library for Python. Its syntax is pretty similar to seaborn's, and the charts Plotly produces do not require nearly as much wrangling as they do in seaborn/Matplotlib. These interactive charts give you the ability to do the following:
<ul>
    <li>Hover over to see the data labels</li>
    <li>Click and drag to zoom into an area</li>
    <li>Click on the legend items to hide that chart</li>
</ul>

There are a few other cool things you can do with the charts as well, but I'll leave it up to you to learn more by playing around with them.

In [None]:
fig1 = go.Figure()
fig1.add_trace(go.Bar(x=us_daily_df_new_per_day.index, 
                      y=us_daily_df_new_per_day['positiveIncrease'], 
                      marker_color='lightblue', 
                      name= 'Positive Cases',
                      hovertemplate = '%{y:,.0f}'
                     ))
fig1.add_trace(go.Scatter(x=us_daily_df_new_per_day_rolling.index, 
                          y=us_daily_df_new_per_day_rolling['positiveIncrease'], 
                          marker_color='#000080', 
                          name = '7 Day Moving Avg.',
                         hovertemplate ='%{y:,.0f}'))


fig1.update_layout(template='none', height=500, autosize=True,hovermode="x", title='New Daily COVID Cases',legend=dict(
    orientation="h",
    yanchor="bottom",
    y=1.02,
    xanchor="right",
    x=1
))

fig1.show()

In [None]:
fig10 = go.Figure()

fig10.add_trace(go.Scatter(x=us_daily_df.index, 
                          y=us_daily_df['positive'], 
                          marker_color='#000080', 
                          name = 'Total Cases',
                         hovertemplate ='%{y:,.0f}'))


fig10.update_layout(template='none', height=500, autosize=True,hovermode="x", title='Cumulative COVID Cases in the US',legend=dict(
    orientation="h",
    yanchor="bottom",
    y=1.02,
    xanchor="right",
    x=1
))

fig10.show()

In [None]:
fig2 = go.Figure()
fig2.add_trace(go.Bar(x=us_daily_df_new_per_day.index, 
                      y=us_daily_df_new_per_day['deathIncrease'], 
                      marker_color='lightblue', 
                      name= 'Deaths',
                     hovertemplate ='%{y:,.0f}'))

fig2.add_trace(go.Scatter(x=us_daily_df_new_per_day_rolling.index, 
                          y=us_daily_df_new_per_day_rolling['deathIncrease'], 
                          marker_color='#000080', 
                          name = '7 Day Moving Average',
                         hovertemplate ='%{y:,.0f}'))


fig2.update_layout(template='none', height=500, autosize=True, hovermode= 'x', title='New Daily COVID Deaths',legend=dict(
    orientation="h",
    yanchor="bottom",
    y=1.02,
    xanchor="right",
    x=1
))

fig2.show()

In [None]:
fig3 = go.Figure()


fig3.add_trace(go.Bar(x=us_daily_df.index, 
                      y=us_daily_df['hospitalizedCurrently'], 
                      marker_color='lightblue', 
                      name= 'Currently Hospitalized',
                     hovertemplate ='%{y:,.0f}'))

df = us_daily_df[['hospitalizedCurrently']].rolling(window=7).mean()
fig3.add_trace(go.Scatter(x=df.index, 
                          y=df['hospitalizedCurrently'], 
                          marker_color='#000080', 
                          name = '7 Day Moving Average',
                         hovertemplate ='%{y:,.0f}'))


fig3.update_layout(template='none', height=500, autosize=True, hovermode= 'x', title='Currently Hospitalized',legend=dict(
    orientation="h",
    yanchor="bottom",
    y=1.02,
    xanchor="right",
    x=1
))

fig3.show()

In [None]:
fig31 = go.Figure()


fig31.add_trace(go.Bar(x=us_daily_df.index, 
                      y=us_daily_df['totalTestResultsIncrease'], 
                      marker_color='lightblue', 
                      name= 'New Daily Tests',
                     hovertemplate ='%{y:,.0f}'))

df = us_daily_df[['totalTestResultsIncrease']].rolling(window=7).mean()
fig31.add_trace(go.Scatter(x=df.index, 
                          y=df['totalTestResultsIncrease'], 
                          marker_color='#000080', 
                          name = '7 Day Moving Average',
                         hovertemplate ='%{y:,.0f}'))


fig31.update_layout(template='none', height=500, autosize=True, hovermode= 'x', title='New Daily COVID Tests',legend=dict(
    orientation="h",
    yanchor="bottom",
    y=1.02,
    xanchor="right",
    x=1
))

fig31.show()

# State Data

Now we can turn our attention to the state data to see where the virus is hitting the hardest. There is a separate API link to pull the state data.

In [None]:
us_states_cases_df = pd.read_json('https://api.covidtracking.com/v1/states/daily.json')

In [None]:
us_states_cases_df['date'] = pd.to_datetime(us_states_cases_df['date'],format='%Y%m%d')

us_states_cases_df.set_index(['state','date'], inplace=True)
us_states_cases_df.sort_index(inplace=True)


us_states_cases_df = us_states_cases_df[['positiveIncrease','hospitalizedCumulative','deathIncrease']].fillna(0)

In [None]:
# #The State data is also cumulative, which means we have to subtract the previous row from each row to get the new daily numbers. However, there is a twist. We need to make sure that we subtract the row only if it's from the same state. So in order to do that, I will first re-index the DataFrame with the state and the date as a multi-index. A multi-index just means that there are two levels of indexing.

# us_states_cases_df['positive_diffs'] = np.nan
# us_states_cases_df['hospitalizedCumulative_diffs'] = np.nan
# us_states_cases_df['death_diffs'] = np.nan
# us_states_cases_df['positive_diffs_rolling'] = np.nan
# us_states_cases_df['hospitalizedCumulative_diffs_rolling'] = np.nan
# us_states_cases_df['death_diffs_rolling'] = np.nan

# for idx in us_states_cases_df.index.levels[0]:
#   us_states_cases_df.loc[idx]['positive_diffs'] = us_states_cases_df.loc[idx]['positive'].diff()
#   us_states_cases_df.loc[idx]['hospitalizedCumulative_diffs'] = us_states_cases_df.loc[idx]['hospitalizedCumulative'].diff()
#   us_states_cases_df.loc[idx]['death_diffs'] = us_states_cases_df.loc[idx]['death'].diff()
#   #calculate the rolling average
#   us_states_cases_df.loc[idx]['positive_diffs_rolling'] = us_states_cases_df.loc[idx]['positive_diffs'].rolling(window=7).mean()
#   us_states_cases_df.loc[idx]['hospitalizedCumulative_diffs_rolling'] = us_states_cases_df.loc[idx]['hospitalizedCumulative_diffs'].rolling(window=7).mean()
#   us_states_cases_df.loc[idx]['death_diffs_rolling'] = us_states_cases_df.loc[idx]['death_diffs'].rolling(window=7).mean()



Now, we can create a function that loops through the DataFrame, isolates a state, and then runs the <code>DataFrame.diff()</code> function along with the <code>DataFrame.rolling()</code> function to calculate the seven day moving average. 
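
A minimal sketch of that per-state calculation, using `groupby` in place of an explicit loop so that `diff()` and `rolling()` never cross a state boundary. The data is made up, the column names `positive_new` and `positive_new_rolling` are hypothetical, and the window is shortened to 2 so the toy output is easy to check by hand (the notebook uses 7).

```python
import pandas as pd

# Made-up cumulative totals for two states over four days.
toy = pd.DataFrame({
    'state': ['MN'] * 4 + ['WI'] * 4,
    'date': list(pd.date_range('2020-05-01', periods=4, freq='D')) * 2,
    'positive': [10, 15, 25, 40, 100, 110, 110, 130],
}).set_index(['state', 'date']).sort_index()

# diff() runs separately inside each state, so WI's first row is NaN, not 100 - 40.
toy['positive_new'] = toy.groupby(level='state')['positive'].diff()

# The rolling mean is also computed per state.
toy['positive_new_rolling'] = (
    toy.groupby(level='state')['positive_new']
       .transform(lambda s: s.rolling(window=2).mean())
)
print(toy)
```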

The number of COVID cases will naturally be higher in states with a larger population, so I am bringing in population data to normalize the counts and see which states are being hit the hardest per capita. I grabbed the state population data from this <a href="https://www.infoplease.com/us/states/state-population-by-rank">site</a>. In order to join the data with my original DataFrame, I need to translate the state abbreviations to state names, so I need a mapping to do that. I use this <a href="https://www.ssa.gov/international/coc-docs/states.html">site</a> for the translations. An easy way to get these into a DataFrame is to highlight and copy the data and then use <code>pd.read_clipboard()</code> to create a DataFrame.

In [None]:
state_codes = pd.DataFrame({'state_name': {0: 'Alabama', 1: 'Alaska', 2: 'American Samoa', 3: 'Arizona', 4: 'Arkansas', 5: 'California', 6: 'Colorado', 7: 'Connecticut', 8: 'Delaware', 9: 'District Of Columbia', 10: 'Florida', 11: 'Georgia', 12: 'Guam', 13: 'Hawaii', 14: 'Idaho', 15: 'Illinois', 16: 'Indiana', 17: 'Iowa', 18: 'Kansas', 19: 'Kentucky', 20: 'Louisiana', 21: 'Maine', 22: 'Maryland', 23: 'Massachusetts', 24: 'Michigan', 25: 'Minnesota', 26: 'Mississippi', 27: 'Missouri', 28: 'Montana', 29: 'Nebraska', 30: 'Nevada', 31: 'New Hampshire', 32: 'New Jersey', 33: 'New Mexico', 34: 'New York', 35: 'North Carolina', 36: 'North Dakota', 37: 'Northern Mariana Is', 38: 'Ohio', 39: 'Oklahoma', 40: 'Oregon', 41: 'Pennsylvania', 42: 'Puerto Rico', 43: 'Rhode Island', 44: 'South Carolina', 45: 'South Dakota', 46: 'Tennessee', 47: 'Texas', 48: 'Utah', 49: 'Vermont', 50: 'Virginia', 51: 'Virgin Islands', 52: 'Washington', 53: 'West Virginia', 54: 'Wisconsin', 55: 'Wyoming'}, 'abbreviation': {0: 'AL', 1: 'AK', 2: 'AS', 3: 'AZ', 4: 'AR', 5: 'CA', 6: 'CO', 7: 'CT', 8: 'DE', 9: 'DC', 10: 'FL', 11: 'GA', 12: 'GU', 13: 'HI', 14: 'ID', 15: 'IL', 16: 'IN', 17: 'IA', 18: 'KS', 19: 'KY', 20: 'LA', 21: 'ME', 22: 'MD', 23: 'MA', 24: 'MI', 25: 'MN', 26: 'MS', 27: 'MO', 28: 'MT', 29: 'NE', 30: 'NV', 31: 'NH', 32: 'NJ', 33: 'NM', 34: 'NY', 35: 'NC', 36: 'ND', 37: 'MP', 38: 'OH', 39: 'OK', 40: 'OR', 41: 'PA', 42: 'PR', 43: 'RI', 44: 'SC', 45: 'SD', 46: 'TN', 47: 'TX', 48: 'UT', 49: 'VT', 50: 'VA', 51: 'VI', 52: 'WA', 53: 'WV', 54: 'WI', 55: 'WY'}})

In [None]:
state_populations = pd.DataFrame({'State': {0: 'California', 1: 'Texas', 2: 'Florida', 3: 'New York', 4: 'Illinois', 5: 'Pennsylvania', 6: 'Ohio', 7: 'Georgia', 8: 'North Carolina', 9: 'Michigan', 10: 'New Jersey', 11: 'Virginia', 12: 'Washington', 13: 'Arizona', 14: 'Massachusetts', 15: 'Tennessee', 16: 'Indiana', 17: 'Missouri', 18: 'Maryland', 19: 'Wisconsin', 20: 'Colorado', 21: 'Minnesota', 22: 'South Carolina', 23: 'Alabama', 24: 'Louisiana', 25: 'Kentucky', 26: 'Oregon', 27: 'Oklahoma', 28: 'Connecticut', 29: 'Utah', 30: 'Iowa', 31: 'Nevada', 32: 'Arkansas', 33: 'Mississippi', 34: 'Kansas', 35: 'New Mexico', 36: 'Nebraska', 37: 'West Virginia', 38: 'Idaho', 39: 'Hawaii', 40: 'New Hampshire', 41: 'Maine', 42: 'Montana', 43: 'Rhode Island', 44: 'Delaware', 45: 'South Dakota', 46: 'North Dakota', 47: 'Alaska', 48: 'DC', 49: 'Vermont', 50: 'Wyoming'}, 'July 2019 Estimate': {0: 39512223.0, 1: 28995881.0, 2: 21477737.0, 3: 19453561.0, 4: 12671821.0, 5: 12801989.0, 6: 11689100.0, 7: 10617423.0, 8: 10488084.0, 9: 9986857.0, 10: 8882190.0, 11: 8535519.0, 12: 7614893.0, 13: 7278717.0, 14: 6949503.0, 15: 6833174.0, 16: 6732219.0, 17: 6137428.0, 18: 6045680.0, 19: 5822434.0, 20: 5758736.0, 21: 5639632.0, 22: 5148714.0, 23: 4903185.0, 24: 4648794.0, 25: 4467673.0, 26: 4217737.0, 27: 3956971.0, 28: 3565287.0, 29: 3205958.0, 30: 3155070.0, 31: 3080156.0, 32: 3017825.0, 33: 2976149.0, 34: 2913314.0, 35: 2096829.0, 36: 1934408.0, 37: 1792147.0, 38: 1787065.0, 39: 1415872.0, 40: 1359711.0, 41: 1344212.0, 42: 1068778.0, 43: 1059361.0, 44: 973764.0, 45: 884659.0, 46: 762062.0, 47: 731545.0, 48: 705749.0, 49: 623989.0, 50: 578759.0}})

In [None]:
us_state_and_rolling_reset = us_states_cases_df.reset_index()

us_states_daily_df_codes = pd.merge(us_state_and_rolling_reset,state_codes,how='inner',left_on='state',right_on='abbreviation')

us_states_daily_df_population = pd.merge(us_states_daily_df_codes,state_populations,how='inner',left_on='state_name',right_on='State')
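
Because both merges are inner joins, any name that is spelled differently in the two tables drops out silently. In the tables above, the District of Columbia appears as 'District Of Columbia' in `state_codes` but as 'DC' in `state_populations`, and the territories have no population row. Here is a sketch of one way to surface such mismatches with the `indicator` option of `pd.merge`, using two-row excerpts of those tables; the variable names are hypothetical.

```python
import pandas as pd

# Two-row excerpts of the mapping and population tables.
codes = pd.DataFrame({'state_name': ['Minnesota', 'District Of Columbia'],
                      'abbreviation': ['MN', 'DC']})
populations = pd.DataFrame({'State': ['Minnesota', 'DC'],
                            'July 2019 Estimate': [5639632.0, 705749.0]})

# A left join with indicator=True marks rows that found no partner as 'left_only'.
check = pd.merge(codes, populations, how='left',
                 left_on='state_name', right_on='State', indicator=True)
unmatched = check.loc[check['_merge'] == 'left_only', 'state_name'].tolist()
print(unmatched)
```

Any name that shows up in `unmatched` needs its spelling aligned in one of the tables before it can appear on the map.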


To normalize the COVID cases by state population, I'll divide the new cases by the state population and then multiply by 100,000 to get new cases per 100,000 people.

In [None]:
us_states_daily_df_population['cases_per_hundred_thousand'] = us_states_daily_df_population['positiveIncrease'] / (us_states_daily_df_population['July 2019 Estimate'] / 100000)
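
As a worked example of the same formula on a single number: the population is Minnesota's July 2019 estimate from the table above, while the daily case count is made up for illustration.

```python
new_cases = 1_000        # made-up daily count
population = 5_639_632   # Minnesota, July 2019 estimate

# Same expression as the DataFrame version: cases divided by population in units of 100,000.
cases_per_hundred_thousand = new_cases / (population / 100_000)
print(round(cases_per_hundred_thousand, 2))
```

So 1,000 new cases in a state of about 5.6 million people works out to roughly 17.7 cases per 100,000.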

Now let's take a look at which states have the highest infection rate per 100,000 people.

In [None]:
##Yesterdays average
#df = us_states_daily_df_population[us_states_daily_df_population['date'] == yesterday]

##One week average
df = us_states_daily_df_population[us_states_daily_df_population['date'] >= one_week_ago].groupby(['state','state_name']).mean(numeric_only=True).reset_index()

df['text'] = df.apply(lambda x: x['state_name'] +'<br>' + 'New Cases Last 7 Days Avg: ' + '{:,.0f}'.format(x['positiveIncrease']) + '<br>' + 'Per 100,000 people: ' + '{:,.0f}'.format(x['cases_per_hundred_thousand']),axis=1)
#df['state_name'] + '<br>' + 'Last 7 Days Average ' + df['positive_new'].astype('str') + '<br>' + 'Per 100,000 ' + df['cases_per_hundred_thousand'].astype('str')

fig4 = go.Figure(data=go.Choropleth(
    locations=df['state'],
    z=df['cases_per_hundred_thousand'],
    locationmode='USA-states',
    colorscale='Reds',
    autocolorscale=False,
    text=df['text'], # hover text
    marker_line_color='white', # line markers between states
    colorbar_title="Cases per 100,000 people",
    hoverinfo="text"
))

fig4.update_layout(
    title_text='COVID-19 Cases',
    clickmode='event+select',
    geo = dict(
        scope='usa',
        projection=go.layout.geo.Projection(type = 'albers usa'),
        showlakes=True, # lakes
        lakecolor='rgb(255, 255, 255)')
)


fig4.show()

In [None]:
dropdown_state = 'MN'
indicator='New Cases'

single_state_data_api = pd.read_json('https://api.covidtracking.com/v1/states/'+dropdown_state+'/daily.json')

single_state_data_api['date'] = pd.to_datetime(single_state_data_api['date'],format='%Y%m%d')
single_state_data_api.set_index('date', drop=True, inplace=True)
single_state_data_api.sort_index(inplace=True)

single_state_data_api_new_per_day = single_state_data_api[['positiveIncrease','hospitalizedCumulative','deathIncrease']].fillna(0)

def y_axis_cat1(x):
  if x == 'New Cases':
    return 'positiveIncrease'
  elif x == 'Deaths':
    return 'deathIncrease'
  elif x == 'Hospitalized':
    return 'hospitalizedCumulative'

fig200 = go.Figure()
fig200.add_trace(go.Bar(x=single_state_data_api_new_per_day.index, 
          y=single_state_data_api_new_per_day[y_axis_cat1(indicator)], 
          marker_color='lightblue', 
          name= 'Positive Cases',
          hovertemplate = '%{y:,.0f}'
          ))
fig200.add_trace(go.Scatter(x=single_state_data_api_new_per_day.index, 
            y=single_state_data_api_new_per_day[y_axis_cat1(indicator)].rolling(window=7).mean(), 
            marker_color='#000080', 
            name = '7 Day Moving Avg.',
            hovertemplate ='%{y:,.0f}'))

fig200.update_layout(template='none', height=500, autosize=True,hovermode="x", title=dropdown_state+' New Daily COVID Cases',legend=dict(
  orientation="h",
  yanchor="bottom",
  y=1.02,
  xanchor="right",
  x=1
  ))

fig200.show()

Next, I am creating ten subplots to take a look at the states with the highest infection rates.

In [None]:
df = us_states_daily_df_population[us_states_daily_df_population['date'] >= one_week_ago].groupby(['state','state_name']).mean(numeric_only=True).reset_index()

state_list_sorted =  df.sort_values(by='cases_per_hundred_thousand', ascending=False)['state'].to_list()
state_names_list_sorted = df.sort_values(by='cases_per_hundred_thousand', ascending=False)['state_name'].to_list()

from plotly.subplots import make_subplots
import plotly.graph_objects as go

fig5 = make_subplots(rows=len(state_list_sorted[:10]), cols=1, subplot_titles=state_names_list_sorted[:10])

for i in state_list_sorted[:10]:
    # df = us_state_dailus_states_daily_df_populationy_cases.loc[i]
    # df_rolling = us_state_daily_cases_rolling.loc[i]
    df = us_states_daily_df_population.set_index(['state','date']).loc[i]
    fig5.add_trace(go.Bar(x=df.index, 
                          y=df['positiveIncrease'], 
                          marker_color='lightblue', 
                          name= 'New Daily Cases -'+i,
                          hovertemplate ='%{y:,.0f}'),
                          row=state_list_sorted.index(i)+1, col=1)
    
    fig5.add_trace(go.Scatter(x=df.index, 
                              y=df['positiveIncrease'].rolling(window=7).mean(), 
                              marker_color='#000080', 
                              name = '7 Day Moving Avg.',
                              hovertemplate ='%{y:,.0f}'), 
                              row=state_list_sorted.index(i)+1, col=1)


fig5.update_layout(template='none', height=4000, autosize=True, hovermode= 'x', showlegend=False
                  # title={'text': "Positives by State<br>Sorted by Highest Cases per 100,000 people",'y':0.998,'x':0.5,'xanchor': 'center','yanchor': 'top'}
                  )



fig5.show()   



Here is a look at the ten states with the lowest infection rates per 100,000 people.

In [None]:
state_list_sorted[-5:]

['MI', 'ME', 'MO', 'OR', 'HI']

In [None]:
df = us_states_daily_df_population[us_states_daily_df_population['date'] >= one_week_ago].groupby(['state','state_name']).mean(numeric_only=True).reset_index()

state_list_sorted =  df.sort_values(by='cases_per_hundred_thousand', ascending=False)['state'].to_list()
state_names_list_sorted = df.sort_values(by='cases_per_hundred_thousand', ascending=False)['state_name'].to_list()

from plotly.subplots import make_subplots
import plotly.graph_objects as go

# The list is sorted from highest to lowest rate, so the last ten entries are the lowest
lowest_states = state_list_sorted[-10:]

fig5 = make_subplots(rows=len(lowest_states), cols=1, subplot_titles=state_names_list_sorted[-10:])

# enumerate gives each state its own subplot row, starting at 1
for row, i in enumerate(lowest_states, start=1):
    df = us_states_daily_df_population.set_index(['state','date']).loc[i]
    fig5.add_trace(go.Bar(x=df.index, 
                          y=df['positiveIncrease'], 
                          marker_color='lightblue', 
                          name= 'New Daily Cases -'+i,
                          hovertemplate ='%{y:,.0f}'),
                          row=row, col=1)
    
    fig5.add_trace(go.Scatter(x=df.index, 
                              y=df['positiveIncrease'].rolling(window=7).mean(), 
                              marker_color='#000080', 
                              name = '7 Day Moving Avg.',
                              hovertemplate ='%{y:,.0f}'), 
                              row=row, col=1)


fig5.update_layout(template='none', height=2000, autosize=True, hovermode= 'x', showlegend=False
                  # title={'text': "Positives by State<br>Sorted by Highest Cases per 100,000 people",'y':0.998,'x':0.5,'xanchor': 'center','yanchor': 'top'}
                  )



fig5.show()   



# Countries

In [None]:
from io import StringIO
import pandas as pd
import requests

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36'}

url='https://covid.ourworldindata.org/data/owid-covid-data.csv'
s=requests.get(url, headers= headers).text

country_data_df = pd.read_csv(StringIO(s), sep=",", parse_dates=['date'])

# country_data_df = pd.read_csv('https://covid.ourworldindata.org/data/owid-covid-data.csv',parse_dates=['date'])
country_data_df.head()

Unnamed: 0,iso_code,continent,location,date,total_cases,new_cases,new_cases_smoothed,total_deaths,new_deaths,new_deaths_smoothed,total_cases_per_million,new_cases_per_million,new_cases_smoothed_per_million,total_deaths_per_million,new_deaths_per_million,new_deaths_smoothed_per_million,reproduction_rate,icu_patients,icu_patients_per_million,hosp_patients,hosp_patients_per_million,weekly_icu_admissions,weekly_icu_admissions_per_million,weekly_hosp_admissions,weekly_hosp_admissions_per_million,new_tests,total_tests,total_tests_per_thousand,new_tests_per_thousand,new_tests_smoothed,new_tests_smoothed_per_thousand,positive_rate,tests_per_case,tests_units,total_vaccinations,people_vaccinated,people_fully_vaccinated,new_vaccinations,new_vaccinations_smoothed,total_vaccinations_per_hundred,people_vaccinated_per_hundred,people_fully_vaccinated_per_hundred,new_vaccinations_smoothed_per_million,stringency_index,population,population_density,median_age,aged_65_older,aged_70_older,gdp_per_capita,extreme_poverty,cardiovasc_death_rate,diabetes_prevalence,female_smokers,male_smokers,handwashing_facilities,hospital_beds_per_thousand,life_expectancy,human_development_index
0,AFG,Asia,Afghanistan,2020-02-24,1.0,1.0,,,,,0.026,0.026,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,8.33,38928341.0,54.422,18.6,2.581,1.337,1803.987,,597.029,9.59,,,37.746,0.5,64.83,0.511
1,AFG,Asia,Afghanistan,2020-02-25,1.0,0.0,,,,,0.026,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,8.33,38928341.0,54.422,18.6,2.581,1.337,1803.987,,597.029,9.59,,,37.746,0.5,64.83,0.511
2,AFG,Asia,Afghanistan,2020-02-26,1.0,0.0,,,,,0.026,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,8.33,38928341.0,54.422,18.6,2.581,1.337,1803.987,,597.029,9.59,,,37.746,0.5,64.83,0.511
3,AFG,Asia,Afghanistan,2020-02-27,1.0,0.0,,,,,0.026,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,8.33,38928341.0,54.422,18.6,2.581,1.337,1803.987,,597.029,9.59,,,37.746,0.5,64.83,0.511
4,AFG,Asia,Afghanistan,2020-02-28,1.0,0.0,,,,,0.026,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,8.33,38928341.0,54.422,18.6,2.581,1.337,1803.987,,597.029,9.59,,,37.746,0.5,64.83,0.511


In [None]:
clean_up_cols = ['new_cases_per_million','total_vaccinations_per_hundred','new_vaccinations_smoothed_per_million']
for i in clean_up_cols:
  country_data_df[i] = country_data_df.groupby('iso_code')[i].transform(lambda x: x.ffill())
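
The loop above forward-fills gaps within each country, so a missing value is replaced by that country's most recent report and never by another country's. A small sketch on made-up data shows the effect; the ISO codes 'AAA' and 'BBB' and the `filled` column are placeholders.

```python
import numpy as np
import pandas as pd

# Made-up rows for two placeholder countries, with gaps in the reported values.
toy = pd.DataFrame({
    'iso_code': ['AAA', 'AAA', 'AAA', 'BBB', 'BBB'],
    'total_vaccinations_per_hundred': [1.0, np.nan, 2.5, np.nan, 0.4],
})

# Forward-fill inside each group only.
toy['filled'] = (
    toy.groupby('iso_code')['total_vaccinations_per_hundred']
       .transform(lambda x: x.ffill())
)
print(toy['filled'].tolist())
```

The first 'BBB' row stays `NaN` because that group has no earlier value to carry forward, whereas a plain `ffill()` on the whole column would have copied 2.5 from 'AAA'.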

In [None]:
df = country_data_df[country_data_df['date'] == country_data_df['date'].max()]
#df = country_cases_pop_df[country_cases_pop_df['date'] == dt.datetime(2021,1,22)]


fig20 = go.Figure(data=go.Choropleth(
    locations=df['iso_code'],
    z=df['new_cases_per_million'].astype(float),
    colorscale = 'Sunsetdark',
    marker_line_color='white', # line markers between states
    colorbar_title="Cases per Million People"
))

fig20.update_layout(
    title_text='Cases per Million People - '+dt.datetime.strftime(country_data_df['date'].max(),'%b %-d, %Y'),
    geo = dict(
        #scope='usa',
        projection_type='equirectangular'
        ), height=700, margin=dict(l=50, r=50)
)
fig20.write_html("cases_country.html",full_html=False, include_plotlyjs='cdn')
fig20.show()

In [None]:
from plotly.subplots import make_subplots
import plotly.graph_objects as go

lst = df.sort_values(by='new_cases_per_million', ascending=False)['location'].tolist()[:10]

fig22 = make_subplots(rows=10, cols=1, shared_yaxes=True, subplot_titles=(lst))

for j, i in enumerate(lst):
  filt = country_data_df['location'] == i
  fig22.add_trace(go.Scatter(
      x=country_data_df.loc[filt]['date'], 
      y=country_data_df.loc[filt]['new_cases_per_million'].rolling(window=7).mean(),
      mode='lines',
      marker_color='#000080',
      name=i), 
      row=j+1, col=1
      )
  
fig22.update_layout(height=1500, width=800, template='simple_white',
                  #title_text="Cases per 100,000 People per Country", 
                  showlegend=False)

# fig22.add_annotation(text= 'Sorted by countries with the higest number of cases per capita. Country must have more than 1,000 cases.',
#                   xref="paper", yref="paper",
#                   x=0, y=1.035, showarrow=False)

fig22.update_yaxes(range=[0, 1500])
fig22.show()

In [None]:
df = country_data_df[country_data_df['date'] == country_data_df['date'].max()]

fig20 = go.Figure(data=go.Choropleth(
    locations=df['iso_code'],
    z=df['total_vaccinations_per_hundred'].astype(float),
    colorscale = 'Greens',
    marker_line_color='white', # line markers between countries
    colorbar_title="Vaccinations per 100 People"
))

fig20.update_layout(
    title_text='Vaccinations per 100 People - '+dt.datetime.strftime(df['date'].max(),'%b %-d, %Y'),
    geo = dict(
        #scope='usa',
        projection_type='equirectangular'
        ), height=700, margin=dict(l=50, r=50)
)
fig20.write_html("vaccines_country.html",full_html=False, include_plotlyjs='cdn')
fig20.show()

In [None]:
filt = (country_data_df['date'] == country_data_df['date'].max()) & (country_data_df['location'] != 'World')
top_ten_vaccinated = country_data_df.loc[filt].sort_values(by='total_vaccinations_per_hundred', ascending=False).head(20)

In [None]:
filt = (country_data_df['location'].isin(top_ten_vaccinated['location'].unique().tolist())) & (country_data_df['date'] >= dt.datetime(2020,12,1))

In [None]:
top_20_total_vaccinations = country_data_df.loc[filt][['location','date','total_vaccinations_per_hundred']]

fig = go.Figure()

lst = top_20_total_vaccinations['location'].unique().tolist()

for i in lst:
  filt = (country_data_df['location'] == i) & (country_data_df['date'] >= dt.datetime(2021,1,1))
  df = country_data_df.loc[filt]
  fig.add_trace(go.Scatter(x=df['date'], y=df['total_vaccinations_per_hundred'],opacity=.75 ,mode='lines+markers',name=i))

fig.update_layout(template='simple_white', hovermode='x', title='Total Vaccinations per 100 People')
#fig.update_yaxes(type="log")
fig.show()
fig.write_html("vaccine.html",full_html=False, include_plotlyjs='cdn')

In [None]:
fig = go.Figure()

lst = top_ten_vaccinated['location'].unique().tolist()

for i in lst:
  filt = (country_data_df['location'] == i) & (country_data_df['date'] >= dt.datetime(2021,1,1))
  df = country_data_df.loc[filt]
  fig.add_trace(go.Scatter(x=df['date'], y=df['total_vaccinations_per_hundred'],opacity=.75 ,mode='lines+markers',name=i))

fig.update_layout(template='simple_white', hovermode='x', title='Total Vaccinations per 100 People (Log Scale)')
fig.update_yaxes(type="log")
fig.show()
fig.write_html("vaccine_log.html",full_html=False, include_plotlyjs='cdn')

In [None]:
fig = go.Figure()

lst = top_ten_vaccinated['location'].unique().tolist()

for i in lst:
  filt = (country_data_df['location'] == i) & (country_data_df['date'] >= dt.datetime(2021,1,1))
  df = country_data_df.loc[filt]
  fig.add_trace(go.Scatter(x=df['date'], y=df['new_vaccinations_smoothed_per_million'], opacity=.75 ,mode='lines+markers',name=i))

fig.update_layout(template='simple_white', hovermode='x', title = 'New Vaccinations per Million People (Smoothed)')
fig.update_yaxes(type="linear")
fig.write_html("new_vaccines.html",full_html=False, include_plotlyjs='cdn')
fig.show()

In [None]:
state_vaccines = pd.read_csv('https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/vaccinations/us_state_vaccinations.csv',parse_dates=['date'])
state_vaccines.head()

Unnamed: 0,date,location,total_vaccinations,total_distributed,people_vaccinated,people_fully_vaccinated_per_hundred,total_vaccinations_per_hundred,people_fully_vaccinated,people_vaccinated_per_hundred,distributed_per_hundred,daily_vaccinations_raw,daily_vaccinations,daily_vaccinations_per_million,share_doses_used
0,2021-01-12,Alabama,78134.0,377025.0,70861.0,0.15,1.59,7270.0,1.44,7.69,,,,0.207
1,2021-01-13,Alabama,84040.0,378975.0,74792.0,0.19,1.71,9245.0,1.52,7.73,5906.0,5906.0,1205.0,0.222
2,2021-01-14,Alabama,92300.0,435350.0,80480.0,,1.88,,1.64,8.88,8260.0,7083.0,1445.0,0.212
3,2021-01-15,Alabama,100567.0,444650.0,86956.0,0.27,2.05,13488.0,1.77,9.07,8267.0,7478.0,1525.0,0.226
4,2021-01-16,Alabama,,,,,,,,,7557.0,7498.0,1529.0,


In [None]:
# Forward-fill gaps within each state so the most recent row always carries a value
clean_up_cols = ['daily_vaccinations','total_vaccinations_per_hundred','people_vaccinated']
for i in clean_up_cols:
  state_vaccines[i] = state_vaccines.groupby('location')[i].ffill()

In [None]:
filt = (state_vaccines['date'] == state_vaccines['date'].max()) & (state_vaccines['people_vaccinated'] > 50000) & ~(state_vaccines['location'].isin(['United States']) )
states_sorted_total = state_vaccines.loc[filt].sort_values(by='daily_vaccinations', ascending=False)['location'].tolist()

In [None]:
fig = go.Figure()

lst = states_sorted_total[:10]
df = state_vaccines

for i in lst:
  filt = (df['location'] == i) & (df['date'] >= dt.datetime(2021,1,1))
  temp_df = df.loc[filt]
  fig.add_trace(go.Scatter(x=temp_df['date'], y=temp_df['daily_vaccinations'], opacity=.75 ,mode='lines+markers',name=i))

fig.update_layout(template='simple_white', hovermode='x', title = 'Daily Doses Administered by US State (Top 10)')
fig.update_yaxes(type="linear")
fig.write_html("state_total_vaccines.html",full_html=False, include_plotlyjs='cdn')
fig.show()

In [None]:

filt = (state_vaccines['date'] == state_vaccines['date'].max()) & (state_vaccines['people_vaccinated'] > 50000) & ~(state_vaccines['location'].isin(['United States','Veterans Health','Indian Health Svc']) )
# Rank states by total vaccinations per 100 people on the latest date
states_sorted_per_hundred = state_vaccines.loc[filt].sort_values(by='total_vaccinations_per_hundred', ascending=False)['location'].tolist()

fig = go.Figure()

lst = states_sorted_per_hundred[:20]
df = state_vaccines

for i in lst:
  filt = (df['location'] == i) & (df['date'] >= dt.datetime(2021,1,1))
  temp_df = df.loc[filt]
  fig.add_trace(go.Scatter(x=temp_df['date'], y=temp_df['total_vaccinations_per_hundred'], opacity=.75 ,mode='lines+markers',name=i))

fig.update_layout(template='simple_white', hovermode='x', title = 'Total Vaccinations per 100 People by US State (Top 20)')
fig.update_yaxes(type="linear")
fig.write_html("state_total_vaccines_per_hundred.html",full_html=False, include_plotlyjs='cdn')

fig.show()

In [None]:
vaccines_population_df = pd.merge(state_vaccines, state_populations, how='inner', left_on='location', right_on='State')

In [None]:
# Build the mask from the merged frame itself (its index differs from state_vaccines after the merge)
# and take a copy so that adding a column below does not trigger SettingWithCopyWarning
latest_state_data_df = vaccines_population_df.loc[vaccines_population_df['date'] == vaccines_population_df['date'].max()].copy()

In [None]:
latest_state_data_df['Percent of Population Vaccinated'] = latest_state_data_df['people_vaccinated'] / latest_state_data_df['July 2019 Estimate']

In [None]:
fig = go.Figure(go.Bar(
            x=latest_state_data_df['Percent of Population Vaccinated'],
            y=latest_state_data_df['location'],
            orientation='h'))

fig.update_layout(
    template='simple_white', 
    yaxis={'categoryorder':'sum ascending'},
    xaxis={'side':'top', 'tickformat':'%'},
    height = 1000,
    title='Percent of Population Vaccinated by US State')

fig.write_html("state_percent_vaccinated.html",full_html=False, include_plotlyjs='cdn')
fig.show()

In [None]:
df = latest_state_data_df.sort_values(by='Percent of Population Vaccinated')

fig = go.Figure(go.Scatter(
            x=df['Percent of Population Vaccinated'],
            y=df['location'],
             mode='markers+text',
            orientation='h',
            text=df["location"],
            textposition="middle right"))

fig.update_layout(
    template='simple_white', 
    xaxis={'side':'top', 'tickformat':'%'},
    yaxis={'showticklabels':False},
    height = 1000,
    title='Percent of Population Vaccinated by US State')

fig.write_html("state_percent_vaccinated.html",full_html=False, include_plotlyjs='cdn')
fig.show()

In [None]:
state_populations.head()

        State  July 2019 Estimate
0  California          39512223.0
1       Texas          28995881.0
2     Florida          21477737.0
3    New York          19453561.0
4    Illinois          12671821.0

# Archive Analysis


While exploring COVID-19 cases by country, I found a data set on humdata.org. I'll import the CSV into a DataFrame.

In [None]:
import pandas as pd
countries_df = pd.read_csv('https://data.humdata.org/hxlproxy/api/data-preview.csv?url=https%3A%2F%2Fraw.githubusercontent.com%2FCSSEGISandData%2FCOVID-19%2Fmaster%2Fcsse_covid_19_data%2Fcsse_covid_19_time_series%2Ftime_series_covid19_confirmed_global.csv&filename=time_series_covid19_confirmed_global.csv')

In [None]:
countries_df.head()

Unnamed: 0,Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,1/28/20,1/29/20,1/30/20,1/31/20,2/1/20,2/2/20,2/3/20,2/4/20,2/5/20,2/6/20,2/7/20,2/8/20,2/9/20,2/10/20,2/11/20,2/12/20,2/13/20,2/14/20,2/15/20,2/16/20,2/17/20,2/18/20,2/19/20,2/20/20,2/21/20,2/22/20,2/23/20,...,1/14/21,1/15/21,1/16/21,1/17/21,1/18/21,1/19/21,1/20/21,1/21/21,1/22/21,1/23/21,1/24/21,1/25/21,1/26/21,1/27/21,1/28/21,1/29/21,1/30/21,1/31/21,2/1/21,2/2/21,2/3/21,2/4/21,2/5/21,2/6/21,2/7/21,2/8/21,2/9/21,2/10/21,2/11/21,2/12/21,2/13/21,2/14/21,2/15/21,2/16/21,2/17/21,2/18/21,2/19/21
0,,Afghanistan,33.93911,67.709953,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,53775,53831,53938,53984,54062,54141,54278,54403,54483,54559,54595,54672,54750,54854,54891,54939,55008,55023,55059,55121,55174,55231,55265,55330,55335,55359,55384,55402,55420,55445,55473,55492,55514,55518,55540,55557,55575
1,,Albania,41.1533,20.1683,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,65994,66635,67216,67690,67982,68568,69238,69916,70655,71441,72274,72812,73691,74567,75454,76350,77251,78127,78992,79934,80941,81993,83082,84212,85336,86289,87528,88671,89776,90835,91987,93075,93850,94651,95726,96838,97909
2,,Algeria,28.0339,1.6596,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,103127,103381,103611,103833,104092,104341,104606,104852,105124,105369,105596,105854,106097,106359,106610,106887,107122,107339,107578,107841,108116,108381,108629,108629,109088,109313,109559,109782,110049,110303,110513,110711,110894,111069,111247,111418,111600
3,,Andorra,42.5063,1.5218,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,8868,8946,9038,9083,9083,9194,9308,9379,9416,9499,9549,9596,9638,9716,9779,9837,9885,9937,9972,10017,10070,10137,10172,10206,10251,10275,10312,10352,10391,10427,10463,10503,10538,10555,10583,10610,10645
4,,Angola,-11.2027,17.8739,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,18613,18679,18765,18875,18926,19011,19093,19177,19269,19367,19399,19476,19553,19580,19672,19723,19782,19796,19829,19900,19937,19996,20030,20062,20086,20112,20163,20210,20261,20294,20329,20366,20381,20389,20400,20452,20478


In [None]:
#countries_df_removed_provinces = countries_df[countries_df['Province/State'].isna()].drop(['Province/State'], axis=1)
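# Drop the text and coordinate columns first so that only the daily counts are summed per country
countries_df_grouped = countries_df.drop(['Province/State','Lat','Long'], axis=1).groupby('Country/Region').sum().reset_index()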

In [None]:
%%capture
!pip install pycountry
import re
import pycountry

In [None]:
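def country_name_lookup(x):
  """Return the pycountry name for a free-text country string, or 0 if nothing matches."""
  # Replace punctuation with spaces so the fuzzy search sees plain words
  clean_x = re.sub(r'[^\w]', ' ', x).strip()
  try:
    return pycountry.countries.search_fuzzy(clean_x)[0].name
  except LookupError:
    # search_fuzzy raises LookupError when no country matches
    return 0


def country_abbr_lookup(x):
  """Return the ISO 3166-1 alpha-3 code for a free-text country string, or 0 if nothing matches."""
  clean_x = re.sub(r'[^\w]', ' ', x).strip()
  try:
    return pycountry.countries.search_fuzzy(clean_x)[0].alpha_3
  except LookupError:
    return 0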

In [None]:
countries_df_grouped['Country'] = countries_df_grouped['Country/Region'].apply(country_name_lookup)

In [None]:
countries_df_grouped.drop(['Country/Region'],axis=1,inplace=True)
# Keep only rows pycountry could match (the lookup returns 0 otherwise) and index them by country name
countries_filtered = countries_df_grouped[countries_df_grouped['Country'] != 0].set_index('Country')

In [None]:
countries_df_new_daily_cases = countries_filtered.diff(axis=1).fillna(0)

In [None]:
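# Reshape from one column per date to one row per (country, date); every remaining column is a date
country_cases_new_melted = pd.melt(countries_df_new_daily_cases, value_vars=countries_df_new_daily_cases.columns.tolist(), var_name='date', value_name='new_cases', ignore_index=False)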

In [None]:
country_cases_new_melted.head()

Unnamed: 0_level_0,date,new_cases
Country,Unnamed: 1_level_1,Unnamed: 2_level_1
Afghanistan,1/22/20,0.0
Albania,1/22/20,0.0
Algeria,1/22/20,0.0
Andorra,1/22/20,0.0
Angola,1/22/20,0.0
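
The reshaping steps above (sum per country, difference across date columns, then melt to long form) can be sketched on a tiny frame. This is only an illustration: the country names and counts below are made up, and the frame merely mimics the layout of the real file.

```python
import pandas as pd

# Toy cumulative counts in the same wide layout as the source file:
# one row per province/country, one column per date. All values are invented.
wide = pd.DataFrame({
    'Province/State': ['A1', 'A2', None],
    'Country/Region': ['Atlantis', 'Atlantis', 'Borduria'],
    'Lat': [0.0, 1.0, 2.0],
    'Long': [0.0, 1.0, 2.0],
    '1/22/20': [1, 2, 0],
    '1/23/20': [3, 2, 5],
    '1/24/20': [6, 4, 5],
})

# Sum provinces into one row per country, keeping only the date columns.
cumulative = (wide.drop(columns=['Province/State', 'Lat', 'Long'])
                  .groupby('Country/Region')
                  .sum())

# Cumulative -> new cases per day: subtract the previous column.
# The first date has no predecessor, so its NaN is replaced with 0.
daily = cumulative.diff(axis=1).fillna(0)

# Wide -> long: one row per (country, date) pair, keeping the country index.
long_df = daily.melt(var_name='date', value_name='new_cases', ignore_index=False)
long_df['date'] = pd.to_datetime(long_df['date'], format='%m/%d/%y')

print(long_df)
```

Keeping the country as the index (`ignore_index=False`) is what lets the later merge with the population table join on the index.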


In [None]:
import bs4 as bs
import urllib.request
import pandas as pd
import re
from io import StringIO

source = urllib.request.urlopen('https://en.wikipedia.org/wiki/List_of_countries_by_population_(United_Nations)').read()
soup = bs.BeautifulSoup(source,'lxml')

table = soup.find_all('table')
# Wrap the HTML in StringIO: newer pandas versions deprecate passing a literal HTML string
population_df = pd.read_html(StringIO(str(table)))[3]

In [None]:
# Strip footnote markers and parenthetical notes, e.g. "[a]" or "(...)", from the country names
population_df['Country_Clean'] = population_df['Country/Territory'].apply(lambda x: re.sub(r"[\(\[].*?[\)\]]", "", x).strip())
population_df['Country'] = population_df['Country_Clean'].apply(country_name_lookup)

In [None]:
population_df_group = population_df.groupby(['Country'])['Population(1 July 2019)'].sum().reset_index()

In [None]:
#remove unnecessary columns 
#population_df_filtered = population_df[['Country','Population(1 July 2019)']].set_index(['Country'])
population_df_filtered = population_df_group.set_index(['Country'])

In [None]:
population_df_filtered.to_csv('population_df_filtered.csv')

In [None]:


#Merge the dataframes
country_cases_pop_df = pd.merge(country_cases_new_melted, population_df_filtered, how='inner', left_index=True, right_index=True)

#Normalize cases by population
country_cases_pop_df['Cases per 100,000'] = country_cases_pop_df['new_cases'] / (country_cases_pop_df['Population(1 July 2019)'] / 100000)

#convert the date column
country_cases_pop_df['date'] = pd.to_datetime(country_cases_pop_df['date'])
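
To make the normalization concrete, here is a small worked example of the per-100,000 calculation together with the seven-day rolling mean that the charts in this section use for smoothing. The population and case counts are hypothetical numbers chosen so the arithmetic is easy to check by hand.

```python
import pandas as pd

# Hypothetical country: 2,000,000 people and 14 days of new-case counts.
population = 2_000_000
new_cases = pd.Series(
    [100, 200, 300, 400, 500, 600, 700, 700, 700, 700, 700, 700, 700, 700],
    index=pd.date_range('2021-01-01', periods=14),
)

# New cases per 100,000 people: divide by the population expressed in units of 100,000.
cases_per_100k = new_cases / (population / 100_000)

# Seven-day moving average. The first six days are NaN because the window is not yet full.
rolling_avg = cases_per_100k.rolling(window=7).mean()

print(pd.DataFrame({'per_100k': cases_per_100k, 'rolling_7d': rolling_avg}))
```

On day seven the average is (5 + 10 + 15 + 20 + 25 + 30 + 35) / 7 = 20 cases per 100,000, and once the window contains only the flat 35-per-100,000 days the average settles at 35.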



In [None]:
df = country_cases_pop_df[country_cases_pop_df['date'] == country_cases_pop_df['date'].max()]
#df = country_cases_pop_df[country_cases_pop_df['date'] == dt.datetime(2021,1,22)]

# kpi_avg_country['text'] = kpi_avg_country.index.get_level_values(1) + '<br>' + \
#     'Avg. KPI ' + round(kpi_avg_country['KPI'],3).astype('str') + '<br>' + \
#     '# of Properties ' + round(kpi_avg_country['Kipsu Company ID'],0).astype('str')

fig20 = go.Figure(data=go.Choropleth(
    locations=df.index.map(country_abbr_lookup),
    z=df['Cases per 100,000'].astype(float),
    colorscale = 'Sunsetdark',
    #autocolorscale=True,
    #text=kpi_avg_country['text'], # hover text
    marker_line_color='white', # boundary lines between countries
    colorbar_title="Cases per 100,000 People"
))

fig20.update_layout(
    title_text='Cases per 100,000 People - '+dt.datetime.strftime(country_cases_pop_df['date'].max(),'%b %-d, %Y'),
    geo = dict(
        #scope='usa',
        projection_type='equirectangular'
        )#,width=1500, height=1000
)

fig20.show()

In [None]:
countries_highest_cases_per_capita = df[df['new_cases'] > 999].sort_values(by='Cases per 100,000', ascending=False).index.tolist()

In [None]:
#country_list = ['United Kingdom','United States','Spain', 'France', 'South Africa','Ireland']

fig21 = go.Figure()
for i in countries_highest_cases_per_capita[:10]:
  fig21.add_trace(go.Scatter(
      x=country_cases_pop_df.loc[i]['date'], 
      y=country_cases_pop_df.loc[i]['Cases per 100,000'].rolling(window=7).mean(),
      mode='lines',
      name=i))

fig21.update_layout(template='simple_white', hovermode='x')

fig21.show()

In [None]:
from plotly.subplots import make_subplots
import plotly.graph_objects as go

lst = countries_highest_cases_per_capita[:10]

# One subplot row per country; use len(lst) in case fewer than 10 countries pass the filter
fig22 = make_subplots(rows=len(lst), cols=1, shared_yaxes=True, subplot_titles=lst)

for j, i in enumerate(lst):
  fig22.add_trace(go.Scatter(
      x=country_cases_pop_df.loc[i]['date'], 
      y=country_cases_pop_df.loc[i]['Cases per 100,000'].rolling(window=7).mean(),
      mode='lines',
      marker_color='#000080',
      name=i), 
      row=j+1, col=1
      )
  
fig22.update_layout(height=1500, width=800, template='simple_white',
                  #title_text="Cases per 100,000 People per Country", 
                  showlegend=False)

# fig22.add_annotation(text= 'Sorted by countries with the highest number of cases per capita. Country must have more than 1,000 cases.',
#                   xref="paper", yref="paper",
#                   x=0, y=1.035, showarrow=False)

fig22.update_yaxes(range=[0, 100])
fig22.show()

# Export to HTML file
Finally, we can export just the charts as an HTML file. Plotly's interactive charts remain interactive in an HTML export.
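
As a rough sketch of the idea, the report is just a page header, a sequence of HTML fragments, and a footer written to one file. The example below uses only the standard library: `build_report` is a hypothetical helper (not part of Plotly), and the fragment strings stand in for what `fig.to_html(full_html=False, include_plotlyjs='cdn')` would return for a real figure.

```python
from pathlib import Path
import tempfile


def build_report(title, fragments):
    """Wrap HTML fragments in a single page. Hypothetical helper for illustration."""
    body = '\n'.join(fragments)
    return (
        '<!doctype html>\n'
        '<html lang="en">\n'
        f'<head><meta charset="utf-8"><title>{title}</title></head>\n'
        f'<body>\n<div class="container">\n{body}\n</div>\n</body>\n'
        '</html>\n'
    )


# Stand-ins for the output of fig.to_html(full_html=False, include_plotlyjs='cdn').
fragments = ['<h2>US National Data</h2>', '<div id="chart-1"></div>']

report_path = Path(tempfile.gettempdir()) / 'example_report.html'
for _ in range(2):
    # Mode 'w' truncates the file, so running this twice still leaves one document.
    with open(report_path, 'w', encoding='utf-8') as f:
        f.write(build_report('COVID Dashboard', fragments))

html = report_path.read_text(encoding='utf-8')
print(html)
```

Opening the file in append mode instead would add a second complete document on every re-run, which is why write mode is the safer default for a report that is regenerated from scratch.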

In [None]:

html_start = '''
<!doctype html>
<html lang="en">
  <head>

    <!-- Required meta tags -->
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">

    <!-- Bootstrap CSS -->
    <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/bootstrap@4.5.3/dist/css/bootstrap.min.css" integrity="sha384-TX8t27EcRE3e/ihU7zmQxVncDAy5uIKz4rEkgIXeMed4M0jlfIDPvg6uqKI2xXr2" crossorigin="anonymous">

    <!-- Font Awesome -->
    <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.7.0/css/font-awesome.min.css">

    <style type="text/css">
      
      h1,h2,h3 {
        padding-top: 50px;
      }

      .center {
        text-align: center;
      }

    </style>

    <title>COVID Dashboard</title>

  </head>

  <body>

  <div class="container">

      <h1>COVID-19 Dashboard</h1>
      <h4 style="color:grey"> '''+dt.datetime.strftime(dt.datetime.now(),'%b %-d, %Y')+'''</h4>
     
'''

html_end ='''

</div>

<!-- Optional JavaScript; choose one of the two! -->

    <!-- Option 1: jQuery and Bootstrap Bundle (includes Popper) -->
    <script src="https://code.jquery.com/jquery-3.5.1.slim.min.js" integrity="sha384-DfXdz2htPH0lsSSs5nCTpuj/zy4C+OGpamoFVy38MVBnE+IbbVYUew+OrCXaRkfj" crossorigin="anonymous"></script>
    <script src="https://cdn.jsdelivr.net/npm/bootstrap@4.5.3/dist/js/bootstrap.bundle.min.js" integrity="sha384-ho+j7jyWK8fNQe+A12Hb8AhRq26LrZ/JpcUGGOn+Y7RsweNrtN/tE3MoK7ZeZDyx" crossorigin="anonymous"></script>

    <!-- Option 2: jQuery, Popper.js, and Bootstrap JS
    <script src="https://code.jquery.com/jquery-3.5.1.slim.min.js" integrity="sha384-DfXdz2htPH0lsSSs5nCTpuj/zy4C+OGpamoFVy38MVBnE+IbbVYUew+OrCXaRkfj" crossorigin="anonymous"></script>
    <script src="https://cdn.jsdelivr.net/npm/popper.js@1.16.1/dist/umd/popper.min.js" integrity="sha384-9/reFTGAW83EW2RDu2S0VKaIzap3H66lZH81PoYlFhbGU+6BZp6G7niu735Sk7lN" crossorigin="anonymous"></script>
    <script src="https://cdn.jsdelivr.net/npm/bootstrap@4.5.3/dist/js/bootstrap.min.js" integrity="sha384-w1Q4orYjBQndcko6MimVbzY0tgp4pWB4lZ7lr30WKz0vr/aWKhXdBNmNb5D92v7s" crossorigin="anonymous"></script>
    -->

  </body>
</html>

'''


In [None]:
# Open in write mode so re-running this cell replaces the report instead of appending a second copy
with open('covid_report.html', 'w') as f:
    f.write(html_start)
    f.write('<h2>US National Data</h2>')
    f.write(fig0.to_html(full_html=False, include_plotlyjs='cdn'))
    f.write(fig1.to_html(full_html=False, include_plotlyjs='cdn'))
    f.write(fig2.to_html(full_html=False, include_plotlyjs='cdn'))
    f.write(fig3.to_html(full_html=False, include_plotlyjs='cdn'))
    f.write(fig10.to_html(full_html=False, include_plotlyjs='cdn'))
    f.write(fig4.to_html(full_html=False, include_plotlyjs='cdn'))
    f.write('<h2>World Data</h2>')
    f.write(fig20.to_html(full_html=False, include_plotlyjs='cdn'))
    f.write('<h3>Cases per 100,000 People per Country</h3>')
    f.write('<p>Sorted by countries with the highest number of cases per capita. Country must have at least 1,000 new daily cases. Charts show a seven-day moving average.</p>')
    f.write(fig22.to_html(full_html=False, include_plotlyjs='cdn'))
    f.write(html_end)