# Predicted Analysis on Global COVID-19 Dataset

**Matt Dannheisser** **Henry Luong**

COVID-19 has affected most of our lives at some point over the past year. The world is facing a pandemic, and companies are beginning to mass-produce vaccines to combat the virus. Our goal for this project is to predict how the future spread of COVID-19 will be affected by a vaccine, given that individuals in different countries have reported varying willingness to take one. Our model provides predictions by country using daily contagion data, IPSOS research on each country's likelihood of taking a vaccine, population data, and vaccine efficacy as reported by the pharmaceutical companies.

# Plotly Library

We decided to use the Plotly library because of its features for exploratory data analysis and its compatibility with Plotly's dashboard framework. The plotly Python library is an interactive, open-source plotting library that supports a wide variety of interactive, web-based visualizations, with in-depth guides for each chart style covering statistical, financial, geographic, scientific, and 3-dimensional use cases. A single line of code is often enough to create a chart, and charts are easy to annotate.

**NOTE**
As we start plotting graphs, please understand that interactive charts can take a while to load. Depending on your computer, certain charts require more processing power and can even make the browser unresponsive. This is a disadvantage of the library: rendering can be very slow and can occasionally freeze the entire Jupyter notebook.

https://plotly.com/

___

# JupyterDash Library

Although Plotly offers great charts and graphs, its main feature is its dashboard. Dash is Plotly's Python framework for building analytic web applications. Because Dash is written on top of Plotly, users can bind a user interface around their code to create an efficient platform for displaying their information. This allows users to view multiple graphs within the dashboard and explore how the attributes relate to each other. In this project, we perform a time-series analysis on different aspects of the dataset to help us build a model that predicts the future spread.

https://medium.com/plotly/introducing-jupyterdash-811f1f57c02e

# Plotly may be installed using pip...

$ pip install plotly==4.12.0

#### or conda.

$ conda install -c plotly plotly=4.12.0

# Install the jupyter-dash package using pip...

$ pip install jupyter-dash

#### or conda:

$ conda install -c conda-forge -c plotly jupyter-dash

# For use in JupyterLab, install the jupyterlab and ipywidgets packages using pip...

$ pip install jupyterlab "ipywidgets>=7.5"

#### or conda.

$ conda install jupyterlab "ipywidgets=7.5"

# Run the following commands to install the required JupyterLab extensions (note that this requires Node.js to be installed):
#### JupyterLab renderer support
jupyter labextension install jupyterlab-plotly@4.12.0

#### OPTIONAL: Jupyter widgets extension
jupyter labextension install @jupyter-widgets/jupyterlab-manager plotlywidget@4.12.0

#### If Plotly graphs do not render, run one of the following commands in your terminal to install the node package.

conda install -c conda-forge nodejs

conda install -c conda-forge/label/gcc7 nodejs

conda install -c conda-forge/label/cf201901 nodejs

conda install -c conda-forge/label/cf202003 nodejs

**NOTE**
If the graphs still do not show after installing one of the node packages, run the remaining install commands listed above.



## Data Sources

||COVID-19 Spread|Global Attitude on Vaccine|Country Population|
|------|------|------|------|
|**Description**|Daily new cases of COVID-19 by country, including confirmed total and daily case counts, all provided by CSSE at Johns Hopkins University.|Sampled likelihood of citizens to take a COVID-19 vaccine, based on survey research by the World Economic Forum.|The total population of each country, based on the latest United Nations Population Division estimates.|
|**Size**|10,655 Kb    |680 Kb     |Small table from web page     |
|**Location**|[Our World in Data](https://ourworldindata.org/covid-cases)|PDF provided in folder|[Population by Country (2020) - Worldometer](https://www.worldometers.info/world-population/population-by-country/)|
|**Format**|CSV|PDF|HTML|
|**Access Method**|Direct Access|Manually entered into pd.DataFrame|API|
|**Variables Used**| country, date, total cases, new daily cases| countries, varying levels of willingness to take vaccine| country, population|

- This analysis uses data pulled on 12/4/2020, with data points dated from 12/31/2019 to 11/29/2020. All data points past this range represent projected figures.
- Only countries covered by the Global Attitude on Vaccine study were included.

In [19]:
#!pip install lxml html5lib beautifulsoup4 PyPDF2 plotly pandas==1.1.0

In [20]:
import pandas as pd
import requests
import PyPDF2 as pdf
import numpy as np
import math
import plotly.express as plt
import plotly.graph_objects as go
import plotly.figure_factory as ff
from ipywidgets import widgets
from ipywidgets import interact, interact_manual
from datetime import date
import datetime as dt
import collections
import dash
from dash.dependencies import Output, Input
from dash.exceptions import PreventUpdate
import dash_html_components as html
import dash_core_components as dcc
import dash_table


In [21]:
pd.options.display.min_rows= 400
pd.options.display.float_format = '{:,.1f}'.format

## Data Collecting and Parsing

##### Country Population

In [22]:
# HTTP GET request for the country population table (read from the Worldometer page)
link= requests.get('https://www.worldometers.info/world-population/population-by-country')
population_df= pd.read_html(link.text, header= 0, index_col= 0)[0]
population_df.rename(columns={'Country (or dependency)': 'location', 'Population (2020)': 'population'}, inplace= True)

##### Global Attitude on Vaccine

The varying willingness levels for a country's citizens to take the vaccine will be binned into probabilities and used as weights with the population data to determine a country's demand for the vaccine.

- The PDF was not in a format that any of the PDF-reading libraries could parse, so the data was entered manually.

In [23]:
# The PDF was not in a format that any of the PDF-reading libraries could parse, so the data was entered manually.
# These survey shares let us estimate vaccine demand, which is combined with vaccine supply
# to build a projection model for daily infected cases.

# Builds df for countries' attitudes to COVID Vaccine
attitudes = pd.DataFrame({'Country': ['Global Average', 'China', 'Brazil', 'Australia', 'India',
                              'Malaysia', 'United Kingdom', 'South Korea', 'Saudi Arabia', 
                              'Peru', 'Canada', 'Argentina', 'Mexico', 'Japan', 'Spain', 
                              'Netherlands', 'Turkey', 'Belgium', 'Chile', 'Sweden', 
                              'United States', 'Germany', 'Italy', 'South Africa',
                              'France', 'Hungary', 'Poland', 'Russia'], 
        'Total Agree': [74, 97, 88, 88, 87, 85, 85, 84, 84, 79, 76, 76, 75, 75, 72, 
                              71, 70, 70, 70, 67, 67, 67, 67, 64, 59, 56, 56, 54],
        'Strongly Agree': [37, 38, 64, 59, 44, 35, 52, 27, 39, 48, 48, 47, 38, 24, 38,
                              38, 42, 34, 40, 34, 35, 36, 37, 29, 22, 19, 18, 19],
        'Somewhat Agree': [37, 59, 25, 28, 44, 51, 33, 58, 45, 31, 29, 29, 37, 51, 34,
                              33, 28, 36, 30, 33, 32, 31, 29, 35, 37, 37, 37, 34],
        'Somewhat Disagree': [15, 2, 8, 8, 9, 11, 9, 15, 12, 11, 13, 14, 13, 20, 17, 16,
                              14, 17, 14, 20, 17, 20, 17, 19, 21, 17, 27, 22],
        'Strongly Disagree': [12, 1, 4, 5, 4, 4, 7, 1, 4, 10, 11, 10, 12, 5, 11, 13, 16,
                              13, 16, 13, 16, 13, 17, 18, 20, 28, 18, 24],
        'Total Disagree': [26, 3, 12, 12, 13, 15, 15, 16, 16, 21, 24, 25, 25, 25, 28, 
                              29, 30, 30, 30, 33, 33, 33, 33, 36, 41, 44, 45, 47]})

attitudes['Totals']= attitudes[['Strongly Agree', 'Somewhat Agree',
       'Somewhat Disagree', 'Strongly Disagree']].sum(axis=1)
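
Because the survey figures were typed in by hand, a quick consistency check can help catch transcription slips. The sketch below uses three rows copied from the table above and flags any row where the four bins do not sum to roughly 100, or where the two "agree" bins do not match `Total Agree`. The 2-point tolerance is an assumption made here to allow for rounding in the published figures; it is not part of the original analysis.

```python
import pandas as pd

# Three rows copied from the manually entered survey table above.
sample = pd.DataFrame({
    'Country': ['Global Average', 'China', 'Brazil'],
    'Total Agree': [74, 97, 88],
    'Strongly Agree': [37, 38, 64],
    'Somewhat Agree': [37, 59, 25],
    'Somewhat Disagree': [15, 2, 8],
    'Strongly Disagree': [12, 1, 4],
})

# Assumed tolerance (percentage points) for rounding in the published figures.
TOLERANCE = 2

bins = ['Strongly Agree', 'Somewhat Agree', 'Somewhat Disagree', 'Strongly Disagree']
sample['Totals'] = sample[bins].sum(axis=1)

# Gap between the summed "agree" bins and the reported Total Agree column.
agree_gap = (sample['Strongly Agree'] + sample['Somewhat Agree'] - sample['Total Agree']).abs()
# Gap between the four bins combined and 100 percent.
total_gap = (sample['Totals'] - 100).abs()

suspect = sample[(agree_gap > TOLERANCE) | (total_gap > TOLERANCE)]
print(suspect[['Country', 'Totals']])  # an empty frame means no row looks suspicious
```

The same check could be run on the full `attitudes` frame before it is used as a weight.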

##### COVID-19 Spread

To compare the global attitude on vaccines with the global COVID-19 dataset, we read in the CSV and keep only the
countries that appear in both sources.

In [24]:
# Reads in population and COVID data for countries in the focus of this study
countries= ['China', 'Brazil', 'Australia', 'India',
            'Malaysia', 'United Kingdom', 'South Korea', 'Saudi Arabia', 
            'Peru', 'Canada', 'Argentina', 'Mexico', 'Japan', 'Spain', 
            'Netherlands', 'Turkey', 'Belgium', 'Chile', 'Sweden', 
            'United States', 'Germany', 'Italy', 'South Africa',
            'France', 'Hungary', 'Poland', 'Russia']

covid_df= pd.read_csv('Corona Virus Cases.csv')
covid_df= covid_df[covid_df['location'].isin(countries)]
covid_df= covid_df[['location','date', 'total_cases','new_cases','reproduction_rate','stringency_index']]

# Merges population data with COVID data, removing unused countries
population_df= population_df[population_df['location'].isin(countries)]
covid_df= covid_df.merge(population_df[['location','population']],on= 'location')
covid_df['date']= pd.to_datetime(covid_df['date'])
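
An inner merge on `location` silently drops any country whose name is spelled differently in the two sources. The sketch below shows one way to surface such mismatches using the `indicator` option of `merge`. The miniature frames are hypothetical: the two populations are the values shown later in this notebook, and the missing South Korea row is invented purely to illustrate the check.

```python
import pandas as pd

# Hypothetical miniature inputs for illustration only.
study = pd.DataFrame({'location': ['Spain', 'France', 'South Korea']})
population_sample = pd.DataFrame({
    'location': ['Spain', 'France'],      # South Korea deliberately absent
    'population': [46754778, 65273511],
})

# A left merge with indicator=True records which side each row came from.
merged = study.merge(population_sample, on='location', how='left', indicator=True)
missing = merged.loc[merged['_merge'] == 'left_only', 'location'].tolist()
print(missing)  # countries in the study list with no population row
```

An empty list would suggest that every study country was matched.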

## Data Exploration

In [25]:
print(covid_df.dtypes)
covid_df.describe()

location                     object
date                 datetime64[ns]
total_cases                 float64
new_cases                   float64
reproduction_rate           float64
stringency_index            float64
population                    int64
dtype: object


Unnamed: 0,total_cases,new_cases,reproduction_rate,stringency_index,population
count,7955.0,8498.0,6770.0,8337.0,8540.0
mean,516748.5,5665.1,1.2,56.5,177973699.9
std,1380615.0,14873.7,0.5,26.2,366239578.0
min,1.0,-766.0,0.2,0.0,9660351.0
25%,10771.5,40.0,0.9,44.9,32365999.0
50%,89527.0,633.0,1.1,63.4,51269185.0
75%,338607.5,3785.2,1.3,75.9,126476461.0
max,13246651.0,207913.0,6.7,100.0,1439323776.0


In [26]:
# These negatives in new_cases appear to be reporting issues
negatives= covid_df[covid_df['new_cases']<0]
negatives

Unnamed: 0,location,date,total_cases,new_cases,reproduction_rate,stringency_index,population
2658,Spain,2020-04-19,193252.0,-713.0,0.9,85.2,46754778
2694,Spain,2020-05-25,235400.0,-372.0,0.8,75.5,46754778
3037,France,2020-06-03,151325.0,-766.0,1.1,72.2,65273511
4330,Italy,2020-06-20,238011.0,-148.0,0.9,55.6,60461826
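
The notebook leaves these negative values in place. If they needed to be cleaned before modeling, one possible approach (an assumption here, not what the analysis above does) is to treat them as missing and fill them from neighbouring days. In the sketch below only the -713 figure comes from the table; the surrounding values are made up.

```python
import pandas as pd

# Hypothetical three-day window around Spain's reported -713 on 2020-04-19;
# the neighbouring values are invented for illustration.
new_cases = pd.Series(
    [500.0, -713.0, 400.0],
    index=pd.to_datetime(['2020-04-18', '2020-04-19', '2020-04-20']),
)

# Treat negative counts as missing, then fill them linearly from their neighbours.
cleaned = new_cases.mask(new_cases < 0).interpolate(method='linear')
print(cleaned)
```

Clipping at zero with `new_cases.clip(lower=0)` would be a simpler alternative, at the cost of understating that day's count.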


In [27]:
# Each country's total number of infections has risen steadily over time, as the line plot shows. Some
# have risen faster than others, with India overtaking Brazil on September 5th.

plt.line(covid_df, x='date', y= 'total_cases',
         color= 'location', title= 'Total Cases')

In [28]:
covid_df['cases_norm_pop']= covid_df['total_cases']/covid_df['population']
plt.line(covid_df, x='date', y= 'cases_norm_pop',
         color= 'location', title= 'Cases Normalized by Population')


# The smaller countries are hard to distinguish at this scale, so we can
# log-transform the data to inspect each country individually.

# Uncomment below to view the transformed data
# covid_df['log_total_cases'] = np.log10(covid_df['total_cases'])

# fig = plt.line(covid_df, x="date", y="log_total_cases", color="location",
#               line_group="location", hover_name="location")
# fig.show()

In [29]:
# To determine the infection rate, we divide the number of infections by the number of people at risk (the population) in each country.
# Certain countries, such as Chile, had a high infection rate around June that quickly tapered off, while
# Belgium saw a drastic increase in infections in October.

covid_df['cases_norm_pop']= covid_df['total_cases']/covid_df['population']
plt.line(covid_df, x='date', y= 'cases_norm_pop',
         color= 'location')

In [30]:
plt.histogram(covid_df[covid_df['date']>'2020-07-01'],x= 'new_cases', marginal='rug',
             title= 'Histogram with Rug plot of New Cases')

In [31]:
# Average new_cases per country. Interestingly, the average for China is significantly
# lower than for most countries, even though the outbreak originated there.
group_country = covid_df.groupby(['location']).mean().reset_index()
group_country = group_country.sort_values('new_cases', ascending=False)

plt.bar(group_country, x='location', y='new_cases',
        title='Bar Plot of Average of New Cases Per Country')



In [32]:
countries= pd.unique(covid_df.sort_values('total_cases',ascending=False)['location'])
plt.scatter(covid_df[covid_df['location'].isin(countries[:4])],
            x= 'date',y= 'new_cases', color= 'location')

In [33]:
plt.scatter(covid_df[covid_df['location'].isin(countries[4:10])],
            x= 'date',y= 'new_cases', color= 'location')

In [34]:
# There is a lone outlier for Chile on June 18th, 2020, when the number of new cases
# jumped to about 36,000.
plt.scatter(covid_df[covid_df['location'].isin(countries[10:15])],
            x= 'date',y= 'new_cases', color= 'location')

In [35]:
# There is also a lone outlier for China on February 13th, 2020, with about 15,000 new cases.

plt.scatter(covid_df[covid_df['location'].isin(countries[15:])],
            x= 'date',y= 'new_cases', color= 'location')

## Data Manipulation

## Demand for Vaccine

Having explored how each attribute of the global COVID-19 dataset varies by country over time, we can now build a model that predicts each country's future spread based on these features. By merging the global attitude survey with the COVID-19 dataset, we can estimate each country's vaccine demand by assigning a probability of vaccination to each survey response.

In [36]:
#assigning a probability of getting the vaccine to each of the bins
probability= {'Strongly Agree':.95, 'Somewhat Agree':.60, 'Somewhat Disagree':.20, 'Strongly Disagree':.01}

def assign_prob(covid_df, attitudes, probability):
    """
    Merges the population and attitudes data, assigns the probability that an individual within each attitude bin 
    will get the COVID vaccine if it is available, and sums the total vaccine demand for each country.
    
    Returns: A dataframe of countries and the number of citizens in each who are willing to get a COVID vaccine
    """
    pop_df= covid_df[['location','population']].drop_duplicates()
    pop_df= pop_df.merge(attitudes,how='inner',left_on='location', right_on= 'Country', copy= False)
    for col in probability:
        pop_df[col]= pop_df['population']*(pop_df[col]/100)*probability[col]
    pop_df['Vaccine Demand']= pop_df[['Strongly Agree','Somewhat Agree', 'Somewhat Disagree', 'Strongly Disagree']].sum(axis=1)
    return pop_df[['location','population','Vaccine Demand']].round()
demand_df= assign_prob(covid_df, attitudes, probability)
demand_df

Unnamed: 0,location,population,Vaccine Demand
0,Argentina,45195774,29354655.0
1,Australia,25499884,18997414.0
2,Belgium,11589623,6655920.0
3,Brazil,212559417,164606013.0
4,Canada,37742154,24800369.0
5,Chile,19116201,11270912.0
6,China,1439323776,1035017727.0
7,Germany,83783942,47698198.0
8,Spain,46754778,28057542.0
9,France,65273511,31004918.0
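
As a hand check of the weighting, the Argentina row can be reproduced with plain arithmetic. The sketch below uses only figures already shown in this notebook: Argentina's population, its four survey shares, and the `probability` dictionary.

```python
# Population and survey shares (percent) for Argentina, taken from the tables above.
population = 45195774
shares = {'Strongly Agree': 47, 'Somewhat Agree': 29,
          'Somewhat Disagree': 14, 'Strongly Disagree': 10}

# Probability of vaccination assigned to each survey bin (same values as above).
probability = {'Strongly Agree': .95, 'Somewhat Agree': .60,
               'Somewhat Disagree': .20, 'Strongly Disagree': .01}

# Demand = sum over bins of population * share * probability.
demand = sum(population * (shares[k] / 100) * probability[k] for k in probability)
print(round(demand))  # 29354655, matching the Argentina row above
```

The combined weight works out to about 0.65, so roughly 65% of Argentina's population is counted as vaccine demand under these assumed probabilities.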


## Supply of Vaccine

**Way forward**

Let's begin with the WHO's Fair Allocation Framework (FAF), which promises equal distribution of COVID vaccines
based upon each country's population. The FAF breaks the distribution of vaccines into stages. Stage 1 guarantees that 20% of
a country's population will have access to the vaccine. Stage 2 promises delivery to the wider population. We can use
this as the beginning of our framework for determining supply. Next, we can combine this with our own best estimates for
global supply, using data such as Pfizer's promise to make 1.3 billion vaccines in 2021.

In [37]:
from datetime import timedelta

#variables to be made interactive later
stage_1_start_date= '2020-12-01'
stage_2_start_date= '2021-04-01'
boost_to_stage_2_production= .3

def assign_daily_supply(population_df, date_range, stage,daily_supply_df1=None):
    """
    Function to be used inside of calc_supply to assign daily supplies to the respective countries within each stage
    """
    daily_supply_df= pd.DataFrame(columns= population_df.index)
    daily_supply_df['Date']= date_range
    for c in daily_supply_df.columns[0:-1]:
        if stage == 1:
            daily_supply_df[c]= (daily_supply_df['Date'].apply(
                lambda x: x-min(date_range)) / np.timedelta64(1,'D')).multiply(population_df.loc[c,'daily supply stage 1'])
        if stage== 2:
            daily_supply_df[c]= (daily_supply_df['Date'].apply(
                lambda x: x-min(date_range)) / np.timedelta64(1,'D')).multiply(population_df.loc[c,'daily supply stage 2'])
            daily_supply_df[c]= daily_supply_df[c] + daily_supply_df1[c]

    return daily_supply_df

def calc_supply(population_df, stage_1, stage_2, prod_boost_2):
    """
    Calculates the supply of vaccine to each country based upon WHO FAF guidance and adjusted by input variables.
   
    Input Variables: 
    population_df= country population data
    stage_1= The start date of stage 1 of the FAF
    stage_2= The start date of stage 2 of the FAF
    prod_boost_2= The boost to the production rate of vaccines in stage 2
    
    Returns: A data frame representing the supply of vaccines to each country.
    """
    population_df= population_df[['location','population']].drop_duplicates()
    stage_1_range= pd.date_range(stage_1, stage_2-timedelta(days=1)).date
    stage_2_range= pd.date_range(stage_2, periods= 365).date
    # We assume equal and stable distribution among countries until 20% is reached.
    # We assume a surge in supply chains once stage 2 is reached.
    days= len(stage_1_range)
    population_df['daily supply stage 1']= (population_df['population'] * .2) / days
    population_df['daily supply stage 2']= (population_df['daily supply stage 1']*(1+prod_boost_2))
    population_df= population_df.set_index('location')
    #print(population_df.round())

    daily_supply_df1= assign_daily_supply(population_df, stage_1_range, 1)
    daily_supply_df2= assign_daily_supply(population_df, stage_2_range, 2, daily_supply_df1.iloc[-1])
    daily_supply_df= pd.concat([daily_supply_df1, daily_supply_df2])

    return daily_supply_df

#unpivot supply df and merge with demand df
daily_supply_df= calc_supply(covid_df, date.fromisoformat(stage_1_start_date), date.fromisoformat(stage_2_start_date), boost_to_stage_2_production)
daily_supply_df= pd.melt(daily_supply_df, id_vars= 'Date', value_name='Supply') 
supply_demand_df= daily_supply_df.merge(demand_df, on='location')
supply_demand_df= supply_demand_df[['Date', 'location', 'population', 'Supply', 'Vaccine Demand']]
supply_demand_df['Date']= pd.to_datetime(supply_demand_df['Date'])
supply_demand_df= supply_demand_df.set_index('Date')
supply_demand_df.head()
    
   

Unnamed: 0_level_0,location,population,Supply,Vaccine Demand
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
2020-12-01,Argentina,45195774,0.0,29354655.0
2020-12-02,Argentina,45195774,74703.8,29354655.0
2020-12-03,Argentina,45195774,149407.5,29354655.0
2020-12-04,Argentina,45195774,224111.3,29354655.0
2020-12-05,Argentina,45195774,298815.0,29354655.0
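
The daily increments in this output can be reproduced by hand from the stage dates and the 20% stage 1 target. The sketch below repeats the arithmetic of `calc_supply` for Argentina using only the standard library. The variable names are local to this example.

```python
from datetime import date

# Same inputs as the cell above.
stage_1_start = date.fromisoformat('2020-12-01')
stage_2_start = date.fromisoformat('2021-04-01')
boost = 0.3
population = 45195774  # Argentina

# Stage 1 runs from its start date up to the day before stage 2 begins.
days = (stage_2_start - stage_1_start).days
daily_stage_1 = population * 0.2 / days          # 20% of the population spread over stage 1
daily_stage_2 = daily_stage_1 * (1 + boost)      # stage 2 rate after the production boost

print(days, round(daily_stage_1, 1), round(daily_stage_2, 1))
# 121 74703.8 97114.9
```

The stage 1 rate of 74,703.8 doses per day matches the step between consecutive Argentina rows in the output above.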


In [38]:
plt.scatter(supply_demand_df, y= 'Supply', color= 'location')

In [39]:
covid_df['date']= pd.to_datetime(covid_df['date'])

In [None]:
%%time
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.statespace.sarimax import SARIMAX
import warnings
warnings.simplefilter("ignore")

def arima_returns(df,p=1,d=1,q=1,num_of_forecasts=50):
    '''
    Take the covid_df and ARIMA function parameters and compute ARIMA forecast for each country
    Returns: pandas dataframe of forecasted values.
    '''

    df_new_cases= df.pivot(index= 'date',columns= 'location', values= 'new_cases').iloc[0:-10]# dropping last days due to na values
    df_exog= df.pivot(index= 'date', columns= 'location', values= 'stringency_index').fillna(0)
    forecast_df= pd.DataFrame()
    for col in df_new_cases.columns:
        new_cases_series= df_new_cases[col]
        exog_series= df_exog[col] # will use if forecasting for existing data
        country_forecast= SARIMAX(endog= new_cases_series,trend='t', order= (p,d,q))
        country_forecast= country_forecast.fit().forecast(num_of_forecasts)
        country_forecast.name= col
        forecast_df= forecast_df.append(country_forecast)
    forecast_df=forecast_df.transpose().melt(ignore_index=False, var_name= 'location', value_name= 'new_cases')
    forecast_df['new_cases']= forecast_df['new_cases'].round(0)
    
    return forecast_df
projected_daily_cases_df= arima_returns(covid_df,30,0,5,438)
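
Before fitting, `arima_returns` reshapes the long-format frame into one column per country, and afterwards melts the forecasts back to long format. The sketch below illustrates just that reshaping on a made-up four-row sample, so the shapes are easy to follow. The values are hypothetical.

```python
import pandas as pd

# Hypothetical long-format sample with the same columns arima_returns reads.
long_df = pd.DataFrame({
    'date': pd.to_datetime(['2020-11-01', '2020-11-02'] * 2),
    'location': ['Spain', 'Spain', 'France', 'France'],
    'new_cases': [10.0, 12.0, 20.0, 18.0],
})

# Long -> wide: one row per date, one column per country.
wide = long_df.pivot(index='date', columns='location', values='new_cases')

# Wide -> long again, keeping the date index (ignore_index requires pandas >= 1.1).
back = wide.melt(ignore_index=False, var_name='location', value_name='new_cases')
print(wide.shape, back.shape)
```

Each column of `wide` is then a single time series that can be passed to a model on its own.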


In [None]:
plt.scatter(covid_df[covid_df['location'].isin(countries[:4])],
            x= 'date',y= 'new_cases', color= 'location')

In [None]:
plt.scatter(projected_daily_cases_df[projected_daily_cases_df['location'].isin(countries[:4])],
            y= 'new_cases', color= 'location')

In [None]:
# Lookup table mapping the national immunity level to its effect on projected daily cases
national_immunity= pd.DataFrame({'Immunity': [.0,.1,.2,.3,.4,.5,.6,.7,.8,.9], 'Affect on Daily Cases': [1,1,.9,.8, .6, .4, .2, .05, .01,.001]})
national_immunity
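
The table above is keyed by immunity level in steps of 0.1 from 0.0 to 0.9. Looking a value up by exact float equality can fail when a rounded immunity share falls outside that range, for example 1.0. The sketch below shows one possible lookup that works on integer tenths and clips to the table bounds. The helper name `immunity_factor` and the clipping behaviour are assumptions for illustration, not part of the original model.

```python
import numpy as np
import pandas as pd

# Same lookup table as above.
national_immunity = pd.DataFrame({
    'Immunity': [.0, .1, .2, .3, .4, .5, .6, .7, .8, .9],
    'Affect on Daily Cases': [1, 1, .9, .8, .6, .4, .2, .05, .01, .001],
})

def immunity_factor(immune_share):
    """Hypothetical helper: map immunity shares to the table's multiplier."""
    factors = national_immunity['Affect on Daily Cases'].to_numpy()
    # Convert shares to integer tenths so no float equality test is needed.
    tenths = np.rint(np.asarray(immune_share, dtype=float) * 10).astype(int)
    # Shares below 0 or above 0.9 fall back to the first or last table row.
    tenths = np.clip(tenths, 0, len(factors) - 1)
    return factors[tenths]

print(immunity_factor([0.04, 0.26, 0.52, 1.0]))
```

With this approach a share of 1.0 reuses the 0.9 row instead of raising an error.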

In [None]:
supply_demand_df.head()

In [None]:
import datetime
def get_total_cases(covid_df, projected_daily_cases_df, supply_demand_df):
    
    day_totals= covid_df.pivot(index= 'date', columns= 'location', values= 'total_cases').iloc[0:-10] #drop last due to nan values
    working_projections= projected_daily_cases_df.pivot(columns= 'location', values= 'new_cases')    
    
    for index, col in working_projections.iterrows():
        day_totals.loc[index]= day_totals.loc[index-datetime.timedelta(days=1)] + col
        
    day_totals= day_totals.melt(var_name= 'location', value_name='total_cases', ignore_index= False).reset_index()
    day_totals= day_totals.merge(projected_daily_cases_df.reset_index(), 
                                 left_on= ['location','date'], right_on= ['location','index'])[['date','location','total_cases','new_cases']]
    
    return day_totals
total_cases_pre_vaccine= get_total_cases(covid_df, projected_daily_cases_df, supply_demand_df)
total_cases_pre_vaccine.head()

In [None]:
%%time
from datetime import timedelta
def get_total_cases(covid_df, projected_daily_cases_df, supply_demand_df):
    
    day_totals= covid_df.pivot(index= 'date', columns= 'location', values= 'total_cases').iloc[0:-10] #drop last due to nan values
    working_projections= projected_daily_cases_df.pivot(columns= 'location', values= 'new_cases')    
    supply= supply_demand_df.pivot(columns= 'location', values= 'Supply')
    demand= supply_demand_df.pivot(columns= 'location', values= 'Vaccine Demand')
    population= supply_demand_df.drop_duplicates(['location','population'])[['location','population']]
    population= population.set_index('location')['population']
    
    for i in working_projections.index:
        if i < date.fromisoformat(stage_1_start_date):
            day_totals.loc[i]= day_totals.loc[i-timedelta(days=1)] + working_projections.loc[i]
        else:
            vacinated= pd.concat([supply.loc[i], demand.loc[i]],axis=1).min(axis=1)
            previous_day= day_totals.loc[i-timedelta(days=1)]
            immunity_over_population= round((previous_day + vacinated ) / population,1)
            immunity_factor= []
            for item in immunity_over_population:
                val= national_immunity[national_immunity['Immunity']== item]['Affect on Daily Cases'].values[0]
                immunity_factor.append(val)
            
            working_projections.loc[i]= round(working_projections.loc[i] * immunity_factor,0)
            day_totals.loc[i]= day_totals.loc[i-timedelta(days=1)] + working_projections.loc[i]
        
    day_totals= day_totals.melt(var_name= 'location', value_name='total_cases', ignore_index= False).reset_index()
#    day_totals= day_totals.merge(projected_daily_cases_df.reset_index(), 
#                                 left_on= ['location','date'], right_on= ['location','index'])[['date','location','total_cases','new_cases']]
    day_totals['new_cases']= day_totals[['location','total_cases']].groupby('location').diff()

    return day_totals
total_cases_vaccine= get_total_cases(covid_df, projected_daily_cases_df, supply_demand_df)
total_cases_vaccine

In [None]:
plt.line(total_cases_vaccine,
            y= 'total_cases', x= 'date', color= 'location')

In [None]:
covid_df[covid_df['location']=='United States'].tail()

## Dashboard Implementation

#### 1. Dash components
The Dash components make up the "layout" of the app and describe what the application will look like. We build the layout by importing the dash_core_components and dash_html_components libraries. The layout forms the frame of the dashboard, in which we can view the data exploration as well as the projected total and new COVID-19 cases for each country. It is essentially an interactive widget that lets viewers switch between countries of their choosing, plot them, and see how the values change day by day from January 2020 to the present. We use the dropdown component documented below:

http://dash.plotly.com/dash-core-components/dropdown

The documentation explains step by step how to implement the dropdown for a cleaner-looking dashboard.

#### 2. App Callback
After the Dash components are created, the app callbacks describe the interactivity of the application through Python functions. Dash automatically calls a callback function whenever one of its input components' properties changes. This step is essential for a functional dashboard. The documentation can be found below:

https://dash.plotly.com/basic-callbacks
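
The figure callbacks in this notebook boil down to a plain function that turns row records into plotly trace dictionaries, so the logic can be tried without starting a Dash server. The sketch below is a simplified stand-in for that step. The helper name `records_to_traces` and the sample records are hypothetical.

```python
import collections

def records_to_traces(records, field):
    """Hypothetical helper: group row dicts by country into scatter trace dicts."""
    traces = collections.defaultdict(lambda: {'mode': 'markers', 'x': [], 'y': []})
    for row in records:
        trace = traces[row['location']]
        trace['name'] = row['location']
        trace['x'].append(row['date'])    # x axis: date
        trace['y'].append(row[field])     # y axis: the field chosen in the dropdown
    # A dict with a 'data' list is the figure shape the callbacks return.
    return {'data': list(traces.values())}

# Made-up records in the shape produced by DataFrame.to_dict('records').
records = [
    {'location': 'Spain', 'date': '2020-11-01', 'new_cases': 10},
    {'location': 'France', 'date': '2020-11-01', 'new_cases': 20},
    {'location': 'Spain', 'date': '2020-11-02', 'new_cases': 12},
]
figure = records_to_traces(records, 'new_cases')
print([t['name'] for t in figure['data']])
```

In the dashboard, Dash would call a function like this with the stored records and the selected field each time either dropdown changes.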

#### 3. How to Utilize the Dashboard

Once the dashboard has been created, users can interactively choose which countries to view via the multi-select dropdown component. With countries selected, users can switch between the fields the graph offers (total_cases, new_cases, stringency_index, and reproduction_rate). The top plot shows recorded cases from January 2020 to the present. The bottom plot shows the forecast of total and new cases per country.


In [None]:
external_stylesheets = ['https://codepen.io/chriddyp/pen/bWLwgP.css'] #to create layouts

app = dash.Dash(__name__, external_stylesheets=external_stylesheets) #starting the app


colors = {
    'background': '#111111',
    'text': '#7FDBFF'
}

countrieslist = set(total_cases_vaccine['location']) #unique list of countries

#dashboard component within the app layout
app.layout = html.Div(style={'backgroundColor': 'rgb(50, 50, 50)'}, children=[ 
    #header
    html.H1("Data Exploration on COVID-19", style={'text-align': 'center', 'color': 'white'}), 
    #store data in the browser
    dcc.Store(id='memory-output'),
    #dropdown components
    dcc.Dropdown(id='memory-countries', options=[
        {'value': x, 'label': x} for x in countrieslist
        ], multi=True, value=['United States','France','India','Brazil']),  # defaults must be among the dropdown options
    dcc.Dropdown(id='memory-field', options=[
        {'value': 'total_cases', 'label': 'total_cases'},
        {'value': 'new_cases', 'label': 'new_cases'},
        {'value': 'reproduction_rate', 'label': 'reproduction_rate'},
        {'value': 'stringency_index', 'label': 'stringency_index'},
    ], value='new_cases'),
    #create graph name
    html.Div([
        dcc.Graph(id='memory-graph')
    ]),
    html.Div(style={'backgroundColor': 'rgb(50, 50, 50)'}, children=[
    
    html.H1("Projected Cases on COVID-19", style={'text-align': 'center', 'color': 'white'}), 
    dcc.Store(id='memory-output2'),
    dcc.Dropdown(id='memory-countries2', options=[
        {'value': x, 'label': x} for x in countrieslist
        ], multi=True, value=['United States','France','India','Brazil']),
    dcc.Dropdown(id='memory-field2', options=[
        {'value': 'total_cases', 'label': 'total_cases'},
        {'value': 'new_cases', 'label': 'new_cases'},
    ], value='new_cases'),
    html.Div([
        dcc.Graph(id='memory-graph2')#,
    ]),
    ])
])

#app callback for multidrop of countries that will output to the data
@app.callback(Output('memory-output', 'data'),
              Input('memory-countries', 'value'))
def filter_countries(countries_selected):
    if not countries_selected:
        # Return all the rows on initial load/no country selected.
        return covid_df.to_dict('records')

    filtered = covid_df.query('location in @countries_selected')

    return filtered.to_dict('records')


#app callback for single dropdown of features
@app.callback(Output('memory-graph', 'figure'),
              Input('memory-output', 'data'),
              Input('memory-field', 'value'))
def on_data_set_graph(data, field):
    if data is None:
        raise PreventUpdate

    aggregation = collections.defaultdict(
        lambda: collections.defaultdict(list)
    )

    for row in data:

        a = aggregation[row['location']]

        a['name'] = row['location']
        a['mode'] = 'markers'

        a['y'].append(row[field])
        a['x'].append(row['date'])

    return {
        'data': [x for x in aggregation.values()]
    }

#app callback for the second graph multi-dropdown of countries.
@app.callback(Output('memory-output2', 'data'),
              Input('memory-countries2', 'value'))
def filter_countries2(countries_selected):
    if not countries_selected:
        # Return all the rows on initial load/no country selected.
        return total_cases_vaccine.to_dict('records')

    filtered = total_cases_vaccine.query('location in @countries_selected')

    return filtered.to_dict('records')

#app callback for the second graph single dropdown of total and new_cases projections.
@app.callback(Output('memory-graph2', 'figure'),
              Input('memory-output2', 'data'),
              Input('memory-field2', 'value'))
def on_data_set_graph2(data, field):
    if data is None:
        raise PreventUpdate

    aggregation = collections.defaultdict(
        lambda: collections.defaultdict(list)
    )

    for row in data:

        a = aggregation[row['location']]

        a['name'] = row['location']
        a['mode'] = 'markers'

        a['y'].append(row[field])
        a['x'].append(row['date'])

    return {
        'data': [x for x in aggregation.values()]
    }


# Once the callbacks are defined, run the server to display the dashboard from the Jupyter notebook
if __name__ == '__main__':
    app.run_server(debug=False)