## Initialization

Import packages and functions.

In [2]:
from __future__ import print_function
%precision %.2f
%matplotlib inline
# Interactive widget for plot
from ipywidgets import interact, interactive, fixed, interact_manual
import ipywidgets as widgets

# Read in functions.py
import functions
import graphic

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import re

# Plotly
import plotly.express as px
import plotly.graph_objects as go
plt.rcParams["figure.figsize"] = (18, 10)  # plot size

import datetime

# # interactive dataframe
import itables.interactive
from itables import show

# dplyr-style for python
from dppd import dppd
dp, X = dppd()

# display all results from cell
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

import warnings
warnings.filterwarnings('ignore')


## Acknowledgements

This is a spin-off version of "corona-calculator" from **Element AI** (https://corona-calculator.herokuapp.com/).

In this version, I have revised the original and adapted it to my own analysis of the situation in **Germany**. Please feel free to port the code and change it to fit your requirements (*for example*, to analyze the situation in other countries).

I also use the Robert Koch Institute's estimate for a "no social-distancing scenario" in Germany as a benchmark for the social distancing simulation. The report states that, as of 19 Mar 2020, the number of active cases would rise to about 10 million within 3 months if no intervention were made. (https://www.iamexpat.de/expat-info/german-expat-news/rki-coronavirus-could-infect-10-million-people-germany)

**1. Data inputs**
 - Include the "incubation period" as a factor to capture cases that are already infected and will still be reported after the lockdown starts.
 
 - Transmission rate and basic reproduction number (R0) are recalculated as the means of all related studies in the MIDAS network data.
 
 - Recovery rate is derived from the "time from symptom onset to recovery" reported in research from Singapore and China (using lognormal parametric survival methods and the ratio of cumulative recovered/deaths to cumulative infected at time t).
 
 - Confirmed / Recovered / Death time series are gathered directly from the Johns Hopkins University repository instead of merging daily reports.
 
 - Age cohort mortality rates are specific to **Germany** (age distribution of the population from https://www.populationpyramid.net/germany/2019/).
 
**2. Methodology**
 - SEIRS model with **vital dynamics** instead of SIR (as of 25.03, the original authors appear to be applying a SEIR model as well).
     - A stands for "Asymptomatic": individuals who are infected but show no symptoms, act as carriers, and finally move directly to the R (recovered) state
<!--      - Q stands for "Quarantined" - individuals that are either susceptible or exposed or infected to the diseases but move to quarantined (isolation) -->
     - The second S stands for "Susceptible" again: there are reports that recovered patients may not have had enough time to develop strong immunity to the disease, and passive immunity has not been proven to work in all recovered cases
     - Vital dynamics means incorporating the birth and death rates of the population into the equations

## Literature Reviews


1. **SIR model:**
![SIR Model](https://upload.wikimedia.org/wikipedia/commons/8/8a/SIR.PNG)
![SIR func](http://idmod.org/docs/general/_images/math/7edd99664ee58dde174cfe47bf51ade942786541.png)

    Where N (population) = S + I + R.
    
    The model assumes:

    - The population size is fixed (i.e., no births, deaths due to disease, or deaths by natural causes)
    - Incubation period of the infectious agent is instantaneous
    - Duration of infectivity is same as length of the disease
    - Completely homogeneous population with no age, spatial, or social structure
    The crucial factor governing disease spread is R0 (the basic reproduction number), which is the **average number of people that one infected person infects** in a fully susceptible population. It depends on the infection rate **β** and the recovery rate **γ**.

        **β = Probability of transmission x Number of contacts**

        **R0 = Probability of transmission x Number of Contacts per day x Number of infectious days**


2. **SEIR model:**
![SEIR Model](https://upload.wikimedia.org/wikipedia/commons/3/3d/SEIR.PNG)
![SEIR func](http://idmod.org/docs/general/_images/math/5c34ba7654b6b1031ac83c60ea98007456d22ee3.png)

With vital dynamics (birth + death rate)

![SEIR func vital](http://idmod.org/docs/general/_images/math/7a0619d75a08582ad67f21d3a0ffb938b8576920.png)

    Where N = S + E + I + R
    
3. **SEIRS model:**
![SEIRS func](http://idmod.org/docs/general/_images/math/731b07cdd61b5a6bf4093e3cb6c18da4c9be8c97.png)

![SEIRS func_vital](http://idmod.org/docs/general/_images/math/b5eb231f8a3d8adb148e9f0ffbdec2ad4295314e.png)
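
The SEIRS model with vital dynamics can be sketched numerically as follows. This is a minimal forward-Euler illustration and not the solver used in `functions.py`. The symbol names (β transmission rate, σ incubation rate, γ recovery rate, ω rate of immunity loss, μ birth rate, ν natural death rate) follow the common textbook convention and are an assumption here, as are all parameter values, which are illustrative and not fitted to any data.

```python
def seirs_step(state, beta, sigma, gamma, omega, mu, nu, dt):
    """Advance (S, E, I, R) by one forward-Euler step of length dt (days)."""
    s, e, i, r = state
    n = s + e + i + r
    new_infections = beta * s * i / n
    ds = mu * n - new_infections + omega * r - nu * s
    de = new_infections - sigma * e - nu * e
    di = sigma * e - gamma * i - nu * i
    dr = gamma * i - omega * r - nu * r
    return (s + ds * dt, e + de * dt, i + di * dt, r + dr * dt)


def simulate_seirs(initial_state, days, dt=0.1, **rates):
    """Return the list of daily (S, E, I, R) tuples, starting with day 0."""
    steps_per_day = int(round(1 / dt))
    state = initial_state
    trajectory = [state]
    for _ in range(days):
        for _ in range(steps_per_day):
            state = seirs_step(state, dt=dt, **rates)
        trajectory.append(state)
    return trajectory


# Illustrative values only (assumptions, not estimates from this notebook)
rates = dict(
    beta=0.5,            # transmissions per infectious person per day
    sigma=1 / 5,         # 1 / incubation period in days
    gamma=1 / 10,        # 1 / infectious period in days
    omega=1 / 365,       # 1 / duration of immunity in days
    mu=1 / (80 * 365),   # per-capita birth rate per day
    nu=1 / (80 * 365),   # per-capita natural death rate per day
)
trajectory = simulate_seirs((999_990.0, 0.0, 10.0, 0.0), days=180, **rates)

# Day on which the infectious compartment is largest
peak_day, peak_state = max(enumerate(trajectory), key=lambda item: item[1][2])
print("Peak infectious:", round(peak_state[2]), "on day", peak_day)
```

Setting `omega=0` reduces the sketch to SEIR, and additionally setting `mu=nu=0` removes the vital dynamics. With equal birth and death rates the total population stays constant, which is a quick sanity check on the equations.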

## Gathering Inputs
### Live data

- Confirmed / Deaths / Recovered time series data, read directly from the Johns Hopkins University repository.
- Estimated parameters from MIDAS-network https://github.com/midas-network/COVID-19


In [4]:
# Get total timeseries from JHU
global_confirmed = pd.read_csv('https://github.com/CSSEGISandData/COVID-19/raw/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv')
global_death = pd.read_csv('https://github.com/CSSEGISandData/COVID-19/raw/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv')
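# JHU replaced the old time_series_19-covid-*.csv files with time_series_covid19_*_global.csv
global_recovered = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_recovered_global.csv')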

global_confirmed = global_confirmed.rename(columns={'Country/Region':'Country'})
global_recovered = global_recovered.rename(columns={'Country/Region':'Country'})
global_death = global_death.rename(columns={'Country/Region':'Country'})


# estimated parameters 
est_para = pd.read_csv('https://raw.githubusercontent.com/midas-network/COVID-19/master/parameter_estimates/2019_novel_coronavirus/estimates.csv')

In [72]:
est_para.columns

Index(['id', 'peer_review', 'peer_review_location', 'name', 'abbreviation',
       'units', 'country', 'location_name', 'location_type', 'start_date',
       'end_date', 'value_type', 'value', 'uncertainty_type', 'lower_bound',
       'upper_bound', 'population', 'method_description', 'data_description',
       'data_URL', 'date_publication', 'title_publication', 'type_publication',
       'authors', 'publication_URL', 'entry_date', 'entry_person',
       'entry_email'],
      dtype='object')

21.02

Here the incubation period (IB_pred), the basic reproduction number (R0), the transmission rate, and the recovery time are each computed as the mean of all corresponding estimates published on the MIDAS network.

In [141]:
# Incubation period
IB_pred = (dp(est_para)
         .query('name == "incubation period"')
         .select(['peer_review','id','value_type','data_description','end_date','upper_bound','lower_bound','value'])
         .mutate(value = pd.to_numeric(X['value'], errors='coerce'))
        .pd)
IB_pred = IB_pred.value.mean()

# Basic reproduction value
R0 = (dp(est_para)
         .query('abbreviation == "R0"')
         .select(['peer_review','id','abbreviation','data_description','end_date','upper_bound','lower_bound','value'])
         .mutate(value = pd.to_numeric(X['value'], errors='coerce'))
        .pd)
R0 = R0.value.mean()

# Transmission rate
TR = (dp(est_para)
     .query("name == 'transmission rate'")
     .select(['id','location_type','abbreviation','name','end_date','upper_bound','lower_bound','value'])
      .mutate(value = pd.to_numeric(X['value'], errors='coerce'))
    .pd)

TR = TR.value.mean()/10

# Recovery time
Recover_time = (dp(est_para)
         .query('name == "time from symptom onset to recovery"')
         .select(['peer_review','id','value_type','data_description','end_date','upper_bound','lower_bound','value'])
         .mutate(value = pd.to_numeric(X['value'], errors='coerce'))
        .pd)
Recover_time = Recover_time.value.mean()

### Local files input

Read input from local files:
1. Age cohort with mortality rates
2. Hospital beds per country (public + private)
3. Demographics data (total population)

In [79]:
# Proportion of DE is taken from the latest age pyramid: https://www.populationpyramid.net/germany/2019/
AGE_DATA = pd.read_csv("./Data/age_data.csv", index_col="Age Group")
BED_DATA = functions.preprocess_bed_data("./Data/OECD hospital beds.csv")
country_data = pd.read_csv("./Data/OECD demographics.csv")

# Keep only the total population rows for 2018
country_data = (dp(country_data)
               .query("VAR == 'DEMODOMP' & UNIT == 'EFFPEREF' & Year == 2018")
               .mutate(Value = X.Value * 1000)
               .select(["Country","Value"])
               .pd)

In [129]:
# Set constant values

# """
# SIR model constants
# """
class RecoveryRate:
    default = 1 / 10  # 1 / recovery time in days; update from the MIDAS estimate (Recover_time) on every run

class MortalityRate:
    # Take weighted average of death rate across age groups. This assumes each age group is equally likely to
    # get infected, which may not be exact, but is an assumption we need to make for further analysis,
    # notably segmenting deaths by age group.
    default = (AGE_DATA.Proportion_DE_2020 * AGE_DATA.Mortality).sum()

class CriticalDeathRate:
    # Death rate of critically ill patients who don't have access to a hospital bed.
    # This is the max reported from Wuhan:
    # https://wwwnc.cdc.gov/eid/article/26/6/20-0233_article
    default = 0.122

class TransmissionRatePerContact:
    # Probability of a contact between carrier and susceptible leading to infection.
    # Using the mean value of all research reports
    default = TR
    
    # The transmission rate of an asymptomatic infected individual is lower than that of a symptomatic one.
    # The reported ratio is 55%.
    # source: https://science.sciencemag.org/content/early/2020/03/13/science.abb3221
    default_per_symptom_state = {
        SymptomState.ASYMPTOMATIC : 0.55 * default,
        SymptomState.SYMPTOMATIC : default,
    }

class AverageDailyContacts:
    min = 0
    max = 50
    default = 15

class AsymptomaticRate:
    # Proportion of true cases showing no symptoms
    # The number comes from a study of passengers of the Diamond Princess cruise ship in Japan
    # https://www.eurosurveillance.org/content/10.2807/1560-7917.ES.2020.25.10.2000180
    default = 0.179
    
# """
# Health care constants
# """

class ReportingRate:
    # Proportion of true cases diagnosed
#     default = 0.14
    default = 1  # assumes every true case is diagnosed and reported

class HospitalizationRate:
    # Cases requiring hospitalization. We multiply by the ascertainment rate because our source got their estimate
    # from the reported cases, whereas we will be using it with total cases.
    default = 0.19 * ReportingRate.default

0.66
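
As a rough cross-check of these constants, the R0 formula from the literature review (probability of transmission x contacts per day x infectious days) can be evaluated directly. The sketch below uses a placeholder transmission probability of 1.5% per contact, which is an assumption for illustration and not the MIDAS-derived `TR`. The 15 daily contacts and 10 infectious days mirror `AverageDailyContacts.default` and `1 / RecoveryRate.default`. Both function names are hypothetical.

```python
def basic_reproduction_number(transmission_probability, contacts_per_day, infectious_days):
    """R0 = probability of transmission x contacts per day x infectious days."""
    return transmission_probability * contacts_per_day * infectious_days


def contacts_for_r0_of_one(transmission_probability, infectious_days):
    """Daily contacts at which R0 equals 1; fewer contacts push R0 below 1."""
    return 1 / (transmission_probability * infectious_days)


# Placeholder probability (assumption); contacts and duration mirror the defaults above
r0 = basic_reproduction_number(0.015, contacts_per_day=15, infectious_days=10)
contact_limit = contacts_for_r0_of_one(0.015, infectious_days=10)

print("R0:", round(r0, 2))
print("Daily contacts needed for R0 < 1: fewer than", round(contact_limit, 1))
```

Because R0 is linear in the number of daily contacts, the contact-rate slider used in the simulation scales R0 proportionally under this simple formula.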

## Simulation

In [6]:
print('Select the country:')
def f_select(Country):
    return(Country)
w_select = interactive(f_select, Country=country_data['Country'])
display(w_select)

Select the country:


interactive(children=(Dropdown(description='Country', options=('Australia', 'Austria', 'Belgium', 'Canada', 'C…

In [126]:
target_country = 'Country == "' + w_select.result + '"'

from datetime import datetime, timedelta

def jhu_date_label(d):
    # JHU column labels are m/d/yy without zero padding, e.g. "4/1/20"
    return "{}/{}/{}".format(d.month, d.day, d.strftime("%y"))

target_date = jhu_date_label(datetime.now())

# Fall back to yesterday if today's column is not published yet
if target_date in global_confirmed.columns:
    print('Get the latest data..')
else:
    print('Get yesterday data since today is not updated yet..')
    target_date = jhu_date_label(datetime.now() - timedelta(1))
    global_confirmed[target_date]  # raises KeyError if yesterday is missing too
target_date
    


In [14]:
historical_df = (dp(global_confirmed)
                 .query(target_country)          
                 .assign(Type = "Confirmed")
                 .append(dp(global_death)
                         .query(target_country)
                         .assign(Type = "Death")
                         .pd)
                 .append(dp(global_recovered)
                         .query(target_country)
                         .assign(Type = "Recovered")
                         .pd)                 
                 .select(["-Province/State",'-Lat','-Long','-Country'])
                 .set_index('Type')
                 .pd)

confirmed = pd.DataFrame(historical_df.iloc[0]).rename_axis('Date').reset_index()
confirmed['Date'] = pd.to_datetime(confirmed['Date'])
confirmed['Status'] = "Confirmed"
confirmed = confirmed.rename(columns = {'Confirmed':'Number'})

deaths = pd.DataFrame(historical_df.iloc[1]).rename_axis('Date').reset_index()
deaths['Date'] = pd.to_datetime(deaths['Date'])  # parse this series' own dates
deaths['Status'] = "Deaths"
deaths = deaths.rename(columns = {'Death':'Number'})

recovered = pd.DataFrame(historical_df.iloc[2]).rename_axis('Date').reset_index()
recovered['Date'] = pd.to_datetime(recovered['Date'])  # parse this series' own dates
recovered['Status'] = "Recovered"
recovered = recovered.rename(columns = {'Recovered':'Number'})

# pd.concat replaces the chained DataFrame.append calls, which newer pandas versions no longer provide
historical_df = pd.concat([confirmed, deaths, recovered])

In [15]:
num_hospital_beds = (dp(BED_DATA)
                     .query(target_country)
                     .select('Latest Bed Estimate')
                     .pd).iloc[0]['Latest Bed Estimate']

number_cases_deaths =(dp(global_death)
                     .select(['Country',target_date])
                     .query(target_country)
                     .pd).iloc[0][target_date]

number_cases_recovered =(dp(global_recovered)
                     .select(['Country',target_date])
                     .query(target_country)
                     .pd).iloc[0][target_date]

number_cases_confirmed =(dp(global_confirmed)
                     .select(['Country',target_date])
                     .query(target_country)
                     .pd).iloc[0][target_date]

population = (dp(country_data)
             .query(target_country)
             .select('Value')
             .pd).iloc[0]['Value']

### Get predictions

In [16]:
def f(contact_rate):
    return(contact_rate)

w = interactive(f, contact_rate= widgets.IntSlider(min=0, max=100, step=1, value=50))
print('How many people does an infected individual meet daily?')
display(w)

How many people does an infected individual meet daily?


interactive(children=(IntSlider(value=50, description='contact_rate'), Output()), _dom_classes=('widget-intera…

In [18]:
sir_model = functions.SIRModel(
        transmission_rate_per_contact=TransmissionRatePerContact.default,
        contact_rate = w.result,
        recovery_rate = RecoveryRate.default,
        normal_death_rate = MortalityRate.default,
        critical_death_rate = CriticalDeathRate.default,
        hospitalization_rate = HospitalizationRate.default,
        hospital_capacity = num_hospital_beds,
    )

true_cases_estimator = functions.TrueInfectedCasesModel(ReportingRate.default)
estimated_true_cases = true_cases_estimator.predict(number_cases_confirmed)

df = functions.get_predictions(
    cases_estimator=true_cases_estimator,
    sir_model=sir_model,
    num_diagnosed=number_cases_confirmed,
    num_recovered=number_cases_recovered,
    num_deaths= number_cases_deaths,
    area_population=population)
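
The implementation of `functions.TrueInfectedCasesModel` is not shown in this notebook. A plausible reading, given that `ReportingRate` is defined as the proportion of true cases diagnosed, is that the estimator divides the diagnosed count by the reporting rate. The class below is a hypothetical stand-in that illustrates this assumption and is not the code in `functions.py`.

```python
class ReportingRateEstimator:
    """Hypothetical stand-in: scale diagnosed cases up by the reporting rate."""

    def __init__(self, reporting_rate):
        if not 0 < reporting_rate <= 1:
            raise ValueError("reporting_rate must be in (0, 1]")
        self.reporting_rate = reporting_rate

    def predict(self, num_diagnosed):
        # If only a fraction of true cases is diagnosed, the true count is larger
        return num_diagnosed / self.reporting_rate


# With a reporting rate of 1, the estimate equals the diagnosed count
print(ReportingRateEstimator(1).predict(1000))

# With the commented-out alternative of 0.14, the same count implies far more true cases
print(round(ReportingRateEstimator(0.14).predict(1000)))
```

Under this reading, the choice `ReportingRate.default = 1` makes the forecast start from the confirmed count itself, so any under-reporting in the data carries through to the simulation.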

### Plotting

In [19]:
print('Historical COVID-19 data of the selected country')
graphic.plot_historical_data(historical_df)


Historical COVID-19 data of the selected country


In [20]:
df_base = df[~df.Status.isin(["Need Hospitalization"])].copy()  # copy to avoid chained-assignment warnings

# Forecast date = today (at midnight) + whole number of forecast days
df_base['Date'] = pd.Timestamp.now().normalize() + pd.to_timedelta(df_base['Days'].astype(int), unit='D')
graphic.infection_graph(df_base, df_base.Forecast.max())
# graphic.infection_graph(df_base, 2000000)

In [21]:
df_full = (dp(df_base)
           .select("-Days")
           .rename(columns = {'Forecast':'Number'})
           .append(historical_df)
           .pd)
print('Combined historical and forecast COVID-19 cases')
graphic.plot_historical_data(df_full)

Combined historical and forecast COVID-19 cases


In [28]:
# Work on simulation of 7+ days when social distancing enforced (1.4.2020)



In [29]:
num_dead = df[df.Status == "Dead"].Forecast.iloc[-1]
num_recovered = df[df.Status == "Recovered"].Forecast.iloc[-1]
outcomes_by_age_group = functions.get_status_by_age_group(AGE_DATA, MortalityRate.default,
                                                          num_dead, num_recovered)
fig = graphic.age_segregated_mortality(
    outcomes_by_age_group.loc[:, ["Dead", "Need Hospitalization"]]
)

fig

In [21]:
peak_occupancy = df.loc[df.Status == "Need Hospitalization"]["Forecast"].max()

num_beds_comparison_chart = graphic.num_beds_occupancy_comparison_chart(
    num_beds_available=num_hospital_beds, 
    max_num_beds_needed=peak_occupancy)
num_beds_comparison_chart

## TEST ZONE