# COVID-19 sandbox with regression and epidemiological models (WIP)

## Data source
* **Rest-of-the-world data are taken from here: [github](https://github.com/CSSEGISandData/COVID-19)**
* **Italian data are taken from here: [github](https://github.com/pcm-dpc/COVID-19)**

## Disclaimer
* **The regression model is my own work, while the epidemiological model is originally from [this work](https://www.kaggle.com/volpatto/covid-19-study-with-epidemiology-models). I only adapted it to work with a different dataset.**
* **READ HERE FIRST: This is not an official study! It is very unlikely to represent real scenarios! I'm not a specialist in Epidemiology! This is a very simple demonstration of how to handle the models!**
* **READ EVEN MORE: Real studies and predictions should be performed by teams with multiple specialities. COVID-19 is a real thing; don't propagate results and data as conclusive if YOU ARE NOT A SPECIALIST IN THE FIELD.**

## Table of Contents

1. [Importing libs](#importing)

2. [Loading data](#loading)

3. [Regression models](#regression)

4. [Look at the data](#look)

5. [Run regression models](#run_regression)

6. [Epidemiology models](#models)

7. [Programming SIR/SEIR-based models in Python](#implementations)

8. [Least-squares fitting](#least-squares)

9. [Extrapolation/Predictions](#deterministic-predictions)

10. [Bayesian Calibration](#bayes-calibration)

    * [SIR model](#bayes-sir)
    * [Modified SEIR model](#bayes-seir2)

This notebook presents a simple "study", more of an exercise, showing how data reconciliation can be performed with epidemiological models. The same approach can be adapted to other problems whose mathematical model is a dynamical system.

Classical models are analyzed:

* SIR (Susceptible-Infected-Recovered) model;
* SEIR (Susceptible-Exposed-Infected-Recovered) model;

New modifications are proposed to better represent features of the COVID-19 spread. The modified models are:

* SIRD (Susceptible-Infected-Recovered-Dead) model, which considers the disease mortality rate explicitly;
* SEIR-2 model, a modification of SEIR that takes into account the fact that exposed individuals without symptoms can transmit the disease to susceptible individuals during the incubation time;
* SEIRD model, which is just the SEIR-2 model extended with the sub-population of Dead individuals;
* SEIRD-Q model. This is the most ambitious model here: it tries to capture, in some sense, the effect on the population dynamics of removal due to quarantine.
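
As a point of reference for the models listed above, here is a minimal sketch of the classical SIR system in its textbook form. It is an illustration under stated assumptions, not the notebook's implementation: the function names, the fixed-step Runge-Kutta integrator (used here instead of `solve_ivp` so the block has no dependencies), the division by the total population, and the parameter values are all chosen for this example only.

```python
def sir_rhs(state, beta, gamma, population):
    """Right-hand side of the classical SIR system (textbook form).

    dS/dt = -beta * S * I / N
    dI/dt =  beta * S * I / N - gamma * I
    dR/dt =  gamma * I
    """
    s, i, r = state
    new_infections = beta * s * i / population
    recoveries = gamma * i
    return (-new_infections, new_infections - recoveries, recoveries)


def rk4_step(state, dt, *params):
    """Advance the state by one fixed step with classical Runge-Kutta 4."""
    def shifted(base, slope, factor):
        return tuple(b + factor * k for b, k in zip(base, slope))

    k1 = sir_rhs(state, *params)
    k2 = sir_rhs(shifted(state, k1, dt / 2), *params)
    k3 = sir_rhs(shifted(state, k2, dt / 2), *params)
    k4 = sir_rhs(shifted(state, k3, dt), *params)
    return tuple(
        y + dt / 6 * (a + 2 * b + 2 * c + d)
        for y, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate_sir(s0, i0, r0, beta, gamma, days, steps_per_day=10):
    """Return one (S, I, R) tuple per day, including day 0."""
    population = s0 + i0 + r0
    state = (float(s0), float(i0), float(r0))
    dt = 1.0 / steps_per_day
    trajectory = [state]
    for _ in range(days):
        for _ in range(steps_per_day):
            state = rk4_step(state, dt, beta, gamma, population)
        trajectory.append(state)
    return trajectory


# Illustrative parameters only; they are not fitted to any data.
trajectory = simulate_sir(s0=999_990, i0=10, r0=0, beta=0.3, gamma=0.1, days=160)
peak_day = max(range(len(trajectory)), key=lambda d: trajectory[d][1])
```

The compartments only exchange individuals, so S + I + R stays equal to the initial population throughout the run. That makes a handy sanity check when extending the right-hand side with further compartments such as Exposed or Dead.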

*New models will be devised in the next few days. Note that these models are "local" models, so they should be applied to a territorial area such as a city. This will be adjusted later to take that into account.*

Before analyzing the models, we begin by having a look at the available data.

**P.S.: This is just a study of computational aspects. I'm not an epidemiologist. COVID-19 is a REAL issue! This notebook cannot be regarded as professional advice in any sense except, maybe, as a computational analysis. Maybe these models, or some of the ideas, can help someone else who can really tackle this issue.**

<a id="importing"></a>
## Importing libs

In [None]:
import pandas as pd
import numpy as np
from datetime import datetime,timedelta
from sklearn.metrics import mean_squared_error
from scipy.optimize import curve_fit
from scipy.optimize import fsolve
import datetime as dt
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter
from itertools import compress
import traceback
from scipy.signal import lfilter, filtfilt
import pymc3 as pm # for uncertainty quantification and model calibration
from scipy.integrate import solve_ivp # to solve ODE system
from scipy import optimize # to solve minimization problem from least-squares fitting
from numba import jit # to accelerate ODE system RHS evaluations
import theano # to control better pymc3 backend and write a wrapper
import theano.tensor as t # for the wrapper to a custom model to pymc3

# Plotting libs (matplotlib.pyplot is already imported above)
import altair as alt

seed = 12345 # for the sake of reproducibility :)
np.random.seed(seed)

plt.style.use('seaborn-talk') # beautify the plots!

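# NOTE: this only creates a Python variable. Theano reads THEANO_FLAGS from the
# environment when it is imported, so as written this line has no effect.
THEANO_FLAGS='optimizer=fast_compile' # A theano trick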

<a id="loading"></a>
## Loading data
Common to both logistic regressions and epidemiological models

In [None]:
def load(keys):
    def extract_ita(tipo, code):
        if tipo == "nat":
            url = "../input/dpccovid19ita-20200418/dpc-covid19-ita-andamento-nazionale.csv"
            df = pd.read_csv(url)
        elif tipo == "reg":
            url = "../input/dpccovid19ita-20200418/dpc-covid19-ita-regioni.csv"
            df_raw = pd.read_csv(url)
            is_code =  df_raw['codice_regione']==code
            df = df_raw[is_code]
        elif tipo == "prov":
            url = "../input/dpccovid19ita-20200418/dpc-covid19-ita-province.csv"
            df_raw = pd.read_csv(url)
            is_code =  df_raw['codice_provincia']==code
            df = df_raw[is_code]
    
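        # Keep only the columns of interest. Some files may lack the
        # deaths/recovered columns; fall back to zeros in that case.
        #print(df)
        try:
            df = df[['data', 'totale_casi', 'deceduti','dimessi_guariti']].copy()
        except KeyError:
            df = df[['data', 'totale_casi']].copy()
            df['deceduti'] = 0
            df['dimessi_guariti'] = 0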
            
        #df = df.loc[:,['data','totale_casi','deceduti','dimessi_guariti']]
        date = df['data']
        FMT = '%Y-%m-%dT%H:%M:%S'
        df['date_n'] = pd.Series(date.map(lambda x : (dt.datetime.strptime(x, FMT) - dt.datetime.strptime("2019-12-31T00:00:00", FMT)).days  ), index=df.index)
        df.index = df['date_n']
    
        return df
    
    def extract_jhu(url):
        df = pd.read_csv(url, delimiter=',')
        df = df.transpose()
        
        # Create date_n (days since 2019-12-31). Lists are built eagerly so that
        # a format mismatch is raised inside the try block (a lazy map would not be)
        date_labels = list(df.index[4:].values)
        try:
            FMT = '%m/%d/%y'
            date_n = [(dt.datetime.strptime(x, FMT) - dt.datetime.strptime("12/31/19", FMT)).days for x in date_labels]
        except ValueError:
            FMT = '%m/%d/%Y'
            date_n = [(dt.datetime.strptime(x, FMT) - dt.datetime.strptime("12/31/2019", FMT)).days for x in date_labels]
        new_header = df.iloc[1]
        df.columns = new_header
        df = df[4:]
        df['date_n'] = pd.Series(date_n, index=df.index)
        
        # Format date with a standard format
        df['date'] = pd.Series(df.index, index=df.index)
        FMTout = "%Y-%m-%d"
        try:
            FMTin = "%m/%d/%y"
            #df['date'] = map(lambda x : (datetime.strftime(datetime.strptime(x, FMTin), FMTout)), pd.Series(df.index, index=df.index))
            df['date'] = list(dt.datetime.strftime(dt.datetime.strptime(x, FMTin), FMTout) for x in pd.Series(df.index, index=df.index))
        except:
            FMTin = "%m/%d/%Y"
            #df['date'] = map(lambda x : (datetime.strftime(datetime.strptime(x, FMTin), FMTout)), pd.Series(df.index, index=df.index))
            df['date'] = list(dt.datetime.strftime(dt.datetime.strptime(x, FMTin), FMTout) for x in pd.Series(df.index, index=df.index))
        
        # Index based on date_n        
        #df.index = range(len(df))
        df.index = df['date_n']
        
        # Sum country regions
        df = df.groupby(level=0, axis=1).sum()
        
        # Reorder columns to make in more readable
        cols = df.columns.tolist()
        cols = cols[-2:] + cols[:-2]
        df = df[cols]
        
        # Filter only countries I am interested in
        #df = df[['date_n','Italy', 'Germany', 'Spain', 'United Kingdom', 'France', 'Austria', 'US', 'China']]
        
        return df
    
    def concatandrename(df, df2, column, name):
        df = pd.concat([df, df2[column]], axis=1)
        df = df.rename(columns={column: name})
        return df
        
    # Extract from JHU database
    url_jhu_cases = "../input/covid19jhu20200418/time_series_covid19_confirmed_global.csv"
    url_jhu_deaths = "../input/covid19jhu20200418/time_series_covid19_deaths_global.csv"
    url_jhu_recovered = "../input/covid19jhu20200418/time_series_covid19_recovered_global.csv"
    df_cases = extract_jhu(url_jhu_cases)
    df_deaths = extract_jhu(url_jhu_deaths)
    df_recovered = extract_jhu(url_jhu_recovered)
    
    # Extract from Ita database
    df_ita = extract_ita("nat", -1)
    df_emiliaromagna = extract_ita("reg", 8)
    df_lombardia = extract_ita("reg", 3)
    df_veneto = extract_ita("reg", 5)
    df_parma = extract_ita("prov", 34)
    df_reggioemilia = extract_ita("prov", 35)
    df_modena = extract_ita("prov", 36)
    
    # Merge Ita data with JHU data
    df_cases = concatandrename(df_cases, df_ita, "totale_casi", "Italia")
    df_cases = concatandrename(df_cases, df_emiliaromagna, "totale_casi", "EmiliaRomagna")
    df_cases = concatandrename(df_cases, df_lombardia, "totale_casi", "Lombardia")
    df_cases = concatandrename(df_cases, df_veneto, "totale_casi", "Veneto")
    df_cases = concatandrename(df_cases, df_parma, "totale_casi", "Parma")
    df_cases = concatandrename(df_cases, df_reggioemilia, "totale_casi", "Reggio")
    df_cases = concatandrename(df_cases, df_modena, "totale_casi", "Modena")
    
    df_deaths = concatandrename(df_deaths, df_ita, "deceduti", "Italia")
    df_deaths = concatandrename(df_deaths, df_emiliaromagna, "deceduti", "EmiliaRomagna")
    df_deaths = concatandrename(df_deaths, df_lombardia, "deceduti", "Lombardia")
    df_deaths = concatandrename(df_deaths, df_veneto, "deceduti", "Veneto")
    df_deaths = concatandrename(df_deaths, df_parma, "deceduti", "Parma")
    df_deaths = concatandrename(df_deaths, df_reggioemilia, "deceduti", "Reggio")
    df_deaths = concatandrename(df_deaths, df_modena, "deceduti", "Modena")
    
    df_recovered = concatandrename(df_recovered, df_ita, "dimessi_guariti", "Italia")
    df_recovered = concatandrename(df_recovered, df_emiliaromagna, "dimessi_guariti", "EmiliaRomagna")
    df_recovered = concatandrename(df_recovered, df_lombardia, "dimessi_guariti", "Lombardia")
    df_recovered = concatandrename(df_recovered, df_veneto, "dimessi_guariti", "Veneto")
    df_recovered = concatandrename(df_recovered, df_parma, "dimessi_guariti", "Parma")
    df_recovered = concatandrename(df_recovered, df_reggioemilia, "dimessi_guariti", "Reggio")
    df_recovered = concatandrename(df_recovered, df_modena, "dimessi_guariti", "Modena")
    
    print("Elements in df_cases: {}".format(len(df_cases)))
    print("Elements in df_deaths: {}".format(len(df_deaths)))
    print("Elements in df_recovered: {}".format(len(df_recovered)))
    
    # Prepare data for processing
    if len(keys)>0:
        x = np.asarray(list(df_cases['date_n']))
        X = len(keys)*[x]
        Y_cases = [np.asarray(list(df_cases[key])) for key in keys]
        Y_deaths = [np.asarray(list(df_deaths[key])) for key in keys]
        Y_recovered = [np.asarray(list(df_recovered[key])) for key in keys]
        Label_cases = keys
        Label_deaths = [Label_cases[i]+"+" for i in range(len(Label_cases))]
        Label_recovered = [Label_cases[i]+"|" for i in range(len(Label_cases))]
    else:
        X = []
        Y_cases = []
        Y_deaths = []
        Y_recovered = []
        Label_cases = []
        Label_deaths = []
        Label_recovered = []
    
    return df_cases, df_deaths, df_recovered, X, Y_cases, Y_deaths, Y_recovered, Label_cases, Label_deaths, Label_recovered


"""
Extract data for one specific country/region
"""
def country(df_cases, df_deaths, df_recovered, name):
    df = pd.DataFrame(index=df_cases.index)
    df = pd.concat([df, df_cases['date'], df_cases[name]], axis=1)
    df = pd.concat([df, df_deaths[name]], axis=1)
    df = pd.concat([df, df_recovered[name]], axis=1)
    df.columns = ['date', 'confirmed', 'deaths', 'recovered']
    df['confirmed_marker'] = 'Confirmed'
    df['deaths_marker'] = 'Death'
    df['recovered_marker'] = 'Recovered'
    df = df.reset_index(drop=False)
    return df

"""
Trims the initial days with 0 cases from a country df
"""
def trim_country(df):
    df0 = df[df.confirmed > 0]
    df0 = df0.reset_index(drop=True)
    df0['day'] = df0.date_n.apply(lambda x: (x - df0.date_n.min()))
    
    #df0 = df0.rename(columns={"date_n": "day"})
    return df0

def load_population():
    return pd.read_csv("../input/countries-of-the-world/countries of the world.csv")

def get_target_population(df, country):
    country = country + ' '
    return float(df[df.Country == country].Population)
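
The loaders above index every series by `date_n`, the number of days elapsed since 2019-12-31, and they have to cope with several date formats. The sketch below isolates that conversion in one place. The helper name `to_day_number` and the `KNOWN_FORMATS` tuple are hypothetical and not part of the notebook; the three formats are the ones that appear in the loading code.

```python
from datetime import datetime

EPOCH = datetime(2019, 12, 31)  # day 0, as in the loaders above

# Formats seen in the loaders: ISO timestamps (Italian data) and
# month/day/year column headers (JHU data), with two- or four-digit year.
KNOWN_FORMATS = ("%Y-%m-%dT%H:%M:%S", "%m/%d/%y", "%m/%d/%Y")


def to_day_number(text):
    """Return whole days elapsed since 2019-12-31 for a date string."""
    for fmt in KNOWN_FORMATS:
        try:
            return (datetime.strptime(text, fmt) - EPOCH).days
        except ValueError:
            continue  # try the next known format
    raise ValueError(f"unrecognised date format: {text!r}")


jhu_day = to_day_number("1/22/20")
ita_day = to_day_number("2020-02-24T18:00:00")
```

Trying the formats in order and raising on failure avoids silently producing a wrong index when a new file changes its date layout.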

<a id="regression"></a>
# Regression models

In [None]:
def daynumber2date(d):
    return str(dt.datetime(2019, 12, 31) + dt.timedelta(days=int(d)))[:10]

formatter_date = FuncFormatter(lambda x_val, tick_pos: "{}".format(daynumber2date(x_val)))
formatter_log10 = FuncFormatter(lambda x_val, tick_pos: "{}".format(np.power(10,x_val)))

def removenan(x, y):
    keep = np.logical_not(np.logical_or(np.logical_or(np.isnan(x), np.isinf(x)), np.logical_or(np.isnan(y), np.isinf(y))))
    x = list(compress(x, keep))
    y = list(compress(y, keep))
    if (sum(np.isnan(x)) + sum(np.isinf(x)) + sum(np.isnan(y)) + sum(np.isinf(y))) > 0:
        print("removenan failed: NaN/inf values are still present")
    return x, y

def logistic1_model(x, a, dtau, tau):
    return a/(1+np.exp(-(x-dtau)/tau))

def logistic2_model(x, a, b, dtau, tau):
    return a/(1+b*np.exp(-np.power((x-dtau)/tau, 1.0)))

def logistic21_model(x, a, b, dtau, tau, ni):
    return a/np.power(1+b*np.exp(-np.power((x-dtau)/tau, 1.0)), 1.0/ni)

def logistic3_model(x, a, b, tau, alpha):
    # used by Matteo P.
    return a/(1+b*np.exp(-np.power(x/tau, alpha)))

def logistic31_model(x, a, b, tau, alpha, ni):
    return a/np.power(1+b*np.exp(-np.power(x/tau, alpha)), 1.0/ni)

def logistic32_model(x, a, b, tau, alpha, ni, k):
    return k + (a-k)/np.power(1+b*np.exp(-np.power(x/tau, alpha)), 1.0/ni)

def linear_model(x, a, b):
    return a*x+b

def quadratic_model(x, a, b, c):
    return a*(x**2)+b*x+c

def model(model, x, y, horizon=0, threshold=10, relative_rmse = False, verbose=False):
    #x = np.asarray(x)
    #y = np.asarray(y)
    
    if len(x) > 0:
    
        if horizon == 0:
            x_pred = x
        else:
            x_pred = np.arange(min(x), max(x)+horizon, 1)
        
        if model == "logistic1":
            try:
                p0 = [20000, 100, 2]
                fit = curve_fit(logistic1_model, x, y, p0=p0)
                
                y_pred = logistic1_model(x_pred, fit[0][0], fit[0][1], fit[0][2])
    
                # End date
                sol = int(fsolve(lambda x : logistic1_model(x,fit[0][0],fit[0][1],fit[0][2]) - int(fit[0][2]),fit[0][1]))
                end_date = dt.datetime(2019, 12, 31) + dt.timedelta(days=sol)
                #print "End date: " + str(end_date)
                
            except:
                traceback.print_exc()
                print(model)
                x_pred = [0]
                y_pred = [0]
                fit = [[1, 0, 0], [0]]
                end_date = 0
        
        if model == "logistic2":
            try:
                p0 = [1.52646450e+05, 1.56215676e-01, 9.59401246e+01, 6.23161909e+00]
                fit = curve_fit(logistic2_model, x, y, maxfev=100000, p0=p0)
                
                y_pred = logistic2_model(x_pred, fit[0][0], fit[0][1], fit[0][2], fit[0][3])
                
            except Exception:
                traceback.print_exc()
                print(model)
                x_pred = [0]
                y_pred = [0]
                fit = [[0, 0, 1], [0]]
            
            end_date = 0
            
        if model == "logistic21":
            try:                
                #p0 = [1.52646450e+05, 1.56215676e-01, 9.59401246e+01, 6.23161909e+00, 1.0]
                # Use the maximum number of cases as the initial guess for a; this makes
                # the model converge for all datasets, which it otherwise would not
                p0 = [max(y), 1.56215676e-01, 9.59401246e+01, 6.23161909e+00, 1.0]
                # NOTE: bounds is defined here but not passed to curve_fit below
                bounds = ([-np.inf, -np.inf, -np.inf, -np.inf, 0.000000],
                          [ np.inf,  np.inf,  np.inf,  np.inf, 8.000000])
                fit = curve_fit(logistic21_model, x, y, maxfev=100000, p0=p0)
                
                y_pred = logistic21_model(x_pred, fit[0][0], fit[0][1], fit[0][2], fit[0][3], fit[0][4])
                
            except Exception:
                traceback.print_exc()
                print(model)
                x_pred = [0]
                y_pred = [0]
                fit = [[0, 0, 1], [0]]
            
            end_date = 0
            
        if model == "logistic3":
            try:
                p0 = [1.52646560e+05, 1.56040732e-01, 9.59471514e+01, 1.5]
                bounds = ([-np.inf, -np.inf, -np.inf, 0.0000009],
                          [ np.inf,  np.inf,  np.inf, 8.0000000])
                fit = curve_fit(logistic3_model, x, y, maxfev=100000, bounds=bounds, p0=p0)
                
                y_pred = logistic3_model(x_pred, fit[0][0], fit[0][1], fit[0][2], fit[0][3])
                
            except Exception:
                traceback.print_exc()
                print(model)
                x_pred = [0]
                y_pred = [0]
                fit = [[0, 0, 1], [0]]
            
            end_date = 0
            
        if model == "logistic31":
            try:
                p0 = [1.52646560e+05, 1.56040732e-01, 9.59471514e+01, 1.5, 1.0]
                bounds = ([-np.inf, -np.inf, -np.inf, 0.0000009, 0.000000],
                          [ np.inf,  np.inf,  np.inf, 8.0000000, 8.000000])
                fit = curve_fit(logistic31_model, x, y, maxfev=100000, bounds=bounds, p0=p0)
                
                y_pred = logistic31_model(x_pred, fit[0][0], fit[0][1], fit[0][2], fit[0][3], fit[0][4])
                
            except Exception:
                traceback.print_exc()
                print(model)
                x_pred = [0]
                y_pred = [0]
                fit = [[0, 0, 1], [0]]
            
            end_date = 0
            
        if model == "logistic32":
            try:
                p0 = [1.52646560e+05, 1.56040732e-01, 9.59471514e+01, 1.5, 1.0, 0.0]
                bounds = ([-np.inf, -np.inf, -np.inf, 0.000009, 0.000000, -np.inf],
                          [ np.inf,  np.inf,  np.inf, 8.000000, 8.000000,  np.inf])
                fit = curve_fit(logistic32_model, x, y, maxfev=100000, bounds=bounds, p0=p0)
                
                y_pred = logistic32_model(x_pred, fit[0][0], fit[0][1], fit[0][2], fit[0][3], fit[0][4], fit[0][5])
                
            except Exception:
                traceback.print_exc()
                print(model)
                x_pred = [0]
                y_pred = [0]
                fit = [[0, 0, 1], [0]]
            
            end_date = 0
        
        elif model == "linear":
            try:
                x_pred = np.arange(min(x), max(x)+horizon, 1)
                fit = curve_fit(linear_model, x, y)
                y_pred = linear_model(x_pred, fit[0][0], fit[0][1])
            except:
                x_pred = [0]
                y_pred = [0]
                fit = [[0, 0], [0]]
                
            end_date = 0
                
        elif model == "quadratic":
            try:
                x_pred = np.arange(min(x), max(x)+horizon, 1)
                fit = curve_fit(quadratic_model, x, y)
                y_pred = quadratic_model(x_pred, fit[0][0], fit[0][1], fit[0][2])
            except:
                x_pred = [0]
                y_pred = [0]
                fit = [[0, 0, 0], [0]]
                
            end_date = 0
        
        # Compute RMSE
        rmse = np.sqrt(mean_squared_error(y, y_pred[:len(y)]))
        if relative_rmse:
            rmse = rmse / max(y)
        
        # Find when the derivative drops below a given threshold, i.e. when daily new cases fall below it
        d = np.diff(y_pred)
        zero_crossings = np.where(np.diff(np.sign(d-threshold)))[0]
        if len(zero_crossings)>0:
            last_zero_crossing_day = x_pred[zero_crossings[-1]]
            end_date = last_zero_crossing_day
        else:
            end_date = -1
            
        if verbose:
            #print len(x)
            #print len(y)
            #print len(x_pred)
            #print len(y_pred)
            print(fit[0])
        
        return x_pred, y_pred, fit, rmse, end_date
    
    else:
        return [], [], [], 0, 0


def run_time_model(X, Y, past=0, horizon=0, threshold=10, relative_rmse=False, verbose=False):
    Fit = list()
    X_pred = list()
    Y_pred = list()
    Rmse = list()
    End_date = list()
    
    if past>0:
        print("Invalid argument: past must be <= 0")
    
    for i, (x, y) in enumerate(zip(X,Y)):
        x, y = removenan(x, y)
        x = x[:len(x)+past]
        y = y[:len(y)+past]
        
        #x_pred, y_pred, fit, end_date = model("logistic1", x, y, horizon)
        x_pred, y_pred, fit, rmse, end_date = model("logistic21", x, y, horizon=horizon, threshold=threshold, relative_rmse=relative_rmse, verbose=verbose)
        #x_pred, y_pred, fit, end_date = model("logistic3", x, y, horizon)
        #x_pred, y_pred, fit, end_date = model("logistic31", x, y, horizon, verbose=True)
        #x_pred, y_pred, fit, end_date = model("logistic32", x, y, horizon, verbose=True)
        
        Fit.append(fit)
        X_pred.append(x_pred)
        Y_pred.append(y_pred)
        Rmse.append(rmse)
        End_date.append(end_date)
        
    return X_pred, Y_pred, Fit, Rmse, End_date


def run_time_model_timemachine(X, Y, Label, threshold):
    timemachine = range(-20,0+1,1)
    Timemachine_rmse = list()
    Timemachine_enddate = list()
        
    for past in timemachine:
        X_pred_cases, Y_pred_cases, Fit_cases, Rmse_cases, End_date_cases = run_time_model(X, Y, past=past, horizon=90, threshold=threshold, relative_rmse=True, verbose=False)
        Timemachine_rmse.append(Rmse_cases)
        Timemachine_enddate.append(End_date_cases)
            
    # Rearrange results into the usual per-country format
    paese = 0
    lista_enddate = list()
    lista_rmse = list()
    for paese in range(0, len(X)):
        lista_enddate_paese = list()
        lista_rmse_paese = list()
        for giorno in range(0, len(Timemachine_enddate)):
            lista_enddate_paese.append(Timemachine_enddate[giorno][paese])
            lista_rmse_paese.append(Timemachine_rmse[giorno][paese])
        lista_enddate.append(lista_enddate_paese)
        lista_rmse.append(lista_rmse_paese)
    
    # Plot
    fig = plt.figure()
    ax1 = fig.add_subplot(211)
    ax1.yaxis.set_major_formatter(formatter_date)
    
    for i in range(len(lista_enddate)):
        ax1.plot(timemachine, lista_enddate[i], label=Label[i])
    ax1.legend()
    ax1.set_title("Predicted end")
    ax1.xaxis.set_label_text("Days in the past")
    ax1.yaxis.set_label_text("Day")
    ax1.yaxis.grid(True, which='major')
    ax1.yaxis.grid(True, which='minor')
    
    ax2 = fig.add_subplot(212)
    for i in range(len(lista_enddate)):
        ax2.plot(timemachine, lista_rmse[i], label=Label[i])
    ax2.set_title("RMSE")
    ax2.xaxis.set_label_text("Days in the past")
    ax2.yaxis.set_label_text("RMSE")
    ax2.yaxis.grid(True, which='major')
    ax2.yaxis.grid(True, which='minor')
    
    return


def run_cross_model(Y1,Y2):
    Fit = list()
    Y1_pred = list()
    Y2_pred = list()
    
    for i, (y1, y2) in enumerate(zip(Y1, Y2)):
        y1, y2 = removenan(y1, y2)
        # model() returns five values; RMSE and end date are not needed here
        y1_pred, y2_pred, fit, _, _ = model("linear", y1, y2)
        Fit.append(fit)
        Y1_pred.append(y1_pred)
        Y2_pred.append(y2_pred)
        
    return Y1_pred, Y2_pred, Fit


def plot_timeseries(title, logplot, threshold, \
                    X, Y, X_pred, Y_pred, Label):
    
    colors = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd', '#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf', '#1f77b4', '#ff7f0e']
    plt.rcParams['figure.figsize'] = [20, 10]
    plt.rc('font', size=14)
    fig = plt.figure()
    
    ax1 = fig.add_subplot(221)
    for idx, (x, y, x_pred, y_pred, label) in enumerate(zip(X, Y, X_pred, Y_pred, Label)):
        if logplot:
            ax1.scatter(x, np.log10(y), marker='.', color=colors[idx], label=label)
            ax1.plot(x_pred, np.log10(y_pred), color=colors[idx], label=label)
        else:
            ax1.scatter(x, y, marker='.', color=colors[idx], label=label)
            ax1.plot(x_pred, y_pred, color=colors[idx], label=label)
    
    ax1.set_title(title)
    ax1.xaxis.set_label_text("Day")
    ax1.yaxis.set_label_text("Cases")
    ax1.legend()
    ax1.xaxis.set_major_formatter(formatter_date)
    if logplot:
        ax1.set_ylim(0, 6)
        ax1.yaxis.set_major_formatter(formatter_log10)
    else:
        ax1.set_ylim(0, 1000000)
    ax1.yaxis.grid(True, which='major')
    ax1.yaxis.grid(True, which='minor')
    
    # Predicted new cases (derivative of the cumulative prediction)
    ax2 = fig.add_subplot(222)
    ax2.set_title(title + " (derivative)")
    for idx, (x, y, x_pred, y_pred, label) in enumerate(zip(X, Y, X_pred, Y_pred, Label)):
        d = np.diff(y_pred)
        ax2.plot(x_pred[1:], np.log10(d), color=colors[idx], label=label)
        
    ax2.xaxis.set_label_text("Day")
    ax2.yaxis.set_label_text("New cases")
    ax2.set_ylim(0, 5)
    ax2.xaxis.set_major_formatter(formatter_date)
    ax2.yaxis.set_major_formatter(formatter_log10)
    ax2.set_yticks([np.log10(threshold)], minor=True)
    ax2.xaxis.grid(True, which='major')
    ax2.xaxis.grid(True, which='minor')
    ax2.yaxis.grid(True, which='major')
    ax2.yaxis.grid(True, which='minor')
    
    # Errors
    ax3 = fig.add_subplot(223)
    ax3.set_title(title + " (errors)")
    for idx, (x, y, x_pred, y_pred, label) in enumerate(zip(X, Y, X_pred, Y_pred, Label)):
        x, y = removenan(x, y)
        ax3.plot(x, y_pred[:len(y)]-y, color=colors[idx], label=label)
        
    ax3.set_ylim(-3000, 3000)
    ax3.yaxis.grid(True, which='major')
    ax3.yaxis.grid(True, which='minor')
    
    plt.show()
    return


def plot_timeseries2(title, logplot, threshold, \
                     X1, Y1, X1_pred, Y1_pred, Label1, \
                     X2, Y2, X2_pred, Y2_pred, Label2, scale2=1.0):
    
    colors = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd', '#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf', '#1f77b4', '#ff7f0e']
    plt.rcParams['figure.figsize'] = [20, 10]
    plt.rc('font', size=14)
    fig = plt.figure()
    
    ax1 = fig.add_subplot(111)
    for idx, (x1, y1, x1_pred, y1_pred, label1, x2, y2, x2_pred, y2_pred, label2) in enumerate(zip(X1, Y1, X1_pred, Y1_pred, Label1, X2, Y2, X2_pred, Y2_pred, Label2)):
        if logplot:
            ax1.scatter(x1, np.log10(y1), marker='.', color=colors[idx], label=label1)
            ax1.plot(x1_pred, np.log10(y1_pred), color=colors[idx], label=label1)
            ax1.scatter(x2, np.log10(y2*scale2), marker='+', color=colors[idx], label=label2)
            ax1.plot(x2_pred, np.log10(y2_pred*scale2), color=colors[idx], label=label2)
        else:
            ax1.scatter(x1, y1, marker='.', color=colors[idx], label=label1)
            ax1.plot(x1_pred, y1_pred, color=colors[idx], label=label1)
            ax1.scatter(x2, y2*scale2, marker='+', color=colors[idx], label=label2)
            ax1.plot(x2_pred, y2_pred*scale2, color=colors[idx], label=label2)
    
    ax1.set_title(title)
    ax1.xaxis.set_label_text("Day")
    ax1.yaxis.set_label_text("Cases")
    ax1.legend()
    #ax1.xaxis.set_major_formatter(formatter_date)
    if logplot:
        ax1.set_ylim(0, 6)
        ax1.yaxis.set_major_formatter(formatter_log10)
    else:
        ax1.set_ylim(0, 200000)
    ax1.yaxis.grid(True, which='major')
    ax1.yaxis.grid(True, which='minor')
    
    plt.show()
    return


def plot_crossseries(title, xlabel, ylabel,
                     Y1, Y2, Label, Y1_pred=[], Y2_pred=[], Fit=[]):
    
    colors = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd', '#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf', '#1f77b4', '#ff7f0e']
    fig = plt.figure()
    
    ax = fig.add_subplot(111)
    plt.rcParams['figure.figsize'] = [20, 10]
    plt.rc('font', size=14)
    
    if len(Y1_pred) > 0:
        fit_active = True
    else:
        fit_active = False
    
    for idx, (y1, y2, label) in enumerate(zip(Y1, Y2, Label)):
        #ax.scatter(y1, y2, marker='.', color=colors[idx], label=label)
        ax.plot(y1, y2, marker='.', color=colors[idx], label=label)
        
        if fit_active:
            y1_pred = Y1_pred[idx]
            y2_pred = Y2_pred[idx]
            fit = Fit[idx]
            ax.plot(y1_pred, y2_pred, color=colors[idx], label=label)
            #print("%s : % 5.2f %% deaths/total cases" %(label, fit[0][0]*100))  
              
    ax.legend()
    ax.set_title(title)
    ax.xaxis.set_label_text(xlabel)
    ax.yaxis.set_label_text(ylabel)
    ax.xaxis.grid(True, which='major')
    ax.xaxis.grid(True, which='minor')
    ax.yaxis.grid(True, which='major')
    ax.yaxis.grid(True, which='minor')
    plt.show()
    
    return
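
The end-date estimate in `model` looks for the last day on which the predicted daily increment crosses `threshold`. The sketch below reproduces that idea in plain Python for a curve with the same functional form as `logistic21_model`. The helper names and the parameter values are assumptions made for this example, and the comparison uses a simple above/below test rather than the sign-based test in the notebook, so behavior can differ when an increment equals the threshold exactly.

```python
import math


def generalized_logistic(x, a, b, dtau, tau, ni):
    """Same functional form as logistic21_model, for a scalar x."""
    return a / math.pow(1.0 + b * math.exp(-(x - dtau) / tau), 1.0 / ni)


def last_threshold_crossing(xs, ys, threshold):
    """Return the x at which daily increments last cross `threshold`, or -1."""
    increments = [ys[i + 1] - ys[i] for i in range(len(ys) - 1)]
    above = [d > threshold for d in increments]
    crossings = [i for i in range(len(above) - 1) if above[i] != above[i + 1]]
    return xs[crossings[-1]] if crossings else -1


# Illustrative curve: 100,000 final cases, inflection at day 90.
days = list(range(30, 200))
curve = [generalized_logistic(d, a=100_000, b=1.0, dtau=90, tau=6.0, ni=1.0)
         for d in days]
end_day = last_threshold_crossing(days, curve, threshold=50)
```

Because the criterion returns the last crossing in either direction, a forecast window that ends while increments are still rising above the threshold reports the upward crossing instead of the end of the outbreak. A long enough `horizon` avoids this.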

<a id="look"></a>
## Look at the data

In [None]:
# Load data
if 1:
    keys = ['Italia',\
            'Spain', \
            'Germany', \
            'France', \
            'US', \
            'China', \
            'Japan' \
            ]
if 0:
    keys = ['Italia',\
            'EmiliaRomagna', \
            'Lombardia', \
            'Veneto', \
            #'Parma', \
            #'Modena', \
            ]
if 0:
     keys = ['Italia']

# Load data
df_cases, df_deaths, df_recovered, X, Y_cases, Y_deaths, Y_recovered, Label_cases, Label_deaths, Label_recovered =  load(keys)

# Running average
N = 5
b = N*[1.0/N]

# Plots
if 1:
    plot_crossseries("Infection growth vs total infections", "Cases", "New cases", \
                     np.log10([Y_cases[i][1:] for i in range(len(Y_cases))]), \
                     np.log10(lfilter(b, [1.], np.diff(Y_cases))), \
                     Label_cases)

if 0:
    plot_crossseries("Death growth vs total deaths", "Cases", "New cases", \
                     np.log10([Y_deaths[i][1:] for i in range(len(Y_deaths))]), \
                     np.log10(lfilter(b, [1.], np.diff(Y_deaths))), \
                     Label_deaths)

if 0:
    plot_crossseries("Deaths vs total cases", "Cases", "New cases", \
                     np.log10(Y_cases), \
                     np.log10(lfilter(b, [1.], Y_deaths)), \
                     Label_deaths)

if 1:
    plot_crossseries("Death growth vs total cases", "Cases", "New cases", \
                     np.log10([Y_cases[i][1:] for i in range(len(Y_cases))]), \
                     np.log10(lfilter(b, [1.], np.diff(Y_deaths))), \
                     Label_deaths)
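
The running average in the cell above is obtained with `lfilter(b, [1.], ...)`, where `b` holds `N` equal weights of `1/N`. With a denominator of `[1.]` this is a plain causal moving average. The pure-Python sketch below shows the same computation; the function name `causal_moving_average` is hypothetical and used only for illustration.

```python
def causal_moving_average(values, window):
    """Average each sample with the `window - 1` samples before it.

    Samples before the start of the series count as zero, which matches
    lfilter(window * [1.0 / window], [1.0], values) with its default
    zero initial state.
    """
    smoothed = []
    running_sum = 0.0
    for index, value in enumerate(values):
        running_sum += value
        if index >= window:
            # Drop the sample that just left the window
            running_sum -= values[index - window]
        smoothed.append(running_sum / window)
    return smoothed


ramp_up = causal_moving_average([5, 5, 5, 5, 5, 5], window=5)
```

The first `window - 1` outputs are biased low because the missing history is treated as zero. In the log-scale plots this appears as a start-up transient at the beginning of each curve.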


<a id="run_regression"></a>
## Run regression models

In [None]:
# Load data
if 1:
    keys = ['Italia',\
            'Spain', \
            'Germany', \
            'France', \
            'US', \
            'China'
            ]
if 0:
    keys = ['Italia',\
            'EmiliaRomagna', \
            'Lombardia', \
            'Veneto', \
            #'Parma', \
            #'Modena', \
            ]
if 0:
     keys = ['Italia']

# Load data
df_cases, df_deaths, df_recovered, X, Y_cases, Y_deaths, Y_recovered, Label_cases, Label_deaths, Label_recovered =  load(keys)

# Case forecast
days_ago = 0
threshold = 50
X_pred_cases, Y_pred_cases, Fit_cases, Rmse_cases, End_date_cases = run_time_model(X, Y_cases, past=days_ago, horizon=90, threshold=threshold)
X_pred_deaths, Y_pred_deaths, Fit_deaths, Rmse_deaths, End_date_deaths = run_time_model(X, Y_deaths, past=days_ago, horizon=90, threshold=threshold)

if 1:
    plot_timeseries("Cases", True, threshold,\
                    X, Y_cases, X_pred_cases, Y_pred_cases, Label_cases)

    print("Predicted end dates:")
    for i, date in enumerate(End_date_cases):
        print("{}\t\t(as of {},\t rmse={}):\t{}".format(Label_cases[i], daynumber2date(max(X[i])-days_ago), Rmse_cases[i], daynumber2date(date)))


# Forecast deaths
if 0:
    plot_timeseries("Deaths", True, threshold,\
                    X, Y_deaths, X_pred_deaths, Y_pred_deaths, Label_deaths)

    print("Predicted end dates (deaths):")
    for i, date in enumerate(End_date_deaths):
        print("{}\t\t(as of {},\t rmse={}):\t{}".format(Label_deaths[i], daynumber2date(max(X[i])-days_ago), Rmse_deaths[i], daynumber2date(date)))


if 0:
    plot_timeseries2("Cases", True, threshold,\
                     X, Y_cases, X_pred_cases, Y_pred_cases, Label_cases,\
                     X, Y_deaths, X_pred_deaths, Y_pred_deaths, Label_deaths, scale2=7.7)


# How the forecast changes over time
if 1:
    run_time_model_timemachine(X, Y_cases, Label_cases, threshold=10)

<a id="epi"></a>
# Epidemiological models

## Load

In [None]:
df_cases, df_deaths, df_recovered, X, Y_cases, Y_deaths, Y_recovered, Label_cases, Label_deaths, Label_recovered =  load([])
df_pop = load_population()


"""
Initial conditions
"""
df_target_country = country(df_cases, df_deaths, df_recovered, "Italy")
df_target_country = trim_country(df_target_country)
target_population = get_target_population(df_pop, 'Italy')
print(df_target_country)
print(target_population)

Now, let's take a look at the target country:

In [None]:
def altair_plot_for_confirmed_and_deaths(df_grouped: pd.DataFrame, data_at_x_axis: str='date') -> alt.Chart:
    confirmed_plot = alt.Chart(df_grouped).mark_circle(size=60).encode(
        x=alt.X(data_at_x_axis, axis=alt.Axis(title='Date')),
        y=alt.Y('confirmed', axis=alt.Axis(title='Cases'), title='Confirmed'),
        color=alt.Color("confirmed_marker", title="Cases"),
    )

    deaths_plot = alt.Chart(df_grouped).mark_circle(size=60).encode(
        x=data_at_x_axis,
        y='deaths',
        color=alt.Color("deaths_marker"),
    )
    
    recovered_plot = alt.Chart(df_grouped).mark_circle(size=60).encode(
        x=data_at_x_axis,
        y='recovered',
        color=alt.Color("recovered_marker"),
    )

    return confirmed_plot + deaths_plot + recovered_plot

In [None]:
altair_plot_for_confirmed_and_deaths(df_target_country).interactive()

<a id="models"></a>
## Epidemiology models

Now, let me explore the data in order to calibrate an epidemiological model and try to simulate and predict cases.

### Classical models

Here I present a brief review of classical temporal models (space dependency is not considered). Then I propose modifications to these models.

#### SIR model

The model represents an epidemic scenario, aiming to predict and control infectious diseases. It consists of a non-linear dynamical system that splits the population into sub-groups according to the state of the individuals. A simple model is composed of 3 sub-groups:

* Susceptible individuals (S);
* Infected (I);
* Recovered (R).

These components define a classical dynamical system known as the SIR model. Its equations are written as:

\begin{align*}
  \dot{S} &= - \beta S I \\ 
  \dot{I} &= \beta S I - \zeta I \\ 
  \dot{R} &= \zeta I
\end{align*}

where $\dot{(\bullet)}$ stands for time-derivative.

Some biological explanation for parameters:

* $\beta$ is the transmission parameter: the rate at which susceptible individuals become infected through interaction with infected ones;
* $\zeta$ is the recovery rate: the rate at which infected individuals recover and become immune.
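
As a minimal sketch of how these two parameters interact (the helper names and parameter values below are illustrative and not part of this notebook): the second equation can be written as $\dot{I} = (\beta S - \zeta) I$, so infections grow only while $\beta S > \zeta$. With $S \approx 1$ at the start of an outbreak, the threshold is the ratio $\beta / \zeta$, commonly called the basic reproduction number.

```python
def sir_rhs(S, I, R, beta, zeta):
    """Right-hand side of the normalized SIR system (S, I, R are fractions)."""
    dS = -beta * S * I
    dI = beta * S * I - zeta * I
    dR = zeta * I
    return dS, dI, dR


def basic_reproduction_number(beta, zeta):
    """Ratio beta / zeta; with S close to 1, infections grow only if it exceeds 1."""
    return beta / zeta


# Illustrative values only, not calibrated to any data.
zeta = 1 / 14
growing = sir_rhs(0.999, 0.001, 0.0, beta=0.3, zeta=zeta)
decaying = sir_rhs(0.999, 0.001, 0.0, beta=0.05, zeta=zeta)

print(basic_reproduction_number(0.3, zeta), growing[1] > 0)    # ratio above 1, I increases
print(basic_reproduction_number(0.05, zeta), decaying[1] > 0)  # ratio below 1, I decreases
```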

#### SEIR model

Another classical model, SEIR (Susceptible-Exposed-Infected-Recovered), is commonly applied in the Computational Epidemiology literature. In this model, a new sub-group of individuals is considered: Exposed. These individuals are infected but do not show any symptoms. In the classical SEIR model, exposed individuals **do not transmit the disease**. The ODE system now becomes:

\begin{align*}
    \dot{S} &= - \beta S  I \\
    \dot{E} &= \beta S I - \alpha E \\
    \dot{I} &= \alpha E - \zeta I \\
    \dot{R} &= \zeta I \\
\end{align*}

Brief biological interpretation for additional parameter:

* $\alpha$ is the conversion parameter for exposed individuals that become infected.
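
A quick consistency check on this system, written as a sketch with illustrative names and parameter values: the four right-hand sides sum to zero, so $S + E + I + R$ stays constant and the fractions keep adding up to 1.

```python
def seir_rhs(S, E, I, R, alpha, beta, zeta):
    """Right-hand side of the classical SEIR system for population fractions."""
    dS = -beta * S * I
    dE = beta * S * I - alpha * E
    dI = alpha * E - zeta * I
    dR = zeta * I
    return dS, dE, dI, dR


# Illustrative state and parameters (not fitted values).
derivatives = seir_rhs(0.90, 0.05, 0.03, 0.02, alpha=1 / 4, beta=0.5, zeta=1 / 14)
total_change = sum(derivatives)

# Every term that leaves one compartment enters another, so the sum vanishes.
print(abs(total_change) < 1e-12)
```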

### Modified models

Here, I propose some simple modifications so that the models represent COVID-19 better.

#### Modified SIR model (SIRD)

In this model, deaths due to the disease are considered explicitly. A new sub-group is introduced: dead individuals. This requires an additional equation, as well as a modification of the balance equation for Infected. The ODE system is given below:

\begin{align*}
  \dot{S} &= - \beta S I \\ 
  \dot{I} &= \beta S I - \zeta I - \delta I \\ 
  \dot{R} &= \zeta I \\
  \dot{D} &= \delta I
\end{align*}

Brief biological interpretation for additional parameter:

* $\delta$ is the mortality rate for the disease.
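
One consequence worth spelling out, as a sketch under the model's own assumptions: an infected individual leaves compartment I at the total rate $\zeta + \delta$, so the model implies that a fraction $\delta / (\zeta + \delta)$ of infected individuals eventually die and that the mean time spent infected is $1 / (\zeta + \delta)$. The function names are illustrative, and the numbers are the default values of the solver wrappers defined later in this notebook, not calibrated estimates.

```python
def implied_fatality_fraction(zeta, delta):
    """Share of infected individuals that end up in D rather than R."""
    return delta / (zeta + delta)


def mean_infectious_period(zeta, delta):
    """Average time spent in compartment I, in the time unit of the rates."""
    return 1.0 / (zeta + delta)


zeta, delta = 1 / 14, 0.02
print(implied_fatality_fraction(zeta, delta))  # approximately 0.219
print(mean_infectious_period(zeta, delta))     # approximately 10.9 days
```

Reading $\delta$ this way helps when choosing bounds for the calibration: a value that looks small as a daily rate can still imply a large share of deaths among the infected.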

#### Modified SEIR model (SEIR-2)

This model addresses a limitation of the original SEIR model, which does not consider disease transmission from exposed to susceptible individuals. To take it into account,
we modify the balance equations for S and E as follows:

\begin{align*}
    \dot{S} &= - \beta S  I  - \gamma S E \\
    \dot{E} &= \beta S I - \alpha E + \gamma S E \\
    \dot{I} &= \alpha E - \zeta I \\
    \dot{R} &= \zeta I \\
\end{align*}

Brief biological interpretation for additional parameter:

* $\gamma$ is the conversion rate parameter for susceptible individuals that interact with exposed individuals and then become exposed.
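
As a sanity check (a sketch with illustrative names and values), setting $\gamma = 0$ recovers the classical SEIR right-hand side, while $\gamma > 0$ moves an additional $\gamma S E$ from S to E:

```python
def seir2_rhs(S, E, I, R, alpha, beta, gamma, zeta):
    """Right-hand side of the SEIR-2 system as written in the equations above."""
    dS = -beta * S * I - gamma * S * E
    dE = beta * S * I - alpha * E + gamma * S * E
    dI = alpha * E - zeta * I
    dR = zeta * I
    return dS, dE, dI, dR


# Illustrative state and parameters (not fitted values).
state = (0.90, 0.05, 0.03, 0.02)
params = dict(alpha=1 / 4, beta=0.5, zeta=1 / 14)

classical = seir2_rhs(*state, gamma=0.0, **params)  # identical to classical SEIR
modified = seir2_rhs(*state, gamma=0.2, **params)

extra_flow = modified[1] - classical[1]  # equals gamma * S * E
print(extra_flow)
```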

#### Modified SEIR model with deaths (SEIRD)

Very similar to the previous one, but it also considers a sub-population of individuals who died from the disease. The model is written as:

\begin{align*}
    \dot{S} &= - \beta S  I  - \gamma S E \\
    \dot{E} &= \beta S I - \alpha E + \gamma S E \\
    \dot{I} &= \alpha E - \zeta I - \delta I \\
    \dot{R} &= \zeta I \\
    \dot{D} &= \delta I
\end{align*}

#### Modified SEIRD model considering quarantine lockdown (SEIRD-Q)

This modified model takes into account a removal rate of Susceptible, Exposed and Infected individuals to quarantine. The main hypothesis is that this conversion
happens at a constant removal rate (with units of 1 / time) and that, after the conversion, the individual becomes "Recovered" and cannot transmit the disease anymore. The new system is written as

\begin{align*}
    \dot{S} &= - \beta S  I  - \gamma S E - \omega S\\
    \dot{E} &= \beta S I - \alpha E + \gamma S E - \omega E \\
    \dot{I} &= \alpha E - \zeta I - \delta I - \omega I \\
    \dot{R} &= \zeta I + \omega (S + E + I) \\
    \dot{D} &= \delta I
\end{align*}

Brief biological interpretation for additional parameter:

* $\omega$ is the conversion rate parameter for Susceptible, Exposed and Infected individuals that become Recovered due to removal to quarantine.
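
To get a feel for the magnitude of $\omega$, consider the limiting case $E = I = 0$ (an assumption made only for this illustration). The system then reduces to $\dot{S} = -\omega S$, so $S(t) = S_0 e^{-\omega t}$ and half of the susceptible population has been moved to "Recovered" after $\ln 2 / \omega$ time units. The sketch below checks this with a simple forward Euler loop; the helper names and the value of $\omega$ are illustrative.

```python
import math


def quarantine_half_life(omega):
    """Time after which half of S has been removed, in the disease-free case."""
    return math.log(2.0) / omega


def euler_susceptible(S0, omega, t_end, dt=0.001):
    """Integrate dS/dt = -omega * S with forward Euler (E = I = 0 assumed)."""
    S = S0
    for _ in range(int(round(t_end / dt))):
        S += dt * (-omega * S)
    return S


omega = 0.05  # illustrative removal rate, in 1/day
t_half = quarantine_half_life(omega)
S_numeric = euler_susceptible(1.0, omega, t_half)
print(t_half, S_numeric)  # roughly 13.9 days and a value close to 0.5
```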

### Remarks on the model units

All sub-population variables (S, I, R, etc.) are dimensionless. They are obtained from the absolute counts as

\begin{align*}
    &S := \frac{\mathcal{S}}{N} \\
    &E := \frac{\mathcal{E}}{N} \\
    &I := \frac{\mathcal{I}}{N} \\
    &R := \frac{\mathcal{R}}{N} \\
    &D := \frac{\mathcal{D}}{N} \\
\end{align*}

with $N$ denoting the total population and $\mathcal{S}$, $\mathcal{E}$, $\mathcal{I}$, $\mathcal{R}$ and $\mathcal{D}$ as the absolute sub-population amounts. Therefore, S, E, I, R and D are given as fractions of the total population.
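
A small sketch of the conversion in both directions. The population and the counts are made-up numbers for illustration, not values from the datasets, and the helper names are hypothetical:

```python
def to_fraction(count, population):
    """Absolute sub-population count -> dimensionless model variable."""
    return count / population


def to_count(fraction, population):
    """Dimensionless model variable -> absolute sub-population count."""
    return fraction * population


N = 1_000_000  # made-up total population
absolute = {"S": 998_500, "E": 500, "I": 800, "R": 150, "D": 50}
fractions = {name: to_fraction(count, N) for name, count in absolute.items()}

print(fractions["I"])                         # infected fraction
print(sum(fractions.values()))                # fractions add up to 1 (up to rounding)
print(round(to_count(fractions["D"], N)))     # back to the absolute number of deaths
```

This is the same scaling applied later when the observations are divided by the target population before fitting and the model output is multiplied by it before plotting.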

<a id="implementations"></a>
## Programming SIR/SEIR-based models in Python

In [None]:
@jit(nopython=True)
def sir_model(t, X, beta=1, zeta=1/15):
    S, I, R = X
    S_prime = - beta * S * I
    I_prime = beta * S * I - zeta * I
    R_prime = zeta * I
    return S_prime, I_prime, R_prime


@jit(nopython=True)
def sird_model(t, X, beta=1, delta=0.02, zeta=1/15):
    """
    SIR model that takes into account the number of deaths.
    """
    S, I, R, D = X
    S_prime = - beta * S * I
    I_prime = beta * S * I - zeta * I - delta * I
    R_prime = zeta * I
    D_prime = delta * I
    return S_prime, I_prime, R_prime, D_prime


@jit(nopython=True)
def seir2_model(t, X, alpha=1/5, beta=1, gamma=0, zeta=1/15, delta=0.02):
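    """
    Modified SEIR model that takes into account the incubation time of exposed individuals.
    Exposed individuals can transmit the infection to susceptible individuals.
    Note: delta removes individuals from I without tracking them in a D compartment;
    pass delta=0 to match the SEIR-2 equations.
    """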
    S, E, I, R = X
    S_prime = - beta * S * I - gamma * E * S
    E_prime = beta * S * I - alpha * E + gamma * E * S
    I_prime = alpha * E - zeta * I - delta * I
    R_prime = zeta * I
    return S_prime, E_prime, I_prime, R_prime


@jit(nopython=True)
def seird_model(t, X, alpha=1/5, beta=1, gamma=0, zeta=1/15, delta=0.02):
    """
    A modified SEIR model in order to take into account deaths.
    """
    S, E, I, R, D = X
    S_prime = - beta * S * I - gamma * E * S
    E_prime = beta * S * I - alpha * E + gamma * E * S
    I_prime = alpha * E - zeta * I - delta * I
    R_prime = zeta * I
    D_prime = delta * I
    return S_prime, E_prime, I_prime, R_prime, D_prime


@jit(nopython=True)
def seirdq_model(t, X, alpha=1/5, beta=1, gamma=0, omega=0, zeta=1/15, delta=0.02):
    """
    A modified SEIRD model in order to take into account quarantine.
    """
    S, E, I, R, D = X
    S_prime = - beta * S * I - gamma * E * S - omega * S
    E_prime = beta * S * I - alpha * E + gamma * E * S - omega * E
    I_prime = alpha * E - zeta * I - delta * I - omega * I
    R_prime = zeta * I + omega * (S + E + I)
    D_prime = delta * I
    return S_prime, E_prime, I_prime, R_prime, D_prime

ODE solver wrappers using `scipy.integrate.solve_ivp`:

In [None]:
def sir_ode_solver(y0, t_span, t_eval, beta=1, zeta=1/14):
    solution_ODE = solve_ivp(
        fun=lambda t, y: sir_model(t, y, beta=beta, zeta=zeta), 
        t_span=t_span, 
        y0=y0,
        t_eval=t_eval,
        method='LSODA'
    )
    
    return solution_ODE


def sird_ode_solver(y0, t_span, t_eval, beta=1, delta=0.02, zeta=1/14):
    solution_ODE = solve_ivp(
        fun=lambda t, y: sird_model(t, y, beta=beta, zeta=zeta, delta=delta), 
        t_span=t_span, 
        y0=y0,
        t_eval=t_eval,
        method='LSODA'
    )
    
    return solution_ODE


def seir_ode_solver(y0, t_span, t_eval, beta=1, gamma=0, alpha=1/4, zeta=1/14, delta=0.0):
    solution_ODE = solve_ivp(
        fun=lambda t, y: seir2_model(t, y, alpha=alpha, beta=beta, gamma=gamma, zeta=zeta, delta=delta), 
        t_span=t_span, 
        y0=y0,
        t_eval=t_eval,
        method='LSODA'
    )
    
    return solution_ODE


def seird_ode_solver(y0, t_span, t_eval, beta=1, gamma=0, delta=0.02, alpha=1/4, zeta=1/14):
    solution_ODE = solve_ivp(
        fun=lambda t, y: seird_model(t, y, alpha=alpha, beta=beta, gamma=gamma, zeta=zeta, delta=delta), 
        t_span=t_span, 
        y0=y0,
        t_eval=t_eval,
        method='LSODA'
    )
    
    return solution_ODE


def seirdq_ode_solver(y0, t_span, t_eval, beta=1, gamma=0, delta=0.02, omega=0, alpha=1/4, zeta=1/14):
    solution_ODE = solve_ivp(
        fun=lambda t, y: seirdq_model(t, y, alpha=alpha, beta=beta, gamma=gamma, omega=omega, zeta=zeta, delta=delta), 
        t_span=t_span, 
        y0=y0,
        t_eval=t_eval,
        method='LSODA'
    )
    
    return solution_ODE

I'll assume that the whole country's population is susceptible, so I can define the following initial conditions:

In [None]:
S0, E0, I0, R0, D0 = target_population, 5 * float(df_target_country.confirmed[0]), float(df_target_country.confirmed[0]), 0., 0.

y0_sir = S0 / target_population, I0 / target_population, R0  # SIR IC array
y0_sird = S0 / target_population, I0 / target_population, R0, D0  # SIRD IC array
y0_seir = S0 / target_population, E0 / target_population, I0 / target_population, R0  # SEIR IC array
y0_seird = S0 / target_population, E0 / target_population, I0 / target_population, R0, D0  # SEIRD IC array

Select which models to run by setting the boolean flags below:

In [None]:
has_to_run_sir = False
has_to_run_sird = False
has_to_run_seir = False
has_to_run_seird = True
has_to_run_seirdq = False

<a id="least-squares"></a>
## Least-Squares fitting

Now that we know how to solve the forward problem, we can fit it to the data with a non-linear Least-Squares method for parameter estimation. Let's begin with a generic Least-Squares formulation:

In [None]:
def sir_least_squares_error_ode(par, time_exp, f_exp, fitting_model, initial_conditions):
    args = par
    time_span = (time_exp.min(), time_exp.max())
    
    y_model = fitting_model(initial_conditions, time_span, time_exp, *args)
    simulated_time = y_model.t
    simulated_ode_solution = y_model.y
    _, simulated_qoi, _ = simulated_ode_solution
    
    residual = f_exp - simulated_qoi

    return np.sum(residual ** 2.0)


def sird_least_squares_error_ode(par, time_exp, f_exp, fitting_model, initial_conditions):
    args = par
    f_exp1, f_exp2 = f_exp
    time_span = (time_exp.min(), time_exp.max())
    
    y_model = fitting_model(initial_conditions, time_span, time_exp, *args)
    simulated_time = y_model.t
    simulated_ode_solution = y_model.y
    _, simulated_qoi1, _, simulated_qoi2 = simulated_ode_solution
    
    residual1 = f_exp1 - simulated_qoi1
    residual2 = f_exp2 - simulated_qoi2

    weighting_for_exp1_constraints = 1e0
    weighting_for_exp2_constraints = 1e0
    return weighting_for_exp1_constraints * np.sum(residual1 ** 2.0) + weighting_for_exp2_constraints * np.sum(residual2 ** 2.0)


def seir_least_squares_error_ode(par, time_exp, f_exp, fitting_model, initial_conditions):
    args = par
    time_span = (time_exp.min(), time_exp.max())
    
    y_model = fitting_model(initial_conditions, time_span, time_exp, *args)
    simulated_time = y_model.t
    simulated_ode_solution = y_model.y
    _, _, simulated_qoi, _ = simulated_ode_solution
    
    residual = f_exp - simulated_qoi

    return np.sum(residual ** 2.0)


def seird_least_squares_error_ode(par, time_exp, f_exp, fitting_model, initial_conditions):
    args = par
    f_exp1, f_exp2 = f_exp
    time_span = (time_exp.min(), time_exp.max())
    
    y_model = fitting_model(initial_conditions, time_span, time_exp, *args)
    simulated_time = y_model.t
    simulated_ode_solution = y_model.y
    _, _, simulated_qoi1, _, simulated_qoi2 = simulated_ode_solution
    
    residual1 = f_exp1 - simulated_qoi1
    residual2 = f_exp2 - simulated_qoi2

    weighting_for_exp1_constraints = 1e0
    weighting_for_exp2_constraints = 1e0
    return weighting_for_exp1_constraints * np.sum(residual1 ** 2.0) + weighting_for_exp2_constraints * np.sum(residual2 ** 2.0)


def callback_de(xk, convergence):
    print(f'parameters = {xk}')
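
The objective functions above all follow the same pattern: simulate the model, pick the quantity of interest, and sum the squared residuals against the observations. A stripped-down sketch of that pattern, using a closed-form exponential in place of the ODE solver and synthetic observations (both are assumptions made to keep the example self-contained; `toy_model` and `sum_of_squares` are hypothetical names):

```python
import math


def toy_model(times, rate):
    """Stand-in for an ODE solver wrapper: exponential growth from 1e-6."""
    return [1e-6 * math.exp(rate * t) for t in times]


def sum_of_squares(par, time_exp, f_exp, fitting_model):
    """Least-Squares objective: squared distance between data and simulation."""
    (rate,) = par
    simulated = fitting_model(time_exp, rate)
    return sum((obs - sim) ** 2 for obs, sim in zip(f_exp, simulated))


times = list(range(10))
observations = toy_model(times, 0.3)  # synthetic data generated with rate = 0.3

# The objective is zero at the generating parameter and positive elsewhere.
for candidate in (0.2, 0.3, 0.4):
    print(candidate, sum_of_squares([candidate], times, observations, toy_model))
```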

Set the fitting domain (the time of each observation) and the observations (the observed population fraction at each time):

In [None]:
data_time = df_target_country.day.values.astype(np.float64)
infected_individuals = df_target_country.confirmed.values / target_population
dead_individuals = df_target_country.deaths.values / target_population
recovered_individuals = df_target_country.recovered.values / target_population

To calibrate the model, we define an objective function, which is a Least-Squares function in the present case, and minimize it. To (*try to*) avoid local minima, we use the Differential Evolution (DE) method (see this [nice presentation](https://www.maths.uq.edu.au/MASCOS/Multi-Agent04/Fleetwood.pdf) for an introduction to the subject). In summary, DE is a family of Evolutionary Algorithms that aims to solve Global Optimization problems. Moreover, DE is a derivative-free, population-based method.
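
To make the idea concrete, here is a tiny pure-Python sketch of one DE variant, loosely modeled on the `best1bin` strategy name used in this notebook's calls to `optimize.differential_evolution`. It is not SciPy's implementation: the function name, default settings and toy objective are all assumptions chosen for illustration.

```python
import random


def differential_evolution_sketch(objective, bounds, popsize=20, mutation=0.8,
                                  recombination=0.7, maxiter=200, seed=0):
    """Tiny 'best1bin'-style DE: mutate around the best member, binomial crossover."""
    rng = random.Random(seed)
    dim = len(bounds)
    population = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(popsize)]
    scores = [objective(member) for member in population]

    for _ in range(maxiter):
        best = population[scores.index(min(scores))]
        for i in range(popsize):
            # Two distinct members (other than i) provide the difference vector.
            r1, r2 = rng.sample([j for j in range(popsize) if j != i], 2)
            forced = rng.randrange(dim)  # at least one gene comes from the mutant
            trial = []
            for d, (lo, hi) in enumerate(bounds):
                if d == forced or rng.random() < recombination:
                    gene = best[d] + mutation * (population[r1][d] - population[r2][d])
                    gene = min(max(gene, lo), hi)  # clip to the bounds
                else:
                    gene = population[i][d]
                trial.append(gene)
            trial_score = objective(trial)
            if trial_score <= scores[i]:  # greedy selection: keep the better one
                population[i], scores[i] = trial, trial_score

    best_index = scores.index(min(scores))
    return population[best_index], scores[best_index]


def shifted_sphere(x):
    """Toy objective with its minimum at (0.3, 0.1)."""
    return (x[0] - 0.3) ** 2 + (x[1] - 0.1) ** 2


best_x, best_score = differential_evolution_sketch(shifted_sphere, [(0, 1), (0, 0.2)])
print(best_x, best_score)
```

No gradient of the objective is needed anywhere, which is why the method suits objectives that wrap an ODE solver.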

Below, calibration is performed for selected models:

In [None]:
if has_to_run_sir:
    num_of_parameters_to_fit_sir = 1
    bounds_sir = num_of_parameters_to_fit_sir * [(0, 1)]

    result_sir = optimize.differential_evolution(
        sir_least_squares_error_ode, 
        bounds=bounds_sir, 
        args=(data_time, infected_individuals, sir_ode_solver, y0_sir), 
        popsize=300,
        strategy='best1bin',
        tol=1e-2,
        recombination=0.5,
#         mutation=0.7,
        maxiter=100,
        disp=True,
        seed=seed,
        callback=callback_de,
        workers=-1
    )

    print(result_sir)

In [None]:
if has_to_run_sird:
    # num_of_parameters_to_fit_sir = 1
    # bounds_sir = num_of_parameters_to_fit_sir * [(0, 1)]
    bounds_sird = [(0, 1), (0, 0.2)]

    result_sird = optimize.differential_evolution(
        sird_least_squares_error_ode, 
        bounds=bounds_sird, 
        args=(data_time, [infected_individuals, dead_individuals], sird_ode_solver, y0_sird), 
        popsize=300,
        strategy='best1bin',
        tol=1e-2,
        recombination=0.5,
    #     mutation=0.7,
        maxiter=100,
        disp=True,
        seed=seed,
        callback=callback_de,
        workers=-1
    )

    print(result_sird)

In [None]:
if has_to_run_seird:
    num_of_parameters_to_fit_sir = 1
    # bounds_sir = num_of_parameters_to_fit_sir * [(0, 1)]
    bounds_seird = [(0, 1), (0, 1), (0, 0.2)]

    result_seird = optimize.differential_evolution(
        seird_least_squares_error_ode, 
        bounds=bounds_seird, 
        args=(data_time, [infected_individuals, dead_individuals], seird_ode_solver, y0_seird), 
        popsize=300,
        strategy='best1bin',
        tol=1e-2,
        recombination=0.7,
    #     mutation=0.7,
        maxiter=100,
        disp=True,
        seed=seed,
        callback=callback_de,
        workers=-1
    )

    print(result_seird)

In [None]:
if has_to_run_seirdq:
    # Four parameters are fitted here: beta, gamma, delta and omega.
    bounds_seirdq = [(0, 1), (0, 1), (0, 0.2), (0, 1)]

    result_seirdq = optimize.differential_evolution(
        seird_least_squares_error_ode, 
        bounds=bounds_seirdq, 
        args=(data_time, [infected_individuals, dead_individuals], seirdq_ode_solver, y0_seird), 
        popsize=200,
        strategy='best1bin',
        tol=1e-2,
        recombination=0.7,
    #     mutation=0.7,
        maxiter=200,
        disp=True,
        seed=seed,
        callback=callback_de,
        workers=-1
    )

    print(result_seirdq)

In [None]:
if has_to_run_seir:
    num_of_parameters_to_fit_seir = 2
    bounds_seir = num_of_parameters_to_fit_seir * [(0, 1)]

    result_seir = optimize.differential_evolution(
        seir_least_squares_error_ode, 
        bounds=bounds_seir, 
        args=(data_time, infected_individuals, seir_ode_solver, y0_seir), 
        popsize=300,
        strategy='best1bin',
        tol=1e-2,
        recombination=0.7,
#         mutation=0.7,
        maxiter=100,
        disp=True,
        seed=seed,
        callback=callback_de,
        workers=-1
    )

    print(result_seir)

In [None]:
zeta_fitted = 1/14  # recovery rate (fixed, not fitted); its inverse is the number of days needed to recover from the disease
if has_to_run_sir:
    beta_fitted_sir = result_sir.x  # SIR parameters
    
if has_to_run_sird:
    beta_fitted_sird, delta_fitted_sird = result_sird.x  # SIRD parameters
    
alpha_fitted = 1/4
if has_to_run_seird:
    beta_fitted_seird, gamma_fitted_seird, delta_fitted_seird = result_seird.x  # SEIRD parameters
    
if has_to_run_seirdq:
    beta_fitted_seirdq, gamma_fitted_seirdq, delta_fitted_seirdq, omega_fitted_seirdq = result_seirdq.x  # SEIRD-Q parameters

if has_to_run_seir:
#     beta_fitted_seir, gamma_fitted_seir = result_seir.x  # SEIR parameters
#     gamma_fitted_seir = 0.0
    beta_fitted_seir, gamma_fitted_seir = result_seir.x  # SEIR parameters

In [None]:
t0 = data_time.min()
tf = data_time.max()

if has_to_run_sir:
    solution_ODE_sir = sir_ode_solver(y0_sir, (t0, tf), data_time, beta_fitted_sir, zeta_fitted)  # SIR
    t_computed_sir, y_computed_sir = solution_ODE_sir.t, solution_ODE_sir.y
    S_sir, I_sir, R_sir = y_computed_sir

if has_to_run_sird:
    solution_ODE_sird = sird_ode_solver(y0_sird, (t0, tf), data_time, beta_fitted_sird, delta_fitted_sird, zeta_fitted)  # SIRD
    t_computed_sird, y_computed_sird = solution_ODE_sird.t, solution_ODE_sird.y
    S_sird, I_sird, R_sird, D_sird = y_computed_sird

if has_to_run_seird:
    solution_ODE_seird = seird_ode_solver(y0_seird, (t0, tf), data_time, beta_fitted_seird, gamma_fitted_seird, delta_fitted_seird, alpha_fitted, zeta_fitted)  # SEIRD
    t_computed_seird, y_computed_seird = solution_ODE_seird.t, solution_ODE_seird.y
    S_seird, E_seird, I_seird, R_seird, D_seird = y_computed_seird

if has_to_run_seirdq:
    solution_ODE_seirdq = seirdq_ode_solver(
        y0_seird, 
        (t0, tf), 
        data_time, 
        beta_fitted_seirdq, 
        gamma_fitted_seirdq, 
        delta_fitted_seirdq, 
        omega_fitted_seirdq, 
        alpha_fitted, 
        zeta_fitted
    )
    t_computed_seirdq, y_computed_seirdq = solution_ODE_seirdq.t, solution_ODE_seirdq.y
    S_seirdq, E_seirdq, I_seirdq, R_seirdq, D_seirdq = y_computed_seirdq
    
if has_to_run_seir:
    solution_ODE_seir = seir_ode_solver(y0_seir, (t0, tf), data_time, beta_fitted_seir, gamma_fitted_seir, alpha_fitted,  zeta_fitted)  # SEIR
    t_computed_seir, y_computed_seir = solution_ODE_seir.t, solution_ODE_seir.y
    S_seir, E_seir, I_seir, R_seir = y_computed_seir

In [None]:
model_list = list()
alpha_list = list()
beta_list = list()
delta_list = list()
gamma_list = list()
omega_list = list()
zeta_list = list()

if has_to_run_sir:
    model_list.append("SIR")
    alpha_list.append("-")
    beta_list.append(float(np.ravel(beta_fitted_sir)[0]))  # np.float was removed in NumPy 1.24
    delta_list.append("-")
    gamma_list.append("-")
    omega_list.append("-")
    zeta_list.append(zeta_fitted)

if has_to_run_sird:
    model_list.append("SIRD")
    alpha_list.append("-")
    beta_list.append(beta_fitted_sird)
    delta_list.append(delta_fitted_sird)
    gamma_list.append("-")
    omega_list.append("-")
    zeta_list.append(zeta_fitted)
    
if has_to_run_seir:
    model_list.append("SEIR")
    alpha_list.append(alpha_fitted)
    beta_list.append(beta_fitted_seir)
    delta_list.append("-")
    gamma_list.append(gamma_fitted_seir)
    omega_list.append("-")
    zeta_list.append(zeta_fitted)

if has_to_run_seird:
    model_list.append("SEIRD")
    alpha_list.append(alpha_fitted)
    beta_list.append(beta_fitted_seird)
    delta_list.append(delta_fitted_seird)
    gamma_list.append(gamma_fitted_seird)
    omega_list.append("-")
    zeta_list.append(zeta_fitted)

if has_to_run_seirdq:
    model_list.append("SEIRD-Q")
    alpha_list.append(alpha_fitted)
    beta_list.append(beta_fitted_seirdq)
    delta_list.append(delta_fitted_seirdq)
    gamma_list.append(gamma_fitted_seirdq)
    omega_list.append(omega_fitted_seirdq)
    zeta_list.append(zeta_fitted)
    
parameters_dict = {
    "Model": model_list,
    r"$\alpha$": alpha_list,
    r"$\beta$": beta_list,
    r"$\delta$": delta_list,
    r"$\gamma$": gamma_list,
    r"$\omega$": omega_list,
    r"$\zeta$": zeta_list,
}

df_parameters_calibrated = pd.DataFrame(parameters_dict)

df_parameters_calibrated

In [None]:
print(df_parameters_calibrated.to_latex(index=False))

Show calibration result based on available data:

In [None]:
plt.figure(figsize=(9,7))

if has_to_run_sir:
    plt.plot(t_computed_sir, I_sir * target_population, label='Infected (SIR)', marker='v', linestyle="-", markersize=10)
    plt.plot(t_computed_sir, R_sir * target_population, label='Recovered (SIR)', marker='o', linestyle="-", markersize=10)
    
if has_to_run_sird:
    plt.plot(t_computed_sird, I_sird * target_population, label='Infected (SIRD)', marker='X', linestyle="-", markersize=10)
    plt.plot(t_computed_sird, R_sird * target_population, label='Recovered (SIRD)', marker='o', linestyle="-", markersize=10)
    plt.plot(t_computed_sird, D_sird * target_population, label='Deaths (SIRD)', marker='s', linestyle="-", markersize=10)
    
if has_to_run_seird:
    plt.plot(t_computed_seird, I_seird * target_population, label='Infected (SEIRD)', marker='X', linestyle="-", markersize=10)
    plt.plot(t_computed_seird, R_seird * target_population, label='Recovered (SEIRD)', marker='o', linestyle="-", markersize=10)
    plt.plot(t_computed_seird, D_seird * target_population, label='Deaths (SEIRD)', marker='s', linestyle="-", markersize=10)
    
if has_to_run_seirdq:
    plt.plot(t_computed_seirdq, I_seirdq * target_population, label='Infected (SEIRD-Q)', marker='X', linestyle="-", markersize=10)
#     plt.plot(t_computed_seirdq, R_seirdq * target_population, label='Recovered (SEIRD-Q)', marker='o', linestyle="-", markersize=10)
    plt.plot(t_computed_seirdq, D_seirdq * target_population, label='Deaths (SEIRD-Q)', marker='s', linestyle="-", markersize=10)

if has_to_run_seir:
    plt.plot(t_computed_seir, I_seir * target_population, label='Infected (SEIR)', marker='X', linestyle="-", markersize=10)
    plt.plot(t_computed_seir, R_seir * target_population, label='Recovered (SEIR)', marker='o', linestyle="-", markersize=10)
    
plt.plot(data_time, infected_individuals * target_population, label='Observed infected', marker='s', linestyle="", markersize=10)
plt.plot(data_time, dead_individuals * target_population, label='Recorded deaths', marker='v', linestyle="", markersize=10)
plt.plot(data_time, recovered_individuals * target_population, label='Recorded recovered', marker='v', linestyle="", markersize=10)
plt.legend()
plt.grid()
plt.xlabel('Time (days)')
plt.ylabel('Population')

plt.tight_layout()
plt.savefig("all_deterministic_calibration.png")
plt.show()

In [None]:
methods_list = list()
deaths_list = list()
if has_to_run_sird:
    methods_list.append("SIRD")
    deaths_list.append(int(D_sird.max() * target_population))
    print(f"Death estimate for today (SIRD):\t{int(D_sird.max() * target_population)}")
    
if has_to_run_seird:
    methods_list.append("SEIRD")
    deaths_list.append(int(D_seird.max() * target_population))
    print(f"Death estimate for today (SEIRD):\t{int(D_seird.max() * target_population)}")
    
if has_to_run_seirdq:
    methods_list.append("SEIRD-Q")
    deaths_list.append(int(D_seirdq.max() * target_population))
    print(f"Death estimate for today (SEIRD-Q):\t{int(D_seirdq.max() * target_population)}")

methods_list.append("Recorded")
deaths_list.append(int(dead_individuals[-1] * target_population))

death_estimates_dict = {"Method": methods_list, "Deaths estimate": deaths_list}
df_deaths_estimates = pd.DataFrame(death_estimates_dict)
print(f"Recorded deaths until today:\t{int(dead_individuals[-1] * target_population)}")

In [None]:
# df_deaths_estimates.set_index("Model", inplace=True)
print(df_deaths_estimates.to_latex(index=False))

<a id="deterministic-predictions"></a>
## Extrapolation/Predictions

Now, let's extrapolate to the following days.

In [None]:
t0 = float(data_time.min())
number_of_days_after_last_record = 90
tf = data_time.max() + number_of_days_after_last_record
time_range = np.linspace(t0, tf, int(tf))  # evaluation times must lie within (t0, tf) for solve_ivp

if has_to_run_sir:
    solution_ODE_predict_sir = sir_ode_solver(y0_sir, (t0, tf), time_range, beta_fitted_sir, zeta_fitted)  # SIR
    t_computed_predict_sir, y_computed_predict_sir = solution_ODE_predict_sir.t, solution_ODE_predict_sir.y
    S_predict_sir, I_predict_sir, R_predict_sir = y_computed_predict_sir

if has_to_run_sird:
    solution_ODE_predict_sird = sird_ode_solver(y0_sird, (t0, tf), time_range, beta_fitted_sird, delta_fitted_sird, zeta_fitted)  # SIRD
    t_computed_predict_sird, y_computed_predict_sird = solution_ODE_predict_sird.t, solution_ODE_predict_sird.y
    S_predict_sird, I_predict_sird, R_predict_sird, D_predict_sird = y_computed_predict_sird

if has_to_run_seird:
    solution_ODE_predict_seird = seird_ode_solver(y0_seird, (t0, tf), time_range, beta_fitted_seird, gamma_fitted_seird, delta_fitted_seird, alpha_fitted, zeta_fitted)  # SEIRD
    t_computed_predict_seird, y_computed_predict_seird = solution_ODE_predict_seird.t, solution_ODE_predict_seird.y
    S_predict_seird, E_predict_seird, I_predict_seird, R_predict_seird, D_predict_seird = y_computed_predict_seird
    
if has_to_run_seirdq:
    solution_ODE_predict_seirdq = seirdq_ode_solver(y0_seird, (t0, tf), time_range, beta_fitted_seirdq, gamma_fitted_seirdq, delta_fitted_seirdq, omega_fitted_seirdq, alpha_fitted, zeta_fitted)  # SEIRD-Q
    t_computed_predict_seirdq, y_computed_predict_seirdq = solution_ODE_predict_seirdq.t, solution_ODE_predict_seirdq.y
    S_predict_seirdq, E_predict_seirdq, I_predict_seirdq, R_predict_seirdq, D_predict_seirdq = y_computed_predict_seirdq

if has_to_run_seir:
    solution_ODE_predict_seir = seir_ode_solver(y0_seir, (t0, tf), time_range, beta_fitted_seir, gamma_fitted_seir, alpha_fitted, zeta_fitted)  # SEIR
    t_computed_predict_seir, y_computed_predict_seir = solution_ODE_predict_seir.t, solution_ODE_predict_seir.y
    S_predict_seir, E_predict_seir, I_predict_seir, R_predict_seir = y_computed_predict_seir

Compute the day on which the number of infected individuals peaks:

In [None]:
has_to_plot_infection_peak = True

if has_to_run_sir:
    crisis_day_sir = np.argmax(I_predict_sir)
    
if has_to_run_sird:
    crisis_day_sird = np.argmax(I_predict_sird)

if has_to_run_seir:
    crisis_day_seir = np.argmax(I_predict_seir)
    
if has_to_run_seird:
    crisis_day_seird = np.argmax(I_predict_seird)
    
if has_to_run_seirdq:
    crisis_day_seirdq = np.argmax(I_predict_seirdq)
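
Note that `np.argmax` returns an index into the solution arrays, not a time. The index matches the day number only when the evaluation times start at zero and are spaced exactly one day apart. A minimal sketch of reading the peak time from the time array instead, with made-up lists standing in for the solver output (plain Python is used here so the example has no dependencies):

```python
# Hypothetical solver output: evaluation times (days) and infected fraction.
t_computed = [0.0, 2.5, 5.0, 7.5, 10.0]
I_computed = [0.01, 0.04, 0.09, 0.07, 0.03]

# Index of the largest infected fraction (the role np.argmax plays above).
peak_index = max(range(len(I_computed)), key=I_computed.__getitem__)

# Look the time up in the time array rather than using the index directly.
peak_time = t_computed[peak_index]
print(peak_index, peak_time)  # prints "2 5.0"
```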

In [None]:
plt.figure(figsize=(9,7))

if has_to_run_sir:
    plt.plot(t_computed_predict_sir, 100 * S_predict_sir, label='Susceptible (SIR)', marker='s', linestyle="-", markersize=10)
    plt.plot(t_computed_predict_sir, 100 * I_predict_sir, label='Infected (SIR)', marker='X', linestyle="-", markersize=10)
    plt.plot(t_computed_predict_sir, 100 * R_predict_sir, label='Recovered (SIR)', marker='o', linestyle="-", markersize=10)
    if has_to_plot_infection_peak:
        plt.axvline(x=crisis_day_sir + 1, color="red", linestyle="--", label="Infected peak (SIR)")
    
if has_to_run_sird:
    plt.plot(t_computed_predict_sird, 100 * S_predict_sird, label='Susceptible (SIRD)', marker='s', linestyle="-", markersize=10)
    plt.plot(t_computed_predict_sird, 100 * I_predict_sird, label='Infected (SIRD)', marker='X', linestyle="-", markersize=10)
    plt.plot(t_computed_predict_sird, 100 * R_predict_sird, label='Recovered (SIRD)', marker='o', linestyle="-", markersize=10)
    plt.plot(t_computed_predict_sird, 100 * D_predict_sird, label='Deaths (SIRD)', marker='v', linestyle="-", markersize=10)
    if has_to_plot_infection_peak:
        plt.axvline(x=crisis_day_sird + 1, color="red", linestyle="--", label="Infected peak (SIRD)")

if has_to_run_seird:
    plt.plot(t_computed_predict_seird, 100 * S_predict_seird, label='Susceptible (SEIRD)', marker='s', linestyle="-", markersize=10)
    plt.plot(t_computed_predict_seird, 100 * E_predict_seird, label='Exposed (SEIRD)', marker='*', linestyle="-", markersize=10)
    plt.plot(t_computed_predict_seird, 100 * I_predict_seird, label='Infected (SEIRD)', marker='X', linestyle="-", markersize=10)
    plt.plot(t_computed_predict_seird, 100 * R_predict_seird, label='Recovered (SEIRD)', marker='o', linestyle="-", markersize=10)
    plt.plot(t_computed_predict_seird, 100 * D_predict_seird, label='Deaths (SEIRD)', marker='v', linestyle="-", markersize=10)
    if has_to_plot_infection_peak:
        plt.axvline(x=crisis_day_seird + 1, color="red", label="Infected peak (SEIRD)")
    
if has_to_run_seirdq:
    plt.plot(t_computed_predict_seirdq, 100 * S_predict_seirdq, label='Susceptible (SEIRD-Q)', marker='s', linestyle="-", markersize=10)
    plt.plot(t_computed_predict_seirdq, 100 * E_predict_seirdq, label='Exposed (SEIRD-Q)', marker='*', linestyle="-", markersize=10)
    plt.plot(t_computed_predict_seirdq, 100 * I_predict_seirdq, label='Infected (SEIRD-Q)', marker='X', linestyle="-", markersize=10)
    plt.plot(t_computed_predict_seirdq, 100 * R_predict_seirdq, label='Recovered (SEIRD-Q)', marker='o', linestyle="-", markersize=10)
    plt.plot(t_computed_predict_seirdq, 100 * D_predict_seirdq, label='Deaths (SEIRD-Q)', marker='v', linestyle="-", markersize=10)
    if has_to_plot_infection_peak:
        plt.axvline(x=crisis_day_seirdq + 1, color="red", linestyle="--", label="Infected peak (SEIRD-Q)")

if has_to_run_seir:
    plt.plot(t_computed_predict_seir, 100 * S_predict_seir, label='Susceptible (SEIR)', marker='s', linestyle="-", markersize=10)
    plt.plot(t_computed_predict_seir, 100 * E_predict_seir, label='Exposed (SEIR)', marker='*', linestyle="-", markersize=10)
    plt.plot(t_computed_predict_seir, 100 * I_predict_seir, label='Infected (SEIR)', marker='X', linestyle="-", markersize=10)
    plt.plot(t_computed_predict_seir, 100 * R_predict_seir, label='Recovered (SEIR)', marker='o', linestyle="-", markersize=10)
    if has_to_plot_infection_peak:
        plt.axvline(x=crisis_day_seir + 1, color="red", linestyle="--", label="Infected peak (SEIR)")

plt.xlabel('Time (days)')
plt.ylabel('Population %')
plt.legend()
plt.grid()

plt.tight_layout()
plt.savefig("seir_deterministic_predictions.png")
plt.show()

In [None]:
if has_to_run_sir:
    print(f"Max number of infected individuals (SIR model):\t {int(np.max(I_predict_sir) * target_population)}")
    print(f"Population percentage of max number of infected individuals (SIR model):\t {np.max(I_predict_sir) * 100:.2f}%")
    print(f"Day estimate for max number of infected individuals (SIR model):\t {crisis_day_sir + 1}")
    print("")

if has_to_run_sird:
    print(f"Max number of infected individuals (SIRD model):\t {int(np.max(I_predict_sird) * target_population)}")
    print(f"Population percentage of max number of infected individuals (SIRD model):\t {np.max(I_predict_sird) * 100:.2f}%")
    print(f"Day estimate for max number of infected individuals (SIRD model):\t {crisis_day_sird + 1}")
    print(f"Estimated deaths as a percentage of the population (SIRD model):\t {100 * D_predict_sird[-1]:.3f}%")
    print(f"Estimated number of deaths (SIRD model):\t {int(target_population * D_predict_sird[-1])}")
    print("")

if has_to_run_seir:
    print(f"Max number of infected individuals (SEIR model):\t {int(np.max(I_predict_seir) * target_population)}")
    print(f"Population percentage of max number of infected individuals (SEIR model):\t {np.max(I_predict_seir) * 100:.2f}%")
    print(f"Day estimate for max number of infected individuals (SEIR model):\t {crisis_day_seir + 1}")
    print("")
    
if has_to_run_seird:
    print(f"Max number of infected individuals (SEIRD model):\t {int(np.max(I_predict_seird) * target_population)}")
    print(f"Population percentage of max number of infected individuals (SEIRD model):\t {np.max(I_predict_seird) * 100:.2f}%")
    print(f"Day estimate for max number of infected individuals (SEIRD model):\t {crisis_day_seird + 1}")
    print(f"Estimated deaths as a percentage of the population (SEIRD model):\t {100 * D_predict_seird[-1]:.3f}%")
    print(f"Estimated number of deaths (SEIRD model):\t {int(target_population * D_predict_seird[-1])}")
    print("")
    
if has_to_run_seirdq:
    print(f"Max number of infected individuals (SEIRD-Q model):\t {int(np.max(I_predict_seirdq) * target_population)}")
    print(f"Population percentage of max number of infected individuals (SEIRD-Q model):\t {np.max(I_predict_seirdq) * 100:.2f}%")
    print(f"Day estimate for max number of infected individuals (SEIRD-Q model):\t {crisis_day_seirdq + 1}")
    print(f"Estimated deaths as a percentage of the population (SEIRD-Q model):\t {100 * D_predict_seirdq[-1]:.3f}%")
    print(f"Estimated number of deaths (SEIRD-Q model):\t {int(target_population * D_predict_seirdq[-1])}")
    print("")
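
The summaries printed above all reduce to locating the maximum of an infected curve. As a minimal sketch of that step, assuming the curve is stored as a population fraction (as `I_predict_sir` is): the helper `summarize_peak` and the synthetic bell-shaped curve are hypothetical and only for illustration, and the `+ 1` day offset used in the cells above is not reproduced here.

```python
import numpy as np

def summarize_peak(time, infected_fraction, population):
    """Return the day, fraction and head count at the infection peak (hypothetical helper)."""
    peak_index = int(np.argmax(infected_fraction))
    return {
        "peak_day": float(time[peak_index]),
        "peak_fraction": float(infected_fraction[peak_index]),
        "peak_count": int(infected_fraction[peak_index] * population),
    }

# Synthetic stand-in for a predicted infected curve (not real data)
time = np.arange(0, 101, dtype=float)
infected = 0.2 * np.exp(-0.5 * ((time - 40.0) / 10.0) ** 2)

summary = summarize_peak(time, infected, population=1_000_000)
print(summary)
```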

<a id="bayes-calibration"></a>
## Bayesian Calibration

Now that we have an idea of the parameter values to expect (i.e., some prior knowledge), we can apply a Bayesian model calibration. This lets us estimate the uncertainty in the models' parameters and in their predictions. For this purpose, we'll use the [PyMC3](https://docs.pymc.io/) package.
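
To make the idea concrete before involving PyMC3, here is a minimal sketch of random-walk Metropolis sampling written in plain NumPy. It is not the notebook's implementation: the toy exponential growth model, the functions `log_posterior` and `metropolis`, and all numerical values (`y0`, `sigma`, `proposal_scale`, the prior bounds) are assumptions chosen for illustration. It combines a uniform prior with a Gaussian likelihood, which is the same structure used in the calibration cells of this section.

```python
import numpy as np

def log_posterior(beta, t, observed, y0, sigma, lower=0.0, upper=1.0):
    """Log-posterior (up to a constant) of the toy model y = y0 * exp(beta * t)."""
    if not (lower < beta < upper):
        return -np.inf  # outside the support of the uniform prior
    predicted = y0 * np.exp(beta * t)
    return -0.5 * np.sum(((observed - predicted) / sigma) ** 2)

def metropolis(log_post, start, n_draws, proposal_scale, rng):
    """Random-walk Metropolis sampler for one scalar parameter."""
    draws = np.empty(n_draws)
    current, current_lp = start, log_post(start)
    for i in range(n_draws):
        candidate = current + proposal_scale * rng.normal()
        candidate_lp = log_post(candidate)
        # Accept with probability min(1, posterior ratio)
        if np.log(rng.uniform()) < candidate_lp - current_lp:
            current, current_lp = candidate, candidate_lp
        draws[i] = current
    return draws

rng = np.random.default_rng(0)
t = np.arange(20.0)
true_beta, y0, sigma = 0.25, 1e-4, 1e-4  # assumed toy values
observed = y0 * np.exp(true_beta * t) + sigma * rng.normal(size=t.size)

draws = metropolis(
    lambda b: log_posterior(b, t, observed, y0, sigma),
    start=0.2, n_draws=5000, proposal_scale=1e-3, rng=rng,
)
posterior = draws[1000:]  # discard the first draws as burn-in
print(f"posterior mean of beta: {posterior.mean():.4f}")
```

The retained draws play the same role as the trace returned by `pm.sample`: their spread is the uncertainty estimate for the parameter.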

<a id="bayes-sir"></a>
### SIR Model

Let's begin with the easy case :) Just one parameter to calibrate!

In [None]:
@theano.compile.ops.as_op(itypes=[t.dvector, t.dvector, t.dvector, t.dscalar], otypes=[t.dvector])
def sir_ode_solver_wrapper(time_exp, f_observations, initial_conditions, beta):
    time_span = (time_exp.min(), time_exp.max())
    
    # zeta is held fixed at 1/14 so that beta is the only calibrated parameter
    zeta = 1/14
    y_model = sir_ode_solver(initial_conditions, time_span, time_exp, beta, zeta)
    simulated_time = y_model.t
    simulated_ode_solution = y_model.y
    _, simulated_qoi, _ = simulated_ode_solution

    return simulated_qoi

In [None]:
with pm.Model() as model_mcmc:
    # Prior distributions for the model's parameters
    beta = pm.Uniform('beta', lower=0.2, upper=0.4)

    # Defining the deterministic formulation of the problem
    fitting_model = pm.Deterministic('sir_model', sir_ode_solver_wrapper(
        theano.shared(data_time), 
        theano.shared(infected_individuals), 
        theano.shared(np.array(y0_sir)),
        beta,
        )
    )

    # Observation noise expressed as a population fraction: we assume a standard deviation of 100 individuals (variance = 100^2), since some cases have not been tracked
    variance = (100 / target_population) * (100 / target_population)
    standard_deviation = np.sqrt(variance)
    likelihood_model = pm.Normal('likelihood_model', mu=fitting_model, sigma=standard_deviation, observed=infected_individuals)

    # The Monte Carlo procedure driver
    step = pm.step_methods.Metropolis()
    sir_trace = pm.sample(4500, chains=4, cores=4, step=step)

In [None]:
pm.traceplot(sir_trace, var_names=['beta'])
plt.savefig('sir_beta_traceplot.png')
plt.show()

In [None]:
pm.plot_posterior(sir_trace, var_names=['beta'], kind='hist', round_to=3)
plt.savefig('sir_beta_posterior.png')
plt.show()

In [None]:
percentile_cut = 2.5

y_min_sir = np.percentile(sir_trace['sir_model'], percentile_cut, axis=0)
y_max_sir = np.percentile(sir_trace['sir_model'], 100 - percentile_cut, axis=0)
y_fit_sir = np.percentile(sir_trace['sir_model'], 50, axis=0)
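
The three percentiles above are taken across posterior draws (`axis=0`), giving one lower bound, one median and one upper bound per time point, i.e. a pointwise 95% credible band. As a minimal sketch of that mechanics, the array `curves` below is a synthetic stand-in for `sir_trace['sir_model']`, and all of its values are assumptions rather than output of the notebook.

```python
import numpy as np

rng = np.random.default_rng(1)
n_draws, n_times = 2000, 15
time = np.arange(n_times, dtype=float)

# Stand-in for trace['sir_model']: one simulated curve per posterior draw,
# shape (n_draws, n_times). The growth rates are made-up values.
beta_draws = rng.normal(0.3, 0.01, size=n_draws)
curves = 1e-5 * np.exp(beta_draws[:, None] * time[None, :])

percentile_cut = 2.5
lower = np.percentile(curves, percentile_cut, axis=0)
upper = np.percentile(curves, 100 - percentile_cut, axis=0)
median = np.percentile(curves, 50, axis=0)

# Fraction of draws that fall inside the band at each time point
coverage = np.mean((curves >= lower) & (curves <= upper), axis=0)
print(lower.shape, coverage[1:].round(3))
```

Because the band is computed independently at each time point, it describes pointwise uncertainty and not the probability that an entire trajectory stays inside it.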

In [None]:
plt.figure(figsize=(9, 7))

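# The model output is a population fraction, so scale by 100 to match the axis label
plt.plot(data_time, 100 * y_fit_sir, 'b', label='Infected (median)')
plt.fill_between(data_time, 100 * y_min_sir, 100 * y_max_sir, color='b', alpha=0.2, label='95% credible interval')

plt.legend()
plt.xlabel('Time (days)')
plt.ylabel('Population %')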
# plt.xlim(0, 10)

plt.savefig('sir_uncertainty.png')
plt.show()

Improvements to this SIR model calibration are under investigation!

<a id="bayes-seir2"></a>
### SEIR-2 Model

A more interesting model, with two parameters to calibrate.

In [None]:
@theano.compile.ops.as_op(itypes=[t.dvector, t.dvector, t.dvector, t.dscalar, t.dscalar], otypes=[t.dvector])
def seir_ode_solver_wrapper(time_exp, f_observations, initial_conditions, beta, gamma):
    time_span = (time_exp.min(), time_exp.max())
    
    y_model = seir_ode_solver(initial_conditions, time_span, time_exp, beta, gamma)
    simulated_time = y_model.t
    simulated_ode_solution = y_model.y
    _, _, simulated_qoi, _ = simulated_ode_solution

    return simulated_qoi

In [None]:
with pm.Model() as model_mcmc:
    # Prior distributions for the model's parameters
    beta = pm.Uniform('beta', lower=0, upper=0.001)
    gamma = pm.Uniform('gamma', lower=0, upper=0.5)

    # Defining the deterministic formulation of the problem
    fitting_model = pm.Deterministic('seir2_model', seir_ode_solver_wrapper(
        theano.shared(data_time), 
        theano.shared(infected_individuals), 
        theano.shared(np.array(y0_seir)),
        beta,
        gamma
        )
    )

    # Observation noise expressed as a population fraction: we assume a standard deviation of 100 individuals (variance = 100^2), since some cases have not been tracked
    variance = (100 / target_population) * (100 / target_population)
    standard_deviation = np.sqrt(variance)
    likelihood_model = pm.Normal('likelihood_model', mu=fitting_model, sigma=standard_deviation, observed=infected_individuals)

    # The Monte Carlo procedure driver
    step = pm.step_methods.Metropolis()
    seir_trace = pm.sample(8500, chains=4, cores=4, step=step)

In [None]:
pm.traceplot(seir_trace, var_names=['beta'])
plt.savefig('seir2_beta_traceplot.png')
plt.show()

pm.traceplot(seir_trace, var_names=['gamma'])
plt.savefig('seir2_gamma_traceplot.png')
plt.show()

In [None]:
pm.plot_posterior(seir_trace, var_names=['beta'], kind='hist', round_to=3)
plt.savefig('seir2_beta_posterior.png')
plt.show()

pm.plot_posterior(seir_trace, var_names=['gamma'], kind='hist', round_to=3)
plt.savefig('seir2_gamma_posterior.png')
plt.show()

In [None]:
percentile_cut = 2.5

y_min_seir = np.percentile(seir_trace['seir2_model'], percentile_cut, axis=0)
y_max_seir = np.percentile(seir_trace['seir2_model'], 100 - percentile_cut, axis=0)
y_fit_seir = np.percentile(seir_trace['seir2_model'], 50, axis=0)

In [None]:
plt.figure(figsize=(9, 7))

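# The model output is a population fraction, so scale by 100 to match the axis label
plt.plot(data_time, 100 * y_fit_seir, 'b', label='Infected (median)')
plt.fill_between(data_time, 100 * y_min_seir, 100 * y_max_seir, color='b', alpha=0.2, label='95% credible interval')

plt.legend()
plt.xlabel('Time (days)')
plt.ylabel('Population %')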
# plt.xlim(0, 10)

plt.savefig('seir2_uncertainty.png')
plt.show()

Note: The Bayesian calibration is still under investigation and subject to improvement.