# Is the Covid Curve Flattening in Italy?
> Data fitting via Nonlinear Constrained Optimization.

- toc: true 
- badges: true
- comments: true
- categories: [jupyter]
- image: images/chart-preview.png

# About

This notebook shows how to fit a [Susceptible-Infectious-Recovered (SIR) model](https://en.wikipedia.org/wiki/Compartmental_models_in_epidemiology) to the [Italian open data](https://github.com/pcm-dpc/COVID-19) on the spread of #covid19.

**Data fitting** is modeled and solved using Nonlinear Constrained Optimization via [Pyomo](http://www.pyomo.org/) and [IPOPT](https://github.com/coin-or/Ipopt).

> Important: I am neither a medical doctor nor an epidemiologist. I am just a *mathematical optimizer*, and this post is about a didactic example of data fitting via Mathematical Optimization.

### Running this notebook on Colab
Run the following snippet if you are running this notebook in [Colab](https://colab.research.google.com/).

```python
#collapse-hide
import shutil
import sys
import os.path

if not shutil.which("pyomo"):
    !pip install -q pyomo
    assert(shutil.which("pyomo"))

if not (shutil.which("ipopt") or os.path.isfile("ipopt")):
    if "google.colab" in sys.modules:
        !apt-get install -y -qq glpk-utils
    else:
        try:
            !conda install -c conda-forge ipopt
        except:
            pass
```

# The SIR model: The Elegance of Simplicity

The **Susceptible-Infectious-Recovered (SIR)** model is a classical epidemic model used to describe rapid outbreaks that occur in less than a year. For a complete survey of the SIR and related models, we recommend the **SIAM Review** article by Herbert W. Hethcote entitled [The Mathematics of Infectious Diseases](https://epubs.siam.org/doi/pdf/10.1137/S0036144500371907). The next paragraphs are based on that very nice paper.

In this post, we will not go into the details of all those models. Instead, we focus on the basic equations that can be used to capture the main trend of the Covid19 outbreak in Italy at the time of writing.

## Continuous Time Model
The **SIR model** considers a population of $N$ individuals partitioned into three groups: (i) Susceptible, (ii) Infectious, and (iii) Recovered. The number of individuals in each group at time $t$ is denoted by $S(t)$, $I(t)$, and $R(t)$, respectively. Since the model does not consider births and deaths, at any time during the outbreak we have:
$$
    N = S(t) + I(t) + R(t).
$$

The SIR model describes the dynamics of an outbreak via the following ordinary differential equations:

$$
\frac{dS}{dt} = -\frac{\beta}{N}IS, \;\;\;\;\;\;\;\;\;  S(0) \geq 0
$$
$$
\frac{dI}{dt} = \frac{\beta}{N}IS - \gamma I, \;\;\; I(0) \geq 0
$$
$$
\frac{dR}{dt} = \gamma I, \;\;\;\;\;\;\;\;\;\;\;\;\;\;\; R(0) \geq 0
$$

where $S(0), I(0), R(0)$ are the initial group sizes at the beginning of the outbreak. The parameter $\beta$ is the average number of adequate contacts (that is, contacts sufficient for transmission) of a person per unit of time, while $\frac{1}{\gamma}$ is the mean infectious period.
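
As a quick consistency check (a short derivation from the three equations above, using no additional assumptions), summing the right-hand sides shows that the total population is conserved:

$$
\frac{dS}{dt} + \frac{dI}{dt} + \frac{dR}{dt} = -\frac{\beta}{N}IS + \left(\frac{\beta}{N}IS - \gamma I\right) + \gamma I = 0.
$$

Hence $S(t) + I(t) + R(t)$ stays equal to its initial value $S(0) + I(0) + R(0) = N$ for all $t$.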

A very important parameter is the **basic reproduction number $R_0$** defined as
$$
R_0 = \frac{\beta}{\gamma}
$$
which [1] describes as the average number of secondary infections produced when one infected individual is introduced into a host population where everyone is susceptible.
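
One way to see why $R_0$ matters is to factor the equation for $I$. This is a short derivation from the model itself, and it only holds under the SIR assumptions stated earlier:

$$
\frac{dI}{dt} = \frac{\beta}{N}IS - \gamma I = \gamma I \left( R_0 \frac{S}{N} - 1 \right).
$$

The number of infectious individuals therefore grows while $\frac{S}{N} > \frac{1}{R_0}$ and declines once $\frac{S}{N} < \frac{1}{R_0}$. At the very start of an outbreak $S \approx N$, so in this model the infection can spread only if $R_0 > 1$.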

## Discrete Time Model
Discretizing the previous model with a unit time step yields the following **discrete time model**:

$$
S(t+1) = S(t) - \frac{\beta}{N} I(t) S(t), \;\;\;\;\;\;\;\;\;\;
$$
$$
I(t+1) = I(t) + \frac{\beta}{N} S(t) I(t) - \gamma I(t), 
$$
$$
R(t+1) = R(t) + \gamma I(t) \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;
$$

Intuitively, the number of individuals moving from the susceptible group $S$ to the infectious group $I$ between time $t$ and time $t+1$ is equal to $\frac{\beta}{N} I(t) S(t)$. Likewise, the number of individuals moving from the infectious group $I$ to the recovered group $R$ over the same unit of time is equal to $\gamma I(t)$.

The dynamics of the system are completely determined by the two parameters $\beta$ and $\gamma$, and by the initial conditions $S(0)$, $I(0)$, and $R(0)$.
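
The recurrences above can be iterated directly. The following is a minimal sketch in plain Python; the function name `simulate_sir`, its argument names, and all numeric values are hypothetical and chosen only for illustration, not estimated from any data.

```python
def simulate_sir(beta, gamma, s0, i0, rec0, steps):
    """Iterate the discrete-time SIR recurrences for `steps` units of time.

    s0, i0, rec0 are the initial sizes S(0), I(0), R(0).
    Returns three lists S, I, R, each of length steps + 1.
    """
    n = s0 + i0 + rec0  # total population N, constant in this model
    S, I, R = [float(s0)], [float(i0)], [float(rec0)]
    for _ in range(steps):
        new_infections = beta / n * I[-1] * S[-1]  # moves from S to I
        new_recoveries = gamma * I[-1]             # moves from I to R
        S.append(S[-1] - new_infections)
        I.append(I[-1] + new_infections - new_recoveries)
        R.append(R[-1] + new_recoveries)
    return S, I, R


# Hypothetical parameters and initial conditions, for illustration only.
S, I, R = simulate_sir(beta=0.3, gamma=0.1, s0=999_990, i0=10, rec0=0, steps=100)
peak_day = I.index(max(I))
print(f"peak of I(t): {max(I):.0f} individuals at t = {peak_day}")
```

Changing `beta` or `gamma` while keeping the initial conditions fixed changes the whole trajectory, which is what makes it possible to estimate the two parameters from observed data.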

> Note: The goal of this post is to present a data fitting approach to estimate the two parameters $\gamma$ and $\beta$ from the data of an outbreak that has not yet ended. That is, we have reasonably good estimates of $I(t)$ and $R(t)$ for all $t \leq \bar t$, where $\bar t$ is the last time for which data are available, but we have to predict those values for $t > \bar t$.

As a side effect, we can get the value of the important parameter $R_0$.

All of this happens by solving a Nonlinear Constrained Optimization problem using Python in a Colab notebook.
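
To make the idea of fitting concrete before any solver is involved, here is a minimal sketch that swaps in a crude grid search for the nonlinear solver. It is not the Pyomo/IPOPT model of this post: the helper names (`simulate`, `squared_error`, `grid_search`), the least-squares error measure, the grid, and the synthetic "observations" are all assumptions made for illustration.

```python
def simulate(beta, gamma, n, i0, rec0, steps):
    """Return the I(t) and R(t) series of the discrete-time SIR model."""
    s, i, r = float(n - i0 - rec0), float(i0), float(rec0)
    I, R = [i], [r]
    for _ in range(steps):
        new_inf, new_rec = beta / n * i * s, gamma * i
        s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
        I.append(i)
        R.append(r)
    return I, R


def squared_error(beta, gamma, n, obs_I, obs_R):
    """Sum of squared deviations between simulated and observed I and R."""
    I, R = simulate(beta, gamma, n, obs_I[0], obs_R[0], len(obs_I) - 1)
    err_I = sum((a - b) ** 2 for a, b in zip(I, obs_I))
    err_R = sum((a - b) ** 2 for a, b in zip(R, obs_R))
    return err_I + err_R


def grid_search(n, obs_I, obs_R, grid):
    """Try every (beta, gamma) pair on the grid and keep the best one."""
    candidates = ((b, g) for b in grid for g in grid)
    return min(candidates, key=lambda p: squared_error(p[0], p[1], n, obs_I, obs_R))


# Synthetic observations generated from known, hypothetical parameters,
# so that we can check whether the search recovers them. Not real data.
N = 1_000_000
obs_I, obs_R = simulate(0.30, 0.10, N, i0=10, rec0=0, steps=30)

grid = [k / 100 for k in range(1, 51)]  # 0.01, 0.02, ..., 0.50
beta_hat, gamma_hat = grid_search(N, obs_I, obs_R, grid)
print(f"beta = {beta_hat}, gamma = {gamma_hat}, R_0 = {beta_hat / gamma_hat:.2f}")
```

A grid search only scales to a couple of parameters and a coarse resolution. A nonlinear solver instead treats $\beta$ and $\gamma$ as continuous decision variables and can handle explicit constraints on them.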

# Data Fitting via Nonlinear Constrained Optimization
Every day, the Italian government publishes on GitHub a detailed report on the [COVID19 outbreak in Italy](https://github.com/pcm-dpc/COVID-19): this is a very important source of data for every scientist.

In particular, since February 24th, 2020, we have the number of official cases (infected persons), the number of recovered, and the number of deaths. We are specifically interested in the number of new infections per day, which is only implicitly contained in the database.
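
When a quantity is reported only as a running total, the per-day values can be recovered by differencing consecutive days. The sketch below illustrates this; the helper name `daily_increments` and the numbers are hypothetical and are not taken from the Italian dataset.

```python
def daily_increments(cumulative):
    """Turn a cumulative daily series into per-day increments."""
    return [today - yesterday
            for yesterday, today in zip(cumulative, cumulative[1:])]


# Made-up cumulative counts for five consecutive days (not real data).
total_cases = [100, 130, 175, 240, 330]
print(daily_increments(total_cases))  # [30, 45, 65, 90]
```

The result has one entry fewer than the input, because the first day has no previous day to compare with.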

Another important source of data is the [Kaggle Covid Dataset](https://www.kaggle.com/c/covid19-global-forecasting-week-2), which contains data for the whole world. From Kaggle, we used the dataset for the Hubei region in China, where the outbreak is under control at the moment and people are getting back to work.

## Input Data

## Variables, Objective Function, and Constraints

## Solving the Model

# Is the Curve Flattening?


### License
<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.