# "Epidemic modeling - Part 3"
> "Examining the major flaw of the deterministic SEIR model"

- toc: true 
- badges: true
- comments: true
- categories: [probability distributions, modeling, SEIR, epidemiology]
- image: images/proba_distrib.png

![](my_icons/proba_distrib.png)

## Motivation for write-up

This is the third part of a multi-part blog series on modeling in epidemiology.

The COVID-19 pandemic has brought a lot of attention to the study of epidemiology, and more specifically to the various mathematical models used to inform public health policies. Everyone has been trying to understand the growth or slowing of new cases and to predict the healthcare resources needed. This blog post attempts to explain the foundations of some of the most used models and enlighten the reader on two key points.

After introducing the concepts of compartmentalization and disease dynamics in the first blog post, the second part looked at a deterministic numerical solution for the SEIR model discussed, and the effects of the parameters $\beta$, $\sigma$, and $\gamma$.

While arguments can be made that the compartments themselves don't reflect the reality of COVID-19, this is not the point of this discussion; I want to focus on the idea that the population-level dynamics ignore the individual progression of the disease. 

With this in mind, this third part discusses the problems that arise in the simulations when the times from E &rarr; I ($\sigma$) and I &rarr; R ($\gamma$) are averaged. 

Let's have a look at the individual progression of disease to understand what is wrong.

## Implications of averaging E &rarr; I and I &rarr; R

Let's first look at what happens if we average times from E &rarr; I and I &rarr; R to obtain $\sigma$ and $\gamma$.

In [25]:
#hide
!pip install plotly==4.6.0
import pandas as pd
import numpy as np
import plotly.graph_objects as go
import plotly.express as px
from scipy.stats import poisson
from scipy.stats import expon
from scipy.stats import gamma
from scipy.stats import weibull_min

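def seir_model(init, parms, days):
    """Integrate the SEIR equations with explicit Euler steps of dt = 0.1 day.

    init is (S_0, E_0, I_0, R_0) and parms is (sigma, beta, gam).
    Returns one row per step with columns S, E, I, R and the per-step
    inflows into E, I and R.
    """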
    S_0, E_0, I_0, R_0 = init
    Epd, Ipd, Rpd = [0], [0], [0]
    S, E, I, R = [S_0], [E_0], [I_0], [R_0]
    dt=0.1
    t = np.linspace(0,days,int(days/dt))
    sigma, beta, gam = parms
    for _ in t[1:]:
        next_S = S[-1] - beta*S[-1]*I[-1]*dt
        Epd.append(beta*S[-1]*I[-1]*dt)
        next_E = E[-1] + (beta*S[-1]*I[-1] - sigma*E[-1])*dt
        Ipd.append(sigma*E[-1]*dt)
        next_I = I[-1] + (sigma*E[-1] - gam*I[-1])*dt
        Rpd.append(gam*I[-1]*dt)
        next_R = R[-1] + (gam*I[-1])*dt
        S.append(next_S)
        E.append(next_E)
        I.append(next_I)
        R.append(next_R)
    return np.stack([S, E, I, R, Epd, Ipd, Rpd]).T



### E &rarr; I

To see the distribution of E &rarr; I times implied by the numerical model from part 2, we set the initial number of E equal to the whole population and plot the percentage that has left E over time, as below:

(the analytic solution is the exponential distribution; the numerical solution only approximates it)
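For reference, the analytic statement can be sketched as follows, assuming (as in this setup) that everyone starts in E, so that no new exposures occur and the equation for E reduces to pure decay:

$$
\frac{dE}{dt} = -\sigma E, \qquad E(0) = N \quad\Longrightarrow\quad E(t) = N e^{-\sigma t}
$$

$$
\frac{N - E(t)}{N} = 1 - e^{-\sigma t}
$$

The right-hand side is the CDF of an exponential distribution with scale $\frac{1}{\sigma}$. The Euler update used in the code gives $E_{n+1} = (1 - \sigma\,dt)\,E_n$, which approaches the same curve as $dt \to 0$.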

In [64]:
#collapse_hide
# Define parameters
days = 30
N = 10000
init = 0, N, 0, 0
sigma = 0.2   # 1/5 --> 5 days on average to go from E --> I
beta = 1.75
gam = 0.1     # 1/10 --> 10 days on average to go from I --> R
parms = sigma, beta, gam

# Plot simulation
fig = go.Figure(data=[       
    go.Scatter(name='E to I', x=np.linspace(0,days,days*10), y=100*(1-seir_model(init, parms, days).T[1]/N)), 
    go.Scatter(name='$\\text{Exponential distribution with } Scale = \\frac{1}{\\sigma}$', x=np.arange(days), y=100*expon.cdf(np.arange(days),loc=0,scale=1/sigma))
])

fig.update_layout(
    title='Number of E moving to I over time when all population is exposed on day 0',
    xaxis_title='Days',
    yaxis_title='Percent of exposed having become infectious',
    legend=dict(
        x=0.6,
        y=0,
        traceorder="normal",
    )
)

fig.show()

The time period from exposed to infectious is not instant. Once a person is exposed, it takes some days for them to become infectious. 

The plot above confirms the numerical model from part 2 assumes people go from E &rarr; I according to the exponential distribution.

For COVID-19, research has shown that the mean time from E &rarr; I is 5 days, and that 95% go from E &rarr; I within 14 days.

The exponential curve above comes close on some counts:
* the median is about 3.5 days
* the mean is about 5 days
* 95% within 15 days

But:
 
* About 18% of the exposed become infectious within 1 day

While the first three figures may be close to reality for COVID-19, this last point is not: people tend not to become infectious straight away. 
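The figures in the list above follow from closed-form properties of the exponential distribution. A minimal sketch, assuming the same rate `sigma = 0.2` per day as in the simulation (`exponential_summary` is a hypothetical helper introduced only for illustration):

```python
import math

def exponential_summary(rate):
    """Summary statistics of an exponential waiting time with the given rate (per day)."""
    return {
        "mean": 1 / rate,                           # mean = scale = 1 / rate
        "median": math.log(2) / rate,               # CDF reaches 0.5 here
        "q95": -math.log(1 - 0.95) / rate,          # CDF reaches 0.95 here
        "within_1_day": 1 - math.exp(-rate * 1),    # CDF evaluated at t = 1 day
    }

summary = exponential_summary(rate=0.2)  # sigma = 0.2, i.e. 5 days on average
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

The same helper can be called with any other rate to see how the early fraction and the long tail move together: with a single parameter, they cannot be tuned independently.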

A likelier distribution of the time spent in the E state before moving to the I state is a gamma distribution (a Weibull distribution could also have been used, with different parameters).
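To put a rough number on how differently the two candidates treat the first day, here is a standard-library-only sketch. It assumes the illustrative gamma parameters used in this post (shape 1.8, loc 0.8, scale 3) and integrates the density numerically; `gamma_pdf` and `gamma_cdf` are hypothetical helpers, and in the notebook itself `scipy.stats.gamma.cdf` would do the same job directly.

```python
import math

def gamma_pdf(x, shape, scale):
    """Density of a gamma distribution (location 0) at x."""
    if x <= 0:
        return 0.0
    return x ** (shape - 1) * math.exp(-x / scale) / (math.gamma(shape) * scale ** shape)

def gamma_cdf(x, shape, loc, scale, steps=20000):
    """Approximate the shifted gamma CDF with a midpoint-rule integral of the density."""
    upper = x - loc
    if upper <= 0:
        return 0.0
    width = upper / steps
    return sum(gamma_pdf((i + 0.5) * width, shape, scale) for i in range(steps)) * width

expon_day1 = 1 - math.exp(-1 / 5)                        # exponential, mean 5 days
gamma_day1 = gamma_cdf(1, shape=1.8, loc=0.8, scale=3)   # illustrative gamma parameters
print(f"Infectious within 1 day: exponential {expon_day1:.1%}, gamma {gamma_day1:.1%}")
```

Under these assumed parameters the gamma distribution puts well under 1% of people into I within the first day, against roughly 18% for the exponential.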

Let's see the difference.

In [0]:
#collapse_hide
days = np.arange(30)
cdf = pd.DataFrame({
    'Exponential': expon.cdf(days,loc=0,scale=5), 
    'Gamma': gamma.cdf(days,1.8,loc=0.8,scale=3), 
    #'Weibull': weibull_min.cdf(days,1.1, loc=0.8, scale=5)
    })

In [66]:
#collapse_hide
fig = go.Figure(data=[       
    go.Scatter(name='Expon E --> I', x=days, y=cdf.Exponential),
    go.Scatter(name='Gamma E --> I', x=days, y=cdf.Gamma),
    #go.Scatter(name='Weibull', x=days, y=cdf.Weibull)
])

fig.update_layout(
    title='Number of E moving to I over time when all population is exposed on day 0',
    xaxis_title='Days',
    yaxis_title='Percent of exposed having become infectious',
    legend=dict(
        x=0.6,
        y=0,
        traceorder="normal",
    )
)

fig.show()

In [67]:
#collapse_hide
# Read the quantiles directly from the E --> I distributions defined above
e_to_i = {
    'Exponential': expon(loc=0, scale=5),
    'Gamma': gamma(1.8, loc=0.8, scale=3),
}
quantiles = pd.DataFrame(
    {name: dist.ppf([0.5, 0.95]) for name, dist in e_to_i.items()},
    index=['Median', '95th quantile'])
print(quantiles)


The gamma distribution matches very closely the actual COVID-19 data we have seen published. The exponential distribution does not.

However, we have seen that changing $\sigma$ does not much alter the total number of people infected, so this may have a negligible impact on the overall simulation results.

The real problem comes from $\gamma$ below.

### I &rarr; R

The same discussion applies to the time from I &rarr; R.

From the discussion above, we know the numerical model in part 2 approximates the time from I &rarr; R as an exponential distribution.

Let's verify this in the plot below:


In [68]:
#collapse_hide
# Define parameters
days = 30
N = 10000
init = 0, 0, N, 0
sigma = 1/5   # 1/5 --> 5 days on average to go from E --> I
beta = 1.75
gam = 1/11     # 1/11 --> 11 days on average to go from I --> R
parms = sigma, beta, gam

# Plot simulation
fig = go.Figure(data=[       
    go.Scatter(name='I to R', x=np.linspace(0,days,days*10), y=100*(1-seir_model(init, parms, days).T[2]/N)), 
    go.Scatter(name='$\\text{Exponential distribution with } Scale = \\frac{1}{\\gamma}$', x=np.arange(days), y=100*expon.cdf(np.arange(days),loc=0,scale=1/gam))
])

fig.update_layout(
    title='Number of I moving to R over time when all population is infectious on day 0',
    xaxis_title='Days',
    yaxis_title='Percent of infectious having become recovered',
    legend=dict(
        x=0.6,
        y=0,
        traceorder="normal",
    )
)

fig.show()

The time period from infectious to recovered is not instant. Once a person is infectious, it takes some days for them to recover. 

The plot above confirms the numerical model from part 2 assumes people go from I &rarr; R according to the exponential distribution.
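How closely the explicit Euler update tracks the exponential CDF can be checked with a small standalone sketch. It assumes the same step size `dt = 0.1` and rate `gam = 1/11` as above; `euler_decay` is a hypothetical helper written for this check, not part of the part 2 model.

```python
import math

def euler_decay(rate, dt, days):
    """Fraction still in the compartment after explicit Euler steps of size dt."""
    remaining = 1.0
    for _ in range(round(days / dt)):
        remaining -= rate * remaining * dt   # same update as I -= gam * I * dt
    return remaining

gam = 1 / 11
for day in (1, 7, 30):
    numeric = 1 - euler_decay(gam, dt=0.1, days=day)
    analytic = 1 - math.exp(-gam * day)
    print(f"day {day:>2}: Euler {numeric:.4f}  exponential CDF {analytic:.4f}")
```

The two columns differ only in the third decimal place at this step size, which is why the simulated curve and the exponential CDF are hard to tell apart in the plot.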

For COVID-19, research has shown that the mean time from I &rarr; R is 11 days, and that 95% go from I &rarr; R within 17 days.

With the exponential distribution above we have:
* the median is about 7.6 days
* the mean is about 11 days
* 95% within 33 days
* About 9% of the infectious become recovered within 1 day

This is not even close to reality for COVID-19: people tend not to recover straight away, and the 95% point is nearly twice the 17 days reported. 

A likelier distribution of the time spent in the I state before moving to the R state is again a gamma distribution (a Weibull distribution could also have been used, with different parameters).
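Because the gamma shape used in this post for I &rarr; R is an integer (7), its CDF has a closed form (the Erlang distribution), so the quantiles can be sketched without sampling. The block below assumes the illustrative parameters shape 7, loc 4, scale 1.1 and an exponential with scale 11; `erlang_cdf` and `quantile` are hypothetical helpers, and the quantiles are found by simple bisection.

```python
import math

def erlang_cdf(x, shape, loc, scale):
    """CDF of a shifted gamma distribution with integer shape (an Erlang distribution)."""
    z = (x - loc) / scale
    if z <= 0:
        return 0.0
    partial_sum = sum(z ** n / math.factorial(n) for n in range(shape))
    return 1 - math.exp(-z) * partial_sum

def quantile(cdf, p, lo=0.0, hi=200.0, tol=1e-6):
    """Invert a monotone CDF by bisection on [lo, hi]."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def expon_recovery_cdf(x):
    return 1 - math.exp(-x / 11)                      # exponential, mean 11 days

def gamma_recovery_cdf(x):
    return erlang_cdf(x, shape=7, loc=4, scale=1.1)   # illustrative gamma parameters

for p in (0.5, 0.95):
    print(f"p={p}: exponential {quantile(expon_recovery_cdf, p):.1f} days, "
          f"gamma {quantile(gamma_recovery_cdf, p):.1f} days")
```

With `loc=4`, the gamma CDF is exactly zero for the first four days, so nobody recovers on day 1 under this assumption.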

Let's see the difference.

In [0]:
#collapse_hide
days = np.arange(30)
df = pd.DataFrame({
    'Exponential': expon.rvs(loc=0,scale=11, size=10000),
    'Gamma': gamma.rvs(7, loc=4,scale=1.1, size=10000), 
    #'Weibull': weibull_min.rvs(2.5, loc=6, scale=6.7, size=10000)
    })

In [0]:
#collapse_hide
days = np.arange(30)
cdf = pd.DataFrame({
    'Exponential': expon.cdf(days,loc=0,scale=11),
    'Gamma': gamma.cdf(days,7,loc=4,scale=1.1),
    #'Weibull': weibull_min.cdf(days,2.5, loc=6, scale=6.7)
    })

In [71]:
#collapse_hide
fig = go.Figure(data=[       
    go.Scatter(name='Expon I --> R', x=days, y=cdf.Exponential),
    go.Scatter(name='Gamma I --> R', x=days, y=cdf.Gamma),
    #go.Scatter(name='Weibull I --> R', x=days, y=cdf.Weibull)
])

fig.update_layout(
    title='Number of I moving to R over time when all population is infectious on day 0',
    xaxis_title='Days',
    yaxis_title='Percent of infectious having become recovered',
    legend=dict(
        x=0.6,
        y=0,
        traceorder="normal",
    )
)

fig.show()

In [72]:
#collapse_hide
print("Mean:")
print(df.mean())
print("Median:")
print(df.median())
print("95th Quantile:")
print(df.quantile(q=0.95))

Mean:
Exponential    11.034861
Gamma          11.700213
dtype: float64
Median:
Exponential     7.608165
Gamma          11.305786
dtype: float64
95th Quantile:
Exponential    33.273754
Gamma          17.136692
Name: 0.95, dtype: float64


We can see here that the actual distribution matters a great deal: matching the average alone is not enough to have a proper model. 
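One way to see why the mean alone is not enough is to compare the spread of the two distributions. A minimal sketch using the standard closed-form moments, assuming the illustrative parameters from this post (exponential with scale 11; gamma with shape 7, loc 4, scale 1.1):

```python
import math

# Exponential with scale 11: mean and standard deviation both equal the scale.
expon_mean, expon_sd = 11.0, 11.0

# Shifted gamma: mean = loc + shape * scale, sd = sqrt(shape) * scale.
shape, loc, scale = 7, 4, 1.1
gamma_mean = loc + shape * scale
gamma_sd = math.sqrt(shape) * scale

print(f"Exponential: mean {expon_mean:.1f} days, sd {expon_sd:.1f} days")
print(f"Gamma:       mean {gamma_mean:.1f} days, sd {gamma_sd:.1f} days")
```

The means are within a day of each other, but the exponential's standard deviation is several times larger, which is what produces both the very early recoveries and the long tail.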

The gamma distribution matches very closely the actual COVID-19 data we have seen published. The exponential distribution does not.

Furthermore, we have seen that changing $\gamma$ does alter the total number of people infected, so this may have an important impact on the overall results for the simulations.

We need to build a new model to verify this (see part 4).

## Discussion

While many models assume exponential distributions when modeling COVID-19, we can see here that the probability distributions of the E &rarr; I and especially the I &rarr; R times (summarized by $\sigma$ and $\gamma$) are intimately linked to the model's behavior and need to be accounted for properly.

Note: the COVID-19 parameters used here may be wrong; the point is a qualitative discussion of the effect of probability distributions rather than exact quantitative results.