#Objective

This notebook shows a simulation example of an [SEIR](https://en.wikipedia.org/wiki/Compartmental_models_in_epidemiology#The_SEIR_model) model.

My work is based on [COVID-19 CA: by simple SEIR⚽️](https://www.kaggle.com/kmatsuyama/covid-19-ca-by-simple-seir) and on the work of many pioneers in modelling the spread of COVID-19:

https://towardsdatascience.com/modelling-the-coronavirus-epidemic-spreading-in-a-city-with-python-babd14d82fa2

https://qiita.com/kotai2003/items/ed28fb723a335a873061 (Japanese)

https://arxiv.org/abs/2002.06563.

I appreciate them.

We use an optimizer to fit the SEIR parameters to real data.

Thanks to this kernel for the idea: https://www.kaggle.com/saga21/covid-global-forecast-sir-model

In [None]:
import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import integrate, optimize
from sklearn.linear_model import LinearRegression

In [None]:
ca_train = pd.read_csv('../input/covid19-local-us-ca-forecasting-week-1/ca_train.csv')
ca_test = pd.read_csv('../input/covid19-local-us-ca-forecasting-week-1/ca_test.csv')
ca_submission = pd.read_csv('../input/covid19-local-us-ca-forecasting-week-1/ca_submission.csv')

train_df = ca_train
test_df =  ca_test
submission_df =  ca_submission

In [None]:
train_df.head()

In [None]:
reported = train_df[train_df['Date']>= '2020-03-10'].reset_index()
reported['day_count'] = list(range(1,len(reported)+1))
reported.head()

#SEIR model
The differential equations of the SEIR model are as follows:

$$
\begin{aligned}
    \frac{d S}{d t} &= -\beta \frac{SI}{N} \\
    \frac{d E}{d t} &= \beta \frac{SI}{N} - \varepsilon E \\
    \frac{d I}{d t} &= \varepsilon E - \gamma I \\
    \frac{d R}{d t} &= \gamma I
\end{aligned}
$$

Here, $S,E,I,R$ mean the number of Susceptible, Exposed, Infectious and Recovered, respectively. 

$\beta$ is the infection rate ($0\leq\beta\leq1$), $\varepsilon$ is the rate at which an exposed person becomes infectious, and $\gamma$ is the recovery rate.
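As a quick consistency check on the equations above, adding the four right-hand sides shows that every term cancels:

$$
\frac{d}{d t}(S+E+I+R) = -\beta \frac{SI}{N} + \left(\beta \frac{SI}{N} - \varepsilon E\right) + \left(\varepsilon E - \gamma I\right) + \gamma I = 0
$$

So the total $S+E+I+R$ stays constant over time. In the code below this constant is the population $N$, because the initial state is chosen to sum to $N$.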

$\varepsilon$ and $\gamma$ are related to the latency period $l_p$ and the infectious period $i_p$:

$$
\begin{aligned}
    \varepsilon &= \frac{1}{l_p} \quad [1/\mathrm{day}] \\
    \gamma &= \frac{1}{i_p} \quad [1/\mathrm{day}]
\end{aligned}
$$
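The conversion between periods and rates can be sketched in a few lines. The 7-day infectious period and $\beta=1$ are the values used later in Case 3; the 5-day latency period is a hypothetical value chosen only for illustration. The last line uses the standard result that, for an SEIR model without births and deaths, the basic reproduction number is $\beta/\gamma$.

```python
# Periods in days (latency period is a hypothetical illustration value)
latency_period_days = 5.0
infectious_period_days = 7.0  # value assumed later in Case 3

# Rates are the reciprocals of the periods, in units of 1/day
epsilon_demo = 1.0 / latency_period_days
gamma_demo = 1.0 / infectious_period_days

# Basic reproduction number for SEIR without vital dynamics: beta / gamma
beta_demo = 1.0  # value assumed later in Case 2 and Case 3
r0_demo = beta_demo / gamma_demo

print("epsilon =", epsilon_demo, "gamma =", gamma_demo, "R_0 =", r0_demo)
```

Going the other way, a fitted rate is turned back into a period by taking its reciprocal, which is how the latency period is reported in the cases below.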

We first try to fit all three parameters $\beta, \varepsilon, \gamma$. However, in some cases the fitted parameters fall outside reasonable ranges (Cases 1 and 2). Finally, we assume that $\beta$ and $\gamma$ are constant and optimize only $\varepsilon$ (Case 3).

In [None]:
# Observed confirmed cases (y) against day count (x), as float arrays
ydata = reported['ConfirmedCases'].to_numpy(dtype=float)
xdata = reported['day_count'].to_numpy(dtype=float)

#Case 1 (unreasonable)

Free parameters: $\beta, \varepsilon, \gamma$

In [None]:
N = 36000000 #population of California
inf0 = ydata[0] #Infectious
sus0 = N - inf0 #Susceptible
exp0 = 0.0 #Exposed
rec0 = 0.0 #Recovered
init_state = [sus0, exp0, inf0, rec0]
#beta = 1.0 #constant.
#gamma = 1.0 / 7.0 #constant.

In [None]:
# Define differential equation of SEIR model

'''
dS/dt = -beta * S * I / N
dE/dt = beta* S * I / N - epsilon * E
dI/dt = epsilon * E - gamma * I
dR/dt = gamma * I

[v[0], v[1], v[2], v[3]]=[S, E, I, R]

dv[0]/dt = -beta * v[0] * v[2] / N
dv[1]/dt = beta * v[0] * v[2] / N - epsilon * v[1]
dv[2]/dt = epsilon * v[1] - gamma * v[2]
dv[3]/dt = gamma * v[2]

'''

def seir_model(v, x, beta, epsilon, gamma, N):
    # Right-hand side of the SEIR equations; v = [S, E, I, R], x = time (unused)
    return [-beta * v[0] * v[2] / N,
            beta * v[0] * v[2] / N - epsilon * v[1],
            epsilon * v[1] - gamma * v[2],
            gamma * v[2]]

def fit_odeint(x, beta, epsilon, gamma):
    # Integrate the model over x and return the Infectious compartment
    return integrate.odeint(seir_model, init_state, x, args=(beta, epsilon, gamma, N))[:, 2]

In [None]:
popt, pcov = optimize.curve_fit(fit_odeint, xdata, ydata)
fitted = fit_odeint(xdata, *popt)

In [None]:
print("Optimal parameters: beta = ", popt[0], ", epsilon = ", popt[1], ", gamma = ", popt[2])

In this case, $\beta$ and $\gamma$ take negative values, which is infeasible because both are rates and must be non-negative.
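One possible way to rule out negative rates, not used in this notebook, is to pass `bounds` to `optimize.curve_fit`. The sketch below is a minimal illustration on synthetic data generated from the model itself. The population `N_demo`, the "true" parameters, the starting guess `p0` and the upper bound of 2 are all assumptions made for the example, not values fitted to the California data.

```python
import numpy as np
from scipy import integrate, optimize

N_demo = 1_000_000                             # hypothetical population
init_demo = [N_demo - 10.0, 0.0, 10.0, 0.0]    # hypothetical [S, E, I, R] at the first day

def seir_rhs(v, t, beta, epsilon, gamma, N):
    # Right-hand side of the SEIR equations
    S, E, I, R = v
    return [-beta * S * I / N,
            beta * S * I / N - epsilon * E,
            epsilon * E - gamma * I,
            gamma * I]

def infectious_curve(t, beta, epsilon, gamma):
    # Integrate over t and return the Infectious compartment
    sol = integrate.odeint(seir_rhs, init_demo, t, args=(beta, epsilon, gamma, N_demo))
    return sol[:, 2]

# Synthetic "observations" produced with assumed parameters (not real data)
t_demo = np.arange(1.0, 21.0)
y_demo = infectious_curve(t_demo, 1.0, 0.3, 0.15)

# Constrain every parameter to [0, 2]; with bounds, curve_fit uses the 'trf' method
popt_b, pcov_b = optimize.curve_fit(
    infectious_curve, t_demo, y_demo,
    p0=[0.8, 0.4, 0.2],
    bounds=([0.0, 0.0, 0.0], [2.0, 2.0, 2.0]),
    max_nfev=2000,
)
print("bounded fit: beta, epsilon, gamma =", popt_b)
```

Bounds keep the optimizer inside a feasible region, but they do not make the three parameters easier to identify from a short time series, which is one reason the later cases fix some of them instead.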

#Case 2 (unreasonable)

Free parameters: $\varepsilon, \gamma$

Here we assume $\beta=1$, following a previous study ( https://arxiv.org/abs/2002.06563 ).

In [None]:
N = 36000000 #population of California
inf0 = ydata[0] #Infectious
sus0 = N - inf0 #Susceptible
exp0 = 0.0 #Exposed
rec0 = 0.0 #Recovered
init_state = [sus0, exp0, inf0, rec0]
beta = 1.0 #constant.
#gamma = 1.0 / 7.0 #constant.

In [None]:
# Define differential equation of SEIR model
def seir_model(v, x, beta, epsilon, gamma, N):
    return [-beta * v[0] * v[2] / N,
            beta * v[0] * v[2] / N - epsilon * v[1],
            epsilon * v[1] - gamma * v[2],
            gamma * v[2]]

def fit_odeint(x, epsilon, gamma):
    return integrate.odeint(seir_model, init_state, x, args=(beta, epsilon, gamma, N))[:,2]

In [None]:
popt, pcov = optimize.curve_fit(fit_odeint, xdata, ydata)
fitted = fit_odeint(xdata, *popt)

In [None]:
print("Optimal parameters: epsilon = ", popt[0], ", gamma = ", popt[1])

In this case, $\varepsilon$ is far too large: it implies a latency period of only $\approx 0.018$ days, which is unrealistic.

#Case 3 (possible)

Free parameter: $\varepsilon$

We assume that $\beta=1,~\gamma=1/7~(i_p=7)$, following a previous study ( https://arxiv.org/abs/2002.06563 ).

In [None]:
N = 36000000 #population of California
inf0 = ydata[0] #Infectious
sus0 = N - inf0 #Susceptible
exp0 = 0.0 #Exposed
rec0 = 0.0 #Recovered
init_state = [sus0, exp0, inf0, rec0]
beta = 1.0 #constant.
gamma = 1.0 / 7.0 #constant.

In [None]:
# Define differential equation of SEIR model
def seir_model(v, x, beta, epsilon, gamma, N):
    return [-beta * v[0] * v[2] / N,
            beta * v[0] * v[2] / N - epsilon * v[1],
            epsilon * v[1] - gamma * v[2],
            gamma * v[2]]

def fit_odeint(x, epsilon):
    return integrate.odeint(seir_model, init_state, x, args=(beta, epsilon, gamma, N))[:,2]

In [None]:
popt, pcov = optimize.curve_fit(fit_odeint, xdata, ydata)
fitted = fit_odeint(xdata, *popt)

In [None]:
inf_period = 1.0/gamma
lat_period = 1.0/popt[0]
print("Optimal parameters: gamma =", gamma, ", epsilon = ", popt[0], "\ninfectious period(day) = ", inf_period, ", latency period(day) = ", lat_period)

In [None]:
plt.plot(xdata, ydata, 'o')
plt.plot(xdata, fitted)
plt.title("Fit of SEIR model to California confirmed cases")
plt.ylabel("Population infected")
plt.xlabel("Days")
plt.show()

In the following simulation, we adopt the parameters from Case 3.

#Numerical Integration

All parameters of the SEIR model are now fixed. The next step is numerical integration.
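To make concrete what numerical integration does, here is a minimal sketch that advances the SEIR equations with the explicit Euler method and compares the result with `integrate.odeint`. The notebook itself uses `odeint`; Euler is shown only for illustration. The value `epsilon = 0.2`, the initial state with 100 infectious people, the step size `h` and the 30-day horizon are assumptions for the example, not the fitted values.

```python
import numpy as np
from scipy import integrate

def seir_rhs(v, t, beta, epsilon, gamma, N):
    # Right-hand side of the SEIR equations, returned as an array
    S, E, I, R = v
    new_exposed = beta * S * I / N
    return np.array([-new_exposed,
                     new_exposed - epsilon * E,
                     epsilon * E - gamma * I,
                     gamma * I])

N_demo = 36_000_000
params_demo = (1.0, 0.2, 1.0 / 7.0, N_demo)   # beta, epsilon (hypothetical), gamma, N
state0 = np.array([N_demo - 100.0, 0.0, 100.0, 0.0])

# Explicit Euler: state(t + h) = state(t) + h * f(state(t))
h = 0.01            # step size in days
n_steps = 3000      # 3000 steps of 0.01 days = 30 days
state = state0.copy()
for k in range(n_steps):
    state = state + h * seir_rhs(state, k * h, *params_demo)

# Reference solution from odeint at the same final time
reference = integrate.odeint(seir_rhs, state0, [0.0, 30.0], args=params_demo)[-1]
rel_diff = abs(state[2] - reference[2]) / reference[2]
print("Euler I(30) =", state[2], "odeint I(30) =", reference[2], "relative difference =", rel_diff)
```

Shrinking `h` brings the Euler result closer to the `odeint` one at the cost of more steps; `odeint` chooses its step size adaptively, which is why the notebook relies on it.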

In [None]:
# parameters
t_max = 100 #days
dt = 1

N = 36000000 #population of California
inf0 = ydata[0] #Infectious
sus0 = N - inf0 #Susceptible
exp0 = 0.0 #Exposed
rec0 = 0.0 #Recovered
init_state = [sus0, exp0, inf0, rec0]
beta_const = 1.0 #Assumption: Infection rate is constant.
epsilon_const = popt[0]
gamma_const = 1.0 / 7.0 #Assumption: Recovery rate is constant.

In [None]:
# numerical integration
times = np.arange(0, t_max, dt)
args = (beta_const, epsilon_const, gamma_const, N)

# Numerical Solution using scipy.integrate
# Solver SEIR model
result = integrate.odeint(seir_model, init_state, times, args)
# plot
plt.plot(times, result)
plt.legend(['Susceptible', 'Exposed', 'Infectious', 'Removed'])
plt.title("SEIR model  COVID-19")
plt.xlabel('time(days)')
plt.ylabel('population')
plt.grid()

plt.show()

In [None]:
result_df = pd.DataFrame(data=result, columns=['Susceptible', 'Exposed', 'Infectious', 'Removed'])
result_df.shape

#Fatalities Estimation

To estimate fatalities, we assume the fatality rate (`Fatalities`/`ConfirmedCases`) is constant. Under this assumption, `Fatalities` is a linear function of `ConfirmedCases`, so we can fit a simple linear regression in the `ConfirmedCases`-`Fatalities` space and read the fatality rate off the slope.
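A strictly constant fatality rate corresponds to a line through the origin, so comparing a through-origin slope with an ordinary fit (slope plus intercept) is one way to sanity-check the assumption. The sketch below uses made-up numbers constructed to have an exact 2% rate; they are not the California data, and the variable names are hypothetical.

```python
import numpy as np

# Made-up illustration data with an exact 2% fatality rate (not real data)
cases_demo = np.array([100.0, 200.0, 400.0, 800.0])
deaths_demo = np.array([2.0, 4.0, 8.0, 16.0])

# Least-squares slope of a line forced through the origin: sum(x*y) / sum(x*x)
rate_origin = (cases_demo @ deaths_demo) / (cases_demo @ cases_demo)

# Ordinary least-squares line with a free intercept
slope, intercept = np.polyfit(cases_demo, deaths_demo, 1)

print("through-origin rate =", rate_origin)
print("slope =", slope, "intercept =", intercept)
```

If the intercept of the ordinary fit is far from zero relative to the fatality counts, the constant-rate assumption is doubtful over that range of cases.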

In [None]:
lr = LinearRegression()
X_train = reported[['ConfirmedCases']].values
Y_train = reported[['Fatalities']].values
lr.fit(X_train, Y_train)
print('coefficient = ', lr.coef_[0], '(which means Fatality rate)')
print('intercept = ', lr.intercept_)

In [None]:
X_pred = result_df[['Infectious']].values
Y_pred = lr.predict(X_pred)
plt.scatter(X_train, Y_train, c='blue')
plt.plot(X_pred, Y_pred, c='red')
plt.title("Regression Line")
plt.xlabel('ConfirmedCases')
plt.ylabel('Fatalities')
plt.grid()

plt.xlim([100,800])
plt.ylim([0,20])

plt.show()

In [None]:
# Flatten the (n, 1) prediction array into a single column
result_df['Fatalities'] = Y_pred.ravel()
result_df.head()

In [None]:
submission = result_df[0:len(submission_df)].reset_index()
submission_df['ConfirmedCases'] = submission['Infectious']
submission_df['Fatalities'] = submission['Fatalities']
submission_df.head()

#Further study

A natural next step is to assume that some intervention (e.g. physical distancing) causes the reproduction number to fall from $R_0$ to a lower value $R_t$ at a certain time.

This kernel will be helpful for that: https://www.kaggle.com/anjum48/seir-model-with-intervention#Model-with-intervention
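As a rough sketch of the idea (not the method of the linked kernel), the model can be integrated in two pieces: with $\beta = R_0\gamma$ before the intervention and $\beta = R_t\gamma$ afterwards, using the standard SEIR relation that the reproduction number is $\beta/\gamma$. The values `epsilon_demo = 0.2`, `R_t = 1.5`, the intervention day `t_int = 20` and the initial state are hypothetical choices for illustration.

```python
import numpy as np
from scipy import integrate

def seir_rhs(v, t, beta, epsilon, gamma, N):
    # Right-hand side of the SEIR equations
    S, E, I, R = v
    return [-beta * S * I / N,
            beta * S * I / N - epsilon * E,
            epsilon * E - gamma * I,
            gamma * I]

N_demo = 36_000_000
gamma_demo = 1.0 / 7.0
epsilon_demo = 0.2                 # hypothetical latency rate
R_0, R_t = 7.0, 1.5                # R_0 = beta / gamma with beta = 1; R_t is hypothetical
t_int, t_end = 20, 200             # hypothetical intervention day and horizon
init_demo = [N_demo - 100.0, 0.0, 100.0, 0.0]

def simulate(R_before, R_after):
    # Integrate up to the intervention, then restart from that state with the new beta
    t1 = np.arange(0, t_int + 1)
    t2 = np.arange(t_int, t_end + 1)
    first = integrate.odeint(seir_rhs, init_demo, t1,
                             args=(R_before * gamma_demo, epsilon_demo, gamma_demo, N_demo))
    second = integrate.odeint(seir_rhs, first[-1], t2,
                              args=(R_after * gamma_demo, epsilon_demo, gamma_demo, N_demo))
    return np.vstack([first, second[1:]])   # drop the duplicated row at t_int

baseline = simulate(R_0, R_0)      # no intervention
intervened = simulate(R_0, R_t)    # reproduction number drops at t_int

print("peak infectious without intervention:", baseline[:, 2].max())
print("peak infectious with intervention:   ", intervened[:, 2].max())
```

Splitting the integration at the intervention time avoids handing the solver a right-hand side that jumps discontinuously in the middle of a step.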

In [None]:
submission_df.to_csv("submission.csv", index=False)