# Important note!

Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel$\rightarrow$Restart) and then **run all cells** (in the menubar, select Cell$\rightarrow$Run All).

Make sure you fill in any place that says `YOUR CODE HERE` or "YOUR ANSWER HERE", as well as your GT login and the GT logins of any of your collaborators below. (The GT logins are worth 1 point per notebook, so don't miss the opportunity to get a free point!)

In [None]:
YOUR_ID = "" # Please enter your GT login, e.g., "rvuduc3" or "gtg911x"
COLLABORATORS = [] # list of strings of your collaborators' IDs

In [None]:
import re

RE_CHECK_ID = re.compile (r'''[a-zA-Z]+\d+|[gG][tT][gG]\d+[a-zA-Z]''')
assert RE_CHECK_ID.match (YOUR_ID) is not None

collab_check = [RE_CHECK_ID.match (i) is not None for i in COLLABORATORS]
assert all (collab_check)

del collab_check
del RE_CHECK_ID
del re

**Jupyter / IPython version check.** The following code cell verifies that you are using the correct version of Jupyter/IPython.

In [None]:
import IPython
assert IPython.version_info[0] >= 3, "Your version of IPython is too old, please update it."

# Mean-field models

One way to simplify both the cellular automata (CA) and Markov chain (MC) models, which suffer from state-space explosion when the number of state variables becomes large, is to derive a "mean-field" approximation.

When applying this approach to a CA model, you typically throw away the connectivity information of the cells and instead assume that the cells are fully connected. Then, you model the fraction of the population that is in each state.

Section 12.3 of [Sayama's book](http://bingweb.binghamton.edu/~sayama/textbook/) discusses this idea, though we will derive the SIR mean-field model differently.

## Example: Applying the mean-field technique to the SIR system

For instance, recall that in the susceptible-infected-recovered (SIR) model of infection, each cell exists in one of three possible states ("S", "I", or "R"). In a mean-field model, you would first define a time-dependent variable for each state, which you interpret as the fraction of the population in that state. That is, let

- $S_t$ be the fraction of the population that is susceptible at (discrete) time $t$;
- $I_t$ be the fraction that is infected at $t$; and
- $R_t$ be the fraction that is recovered at $t$,

where $S_t + I_t + R_t = 1$. We will implicitly assume that the number of individuals is large enough that we can treat these fractions as being continuous.

In class, we discussed a model for this system that closely approximates the CA and Markov models. Recall that

- $I_t S_t$ measures the fraction of total encounters that can cause disease transmission;
- $\tau$ is a parameter that represents the fraction (or probability) of such encounters transmitting disease in any time step; and
- $\frac{1}{\kappa}$ is a parameter that represents the fraction (probability) of infected individuals recovering in any time step.

Then, a corresponding discrete-time dynamical system might be

$$
\begin{eqnarray}
  S_{t+1} & \equiv & S_t - \tau I_t S_t = S_t (1 - \tau I_t) \\
  I_{t+1} & \equiv & I_t + \tau I_t S_t - \frac{1}{\kappa} I_t = I_t \left( 1 + \tau S_t - \frac{1}{\kappa} \right) \\
  R_{t+1} & \equiv & R_t + \frac{1}{\kappa} I_t = R_t \left( 1 + \frac{1}{\kappa} \frac{I_t}{R_t} \right).
\end{eqnarray}
$$
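To make the update rule concrete, here is a minimal sketch that transcribes the three equations directly, written for all three state variables so you can see that the two flows ($\tau I_t S_t$ out of S into I, and $\frac{1}{\kappa} I_t$ out of I into R) leave $S_t + I_t + R_t$ unchanged. The helper name `sir_step` is hypothetical; the inputs reuse the values from the Exercise 2 test.

```python
def sir_step(s, i, r, tau, kappa):
    """One step of the discrete-time mean-field SIR map (hypothetical helper name)."""
    new_infections = tau * i * s   # tau * I_t * S_t: flow from S into I
    new_recoveries = i / kappa     # (1/kappa) * I_t: flow from I into R
    return (s - new_infections,
            i + new_infections - new_recoveries,
            r + new_recoveries)

# Same inputs as the Exercise 2 test: S = 0.75, I = 0.25, R = 0, tau = 2/3, kappa = 2.
s, i, r = sir_step(0.75, 0.25, 0.0, tau=2/3, kappa=2.0)
print(s, i, r)    # approximately 0.625 0.25 0.125
print(s + i + r)  # approximately 1.0: whatever leaves one state enters another
```

Writing $R_{t+1}$ in its additive form also sidesteps the factored form's division by $R_t$, which is undefined at the usual initial condition $R_0 = 0$.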

**Exercise 1** (1 point). Find the equilibrium points of this system.

YOUR ANSWER HERE

## Implementing the mean-field SIR system

Let's implement the mean-field SIR system.

Since $S_t + I_t + R_t = 1$, you can also just represent any two of the three state variables and recover the third from them. In the simulation you build below, let's directly represent $S_t$ and $I_t$ and leave $R_t$ implicit.

In [None]:
import numpy as np

import matplotlib.pyplot as plt
%matplotlib inline

**Exercise 2** (2 points). Let `x[:2]` be a length-2 NumPy vector representing the state of the system, where `x[0]` is the susceptible fraction and `x[1]` is the infected fraction. Implement a function, `F_sir(x, tau, kappa)`, that computes the next state given `x`.

In [None]:
def F_sir (x, tau, kappa):
    # x = (s, i)
    x_next = x.copy ()
    # YOUR CODE HERE
    raise NotImplementedError()
    return x_next

In [None]:
assert np.allclose (F_sir (np.array ([0.75, 0.25]), 2./3, 2.0), np.array ([5./8, 1./4]))
print ("\n(Passed!)")

**Exercise 3** (2 points). Implement a function, `sim(t_max, alpha, tau, kappa)`, that simulates the system for `t_max` time steps. The parameter `alpha` is the initial fraction of the population that is infected; assume all other individuals are susceptible.

In [None]:
def sim (t_max, alpha, tau, kappa):
    X = np.zeros ((2, t_max+1)) # X[:, t] = [S_t, I_t]
    # YOUR CODE HERE
    raise NotImplementedError()
    return X

In [None]:
def summarize_sim (X, alpha, tau, kappa):
    assert len (X.shape) == 2 and X.shape[0] == 2
    t_max = X.shape[1] - 1

    print ("Simulation results: t_max = {}, alpha = {}, tau = {}, kappa = {}".format (t_max, alpha, tau, kappa))
    s, i = X[0, -1], X[1, -1]
    print ("- S_{{{}}} = {:.3f}".format (t_max, s))
    print ("- I_{{{}}} = {:.3f}".format (t_max, i))
    print ("- R_{{{}}} = {:.3f}".format (t_max, 1-s-i))
    
def plot_sim (X, alpha, tau, kappa):
    assert len (X.shape) == 2 and X.shape[0] == 2
    t_max = X.shape[1] - 1
    
    T = np.arange (t_max+1)
    use_points = len (T) <= 30
    plt.plot (T, X[0, :], 'ys--' if use_points else 'y-')
    plt.plot (T, X[1, :], 'r*--' if use_points else 'r--')
    plt.plot (T, 1. - X[0, :] - X[1, :], 'bo--' if use_points else 'b--')
    plt.legend (['S', 'I', 'R'])
    plt.axis ([0, t_max+1, 0, 1])
    plt.title ("alpha = {}, tau = {}, kappa = {}".format (alpha, tau, kappa))
    
T_MAX = 30
ALPHA = 1. / 3
TAU = 0.2
KAPPA = 2

X = sim (T_MAX, ALPHA, TAU, KAPPA)
summarize_sim (X, ALPHA, TAU, KAPPA)
plot_sim (X, ALPHA, TAU, KAPPA)

assert np.allclose (X[:, -1], np.array ([0.556, 0.0]), atol=0.001)
print ("\n(Passed!)")

The simulation above uses parameters that closely approximate those of the one-dimensional, three-cell CA model from Lab 5, Part B, where you applied Markov chain analysis to estimate the average fraction of the population in the recovered state at equilibrium. The output from Lab 5, Part B, Exercise 5 should have looked something like this:

```
=== Parameter summary ===
N: 3
TAU: 0.2
K: 2

=== Results ===

4 state(s) have a non-zero steady-state probability:
  39: Pr[(0, 3, 0)] == 0.4096
  13: Pr[(3, 3, 0)] == 0.2304
  33: Pr[(0, 3, 3)] == 0.2304
  16: Pr[(3, 3, 3)] == 0.1296

Probability of k persons being infected:
  Pr[0 recovered] == 0
  Pr[1 recovered] == 0.4096
  Pr[2 recovered] == 0.4608
  Pr[3 recovered] == 0.1296
  ==> Expected value of k == 1.72
  ==> Expected fraction infected == 0.573333333333
```

In the simulation test code above, `ALPHA` is set to 1/3 because one of the three grid cells begins in the infected state. You should see that the recovered fraction predicted by the CA model ($\approx 0.573$) differs from the one predicted by the mean-field approximation ($1 - 0.556 \approx 0.444$). (In the Lab 5 output, "infected" counts everyone who was ever infected, i.e., everyone who ends up recovered.)
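As a quick arithmetic check, the CA figure of $\approx 0.573$ is just the expected number of recovered cells divided by the number of cells, computed from the steady-state probabilities in the Lab 5 output. A short sketch of that calculation, with the probabilities copied from the output above:

```python
# Pr[k recovered] for k = 0, 1, 2, 3, copied from the Lab 5 output above.
pr_recovered = {0: 0.0, 1: 0.4096, 2: 0.4608, 3: 0.1296}
N = 3  # number of cells in the one-dimensional CA

expected_k = sum(k * p for k, p in pr_recovered.items())
expected_fraction = expected_k / N
print(expected_k)         # approximately 1.72
print(expected_fraction)  # approximately 0.5733
```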

**Exercise 4** (2 points). Find values of $\tau$ and $\kappa$ that approximate the recovered fraction of $\approx 0.573$ produced by the CA model as closely as possible. In particular, in the code below, create two variables, `tau_adj` and `kappa_adj`, corresponding to your values. The testing code will print the $(S, I, R)$ fractions and create a plot, as was done above.

Since `T_MAX` is also somewhat arbitrary, you may, if you wish, change its value as well. We've bumped it up to 50 time steps by default.

> Hint: You can certainly use trial and error to find suitable values. But can you also think of a more systematic and programmatic way to find them, using tools or ideas from earlier labs?
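One possible systematic approach, sketched here under our own assumptions rather than as the lab's prescribed method: hold $\kappa$ fixed at 2 and bisect on $\tau$ until the final recovered fraction of the discrete-time model matches 0.573. This assumes the final $R$ increases with $\tau$ and that the target is bracketed by $\tau \in [0, 1]$ (the code asserts the bracket). The helpers `final_recovered` and `fit_tau` are hypothetical; a library root finder such as `scipy.optimize.brentq` would do the same job.

```python
def final_recovered(tau, kappa, alpha=1/3, t_max=50):
    """Iterate the discrete mean-field SIR map; return R at t_max (hypothetical helper)."""
    s, i = 1.0 - alpha, alpha
    for _ in range(t_max):
        # Tuple assignment: both right-hand sides use the *old* s and i.
        s, i = s * (1 - tau * i), i * (1 + tau * s - 1 / kappa)
    return 1.0 - s - i

def fit_tau(target_r, kappa, lo=0.0, hi=1.0, tol=1e-10, **kwargs):
    """Bisection on tau, assuming final R - target changes sign on [lo, hi]."""
    f = lambda tau: final_recovered(tau, kappa, **kwargs) - target_r
    assert f(lo) < 0 < f(hi), "target must be bracketed by [lo, hi]"
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid   # too few recoveries: the root lies above mid
        else:
            hi = mid   # too many recoveries: the root lies at or below mid
    return 0.5 * (lo + hi)

tau_fit = fit_tau(0.573, kappa=2.0)
print(tau_fit, final_recovered(tau_fit, 2.0))
```

Fixing $\kappa$ reduces the search to one dimension; you could instead fix $\tau$ and search over $\kappa$, or scan a grid of both. Either way, remember that the test also expects $I$ to be near zero at `T_MAX`.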

In [None]:
T_MAX = 50
ALPHA = 1. / 3

# Create `tau_adj` and `kappa_adj` as instructed
# YOUR CODE HERE
raise NotImplementedError()

# The following show what happens with your
# values of `tau_adj` and `kappa_adj`:
print ("Your parameters: tau_adj = {}, kappa_adj = {}".format (tau_adj, kappa_adj))
X_adj = sim (T_MAX, ALPHA, tau_adj, kappa_adj)
summarize_sim (X_adj, ALPHA, tau_adj, kappa_adj)
plot_sim (X_adj, ALPHA, tau_adj, kappa_adj)

In [None]:
assert np.allclose (X_adj[:, -1], np.array ([1-0.573, 0.0]), atol=1e-3)
print ("\n(Passed!)")

## Extension to continuous time

Next, suppose we wish to treat time as a continuous, rather than discrete, variable. Doing so gives rise to a system of ordinary differential equations (ODEs):

$$
\begin{eqnarray}
  \dfrac{d\vec{y}}{dt}
  = \dfrac{d}{dt}\left(\begin{matrix}
      S(t) \\
      I(t) \\
      R(t)
    \end{matrix}\right)
  & = & \left(\begin{matrix}
      - \tau_0 I(t) S(t) \\
      \tau_0 S(t) I(t) - \dfrac{1}{\kappa_0} I(t) \\
      \dfrac{1}{\kappa_0} I(t)
    \end{matrix}\right)
  \equiv \vec{F}(\vec{y}),
\end{eqnarray}
$$

where $\vec{y}(t)$ is the state vector. The parameters $\tau_0$ and $\frac{1}{\kappa_0}$ are now rates, with units of "fraction per unit time"; correspondingly, $\kappa_0$ itself has units of time.
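A minimal sketch of integrating these ODEs, assuming a hand-rolled classical fourth-order Runge-Kutta (RK4) integrator so the example has no SciPy dependency; a library integrator such as `scipy.integrate.odeint` or `scipy.integrate.solve_ivp` would serve equally well. The names `sir_rhs` and `rk4` are hypothetical, and the parameter values mirror Exercise 3's test code ($\alpha = 1/3$, $\tau_0 = 0.2$, $\kappa_0 = 2$).

```python
import numpy as np

def sir_rhs(y, tau_0, kappa_0):
    """Right-hand side F(y) of the continuous-time SIR ODEs, with y = [S, I, R]."""
    s, i, r = y
    return np.array([-tau_0 * i * s,
                     tau_0 * s * i - i / kappa_0,
                     i / kappa_0])

def rk4(f, y0, t, *args, substeps=10):
    """Classical RK4 for an autonomous system; returns y at each time in t.

    Each interval [t[k], t[k+1]] is split into `substeps` equal steps.
    """
    y = np.zeros((len(t), len(y0)))
    y[0] = y0
    for k in range(len(t) - 1):
        h = (t[k+1] - t[k]) / substeps
        yk = y[k].copy()
        for _ in range(substeps):
            k1 = f(yk, *args)
            k2 = f(yk + 0.5 * h * k1, *args)
            k3 = f(yk + 0.5 * h * k2, *args)
            k4 = f(yk + h * k3, *args)
            yk = yk + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        y[k+1] = yk
    return y

alpha, tau_0, kappa_0 = 1/3, 0.2, 2.0
T = np.arange(31).astype(float)        # time points 0, 1, ..., 30
y0 = np.array([1.0 - alpha, alpha, 0.0])
Y = rk4(sir_rhs, y0, T, tau_0, kappa_0)
S_ode, I_ode, R_ode = Y[:, 0], Y[:, 1], Y[:, 2]
print(S_ode[-1], I_ode[-1], R_ode[-1])
```

With these parameters, $S$ should settle near 0.56, close to the discrete-time model's value of about 0.556. Because the three components of $\vec{F}$ sum to zero, RK4 also preserves $S + I + R = 1$ up to rounding.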

**Exercise 5** (3 points). Simulate the continuous time system. You may find it helpful to refer back to [Lab 4's exercises on solving ODEs](https://github.com/rvuduc/cx4230sp17labs/blob/master/lab4/part_a--loveshack.ipynb).

Your solution should do the following:
- Use the $\alpha$, $\tau_0$, and $\kappa_0$ constants defined at the top of the following code cell, i.e., `ALPHA`, `TAU_0`, and `KAPPA_0`, respectively. (They are set to be the same as those from Exercise 3, the discrete-time model that computed `X`.)
- Use the initial conditions $S(0) = 1 - \alpha$, $I(0) = \alpha$, and $R(0) = 0$, which are stored in the `y0[:3]` array below.
- Compute solutions at the time points 0, 1, 2, ..., 30, which are stored in the `T[:31]` array, below.
- Store the results for $S(t)$, $I(t)$, and $R(t)$ for these 31 time points (i.e., including $t=0$) in three NumPy arrays named `S_ode[:31]`, `I_ode[:31]`, and `R_ode[:31]`, respectively. The summary and plotting code fragments below assume these names.

> The main point of this exercise is to show you the qualitative similarity between the discrete-time results and continuous-time results.

In [None]:
# Use the same ALPHA as Exercise 3's test code, and initially set
# the rate parameters tau_0 and kappa_0 equal to the discrete-time
# model's TAU and KAPPA:
TAU_0 = TAU
KAPPA_0 = KAPPA

# Initial populations, i.e., [S(0), I(0), R(0)]
y0 = np.array ([1.0 - ALPHA, ALPHA, 0.])

# Time points at which to compute the solutions:
T = np.arange (31).astype (float)

# YOUR CODE HERE
raise NotImplementedError()

def summarize_sim_ode (S, I, T, alpha, tau_0, kappa_0):
    t_max = T[-1]
    print ("ODE simulation parameters:")
    print ("  - t_max = {}".format (t_max))
    print ("  - alpha = {}".format (alpha))
    print ("  - tau_0 = {}".format (tau_0))
    print ("  - kappa_0 = {}".format (kappa_0))
    print ("\nResults:")
    print ("- S_{{{}}} = {:.3f}".format (t_max, S[-1]))
    print ("- I_{{{}}} = {:.3f}".format (t_max, I[-1]))
    print ("- R_{{{}}} = {:.3f}".format (t_max, 1-S[-1]-I[-1]))
    
summarize_sim_ode (S_ode, I_ode, T, ALPHA, TAU_0, KAPPA_0)

In [None]:
def plot_sim_ode (S, I, T, alpha, tau, kappa):
    t_max = T[-1]
    use_points = len (T) <= 35
    plt.plot (T, S, 'ys--' if use_points else 'y-')
    plt.plot (T, I, 'r*--' if use_points else 'r--')
    plt.plot (T, 1. - S - I, 'bo--' if use_points else 'b--')
    plt.legend (['S', 'I', 'R'])
    plt.axis ([0, t_max+1, 0, 1])
    plt.title ("alpha = {}, tau = {}, kappa = {}".format (alpha, tau, kappa))
        
# Figure to compare discrete-time and continuous-time models
plt.figure (figsize=(12, 6))
plt.subplot (1, 2, 1)
plot_sim (X, ALPHA, TAU, KAPPA)
plt.subplot (1, 2, 2)
plot_sim_ode (S_ode, I_ode, T, ALPHA, TAU_0, KAPPA_0)

assert np.linalg.norm (X[0, :] - S_ode, ord=np.inf) <= 0.05
assert np.linalg.norm (X[1, :] - I_ode, ord=np.inf) <= 0.05
assert np.linalg.norm (1-X[0, :]-X[1, :] - R_ode, ord=np.inf) <= 0.05
print ("\n(Passed!)")