# Important note!

Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel$\rightarrow$Restart) and then **run all cells** (in the menubar, select Cell$\rightarrow$Run All).

Make sure you fill in any place that says `YOUR CODE HERE` or "YOUR ANSWER HERE", as well as your GT login and the GT logins of any of your collaborators below. (The GT logins are worth 1 point per notebook, so don't miss the opportunity to get a free point!)

In [None]:
YOUR_ID = "" # Please enter your GT login, e.g., "rvuduc3" or "gtg911x"
COLLABORATORS = [] # list of strings of your collaborators' IDs

In [None]:
import re

RE_CHECK_ID = re.compile (r'''[a-zA-Z]+\d+|[gG][tT][gG]\d+[a-zA-Z]''')
assert RE_CHECK_ID.match (YOUR_ID) is not None

collab_check = [RE_CHECK_ID.match (i) is not None for i in COLLABORATORS]
assert all (collab_check)

del collab_check
del RE_CHECK_ID
del re

**Jupyter / IPython version check.** The following code cell verifies that you are using the correct version of Jupyter/IPython.

In [None]:
import IPython
assert IPython.version_info[0] >= 3, "Your version of IPython is too old, please update it."

# Analyzing the SIR-CA infection model

The goal of this notebook is to theoretically analyze the SIR-CA model of [Lab 4](https://github.com/rvuduc/cx4230sp17labs/tree/master/lab4). The SIR-CA model is special in that this analysis is tractable using Markov chains.

## Setup

Run the following code cells to get started.

In [None]:
# Our usual multidimensional array tools
import numpy as np
import scipy as sp
import scipy.sparse

# Some handy combinatorial functions, e.g.,
# for generating permutations and combinations.
# See: https://docs.python.org/3/library/itertools.html
import itertools

In [None]:
# Core plotting support
import matplotlib.pyplot as plt
%matplotlib inline

## Monte Carlo simulations for a 1-D, 3-cell grid

Let's start by experimentally analyzing the simplest case, which is a one-dimensional grid consisting of just three (3) cells.

The following code cell helps implement an SIR-CA model for the case of a 1-D world with empty left-right boundaries. Note that some functions accept additional parameters for the length of infection, $k$ (variable `k`), and conditional probability of infection given exposure, $\tau$ (variable `tau`).

In [None]:
EMPTY = -1
SUSCEPTIBLE = 0
INFECTED = 1

def count (G):
    """
    Counts the number of locations in a NumPy array, `G`,
    whose entries are nonzero (i.e., the locations that
    `np.where (G)` returns).
    """
    return len (np.where (G)[0])

def find (G):
    """
    Returns the set of locations of a NumPy array, `G`,
    whose entries are nonzero (i.e., the locations that
    `np.where (G)` returns).
    """
    assert type (G) is np.ndarray
    return {i for i in np.where (G)[0]}

def create_world_1d (init_state):
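    """
    Creates a one-dimensional world. If `init_state` is
    an integer `n`, creates a world of `n` susceptible
    individuals with a single index case (approximately
    in the center). Otherwise, interprets `init_state`
    as a sequence of cell values. In both cases the world
    is padded with one EMPTY boundary cell on each side.
    """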
    if type (init_state) is int: # Interpret as a dimension
        n = init_state
        G = EMPTY * np.ones (n+2)
        G[1:-1] = SUSCEPTIBLE
        G[int ((n+2)/2)] = INFECTED
    else:
        G = np.append (np.append ([EMPTY], init_state), [EMPTY])
    return G

def empty (G):
    """
    Given a grid, returns a new grid whose entries are
    1 wherever the original grid is EMPTY or 0
    otherwise.
    """
    return (G == EMPTY).astype (int)

def susceptible (G):
    """
    Given a grid, returns a new grid whose entries are
    1 wherever the original grid is SUSCEPTIBLE or 0 
    otherwise.
    """
    return (G == SUSCEPTIBLE).astype (int)

def infected (G, k):
    """
    Given a grid, returns a new grid whose entries are
    1 wherever the original grid is in any day of
    infection [INFECTED, INFECTED+k) or 0 otherwise.
    """
    assert k >= INFECTED
    return ((G >= INFECTED) & (G < INFECTED+k)).astype (int)

def recovered (G, k):
    """
    Given a grid, returns a new grid whose entries are
    1 wherever the original grid holds the recovered
    value, `INFECTED+k`, or 0 otherwise.
    """
    return (G == INFECTED+k).astype (int)

def exposed (G, k):
    """
    Given a grid, returns a new grid whose entries are
    1 wherever the original grid is exposed to infection
    or 0 otherwise.
    """
    S = susceptible (G)
    I = infected (G, k)
    E = np.zeros (G.shape, dtype=int)
    E[1:-1] = S[1:-1] & (I[0:-2] | I[2:])
    return E

def spreads (G, k, tau):
    """
    Given a grid `G`, returns a new grid of newly infected
    cells. That is, this function assigns a 1 to each
    exposed cell that becomes infected and 0 to all other
    cells. Each exposed cell becomes infected independently
    of the others with probability `tau`.
    """
    R = np.random.uniform (size=G.shape)
    return exposed (G, k) * (R < tau)

def step (G, k, tau):
    """
    Given a grid, computes a new grid state by applying
    the SIR-CA model.
    """
    return G + infected (G, k) + spreads (G, k, tau)

def sim (init_state, k, tau, t_max=None):
    """
    Simulates a 1-D SIR-CA system.
    """
    G = create_world_1d (init_state)

    if t_max is None:
        t_max = (len (G)-2) * (k+1)
        
    for t in range (t_max):
        G = step (G, k, tau)
    return G

**Exercise 1** (2 points). Use the above functions to conduct the following experiment. Simulate the SIR-CA system for $n=3$ cells (excluding the boundaries), $\tau=0.2$ and $k=2$. Perform $s$ simulations; for each simulation, count and record the number of susceptible, infected, and recovered individuals. Record these results in three NumPy arrays, `num_S`, `num_I`, and `num_R`, each of length $s$.

In the code below, a constant `NUM_SIMS` initializes $s=10$. Later, in exercise 3, you will adjust this value.
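
As a reminder of the cell encoding used by the helper functions, the following standalone sketch tallies the S, I, and R counts for one hand-written final grid. The grid `G_final` is a made-up example rather than simulation output, and the constants are redefined here only so that the snippet runs on its own.

```python
import numpy as np

# Same encoding as the helper functions above, redefined so this runs alone.
EMPTY, SUSCEPTIBLE, INFECTED = -1, 0, 1
K = 2  # an infection lasts K steps, so a recovered cell holds INFECTED + K

# Hypothetical final grid: boundary, S, R, R, boundary.
G_final = np.array([EMPTY, SUSCEPTIBLE, INFECTED + K, INFECTED + K, EMPTY])

# Count cells in each category using the same value ranges as the helpers.
n_S = int(np.sum(G_final == SUSCEPTIBLE))
n_I = int(np.sum((G_final >= INFECTED) & (G_final < INFECTED + K)))
n_R = int(np.sum(G_final == INFECTED + K))
print(n_S, n_I, n_R)  # 1 0 2
```

The boundary cells hold `EMPTY` and therefore fall into none of the three categories.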

In [None]:
# Size of the 1-D world (excluding boundaries)
N = 3

# Pr[infected|exposed]
TAU = 0.2

# Length of an infection in simulation timesteps
K = 2

# Counts of {S, I, R} at the end of each simulation
NUM_SIMS = 10
num_S = np.zeros (NUM_SIMS, dtype=int)
num_I = np.zeros (NUM_SIMS, dtype=int)
num_R = np.zeros (NUM_SIMS, dtype=int)

# Implement the experiment below

# YOUR CODE HERE
raise NotImplementedError()

# Prints your results:
print ("Susceptible: {:.1f}% <= {} ...".format (1e2/N*np.mean (num_S), num_S[:10]))
print ("Infected: {:.1f}% <= {} ...".format (1e2/N*np.mean (num_I), num_I[:10]))
print ("Recovered: {:.1f}% <= {} ...".format (1e2/N*np.mean (num_R), num_R[:10]))

In [None]:
assert (num_S + num_I + num_R == N).all ()
assert (num_I == 0).all ()
print ("\n(Passed.)")

**Exercise 2** (2 points). Suppose we wish to estimate the steady-state probability that the number of recovered individuals is $\rho$. (The number of recovered individuals indicates how many individuals became infected.)

Using your `num_R[:]` computed above, estimate these probabilities for $\rho \in \{0, 1, 2, \ldots, n\}$ where $n$ is the number of individuals. Store your results in the array, `pr_R[:]`.
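
One possible way to turn integer samples into estimated probabilities is `np.bincount`. The sketch below is only an illustration: the array `samples` is made up (it is not simulation output), and `n` is a hypothetical population size.

```python
import numpy as np

n = 3  # hypothetical number of individuals
samples = np.array([1, 3, 1, 2, 1, 3, 1, 1, 2, 1])  # made-up observations in {0, ..., n}

# counts[r] is the number of samples equal to r; minlength pads
# the result so that values that never occur still get a zero entry.
counts = np.bincount(samples, minlength=n + 1)

# Relative frequencies serve as the probability estimates.
pr = counts / len(samples)
print(pr)
```

Because every sample lands in exactly one bin, the estimates sum to one up to floating-point rounding.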

In [None]:
pr_R = np.zeros (N+1) # pr_R[k] == estimate of Pr[R == k]

# YOUR CODE HERE
raise NotImplementedError()

for k in range (0, N+1):
    print ("Pr[R={}] ~ {:.3f}".format (k, pr_R[k]))

In [None]:
assert len (pr_R) == N+1
assert pr_R[0] == 0.0
assert (np.abs (np.sum (pr_R) - 1.0) < 1e-15)
print ("\n(Passed.)")

**Exercise 3** (3 points). Determine how many simulations you need to run so that a 90% confidence interval estimates each $Pr[R=k]$ to within $\pm 0.005$. That is, increase the number of simulations until the half-width of the confidence interval for every $Pr[R=k]$ is at most $0.005$.

Your solution should print each estimate of $Pr[R=k]$ and its associated 90% confidence interval. That is, include print statements that produce output that looks something like (for `N=3`):

```
  After XX simulations:
    Pr[R=1] = 0.### +/- 0.00###
    Pr[R=2] = 0.### +/- 0.00###
    Pr[R=3] = 0.### +/- 0.00###
```
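
The exercise does not prescribe how to compute the confidence interval. One common choice, assumed here, is the normal approximation to the binomial, in which the half-width for an estimated proportion $\hat{p}$ from $s$ samples is $z \sqrt{\hat{p}(1-\hat{p})/s}$. In the sketch below, the helper names `ci_half_width` and `sims_needed` are hypothetical, and `Z_90 = 1.645` is the usual two-sided 90% critical value of the standard normal.

```python
import math

Z_90 = 1.645  # two-sided 90% critical value (normal approximation, an assumption)

def ci_half_width(p_hat, s, z=Z_90):
    """Half-width of the approximate confidence interval for a proportion."""
    return z * math.sqrt(p_hat * (1.0 - p_hat) / s)

def sims_needed(p_hat, half_width, z=Z_90):
    """Smallest sample count whose approximate half-width is <= half_width."""
    return math.ceil(z**2 * p_hat * (1.0 - p_hat) / half_width**2)

print(ci_half_width(0.5, 10))   # half-width with only 10 samples
print(sims_needed(0.5, 0.005))  # samples needed in the worst case, p_hat = 0.5
```

Since $\hat{p}(1-\hat{p})$ is largest at $\hat{p} = 0.5$, the second call gives an upper bound under this approximation: roughly 27,000 simulations suffice whatever the true probabilities are.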

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

## Markov chain analysis

To carry out the Markov chain analysis, recall that you need to do the following.

1. Define the state space, $\Sigma$.
2. Construct a probability transition matrix, $P \equiv \left(p_{ij}\right)$ where $p_{ij} \equiv \mathrm{Pr}[\sigma_j\,|\,\sigma_i]$ is the conditional probability of moving to state $\sigma_j$ starting from state $\sigma_i$ for every pair of states $\sigma_i, \sigma_j \in \Sigma$.
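
To make the two steps concrete, here is a minimal sketch on a toy chain that is unrelated to the SIR-CA model. The three states and the entries of `P` are made up for illustration. Because $p_{ij}$ is the probability of moving from state $i$ to state $j$, each row of $P$ sums to one, and a distribution stored as a vector `x` advances one step via `P.T @ x`.

```python
import numpy as np

# Hypothetical 3-state chain: state 0 is transient; states 1 and 2 are absorbing.
P = np.array([[0.5, 0.2, 0.3],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
assert np.allclose(P.sum(axis=1), 1.0)  # every row is a probability distribution

# Start in state 0 with certainty and advance the distribution step by step.
x = np.array([1.0, 0.0, 0.0])
for _ in range(200):
    x = P.T @ x
print(x)  # approaches [0, 0.4, 0.6]
```

As with an epidemic that eventually dies out, all probability mass ends up in the absorbing states, and how it is split between them depends on the starting distribution.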

For the 1-D, 3-cell system, the state space $\Sigma$ consists of all possible grid states, $\sigma \equiv (g_1, g_2, g_3)$ where $g_i \in \left\{ \mathtt{S}, \mathtt{I}_1, \ldots, \mathtt{I}_k, \mathtt{R} \right\}$.
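
Each cell takes one of $k+2$ values, so a grid of $n$ cells has $(k+2)^n$ possible states. The sketch below checks this count on a deliberately smaller, hypothetical case ($k=1$, $n=2$), using string labels rather than the integer encoding of the simulation.

```python
import itertools

k = 1  # hypothetical infection length for this illustration
n = 2  # hypothetical number of cells

# Per-cell values: susceptible, k days of infection, recovered.
labels = ["S"] + ["I%d" % d for d in range(1, k + 1)] + ["R"]

# Every state is an n-tuple of per-cell values.
states = list(itertools.product(labels, repeat=n))
print(len(states))  # (k+2)**n == 9
print(states)
```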

**Exercise 3** (2 points). Generate a Python [set()](https://docs.python.org/3/library/stdtypes.html#set) named `Sigma` that contains all possible grid-state triples.

> Hint: Take a look at [`itertools.product()`](https://docs.python.org/3/library/itertools.html#itertools.product).

In [None]:
CELL_VALUES = [SUSCEPTIBLE] + list (range (INFECTED, INFECTED+K+1))

# Generate Sigma
#Sigma = set (...)
# YOUR CODE HERE
raise NotImplementedError()

print ("For k ==", K, "each cell may have these values:", CELL_VALUES)
print ("==>", len (Sigma), "possible states in total.\n")
print ("Here are the states:")
print (Sigma)

In [None]:
assert len (Sigma) == (K+2)**N
for i in range (0, K+2):
    for j in range (0, K+2):
        for k in range (0, K+2):
            assert (i, j, k) in Sigma
print ("\n(Passed.)")

To construct the probability transition matrix, $P$, we will eventually need to map individual states to integer indices. The following code builds two lookup tables for converting grid states to integers and vice-versa.

In [None]:
INDEX_TO_SIGMA = dict (enumerate (Sigma))
SIGMA_TO_INDEX = {val: key for (key, val) in INDEX_TO_SIGMA.items ()}

print ("==> Index to state table:\n", INDEX_TO_SIGMA, "\n")
print ("==> State to index table:\n", SIGMA_TO_INDEX)

The states are tuples whereas the grid in our simulation is a 1-D NumPy array (with an artificial empty boundary). Here are some handy functions to convert between a grid and a state.

In [None]:
def convert_state_to_grid (s):
    """
    Converts a state, given as a tuple `s`, into an equivalent
    1-D grid with `len (s)` cells.
    """
    G = create_world_1d (s)
    return G

def convert_grid_to_state (G):
    """
    Converts a 1-D grid (with its empty boundary)
    into a state tuple.
    """
    G_interior = G[1:-1] # Drop the two empty boundary cells
    return tuple (G_interior)

In [None]:
# Test
assert N == 3
SIGMA0 = (SUSCEPTIBLE, INFECTED, SUSCEPTIBLE)
G0 = create_world_1d (SIGMA0)

print ("==> Original grid:")
print (G0, "\n")

print ("==> Equivalent state (id):")
s0 = convert_grid_to_state (G0)
i0 = SIGMA_TO_INDEX[s0]
print ("State", s0, "--> index", i0, "\n")

print ("==> Convert back to grid:")
s0_prime = INDEX_TO_SIGMA[i0]
G0_prime = convert_state_to_grid (s0_prime)
print (G0_prime, "\n")

assert (G0 == G0_prime).all ()
print ("==> Passed!")

## Reachability

The last thing you need to compute $p_{ij}$ is a way to enumerate all _reachable_ $j$ values, given $i$. The following code cells build a function to determine reachability.
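
Enumerating reachable states amounts to enumerating every subset of the exposed cells that might become infected in one step. A minimal sketch of that enumeration follows; the list `exposed_locs` is a made-up set of grid positions, not one derived from a real grid.

```python
import itertools

exposed_locs = [1, 3, 4]  # hypothetical positions of exposed cells

# Collect every subset of the exposed cells, grouped by subset size.
outcomes = []
for n_s in range(len(exposed_locs) + 1):
    for subset in itertools.combinations(exposed_locs, n_s):
        outcomes.append(subset)

print(len(outcomes))  # 2**3 == 8 subsets, including the empty one
print(outcomes)
```

The empty subset corresponds to the step in which no exposed cell becomes infected, so it must be included among the outcomes.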

**Exercise 4** (2 points). Let $\tau$ be the conditional probability that an exposed individual becomes infected. Further suppose the grid contains $n_e$ exposed individuals.

1. How many different ways are there for $n_s$ individuals to become infected? (Assume $0 \leq n_s \leq n_e$.)
2. What is the probability of $n_s$ of the $n_e$ individuals becoming infected?

YOUR ANSWER HERE

In [None]:
def reachable (G, k, tau):
    """
    Generator to enumerate all worlds reachable in one
    simulation step from a given world.
    """
    # Find all infected cells
    I = infected (G, k)
    if np.sum (I) == 0: # No infected cells
        yield (1.0, G)
    else: # >= 1 infected cell
        # Find number and locations of all exposed cells
        locs_E = find (exposed (G, k))
        n_E = len (locs_E)

        # Try all possible spreading combinations. The loop variable
        # is named `n_s` so that it does not shadow the parameter `k`.
        for n_s in range (n_E+1):
            # Probability that a particular set of n_s exposed people become infected:
            prob_n_s = (tau**n_s) * ((1.0 - tau)**(n_E - n_s))

            # Enumerate all reachable infection states
            for spread_locs in itertools.combinations (locs_E, n_s):
                G_next = np.copy (G) + I
                for i in spread_locs:
                    G_next[i] = INFECTED
                yield (prob_n_s, G_next)

In [None]:
n_states = len (Sigma)
nz_i = []
nz_j = []
nz_val = []
for s_i in Sigma:
    i = SIGMA_TO_INDEX[s_i]
    G_i = convert_state_to_grid (s_i)
    
    for (p_ij, G_j) in reachable (G_i, K, TAU):
        s_j = convert_grid_to_state (G_j)
        j = SIGMA_TO_INDEX[s_j]
        nz_i.append (i)
        nz_j.append (j)
        nz_val.append (p_ij)
            
P = sp.sparse.coo_matrix ((nz_val, (nz_i, nz_j)), (n_states, n_states))
print ("no. of states ==", n_states)
print ("tau ==", TAU, "; P = (p_{ij}) =\n", P)
print ("nnz(P) ==", P.nnz)
print ("row sums ==", P.sum (axis=1))

plt.spy (P, markersize=5, precision=.1)

**Exercise 5** (1 point). Compute the steady-state distribution, using the initial distribution `x0` defined below and the state-transition matrix `P` defined above. Store this distribution in a NumPy array `x[:n_states]`.

In [None]:
x0 = np.zeros ((n_states, 1))
x0[SIGMA_TO_INDEX[(SUSCEPTIBLE, INFECTED, SUSCEPTIBLE)]] = 1.0

# YOUR CODE HERE
raise NotImplementedError()

# Summarize the "interesting" states
x_sorted = sorted (list (enumerate (x)), key=lambda t: t[1], reverse=True)
x_interesting = list (filter (lambda t: t[1] > 0.0, x_sorted))

print ("=== Parameter summary ===")
print ("N:", N)
print ("TAU:", TAU)
print ("K:", K)

print ("\n=== Results ===\n")

print (len (x_interesting), "state(s) have a non-zero steady-state probability:")
for (i, x_i) in x_interesting:
    print ("  %d:" % i, "Pr[%s] ==" % str(INDEX_TO_SIGMA[i]), x_i[0])

nnz_hist = np.zeros (N+1)
for (i, x_i) in x_interesting:
    nnz = sum ([j > 0 for j in INDEX_TO_SIGMA[i]])
    nnz_hist[nnz] += x_i[0]
    
print ("\nProbability of k persons being infected:")
E_k = 0.0
for (k, p_k) in enumerate (nnz_hist):
    print ("  Pr[%d recovered] == %g" % (k, p_k))
    E_k += k * p_k
print ("  ==> Expected value of k ==", E_k)
print ("  ==> Expected fraction infected ==", E_k/N)

In [None]:
assert abs (E_k - 1.72) <= 1e-12 # Allow for floating-point rounding error
print ("\n(Passed.)")