# Important note!

Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel$\rightarrow$Restart) and then **run all cells** (in the menubar, select Cell$\rightarrow$Run All).

Make sure you fill in any place that says `YOUR CODE HERE` or "YOUR ANSWER HERE", as well as your GT login and the GT logins of any of your collaborators below. (The GT logins are worth 1 point per notebook, so don't miss the opportunity to get a free point!)

In [None]:
YOUR_ID = "" # Please enter your GT login, e.g., "rvuduc3" or "gtg911x"
COLLABORATORS = [] # list of strings of your collaborators' IDs

In [None]:
import re

RE_CHECK_ID = re.compile (r'''[a-zA-Z]+\d+|[gG][tT][gG]\d+[a-zA-Z]''')
assert RE_CHECK_ID.match (YOUR_ID) is not None

collab_check = [RE_CHECK_ID.match (i) is not None for i in COLLABORATORS]
assert all (collab_check)

del collab_check
del RE_CHECK_ID
del re

**Jupyter / IPython version check.** The following code cell verifies that you are using the correct version of Jupyter/IPython.

In [None]:
import IPython
assert IPython.version_info[0] >= 3, "Your version of IPython is too old, please update it."

# Random walks (1-D)

The mean-field SIR model of the previous part does not model the flow of infection in space. To develop a better approach, we will need ways to reason about spatial flows. This notebook considers one such model, known as a random walk.

> We will adapt these ideas about "random walkers" to modeling zombies, in anticipation of [HvZ@GT](https://hvz.gatech.edu), though we probably won't look in detail at that modeling problem until right _after_ HvZ and the midterm.

Some of this discussion is based on freely available [notes by Kai Nordlund (2006)](http://www.acclab.helsinki.fi/~knordlun/mc/mc7nc.pdf), a copy of which is also [posted on T-Square](https://t-square.gatech.edu/access/content/group/gtc-239f-fc11-5690-9dae-2dc96b59f372/Nordlund-2006-random-walk-notes.pdf).

## Setup

In [None]:
import numpy as np

%matplotlib inline
import matplotlib.pyplot as plt

## The statistics of random steps

Let $S_t \in \{\pm 1\}$ be a random variable denoting the direction of the step $\Delta x$ at time $t$, where all $\{S_t\}$ are independent and identically distributed (i.i.d.) with

$$
\begin{eqnarray}
  \mathrm{Pr}[S_t=+1] & \equiv & \alpha \\
  \mathrm{Pr}[S_t=-1] & \equiv & 1 - \alpha.
\end{eqnarray}
$$

The parameter $\alpha$ is sometimes called the "anisotropy" parameter; it indicates whether the walker is biased toward one direction. When $\alpha=0.5$, the walker is equally likely to step in either direction, and we say its movement is "isotropic." When $\alpha \neq 0.5$, its movement is "anisotropic."
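As a small illustration of this definition, the sketch below draws step *directions* only (not scaled steps) by comparing uniform random numbers against $\alpha$. The helper name `sample_directions` and the `seed` argument are hypothetical, introduced just for this example, and it assumes a NumPy version that provides `np.random.default_rng`.

```python
import numpy as np

def sample_directions(num_steps, alpha=0.5, seed=None):
    """Draw `num_steps` i.i.d. directions in {-1, +1} with Pr[+1] = alpha.

    Hypothetical helper for illustration; not part of this notebook's API.
    """
    rng = np.random.default_rng(seed)
    # rng.random() is uniform on [0, 1), so the comparison is True
    # with probability alpha.
    return np.where(rng.random(num_steps) < alpha, 1, -1)

directions = sample_directions(100_000, alpha=0.25, seed=0)
frac_positive = np.mean(directions == 1)  # should be close to alpha = 0.25
```

With $\alpha = 0.25$, roughly a quarter of the sampled directions should be $+1$, which is what makes this walker anisotropic.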

**Exercise 1** (2 points). Write a function `s[:num_steps] = random_steps(num_steps, dx, alpha)` that returns a NumPy array of i.i.d. random steps, as defined above. That is, `s[t]` should be a realization of the random variable, $S_t$; each `s[t]` should be either `+dx` or `-dx`, chosen independently with probability `alpha` and `1.0-alpha`, respectively.

In [None]:
def random_steps (num_steps, dx=1.0, alpha=0.5):
    assert dx >= 0
    assert 0 <= alpha <= 1
    assert num_steps > 0
    # YOUR CODE HERE
    raise NotImplementedError()

In [None]:
n_test = 100000
dx_test = 2.0
alpha_test = 0.25
s_test = random_steps (n_test, dx=dx_test, alpha=alpha_test)

print ("average size of the {} steps (dx={}, alpha={}): {:.3f}".format (n_test,
                                                                        dx_test,
                                                                        alpha_test,
                                                                        np.mean (s_test)))
assert len (s_test) == n_test
assert ((s_test == -dx_test) | (s_test == dx_test)).all ()
assert np.isclose (np.mean (s_test), dx_test*(2*alpha_test - 1), atol=0.05)

**Step statistics.** As shown in class,

$$
\begin{eqnarray}
  \mathrm{E}[S_t] & = & 2 \alpha - 1 \equiv \mu_s \\
  \mathrm{Var}[S_t] & = & 1 - \mu_s^2 \equiv \sigma_s^2.
\end{eqnarray}
$$
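For reference, one way to recover these formulas directly from the definition of $S_t$ uses the fact that $S_t^2 = 1$ for either outcome:

$$
\begin{eqnarray}
  \mathrm{E}[S_t] & = & (+1) \cdot \alpha + (-1) \cdot (1 - \alpha) = 2 \alpha - 1 \\
  \mathrm{E}[S_t^2] & = & (+1)^2 \cdot \alpha + (-1)^2 \cdot (1 - \alpha) = 1 \\
  \mathrm{Var}[S_t] & = & \mathrm{E}[S_t^2] - \mathrm{E}[S_t]^2 = 1 - \mu_s^2 = 4 \alpha (1 - \alpha).
\end{eqnarray}
$$

For instance, $\alpha = 0.25$ gives $\mu_s = -0.5$ and $\sigma_s = \sqrt{0.75} \approx 0.866$.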

**Exercise 2** (2 points). Compute the step statistics shown above for various values of the anisotropy parameter.

That is, for each value of $\alpha$ in the array `ALPHAS` of anisotropy parameters to try, compute $\mu_s$ and $\sigma_s$ (i.e., **not** $\sigma_s^2$ but rather its square root). Store your results in two NumPy arrays, `E_mu_s[:]` and `E_sigma_s[:]`, each of size `len(ALPHAS)`, such that

- `E_mu_s[i]` is the $\mu_s$ value for $\alpha=$`ALPHAS[i]`; and

- `E_sigma_s[i]` is the $\sigma_s$ value for $\alpha=$`ALPHAS[i]`.

In [None]:
ALPHAS = np.array ([0.1, 0.25, 0.33, 0.5, 0.67, 0.75, 0.9]) # Anisotropy parameters

# YOUR CODE HERE
raise NotImplementedError()

print ("The theory says that the step size mean +/- (1-s.d.) values are:")
for i, a in enumerate (ALPHAS):
    print ("  alpha={} => E[S_t] = {:.2f} +/- {:.2f}".format (a, E_mu_s[i], E_sigma_s[i]))

In [None]:
assert (np.abs (E_mu_s) < 1.0).all ()
assert (E_sigma_s > 0).all ()
assert np.allclose (E_mu_s, -E_mu_s[::-1], atol=0.01)
assert np.allclose (E_sigma_s, E_sigma_s[::-1], atol=0.01)
assert np.isclose (E_mu_s[int (len (ALPHAS)/2)], 0.0, atol=0.01)
assert np.isclose (E_sigma_s[int (len (ALPHAS)/2)], 1.0, atol=0.01)
print ("\n(Passed!)")

**Exercise 3** (2 points). For each $\alpha=$`ALPHAS[i]`, generate 500,000 steps using `random_steps()` and compute the sample mean and standard deviation. Store your results in `Mu_s[i]` and `Sigma_s[i]`, respectively.

In [None]:
NUM_STEPS = 500000 # Number of steps to generate for each alpha (used by the test cell below)
Mu_s = np.zeros (len (ALPHAS))
Sigma_s = np.zeros (len (ALPHAS))

# YOUR CODE HERE
raise NotImplementedError()

print ("Experiment says:")
for i, a in enumerate (ALPHAS):
    print ("  alpha={} => E[S_t] ~ {:.2f} +/- {:.2f}".format (a, Mu_s[i], Sigma_s[i]))

In [None]:
mu_err_test = np.linalg.norm (Mu_s - E_mu_s, ord=np.inf)
sigma_err_test = np.linalg.norm (Sigma_s - E_sigma_s, ord=np.inf)

print ("For {} steps:".format (NUM_STEPS))
print ("- ||Mu_s - E_mu_s||_inf = {:.3f}".format (mu_err_test))
print ("- ||Sigma_s - E_sigma_s||_inf = {:.3f}".format (sigma_err_test))

assert mu_err_test <= 0.01
assert sigma_err_test <= 0.01

print ("\n(Passed!)")

## The statistics of random walks

Let the position of the random walker be the random variable

$$
X_t \equiv \Delta x \sum_{k=1}^{t} S_k,
$$

where $S_k$ are the i.i.d. step directions and $\Delta x$ is the step distance.
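Because $X_t$ is a running sum of the steps, a whole trajectory can be computed from an array of directions with a cumulative sum. The sketch below uses a hand-picked (not random) sequence of directions and an illustrative step distance, so the arithmetic is easy to follow.

```python
import numpy as np

dx = 2.0                                      # step distance (illustrative value)
directions = np.array([+1, -1, -1, +1, +1])   # a hand-picked realization of S_1..S_5
positions = dx * np.cumsum(directions)        # X_t = dx * (S_1 + ... + S_t), t = 1..5
# positions holds 2, 0, -2, 0, 2
```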

**CLT approximation.** From the Central Limit Theorem (CLT),

$$
\begin{eqnarray}
    \mathrm{E}[X_t] \equiv \mu_t & \approx & \mu_s \cdot {\Delta x} \cdot t \\
  \mathrm{Var}[X_t] \equiv \sigma_t^2 & \approx & \sigma_s^2 \cdot {\Delta x}^2 \cdot t.
\end{eqnarray}
$$
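To see where these expressions come from, note that the mean follows from linearity of expectation and the variance from the independence of the steps:

$$
\begin{eqnarray}
  \mathrm{E}[X_t] & = & \Delta x \sum_{k=1}^{t} \mathrm{E}[S_k] = \mu_s \cdot {\Delta x} \cdot t \\
  \mathrm{Var}[X_t] & = & {\Delta x}^2 \sum_{k=1}^{t} \mathrm{Var}[S_k] = \sigma_s^2 \cdot {\Delta x}^2 \cdot t.
\end{eqnarray}
$$

These two identities hold for any $t$. What the CLT contributes is the shape of the distribution: for large $t$, $X_t$ is approximately normal with this mean and variance. As a worked example, $\alpha = 0.25$, $\Delta x = 1$, and $t = 100$ give $\mu_t = -50$ and $\sigma_t = \sqrt{0.75 \cdot 100} \approx 8.66$.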

**Exercise 4** (2 points). For each anisotropy parameter $\alpha_i$ listed in `ALPHAS[i]`, compute $\mu_t$ and $\sigma_t$ (i.e., **not** $\sigma_t^2$ but rather its square root) for the value of $t$ given below by `T_MAX`. Use a value for $\Delta x$ of 1.0.

Store your results in `E_mu_t[i]` and `E_sigma_t[i]`, two NumPy arrays of the same length as `ALPHAS`.

In [None]:
T_MAX = 100

# YOUR CODE HERE
raise NotImplementedError()

print ("The central limit theorem suggests that the mean position, +/- (1-sd), of a random walker is:")
for i, a in enumerate (ALPHAS):
    print ("  alpha={} => E[X_t] = {:.2f} +/- {:.2f}".format (a, E_mu_t[i], E_sigma_t[i]))

In [None]:
assert np.allclose (E_mu_t, -E_mu_t[::-1], atol=0.01)
assert np.allclose (E_sigma_t, E_sigma_t[::-1], atol=0.01)
assert np.allclose (E_mu_t / E_sigma_t / np.sqrt (T_MAX), np.array ([-1.333, -0.577, -0.362, 0., 0.362, 0.577, 1.333]), atol=0.01)
print ("\n(Passed!)")

**Exercise 5** (2 points). Write a function that returns the positions of a random walker who takes `t_max` steps of size `dx` each, moving in the `+dx` direction with probability `alpha` and in the `-dx` direction with probability `1.0-alpha`. The return value should be a one-dimensional NumPy array of the `t_max` positions, one after each step.

In [None]:
def random_walk (t_max, dx=1.0, alpha=0.5):
    # YOUR CODE HERE
    raise NotImplementedError()

In [None]:
t_test = 100
dx_test = 2.0
alpha_test = 0.25
x_test = random_walk (t_test, dx=dx_test, alpha=alpha_test)

print ("position after {} steps (dx={}, alpha={}): {:.3f}".format (t_test,
                                                                   dx_test,
                                                                   alpha_test,
                                                                   x_test[-1]))
assert x_test[-1] < 0
print ("\n(Passed.)")

**Exercise 6** (3 points). For each $\alpha_i$ (i.e., `ALPHAS[i]`), conduct 1,000 random walks of 100 steps each, and use the final positions to estimate the mean of $X_t$ and compute a 99% confidence interval for it. Store the sample mean, sample standard deviation, and the confidence interval half-width in `Mu_t[i]`, `Sigma_t[i]`, and `Confint_t[i]`, respectively, so that the interval is `Mu_t[i]` $\pm$ `Confint_t[i]`.
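As a reminder of how such an interval can be formed, here is a minimal sketch that computes a confidence interval half-width for the mean of generic samples. It assumes a normal approximation (a Student-$t$ interval would be a reasonable alternative for small samples), and both the helper name `mean_confint` and the stand-in data are hypothetical, not part of this notebook.

```python
import numpy as np
from scipy.stats import norm

def mean_confint(samples, confidence=0.99):
    """Return (sample mean, sample s.d., CI half-width) under a normal approximation.

    Hypothetical helper for illustration only.
    """
    samples = np.asarray(samples, dtype=float)
    n = len(samples)
    mean = samples.mean()
    sd = samples.std(ddof=1)               # sample standard deviation
    z = norm.ppf(0.5 + confidence / 2.0)   # two-sided critical value, about 2.576 for 99%
    return mean, sd, z * sd / np.sqrt(n)

rng = np.random.default_rng(1)
demo = rng.normal(loc=5.0, scale=2.0, size=1000)  # stand-in data, not random-walk output
mean, sd, half_width = mean_confint(demo)
```

The interval is then `mean - half_width` to `mean + half_width`. It shrinks like $1/\sqrt{n}$ as the number of samples grows.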

In [None]:
T_MAX = 100 # Random walk length
Mu_t = np.zeros (len (ALPHAS))
Sigma_t = np.zeros (len (ALPHAS))
Confint_t = np.zeros (len (ALPHAS))

# YOUR CODE HERE
raise NotImplementedError()
    
print ("Your simulation results:")
for i, a in enumerate (ALPHAS):
    print ("  alpha={} => E[X_t] ≈ {:.2f} +/- {:.2f} [sd: {:.2f}]".format (a,
                                                                           Mu_t[i],
                                                                           Confint_t[i],
                                                                           Sigma_t[i]))

In [None]:
Lower_test = Mu_t - Confint_t
Upper_test = Mu_t + Confint_t
for i, a in enumerate (ALPHAS):
    result_i = Lower_test[i] <= E_mu_t[i] <= Upper_test[i]
    result_i_str = "is" if result_i else "**is not**"
    print ("[alpha={:.2f}] True mean of {:.1f} {} in [{:.3f}, {:.3f}].".format (ALPHAS[i], E_mu_t[i], result_i_str, Lower_test[i], Upper_test[i]))
assert ((Lower_test <= E_mu_t) & (E_mu_t <= Upper_test)).all ()
print ("\n(Passed!)")

Here is a picture of some realizations of random walks consisting of $t=40$ steps at different anisotropy parameters, $\alpha$ ($\Delta x = 1$).

In [None]:
T_MAX = 40
X_t = np.zeros ((T_MAX, len (ALPHAS)))
for i, a in enumerate (ALPHAS):
    X_t[:, i] = random_walk (T_MAX, alpha=a)
    
plt.figure (figsize=(8, 5))
for i, _ in enumerate (ALPHAS):
    plt.plot (np.arange (T_MAX)+1, X_t[:, i], '*-')
plt.legend (['{:.2f}'.format (a) for a in ALPHAS], loc=0)

## Position distributions

Consider a random walker who takes $t$ steps. Assume each step is of unit size, i.e., $\Delta x = 1$. Let the walker's *position distribution* be the probability that the walker lands on position $k$ at time $t$, i.e., $\mathrm{Pr}[X_t = k]$.

Note that after $t$ unit-sized steps, $|k| \leq t$.
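For comparison with simulated estimates, the exact distribution is binomial: if $m$ of the $t$ steps are positive, the walker ends at $k = 2m - t$, so $\mathrm{Pr}[X_t = k] = \binom{t}{m} \alpha^m (1-\alpha)^{t-m}$ with $m = (t+k)/2$ when $t+k$ is even, and zero otherwise. The sketch below tabulates it. The function name `exact_posdist` is hypothetical, and it stores $\mathrm{Pr}[X_t = k]$ at index `t+k`.

```python
import numpy as np
from scipy.stats import binom

def exact_posdist(alpha, t):
    """Exact Pr[X_t = k] for k = -t..t (unit steps), stored at index t + k.

    Hypothetical helper for illustration only.
    """
    P = np.zeros(2 * t + 1)
    m = np.arange(t + 1)                # possible numbers of +1 steps
    P[2 * m] = binom.pmf(m, t, alpha)   # k = 2m - t, so index t + k = 2m
    return P

P_exact = exact_posdist(0.5, 4)
# P_exact is [1/16, 0, 4/16, 0, 6/16, 0, 4/16, 0, 1/16]
```

Note that every other entry is zero: after $t$ unit steps, the walker can only occupy positions with the same parity as $t$.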

**Exercise 7** (3 points). Write a function that estimates this distribution, given

- the positive-step probability, $\alpha$, and
- the number of steps, $t$.

Your function should have the following signature:

```python
def estimate_posdist (alpha, t, num_trials=1):
    ...
```

It should perform `num_trials` simulation experiments to estimate the position distribution. It should return this estimate as a NumPy array, `P_dist[:2*t+1]`, of length $2t+1$. In particular, for all $-t \leq k \leq t$, the empirical estimate of $\mathrm{Pr}[X_t=k]$ should appear in `P_dist[t+k]`.

In [None]:
def estimate_posdist (alpha, t, num_trials=1):
    P_dist = np.zeros (2*t + 1)
    # YOUR CODE HERE
    raise NotImplementedError()
    return P_dist

ALPHA = 0.5
T_MAX = 20
NUM_TRIALS = T_MAX**3
P_dist = estimate_posdist (ALPHA, T_MAX, num_trials=NUM_TRIALS)

from scipy.stats import norm
plt.figure (figsize=(8, 4))
K = -T_MAX + np.arange (2*T_MAX+1)
mu_t = (2.0 * ALPHA - 1.0) * T_MAX
sigma_t = np.sqrt ((1.0 - (2.0*ALPHA-1.0)**2) * T_MAX)
plt.plot (K, P_dist, '*-',
          K, norm.pdf (K, loc=mu_t, scale=sigma_t), 'r-')

**Exercise (OPTIONAL).** Notice that the diagnostic plot for Exercise 7 overlays the normal distribution predicted by the CLT approximation. Try repeating Exercise 7 for different values of $\alpha$ and $t$. What do you notice about the results? Do they make sense?