In [None]:
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

# Sequential Importance Sampling for HMMs

A simple version of a ***Normal-Normal Hidden Markov Model (HMM)*** for $t=1,\cdots,T$ 
and a ***sequential importance sampling proposal*** for this ***HMM*** using the [slash distribution](https://en.wikipedia.org/wiki/Slash_distribution) are


\begin{align*}
Y_t \sim {} & N(X_t, \sigma^2) \quad\;  \text{(observed)} &&& \tilde X_t = {} & \tilde X_{t-1} + Z_t/U_t \quad \text{(slash distribution proposal)}\\
X_t \sim {} & N(X_{t-1}, 1) \quad \text{(unobserved)} &&& Z_t \sim {} & N(0, 1)\\
X_0 \equiv {} & 0 \equiv \tilde X_0 &&&U_t \sim {} & U(0, 1)
\end{align*}
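
The slash proposal increment $Z_t/U_t$ above can be explored in isolation. The following is a minimal sketch, not part of the assignment code: the helper names `slash_density` and `draw_slash`, the seed, and the sample size are assumptions made for illustration. It checks numerically that the slash density integrates to one and that the simulated ratio has the heavy tails the density predicts.

```python
import numpy as np
from scipy import integrate

rng = np.random.default_rng(0)  # arbitrary seed, assumed for this sketch

def slash_density(x):
    """Standard slash density (1 - exp(-x^2/2)) / (x^2 sqrt(2 pi)), with its limit at x = 0."""
    x = np.asarray(x, dtype=float)
    safe_x = np.where(x == 0, 1.0, x)  # placeholder avoids 0/0; overwritten below
    # -expm1(-x^2/2) equals 1 - exp(-x^2/2) but is more accurate for small x
    density = -np.expm1(-safe_x**2 / 2) / (safe_x**2 * np.sqrt(2 * np.pi))
    return np.where(x == 0, 1 / (2 * np.sqrt(2 * np.pi)), density)

def draw_slash(size):
    """Draw Z / U with Z ~ N(0, 1) and U ~ U(0, 1)."""
    return rng.standard_normal(size) / rng.uniform(size=size)

# The density should integrate to (approximately) 1 over the real line.
total_mass, _ = integrate.quad(lambda v: float(slash_density(v)), -np.inf, np.inf)

# Compare an empirical two-sided tail probability with the integrated density.
draws = draw_slash(200_000)
empirical_tail = np.mean(np.abs(draws) > 5)
tail_mass, _ = integrate.quad(lambda v: float(slash_density(v)), 5, np.inf)

print(total_mass, empirical_tail, 2 * tail_mass)
```

The heavy tails are the reason the slash distribution is a safe importance sampling proposal here: it places non-negligible mass wherever the $N(\tilde X_{t-1}, 1)$ transition density does.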

The ***sequential importance weight*** for the $j^{th}$ ***importance sampling proposal*** random variable (sequence) $\tilde X_{1:T} \equiv (\tilde X_1, \cdots, \tilde X_t, \cdots, \tilde X_T)$, where $j$ is not included in the notation for brevity, is

\begin{align*}
\require{cancel}
   W_1^* = {} & \frac{p_{X_1|Y_1}(\tilde x_1|Y_1=y_1)}{\tilde p_{\tilde X_1}(\tilde x_1)} \\
   = {} &\frac{p_{Y_1 | X_1}(y_1 | \tilde x_1) p_{X_1}(\tilde x_1) \overset{\text{normalization}}{\cancel{/ p_{Y_1}(y_1)}}}{\tilde p_{\tilde X_1}(\tilde x_1) } \quad \text{where the normalized importance weight of the $j^{th}$ sample proposal is } W_{j1} = \frac{W_{j1}^*}{\sum_j W_{j1}^*}\\
    W_t^* = {} &  \frac{p_{X_{1:t},Y_{1:t}}(\tilde x_{1:t}, y_{1:t})/p_{Y_{1:t}}(y_{1:t})}{\tilde p_{\tilde X_{1:t}}(\tilde x_{1:t})} \\
   = {} & \underbrace{\left(\frac{p_{X_{1:(t-1)},Y_{1:(t-1)}}(\tilde x_{1:(t-1)}, y_{1:(t-1)})}{ \tilde p_{\tilde X_{1:(t-1)}}(\tilde x_{1:(t-1)})  }\right)}_{W^*_{t-1}} \frac{p_{X_t|X_{t-1}}(\tilde x_t | \tilde x_{t-1})p_{Y_t|X_{t}}( y_t | \tilde x_t)}{ \tilde p_{\tilde X_t | \tilde X_{t-1}}(\tilde x_t | \tilde x_{t-1}) } \frac{1}{\underset{\text{normalization}}{\cancel{p_{Y_{1:t}}(y_{1:t})}}} \quad \text{again normalized over $j$ as } W_{jt} = \frac{W_{jt}^*}{\sum_j W_{jt}^*}\\
   & {} \text{with $\tilde p_{\tilde X_t | \tilde X_{t-1}}(\tilde x_t | \tilde x_{t-1}) \not = p_{X_t | X_{t-1}}(\tilde x_t | \tilde x_{t-1})$, so these terms do not cancel, though they would for the proposal specification $\tilde X_t \sim N(\tilde X_{t-1},1)$.}
\end{align*}
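
The cancellation noted in the last line of the derivation can be checked numerically. The sketch below is an illustration under stated assumptions, not the solution for the slash proposal: it uses the alternative proposal $\tilde X_t \sim N(\tilde X_{t-1}, 1)$, and the particle count, seed, observation value `y_t`, and noise scale `sigma` are all hypothetical. With that proposal the transition and proposal densities coincide, so the weight update reduces to multiplying by the likelihood.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)  # arbitrary seed, assumed for this sketch
sigma = 5.0                     # assumed observation noise scale
n_particles = 4                 # hypothetical, kept tiny for readability

x_prev = rng.standard_normal(n_particles)          # \tilde x_{t-1} for each particle
x_new = x_prev + rng.standard_normal(n_particles)  # draw from the N(\tilde x_{t-1}, 1) proposal
y_t = 0.7                                          # hypothetical observation at time t
w_prev = np.ones(n_particles)                      # W^*_{t-1}

transition = stats.norm(loc=x_prev, scale=1).pdf(x_new)   # p(x_t | x_{t-1})
proposal = stats.norm(loc=x_prev, scale=1).pdf(x_new)     # proposal density, identical here
likelihood = stats.norm(loc=x_new, scale=sigma).pdf(y_t)  # p(y_t | x_t)

w_full = w_prev * transition * likelihood / proposal  # general recursive update
w_reduced = w_prev * likelihood                       # after cancellation
w_normalized = w_full / w_full.sum()                  # normalized over particles j

print(np.allclose(w_full, w_reduced), w_normalized)
```

For the slash proposal the `transition / proposal` ratio is no longer identically one, which is why it must be kept in the weight.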

Complete the `calculate_SIS_weights` function so that the `SIS_wResampling` function and the code below execute ***sequential importance sampling with resampling*** (also called [***sequential importance resampling***](https://en.wikipedia.org/wiki/Particle_filter#Sequential_Importance_Resampling_(SIR)))
for this ***HMM*** and ***slash proposal distribution*** specification.  

*This problem is inspired by Section 6.3.2 **Sequential Monte Carlo** and particularly draws upon sections 6.3.2.5 **Particle Filters** on page 179, 6.3.2.4 **Sequential Importance Sampling for Hidden Markov Models** on pages 175-176, 6.3.2.3 **Weight Degeneracy, Rejuvenation, and Effective Sample Size**, and Example 6.3 **Slash Distribution** on pages 165-166 of the Givens and Hoeting **Computational Statistics** textbook. [Errata Warning: the right side of equation 6.33 on page 175 in section 6.3.2.4 **Sequential Importance Sampling for Hidden Markov Models** has the typo $f_t$, which should be $f_{t-1}$, and the left side of the equation is wrong: it should either be $f_{t}(x_{1:t},y_t|y_{1:(t-1)})$, or the normalizing constant $p(y_t|y_{1:(t-1)})$ could be added as a denominator on the right-hand side, or the equation could alternatively have been specified in terms of joint distributions rather than conditional distributions as $f_{t}(x_{1:t},y_{1:t}) = f_{t-1}(x_{1:(t-1)},y_{1:(t-1)})p_x(x_t|x_{t-1})p_y(y_t|x_t)$, e.g., as is done [here](https://www.almoststochastic.com/2013/08/sequential-importance-sampling.html)]*.   

In [None]:
# do not edit this cell
T,noise_to_signal = 100,5
x,y = np.zeros(T+1),np.zeros(T+1)
np.random.seed(9)
x[0], y[0] = 0, x[0] + stats.norm.rvs(scale=noise_to_signal, size=1)
for t in range(1,T+1):
    x[t] = x[t-1] + stats.norm.rvs(size=1)
    y[t] = x[t] + stats.norm(scale=noise_to_signal).rvs(size=1)
plt.plot(x, label='Signal')
plt.plot(y, label='Signal+Noise')
plt.title("How well can the original signal be detected?")
plt.legend();

In [None]:
n = 500
# T is the number of time points and we have one time series observation comprised of T time point observations
# n above is the number of particles, i.e., importance sampling proposals, each a time series of length T 
# (There's actually T+1 points because X0=tildeX0=0)

tilde_x, w_star, effective_sample_size = np.zeros((n,T+1)), np.ones((n,T+1)), np.ones(T+1)*n
# tilde_x is the importance sampling proposals (i.e., stored across rows)
# w_star  is the unnormalized importance sampling proposal weights accumulated up to time t 
#         for each of the n proposals (i.e., stored down the columns)
# effective_sample_size is what it sounds like and is calculated from SIS weights as in the code below

# proposal distribution sampling and evaluation
# https://en.wikipedia.org/wiki/Slash_distribution
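slash_pdf = lambda x: (1-np.exp(-x**2/2))/(x**2*np.sqrt(2*np.pi))
slash_rvs = lambda n: stats.norm.rvs(size=n)/stats.uniform.rvs(size=n)
# the code below refers to the proposal through these names
proposal_pdf, proposal_rvs = slash_pdf, slash_rvs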

# particle weights: COMPLETE THIS FUNCTION! When this function works all the remaining code will work.
def calculate_SIS_weights(y, tilde_x_t, tilde_x_t_minus_1, w_star_t_minus_1, proposal_pdf):
    w_star_t = np.zeros(w_star_t_minus_1.shape)
    # <complete>
    return w_star_t

# normalizing weights for resampling has a known problem which is avoided by this function
# https://github.com/numpy/numpy/issues/8317
def normalize(w):
    pcond = True
    while pcond:
        w[w==min(w[:-1][w[:-1]>0])] = 0
        # https://github.com/scipy/scipy/blob/v1.8.0/scipy/stats/_multivariate.py
        # automatically applies this "correction" and the subsequent `pcond` checks below
        w[..., -1] = 1. - w[..., :-1].sum()
        # annoyingly, this reassignment of `w[-1]` in `scipy.stats._multivariate.py` 
        # can cause proper `w` to break, e.g., producing a negative `w[-1]` or a `w.sum()`>1.
        # to correct this error which `scipy.stats._multivariate.py` may introduce 
        # this code iteratively simplifies `w` via `w[w==min(w[:-1][w[:-1]>0])] = 0` above
        # and the normalization below until the `pcond` checks in `_multivariate.py` will pass
        w = w/np.sort(w)[::-1].sum()
        # "true for bad p"
        pcond = np.any(w < 0, axis=-1)
        pcond |= np.any(w > 1, axis=-1)
        pcond |= w[:-1].sum()>1
    return w        

# This is a helper function for particle filter rejuvenation
# i.e., bootstrapping SIS according to the particle weights
# i.e., converting SIS to SIR (i.e., SIS with resampling) 
def bootstrap_indices(ireps):
    return np.concatenate([r*[i] for i,r in enumerate(ireps)]).flatten().astype(int)
    
def SIS_wResampling(y, tilde_x, w_star, proposal_rvs, proposal_pdf, n, T, R):
    
    # SIS extension loop
    for t in range(1, T+1):

        # extension of SIS proposal from `tilde_x[:,t-1]` to tilde_x[:,t]
        tilde_x[:,t] = tilde_x[:,t-1] + proposal_rvs(n)
        # SIS (cumulative) weights: YOU SHOULD HAVE COMPLETED THIS FUNCTION ABOVE!
        w_star[:,t] = calculate_SIS_weights(y[t], tilde_x[:,t].copy(), tilde_x[:,t-1].copy(), 
                                            w_star[:,t-1].copy(), proposal_pdf = proposal_pdf)
        # normalizing SIS weights for subsequent bootstrapping, i.e., particle filter rejuvenation
        w_star_normalized = w_star[:,t].copy()
        w_star_normalized = w_star_normalized/w_star_normalized.sum() # this "should" be sufficient
        # but because of the issue noted above in `normalize` additional numeric correction is needed
        w_star_normalized = normalize(w_star_normalized)  
        effective_sample_size[t] = 1/(w_star_normalized**2).sum()
        # sequence weights decay over time as bad proposals at time t erode overall proposal quality
        # the weights can be rejuvenated, however, by using a bootstrapping step in the particle filter
        if (effective_sample_size[t]<R) or (t==T-1):
            # bootstrap particle filter rejuvenation according to the current (normalized) SIS weights
            bs_samp = stats.multinomial(n, p=w_star_normalized).rvs(size=1)[0]
            tilde_x[:,:(t+1)] = tilde_x[bootstrap_indices(bs_samp),:(t+1)]
            w_star[:,t] = 1 # because the weighted importance samples approximate the true
            # distribution, thus they are taken as iid samples from the true distribution
    return tilde_x, effective_sample_size

np.random.seed(20)            
tilde_x, effective_sample_size = SIS_wResampling(y.copy(), tilde_x.copy(), w_star.copy(),
                                                 proposal_rvs, proposal_pdf, n, T, n/10)
            
def plotit():
    fig,ax = plt.subplots(2,1, figsize=(15,10))
    for i in range(n):
        ax[0].plot(tilde_x[i,:], color='gray')
    ax[0].plot(x, 'k:', label='latent HMM')
    ax[0].plot(y, 'r-', label='Observation')
    ax[0].plot(tilde_x.mean(axis=0), 'w-', label='SIS HMM estimate')
    ax[0].legend(facecolor='gray', framealpha=.5)
    ax[1].plot(effective_sample_size)
    ax[1].set_title("Effective Sample Size");
plotit()

## Hints:

- The questions below are designed to help you complete the necessary code.
    - Only update the `calculate_SIS_weights` function.  
    - Nothing else in the code should be changed.
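
The rejuvenation step inside `SIS_wResampling` combines two ingredients: the effective sample size $1/\sum_j W_{jt}^2$ computed from the normalized weights, and multinomial resampling of particle indices. The sketch below shows both on a toy example. The function name `ess`, the weight vectors, and the seed are assumptions for illustration, and `np.repeat` is used here as a stand-in that should produce the same index vector as the `bootstrap_indices` helper.

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed, assumed for this sketch

def ess(weights):
    """Effective sample size 1 / sum(w_j^2) of the normalized weights."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return 1.0 / np.sum(w**2)

n_particles = 5
uniform = np.ones(n_particles)                      # all particles equally weighted
skewed = np.array([0.96, 0.01, 0.01, 0.01, 0.01])   # one particle dominates

ess_uniform = ess(uniform)
ess_skewed = ess(skewed)

# Multinomial resampling: counts[i] is how many copies of particle i survive.
counts = rng.multinomial(n_particles, skewed / skewed.sum())
indices = np.repeat(np.arange(n_particles), counts)  # row indices of the resampled particles

print(ess_uniform, ess_skewed, counts, indices)
```

Indexing the particle array with `indices` duplicates heavily weighted rows and drops lightly weighted ones, after which the weights are reset to one.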



### Problem 3 questions 0-11 (2 points, 1/6 point each)

0. What kind of methodology is implemented by the `SIS_wResampling` function?

- (A) Bayesian MCMC analysis
- (B) Optimization
- (C) Particle filtering
- (D) Rejection sampling

1. Which of the following does `stats.norm(loc=tilde_x_t, scale=noise_to_signal).pdf(y)` represent?

- (A) $p_{Y_t|X_{t}}( y_t | \tilde x_t)$
- (B) $p_{X_t|X_{t-1}}(\tilde x_t | \tilde x_{t-1})$
- (C) $\tilde p_{\tilde X_t | \tilde X_{t-1}}(\tilde x_t | \tilde x_{t-1})$
- (D) None of the above

2. Which of the following can `stats.norm(loc=tilde_x_t, scale=noise_to_signal).pdf(y)` represent?

- (A) $p_{Y_1 | X_1}(y_1 | \tilde x_1)$
- (B) $p_{X_1}(\tilde x_1)$
- (C) $\tilde p_{\tilde X_1}(\tilde x_1)$
- (D) None of the above

3. Which of the following does `stats.norm(loc=tilde_x_t_minus_1, scale=1).pdf(tilde_x_t)` represent?

- (A) $p_{Y_t|X_{t}}( y_t | \tilde x_t)$
- (B) $p_{X_t|X_{t-1}}(\tilde x_t | \tilde x_{t-1})$
- (C) $\tilde p_{\tilde X_t | \tilde X_{t-1}}(\tilde x_t | \tilde x_{t-1})$
- (D) None of the above


4. Which of the following can `stats.norm(loc=tilde_x_t_minus_1, scale=1).pdf(tilde_x_t)` represent?

- (A) $p_{Y_1 | X_1}(y_1 | \tilde x_1)$
- (B) $p_{X_1}(\tilde x_1)$
- (C) $\tilde p_{\tilde X_1}(\tilde x_1)$
- (D) None of the above

5. Which of the following does `proposal_pdf(tilde_x_t-tilde_x_t_minus_1)` represent?

- (A) $p_{Y_t|X_{t}}( y_t | \tilde x_t)$
- (B) $p_{X_t|X_{t-1}}(\tilde x_t | \tilde x_{t-1})$
- (C) $\tilde p_{\tilde X_t | \tilde X_{t-1}}(\tilde x_t | \tilde x_{t-1})$
- (D) None of the above

6. Which of the following can `proposal_pdf(tilde_x_t-tilde_x_t_minus_1)` represent?

- (A) $p_{Y_1 | X_1}(y_1 | \tilde x_1)$
- (B) $p_{X_1}(\tilde x_1)$
- (C) $\tilde p_{\tilde X_1}(\tilde x_1)$
- (D) None of the above

7. What is $\frac{p_{X_{1:(t-1)},Y_{1:(t-1)}}(\tilde x_{1:(t-1)}, y_{1:(t-1)})}{ \tilde p_{\tilde X_{1:(t-1)}}(\tilde x_{1:(t-1)})  }$ equal to?

- (A) `w_star_t`
- (B) `w_star_t_minus_1`
- (C) `w_star_t * w_star_t_minus_1`
- (D) None of the above


8. Why doesn't the time series match the observed data?

- (A) The model assumes noisy data and estimates the latent trend in spite of the noise
- (B) More data is required since this is just a single (multivariate time series) observation 
- (C) The effective sample size is too low and the particles are thus not sufficiently diverse
- (D) The model just does not seem to work very well as currently specified

9. Why is the predicted trend line smoother toward the end of the time series?

- (A) Rejuvenating effective sample size by bootstrapping filters out early sequence diversity 
- (B) There are more proposal sequences toward the end of the time series versus the beginning
- (C) The data is more volatile toward the beginning of the time series versus the end
- (D) It's not really less smooth and just looks that way due to run to run variation

10. If the diversity of the sequences resampled by bootstrapping is reduced due to sample multiplicity from bootstrap sampling, why does the effective sample size still tend to be close to the original number of specified particles `n`?

- (A) The effective sample size calculation is larger for more homogeneous weights
- (B) The weights are rejuvenated to 1 through the bootstrap approximation to the true distribution
- (C) Both of the above
- (D) None of the above

11. What is the true cumulative MSE of the bootstrap particle filter estimator?

$$\frac{\sum_{t=1}^T\left(\frac{\sum_{j=1}^n \tilde x_{jt}}{n} - x_t\right)^2}{n}$$ 

*Note: do not include $t=0$ in your calculation as that's just the initialization.*
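
The displayed formula translates directly into array operations. The sketch below is a hedged illustration on made-up data: the arrays `latent` and `particles`, their toy sizes, and the seed are assumptions, standing in for `x` and `tilde_x` from the notebook. It follows the formula exactly as written, including the division by the number of particles $n$ and the exclusion of $t=0$.

```python
import numpy as np

rng = np.random.default_rng(3)  # arbitrary seed, assumed for this sketch
n_particles, n_steps = 6, 4     # hypothetical toy sizes

# Toy latent path x_0, ..., x_T with x_0 = 0, and noisy particle paths around it.
latent = np.concatenate([[0.0], rng.standard_normal(n_steps).cumsum()])
particles = latent + rng.standard_normal((n_particles, n_steps + 1))
particles[:, 0] = 0.0           # every particle starts at the initialization

estimate = particles.mean(axis=0)                   # particle average at each t
squared_errors = (estimate[1:] - latent[1:]) ** 2   # skip t = 0
cumulative_mse = squared_errors.sum() / n_particles # divisor n, as in the displayed formula

print(cumulative_mse)
```

With the notebook's variables, `particles` would correspond to `tilde_x` and `latent` to `x`.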

In [None]:
# p3q0-q11: 1/6 point each [format: `str` either "A" or "B" or "C" or "D" based on the choices above]
p3q0 = ""#<"A"|"B"|"C"|"D"> 
p3q1 = ""#<"A"|"B"|"C"|"D"> 
p3q2 = ""#<"A"|"B"|"C"|"D"> 
p3q3 = ""#<"A"|"B"|"C"|"D"> 
p3q4 = ""#<"A"|"B"|"C"|"D"> 
p3q5 = ""#<"A"|"B"|"C"|"D"> 
p3q6 = ""#<"A"|"B"|"C"|"D"> 
p3q7 = ""#<"A"|"B"|"C"|"D"> 
p3q8 = ""#<"A"|"B"|"C"|"D"> 
p3q9 = ""#<"A"|"B"|"C"|"D"> 
p3q10 = ""#<"A"|"B"|"C"|"D"> 
# p3q11: [format: `float`]
p3q11 = #
# This cell will produce a runtime error until the `p3q11` variable is assigned a value