In [82]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import h5py  # used below to read the stored observations
from utils import *

In [2]:
%load_ext autoreload
%autoreload 2

# Checking Metropolis and EHMM samplers with linear Gaussian models

The experiments use the models and samplers presented in the paper\
[1] Shestopaloff A.Y., Neal R.M. Sampling Latent States for High-Dimensional Non-Linear State Space Models with the Embedded HMM Method

---
## Model specification

In [86]:
# Set up the transition model with parameters above
T = 250
n = 10
A = DiagonalMatrixParam() 
A.parametrize(n, DistributionType.NORMAL, (0., 0.25))
Q = SymmetricMatrixParam()
Q.parametrize(n, DistributionType.INVGAMMA, (1., 0.5), 1., minx=0, maxx=np.inf)
prior_mean = np.zeros(n)
Q_init = (np.ones((n,n)) * 0.7 + np.eye(n) * 0.3) * (1/(1-0.9*0.9))
trm = TransitionSpec(A, Q, prior_mean, Q_init)

Make sure that the second parameter of the inverse gamma distribution is the reciprocal of the scale.


In [87]:
# Set up the observation model with parameters above
C = DiagonalMatrixParam()
C.parametrize(n, DistributionType.NORMAL, (0.6, 0.7))
R = SymmetricMatrixParam()
R.parametrize(n, DistributionType.INVGAMMA, (1., 0.5), 1., minx=0, maxx=np.inf)
obsm = ObservationSpec(ModelType.LINEAR_GAUSS, C, R)

Make sure that the second parameter of the inverse gamma distribution is the reciprocal of the scale.
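
To make the two specifications above concrete, the following sketch simulates a linear Gaussian state space model of the form $x_1 \sim N(m_0, P_0)$, $x_t = A x_{t-1} + w_t$, $y_t = C x_t + v_t$ with $w_t \sim N(0, Q)$ and $v_t \sim N(0, R)$. It is only an illustration in plain NumPy: the function `simulate_lgssm` and the fixed parameter values are hypothetical and are not part of `utils`, where the parameters are given priors instead.

```python
import numpy as np

def simulate_lgssm(A, Q, C, R, m0, P0, T, rng):
    """Draw latent states x (n, T) and observations y (m, T)."""
    n, m = m0.shape[0], C.shape[0]
    x = np.empty((n, T))
    y = np.empty((m, T))
    x[:, 0] = rng.multivariate_normal(m0, P0)
    for t in range(T):
        if t > 0:
            # Transition: x_t = A x_{t-1} + w_t, w_t ~ N(0, Q)
            x[:, t] = A @ x[:, t - 1] + rng.multivariate_normal(np.zeros(n), Q)
        # Observation: y_t = C x_t + v_t, v_t ~ N(0, R)
        y[:, t] = C @ x[:, t] + rng.multivariate_normal(np.zeros(m), R)
    return x, y

# Hypothetical fixed values, chosen only for illustration
n_demo, T_demo = 10, 250
A_demo = 0.9 * np.eye(n_demo)
Q_demo = np.ones((n_demo, n_demo)) * 0.7 + np.eye(n_demo) * 0.3
P0_demo = Q_demo / (1 - 0.9 * 0.9)
C_demo = 0.6 * np.eye(n_demo)
R_demo = np.eye(n_demo)
x_demo, y_demo = simulate_lgssm(A_demo, Q_demo, C_demo, R_demo,
                                np.zeros(n_demo), P0_demo, T_demo,
                                np.random.default_rng(0))
```

The arrays follow the same `(n, T)` layout as `x_init` and the observations used in this notebook.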


## Sampler specification  

### Metropolis sampling scheme
The scheme samples one state at a time, updating all dimensions at once, conditional on all other states $x_{-t}=(x_{1},\dots,x_{t-1},x_{t+1},\dots,x_{T})$, using an autoregressive update of the form described in [1].

In [88]:
# Specify metropolis scheme
nupd = 50  # number of parameter updates between the scheme runs
met_sampler = SamplerSpec(SamplerType.METROPOLIS, nupd)
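
As a rough illustration of an autoregressive Metropolis update for a single state, the sketch below assumes that the conditional prior of $x_t$ given its neighbours is Gaussian with mean `mu` and covariance `chol @ chol.T`. The proposal $x' = \mu + \sqrt{1-\epsilon^2}\,(x-\mu) + \epsilon L z$ is reversible with respect to that Gaussian, so the acceptance probability reduces to a likelihood ratio. The function name, its arguments and the `log_lik` callable are hypothetical and do not reproduce the implementation behind `SamplerSpec`.

```python
import numpy as np

def autoregressive_step(x, mu, chol, eps, log_lik, rng):
    """One autoregressive Metropolis update of a single state vector x.

    mu, chol : mean and Cholesky factor of the Gaussian conditional prior
    eps      : step scale in [0, 1]
    log_lik  : callable returning the log observation density at a state
    """
    z = rng.standard_normal(x.shape)
    proposal = mu + np.sqrt(1.0 - eps ** 2) * (x - mu) + eps * (chol @ z)
    # The Gaussian prior terms cancel, leaving only the likelihood ratio
    log_ratio = log_lik(proposal) - log_lik(x)
    if np.log(rng.uniform()) < log_ratio:
        return proposal, True
    return x, False
```

A small `eps` gives proposals close to the current state, while `eps = 1` proposes an independent draw from the conditional prior.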

\
To run the simulation we need to provide the initial sample $\mathbf{x}_0$ to start off the sampler, and specify the parameters of the simulation. We set $\mathbf{x}_0=\mathbf{0}$ for this model and scheme. We run 5 simulations of $10^6$ iterations each, each starting with a different random seed. To save memory we thin the output, recording only every tenth sample.

In [89]:
n_iter_met = int(1e6)  # 10^6 iterations per run, as described above
x_init = np.zeros((n, T))
seeds = np.array([1, 10, 100, 1e4, 1e5], dtype=int)  # <-- pass empty array, and add smoother spec to MCMC session, to run only Kalman smoother
scales_met = np.array([0.2, 0.8])
thinning = 10
simulation_met = SimulationSpec(n_iter_met, seeds, x_init, thinning, scales_met)
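
The effect of thinning on storage can be estimated with simple arithmetic. The sketch below assumes that every retained sample holds the full $n \times T$ state array as 64-bit floats; the storage format itself is an assumption, not something taken from the library.

```python
# Back-of-the-envelope storage estimate for one thinned chain
n_iter_total = 10 ** 6   # iterations per run, as stated in the text
thin = 10                # keep every tenth sample
n_dim, n_steps = 10, 250 # state dimension and sequence length
bytes_per_float = 8      # assumption: samples stored as float64

stored_samples = n_iter_total // thin
bytes_per_chain = stored_samples * n_dim * n_steps * bytes_per_float
print(stored_samples, bytes_per_chain / 1e9)  # 100000 samples, 2.0 GB
```

Without thinning the same run would need ten times as much space per chain.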

### Embedded HMM sampling scheme
This scheme was proposed in [1] and uses forward pool state selection. Each sampling step samples the whole sequence at once, using an HMM forward-backward-like algorithm to select a state from the pool at each time. To generate the pool for each time, the scheme uses autoregressive and shift updates. If specified, it can alternate between updates on the original and the reversed observation sequence.

In [90]:
# Specify EHMM scheme
pool_sz = 50
ehmm_sampler = SamplerSpec(SamplerType.EHMM, nupd, pool_sz)
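
The selection of one pool state per time step can be sketched as forward filtering followed by backward sampling on an HMM whose hidden states are pool indices. The code below takes the per-state log weights `log_w` and the log transition densities `log_trans` between pool states as given; how [1] constructs the pools and weights is not reproduced here, and all names are hypothetical.

```python
import numpy as np

def _logsumexp(a, axis):
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))

def _draw(log_p, rng):
    # Sample an index with probability proportional to exp(log_p)
    p = np.exp(log_p - np.max(log_p))
    return rng.choice(len(p), p=p / p.sum())

def sample_pool_path(log_w, log_trans, rng):
    """Sample one pool index per time step.

    log_w     : (T, L) log weights of the L pool states at each time
    log_trans : (T-1, L, L), log_trans[t, k, l] is the log transition
                density from pool state k at time t to state l at time t+1
    """
    T, L = log_w.shape
    log_alpha = np.empty((T, L))
    log_alpha[0] = log_w[0]
    for t in range(1, T):  # forward pass
        log_alpha[t] = log_w[t] + _logsumexp(
            log_alpha[t - 1][:, None] + log_trans[t - 1], axis=0)
    path = np.empty(T, dtype=int)
    path[-1] = _draw(log_alpha[-1], rng)
    for t in range(T - 2, -1, -1):  # backward sampling
        path[t] = _draw(log_alpha[t] + log_trans[t, :, path[t + 1]], rng)
    return path
```

The cost of the forward pass grows with the square of the pool size, which is why `pool_sz` is a key tuning parameter.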

\
As with the single-state sampler, we set $\mathbf{x}_0=\mathbf{0}$ for this model and scheme. We run 5 simulations of $10000$ iterations each, each starting with a different random seed (we use the same seeds as for the single-state sampler). Each iteration runs on both the original and the reversed sequence of observations.

In [91]:
n_iter_ehmm = 10000
scales_ehmm = np.array([0.1, 0.4])
reverse = True
simulation_ehmm = SimulationSpec(n_iter_ehmm, seeds, x_init, scaling=scales_ehmm, reverse=reverse)

## Observations 
Finally, we need observations on which to run the samplers and the smoother. We use synthetic data generated by the programme when running the MCMC in non-parametrized mode.

In [92]:
dataprovider = "met_gauss_noreverse_nonparm"
with h5py.File(RESULTS_PATH/f"{dataprovider}_results_seed1.h5", "r") as f:
    data = f['observations'][:].reshape(n, T)
dataspec = Data(data)

## Simulation    
First we run the Metropolis single-state sampler and compare the results with those of the Kalman smoother. This establishes a benchmark for further experiments. We then run the EHMM sampler and compare its results with the benchmark as well as with the single-state sampler.
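
For reference, a Kalman filter followed by a Rauch-Tung-Striebel backward pass gives the exact smoothed means and covariances for a linear Gaussian model with known parameters. The NumPy sketch below assumes fixed matrices `A`, `Q`, `C`, `R` and an initial distribution $N(m_0, P_0)$; it is a generic textbook version with hypothetical names, not the smoother shipped with the library.

```python
import numpy as np

def kalman_smoother(y, A, Q, C, R, m0, P0):
    """Smoothed means (n, T) and covariances (T, n, n) for observations y (m, T)."""
    n, T = m0.shape[0], y.shape[1]
    m_pred = np.empty((T, n)); P_pred = np.empty((T, n, n))
    m_filt = np.empty((T, n)); P_filt = np.empty((T, n, n))
    for t in range(T):  # forward Kalman filter
        if t == 0:
            m_pred[t], P_pred[t] = m0, P0
        else:
            m_pred[t] = A @ m_filt[t - 1]
            P_pred[t] = A @ P_filt[t - 1] @ A.T + Q
        S = C @ P_pred[t] @ C.T + R
        K = np.linalg.solve(S, C @ P_pred[t]).T  # gain P C^T S^{-1}
        m_filt[t] = m_pred[t] + K @ (y[:, t] - C @ m_pred[t])
        P_filt[t] = P_pred[t] - K @ S @ K.T
    m_s, P_s = m_filt.copy(), P_filt.copy()
    for t in range(T - 2, -1, -1):  # backward RTS pass
        G = np.linalg.solve(P_pred[t + 1], A @ P_filt[t]).T
        m_s[t] = m_filt[t] + G @ (m_s[t + 1] - m_pred[t + 1])
        P_s[t] = P_filt[t] + G @ (P_s[t + 1] - P_pred[t + 1]) @ G.T
    return m_s.T, P_s
```

Posterior means and variances estimated from the MCMC samples can then be compared against these values state by state.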

In [93]:
# Simulation with Metropolis single-state sampler
met_session_name = "met_gauss_noreverse_param_v2"
mcmc_met_lg = MCMCsession(met_session_name)
if mcmc_met_lg.hasResults():
    mcmc_met_lg.loadResults() # <-- NB! Each file is ~2GB so it takes time to download
else:
    mcmc_met_lg.init(T, trm, obsm, met_sampler, simulation_met, dataspec)
    IdChecker().checkSpecs(mcmc_met_lg)
    mcmc_met_lg.run()

Launched Baysis
Loaded model specifications.
2114231 = std::__1::function<std::__1::shared_ptr<algos::IMcmc> (HighFive::Group const&, HighFive::Group const&, HighFive::Group const&)>
2114232 = std::__1::function<std::__1::shared_ptr<algos::IMcmc> (HighFive::Group const&, HighFive::Group const&, HighFive::Group const&)>
2114331 = std::__1::function<std::__1::shared_ptr<algos::IMcmc> (HighFive::Group const&, HighFive::Group const&, HighFive::Group const&)>
2114332 = std::__1::function<std::__1::shared_ptr<algos::IMcmc> (HighFive::Group const&, HighFive::Group const&, HighFive::Group const&)>
2114431 = std::__1::function<std::__1::shared_ptr<algos::IMcmc> (HighFive::Group const&, HighFive::Group const&, HighFive::Group const&)>
2114432 = std::__1::function<std::__1::shared_ptr<algos::IMcmc> (HighFive::Group const&, HighFive::Group const&, HighFive::Group const&)>
2115231 = std::__1::function<std::__1::shared_ptr<algos::IMcmc> (HighFive::Group const&, HighFive::Group const&, HighFive::Grou

In [18]:
# Simulation with EHMM sampler
ehmm_session_name = "ehmm50_noflip_gauss_wreverse_param"
mcmc_ehmm_lg = MCMCsession(ehmm_session_name)
if mcmc_ehmm_lg.hasResults():
    mcmc_ehmm_lg.loadResults()
else:
    mcmc_ehmm_lg.init(T, trm, obsm, ehmm_sampler, simulation_ehmm, dataspec)
    mcmc_ehmm_lg.run()

Launched Baysis
Loaded model specifications.
2114231 = std::__1::function<std::__1::shared_ptr<algos::IMcmc> (HighFive::Group const&, HighFive::Group const&, HighFive::Group const&)>
2114232 = std::__1::function<std::__1::shared_ptr<algos::IMcmc> (HighFive::Group const&, HighFive::Group const&, HighFive::Group const&)>
2114331 = std::__1::function<std::__1::shared_ptr<algos::IMcmc> (HighFive::Group const&, HighFive::Group const&, HighFive::Group const&)>
2114332 = std::__1::function<std::__1::shared_ptr<algos::IMcmc> (HighFive::Group const&, HighFive::Group const&, HighFive::Group const&)>
2114431 = std::__1::function<std::__1::shared_ptr<algos::IMcmc> (HighFive::Group const&, HighFive::Group const&, HighFive::Group const&)>
2114432 = std::__1::function<std::__1::shared_ptr<algos::IMcmc> (HighFive::Group const&, HighFive::Group const&, HighFive::Group const&)>
2115231 = std::__1::function<std::__1::shared_ptr<algos::IMcmc> (HighFive::Group const&, HighFive::Group const&, HighFive::Grou

KeyboardInterrupt: 