# MCMC for Inference of a Parameter Function

In this notebook, we develop a Markov chain Monte Carlo (MCMC) algorithm for the inference of a parameter
function. We consider a problem similar to the ODE example on exercise sheet two. However, instead of prescribing
a fixed scalar parameter $u$, we extend the problem to a parameter function $u(t)$.
This makes the inverse problem formally infinite-dimensional.
The forward model is characterized by the ODE
$$
    \frac{dx}{dt} = u(t)x(t),\quad t\in [0, 1],\quad x(0) = 1.
$$
As the true parameter function, we choose $u_{true}(t) = 5 \sin(4\pi t)$. We further assume that data is
given as the solution of the forward problem at a number of times $\{t_i\}_{i=1}^d$, perturbed by
zero-centered Gaussian noise,
$$
    y_i = x(t_i) + \eta_i,\quad \eta_i \sim \mathcal{N}(0,\sigma_{noise}^2),\quad i=1,2,\ldots,d.
$$
Consequently, we can define the likelihood for observing the data given a parameter function $u$ as
$$
    l(y|u) \propto \exp \Bigl( -\frac{1}{2\sigma_{noise}^2} || \mathcal{B}x(u)-y ||^2 \Bigr),
$$
where $x(u)$ is the implicitly defined solution of the ODE and $\mathcal{B}$ denotes a projection operator onto
the observation locations.
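
As a minimal sketch of the observation model and the likelihood above, the following standalone snippet generates noisy data from the closed-form solution $x(t) = \exp\bigl(\tfrac{5}{4\pi}(1-\cos(4\pi t))\bigr)$ of the ODE for $u_{true}$ and evaluates the unnormalized Gaussian log-likelihood. The helper names `exact_solution` and `log_likelihood`, the observation times, the noise variance and the random seed are illustrative assumptions and are not taken from `ode_inverse_problem`.

```python
import numpy as np

def exact_solution(t):
    # Closed-form solution of dx/dt = 5 sin(4 pi t) x, x(0) = 1
    return np.exp(5.0 / (4.0 * np.pi) * (1.0 - np.cos(4.0 * np.pi * t)))

def log_likelihood(model_output, data, noise_variance):
    # Unnormalized Gaussian log-likelihood: -||B x(u) - y||^2 / (2 sigma_noise^2)
    residual = model_output - data
    return -0.5 * np.dot(residual, residual) / noise_variance

rng = np.random.default_rng(0)       # seed chosen for reproducibility (assumption)
noise_variance = 1e-2                # illustrative value (assumption)
t_obs = np.linspace(0.02, 1.0, 50)   # hypothetical observation times t_i
data = exact_solution(t_obs) + rng.normal(0.0, np.sqrt(noise_variance), t_obs.size)

# Compare the true forward solution with a constant guess x = 1
loglik_true = log_likelihood(exact_solution(t_obs), data, noise_variance)
loglik_const = log_likelihood(np.ones(t_obs.size), data, noise_variance)
print(loglik_true, loglik_const)
```

Working with the logarithm of the likelihood avoids numerical underflow of the exponential when the misfit is large.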

Moving on, we prescribe an infinite-dimensional prior in the form of a Gaussian random field.
In particular, we choose a zero mean, $\bar{u}_{prior}(t) = 0$ for all $t$, and a squared exponential
(radial basis function) covariance function,
$$
    \mathrm{cov}(t_i, t_j) = \sigma_{prior}^2 \exp\Bigl( -\frac{1}{2l^2}(t_i-t_j)^2 \Bigr).
$$

Here, $\sigma_{prior}^2$ denotes the field variance and $l$ the characteristic length of correlation
between different time points.
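
The covariance function above can be turned into a matrix on a set of time points, from which prior samples are drawn via a Cholesky factor. The snippet below is a sketch under stated assumptions: the function name, the grid, the parameter values and the small diagonal jitter are illustrative choices and do not describe how `ODEInverseProblem` builds its prior.

```python
import numpy as np

def squared_exponential_covariance(t, variance, correlation_length):
    # Pairwise differences t_i - t_j via broadcasting
    diff = t[:, None] - t[None, :]
    return variance * np.exp(-0.5 * diff**2 / correlation_length**2)

t = np.linspace(0.0, 1.0, 101)
cov = squared_exponential_covariance(t, variance=10.0, correlation_length=0.1)

# A small diagonal jitter (an assumption of this sketch) keeps the Cholesky
# factorization numerically stable, since the kernel matrix is nearly singular.
jitter = 1e-6
chol = np.linalg.cholesky(cov + jitter * np.eye(t.size))

rng = np.random.default_rng(0)
prior_sample = chol @ rng.standard_normal(t.size)   # zero-mean draw u ~ N(0, C)
```

A larger correlation length produces smoother samples, while the variance controls their typical amplitude.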

To solve this problem numerically, we discretize the parameter function and ODE solution on a uniform
grid,
$$
    \mathcal{G}_t = \{ t_n = n\Delta t;\ n=0,1,\ldots,N;\ \Delta t = \mathrm{const}>0 \}.
$$
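
To illustrate the discretization, the following sketch integrates the forward model on the uniform grid. The explicit Euler scheme, the step size $\Delta t = 10^{-2}$ and the function name `solve_explicit_euler` are chosen here for illustration; the closed-form reference solution holds only for $u_{true}$.

```python
import numpy as np

def solve_explicit_euler(u, x0, t_start, t_end, dt):
    # Uniform grid t_n = t_start + n * dt, n = 0, ..., N
    num_steps = int(round((t_end - t_start) / dt))
    grid = t_start + dt * np.arange(num_steps + 1)
    x = np.empty(num_steps + 1)
    x[0] = x0
    for n in range(num_steps):
        # Explicit Euler update for dx/dt = u(t) x(t)
        x[n + 1] = x[n] + dt * u(grid[n]) * x[n]
    return grid, x

def u_true(t):
    return 5.0 * np.sin(4.0 * np.pi * t)

grid, x_num = solve_explicit_euler(u_true, x0=1.0, t_start=0.0, t_end=1.0, dt=1e-2)

# Closed-form solution for this particular u, used only as a reference
x_ref = np.exp(5.0 / (4.0 * np.pi) * (1.0 - np.cos(4.0 * np.pi * grid)))
max_error = np.max(np.abs(x_num - x_ref))
print(f"max error on the grid: {max_error:.3f}")
```

Since explicit Euler is first-order accurate, reducing the step size by a factor of ten should reduce the error by roughly the same factor.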

The class `ODEInverseProblem` in the file `ode_inverse_problem` implements the functionality for
solving the discretized inverse problem. For a given parameter function, it provides methods to compute the
numerical ODE solution and to evaluate the log prior, likelihood and posterior. It also exposes
a number of attributes. We introduce the most important ones here, but you can also find them
under the `@property` decorators at the bottom of the `ode_inverse_problem` file.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

import ode_inverse_problem as oip
import mcmc_sampler as mcmc

%matplotlib widget
plt.close('all')
plt.style.use('bmh')

The `ODEInverseProblem` class requires two settings dictionaries in its constructor. The
`settings_ode` dict specifies the parameters for the discretization and numerical solution of the
ODE problem. `settings_inverse` contains the parameters that define the inverse problem.

In [None]:
settings_ode = {
    'initial-condition':        1,
    'start-time':               0,
    'end-time':                 1,
    'time-step-size':           1e-2,
    'solver':                   'explicit-euler'
}

settings_inverse = {
    'exact-solution':           lambda t: 5 * np.sin(4 * np.pi * t),
    'prior-mean':               lambda t: np.zeros(t.size),
    'prior-variance':           10,
    'prior-correlation-length': 0.1,
    'num-data-points':          50,
    'data-noise-variance':      1e-2
}

BIProblem = oip.ODEInverseProblem(settings_ode, settings_inverse)

The `BIProblem` object provides attributes that return the forward solution of the ODE problem for the
exact parameter function on the prescribed discretization. We can compare it with the
"exact solution" (evaluated on a very fine grid). Lastly, `BIProblem` generates the data by perturbing
the exact solution at `num-data-points` locations of the fine time grid.

In [None]:
_, ax = plt.subplots()
ax.set_title('Numerical ODE solution')
ax.plot(BIProblem.grid, BIProblem.forward_exact_solution, label='approximation')
ax.plot(BIProblem.grid_fine, BIProblem.forward_exact_solution_fine, label='exact')
ax.scatter(*BIProblem.data, color='grey', label='data')
ax.set_xlim((BIProblem.grid[0], BIProblem.grid[-1]))
ax.set_xlabel(r'$t$')
ax.set_ylabel(r'$x(t)$')
ax.legend()

It is the goal of this exercise to develop an MCMC procedure for the described inverse problem.
The class `MCMCSampler` in the file `mcmc_sampler` implements such an MCMC framework. For its
initialization, it requires a proposal function and a kernel function. The proposal function
creates a new candidate sample from a given state. The kernel function accepts or rejects a step, given the
current state of the chain and a proposal.

**Exercise:** Implement the preconditioned Crank-Nicolson (pCN) proposal and the corresponding
              form of the Metropolis-Hastings kernel.
              
*The proposal function takes the following input*:
  1. A `numpy` random number generator
  2. A sampling prefactor that corresponds to the Cholesky factor of the prior covariance
     matrix
  3. The current sample
  4. The step size $\beta$
   
*The kernel function takes as input*:
  1. A `numpy` random number generator
  2. The current state and the proposed sample
  3. The "log probabilities" of the current and proposed samples. For the pCN proposal, this is
     the log-likelihood (not the negative log-likelihood!)

In [None]:
def proposal(rng, sampling_prefactor, current_sample, step_size):

    # Return the proposed sample, computed from the current sample and the step size
    return proposal

def kernel(rng, current_sample, current_log_prob, proposed_sample, proposal_log_prob):

    # Return next sample and corresponding log probability (current sample or proposal)
    # Boolean flag is_accepted indicates if sample has been accepted
    return next_sample, next_log_prob, is_accepted
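
For orientation, one common way to write the pCN step is sketched below; it is not necessarily the intended reference solution for the stubs above. For a zero-mean Gaussian prior with covariance $C = LL^T$, the proposal reads $v = \sqrt{1-\beta^2}\,u + \beta L\xi$ with $\xi \sim \mathcal{N}(0, I)$, and the acceptance probability reduces to $\min\{1, \exp(\log l(y|v) - \log l(y|u))\}$. The names `pcn_proposal` and `pcn_kernel`, the zero prior mean, and the interpretation of `sampling_prefactor` as the lower-triangular factor $L$ are assumptions of this sketch.

```python
import numpy as np

def pcn_proposal(rng, sampling_prefactor, current_sample, step_size):
    # Draw xi ~ N(0, I) and map it to a prior draw L @ xi (zero prior mean assumed)
    xi = rng.standard_normal(current_sample.size)
    prior_draw = sampling_prefactor @ xi
    return np.sqrt(1.0 - step_size**2) * current_sample + step_size * prior_draw

def pcn_kernel(rng, current_sample, current_log_prob, proposed_sample, proposal_log_prob):
    # Accept with probability min(1, exp(log-likelihood difference)),
    # evaluated in log space to avoid overflow
    log_accept = proposal_log_prob - current_log_prob
    is_accepted = np.log(rng.uniform()) < log_accept
    if is_accepted:
        return proposed_sample, proposal_log_prob, True
    return current_sample, current_log_prob, False

# Toy usage with an identity prefactor (hypothetical three-dimensional state)
rng = np.random.default_rng(0)
current = np.zeros(3)
proposed = pcn_proposal(rng, np.eye(3), current, step_size=0.5)
```

Because the pCN proposal is reversible with respect to the Gaussian prior, the prior terms cancel in the acceptance ratio and only the likelihood remains. This is why the kernel receives the log-likelihood rather than the log-posterior.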

We initialize the sampler with the `likelihood_only=True` flag. This means that only the
log-likelihood is passed to the kernel function. We can then call the `sample` method with
canonical MCMC settings. The `num-statistics-batches` setting allows for the subdivision of the
iterations after the burn-in into batches. This can be useful to examine convergence.
The `sample` method returns four variables: `sample_norm`, `accept_ratio`, `mean` and `variance`.
`sample_norm` holds the norm of the MCMC sample at each iteration. It defines a particular
*quantity of interest* (QOI), which we can analyze and visualize instead of the sample itself.
`accept_ratio`, `mean` and `variance` contain the acceptance ratio, the sample mean and the sample
variance for each batch (computed over all iterations after burn-in if the number of batches is 1).

In [None]:
settings_mcmc = {
    'num-samples':              2e4,
    'num-burnin':               1e4,
    'step-size':                0.01,
    'num-statistics-batches':   1,
    'initial-sample':           BIProblem.prior_mean
}

Sampler = mcmc.MCMCSampler(BIProblem, proposal, kernel, likelihood_only=True)
sample_norm, accept_ratio, mean, variance = Sampler.sample(settings_mcmc)

**Exercise:** Run the sampler with the given settings. Print the acceptance ratio, plot the
sample norm, and plot the posterior mean together with its 95% confidence interval (assuming that the
posterior is approximately Gaussian). Compare the result to the exact solution.
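
Under the Gaussian assumption, a pointwise 95% interval is the mean plus or minus 1.96 standard deviations. The sketch below uses synthetic stand-ins for the mean and variance. The helper name `gaussian_confidence_band` and the demo arrays are hypothetical, and the arrays returned by `Sampler.sample` may need to be indexed per batch first.

```python
import numpy as np

def gaussian_confidence_band(mean, variance, z=1.96):
    # 95% band of a Gaussian: mean +/- 1.96 standard deviations (pointwise)
    std = np.sqrt(variance)
    return mean - z * std, mean + z * std

# Synthetic stand-ins for the posterior mean and variance on the time grid
grid = np.linspace(0.0, 1.0, 101)
mean_demo = 5.0 * np.sin(4.0 * np.pi * grid)
variance_demo = np.full(grid.size, 0.25)

lower, upper = gaussian_confidence_band(mean_demo, variance_demo)
```

With matplotlib, the band could then be drawn with `ax.fill_between(grid, lower, upper, alpha=0.3)` on top of the mean curve.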

Now vary the prior parameters `prior-variance` and `prior-correlation-length`. What do you observe and why?