# Introduction to calibration and uncertainty propagation
Up until now we have been creating models that may accurately represent the local epidemic but (at best) only provide one possible epidemic profile that would be consistent with the observations. In this notebook, we extend this to obtain a range of parameter values and epidemic trajectories that would be consistent with the local observations, and thereby quantify the uncertainty in our simulations.

In this notebook, we will learn how to use a Markov Chain Monte Carlo (MCMC) algorithm to calibrate an SIR model to epidemic data.
That is, we will use a Bayesian sampling approach to estimate model parameters and to project the epidemic with uncertainty.

We will implement the Metropolis algorithm, which is one type of MCMC.

Recommended pre-reading:
- Wikipedia page on Metropolis–Hastings algorithm [here](https://en.wikipedia.org/wiki/Metropolis%E2%80%93Hastings_algorithm),
- Some example implementations with discussion of common tuning issues [here](https://jellis18.github.io/post/2018-01-02-mcmc-part1/).


And also, a great interactive demo of multiple Bayesian sampling algorithms [here](https://chi-feng.github.io/mcmc-demo/).



In [None]:
# pip install the required packages if running in Colab
try:
  import google.colab
  IN_COLAB = True
  %pip install summerepi2
except ImportError:
  IN_COLAB = False

if IN_COLAB:
  !wget https://raw.githubusercontent.com/monash-emu/AuTuMN/master/notebooks/capacity_building/common_files/calibration_.png

In [None]:
# Standard imports, plotting option and constant definition
import pandas as pd
from scipy import stats
import numpy as np

from summer2 import CompartmentalModel
from summer2.parameters import Parameter

pd.options.plotting.backend = "plotly"

## Create some dummy data we want our model to fit to

In [None]:
data = pd.DataFrame({"active_cases":
{
    60.: 3000.,
    80.: 8500.,
    100.: 21000.,
    120.: 40000.,
    140.: 44000.,
    160.: 30000.,
    180.: 16000.,
    200.: 7000.,
}}
)
data['active_cases'].plot(kind="scatter")

# Model

## Define a simple SIR model

In [None]:
def build_sir_model(model_config: dict) -> CompartmentalModel:
    """
    Create a compartmental model, with the minimal compartmental structure needed to run and produce some sort of 
    meaningful outputs.
    
    Args:
        model_config: Fixed values that determine structural and numerical properties of the model
    Returns:
        A compartmental model currently without stratification applied
    """

    model = CompartmentalModel(
        times=(model_config["start_time"], model_config["end_time"]),
        compartments=["S", "I", "R"],
        infectious_compartments=["I"],
    )

    infectious_seed = model_config["infectious_seed"]
    initial_population = model_config["initial_population"]
    assert initial_population >= infectious_seed, "Initial population size must be at least as large as the infectious seed"

    model.set_initial_population(
        distribution=
        {
            "S": initial_population - infectious_seed, 
            "I": infectious_seed
        }
    )
    
    # Set up flows with summer2 Parameter objects - these are placeholders
    # whose actual values will be looked up in a dictionary when we run the model
    
    # Susceptible people can get infected
    model.add_infection_frequency_flow(
        name="infection", 
        contact_rate=Parameter("contact_rate"), 
        source="S", 
        dest="I",
    )
    
    # Note that you can perform arithmetic and other transforms on Parameter objects - 
    # their final values will be computed later
    
    # Infectious people recover
    model.add_transition_flow(
        name="recovery",
        fractional_rate= 1. / Parameter("infection_duration"),
        source="I",
        dest="R",
    )

    return model

## Run the model with some example parameters

In [None]:
model_config = {
    # Fixed configuration options that define the structure and behaviour of the model
    "initial_population": 1.e6,
    "infectious_seed": 100.,
    "start_time": 0,
    "end_time": 365,
}
# Get an SIR model object
sir_model = build_sir_model(model_config)


In [None]:
# Define a dictionary of free parameters that we will use in calibration - ie those that we
# declared as Parameter objects when building the model
parameters = {
    "contact_rate": 0.3,
    "infection_duration": 9.0
}

# Run the model with the dummy parameter values
sir_model.run(parameters)

# Plot the model outputs against the data
output_df = pd.DataFrame({
    "modelled": sir_model.get_outputs_df()["I"],
    "observed": data.active_cases
})
output_df.plot(kind='scatter')


# Calibration specifications
## Posterior distribution
The main objective of our calibration is to estimate the **posterior distribution** of the calibrated parameters. This is the probability distribution of the parameters given our observations, some prior knowledge about these parameters and a mathematical model. In other words, it answers the question: "What values should the parameters take for our model to capture the data, given any prior information we had about the parameters before even running the model?"

### Overview of the posterior computation
![title](../common_files/calibration_.png)

## Bayes' theorem used to decide parameter acceptance


The posterior probability of a parameter set $\theta$ associated with the data $y$ is denoted $P(\theta | y)$.

Let's write Bayes' theorem for reference:
$$P(\theta | y) = \frac{P(y | \theta) \times P(\theta)}{P(y)} \quad.$$

Within the MCMC loop, we are only interested in the acceptance ratio that defines the probability of acceptance of a newly proposed parameter set $\theta '$, when the last accepted parameter set was $\theta$. This is the ratio of the posterior probabilities between $\theta '$ and $\theta$:
$$H := \frac{P(\theta ' | y)}{P(\theta | y)} = \frac{\frac{P(y | \theta ') \times P(\theta ')}{P(y)}}{\frac{P(y | \theta) \times P(\theta)}{P(y)}} = \frac{P(y | \theta ') \times P(\theta ')}{P(y | \theta) \times P(\theta)} \quad .$$

If $H \geq 1$, we accept the proposed parameter set $\theta'$. Otherwise, the proposed parameter set $\theta'$ is accepted with probability $H$. 
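As a minimal sketch of this acceptance rule, the snippet below applies it to a made-up ratio. The function name `accept_proposal` and the value $H = 0.25$ are illustrative assumptions and are not part of the sampler built later in this notebook.

```python
import numpy as np


def accept_proposal(posterior_ratio: float, rng: np.random.Generator) -> bool:
    """Apply the acceptance rule to the posterior ratio H."""
    if posterior_ratio >= 1.0:
        return True
    # Accept with probability H by comparing H with a uniform draw on [0, 1)
    return bool(rng.uniform() < posterior_ratio)


rng = np.random.default_rng(seed=0)
n_trials = 10_000
n_accepted_trials = sum(accept_proposal(0.25, rng) for _ in range(n_trials))
accepted_fraction = n_accepted_trials / n_trials
print(f"Accepted fraction for H = 0.25: {accepted_fraction:.3f}")
```

Over many trials, the accepted fraction should be close to the value of $H$ itself.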

Here we will define the fundamental aspects of our MCMC calibration:
- Our prior knowledge about the calibrated parameters: $P(\theta)$
- The likelihood associated with our model. This is the probability of observing the data under a given model parameterisation: $P(y|\theta)$

... and some other technical aspects:
- Initial point from which the MCMC algorithm starts
- The proposal function (or jumping process), defining how we move around in our parameter space. This is defined by $\pi(\theta' | \theta)$ which is the probability of reaching the parameter set $\theta'$, when starting from the parameter set $\theta$.

### Using log-transformed quantities

The prior and likelihood quantities are often extremely small numbers in practice, which may make computation difficult due to computer precision limits. To avoid issues related to rounding, the probabilities are usually transformed using the logarithm function before calculation of the acceptance ratio.

$$\begin{aligned}
H &= \frac{P(y | \theta ') \times P(\theta ')}{P(y | \theta) \times P(\theta)} \\
&= \exp\Bigl( \Bigl[\ln\bigl(P(y|\theta')\bigr) + \ln\bigl(P(\theta')\bigr) \Bigr] - \Bigl[\ln\bigl(P(y|\theta)\bigr) + \ln\bigl(P(\theta)\bigr) \Bigr] \Bigr) \\
&= \exp\bigl( A(\theta') - A(\theta)\bigr) \quad,
\end{aligned}$$
where $A(\theta):=\ln\bigl(P(y|\theta)\bigr) + \ln\bigl(P(\theta)\bigr) $ will be referred to as the acceptance quantity associated with parameter set $\theta$.
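The following sketch illustrates why the log transform matters. It uses a hypothetical set of 500 data points, each lying 3 standard deviations (or 3.01 for the proposed parameter set) from the model estimate. These numbers are assumptions chosen only to trigger underflow and are unrelated to the data used in this notebook.

```python
import numpy as np
from scipy import stats

# Hypothetical example: 500 data points, each 3 standard deviations from the model
n_points = 500
densities_current = stats.norm.pdf(np.full(n_points, 3.0))

# Multiplying the raw densities underflows to exactly zero in double precision
naive_product = np.prod(densities_current)

# Summing log-densities stays finite for both parameter sets
log_lik_current = stats.norm.logpdf(np.full(n_points, 3.0)).sum()
log_lik_proposed = stats.norm.logpdf(np.full(n_points, 3.01)).sum()

# The acceptance ratio is recovered from the difference of the logs
log_ratio = log_lik_proposed - log_lik_current
acceptance_ratio = np.exp(log_ratio)

print(naive_product, log_lik_current, acceptance_ratio)
```

The raw product is rounded to zero, so a ratio of two such products would be undefined, whereas the difference of the log-likelihoods still yields a usable acceptance ratio.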

### Prior distributions

In [None]:
def evaluate_log_priors(proposed_parameters: dict) -> float:
    # Initialise the log-prior to 0 (i.e. a prior probability of 1)
    prior_log_proba = 0.

    # Use a uniform prior on [0., 0.5] for the contact_rate 
    prior_log_proba += stats.uniform.logpdf(x=proposed_parameters['contact_rate'], loc=0, scale=0.5)

    # Use a normal prior for the infection duration, with mean=7 days and sd=.5
    prior_log_proba += stats.norm.logpdf(x=proposed_parameters['infection_duration'], loc=7, scale=.5)

    return prior_log_proba

### Likelihood function

In [None]:
def evaluate_log_likelihood(model, proposed_parameters: dict) -> float:

    # run the model with the selected parameters
    parameter_set = dict(proposed_parameters)
    model.run(parameter_set)
    modelled_active = model.get_outputs_df()['I']

    # calculate the log-likelihood associated with the model run
    log_likelihood = 0.
    # Series.items() replaces iteritems(), which was removed in pandas 2.0
    for data_time, data_value in data['active_cases'].items():
        modelled_value = modelled_active.loc[data_time]
        # use a normal likelihood with sd=10000, centered on the model estimate
        log_likelihood += stats.norm.logpdf(x=data_value, loc=modelled_value, scale=10000.)

    return log_likelihood


### Acceptance quantity

In [None]:
def evaluate_acceptance_quantity(model: CompartmentalModel, proposed_parameters: dict) -> float:

    log_prior = evaluate_log_priors(proposed_parameters)
    log_likelihood = evaluate_log_likelihood(model, proposed_parameters)
    acceptance_quantity = log_prior + log_likelihood
    
    return acceptance_quantity

### Proposal (jumping) function

In [None]:
def propose_parameter_set(previous_parameters: dict, jumping_sds: dict) -> dict:
    
    proposed_parameters = {}
    for param_name in ["contact_rate", "infection_duration"]:
        proposed_parameters[param_name] = stats.norm.rvs(loc=previous_parameters[param_name], scale=jumping_sds[param_name])

    return proposed_parameters


## The Metropolis algorithm

In [None]:
def sample_with_metropolis(model: CompartmentalModel, n_iter: int, initial_parameters: dict, jumping_sds: dict) -> pd.DataFrame:
   
    mcmc_record = pd.DataFrame(
        index=range(n_iter), 
        columns=["contact_rate", "infection_duration", "acceptance_quantity", "changed_position"],
        dtype=float
    )

    current_parameters = initial_parameters
    current_acceptance_quantity = evaluate_acceptance_quantity(model, current_parameters) 

    for i_run in range(n_iter):
        # Print a periodic update so we know things are running...
        
        if (i_run % 1000) == 0:
            print(f"Running iter {i_run} of {n_iter} iterations")
        
        # Propose a new parameter set and evaluate its acceptance quantity
        proposed_parameters = propose_parameter_set(current_parameters, jumping_sds)
        proposed_acceptance_quantity = evaluate_acceptance_quantity(model, proposed_parameters) 

        # Decide whether to accept the proposed parameters or not
        if proposed_acceptance_quantity >= current_acceptance_quantity:
            accept = 1
        else:
            accept_proba = np.exp(proposed_acceptance_quantity - current_acceptance_quantity)
            accept = stats.binom.rvs(1, accept_proba)  # flip a coin
        
        # Update the MCMC sampler in case of acceptance
        if accept == 1:
            current_parameters = proposed_parameters
            current_acceptance_quantity = proposed_acceptance_quantity

        # Record the current state
        mcmc_record.loc[i_run] = {
            "contact_rate": current_parameters['contact_rate'], 
            "infection_duration": current_parameters['infection_duration'], 
            "acceptance_quantity": float(current_acceptance_quantity),
            "changed_position": accept
        }
    
    return mcmc_record 

# Let's calibrate our model

## Run the algorithm

In [None]:
n_iterations = 5000
initial_parameters = {
    "contact_rate": 0.15,
    "infection_duration": 10.,
}

jumping_sds = {
    "contact_rate": .01,
    "infection_duration": .5
}

mcmc_output = sample_with_metropolis(sir_model, n_iterations, initial_parameters, jumping_sds)

## Explore the MCMC outputs

### Acceptance rate

In [None]:
n_accepted = mcmc_output['changed_position'].sum()
acceptance_perc = 100. * n_accepted / n_iterations
print(f"Our MCMC's acceptance rate is: {round(acceptance_perc,2)}%")

### Progression of the acceptance quantity 

In [None]:
mcmc_output['acceptance_quantity'].plot()

### Parameter traces

In [None]:
mcmc_output["contact_rate"].plot()

In [None]:
mcmc_output["infection_duration"].plot()

### Burn-in
We want to discard the parameter sets sampled before the chain converged. Here we simply drop the first half of the iterations.

In [None]:
burn_in = round(n_iterations / 2.)
post_burn_in_mcmc_output = mcmc_output[burn_in:]

### Posterior distributions

In [None]:
post_burn_in_mcmc_output["contact_rate"].plot.hist()

In [None]:
post_burn_in_mcmc_output["infection_duration"].plot.hist()

In [None]:
post_burn_in_mcmc_output.plot.scatter(x="contact_rate", y="infection_duration")

### The best parameter set (with regard to the posterior probability)

In [None]:
best_run_id = post_burn_in_mcmc_output['acceptance_quantity'].idxmax()
best_parameters = {
    "contact_rate": post_burn_in_mcmc_output.loc[best_run_id]["contact_rate"],
    "infection_duration": post_burn_in_mcmc_output.loc[best_run_id]["infection_duration"],
}


In [None]:
sir_model.run(best_parameters)
# Plot the model outputs against the data
output_df = pd.DataFrame({
    "modelled": sir_model.get_outputs_df()["I"],
    "observed": data.active_cases
})
output_df.plot(kind='scatter')

### Plot 100 sampled model runs

In [None]:
n_samples = min(100, n_iterations - burn_in)
sampled_df = post_burn_in_mcmc_output.sample(n_samples)

sampled_output_df = pd.DataFrame(index=sir_model.get_outputs_df().index)
for i_run in sampled_df.index:
    selected_parameters = {
        "contact_rate": post_burn_in_mcmc_output.loc[i_run]["contact_rate"],
        "infection_duration": post_burn_in_mcmc_output.loc[i_run]["infection_duration"],
    }
    sir_model.run(selected_parameters)
    sampled_output_df[i_run] = sir_model.get_outputs_df()["I"]


In [None]:
pd.options.plotting.backend = "matplotlib"

# With the matplotlib backend, DataFrame.plot returns an Axes object
ax = sampled_output_df.plot(legend=None, figsize=(15,8))
ax.plot(data.index, data["active_cases"], marker=".", color='black', lw=0, ms=10)

# Restore this for other plots
pd.options.plotting.backend = "plotly"

## Some further considerations
We have presented a very simple implementation of an MCMC-based calibration. In practice, there may be other aspects to consider, including:

- Use of other MCMC algorithms

Here we have implemented a simple Metropolis algorithm (the special case of Metropolis-Hastings with a symmetric proposal), which is one of the simplest MCMC algorithms. However, other types of MCMC exist, including Gibbs sampling and Hamiltonian Monte Carlo. There are also other Bayesian sampling methods that don't satisfy the Markov property (i.e. not strictly MCMC) but can be used for the same purpose. These include adaptive Metropolis samplers such as the Haario algorithm.

- Use of multiple MCMC chains

We have only implemented a single MCMC chain that explores the parameter space and samples posterior estimates. In practice, it is common to run multiple chains in parallel to generate more samples in the same period of time. Samples from the different chains are then combined, and we can compute diagnostics to check for convergence and consistency between the chains (e.g. the R-hat statistic).
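As a rough illustration of the R-hat idea, the sketch below computes the classic (non-split) Gelman-Rubin statistic for a single parameter. The function name `gelman_rubin_rhat` and the synthetic chains are assumptions made for this example. Dedicated libraries use refined variants of this formula.

```python
import numpy as np


def gelman_rubin_rhat(chains: np.ndarray) -> float:
    """R-hat for one parameter; `chains` has shape (n_chains, n_samples)."""
    n_samples = chains.shape[1]
    # Average of the variances computed within each chain
    within_var = chains.var(axis=1, ddof=1).mean()
    # Variance between the chain means, scaled by the chain length
    between_var = n_samples * chains.mean(axis=1).var(ddof=1)
    pooled_var = (n_samples - 1) / n_samples * within_var + between_var / n_samples
    return float(np.sqrt(pooled_var / within_var))


rng = np.random.default_rng(seed=1)
# Four synthetic "chains" drawn from the same distribution (well mixed)
mixed_chains = rng.normal(loc=0.0, scale=1.0, size=(4, 1000))
# The same chains shifted apart, mimicking chains stuck in different regions
offsets = np.array([[0.0], [2.0], [4.0], [6.0]])
unmixed_chains = mixed_chains + offsets

rhat_mixed = gelman_rubin_rhat(mixed_chains)
rhat_unmixed = gelman_rubin_rhat(unmixed_chains)
print(f"R-hat (mixed): {rhat_mixed:.3f}, R-hat (unmixed): {rhat_unmixed:.3f}")
```

Values close to 1 suggest that the chains agree with each other, whereas values well above 1 indicate that they have not converged to the same distribution.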

- Non-symmetric proposal function

We have used a symmetric proposal (jumping) function in this example. This means that $\pi(\theta'|\theta) = \pi(\theta|\theta')$. If the proposal function is not symmetric, we should adjust the acceptance ratio as follows:
$$ H= \frac{P(y | \theta ') \times P(\theta ') \times \pi(\theta|\theta') }{P(y | \theta) \times P(\theta) \times \pi(\theta' |\theta)} \quad .$$
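The sketch below applies this corrected ratio on the log scale. It assumes a toy target (an unnormalised Gamma density with shape 3 and scale 1, whose mean is 3) and a multiplicative log-normal proposal. Both are illustrative choices and are not used elsewhere in this notebook.

```python
import numpy as np


def log_target(theta: float) -> float:
    """Unnormalised log-density of a Gamma(shape=3, scale=1) toy target."""
    return 2.0 * np.log(theta) - theta


def log_proposal_density(to: float, given: float, sigma: float) -> float:
    """Log-density of the log-normal proposal pi(to | given)."""
    z = (np.log(to) - np.log(given)) / sigma
    return -np.log(to * sigma * np.sqrt(2.0 * np.pi)) - 0.5 * z**2


rng = np.random.default_rng(seed=2)
sigma, n_iter, burn_in = 0.5, 20_000, 2_000
theta = 1.0
samples = np.empty(n_iter)

for i in range(n_iter):
    # Multiplicative log-normal jump: always positive, but not symmetric
    proposal = theta * np.exp(sigma * rng.normal())
    log_h = (
        log_target(proposal) - log_target(theta)
        + log_proposal_density(theta, proposal, sigma)   # pi(theta | theta')
        - log_proposal_density(proposal, theta, sigma)   # pi(theta' | theta)
    )
    if np.log(rng.uniform()) < log_h:
        theta = proposal
    samples[i] = theta

posterior_mean = samples[burn_in:].mean()
print(f"Sample mean after burn-in: {posterior_mean:.2f} (target mean is 3)")
```

For this particular proposal, the two proposal terms reduce to $\ln(\theta'/\theta)$. Omitting them would bias the samples away from the target distribution.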

- Parameter transformation

When parameter supports are bounded (e.g. finite interval), we often transform the parameters into quantities that are unbounded to make sampling easier. For example, with the transformed parameter space, we don't have to worry about having a proposal function defined on a bounded support. These transformations imply some more adjustments to the acceptance ratio that are not discussed here.
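As a hedged sketch of what such an adjustment can look like, the example below maps an unbounded variable $z$ to the interval $(0, 0.5)$ with a scaled logistic function and adds the log-Jacobian of the transform to a uniform log-prior. The function names and the choice of the logistic transform are assumptions made for illustration. Other transforms are possible.

```python
import numpy as np

LOWER, UPPER = 0.0, 0.5  # bounds of the uniform prior used for contact_rate


def to_bounded(z):
    """Map an unbounded value z to the interval (LOWER, UPPER)."""
    return LOWER + (UPPER - LOWER) / (1.0 + np.exp(-z))


def log_jacobian(z):
    """log |d theta / d z| for the scaled logistic transform."""
    sigmoid = 1.0 / (1.0 + np.exp(-z))
    return np.log(UPPER - LOWER) + np.log(sigmoid) + np.log1p(-sigmoid)


def log_prior_unbounded(z):
    """Uniform prior on theta, expressed as a log-density over z."""
    log_prior_theta = -np.log(UPPER - LOWER)  # uniform density is 1 / width
    return log_prior_theta + log_jacobian(z)


# Check that the transformed prior is still a proper density (integrates to 1)
z_grid = np.linspace(-30.0, 30.0, 60_001)
dz = z_grid[1] - z_grid[0]
total_mass = np.sum(np.exp(log_prior_unbounded(z_grid))) * dz
print(f"theta at z=0: {to_bounded(0.0)}, total prior mass over z: {total_mass:.4f}")
```

A sampler can then propose moves in $z$ without ever leaving the support of the prior, provided the Jacobian term is included in the acceptance quantity.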

- Thinning

The samples generated by some MCMC algorithms (e.g. Metropolis-Hastings) are often highly auto-correlated. This is due to the iterative way in which the samples are generated. To address this issue we often apply thinning after generating the samples. That is, we only retain every n-th sampled parameter set.
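To illustrate the effect of thinning, the sketch below uses a synthetic auto-correlated series as a stand-in for a parameter trace. The persistence of 0.9 and the thinning interval of 20 are arbitrary assumptions. With real MCMC output, the same slicing could be applied to the recorded samples.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=3)

# Synthetic AR(1) series standing in for an auto-correlated parameter trace
n_samples, persistence = 20_000, 0.9
trace = np.empty(n_samples)
trace[0] = 0.0
for i in range(1, n_samples):
    trace[i] = persistence * trace[i - 1] + rng.normal()
trace = pd.Series(trace)

# Keep every 20th sample only
thinning_interval = 20
thinned_trace = trace.iloc[::thinning_interval]

# Correlation between consecutive retained samples, before and after thinning
lag1_full = trace.autocorr(lag=1)
lag1_thinned = thinned_trace.autocorr(lag=1)
print(f"Lag-1 autocorrelation: full = {lag1_full:.2f}, thinned = {lag1_thinned:.2f}")
```

Thinning reduces the correlation between consecutive retained samples at the cost of keeping fewer of them.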

- Algorithm tuning...

This is probably the most challenging aspect of the Metropolis-Hastings sampler. This is about finding an adequate proposal (jumping) function that will ensure an exhaustive and efficient exploration of the parameter space. If transitions (or jumps) are too big, we will rarely accept the proposed parameters because they would be outside the high-density regions. If transitions are too small, we may not explore the parameter space comprehensively because we may always stay in the same regions. There is no pre-defined rule about how to define a "good" proposal function, but we often want to achieve an acceptance rate of about 10-40%.
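The trade-off between small and large jumps can be seen on a toy problem. The sketch below runs a Metropolis sampler on a standard normal target with three jumping standard deviations. The target, the function name `acceptance_rate` and the three values are assumptions chosen for illustration and do not correspond to the SIR calibration above.

```python
import numpy as np


def acceptance_rate(jumping_sd: float, n_iter: int, rng: np.random.Generator) -> float:
    """Run a Metropolis sampler on a standard normal target and return the acceptance rate."""
    current, n_accepted = 0.0, 0
    for _ in range(n_iter):
        proposed = current + jumping_sd * rng.normal()
        # Log acceptance ratio for a standard normal target density
        log_h = 0.5 * (current**2 - proposed**2)
        if np.log(rng.uniform()) < log_h:
            current, n_accepted = proposed, n_accepted + 1
    return n_accepted / n_iter


rng = np.random.default_rng(seed=4)
rates = {sd: acceptance_rate(sd, 5_000, rng) for sd in (0.05, 2.4, 50.0)}
for sd, rate in rates.items():
    print(f"jumping sd = {sd:>5}: acceptance rate = {rate:.1%}")
```

Very small jumps are accepted almost every time but move the chain slowly, while very large jumps are rarely accepted. An intermediate jump size sits between these two extremes.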

## Now the good news...
There are multiple libraries that handle Bayesian sampling with MCMC algorithms already implemented. They also have self-tuning functionalities and other features (e.g. automatic parameter transformation) that address the issues listed above. Our next session will introduce one of these libraries: NumPyro.