In [None]:
import numpy as np

##################################################
##### Matplotlib boilerplate for consistency #####
##################################################
from ipywidgets import interact
from ipywidgets import FloatSlider
from matplotlib import pyplot as plt

%matplotlib inline

# Render inline figures as SVG (IPython.display.set_matplotlib_formats is deprecated)
%config InlineBackend.figure_formats = ['svg']

global_fig_width = 8
global_fig_height = global_fig_width / 1.61803399
font_size = 12

plt.rcParams['axes.axisbelow'] = True
plt.rcParams['axes.edgecolor'] = '0.8'
plt.rcParams['axes.grid'] = True
plt.rcParams['axes.labelpad'] = 8
plt.rcParams['axes.linewidth'] = 2
plt.rcParams['axes.titlepad'] = 16.0
plt.rcParams['axes.titlesize'] = font_size * 1.4
plt.rcParams['figure.figsize'] = (global_fig_width, global_fig_height)
plt.rcParams['font.sans-serif'] = ['Computer Modern Sans Serif', 'DejaVu Sans', 'sans-serif']
plt.rcParams['font.size'] = font_size
plt.rcParams['grid.color'] = '0.8'
plt.rcParams['grid.linestyle'] = 'dashed'
plt.rcParams['grid.linewidth'] = 2
plt.rcParams['lines.dash_capstyle'] = 'round'
plt.rcParams['lines.dashed_pattern'] = [1, 4]
plt.rcParams['xtick.labelsize'] = font_size
plt.rcParams['xtick.major.pad'] = 4
plt.rcParams['xtick.major.size'] = 0
plt.rcParams['ytick.labelsize'] = font_size
plt.rcParams['ytick.major.pad'] = 4
plt.rcParams['ytick.major.size'] = 0
##################################################

# Lecture 1: Bayesian Inference and PINTS

by Martin Robinson


- Oxford Research Software Engineering group (http://www.cs.ox.ac.uk/projects/RSE)
- Department of Computer Science, University of Oxford
- Pints: https://github.com/pints-team/pints



# Who am I?

- PhD in Mathematical Science at Monash University
- Now a:
    - Senior Research Software Engineer in the [Oxford RSE group](http://www.cs.ox.ac.uk/projects/RSE)
    - Co-director of EPSRC & MRC [Sustainable Approaches to Biomedical Science Centre for Doctoral Training: Responsible and Reproducible Research](https://sabsr3.web.ox.ac.uk), or SABS:R3 
- Research interests include:
    - numerical modelling and simulation
    - particle-based methods
    - Bayesian inference
    - developing robust and reliable software for research
        - see [Aboria](https://github.com/aboria/Aboria), Chaste, PINTS, Smoldyn, SPH-DEM, PyBaMM and Trase
    

# Course Structure

- Lecture 1: Introduction to Bayesian Inference and Pints
   - What is Bayesian Inference?
   - Bayes Theorem: Priors, Likelihood functions and Posteriors
   - Introduction to Pints
   - Using your own models in PINTS
- Lecture 2: Maximum Likelihood Estimation
 
- Lecture 3: MCMC sampling
- Lecture 4: Hierarchical models

# Lecture Structure

- What is Bayesian Inference?
- Bayes Theorem
- Introduction to Pints

# Acknowledgements

# What is Bayesian Statistics?

- A statistical philosophy, or a way of thinking about probabilities

**Frequentist:** $\;P(A)\;$ describes the limiting frequency of an event $A$. There is a fixed value of $\;P(A)\;$ that must be calculated; e.g. the proportion of heads from tossing a fair coin will approach 0.5 after a large number of trials.

**Bayesian:** $\;P(A)\;$ is a measure of certainty: a quantification of the investigator's belief that $\;A\;$ is true. A fixed value of $\;P(A)\;$ is neither necessary nor desirable. Prior information must be used to augment sample data.


# What is Bayesian Inference?

Inference involves finding parameter values, or distributions of parameter values, for which model outputs are consistent with observations.

Bayesian inference uses Bayes' theorem to update prior beliefs after obtaining new data $y$.

- *Likelihood function:* the probability of obtaining the data $y$, given a set of parameters $\theta$
- *Prior probability distribution:* encodes your uncertainty in the parameters before the data $y$ has been obtained

```
                     Bayes' Theorem
likelihood + prior -------------------> posterior
```

- *Posterior distribution:* updated probability distribution of $\theta$, given the new data $y$

# Tangible benefits of Bayesian inference

- Straightforward application to scientific modelling and experimental data analysis
- Simple and intuitive model building (unlike frequentist statistics there is no need to remember lots of specific formulae).
- Exhaustive and creative model testing.
- Straightforward interpretation of results.


# Bayes' Rule

![Thomas Bayes](fig/225px-Thomas_Bayes.gif)

$$P(\theta | data) = \frac{P(data|\theta) P(\theta)}{P(data)}$$
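Here is a small example you can check by hand that makes the formula concrete. It assumes just two candidate coins, a fair one ($\theta = 0.5$) and a biased one ($\theta = 0.8$), which are equally likely a priori. The data are two heads. The flips are independent, so $P(H, H | \theta) = \theta^2$. These coin values and the prior are illustrative assumptions, not part of the lecture.

```python
# Discrete illustration of Bayes' rule with two hypothetical coins
# (the theta values and the equal prior are assumptions for illustration)
thetas = [0.5, 0.8]   # candidate probabilities of heads
prior = [0.5, 0.5]    # P(theta): both coins equally likely before seeing data

# Observed data: two heads from independent flips, so P(data | theta) = theta**2
likelihood = [t**2 for t in thetas]

# Denominator P(data): sum of likelihood * prior over all hypotheses
evidence = sum(l * p for l, p in zip(likelihood, prior))

# Posterior P(theta | data) from Bayes' rule
posterior = [l * p / evidence for l, p in zip(likelihood, prior)]

for t, post in zip(thetas, posterior):
    print(f"P(theta={t} | H,H) = {post:.4f}")
```

Seeing two heads moves belief towards the biased coin: about 0.28 for the fair coin against 0.72 for the biased one. Dividing by `evidence` makes the two posteriors sum to 1.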

# Likelihoods

$$P(\theta | data) = \frac{\color{red}{{P(data|\theta)}} P(\theta)}{P(data)}$$

- common to both Frequentist and Bayesian analyses
- probability of generating the particular sample of data, given the model parameters $\theta$
- normally easy to obtain given a statistical model

# Example: flipping a coin

![](fig/coin.jpeg)

Take the classic example of tossing a fair coin, which has a probability of landing heads up of $\theta = 0.5$.

If we flip the coin twice, we have the set of possible outcomes:

$$P(H, H | \theta = 0.5) = P(H|\theta=0.5) P(H|\theta=0.5) = 0.25$$
$$P(H, T | \theta = 0.5) = P(H|\theta=0.5) P(T|\theta=0.5) = 0.25$$
$$P(T, H | \theta = 0.5) = P(T|\theta=0.5) P(H|\theta=0.5) = 0.25$$
$$P(T, T | \theta = 0.5) = P(T|\theta=0.5) P(T|\theta=0.5) = 0.25$$

This is a valid probability distribution:

$$ P(H, H | \theta = 0.5) + P(H, T | \theta = 0.5) + P(T, H | \theta = 0.5)+ P(T, T | \theta = 0.5) = 1.0$$
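The table above can also be generated programmatically. The sketch below enumerates every sequence of independent flips and checks that the probabilities sum to 1. It also does this for a biased coin. The helper names `sequence_probability` and `outcome_table` are hypothetical and were chosen only for illustration.

```python
from itertools import product

def sequence_probability(seq, theta):
    """Probability of one specific sequence of independent flips, e.g. ('H', 'T')."""
    prob = 1.0
    for outcome in seq:
        prob *= theta if outcome == 'H' else (1 - theta)
    return prob

def outcome_table(n_flips, theta):
    """Map every possible sequence of n_flips outcomes to its probability."""
    return {seq: sequence_probability(seq, theta)
            for seq in product('HT', repeat=n_flips)}

# Reproduce the two-flip fair-coin table
table = outcome_table(2, 0.5)
for seq, prob in table.items():
    print(seq, prob)
print('total (fair coin, 2 flips):', sum(table.values()))

# The outcomes still form a valid distribution for a biased coin and more flips
print('total (theta=0.7, 3 flips):', sum(outcome_table(3, 0.7).values()))
```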

# Calculating the likelihood

- Hold the **data constant**, and find the likelihood of this data given a certain $\theta$
- Because $\theta$ is continuous, there are infinitely many parameter values that could have produced the given data
- Take for example the case of $\text{data} = H, H$
- The likelihood function is given by $P(data|\theta) = \theta^2$ 




In [None]:
theta = np.linspace(0,1,100)
likelihood = theta*theta
plt.plot(theta,likelihood)
plt.xlabel(r'$\theta$')
plt.ylabel('likelihood')
plt.fill_between(theta,likelihood,alpha=0.2)
plt.show()

This is **not** a valid probability distribution over $\theta$: the area under the curve is $\int_0^1 \theta^2 \, d\theta = 1/3$, not 1.

In [None]:
np.trapz(likelihood,x=theta)
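The sketch below checks the computed area against the exact value $\int_0^1 \theta^2 \, d\theta = 1/3$. It then shows that dividing the likelihood by this area rescales it into a valid density. With a uniform prior, this is the same normalisation that the denominator of Bayes' theorem performs. The helper `trapezoid_area` is a hypothetical name. It applies the same trapezoidal rule as `np.trapz` and avoids that function's deprecation in newer NumPy releases.

```python
import numpy as np

theta = np.linspace(0, 1, 100)
likelihood = theta**2   # P(H, H | theta)

def trapezoid_area(y, x):
    """Composite trapezoidal rule, equivalent to np.trapz(y, x=x)."""
    return np.sum((y[1:] + y[:-1]) / 2 * np.diff(x))

area = trapezoid_area(likelihood, theta)
print(area)   # close to the exact value 1/3

# Dividing by the area rescales the curve into a valid density over theta
density = likelihood / area
print(trapezoid_area(density, theta))   # 1.0 up to rounding
```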

# Priors

$$P(\theta | data) = \frac{{P(data|\theta)} \color{red}{P(\theta)}}{P(data)}$$

- This is particular to Bayesian inference, where we always update our *prior* beliefs using the new data
- This is a probability distribution over all the possible parameter values $\theta$

# Example Prior

Back to the coin example, we know that the possible domain for our parameter is $\theta \in [0, 1]$.

1. One option is to consider all the possible values of $\theta$ to be equally likely (i.e. a Uniform prior).
2. Another is to use our previous knowledge that most coins are likely to be fair (i.e. a Gaussian prior around $\theta = 0.5$).




In [None]:
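import math

import matplotlib.pyplot as plt
import numpy as np
import scipy.stats

# Gaussian prior centred on a fair coin (theta = 0.5)
mu = 0.5
variance = 0.1
sigma = math.sqrt(variance)

# theta is a probability, so only plot its valid domain [0, 1]
# (note: the Gaussian is not renormalised to [0, 1] here)
x = np.linspace(0, 1, 100)
plt.plot(x, scipy.stats.norm.pdf(x, mu, sigma), label='Gaussian prior')
# Uniform(0, 1) prior: constant density of 1 on [0, 1]
plt.plot(x, np.ones(len(x)), label='Uniform prior')
plt.xlabel(r'$\theta$')
plt.ylabel('probability density')
plt.legend()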

In [None]:
plt.show()
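A plain Gaussian density is defined on the whole real line. Some of its probability mass therefore falls outside $\theta \in [0, 1]$, so it does not integrate to 1 over the valid domain. One option is to truncate the Gaussian to $[0, 1]$ and renormalise it. This is an assumption of this sketch, not something the lecture prescribes. The example uses `scipy.stats.truncnorm`, whose bounds `a` and `b` are given in standard deviations relative to `loc`.

```python
import numpy as np
import scipy.stats
from scipy.integrate import quad

mu, sigma = 0.5, np.sqrt(0.1)

# truncnorm expects its bounds in standard-deviation units relative to loc
a, b = (0 - mu) / sigma, (1 - mu) / sigma
prior = scipy.stats.truncnorm(a, b, loc=mu, scale=sigma)

# The truncated prior integrates to 1 over the valid domain of theta
area, _ = quad(prior.pdf, 0, 1)
print(area)

# ...and assigns zero density to impossible values of theta
print(prior.pdf(-0.1), prior.pdf(1.1))
```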

# The devil's in the denominator

$$P(\theta | data) = \frac{P(data|\theta) P(\theta)}{\color{red}{P(data)}}$$

- represents the probability of obtaining our particular sample of data, given a particular model (i.e. likelihood) and prior
- **normalises** the posterior so the area under the probability distribution is 1

\begin{align*}
P(data) &= \int_{\text{all } \theta} P(data, \theta) \, d\theta \\
        &= \int_{\text{all } \theta} P(data|\theta) P(\theta) \, d\theta
\end{align*}

- Very difficult to calculate for all but low-dimensional $\theta$. Avoiding this calculation is the motivation behind the MCMC methods discussed in Lecture 3.
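In one dimension, $P(data)$ can still be computed directly on a grid. The sketch below assumes the coin data of 7 heads in 10 flips and a Uniform(0, 1) prior. It compares the numerical integral with the closed form. For this likelihood, the closed form is the Beta function $B(8, 4) = 7!\,3!/11! = 1/1320$. In many dimensions a grid like this grows exponentially, which is why the calculation becomes impractical.

```python
import math

import numpy as np

theta = np.linspace(0, 1, 1001)
prior = np.ones_like(theta)               # Uniform(0, 1) prior density
likelihood = theta**7 * (1 - theta)**3    # 7 heads and 3 tails

# Trapezoidal rule for P(data) = integral of P(data|theta) P(theta) dtheta
integrand = likelihood * prior
evidence = np.sum((integrand[1:] + integrand[:-1]) / 2 * np.diff(theta))

# Closed form: Beta function B(8, 4) = 7! 3! / 11!
exact = math.factorial(7) * math.factorial(3) / math.factorial(11)
print(evidence, exact)
```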


# Posteriors: the goal of Bayesian inference

$$\color{red}{P(\theta | data)} = \frac{P(data|\theta) P(\theta)}{P(data)}$$

**For example:**

![](fig/coin.jpeg)

- How likely is our coin to be biased?
- We perform an experiment of 10 flips, and it lands heads up 7 times
- We will use the Uniform prior given earlier
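- Likelihood of a particular sequence of 7 heads and 3 tails is:

$$P(7 \times H, 3 \times T \,|\, \theta) = \theta^7 (1-\theta)^3$$

(If the order of the flips is ignored, this is multiplied by the binomial coefficient $\binom{10}{7}$, a constant that cancels when the posterior is normalised.)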



We run two different experiments: one with 10 flips (7 heads), and another with 100 flips (70 heads).

In [None]:
# experiment 1 - 10 flips
theta = np.linspace(0, 1, 1000)
prior = np.ones(len(theta))
likelihood = theta**7 * (1-theta)**3
denominator = np.trapz(likelihood*prior, x=theta)
posterior_exp1 = likelihood * prior / denominator

# experiment 2 - 100 flips
likelihood = theta**70 * (1-theta)**30
denominator = np.trapz(likelihood*prior, x=theta)
posterior_exp2 = likelihood * prior / denominator

In [None]:
plt.plot(theta,posterior_exp1, label='experiment 1')
plt.plot(theta,posterior_exp2, label='experiment 2')
plt.xlabel(r'$\theta$')
plt.ylabel('probability density')
plt.legend()
plt.show()
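With a Uniform(0, 1) prior and this likelihood, the posterior has a known closed form. After $h$ heads and $t$ tails it is a $\text{Beta}(h + 1, t + 1)$ distribution. The sketch below is an optional check that is not part of the original lecture. It compares the grid posteriors computed above with `scipy.stats.beta` and reports each posterior's mean and 95% interval. The interval is narrower for the 100-flip experiment.

```python
import numpy as np
import scipy.stats

theta = np.linspace(0, 1, 1000)

def grid_posterior(heads, tails):
    """Uniform prior times Bernoulli likelihood on a grid, normalised by the trapezoidal rule."""
    unnormalised = theta**heads * (1 - theta)**tails
    area = np.sum((unnormalised[1:] + unnormalised[:-1]) / 2 * np.diff(theta))
    return unnormalised / area

results = {}
for heads, tails in [(7, 3), (70, 30)]:
    numeric = grid_posterior(heads, tails)
    # With a Uniform(0, 1) prior the exact posterior is Beta(heads + 1, tails + 1)
    exact = scipy.stats.beta(heads + 1, tails + 1)
    results[(heads, tails)] = {
        'max_abs_diff': np.max(np.abs(numeric - exact.pdf(theta))),
        'peak': np.max(exact.pdf(theta)),
        'mean': exact.mean(),
        'interval_95': exact.interval(0.95),
    }
    print(heads, tails, results[(heads, tails)])
```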

# Probabilistic Inference on Noisy Time Series (PINTS)

- **Authors:** Michael Clerx, Martin Robinson, Ben Lambert, Chon Lok Lei, Sanmitra Ghosh, Gary R. Mirams, David J. Gavaghan
- An **open-source** (BSD 3-clause license) **Python** library that provides researchers with a broad suite of non-linear optimisation and sampling methods
- Uses a **Bayesian framework**: Many different Priors, Likelihood functions, MCMC samplers and non-linear optimisers available
- Use your *own pre-built model* for inference: users wrap their model and data in a transparent and straightforward interface
- Pre-print available on [arXiv](https://arxiv.org/abs/1812.07388)


# Implementation and architecture

PINTS is designed around two core ideas: 

1. PINTS should work with a wide range of time series models, and make no demands on how they are implemented other than a minimal input/output interface. 
2. It is assumed that model evaluation (simulation) is the most costly step in any optimisation or sampling routine.

![Class Hierarchy](fig/class-hierarchy-eps-converted-to.svg)

# Writing a model

- Pints is intended to work with a wide range of models, and assumes as little as possible about the model's form.
- a "model" in Pints is anything that implements the `ForwardModel` interface:
    - take a parameter vector $(\boldsymbol{\theta})$ and a sequence of times $(\mathbf{t})$ as an input, 
    - and then return a vector of simulated values $(\mathbf{y})$:

$$f(\boldsymbol{\theta}, \mathbf{t}) \rightarrow \mathbf{y}$$

# Example Model

In the example below, we define a system of ODEs (modelling a simple chemical reaction) and use SciPy to solve it. We then wrap everything in a `pints.ForwardModel` class, and use a Pints optimiser to find the best matching parameters.

In this example we'll use a model of a reversible chemical reaction:

$$\dot{y}(t) = k_1 (1 - y) - k_2 y,$$

where $k_1$ represents a forward reaction rate, $k_2$ is the backward reaction rate, and $y$ represents the concentration of a chemical solute.

The next slide shows how you would implement this model using the standard Python package SciPy.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import odeint

# Define the right-hand side of a system of ODEs
def r(y, t, p):
    k1 = p[0] # Forward reaction rate
    k2 = p[1] # Backward reaction rate
    dydt = k1 * (1 - y) - k2 * y
    return dydt

# Run an example simulation
p = [5, 3]    # parameters
y0 = 0.1      # initial conditions

# Call odeint, with the parameters wrapped in a tuple
times = np.linspace(0, 1, 1000)
values = odeint(r, y0, times, (p,))

In [None]:
# Plot the results
plt.figure()
plt.xlabel('Time')
plt.ylabel('Concentration')
plt.plot(times, values)
plt.show()
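This ODE is linear, so it also has a closed-form solution, $y(t) = y^* + (y_0 - y^*)\,e^{-(k_1 + k_2) t}$, with equilibrium concentration $y^* = k_1 / (k_1 + k_2)$. The sketch below is an optional sanity check that is not part of the original lecture. It compares the `odeint` output with this formula for the same parameters and initial condition. The helper name `analytic` is hypothetical.

```python
import numpy as np
from scipy.integrate import odeint

def r(y, t, p):
    """Right-hand side of dy/dt = k1 (1 - y) - k2 y."""
    k1, k2 = p
    return k1 * (1 - y) - k2 * y

def analytic(times, p, y0):
    """Closed-form solution: exponential relaxation towards y* = k1 / (k1 + k2)."""
    k1, k2 = p
    y_eq = k1 / (k1 + k2)
    return y_eq + (y0 - y_eq) * np.exp(-(k1 + k2) * times)

p, y0 = [5, 3], 0.1
times = np.linspace(0, 1, 1000)
numeric = odeint(r, y0, times, (p,)).ravel()   # odeint returns shape (n, 1)
exact = analytic(times, p, y0)
print('max |odeint - analytic|:', np.max(np.abs(numeric - exact)))
```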

# Writing a wrapper class for Pints

Now we'll wrap the model in a class that extends `pints.ForwardModel`.

It should have two methods:
  - `simulate()`: Run a simulation with the given parameters for the given times and return the simulated values
  - `n_parameters()`: Return the dimension of the parameter vector

In [None]:
import pints

class ExampleModel(pints.ForwardModel):
    
    def simulate(self, parameters, times):
        y0 = 0.1
        def r(y, t, p):
            dydt = (1 - y) * p[0] - y * p[1]
            return dydt
        return odeint(r, y0, times, (parameters,)).reshape(times.shape)
    
    def n_parameters(self):
        return 2

# Then create an instance of our new model class
model = ExampleModel()

# Run the same simulation using our new model wrapper
values = model.simulate([5, 3], times)

In [None]:
# Plot the results
plt.figure()
plt.xlabel('Time')
plt.ylabel('Concentration')
plt.plot(times, values)
plt.show()

# Running an optimisation problem

Now that our model implements the `pints.ForwardModel` interface, we can use it with Pints tools such as optimisers or MCMC.

First, we use the model to generate synthetic test data: we simulate with known "true" parameters and then add Gaussian noise.

In [None]:
# Define the 'true' parameters
true_parameters = [5, 3]

# Run a simulation to get test data
values = model.simulate(true_parameters, times)

# Add some noise
values += np.random.normal(0, 0.02, values.shape)

In [None]:
# Show the test data
plt.figure()
plt.xlabel('Time')
plt.ylabel('Concentration')
plt.plot(times, values)
plt.show()

# Error function

We then define a score function that characterises the mismatch between model predictions and data. 

We will use the classic sum of squares error for this (this is related to maximising a Bayesian likelihood function with independent Gaussian noise; see the next lecture).

$$\sum_{i=1}^N (f(\boldsymbol{\theta}, t_i) - f^d_i)^2$$

where $\mathbf{f}^d$ is the vector of $N$ test data points. We can tell Pints about the test data $\mathbf{f}^d$ by wrapping it and the model $f(\boldsymbol{\theta}, \mathbf{t})$ in a `pints.SingleOutputProblem` object.
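Suppose the noise is independent Gaussian with a known standard deviation $\sigma$. Then the log-likelihood is $\log L(\boldsymbol{\theta}) = -\frac{N}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^N (f(\boldsymbol{\theta}, t_i) - f^d_i)^2$. Minimising the sum of squares therefore maximises the likelihood. The sketch below checks this identity numerically. It uses a hypothetical stand-in array for the model output $f(\boldsymbol{\theta}, t_i)$ and synthetic noisy data, and does not call Pints.

```python
import numpy as np
import scipy.stats

rng = np.random.default_rng(1)
sigma = 0.02                                  # known noise standard deviation
model_output = np.linspace(0.1, 0.6, 50)      # hypothetical stand-in for f(theta, t_i)
data = model_output + rng.normal(0, sigma, model_output.shape)

# Sum of squares error between model output and data
sse = np.sum((model_output - data)**2)

# Log-likelihood of independent Gaussian noise, computed directly
log_lik = np.sum(scipy.stats.norm.logpdf(data, loc=model_output, scale=sigma))

# The same value from the SSE: a constant minus SSE / (2 sigma^2)
closed_form = -0.5 * len(data) * np.log(2 * np.pi * sigma**2) - sse / (2 * sigma**2)
print(log_lik, closed_form)
```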

 


We then use the SNES optimiser to estimate the model parameters from the data.

In [None]:
# Create an object with links to the model and time series
problem = pints.SingleOutputProblem(model, times, values)

# Select a score function
score = pints.SumOfSquaresError(problem)

# Select some boundaries
boundaries = pints.RectangularBoundaries([0.1, 0.1], [10, 10])

# Select a starting point
x0 = [1, 1]

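# Perform an optimisation using SNES (Separable Natural Evolution Strategy)
found_parameters, found_value = pints.optimise(score, x0, boundaries=boundaries, method=pints.SNES)

# Compare the recovered parameters and score with the true ones
print('Found parameters:', found_parameters)
print('Score at found solution:', found_value)
print('True parameters:', true_parameters)
print('Score at true solution:', score(true_parameters))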

In [None]:
# Plot the results
plt.figure()
plt.xlabel('Time')
plt.ylabel('Concentration')
plt.plot(times, values, alpha=0.5, label='noisy signal')
plt.plot(times, problem.evaluate(found_parameters), label='recovered signal')
plt.legend()
plt.show()