# Definitions and notation for Pints

This document is:

1. An attempt to write down some standard notation for Pints
2. A place to stick derivations of stuff used in Pints (e.g. loglikelihoods, not optimiser or sampling algorithms).
3. A place to work out how to get loglikelihoods into NumPy, and document what you've done.


## List of symbols

Please stick to the symbols below when editing this document.

| Description                 | Mathematical version     | Code version          |
|-----------------------------|--------------------------|-----------------------|
| Time                        | $t$, $t_i$               | `times`               |
| Observations (values)       | $v$, $v_i$               | `values`              |
| Time series (data)          | $D$                      |                       |
| Number of times/values      | $n_t$                    | `n_times`             |
| Parameters                  | $x$, $x_i$               | `x`, `parameters`     |
| Number of parameters        | $n_p$                    | `n_parameters`, `n_p` |
| Forward model               | $m(t|x)$                 | `model`               |
| Forward model values        | $m_i(x) = m(t_i|x)$      | `y`                   |
| Number of model outputs     | $n_o$                    | `n_outputs`, `n_o`    |
| PDF                         | $f(D|x)$                 |                       |
| Likelihood                  | $l(x|D)$ or $l(x)$       |                       |
| LogPDF                      |                          | `logpdf`              |
| Loglikelihood               | $L(x)$                   | `L`                   |

Prior?
Logposterior?


## 1. Problem statement

We have some noisy time-series data and a forward model (simulation) that can be used to replicate it.
We'd like to find out which parameter values are compatible with the experimental evidence.

- Observations $D = \{(t_1, v_1),...,(t_{n_t}, v_{n_t})\}$ where $v_i$ is the experimental measurement at time $t_i$, and the times are ordered $t_{i + 1} > t_i$, with $i = 1, 2, ..., n_t$.

- A forward model $m(t|x)$ that takes a time $t$ as input, as well as a parameter vector $x$ of length $n_p$.

- Observations can be scalars, or vectors of some fixed length $n_o$. If vector outputs are used, the forward model must also produce $n_o$ _outputs_. In general, this means $v \in {\rm I\!R}^{n_o}$ and $m(t|x) \to {\rm I\!R}^{n_o}$.

Often, but not always:

- The parameters live in some bounded space $x \in X \subset{\rm I\!R}^{n_p}$


## 2. Noise/error models

If we assume the model can perfectly describe the data with the correct parameters, we can interpret the remaining error as noise.
If we have a probabilistic model for this noise, we can then write a probability density function (PDF) for the probability of a model with parameters $x$ generating the observations $D$:

$$ f(D|x) $$

We can rewrite this as a likelihood:

$$ l(x|D) \equiv f(D|x) $$

As it turns out, it's usually easier to work with the natural logarithm of this function instead.

$$ L(x|D) = \log l(x|D)$$

which we'll often shorten to

$$ L(x) $$.
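
One reason the logarithm is easier to work with numerically can be sketched as follows. The residuals and the normal density used here are illustrative assumptions (the normal noise model is only derived in Section 3). The point is that a product of many small densities underflows in floating point, while the sum of their logarithms does not.

```python
import numpy as np

# Hypothetical residuals m_i(x) - v_i for 2000 observations (assumed values)
residuals = np.linspace(-1, 1, 2000)
sigma = 1.0

# Per-observation normal densities, each well below 1
pdfs = np.exp(-residuals ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

likelihood = np.prod(pdfs)            # product of 2000 small numbers: underflows
loglikelihood = np.sum(np.log(pdfs))  # sum of logs: stays finite

print(likelihood)
print(loglikelihood)
```

Here the product evaluates to exactly `0.0`, so any optimiser or sampler working with it would lose all information, whereas the sum of logs remains a usable finite number.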



## 3. Normally distributed independent noise

In this section, we derive a loglikelihood for a parameter vector $x$ assuming normally distributed noise that is independent from observation to observation.

We start by looking at single-output models, and assume our noise is from a Normal distribution with mean 0 and standard deviation $\sigma$.
For now we'll assume we have some way of knowing $\sigma$, e.g. by doing an independent measurement.

We can then treat our observations as random variables of the form (model prediction + Gaussian noise):

$$ V_i \sim m_i(x) + \mathcal{N}(0, \sigma^2) = \mathcal{N}(m_i(x), \sigma^2)$$

Filling in the equation for the normal distribution, we find

$$ f_i(v_i | x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp
                  \left( -\frac{ \left( m_i(x) - v_i \right)^2}{2\sigma^2} \right) $$

The independent noise assumption then gives

$$ f(D | x) = \prod_{i=1}^{n_t} \frac{1}{\sqrt{2\pi\sigma^2}} \exp
              \left( -\frac{ \left( m_i(x) - v_i \right)^2}{2\sigma^2} \right) $$

To find $L(x)$ we use $L(x) = L(x|D) = \log l(x|D) = \log f(D|x)$:

$$ L(x) = 
    - \frac{n_t}{2} \log(2\pi)
    - n_t \log(\sigma)
    - \frac{1}{2\sigma^2} \sum_{i=1}^{n_t} \left(m_i(x) - v_i \right)^2
$$
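
As a sanity check, the closed-form expression can be compared with the logarithm of the product of densities. This is a minimal sketch: the function name `gaussian_loglikelihood` and the example numbers are assumptions made for illustration, not part of Pints.

```python
import numpy as np

def gaussian_loglikelihood(y, values, sigma):
    """Independent Gaussian loglikelihood for a single-output model.

    y      -- model values m_i(x), shape (n_times,)
    values -- observations v_i, shape (n_times,)
    sigma  -- known noise standard deviation (scalar)
    """
    y = np.asarray(y, dtype=float)
    values = np.asarray(values, dtype=float)
    n_times = len(values)
    return (
        -0.5 * n_times * np.log(2 * np.pi)
        - n_times * np.log(sigma)
        - np.sum((y - values) ** 2) / (2 * sigma ** 2)
    )

# Hypothetical model output and data
y = np.array([1.0, 2.0, 3.0])
values = np.array([1.1, 1.9, 3.2])
sigma = 0.5

# Direct route: log of the product of the individual normal densities
pdfs = np.exp(-(y - values) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)
L_direct = np.log(np.prod(pdfs))

L_formula = gaussian_loglikelihood(y, values, sigma)
print(L_formula, L_direct)
```

For a handful of points both routes agree to rounding error; for long time series only the summed form is safe.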

### 3.1 Multiple outputs

To find a multi-output version, we define the observation in output $j$ at time $i$ as $v_{ij}$. The equivalent model variable is $m_{ij}(x)$.

If we assume that the noise in any one output is independent from the noise in the others, and write $\sigma_j$ for the standard deviation of the noise in output $j$, we can write the total loglikelihood of the observations as the sum of the loglikelihoods in each output:

$$ L(x) = \sum_{j=1}^{n_o} L_j(x) $$
$$ L(x) = \sum_{j=1}^{n_o} \left[ - \frac{n_t}{2} \log(2\pi)
                           - n_t \log(\sigma_j)
                           - \frac{1}{2\sigma_j^2} \sum_{i=1}^{n_t} \left(m_{ij}(x) - v_{ij} \right)^2 \right]
$$
$$ L(x) = - \frac{n_t n_o}{2} \log(2\pi) 
          - n_t \sum_{j=1}^{n_o} \log(\sigma_j)
          - \sum_{j=1}^{n_o} \left[ \frac{1}{2\sigma_j^2} \sum_{i=1}^{n_t} \left(m_{ij}(x) - v_{ij} \right)^2 \right]
$$
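
In NumPy, the multi-output expression maps onto arrays of shape `(n_times, n_outputs)` and a vector of standard deviations of shape `(n_outputs,)`. The sketch below uses a hypothetical function name and made-up numbers, and checks the result against the sum of the per-output loglikelihoods.

```python
import numpy as np

def gaussian_loglikelihood_multi(y, values, sigma):
    """Independent Gaussian loglikelihood for a multi-output model.

    y, values -- arrays of shape (n_times, n_outputs)
    sigma     -- standard deviation per output, shape (n_outputs,)
    """
    y = np.asarray(y, dtype=float)
    values = np.asarray(values, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    n_times, n_outputs = values.shape
    sse = np.sum((y - values) ** 2, axis=0)  # sum over time, one entry per output
    return (
        -0.5 * n_times * n_outputs * np.log(2 * np.pi)
        - n_times * np.sum(np.log(sigma))
        - np.sum(sse / (2 * sigma ** 2))
    )

# Hypothetical model output and data: 3 times, 2 outputs
y = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
values = np.array([[1.1, 9.0], [1.9, 21.5], [3.2, 29.0]])
sigma = np.array([0.5, 2.0])
n_times, n_outputs = values.shape

L_multi = gaussian_loglikelihood_multi(y, values, sigma)

# Same quantity as an explicit sum of single-output loglikelihoods L_j
L_sum = sum(
    -0.5 * n_times * np.log(2 * np.pi)
    - n_times * np.log(sigma[j])
    - np.sum((y[:, j] - values[:, j]) ** 2) / (2 * sigma[j] ** 2)
    for j in range(n_outputs)
)
print(L_multi, L_sum)
```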

### 3.2 Derivatives with respect to the parameters

The partial derivatives with respect to parameter $x_k$, evaluated at $x$ are written as

$$ \left. \frac{\partial m_{ij}}{\partial x_k} \right|_x $$

For the independent normal loglikelihood, we then find

$$ \left. \frac{\partial L}{\partial x_k} \right|_x =
    \sum_{j=1}^{n_o} \left[ \frac{1}{\sigma_j^2} \sum_{i=1}^{n_t}
    \left(v_{ij} - m_{ij}(x) \right) \left. \frac{\partial m_{ij}}{\partial x_k} \right|_x \right]
$$
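
The derivative formula can be checked against finite differences. The sketch below assumes a hypothetical two-parameter, two-output toy model (`toy_model`) whose derivatives are known analytically; the derivatives are stored in an array of shape `(n_times, n_outputs, n_parameters)`.

```python
import numpy as np

def toy_model(x, times):
    """Hypothetical model: output 1 is x0 * t, output 2 is x0 + x1 * t^2."""
    y = np.stack([x[0] * times, x[0] + x[1] * times ** 2], axis=1)
    dy = np.zeros((len(times), 2, 2))  # (n_times, n_outputs, n_parameters)
    dy[:, 0, 0] = times
    dy[:, 1, 0] = 1.0
    dy[:, 1, 1] = times ** 2
    return y, dy

def loglikelihood(x, times, values, sigma):
    y, _ = toy_model(x, times)
    n_times, n_outputs = values.shape
    sse = np.sum((y - values) ** 2, axis=0)
    return (
        -0.5 * n_times * n_outputs * np.log(2 * np.pi)
        - n_times * np.sum(np.log(sigma))
        - np.sum(sse / (2 * sigma ** 2))
    )

def loglikelihood_gradient(x, times, values, sigma):
    y, dy = toy_model(x, times)
    weighted = (values - y) / sigma ** 2  # (v_ij - m_ij) / sigma_j^2
    # Sum over times (axis 0) and outputs (axis 1), leaving one entry per x_k
    return np.sum(weighted[:, :, np.newaxis] * dy, axis=(0, 1))

times = np.linspace(0, 1, 5)
sigma = np.array([0.3, 0.7])
values, _ = toy_model(np.array([1.0, 0.2]), times)  # noise-free synthetic data
x = np.array([1.5, -0.5])

grad = loglikelihood_gradient(x, times, values, sigma)

# Central finite differences, one parameter at a time
eps = 1e-6
fd = np.array([
    (loglikelihood(x + eps * e, times, values, sigma)
     - loglikelihood(x - eps * e, times, values, sigma)) / (2 * eps)
    for e in np.eye(len(x))
])
print(grad)
print(fd)
```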

### 3.3 Unknown noise: treating $\sigma$ as a parameter

If we don't have accurate information about $\sigma$, we can try treating it as a parameter and inferring it along with the others.
In code, we'll do this by simply extending $x$ to include $\sigma$.
In the descriptions below, we'll leave $\sigma$ separate for clarity.

The normal independent log-likelihood becomes:

$$ L(x, \sigma) = - \frac{n_t n_o}{2} \log(2\pi) 
          - n_t \sum_{j=1}^{n_o} \log(\sigma_j)
          - \sum_{j=1}^{n_o} \left[ \frac{1}{2\sigma_j^2} \sum_{i=1}^{n_t} \left(m_{ij}(x) - v_{ij} \right)^2 \right]
$$

where $x$ is the original parameter vector $x_1, x_2, ..., x_{n_p}$ and $\sigma$ is a vector of the standard deviations in each output $\sigma_1, \sigma_2, ..., \sigma_{n_o}$.


#### 3.3.1 Derivatives

For the derivatives, we now find $n_p$ equations of the previous form:

$$ \left. \frac{\partial L}{\partial x_k} \right|_{x, \sigma} =
    \sum_{j=1}^{n_o} \left[ \frac{1}{\sigma_j^2} \sum_{i=1}^{n_t}
    \left(v_{ij} - m_{ij}(x) \right) \left. \frac{\partial m_{ij}}{\partial x_k} \right|_{x, \sigma} \right]
$$

where $k \in 1, 2, ..., n_p$.
In addition, we get $n_o$ equations:

$$ \left. \frac{\partial L}{\partial \sigma_j} \right|_{x, \sigma} =
    - n_t / \sigma_j
    + \sigma_j^{-3} \sum_{i=1}^{n_t} \left(m_{ij}(x) - v_{ij} \right)^2
$$

where $j \in 1, 2, ..., n_o$.
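
The derivatives with respect to the noise standard deviations can be checked numerically as well. In this sketch the residuals $m_{ij}(x) - v_{ij}$ are random numbers standing in for a real model and data set (an assumption for illustration). It also shows that the derivative vanishes at $\sigma_j^2 = \frac{1}{n_t} \sum_i (m_{ij}(x) - v_{ij})^2$, which follows from setting the expression to zero.

```python
import numpy as np

def loglikelihood(r, sigma):
    """Loglikelihood as a function of residuals r (n_times, n_outputs) and sigma."""
    n_times, n_outputs = r.shape
    return (
        -0.5 * n_times * n_outputs * np.log(2 * np.pi)
        - n_times * np.sum(np.log(sigma))
        - np.sum(np.sum(r ** 2, axis=0) / (2 * sigma ** 2))
    )

def dL_dsigma(r, sigma):
    """Derivative with respect to each sigma_j, shape (n_outputs,)."""
    n_times = r.shape[0]
    return -n_times / sigma + np.sum(r ** 2, axis=0) / sigma ** 3

rng = np.random.default_rng(1)
r = rng.normal(size=(10, 2))  # hypothetical residuals
sigma = np.array([0.8, 1.3])

grad = dL_dsigma(r, sigma)

# Central finite differences in each sigma_j
eps = 1e-6
fd = np.array([
    (loglikelihood(r, sigma + eps * e) - loglikelihood(r, sigma - eps * e)) / (2 * eps)
    for e in np.eye(len(sigma))
])

# Root of the derivative: the root-mean-square residual in each output
sigma_hat = np.sqrt(np.mean(r ** 2, axis=0))
grad_at_hat = dL_dsigma(r, sigma_hat)

print(grad, fd)
print(grad_at_hat)
```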

## 4. Priors

### 4.1 Normal prior, 1-dimensional

We define a 1-dimensional normal log-prior as

$$
\log \frac{1}{\sqrt{2 \pi \sigma^2}} - \frac{(x - \mu)^2}{2 \sigma^2}
$$

where $\mu$ is the prior's mean and $\sigma$ is its standard deviation.

The single parameter is denoted $x$.


#### Derivatives

The derivative of this prior with respect to $x$ is given by

$$
\frac{\mu - x}{\sigma^2}
$$
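
A minimal NumPy sketch of this prior, taking the logarithm of the normal density with mean $\mu$ and standard deviation $\sigma$. The function names and test values are assumptions for illustration. The checks confirm that the exponentiated log-prior integrates to one and that the derivative matches a finite-difference estimate.

```python
import numpy as np

def normal_logprior(x, mu, sigma):
    """Log of the normal density with mean mu and standard deviation sigma."""
    return -0.5 * np.log(2 * np.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def normal_logprior_derivative(x, mu, sigma):
    """Derivative of the normal log-prior with respect to x."""
    return (mu - x) / sigma ** 2

mu, sigma = 2.0, 0.5  # hypothetical prior settings

# The density exp(log-prior) should integrate to 1 (simple Riemann sum)
grid = np.linspace(mu - 10 * sigma, mu + 10 * sigma, 20001)
dx = grid[1] - grid[0]
total = np.sum(np.exp(normal_logprior(grid, mu, sigma))) * dx

# Finite-difference check of the derivative at an arbitrary point
x0, eps = 1.3, 1e-6
fd = (normal_logprior(x0 + eps, mu, sigma)
      - normal_logprior(x0 - eps, mu, sigma)) / (2 * eps)
analytic = normal_logprior_derivative(x0, mu, sigma)

print(total)
print(analytic, fd)
```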



## 5. LogPosterior

A logposterior is the sum of a logprior and a loglikelihood (up to an additive constant that does not depend on $x$).

### Derivative

The derivative of a logposterior is simply the sum of the derivatives of its logprior and of its loglikelihood.
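
As a small illustration, the sketch below combines a normal log-prior with an independent Gaussian loglikelihood for a hypothetical one-parameter model $m(t|x) = x t$. The data, noise level, and prior settings are all made up for the example.

```python
import numpy as np

times = np.array([0.0, 1.0, 2.0, 3.0])
values = np.array([0.1, 1.9, 4.2, 5.8])  # hypothetical observations
noise_sigma = 0.5                        # assumed known noise level
prior_mu, prior_sigma = 1.0, 2.0         # assumed normal prior on x

def loglikelihood(x):
    r = x * times - values
    n_t = len(times)
    return (-0.5 * n_t * np.log(2 * np.pi) - n_t * np.log(noise_sigma)
            - np.sum(r ** 2) / (2 * noise_sigma ** 2))

def loglikelihood_derivative(x):
    # dm_i/dx = t_i for this model
    return np.sum((values - x * times) * times) / noise_sigma ** 2

def logprior(x):
    return (-0.5 * np.log(2 * np.pi * prior_sigma ** 2)
            - (x - prior_mu) ** 2 / (2 * prior_sigma ** 2))

def logprior_derivative(x):
    return (prior_mu - x) / prior_sigma ** 2

def logposterior(x):
    # Unnormalised: the constant log-evidence term is omitted
    return logprior(x) + loglikelihood(x)

def logposterior_derivative(x):
    return logprior_derivative(x) + loglikelihood_derivative(x)

x0, eps = 1.7, 1e-6
analytic = logposterior_derivative(x0)
fd = (logposterior(x0 + eps) - logposterior(x0 - eps)) / (2 * eps)
print(analytic, fd)
```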

## 6. Error measures

All error measures implement some function $E(x)$ that is minimised at $x = x_\text{true}$, but is otherwise completely unconstrained in its values, smoothness, etc.

Any LogLikelihood can be made into an error measure by reversing its sign.

### 6.1 Sum of squares

One of the simplest error measures is

$$ E(x) = \sum_{i=1}^{n_t} \left( m_i(x) - v_i \right)^2 $$

For multiple outputs, all weighted equally, this becomes

$$ E(x) = \sum_{j=1}^{n_o} \sum_{i=1}^{n_t} \left( m_{ij}(x) - v_{ij} \right)^2 $$

For its derivative, we find

$$ \left. \frac{\partial E}{\partial x_k} \right|_x = 
    2 \sum_{j=1}^{n_o} \sum_{i=1}^{n_t} 
    \left( m_{ij}(x) - v_{ij} \right)
    \left. \frac{\partial m_{ij}}{\partial x_k} \right|_x
$$

```python
# Multi-output case

import numpy as np

nt = 3
no = 2
nx = 2  # n_parameters ("np" is taken by NumPy)

print('(n_times, n_outputs, n_parameters)')
print((nt, no, nx))

times = np.arange(nt)
print('times')
print(times.shape)
print(times)

# Residuals m_ij - v_ij, shape (nt, no)
r = (np.arange(nt).reshape((nt, 1))).dot(np.ones((1, no)) )
print('residuals')
print(r.shape)
print(r)

# Model derivatives dm_ij/dx_k, shape (nt, no, nx), filled with 0, 1, 2, ...
dm = np.zeros((nt, no, nx))
z = 0
for i in range(nt):
    for j in range(no):
        for k in range(nx):
            dm[i, j, k] = z
            z += 1
del(i, j, k, z)
print('derivatives')
print(dm.shape)
print(dm)

# With sums
print('Expected output')
dE1 = 2 * np.sum([np.sum([np.sum(r[i, j] * dm[i, j, 0]) for i in range(nt)]) for j in range(no)])
dE2 = 2 * np.sum([np.sum([np.sum(r[i, j] * dm[i, j, 1]) for i in range(nt)]) for j in range(no)])
print(dE1)
print(dE2)

# NumPy way: r.T is (no, nt), dm.T is (nx, no, nt); broadcast, then sum over outputs and times
print('NumPy output')
dE = 2 * np.sum((r.T * dm.T), axis=(1,2))
print(dE)
```

```
(n_times, n_outputs, n_parameters)
(3, 2, 2)
times
(3,)
[0 1 2]
residuals
(3, 2)
[[ 0.  0.]
 [ 1.  1.]
 [ 2.  2.]]
derivatives
(3, 2, 2)
[[[  0.   1.]
  [  2.   3.]]

 [[  4.   5.]
  [  6.   7.]]

 [[  8.   9.]
  [ 10.  11.]]]
Expected output
92.0
104.0
NumPy output
[  92.  104.]
```


```python
# Single-output case

import numpy as np

nt = 3
no = 1
nx = 2  # n_parameters ("np" is taken by NumPy)

print('(n_times, n_parameters)')
print((nt, nx))

times = np.arange(nt)
print('times')
print(times.shape)
print(times)

# Residuals m_i - v_i, shape (nt,)
r = np.arange(nt)
print('residuals')
print(r.shape)
print(r)

# Model derivatives dm_i/dx_k, shape (nt, nx), filled with 0, 1, 2, ...
dm = np.zeros((nt, nx))
z = 0
for i in range(nt):
    for k in range(nx):
        dm[i, k] = z
        z += 1
del(i, k, z)
print('derivatives')
print(dm.shape)
print(dm)

# Manual
print('Expected output')
dE1 = 2 * (0*0 + 1*2 + 2*4)
dE2 = 2 * (0*1 + 1*3 + 2*5)
print(dE1)
print(dE2)

# With sums
print('With sums')
dE1 = 2 * np.sum([np.sum(r[i] * dm[i, 0]) for i in range(nt)])
dE2 = 2 * np.sum([np.sum(r[i] * dm[i, 1]) for i in range(nt)])
print(dE1)
print(dE2)

# NumPy form: add a length-1 outputs axis so the multi-output expression can be reused
print('NumPy output')
dm = dm.reshape((nt, no, nx))
dE = 2 * np.sum((r.T * dm.T), axis=(1,2))
print(dE)
```

```
(n_times, n_parameters)
(3, 2)
times
(3,)
[0 1 2]
residuals
(3,)
[0 1 2]
derivatives
(3, 2)
[[ 0.  1.]
 [ 2.  3.]
 [ 4.  5.]]
Expected output
20
26
With sums
20.0
26.0
NumPy output
[ 20.  26.]
```


### 6.2 Weighted sum of squares

We can scale the error in each output by introducing weighting factors $w_j$ for $j \in 1, 2, ..., n_o$.

This isn't implemented at the moment!
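
One way such a measure could look is sketched below. It assumes the weights multiply each output's sum of squares, $E(x) = \sum_j w_j \sum_i (m_{ij}(x) - v_{ij})^2$, which is a natural reading of the description above but not something Pints defines. The function names are hypothetical; the residuals and derivatives reuse the toy arrays from the sum-of-squares example.

```python
import numpy as np

def weighted_sum_of_squares(y, values, weights):
    """E = sum_j w_j sum_i (m_ij - v_ij)^2 (assumed definition)."""
    r = y - values
    return np.sum(weights * np.sum(r ** 2, axis=0))

def weighted_sum_of_squares_gradient(y, values, dy, weights):
    """dE/dx_k, given dy with shape (n_times, n_outputs, n_parameters)."""
    r = y - values
    # weights has shape (n_outputs,), so it broadcasts over the rows of r
    return 2 * np.sum((weights * r)[:, :, np.newaxis] * dy, axis=(0, 1))

# Toy residuals and derivatives (same numbers as the unweighted example)
y = np.arange(3, dtype=float).reshape((3, 1)).dot(np.ones((1, 2)))
values = np.zeros((3, 2))
dy = np.arange(12, dtype=float).reshape((3, 2, 2))

# Unit weights reproduce the unweighted sum-of-squares derivative
print(weighted_sum_of_squares_gradient(y, values, dy, np.array([1.0, 1.0])))

# A zero weight removes the second output from the error entirely
print(weighted_sum_of_squares_gradient(y, values, dy, np.array([1.0, 0.0])))
```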


### 6.3 Mean squared error

In some cases, it can be desirable to make the error invariant to the length of the sample $n_t$.

$$ E(x) = \frac{1}{n_t} \sum_{i=1}^{n_t} \left( m_i(x) - v_i \right)^2 $$

For multiple outputs, all weighted equally, this becomes

$$ E(x) = \frac{1}{n_t n_o} \sum_{j=1}^{n_o} \sum_{i=1}^{n_t} \left( m_{ij}(x) - v_{ij} \right)^2 $$

For its derivative, we find

$$ \left. \frac{\partial E}{\partial x_k} \right|_x = 
\frac{2}{n_t n_o} \sum_{j=1}^{n_o} \sum_{i=1}^{n_t} 
    \left( m_{ij}(x) - v_{ij} \right)
    \left. \frac{\partial m_{ij}}{\partial x_k} \right|_x
$$

```python
# Multi-output case

import numpy as np

nt = 3
no = 2
nx = 2  # n_parameters ("np" is taken by NumPy)

print('(n_times, n_outputs, n_parameters)')
print((nt, no, nx))

times = np.arange(nt)
print('times')
print(times.shape)
print(times)

# Residuals m_ij - v_ij, shape (nt, no)
r = (np.arange(nt).reshape((nt, 1))).dot(np.ones((1, no)) )
print('residuals')
print(r.shape)
print(r)

# Model derivatives dm_ij/dx_k, shape (nt, no, nx), filled with 0, 1, 2, ...
dm = np.zeros((nt, no, nx))
z = 0
for i in range(nt):
    for j in range(no):
        for k in range(nx):
            dm[i, j, k] = z
            z += 1
del(i, j, k, z)
print('derivatives')
print(dm.shape)
print(dm)

# With sums
print('Expected output')
dE1 = 2 / nt / no * np.sum([np.sum([np.sum(r[i, j] * dm[i, j, 0]) for i in range(nt)]) for j in range(no)])
dE2 = 2 / nt / no * np.sum([np.sum([np.sum(r[i, j] * dm[i, j, 1]) for i in range(nt)]) for j in range(no)])
print(dE1)
print(dE2)

# NumPy way: same broadcast as for the sum of squares, scaled by 1 / (nt * no)
print('NumPy output')
dE = 2 / nt / no * np.sum((r.T * dm.T), axis=(1,2))
print(dE)
```

```
(n_times, n_outputs, n_parameters)
(3, 2, 2)
times
(3,)
[0 1 2]
residuals
(3, 2)
[[ 0.  0.]
 [ 1.  1.]
 [ 2.  2.]]
derivatives
(3, 2, 2)
[[[  0.   1.]
  [  2.   3.]]

 [[  4.   5.]
  [  6.   7.]]

 [[  8.   9.]
  [ 10.  11.]]]
Expected output
15.3333333333
17.3333333333
NumPy output
[ 15.33333333  17.33333333]
```


```python
# Single-output case

import numpy as np

nt = 3
no = 1
nx = 2  # n_parameters ("np" is taken by NumPy)

print('(n_times, n_parameters)')
print((nt, nx))

times = np.arange(nt)
print('times')
print(times.shape)
print(times)

# Residuals m_i - v_i, shape (nt,)
r = np.arange(nt)
print('residuals')
print(r.shape)
print(r)

# Model derivatives dm_i/dx_k, shape (nt, nx), filled with 0, 1, 2, ...
dm = np.zeros((nt, nx))
z = 0
for i in range(nt):
    for k in range(nx):
        dm[i, k] = z
        z += 1
del(i, k, z)
print('derivatives')
print(dm.shape)
print(dm)

# Manual: these are the unscaled sum-of-squares derivatives;
# dividing them by nt * no = 3 gives the mean-squared-error values printed below
print('Expected output')
dE1 = 2 * (0*0 + 1*2 + 2*4)
dE2 = 2 * (0*1 + 1*3 + 2*5)
print(dE1)
print(dE2)

# With sums
print('With sums')
dE1 = 2 / nt / no * np.sum([np.sum(r[i] * dm[i, 0]) for i in range(nt)])
dE2 = 2 / nt / no * np.sum([np.sum(r[i] * dm[i, 1]) for i in range(nt)])
print(dE1)
print(dE2)

# NumPy form: add a length-1 outputs axis so the multi-output expression can be reused
print('NumPy output')
dm = dm.reshape((nt, no, nx))
dE = 2 / nt / no * np.sum((r.T * dm.T), axis=(1,2))
print(dE)
```

```
(n_times, n_parameters)
(3, 2)
times
(3,)
[0 1 2]
residuals
(3,)
[0 1 2]
derivatives
(3, 2)
[[ 0.  1.]
 [ 2.  3.]
 [ 4.  5.]]
Expected output
20
26
With sums
6.66666666667
8.66666666667
NumPy output
[ 6.66666667  8.66666667]
```


### 6.4 Root-mean squared error

A common error measure is

$$ E(x) = \sqrt{ \frac{1}{n_t} \sum_{i=1}^{n_t} \left( m_i(x) - v_i \right)^2 } $$

For multiple outputs, all weighted equally, this becomes

$$ E(x) = \sqrt{  \frac{1}{n_t n_o} \sum_{j=1}^{n_o} \sum_{i=1}^{n_t} \left( m_{ij}(x) - v_{ij} \right)^2 } $$

For its derivative, the chain rule $\partial \sqrt{u} / \partial x_k = \frac{1}{2\sqrt{u}} \, \partial u / \partial x_k$ gives

$$ \left. \frac{\partial E}{\partial x_k} \right|_x = 
    \frac{
        \frac{1}{n_t n_o} \sum_{j=1}^{n_o} \sum_{i=1}^{n_t} 
        \left( m_{ij}(x) - v_{ij} \right)
        \left. \frac{\partial m_{ij}}{\partial x_k} \right|_x
    }{
        \sqrt{ \frac{1}{n_t n_o} \sum_{j=1}^{n_o} \sum_{i=1}^{n_t} \left( m_{ij}(x) - v_{ij} \right)^2 }
    }
$$
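
The root-mean-squared-error derivative can be checked numerically. Writing $E = \sqrt{u}$ with $u$ the mean squared error, the chain rule gives $\partial E / \partial x_k = (\partial u / \partial x_k) / (2 \sqrt{u})$, so the factor 2 from differentiating the square cancels. The sketch below assumes a hypothetical model that is linear in $x$, so that its derivatives are the constant array `A`, and compares the analytic gradient with central finite differences.

```python
import numpy as np

# Hypothetical linear model m_ij(x) = sum_k A[i, j, k] * x_k,
# so dm_ij/dx_k = A[i, j, k], shape (n_times, n_outputs, n_parameters)
A = np.arange(12, dtype=float).reshape((3, 2, 2))
values = np.ones((3, 2))  # hypothetical observations

def model(x):
    return A.dot(x)  # shape (n_times, n_outputs)

def rmse(x):
    r = model(x) - values
    return np.sqrt(np.mean(r ** 2))

def rmse_gradient(x):
    r = model(x) - values
    n_times, n_outputs = r.shape
    # (1 / (n_t n_o)) * sum_ij r_ij * dm_ij/dx_k, divided by the RMSE itself
    numerator = np.sum(r[:, :, np.newaxis] * A, axis=(0, 1)) / (n_times * n_outputs)
    return numerator / rmse(x)

x = np.array([0.3, -0.2])
grad = rmse_gradient(x)

# Central finite differences, one parameter at a time
eps = 1e-6
fd = np.array([
    (rmse(x + eps * e) - rmse(x - eps * e)) / (2 * eps)
    for e in np.eye(len(x))
])
print(grad)
print(fd)
```

Note that this gradient is undefined when all residuals are exactly zero, since the denominator then vanishes.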
