# Pooling parameters

Consider an inference problem where we have measurements of the same biological process for a number of individuals, e.g. the growth of a tumour. Let's assume that this process is described by a logistic growth model

\begin{equation*}
    f(t | \theta) = \frac{k}{1 + (k / f_0 - 1) e^{-r t}},
\end{equation*}

where $\theta = (f_0, r, k)$. Here $f_0$ is the initial size of the tumour, i.e. $f(t=0 | \theta) = f_0$, $r$ is the growth rate, and $k$ is the maximal size of the tumour. Let's denote the measured time series of the $n$ individuals by $\{D_1, \ldots , D_n\}$, where $D_i$ contains the tumour volumes of individual $i$ at the measured time points.
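As a quick sanity check of the formula, the following is a minimal NumPy sketch. The function name `logistic_growth` and the parameter values are our own choices for illustration and are not part of any library. It confirms that the curve starts at $f_0$ and saturates just below $k$.

```python
import numpy as np


def logistic_growth(t, f_0, r, k):
    """Evaluate f(t | theta) = k / (1 + (k / f_0 - 1) * exp(-r * t))."""
    t = np.asarray(t, dtype=float)
    return k / (1 + (k / f_0 - 1) * np.exp(-r * t))


# Illustrative parameter values (assumed for this sketch)
f_0, r, k = 2.0, 0.015, 500.0
times = np.linspace(0, 1000, 5)
values = logistic_growth(times, f_0, r, k)

# The curve starts at f_0 and grows monotonically towards k
print(values[0], values[-1])
```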

From a biological perspective it makes sense to expect that the tumour growth is captured by the same structural model $f$ across individuals. However, it can also be expected that there are significant biological differences between the individuals. As a result, it seems reasonable to construct an independent likelihood for each individual and solve the respective inverse problems independently. This results in an independent set of model parameters $(\theta _i, \sigma _i)$ for each individual. Here $\sigma _i$ are the parameters of the error model that is needed for the construction of the likelihood.
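Written out, and assuming only that the data of different individuals are independent of each other, treating the individuals separately corresponds to a joint likelihood that factorises into one term per individual. The symbol $L$ is our own notation for this sketch:

\begin{equation*}
    L(\theta _1, \sigma _1, \ldots , \theta _n, \sigma _n) = \prod _{i=1}^{n} p(D_i | \theta _i, \sigma _i).
\end{equation*}

Since no parameter appears in more than one factor, each factor can be maximised (or sampled from) on its own.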

However, in some biological settings the available data per individual may be quite sparse, and the accuracy of the inference would greatly improve if we could leverage the population information. For example, we may expect the noise parameters to be identical across individuals, $\sigma := \sigma _1 = \ldots = \sigma _n$, since the measurement process was the same for all individuals. In other words, *pooling* the noise parameters $\sigma _i$ across individuals may be beneficial for the inference.
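To make the difference concrete, here is a minimal NumPy sketch of an unpooled and a noise-pooled log-likelihood. It assumes i.i.d. Gaussian noise, and all function names are hypothetical helpers written for this illustration, not library API. With $n$ individuals the unpooled problem has $4n$ parameters, while pooling $\sigma$ reduces this to $3n + 1$.

```python
import numpy as np


def logistic_growth(t, f_0, r, k):
    """Logistic growth f(t | theta) with theta = (f_0, r, k)."""
    t = np.asarray(t, dtype=float)
    return k / (1 + (k / f_0 - 1) * np.exp(-r * t))


def gaussian_log_likelihood(times, observations, theta, sigma):
    """Log-likelihood of one individual's data under i.i.d. Gaussian noise."""
    residuals = np.asarray(observations) - logistic_growth(times, *theta)
    n = len(residuals)
    return (
        -0.5 * n * np.log(2 * np.pi)
        - n * np.log(sigma)
        - 0.5 * np.sum(residuals**2) / sigma**2)


def unpooled_log_likelihood(datasets, thetas, sigmas):
    """Sum of independent terms: one (theta_i, sigma_i) per individual."""
    return sum(
        gaussian_log_likelihood(t, y, theta, sigma)
        for (t, y), theta, sigma in zip(datasets, thetas, sigmas))


def pooled_noise_log_likelihood(datasets, thetas, sigma):
    """Same sum, but a single sigma is shared by all individuals."""
    return unpooled_log_likelihood(datasets, thetas, [sigma] * len(datasets))
```

Because the shared $\sigma$ enters every term of the sum, all individuals' residuals inform its estimate, which is where the gain for sparse data comes from.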

## Illustration

Let us illustrate this idea of pooling by synthesizing data for a number of individuals. We will use the logistic growth model $f$ introduced above together with a Gaussian error model. To mimic biological differences between individuals but identical measurement noise, we will slightly vary the structural model parameters $\theta $ and keep the noise parameter $\sigma $ the same across individuals.

In [2]:
import matplotlib.pyplot as plt
import numpy as np
import pints
import pints.plot
import pints.toy

# Define model parameters [f_0, r, k, sigma] for each individual
parameters = [
    [2, 0.015, 500, 50],
    [1.5, 0.02, 500, 50],
    [5, 0.01, 500, 50],
    [3, 0.05, 500, 50]]

# Generate data
data = []
for params in parameters:
    # Get parameters of individual
    f_0, r, k, sigma = params

    # Instantiate logistic growth model with f(t=0) = f_0
    model = pints.toy.LogisticModel(initial_population_size=f_0)

    # Generate data
    times = np.linspace(start=0, stop=1000, num=5)
    model_output = model.simulate(parameters=[r, k], times=times)
    gauss = np.random.normal(loc=0.0, scale=1.0, size=len(model_output))
    observations = model_output + sigma * gauss

    # Save data as a 2-row array (row 0: times, row 1: observations)
    data += [np.vstack([times, observations])]

In [None]:
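# Create figure with one panel per individual
fig, axes = plt.subplots(
    nrows=2, ncols=2, sharex=True, sharey=True, figsize=(12, 6))

for i, ax in enumerate(axes.flatten()):
    # Unpack times and noisy observations of individual i
    times, observations = data[i]

    # Recompute model output (no noise) for individual i
    f_0, r, k, _ = parameters[i]
    model = pints.toy.LogisticModel(initial_population_size=f_0)
    model_output = model.simulate(parameters=[r, k], times=times)

    # Plot model output (no noise)
    ax.plot(times, model_output, label='model output')

    # Plot generated data
    ax.scatter(
        times, observations, label='data', edgecolors='black', alpha=0.5)
    ax.set_title('Individual %d' % (i + 1))

# Create X and Y axis title
for ax in axes[-1, :]:
    ax.set_xlabel('Time [dimensionless]')
for ax in axes[:, 0]:
    ax.set_ylabel('Population size [dimensionless]')

# Create legend
axes[0, 0].legend()

# Show figure
plt.show()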