## The renewal process
The renewal process forms the basis of the models throughout the notebooks in this folder.

The idea of a renewal process is a general one,
and there are many ways of constructing renewal models in practice;
we focus on one such approach in our implementation.
For context, we start by describing how
our approach to renewal modelling fits with other literature
that adopts this framework.

### What is a renewal process?
The basic idea of a renewal process is that previously observed cases of
an infectious disease in an epidemic are responsible for triggering 
the cases observed on a subsequent date.
This of course captures the basic process by which
infectious disease epidemics propagate.
As such, the number of cases on this date of interest has two ingredients:
1. The average infectiousness of active cases on the date of interest
2. The weighted infectiousness of all previous cases

#### Average infectiousness of cases on a specific date
This variable is called the "instantaneous reproduction number" or "$R_t$" 
and is distinguished in renewal models because it
is associated with a widely understood epidemiological intuition.
Specifically, $R_t$ should be thought of as 
the average number of secondary cases that
an infectious individual would be expected to generate if they 
retained their current level of infectiousness throughout their 
infectious period.
It is therefore a key metric of the epidemic state and 
the current effectiveness of control interventions.

#### The weighted infectiousness of all previous cases
Having separated out $R_t$, the remaining quantities that we need
in order to relate $R_t$ to new incidence can be thought
of as the proportion of each person's total infectiousness
that occurs on a particular day of their infection episode.
That is, it is the (discrete) distribution of infectiousness over time 
normalised such that the total values of the distribution sum to one.

#### The equation
If we know the current instantaneous reproduction number,
the time series of previous cases and the
generation interval distribution,
we can define the following relationship.
$$ I_t = R_t \times \sum_{\tau}I_{\tau}\times{g(t-\tau)} $$
Here $t$ indicates the time dependence (usually in days), 
such that $I_t$ represents the observed incidence on day $t$,
and $R_t$ represents the instantaneous reproduction number on day $t$.
The $g(\cdot)$ function represents the (normalised, discrete) generation interval 
for the distribution of the time at which index cases go on to infect their contacts.
We use the symbol $\tau$ to index discrete time from the current time $t$
back to the start of the simulation
(or sometimes this may be truncated when $g(\cdot)$ has declined to very low levels).
We take the product of the incidence on each preceding day
and the density of the generation interval distribution for the time from that
day to the day for which we are making the calculation ($t$).
Finally we sum these products to obtain the total infectiousness
at time $t$ contributed by all previous cases.

Note that this equation defines the relationship between
the three quantities of interest (the case incidence,
the instantaneous reproduction number and the generation interval).
Although this defines the main conceptual relationship we are interested in,
this does not imply that we will invariably be calculating 
case incidence from the other two quantities.
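
As a minimal sketch of the forward calculation, the cell below evaluates the renewal equation for a single day.
The function name `renewal_step` and the toy numbers are assumptions made for illustration only,
and the generation interval is indexed so that its first element is $g(1)$.

```python
import numpy as np


def renewal_step(past_incidence, gen_interval, r_t):
    """Expected incidence on the next day under the renewal equation.

    past_incidence: incidence on days 0..t-1 (oldest first)
    gen_interval: g(1), g(2), ... (normalised, discrete)
    r_t: instantaneous reproduction number on day t
    """
    past = np.asarray(past_incidence, dtype=float)
    g = np.asarray(gen_interval, dtype=float)
    n = min(len(past), len(g))  # Truncate to whichever is shorter
    # The most recent day pairs with g(1), the day before with g(2), and so on
    return r_t * np.sum(past[::-1][:n] * g[:n])


toy_incidence = [10.0, 20.0, 30.0]  # Days 0, 1 and 2
toy_gen = [0.2, 0.5, 0.3]  # g(1), g(2), g(3), summing to one
next_inc = renewal_step(toy_incidence, toy_gen, 1.5)
```

Here the weighted sum is $30 \times 0.2 + 20 \times 0.5 + 10 \times 0.3 = 19$, which is then scaled by $R_t = 1.5$.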

## Applications
Having reviewed some key literature on the use of renewal models for infectious disease
inference, we divide the various renewal models we identified into those that 
explicitly model case incidence forward in time as a latent state and those that do not.

### Non-latent state models

#### Extensions of compartmental models
[Bettencourt and Ribeiro](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0002185)
proposed an approach to estimating the reproduction number over time
that is conceptually derived from the assumptions underpinning compartmental models.
Unless modified, this approach inherently assumes that the generation
interval distribution is exponential.
This is unlikely to hold for many infectious diseases and 
flexibility in the generation interval distribution is a key component of
many renewal modelling approaches, including the form we have implemented.

#### Likelihood-based approach
[Wallinga and Teunis](https://academic.oup.com/aje/article-lookup/doi/10.1093/aje/kwh255)
proposed a novel likelihood-based approach to estimating the reproduction number.
This method first defines the relative likelihood that a given case $i$
has been infected by another specific case in the epidemic,
rather than any of the other cases in the epidemic, as:
$$ p_{ij} = \frac{w(t_i - t_j)}{\sum_{k \neq i}{w(t_i - t_k)}}$$
The inequality indicates that cases cannot infect themselves,
(which could also be achieved by setting $w(t)$ to zero at $t=0$).
The reproduction number can then be considered for a specific case $j$ as:
$$ R_j = \sum_{i}{p_{ij}} $$
Although this method was reported by the authors as being relatively robust 
to incomplete reporting, the approach does inherently assume
that notified cases were triggered by preceding cases in the same epidemic.
It also assumes that the chance that any case $j$ infected a specific 
case $i$ is independent of the chance that $j$ infected any other case ($k$),
although this assumption is common to many infectious disease models.

Importantly, this approach does not directly estimate the instantaneous 
reproduction number ($R_t$), but rather works forward to consider the
number of future cases that were infected by cases at a particular point in the epidemic.
This quantity is referred to as the "case" or "cohort" reproduction number,
and so is less applicable to estimates of the effect of population-wide interventions.
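
The two equations above can be sketched in a few lines of NumPy.
This is an illustration rather than the authors' implementation:
the function name, the toy weighting function `toy_w` and the onset times are all assumptions,
and `toy_w` returns zero for non-positive lags so that cases can only be infected by earlier cases.

```python
import numpy as np


def case_reproduction_numbers(onset_times, w):
    """Case reproduction number R_j for each case, given interval weights w."""
    t = np.asarray(onset_times, dtype=float)
    lags = t[:, None] - t[None, :]  # lags[i, j] = t_i - t_j
    weights = w(lags)
    np.fill_diagonal(weights, 0.0)  # Cases cannot infect themselves
    totals = weights.sum(axis=1, keepdims=True)
    # p[i, j]: relative likelihood that case i was infected by case j
    # (rows with no possible infector, such as the index case, stay at zero)
    p = np.divide(weights, totals, out=np.zeros_like(weights), where=totals > 0)
    return p.sum(axis=0)  # R_j is the sum over i of p[i, j]


def toy_w(lag):
    # Hypothetical weights of 0.25, 0.5 and 0.25 at lags of 1, 2 and 3 days
    lag = np.asarray(lag)
    return np.where(lag == 1, 0.25, np.where(lag == 2, 0.5, np.where(lag == 3, 0.25, 0.0)))


case_r = case_reproduction_numbers([0, 1, 2, 3], toy_w)
```

Because each non-index case is attributed in full across its possible infectors,
the values of $R_j$ sum to the number of cases that have at least one possible infector.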

#### Conjugate prior-based approach
[Cori et al.](https://academic.oup.com/aje/article/178/9/1505/89262?login=false)
leverage the role of the gamma distribution
as the conjugate prior for the Poisson sampling distribution.
With this elegant approach, the authors start from the same renewal equation
as presented above (although notated in the reverse direction as 
$R_t\sum_{s=1}^{t}{I_{t-s}w_s}$),
and take this quantity as the mean of the Poisson distribution.
The distribution for $R_t$ can then be updated with the number
of cases observed on day $t$ to obtain a posterior distribution for $R_t$.
In practice, this approach provides estimates of $R_t$ that 
are highly variable from day to day. However, the authors 
address this problem by extending their technique to estimate
$R_t$ over a time window. This provides smoother and 
more epidemiologically plausible estimates of $R_t$,
which can only be calculated once the first time window has elapsed.
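
The conjugate update over a time window can be sketched as follows.
With a gamma prior on $R_t$ (shape $a$, scale $b$) and Poisson-distributed incidence,
the posterior is gamma with shape $a$ plus the incidence summed over the window,
and rate $1/b$ plus the total infectiousness summed over the window.
The function name, the prior values and the toy data are assumptions chosen for illustration.

```python
import numpy as np
from scipy.stats import gamma


def windowed_rt_posterior(incidence, gen_interval, t, window, prior_shape=1.0, prior_scale=5.0):
    """Gamma posterior for R_t over the window ending on day t (a sketch)."""
    inc = np.asarray(incidence, dtype=float)
    w = np.asarray(gen_interval, dtype=float)  # w[0] holds w_1
    days = range(t - window + 1, t + 1)
    # Total infectiousness on each day of the window: sum over s of I_{day-s} * w_s
    infectiousness = [
        sum(inc[day - s] * w[s - 1] for s in range(1, min(day, len(w)) + 1))
        for day in days
    ]
    post_shape = prior_shape + inc[t - window + 1 : t + 1].sum()
    post_scale = 1.0 / (1.0 / prior_scale + sum(infectiousness))
    return gamma(a=post_shape, scale=post_scale)


toy_cases = [1, 2, 4, 8, 16, 32]  # Hypothetical daily counts
toy_w = [0.5, 0.5]  # Hypothetical w_1 and w_2
posterior = windowed_rt_posterior(toy_cases, toy_w, t=5, window=3)
lower, upper = posterior.ppf([0.025, 0.975])  # 95% credible interval
```

A longer window pools more days of data, which narrows the interval at the cost of temporal resolution.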

One limitation is that this approach cannot be
applied directly to predict incidence or $R_t$ into the future.

### Latent state models
Since the onset of the COVID-19 pandemic, 
renewal models that include explicit representation of an incidence state 
that is calculated sequentially for each time point from the preceding 
incidence values have become more prominent.
These models have underpinned some of the most prominent
publications relating to COVID-19 epidemiology,
as well as several software packages that have been used 
by public health agencies for estimating $R_t$ to guide policy
(EpiEstim, EpiNow, EpiNow2).
With these approaches, a function of time is typically constructed 
to represent the evolution of $R_t$, 
the parameters of which can then be calibrated until an accurate
fit to the empirical observations is achieved.

#### EpiNow2
The [EpiNow2 platform](https://epiforecasts.io/EpiNow2/articles/estimate_infections.html) 
has been particularly popular for estimation of $R_t$ of COVID-19.
Although it offers a non-mechanistic approach in addition to 
the renewal equation-based method, 
the default implementation is based around the renewal model.
The renewal equation implemented by EpiNow2 is denoted:
$$ I_t=R_t\sum_{\tau=1}^{g_{max}}{g(\tau|\mu_g,\sigma_g)}I_{t-\tau}$$
Conceptually, this equation captures the same principles as outlined above,
with the $g_{max}$ limit to the summation indicating that 
the generation interval is typically truncated 
when the distribution density falls to negligible levels.
$\mu_g$ and $\sigma_g$ indicate the parameters to the distribution
used to represent the generation interval,
which may be gamma or log-normal.
By default, the function of $R_t$ over time is modelled with a Gaussian process.

#### Europe application
In this application, the study investigators adapted the standard
renewal approach to incorporate the depletion of the susceptible population.
Analyses were undertaken separately for each European country,
with additional notation used to indicate the country being addressed.
If this additional notation is ignored, the equation for 
the renewal model with susceptible depletion becomes:
$$c_t=(1-\frac{\sum_{i=1}^{t-1}{c_i}}{N})R_t\sum_{\tau=0}^{t-1}{c_\tau g_{t-\tau}}$$
where $N$ represents the total population of the country of analysis.
With this approach, the proportion of the population remaining susceptible
at each time point scales the incidence value,
thus requiring a greater value of $R_t$ to offset this.
This illustrates the flexibility of this approach, 
in that such models can track additional quantities 
that emerge from the model explicitly as time evolves.

#### Manaus application
The renewal approach was used to consider the COVID-19 epidemics 
in the city of Manaus in Brazil's State of Amazonas.
Manaus suffered a major epidemic of wild-type virus 
followed by a major epidemic of the Gamma (or P.1) variant.
As such, the study authors addressed several questions 
pertaining to the extent of immune escape of the new variant.
To achieve this, the previous renewal equation is extended to:
$$i_{s,t}=(1-\frac{n_{s,t}}{N})R_{s,t}\sum_{\tau<t}{i_{s,\tau}g_{t-\tau}}$$
Again, $N$ represents the total population size,
while $n_{s,t}$ represents the extent of population immunity to strain $s$
and incorporates both immunity from previous infection with strain $s$, 
as well as the partial cross-protection afforded by infection 
with the other circulating strain.
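
To make the structure of this equation concrete, the cell below sketches a two-strain version.
It is not the study authors' model: the form of $n_{s,t}$ used here
(cumulative infections with strain $s$ plus a fixed fraction of cumulative infections with the other strain),
the constant reproduction numbers and every parameter value are assumptions chosen for illustration.

```python
import numpy as np

n_days = 30
pop = 1000.0
cross_protection = 0.5  # Hypothetical protection from infection with the other strain
r_by_strain = np.array([1.5, 2.5])  # Hypothetical constant reproduction numbers
gen = np.array([0.2, 0.5, 0.3])  # Hypothetical g_1, g_2, g_3

inc = np.zeros((2, n_days))
inc[0, 0] = 1.0  # Seed strain 0 on day 0
inc[1, 10] = 1.0  # Seed strain 1 on day 10
for t in range(1, n_days):
    cumulative = inc[:, :t].sum(axis=1)  # Infections with each strain before day t
    k = min(t, len(gen))
    for s in range(2):
        other = 1 - s
        # Assumed form of n_{s,t}, capped at the population size
        immune = min(cumulative[s] + cross_protection * cumulative[other], pop)
        # Sum over preceding days of i_{s,tau} * g_{t-tau}
        force = (inc[s, t - k : t][::-1] * gen[:k]).sum()
        inc[s, t] += (1.0 - immune / pop) * r_by_strain[s] * force
```

The increment is added to any existing value so that the seeding of the second strain on day 10 is retained.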

#### Advantages of latent state models
Because this approach 

The approach implemented within this package is most similar to this approach.


## Semi-mechanistic modelling introduction

### Rationale
Thinking about this equation in Faria, et al:
$$i_{s,t} = (1-\frac{n_{s,t}}{N})R_{s,t}\sum_{\tau<t} i_{s,\tau}g_{t-\tau}$$

This is a standard "semi-mechanistic" or "renewal" modelling approach,
in that the population is not explicitly partitioned into categories or compartments.

First, ignoring strains, we'll consider:

$i_t = (1-\frac{n_t}{N})R_t\sum_{\tau<t} i_{\tau}g_{t-\tau}$

This is essentially the same as the equation provided by [Cori, et al.](https://academic.oup.com/aje/article/178/9/1505/89262?login=true):

$\mathbf{E}[I_t] = R_t\sum_{s=1}^t I_{t-s}w_s$

For now, we'll also ignore susceptible depletion and a varying reproduction number, and so consider:

$i_t = R_0\sum_{\tau<t} i_\tau g_{t-\tau}$

This notebook builds up this basic approach very slowly as an illustration of what we will be doing in the analysis notebook.

In [None]:
from scipy.stats import gamma
import numpy as np
import pandas as pd
pd.options.plotting.backend = "plotly"

### Parameters
We'll set some arbitrary model parameters to get started.

In [None]:
n_times = 20
seed = 1.0
r0 = 2.0
incidence = np.zeros(n_times)
incidence[0] = seed

### Generation time
Next, we'll get a distribution we can sensibly use for the generation time,
which could represent an acute immunising respiratory infection.

In [None]:
# Generation time summary statistics
gen_mean = 5.0
gen_sd = 1.5

# Calculate equivalent parameters
var = gen_sd ** 2.0
scale = var / gen_mean
a = gen_mean / scale
gamma_params = {"a": a, "scale": scale}

# Get the increment in the CDF
# (i.e. the integral over the increment by one in the distribution)
gen_time_densities = np.diff(gamma.cdf(range(n_times + 1), **gamma_params))

pd.Series(gen_time_densities, index=range(n_times)).plot(labels={"index": "time", "value": "density"}).update_layout(showlegend=False)
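
As a quick sanity check on the discretisation, we can confirm that the interval masses
sum to (almost) one and that their mean is close to the requested mean.
This sketch recomputes the densities so that it stands alone;
placing each interval's mass at the interval midpoint is an assumption made only for this check.

```python
import numpy as np
from scipy.stats import gamma

n_times = 20
gen_mean, gen_sd = 5.0, 1.5
scale = gen_sd ** 2 / gen_mean
shape = gen_mean / scale
densities = np.diff(gamma.cdf(np.arange(n_times + 1), a=shape, scale=scale))

# The total equals the CDF at n_times, so it falls very slightly short of one
total_mass = densities.sum()
# Treat the mass of each unit interval as sitting at its midpoint
discrete_mean = (densities * (np.arange(n_times) + 0.5)).sum()
print(f"total mass: {total_mass:.6f}, discretised mean: {discrete_mean:.3f}")
```

If the total mass were noticeably below one, the window would be truncating the tail of the distribution
and the densities would need renormalising or a longer window.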

### Calculations
Here, we're using native Python loops with pre-calculated generation times
to be completely explicit (but slow).
Note that the delay is specified as `t - tau - 1` because
delay then starts from zero each time,
which then indexes the first element of the generation time densities.
As shown in the previous cell,
the `gen_time_densities` is the integral of the probability
density over each one-unit interval of the gamma distribution.

In [None]:
for t in range(1, n_times):
    val = 0
    for tau in range(t):  # For each day preceding the day of interest
        delay = t - tau - 1  # The generation time index for each preceding day to the day of interest
        val += incidence[tau] * gen_time_densities[delay] * r0  # Calculate the incidence value
    incidence[t] = val

Next we remove the inner loop by working with arrays for the incidence and generation time distribution 
(and check that the calculations are the same).

In [None]:
check_inc = np.zeros(n_times)
check_inc[0] = seed
for t in range(1, n_times):
    delays = [t - tau - 1 for tau in range(t)]
    gammas = gen_time_densities[delays]
    check_inc[t] = (check_inc[:t] * gammas).sum() * r0
assert np.abs(incidence - check_inc).max() < 1e-10, "Results diverging"  # Compare absolute differences

We can get this down to a one-liner if preferred.
The epidemic is going to just keep going up exponentially, of course, 
because $R_{0} > 1$ and there is no susceptible depletion.

In [None]:
check_inc2 = np.zeros(n_times)
check_inc2[0] = seed
for t in range(1, n_times):
    check_inc2[t] = (check_inc2[:t] * gen_time_densities[:t][::-1]).sum() * r0
assert np.abs(incidence - check_inc2).max() < 1e-10, "Results diverging"  # Compare absolute differences
axis_labels = {"index": "day", "value": "incidence"}
pd.Series(incidence).plot(labels=axis_labels).update_layout(showlegend=False)

Already some interesting phenomena are emerging: 
the humps are the successive generations of cases arising from the first seeding infection
(which occurs at a single time point),
and they progressively smooth into one another as the generations overlap.
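
Once the humps have smoothed out, incidence should grow at a constant exponential rate $r$
that satisfies the discrete Euler-Lotka equation $1 = R_0\sum_{k}g_k e^{-rk}$.
The sketch below solves this equation numerically and compares the result with a longer simulation.
The 30-day generation time window and the 120-day run length are assumptions chosen
so that the comparison is made well after the early oscillations have decayed.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import gamma

r0 = 2.0
gen_mean, gen_sd = 5.0, 1.5
scale = gen_sd ** 2 / gen_mean
# g[k - 1] is the generation time mass at a lag of k days
g = np.diff(gamma.cdf(np.arange(31), a=gen_mean / scale, scale=scale))
lags = np.arange(1, len(g) + 1)

# Root of the discrete Euler-Lotka equation: R0 * sum_k g_k * exp(-r * k) - 1 = 0
growth_rate = brentq(lambda r: r0 * (g * np.exp(-r * lags)).sum() - 1.0, 0.0, 1.0)

# Simulate for long enough that the generational humps have merged
n_long = 120
inc = np.zeros(n_long)
inc[0] = 1.0
for t in range(1, n_long):
    k = min(t, len(g))
    inc[t] = r0 * (inc[t - k : t][::-1] * g[:k]).sum()
simulated_rate = np.log(inc[-1] / inc[-2])
print(f"Euler-Lotka: {growth_rate:.4f}, simulated: {simulated_rate:.4f}")
```

The lag convention matches the loops above, with the first element of the densities applying to the immediately preceding day.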

### Threshold behaviour
Next let's check that the threshold behaviour is approximately correct.
We would expect a declining epidemic with $R_{0} < 1$ (even without
susceptible depletion implemented yet).

In [None]:
low_r_inc = np.zeros(n_times)
low_r_inc[0] = seed
r0 = 0.8
for t in range(1, n_times):
    low_r_inc[t] = (low_r_inc[:t] * gen_time_densities[:t][::-1]).sum() * r0
pd.Series(low_r_inc).plot(labels=axis_labels).update_layout(showlegend=False)

## Susceptible depletion
To add one layer of realism, we'll now start to think about susceptible depletion.

Again, from this equation in Faria, et al:
$$i_{s,t} = (1-\frac{n_{s,t}}{N})R_{s,t}\sum_{\tau<t} i_{s,\tau}g_{t-\tau}$$

And again reducing the complexity of this by ignoring strains,
we'll now consider the equation with susceptible depletion included:
$$i_t = (1-\frac{n_t}{N})R_t\sum_{\tau<t} i_{\tau}g_{t-\tau}$$

We'll now run the model with susceptible depletion,
decrementing the susceptible population by the incidence at each step.
We'll also zero out any negative values for the susceptibles
that could occur if the time step is too large
(which should be negligible for reasonable time step and parameter choices).
We'll need a higher reproduction number to deplete 
the susceptible population within the time window we have.

In [None]:
r0 = 6.0
pop = 100.0
deplete_inc = np.zeros(n_times)
deplete_inc[0] = seed
suscept = pop - seed
for t in range(1, n_times):
    suscept_prop = suscept / pop
    infect_contribution_by_day = deplete_inc[:t] * gen_time_densities[:t][::-1] * r0
    this_inc = infect_contribution_by_day.sum() * suscept_prop
    deplete_inc[t] = this_inc
    suscept = max(suscept - this_inc, 0.0)
pd.Series(deplete_inc).plot(labels=axis_labels).update_layout(showlegend=False)

Now with susceptible depletion, we have an epi-curve that goes up in the initial phase with $R_0 > 1$,
but comes back down as susceptibles are depleted and so $R_t$ falls below one.
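
To see this directly, we can record the reproduction number after scaling by the susceptible proportion
at each step of the same calculation.
This sketch repeats the depletion loop so that it stands alone;
the array name `r_eff` is an assumption introduced for illustration.

```python
import numpy as np
from scipy.stats import gamma

n_times, seed, r0, pop = 20, 1.0, 6.0, 100.0
scale = 1.5 ** 2 / 5.0
g = np.diff(gamma.cdf(np.arange(n_times + 1), a=5.0 / scale, scale=scale))

inc = np.zeros(n_times)
inc[0] = seed
r_eff = np.full(n_times, np.nan)  # Not defined for the seeding day
suscept = pop - seed
for t in range(1, n_times):
    r_eff[t] = r0 * suscept / pop  # Reproduction number scaled by susceptibility
    inc[t] = r_eff[t] * (inc[:t] * g[:t][::-1]).sum()
    suscept = max(suscept - inc[t], 0.0)

below_one = np.flatnonzero(r_eff < 1.0)  # Days on which the scaled value is below one
```

Because susceptibles are only ever removed, `r_eff` can never increase in this version of the model.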

## Varying $R_{t}$
Building on the previous cells and including susceptible depletion,
we'll now look at varying the reproduction number with time,
because inferring the variation in this quantity is what
we're aiming to achieve from these models.

As previously, the equation we're considering will be:
$$i_t = (1-\frac{n_t}{N})R_t\sum_{\tau<t} i_{\tau}g_{t-\tau}$$
However, now the $R_{t}$ value is determined both
by the proportion of the population remaining susceptible
and by an extrinsic variable ("random") process.
For now, the process simply takes arbitrary values,
although several functions could be used to generate it
(including a random walk and an 
autoregressive process).

We reuse the model parameters, population size and generation times
from the previous cells,
and run the model with susceptible depletion
and a variable reproduction number.
Now we can manipulate the shape of the epicurve a little more.

In [None]:
var_r_inc = np.zeros(n_times)
var_r_inc[0] = seed
process_req = [2.0, 1.2, 2.4, 1.8]
process_times = np.linspace(0.0, n_times, len(process_req))
process_vals = np.interp(range(n_times), process_times, process_req)
suscept = pop - seed
for t in range(1, n_times):
    suscept_prop = suscept / pop
    infect_contribution_by_day = var_r_inc[:t] * gen_time_densities[:t][::-1] * r0
    this_inc = infect_contribution_by_day.sum() * suscept_prop * process_vals[t]
    var_r_inc[t] = this_inc
    suscept = max(suscept - this_inc, 0.0)
pd.Series(var_r_inc).plot(labels=axis_labels).update_layout(showlegend=False)

Alternatively, we may wish to specify the process values on the log scale
rather than as untransformed values,
and we get the same result back this way.
This is actually the approach we'll be using in the 
presented analyses.

In [None]:
check_var_inc = np.zeros(n_times)
check_var_inc[0] = seed
log_process_vals = np.log(np.interp(range(n_times), process_times, process_req))
suscept = pop - seed
for t in range(1, n_times):
    suscept_prop = suscept / pop
    infect_contribution_by_day = check_var_inc[:t] * gen_time_densities[:t][::-1] * r0
    this_inc = infect_contribution_by_day.sum() * suscept_prop * np.exp(log_process_vals[t])
    check_var_inc[t] = this_inc
    suscept = max(suscept - this_inc, 0.0)
assert np.abs(var_r_inc - check_var_inc).max() < 1e-3, "Results diverging"  # Compare absolute differences
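
Working on the log scale also makes it straightforward to generate the process
with the random walk or autoregressive forms mentioned earlier,
because exponentiating always returns a positive multiplier.
The cell below is a minimal sketch of both; the step standard deviation,
the autoregressive coefficient and the random seed are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # Fixed seed so the sketch is reproducible
n_times = 20
step_sd = 0.1  # Hypothetical standard deviation of the daily steps

# Random walk: each log value is the previous value plus Gaussian noise
rw_log_vals = np.cumsum(rng.normal(0.0, step_sd, n_times))

# AR(1): each log value is shrunk towards zero before noise is added
ar_coeff = 0.8  # Hypothetical autoregressive coefficient
noise = rng.normal(0.0, step_sd, n_times)
ar_log_vals = np.zeros(n_times)
ar_log_vals[0] = noise[0]
for t in range(1, n_times):
    ar_log_vals[t] = ar_coeff * ar_log_vals[t - 1] + noise[t]

# Exponentiate to obtain multipliers that could take the place of process_vals
rw_multipliers = np.exp(rw_log_vals)
ar_multipliers = np.exp(ar_log_vals)
```

The autoregressive form tends to return towards a multiplier of one, whereas the random walk can drift without limit.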