# Derived outputs

In the first notebook, we introduced our general approach to creating and running a simple compartmental model of the transmission of an acute immunising infection. This notebook builds on that model to consider "derived outputs": quantities we might want to examine in addition to the estimated compartment sizes over time. In general, we may wish to estimate epidemiological quantities that are derived from some combination of:

- The model compartment sizes for each timestep
- The model flow rates at each timestep
- Model inputs

_summer_ offers a range of approaches to calculating model outputs beyond
just the absolute size of the various compartments modelled,
which are described in the _summer_ API.

In [None]:
# If running on Google Colab, run the following line of code to install the summer package
# %pip install summerepi2

In [None]:
from jax import numpy as jnp
import pandas as pd
pd.options.plotting.backend = "plotly"

from summer2 import CompartmentalModel
from summer2.parameters import Parameter, DerivedOutput

In [None]:
def get_sir_model(
    config: dict,
) -> CompartmentalModel:
    """
    This is the same model as introduced in notebook 02.
    """
    
    compartments = (
        "susceptible",
        "infectious",
        "recovered",
    )
    analysis_times = (0.0, config["end_time"])
    model = CompartmentalModel(
        times=analysis_times,
        compartments=compartments,
        infectious_compartments=["infectious"],
    )
    model.set_initial_population(
        distribution=
        {
            "susceptible": config["population"] - config["seed"], 
            "infectious": config["seed"],
        }
    )
    model.add_infection_frequency_flow(
        name="infection", 
        contact_rate=Parameter("contact_rate"),
        source="susceptible", 
        dest="infectious",
    )
    model.add_transition_flow(
        name="recovery", 
        fractional_rate=Parameter("recovery"),
        source="infectious", 
        dest="recovered",
    )
    model.add_death_flow(
        name="infection_death", 
        death_rate=Parameter("infection_death"),
        source="infectious",
    )
    return model

In [None]:
model_config = {
    "population": 1000.,
    "seed": 10.,
    "end_time": 20.,
}

parameters = {
    "recovery": 0.333,
    "infection_death": 0.05,
    "contact_rate": 1.,
}

sir_model = get_sir_model(model_config)
sir_model.run(parameters=parameters)
compartment_values = sir_model.get_outputs_df()
compartment_values.plot.area(
    labels={"index": "time", "value": "compartment size"},
)

## Calculations based on compartment sizes
First let's think about the proportion of the population ever infected.
This might be a particularly important output, because it
might be the best quantity emerging from our model to compare against data from a serosurvey.
Specifically, we want to know the proportion of the total population that is
in either the `infectious` or `recovered` compartments.
In our very simple model, this is easy to derive from the
compartment size dataframe returned by the `get_outputs_df` method
after the model has run, as shown in the previous cell.
However, these calculations can get more complicated,
so we'll demonstrate the syntax for asking the _summer_ model object to calculate this.
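
To make the arithmetic explicit, here is a minimal plain-Python sketch of the calculation described above, independent of _summer_. The function name `prop_ever_infected` and the example compartment sizes are hypothetical, chosen purely for illustration.

```python
def prop_ever_infected(sizes: dict[str, float]) -> float:
    """Proportion of the living population that is infectious or recovered."""
    ever_infected = sizes["infectious"] + sizes["recovered"]
    total_population = sum(sizes.values())
    return ever_infected / total_population


# Hypothetical compartment sizes at a single time point
example_sizes = {"susceptible": 600.0, "infectious": 150.0, "recovered": 250.0}
example_prop = prop_ever_infected(example_sizes)  # (150 + 250) / 1000
```

Note that the denominator is the sum of the current compartment sizes, so people who have left the model through infection-related death are not counted in either the numerator or the denominator.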

In [None]:
# Find the size of the compartments that have ever been infected
sir_model = get_sir_model(model_config)
sir_model.request_output_for_compartments(
    name="ever_infected", 
    compartments=["infectious", "recovered"]
)

# Find the total population
sir_model.request_output_for_compartments(
    name="total_population",
    compartments=sir_model.compartments,
)

# Get the proportion
sir_model.request_function_output(
    name="prop_ever_infected",
    func=DerivedOutput("ever_infected") / DerivedOutput("total_population")
)
sir_model.run(parameters=parameters)
derived_outputs = sir_model.get_derived_outputs_df()
derived_outputs["prop_ever_infected"].plot.area(
    title="Seropositive proportion",
    labels={"index": "time", "value": "proportion"},
).update_layout(showlegend=False)

## Flow outputs
Although the distribution of the population is often a quantity we're very interested in,
the rate at which people are transitioning between the modelled compartments may also be of interest.
By this, we now mean the total magnitude of the flow,
or the number of people transitioning along a particular pathway per unit time,
rather than the _per capita_ rate (i.e. the parameter value for the flow).
We can track this sort of quantity by requesting a flow output from _summer_.
In infectious diseases, and epidemiology more generally,
we can refer to this sort of quantity as "incident" rather than "prevalent".
For example, we might be interested in transition rates like:
- The rate of new infections
- The rate of new cases
- The rate of new symptomatic cases
- The rate of new hospitalisations
- The infection-specific mortality rate

Of course, we would need to explicitly simulate these states in order to be able to estimate them from our model,
and our simple SIR model does not include compartments to represent many of these states.

Using our existing model, let's track a couple of these quantities that are explicitly modelled,
i.e. the incidence rate of infectious disease 
(number of people entering the `infectious` compartment per unit time)
and the mortality rate (number of new deaths per unit time).
Of course, the units of these quantities are persons per unit time,
rather than persons (which is the unit for the model compartments).
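
As a rough illustration of the difference between a flow's total magnitude and its _per capita_ rate, the sketch below computes the three flows of our SIR model by hand at a single time point. It assumes the standard frequency-dependent form for infection (contact rate × susceptible × infectious / total population); this is our reading of what an infection frequency flow represents, not a description of _summer_'s internals. The function name `sir_flow_magnitudes` is hypothetical.

```python
def sir_flow_magnitudes(
    sizes: dict[str, float],
    contact_rate: float,
    recovery: float,
    infection_death: float,
) -> dict[str, float]:
    """Flow magnitudes (persons per unit time) at one time point."""
    total = sum(sizes.values())
    # Per capita rate of infection experienced by each susceptible person
    force_of_infection = contact_rate * sizes["infectious"] / total
    return {
        "infection": force_of_infection * sizes["susceptible"],
        "recovery": recovery * sizes["infectious"],
        "infection_death": infection_death * sizes["infectious"],
    }


# Initial conditions and parameter values used in this notebook
initial_sizes = {"susceptible": 990.0, "infectious": 10.0, "recovered": 0.0}
initial_flows = sir_flow_magnitudes(initial_sizes, 1.0, 0.333, 0.05)
```

In each case the flow magnitude is the _per capita_ rate multiplied by the size of the source compartment, which is why its units are persons per unit time.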

![](../images/sir_transition.svg)

In [None]:
sir_model = get_sir_model(model_config)
sir_model.request_output_for_flow(
    name="incidence", 
    flow_name="infection"
)
sir_model.request_output_for_flow(
    name="mortality", 
    flow_name="infection_death"
)
sir_model.run(parameters=parameters)
sir_outputs = sir_model.get_derived_outputs_df()
sir_outputs.plot(
    labels={"index": "time", "value": "rate per unit time"},
)

### Distinguishing infection from incidence

With the SIR model that we have mostly been using,
infection and incidence of infectious cases are identical
because the onset of infectiousness occurs exactly as infection occurs.
However, as we saw in notebook 04, 
we may often want to include in our model a delay between infection 
and the onset of infectiousness (i.e. a latent interval).
In this case, there would be a distinction between the point at which
people are infected and the point at which they progress to infectiousness,
and so the incidence rate should be tracked through the `progression` flow.
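
The lag between infection and incidence can be seen without _summer_ in the following minimal sketch, which steps an SEIR system forward with a fixed-step forward-Euler scheme. This is an illustrative assumption only: it is not the solver that _summer_ uses, the function name `simulate_seir_flows` is hypothetical, and the run is extended to 40 time units (rather than the 20 used elsewhere in this notebook) so that both peaks fall comfortably inside the simulated window.

```python
def simulate_seir_flows(end_time: float = 40.0, dt: float = 0.1):
    """Forward-Euler SEIR sketch returning (infection, progression) flow series."""
    # Parameter values mirror those used in this notebook
    contact_rate, progression, recovery, infection_death = 1.0, 0.5, 0.333, 0.05
    s, e, i, r = 990.0, 0.0, 10.0, 0.0
    infection_flows, progression_flows = [], []
    for _ in range(round(end_time / dt)):
        total = s + e + i + r
        infection = contact_rate * s * i / total  # susceptible -> exposed
        progress = progression * e  # exposed -> infectious
        recover = recovery * i
        die = infection_death * i
        infection_flows.append(infection)
        progression_flows.append(progress)
        # Update each compartment by its net flow over one time step
        s -= dt * infection
        e += dt * (infection - progress)
        i += dt * (progress - recover - die)
        r += dt * recover
    return infection_flows, progression_flows


infection_flows, progression_flows = simulate_seir_flows()
infection_peak_step = infection_flows.index(max(infection_flows))
progression_peak_step = progression_flows.index(max(progression_flows))
```

Because the `exposed` compartment starts empty, the progression flow begins at zero and peaks after the infection flow, which is the distinction the two outputs are intended to capture.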

![](../images/seir_transition.svg)

In [None]:
def get_seir_model(
    config: dict,
) -> CompartmentalModel:
    
    compartments = (
        "susceptible",
        "exposed",
        "infectious",
        "recovered",
    )
    analysis_times = (0., config["end_time"])
    model = CompartmentalModel(
        times=analysis_times,
        compartments=compartments,
        infectious_compartments=["infectious"],
    )
    model.set_initial_population(
        distribution=
        {
            "susceptible": config["population"] - config["seed"], 
            "infectious": config["seed"],
        }
    )
    model.add_infection_frequency_flow(
        name="infection", 
        contact_rate=Parameter("contact_rate"),
        source="susceptible", 
        dest="exposed",
    )
    model.add_transition_flow(
        name="progression", 
        fractional_rate=Parameter("progression"),
        source="exposed", 
        dest="infectious",
    )
    model.add_transition_flow(
        name="recovery", 
        fractional_rate=Parameter("recovery"),
        source="infectious", 
        dest="recovered",
    )
    model.add_death_flow(
        name="infection_death", 
        death_rate=Parameter("infection_death"),
        source="infectious",
    )
    model.request_output_for_flow(
        name="infection", 
        flow_name="infection",
    )
    model.request_output_for_flow(
        name="incidence",
        flow_name="progression",
    )
    return model

In [None]:
seir_model = get_seir_model(model_config)
parameters.update(
    {
        "progression": 0.5,
    }
)
seir_model.run(parameters=parameters)
seir_outputs = seir_model.get_derived_outputs_df()
seir_outputs.plot(
    labels={"index": "time", "value": "rate per unit time"},
)

## Incomplete case detection
In the previous model,
we have an output called "incidence" which tracks all transitions
to the infectious state.
However, for most infectious diseases, we do not observe all the 
infection episodes that occur in the population.
Rather, the situation in reality is one of "partial case detection"
or "partial case ascertainment", 
whereby only some fraction of all new incident episodes are
detected by the surveillance system through which we observe the epidemic.
Let's consider a situation in which only half of all incident cases
result in notification to the surveillance system of the simulated population.
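
The observation model in this scenario is simply a constant scaling of incidence. A minimal plain-Python sketch, assuming a fixed case detection proportion, is shown here; the function name `apply_case_detection` and the incidence values are hypothetical.

```python
def apply_case_detection(incidence: list[float], cdr: float) -> list[float]:
    """Scale an incidence series by a constant case detection proportion."""
    if not 0.0 <= cdr <= 1.0:
        raise ValueError("cdr must lie between 0 and 1")
    return [cdr * value for value in incidence]


# Hypothetical incidence values (persons per unit time)
example_incidence = [4.0, 10.0, 22.0, 16.0]
example_notifications = apply_case_detection(example_incidence, cdr=0.5)
```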

In [None]:
partial_cdr_model = get_seir_model(model_config)

parameters.update({"cdr": 0.5})

# Add a very simple derived output for our partial observation model
partial_cdr_model.request_function_output(
    name="notifications",
    func=DerivedOutput("incidence") * Parameter("cdr"),
)

partial_cdr_model.run(parameters=parameters)
partial_cdr_outputs = partial_cdr_model.get_derived_outputs_df()
partial_cdr_outputs.plot(
    labels={"index": "time", "value": "rate per unit time"},
)