# Thinking about flow rates
## Risk and rates
The parameters applied to the flows in compartmental models represent rates of transition,
e.g. the rate of transition from the source to the destination compartment
discussed in [notebook 03](./03-flows-introduction.ipynb).
However, in epidemiology, the empiric data we are dealing with
have often not been provided to us in the format of a rate (per unit time).
More commonly, we might know the risk of a process happening,
either over the course of an illness episode or over a period of time.
It's really important to understand the relationship between
the parameter values applied to flows
and field epidemiological observations.

In [None]:
try:
    # Only install the package when running on Google Colab
    import google.colab
    %pip install summerepi2==1.1.1
except ImportError:
    pass

In [None]:
import numpy as np

from summer2 import CompartmentalModel
from summer2.parameters import Parameter

To get started thinking about risks and rates,
let's grab our extremely simple single transition model from the [previous notebook](./03-flows-introduction.ipynb).

In [None]:
def get_single_transition_model(
    config: dict,
) -> CompartmentalModel:
    
    compartments = (
        "source",
        "destination",
    )
    analysis_times = (0.0, config["end_time"])
    model = CompartmentalModel(
        times=analysis_times,
        compartments=compartments,
        infectious_compartments=[],
    )
    model.set_initial_population(
        distribution={"source": config["population"]}
    )
    
    model.add_transition_flow(
        "transition", 
        fractional_rate=Parameter("transition_rate"), 
        source="source", 
        dest="destination"
    )
    
    return model

In [None]:
model_config = {
    "population": 1.0,
    "end_time": 20.0,
}
parameter = {
    "transition_rate": 0.1,
}

transition_model = get_single_transition_model(model_config)
transition_model.run(parameters=parameter)

risk_time = 10.0
dec_places = 3
risk_value = round(transition_model.get_outputs_df()["destination"][risk_time], dec_places)

print(f"The risk of reaching the destination compartment after ten time units is {risk_value}")

As introduced in the previous notebook,
where we have a single outflow from our compartment of interest,
the relationship between the size of the destination compartment
and the transition rate can be given by:
$$ risk = 1 - e^{-rate \times t} $$
This is the risk of reaching the destination compartment after $t$ time units have elapsed.

Solving for rate, we have:
$$ rate = \frac{-\log_{e}(1 - risk)}{t} $$
So we can use this rearrangement of the formula to work backwards and calculate
the rate we should apply to the source compartment from empiric data.

This would be the approach we should use 
if we had epidemiological evidence that told us that 63.2% of people
had reached the destination compartment after ten time units
(e.g. from a cohort study or clinical trial that reports information in this way).

In [None]:
recalculated_rate = round(-np.log(1.0 - risk_value) / risk_time, dec_places)
print(f"To achieve a risk of {risk_value} after {round(risk_time)} time units, we need a rate of {recalculated_rate}.")
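
The two formulas above can be wrapped into a pair of helper functions for reuse. This is only a minimal sketch: the names `rate_to_risk` and `risk_to_rate` are hypothetical (they are not part of summer2), and the 5% risk over 12 time units is an invented illustration rather than real data.

```python
import math


def rate_to_risk(rate: float, duration: float) -> float:
    """Cumulative risk after `duration` time units for a single constant-rate outflow."""
    return 1.0 - math.exp(-rate * duration)


def risk_to_rate(risk: float, duration: float) -> float:
    """Constant rate that produces a cumulative `risk` over `duration` time units."""
    if not 0.0 <= risk < 1.0:
        raise ValueError("risk must lie in the interval [0, 1)")
    if duration <= 0.0:
        raise ValueError("duration must be positive")
    return -math.log(1.0 - risk) / duration


# Hypothetical example: a 5% risk observed over 12 time units
example_rate = risk_to_rate(0.05, 12.0)
print(f"Rate: {example_rate:.5f} per time unit")
print(f"Risk recovered from that rate: {rate_to_risk(example_rate, 12.0):.3f}")
```

For small risks the rate is close to the risk divided by the duration, but the two diverge as the risk grows, which is why the logarithmic conversion matters.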

## Competing flows
Let's dig into what applying these flows to a model means a little further.
The analyses in this notebook are extremely simple,
and are just intended to give a sense for how to think about flows and compartments.
Consider a very similar model to the one we saw in the previous notebook,
but with two competing transition flows applied to our source compartment instead of one.

![](../images/source_two_dest_structure.svg)

In [None]:
def get_competing_transition_model(
    config: dict,
) -> CompartmentalModel:
    
    compartments = (
        "source",
        "destination_0",
        "destination_1",
    )
    analysis_times = (0.0, config["end_time"])
    model = CompartmentalModel(
        times=analysis_times,
        compartments=compartments,
        infectious_compartments=[],
    )
    model.set_initial_population(
        distribution={"source": config["population"]}
    )
    
    model.add_transition_flow(
        "transition_0", 
        fractional_rate=Parameter("transition_0"), 
        source="source", 
        dest="destination_0"
    )
    model.add_transition_flow(
        "transition_1", 
        fractional_rate=Parameter("transition_1"), 
        source="source", 
        dest="destination_1"
    )
    
    return model

In [None]:
model_config = {
    "population": 1.0,
    "end_time": 200.0,
}
parameters = {
    "transition_0": 0.01,
    "transition_1": 0.02,
}

transition_model = get_competing_transition_model(model_config)
transition_model.run(parameters=parameters)
outputs = transition_model.get_outputs_df()

## Sojourn times
When considering the rate at which a person exits a compartment in the absence of any inward flows,
the rate of exiting the compartment is the sum of the outflow rates.
In this example, the average time spent in the `source` compartment is 
the reciprocal of the sum of the two transition rates.

If these rates remain constant over time and there are no inward flows,
the proportion of the starting population remaining in the compartment at time $t$ is:
$$ e ^{-outflows \times t} $$
and the average time in the compartment is given by:
$$ \int_0^\infty e ^{-outflows \times t} dt $$
which is equal to:
$$ \frac{1}{outflows} $$
This can also be termed the "sojourn time" of the compartment.

In [None]:
print(f"The average sojourn time for the source compartment is {round(1.0 / sum(parameters.values()), 2)} time units.")
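
As a rough check on the integral itself, we can approximate it numerically without running the model. The sketch below assumes the same two rates as above, writes the trapezoidal rule out by hand, and truncates the integral at 2000 time units (an arbitrary choice, long enough that the remaining tail is negligible).

```python
import numpy as np

total_outflow = 0.01 + 0.02  # Same two rates as in the parameters above

# Proportion remaining in the compartment on a fine time grid
times = np.linspace(0.0, 2000.0, 200_001)
remaining = np.exp(-total_outflow * times)

# Trapezoidal rule written out explicitly
step = times[1] - times[0]
approx_sojourn = float(np.sum((remaining[:-1] + remaining[1:]) / 2.0) * step)

print(f"Numerical integral: {approx_sojourn:.3f}")
print(f"Analytical value: {1.0 / total_outflow:.3f}")
```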

We can check this numerically by multiplying the number of people leaving the `source`
compartment during each time step by the time at which they left
(although this gives us a very slight underestimate
because the model isn't run for infinite time).

In [None]:
weighted_arrival_times = outputs.diff()["source"] * outputs.index
print(f"The average time to arrive in a destination compartment is {-weighted_arrival_times.sum()}")

## Median transition time
Of course, this does not imply that half of the population
will have left the `source` compartment after one sojourn time.
To obtain the time at which this occurs, we solve the equation:
$$ e ^{-outflows \times t} = \tfrac{1}{2} $$
to get:
$$ t = \frac{-\log(\tfrac{1}{2})}{outflows} $$

In [None]:
print(f"The time when half of the population have left the source is {round(-np.log(0.5) / sum(parameters.values()), 3)} time units.")

Again, we can check whether this is approximately correct 
from the numerical solutions we obtained for our simple model.

In [None]:
print(f"The first time step with more than half the source compartment depleted is {outputs['source'][outputs['source'].lt(0.5)].index[0]} time units.")
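
Another way to see the difference between the mean and the median is to think of individuals rather than compartments. A constant total outflow rate corresponds to exponentially distributed waiting times, so we can draw random waiting times and compare their mean and median. This is an illustrative sketch only: the compartmental model itself is deterministic, and the seed and sample size here are arbitrary choices.

```python
import numpy as np

total_outflow = 0.01 + 0.02  # Same two rates as in the parameters above

# Draw individual waiting times in the source compartment
rng = np.random.default_rng(seed=0)
waiting_times = rng.exponential(scale=1.0 / total_outflow, size=200_000)

sample_mean = float(waiting_times.mean())
sample_median = float(np.median(waiting_times))

print(f"Sample mean: {sample_mean:.2f} (analytical {1.0 / total_outflow:.2f})")
print(f"Sample median: {sample_median:.2f} (analytical {np.log(2.0) / total_outflow:.2f})")
```

The median is always $\log(2) \approx 0.693$ times the mean for a constant rate, because the exponential distribution is right-skewed.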

## Risks from competing rates
Often in epidemiology,
we may want to think about the risk of an outcome for a person in a particular state.
For example, we may want to know the risk of ending up
in the `destination_0` compartment rather than `destination_1`.
A more real-world example may be if we have applied a recovery
and a death flow to the infectious model compartment and want to calculate
the infection fatality rate that this implies.
In this example, there are only two possible outcomes,
which are applied together for any person entering this compartment,
and these flows compete with one another.

The risk of following each of the outflows from the compartment is
proportional to the magnitude or rate of that flow.
In our example, 
the risk of transition_0 for a person entering the source compartment
is the rate of transition_0 divided by the total of all outflows.

In [None]:
print(f"The risk of following transition_0 is {round(parameters['transition_0'] / sum(parameters.values()) * 100)}%.")

This is probably pretty obvious,
but we can easily check from our numeric solutions.

In [None]:
outputs["destination_0"] / outputs[["destination_0", "destination_1"]].sum(axis=1)
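
The same result follows from the closed-form solution for this simple system. With constant rates and a starting population of one in the source compartment, each destination receives a fixed share of everyone who has left the source, so the ratio does not depend on time. The sketch below assumes the same two rates as above and an arbitrary set of time points.

```python
import numpy as np

rates = {"transition_0": 0.01, "transition_1": 0.02}
total_outflow = sum(rates.values())

times = np.array([1.0, 10.0, 50.0, 200.0])
left_source = 1.0 - np.exp(-total_outflow * times)  # Proportion that has left the source

# Each destination receives a fixed share of everyone who has left
destination_0 = rates["transition_0"] / total_outflow * left_source
destination_1 = rates["transition_1"] / total_outflow * left_source

share_0 = destination_0 / (destination_0 + destination_1)
for t, share in zip(times, share_0):
    print(f"t = {t:>5}: share reaching destination_0 = {share:.4f}")
```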

These considerations are often important when we come 
to estimating parameters for our system.

## Non-proportional hazards
Things may often not be as easy as we would like,
and the proportion of people reaching one outcome
rather than the other may be different depending on 
the time point that we consider.
In this situation, we should not apply our risk/rate logic 
to a single compartment directly 
if we have information on the risk of two competing outcomes at
two different points in time.

For example, we might know that 30% of people recover 
and 70% of people die as a result of a particular infection,
and that recoveries occur after an average of 5 days
but the deaths occur after an average of 10 days.
We can't calculate the two flow rates from risks using
our earlier equation for deriving rates from risks.

There are a few options available for a situation like this,
but we would first have to decide whether to explicitly represent
the two processes with compartments of their own.
If we did this, we could make the two calculations separately,
which would be a valid approach and might be the simplest option.
However, if we really wanted to reduce complexity and represent 
the two competing outcomes as outflows from one compartment,
it may be impossible to incorporate all of our empiric data
using flows represented by exponential declines.
Rather, the best we could do would be to decide on 
the proportion of people that should reach each of 
our two outcomes and settle on an average sojourn time
for anyone entering the compartment.
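
As a sketch of this single-compartment compromise, the rates can be derived from the target proportions and one chosen sojourn time. The choice of the outcome-weighted mean of the two observed times as that sojourn time is an assumption made here for illustration, not the only defensible option, and the variable names are hypothetical.

```python
proportions = {"recovery": 0.3, "death": 0.7}
mean_days = {"recovery": 5.0, "death": 10.0}

# Assumption: use the outcome-weighted mean as the single sojourn time
sojourn_time = sum(proportions[k] * mean_days[k] for k in proportions)
total_outflow = 1.0 / sojourn_time

# Split the total outflow according to the desired proportions
rates = {k: proportions[k] * total_outflow for k in proportions}

for outcome, rate in rates.items():
    print(f"{outcome}: rate = {rate:.4f} per day, implied risk = {rate / total_outflow:.0%}")
print(f"Sojourn time applied to both outcomes: {sojourn_time} days")
```

These rates preserve the 30%/70% split, but both outcomes now occur after the same average delay, so the separate 5-day and 10-day timings are lost.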

Nevertheless, if we understand these principles of model parameterisation,
we can make a judgement about which model parameter choices
are the least unrealistic for the situation at hand.