# Capacity Building
## Prerequisites
- Some basic understanding of Python variables, data types, looping, conditionals and functions will be of benefit
- Completion of 01-basic-model.ipynb and 02-flow-types.ipynb

## Stratification introduction

So far we've looked at how to create a compartmental model, add flows, request derived outputs and use different solvers. Now we'll look into stratifying a model using _summer_'s [Stratification](http://summerepi.com/api/stratification.html) class.

Up to this point, we have modelled transmission dynamics across the overall population, with the population being entirely homogeneous except with regard to the infection-related states. However, we may wish to consider that the infection dynamics vary with age, which they do in the case of COVID-19. For example, infection mortality may vary across age groups, with older age groups having a higher risk of death. For some infections, we may also wish to consider differences in susceptibility to infection or differences in infectiousness. To capture such differences in population structure, we can use stratifications in our models.

A commonly used stratification is by age. Here, the basic methodology is to subdivide the population into a number of discrete compartments classified by age. Although age is a continuous quantity, age-structured compartmental models must group individuals into a limited number of classes. The number of compartments will depend on factors such as data availability and the epidemiological question being addressed. In age-structured models, we may wish to allow people to progress into increasingly older age classes, although this would be less relevant for short-lived epidemics.

The simplest approach is to allow for two strata (children and adults), without any ageing between them.

Age stratification would be essential to consider epidemiological issues or interventions that act differentially on different age categories, or where we wish to get age-specific outputs from our model.
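To make the idea of grouping a continuous age into a small number of discrete classes concrete, here is a minimal sketch in plain Python. The breakpoint of 19 years, the labels and the `age_class` helper are hypothetical choices for illustration only; they are not part of _summer_.

```python
from bisect import bisect_right

# Hypothetical lower bounds (in years) of each age class, for illustration only
AGE_BREAKPOINTS = [0, 19]
AGE_LABELS = ["children", "adults"]


def age_class(age_years: float) -> str:
    """Return the label of the age class that contains age_years."""
    if age_years < AGE_BREAKPOINTS[0]:
        raise ValueError("Age must not be negative")
    # bisect_right counts how many lower bounds are <= age_years
    index = bisect_right(AGE_BREAKPOINTS, age_years) - 1
    return AGE_LABELS[index]


ages = [3, 18.5, 19, 64]
classes = [age_class(age) for age in ages]
print(classes)
```

Adding more breakpoints (and matching labels) would give a finer age structure, at the cost of more compartments and more data needed to parameterise them.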

In this example we'll cover:

- [No stratification](#No-stratification)
- [Minimal stratification](#Minimal-stratification)
- [Population distribution](#Population-distribution)
- [Flow adjustments](#Flow-adjustments)
- [Infectiousness adjustments](#Infectiousness-adjustments)
- [Partial stratifications](#Partial-stratifications)
- [Multiple stratifications](#Multiple-stratifications)
- [Multiple interdependent stratifications](#Multiple-interdependent-stratifications)

Note that stratification is just about the most complicated thing we can do with a compartmental model.

## Data inputs
### Imports
The following few cells are just standard boilerplate that we will include in a similar format in most notebooks.
This should support you to run these cells either locally or over Colab.
We also include code to get the Philippines data set in memory, even though this notebook doesn't make use of it.

In [None]:
# If we are running in google colab, pip install the required packages, 
# but do not modify local environments
try:
  import google.colab
  IN_COLAB = True
  %pip install summerepi
except ImportError:
  IN_COLAB = False

In [None]:
from datetime import datetime, timedelta
import pandas as pd

from summer import CompartmentalModel

pd.options.plotting.backend = "plotly"

In [None]:
# The data import module lives in a file on AuTuMN github - download it for colab use
if IN_COLAB:
    !wget https://raw.githubusercontent.com/monash-emu/AuTuMN/master/notebooks/capacity_building/philippines/import_phl_data.py

import import_phl_data

In [None]:
from import_phl_data import get_population_and_epi_data

analysis_start_date = datetime(2021,1,1)  # Define the start date
analysis_end_date = analysis_start_date + timedelta(days=300)  # Define the duration

# Shareable google drive links
PHL_DOH_LINK = "1fFKoNVan7PS6BpBr01nwByNTLLu6z1AA"  # sheet 05 daily report.
PHL_FASSSTER_LINK = "15eDyTjXng2Zh38DVhmeNy0nQSqOMlGj3" # Fassster google drive zip file.
initial_population, df = get_population_and_epi_data(PHL_DOH_LINK, PHL_FASSSTER_LINK) 
notifications_target = df[analysis_start_date: analysis_end_date]["cases"]  # Could be used as calibration target later

# We define a day zero for the analysis.
COVID_BASE_DATE = datetime(2019, 12, 31)

# Integer representation of the start and end dates.
start_date_int = (analysis_start_date - COVID_BASE_DATE).days
end_date_int = (analysis_end_date - COVID_BASE_DATE).days

## Build a model

Recall the `build_base_model` wrapper function from the last training session.

In [None]:
def build_base_model() -> CompartmentalModel:
    """
    Create a minimal model, that we can then wrap the other pieces
    of functionality around.
    The model will not produce any interesting dynamics,
    because it currently has no flows linking the compartments at all.
    
    Returns:
        A summer compartmental model with compartments called S, E, I and R    
    """
    model = CompartmentalModel(
        times=(start_date_int, end_date_int),
        compartments=["S", "E", "I", "R"],
        infectious_compartments=["I"],
        ref_date=COVID_BASE_DATE
    )

    model.set_initial_population(
        distribution={"S": initial_population - 100, "E": 0, "I": 100}
    )
    
    return model

In [None]:
def build_model_with_flows(parameters: dict) -> CompartmentalModel:
    """
    Create a 'proper' SEIR model that now has its compartments linked
    together with connecting flows, including a transmission flow.
    
    Arguments:
        parameters: The user-specified quantities for building this model
    Returns:
        An unstratified SEIR summer compartmental model
    
    """

    # This base model does not take parameters, but have a think about how it might...
    model = build_base_model()

    # Susceptible people can get infected
    model.add_infection_frequency_flow(
        name="infection", 
        contact_rate=parameters["contact_rate"],
        source="S", 
        dest="E"
    )
    
    # Exposed people progress to being infectious
    model.add_transition_flow(
        name="progression",
        fractional_rate=parameters["progression_rate"],
        source="E",
        dest="I",
    )

    # Infectious people recover
    model.add_transition_flow(
        name="recovery",
        fractional_rate=parameters["recovery_rate"],
        source="I",
        dest="R",
    )

    # Add an infection-specific death flow to the I compartment
    model.add_death_flow(name="infection_death", death_rate=0.01, source="I")

    # We will also request an output for the 'progression' flow, and name this 'notifications'
    # This is just in case we want to compare against the Philippines data and 
    # will be available after running through get_derived_outputs_df() (but isn't actually used in this notebook)

    model.request_output_for_flow("notifications", "progression")

    return model

In [None]:
# Create a parameters dictionary - we'll reuse this whenever building the model
user_params = {
    "contact_rate": 0.5,
    "progression_rate": 1 / 3,
    "recovery_rate": 1 / 5,
}

## No stratification

With no stratification, this is just a regular SEIR model: there are four compartments, through which susceptible people become exposed and then infectious, after which some die and the rest recover.
This is just our basic building block to start adding stratifications to in the cells below - we haven't really achieved anything yet.
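As a reminder of what the four flows do, the following is a rough forward-Euler sketch of the same SEIR structure in plain Python. It is not how _summer_ solves the model (_summer_ uses proper ODE solvers), and the total population of one million is a hypothetical value; the rates are the ones used in this notebook.

```python
def seir_euler_step(state, params, dt):
    """Advance the (S, E, I, R, deaths) tuple by one Euler step of length dt."""
    s, e, i, r, deaths = state
    living = s + e + i + r

    # Frequency-dependent transmission: contact_rate * S * I / N
    infection = params["contact_rate"] * s * i / living
    progression = params["progression_rate"] * e
    recovery = params["recovery_rate"] * i
    infection_death = params["death_rate"] * i

    return (
        s - infection * dt,
        e + (infection - progression) * dt,
        i + (progression - recovery - infection_death) * dt,
        r + recovery * dt,
        deaths + infection_death * dt,
    )


params = {
    "contact_rate": 0.5,
    "progression_rate": 1 / 3,
    "recovery_rate": 1 / 5,
    "death_rate": 0.01,
}

total = 1_000_000.0  # Hypothetical population size, for illustration only
state = (total - 100.0, 0.0, 100.0, 0.0, 0.0)
dt = 0.1  # Time step in days

for _ in range(3000):  # 3000 steps of 0.1 days = 300 days
    state = seir_euler_step(state, params, dt)

final_s, final_e, final_i, final_r, final_deaths = state
print(f"Recovered: {final_r:.0f}, deaths: {final_deaths:.0f}")
```

Every person leaving one compartment enters another (or the running death count), so the five values always sum to the starting population.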

In [None]:
# Build and run model with no stratifications
nostrat_model = build_model_with_flows(user_params)
nostrat_model.run()

# Plot compartments
outputs_df = nostrat_model.get_outputs_df()
outputs_df.plot()

## Minimal stratification

Next, let's try a simple stratification where we split the population into 'young' (say, 0 to 18 years old) and 'old' (age 19 and above). Notice the following changes to the model outputs:

- There are now 8 compartments instead of 4: each original compartment has been split into an "old" and "young" compartment, with the original population evenly divided between them (by default).
- The model dynamics haven't changed otherwise: we will get the same results as before if we add the old and young compartments back together. This is because there is homogeneous mixing between strata and no demographic processes, etc.
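The two points above can be sketched without _summer_ at all. The snippet below uses hypothetical starting values and plain dictionaries; it only illustrates the bookkeeping, not _summer_'s internal representation.

```python
from itertools import product

compartments = ["S", "E", "I", "R"]
strata = ["young", "old"]

# Hypothetical unstratified starting values, for illustration only
initial = {"S": 900.0, "E": 0.0, "I": 100.0, "R": 0.0}

# Every compartment is paired with every stratum, with an even split by default
stratified = {
    (comp, stratum): initial[comp] / len(strata)
    for comp, stratum in product(compartments, strata)
}

# Adding the strata back together recovers the unstratified values
recombined = {
    comp: sum(stratified[(comp, stratum)] for stratum in strata)
    for comp in compartments
}

print(len(stratified))
print(recombined)
```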

In [None]:
from summer import Stratification

# Create a stratification named 'age', applying to all compartments, which
# splits the population into 'young' and 'old'.
strata = ["young", "old"]
strat = Stratification(name="age", strata=strata, compartments=["S","E", "I", "R"])

In [None]:
# Build and run model with the stratification we just defined
min_model = build_model_with_flows(user_params)

# After creating the compartments and flows, we need to stratify the model 
# using the stratification object we created above.
min_model.stratify_with(strat)

And let's plot the eight epi curves: ["young", "old"] * ["S", "E", "I", "R"]

In [None]:
min_model.run()
outputs_df = min_model.get_outputs_df()
outputs_df.plot()

**Questions:**

Why are we seeing only four curves?

What is the difference between this model and its unstratified version above?

## Population distribution

We may not always wish to split the population evenly between strata. For example, we might know that 25% of the population is 'young' while 75% is 'old'. Notice that:

- The stratified compartments are now split according to a 25:75 ratio into young and old respectively
- The overall model dynamics still haven't changed otherwise
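As a worked example of the 25:75 split, here is a small sketch in plain Python. The `split_population` helper and the compartment size of 1000 are hypothetical; the check that the proportions sum to one is simply a sanity check that seems sensible for any split of this kind.

```python
import math


def split_population(total: float, proportions: dict) -> dict:
    """Divide a compartment's population between strata by the given proportions."""
    if not math.isclose(sum(proportions.values()), 1.0):
        raise ValueError("Proportions must sum to 1")
    return {stratum: total * prop for stratum, prop in proportions.items()}


pop_split = {"young": 0.25, "old": 0.75}

# Hypothetical compartment size, for illustration only
susceptible = split_population(1000.0, pop_split)
print(susceptible)
```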

In [None]:
strat = Stratification(name="age", strata=strata, compartments=["S", "E", "I", "R"])

# Create our population split dictionary, whose keys match the strata
pop_split = {"young": 0.25, "old": 0.75}

# Set a population distribution
strat.set_population_split(pop_split)

# Build and run model with the stratification we just defined
uneven_split_model = build_model_with_flows(user_params)
uneven_split_model.stratify_with(strat)

In [None]:
uneven_split_model.run()
outputs_df = uneven_split_model.get_outputs_df()
outputs_df.plot()

### Reusable age stratification function

Now that we've got something slightly more meaningful, let's wrap it in a function for reuse.

In [None]:
def get_age_stratification() -> Stratification:
    """
    Get a basic age stratification to apply to the standard compartments
    for an SEIR compartmental model,
    with an uneven split between the two age groups.
    Note that this stratification object has no function in itself,
    rather it just provides the instructions for this sort of stratification.
   
    Returns:
        A summer stratification object.
    """
    
    # Create the stratification
    strat = Stratification(name="age", strata=strata, compartments=["S", "E", "I", "R"])

    # Create our population split dictionary, whose keys match the strata
    pop_split = {"young": 0.25, "old": 0.75}

    # Set a population distribution
    strat.set_population_split(pop_split)
    
    return strat

## Flow adjustments

As noted so far, we've been successful in subdividing the population, but haven't actually changed our model dynamics, which means that there wasn't really that much point in applying the stratification in the first place. Next let's consider how we can adjust the flow rates being applied to the various strata. Let's assume three new facts about our disease:

- Young people are twice as susceptible to infection
- Old people die of the infectious disease at three times the baseline rate, while younger people die at half the rate (that is, old people die at six times the rate of young people)
- Younger people take twice as long to recover

These inter-strata differences can be modelled using flow adjustments. Now we'll see some genuinely new model dynamics. We'll also see that there are fewer recovered 'old' people at the end of the model run, because of their higher death rate.
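Before applying these adjustments, it may help to work through the arithmetic by hand. The sketch below assumes that a numeric adjustment acts as a multiplier on the unstratified rate and that `None` leaves the rate unchanged; the `adjust` helper is hypothetical and not part of _summer_.

```python
import math


def adjust(base_rate: float, multiplier) -> float:
    """Apply a multiplicative adjustment; None means 'use the baseline rate'."""
    return base_rate if multiplier is None else base_rate * multiplier


# Baseline rates used by build_model_with_flows in this notebook
contact_rate = 0.5
death_rate = 0.01
recovery_rate = 1 / 5

young_contact = adjust(contact_rate, 2.0)    # Twice as susceptible
old_death = adjust(death_rate, 3.0)          # Three times the baseline death rate
young_death = adjust(death_rate, 0.5)        # Half the baseline death rate
young_recovery = adjust(recovery_rate, 0.5)  # Half the baseline recovery rate
old_recovery = adjust(recovery_rate, None)   # Unchanged

# Mean time to recovery is 1 / rate (ignoring the competing death flow)
print(f"Old vs young death rate ratio: {old_death / young_death:.1f}")
print(f"Young mean time to recovery: {1 / young_recovery:.1f} days")
print(f"Old mean time to recovery: {1 / old_recovery:.1f} days")
```

Note that halving the recovery rate is what doubles the recovery time, which is why the 'young' recovery adjustment is 0.5 rather than 2.0.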

In [None]:
# Re-create the stratification object
age_strat = get_age_stratification()

# Add an adjustment to the 'infection' flow
age_strat.set_flow_adjustments(
    "infection",
    {
        "old": None,  # No adjustment for old people, use baseline requested value
        "young": 2.0,  # Young people are twice as susceptible to infection
    },
)

# Add an adjustment to the 'infection_death' flow
age_strat.set_flow_adjustments(
    "infection_death",
    {
        "old": 3.0,  # Older people die at three times the rate requested under the original parameters
        "young": 0.5,  # Younger people die at half the rate requested under the original parameters
    },
)

# Add an adjustment to the 'recovery' flow
age_strat.set_flow_adjustments(
    "recovery",
    {
        "old": None,  # No adjustment for old people, use baseline
        "young": 0.5,  # Young people take twice as long to recover
    },
)

# Build and run model with the stratification we just defined
adjusted_model = build_model_with_flows(user_params)
adjusted_model.stratify_with(age_strat)

**Homework:**
1. Create a single data structure that represents the three disease dynamics discussed above.
2. Write a function and/or 'for loop' which calls set_flow_adjustments with each disease dynamic.

In [None]:
adjusted_model.run()
outputs_df = adjusted_model.get_outputs_df()
outputs_df.plot()

## Infectiousness adjustments

In addition to adjusting flow rates for each stratum, we can also adjust the infectiousness of people with active infectious disease in a given stratum. This affects how likely an infectious person in that stratum is to infect someone else. For example, we could consider the following:

- Young people are 1.2 times as infectious (perhaps because of the nature of the infectious disease or because they aren't using face masks as much)
- Young people are twice as susceptible to the disease, because their immune system is relatively immature
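One way to think about these two adjustments is through the force of infection. The sketch below assumes frequency-dependent transmission with homogeneous mixing, where each infectious stratum is weighted by its relative infectiousness. The population numbers and the `force_of_infection` helper are hypothetical, and this is a simplification rather than _summer_'s actual implementation.

```python
import math


def force_of_infection(contact_rate, infectious, infectiousness, total_population):
    """Per-capita infection rate for a susceptible person at baseline susceptibility."""
    effective_infectious = sum(
        infectiousness.get(stratum, 1.0) * count
        for stratum, count in infectious.items()
    )
    return contact_rate * effective_infectious / total_population


# Hypothetical numbers of infectious people by age group
infectious = {"young": 25.0, "old": 75.0}
total_population = 1000.0

baseline = force_of_infection(0.5, infectious, {}, total_population)
adjusted = force_of_infection(0.5, infectious, {"young": 1.2}, total_population)

# The susceptibility adjustment then scales the rate for young susceptibles only
young_infection_rate = 2.0 * adjusted
old_infection_rate = adjusted

print(baseline, adjusted, young_infection_rate)
```

The infectiousness adjustment changes the risk for everyone who is susceptible, whereas the flow adjustment on 'infection' only changes the risk for the adjusted stratum.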

In [None]:
# Re-create the age stratification
age_strat = get_age_stratification()

# Add an adjustment to the 'infection' flow
age_strat.set_flow_adjustments(
    "infection",
    {
        "old": None,  # No adjustment for old people, use baseline
        "young": 2.0,  # Young people twice as susceptible
    },
)

# Add an adjustment to infectiousness levels for young people in the 'I' compartment
age_strat.add_infectiousness_adjustments(
    "I",
    {
        "old": None,  # No adjustment for old people, use baseline
        "young": 1.2,  # Young people are 1.2 times as infectious
    },
)

# Build and run model with the stratification we just defined
infect_adjust_model = build_model_with_flows(user_params)
infect_adjust_model.stratify_with(age_strat)

In [None]:
infect_adjust_model.run()
outputs_df = infect_adjust_model.get_outputs_df()
outputs_df.plot()

## Partial stratifications

So far we've been stratifying all compartments, but Summer allows only some of the compartments to be stratified. For example, we can stratify only the infectious compartment to model three different levels of disease severity: asymptomatic, mild and severe.

When you do a partial stratification, flow rates into the stratified compartment are by default split evenly across its strata so that the overall behaviour is conserved, e.g. a flow rate of 3 from a source will be evenly split into (1, 1, 1) across the three destinations. This behaviour can be manually overridden with a flow adjustment.
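The splitting arithmetic can be sketched as follows, using this notebook's progression rate. The proportions of 0.3, 0.5 and 0.2 are the ones used for the severity strata in the cell below; the dictionaries here are purely illustrative and are not _summer_ objects.

```python
import math

progression_rate = 1 / 3  # From user_params in this notebook
severity_strata = ["asymptomatic", "mild", "severe"]

# Default behaviour: the rate is shared evenly between the destination strata
default_split = {
    stratum: progression_rate / len(severity_strata) for stratum in severity_strata
}

# Overriding the even split with proportions that sum to one
proportions = {"asymptomatic": 0.3, "mild": 0.5, "severe": 0.2}
custom_split = {
    stratum: progression_rate * prop for stratum, prop in proportions.items()
}

print(default_split)
print(custom_split)
```

In both cases the rates into the three strata add up to the original progression rate, so the total flow out of the exposed compartment is unchanged.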

In [None]:
# This time, we'll create a function right away

def get_severity_strat() -> Stratification:
    # Create a stratification named 'severity', applying to the infectious compartment, which
    # splits that compartment into 'asymptomatic', 'mild' and 'severe'.
    severity_strata = ["asymptomatic", "mild", "severe"]

    # Notice the new argument ["I"] for the compartment parameter.
    severity_strat = Stratification(name="severity", strata=severity_strata, compartments=["I"])

    # Set a population distribution - everyone starts out asymptomatic.
    severity_strat.set_population_split({"asymptomatic": 1.0, "mild": 0, "severe": 0})
    
    return severity_strat

# We need to call the function so we have a Stratification object to work with
severity_strat = get_severity_strat()

# Add an adjustment to the 'progression' flow, overriding default split.
severity_strat.set_flow_adjustments(
    "progression",
    {
        "asymptomatic": 0.3,  # 30% of incident cases are asymptomatic
        "mild": 0.5,  # 50% of incident cases are mild
        "severe": 0.2,  # 20% of incident cases are severe
    },
)

# Add an adjustment to the 'infection_death' flow
severity_strat.set_flow_adjustments(
    "infection_death",
    {
        "asymptomatic": 0.5,
        "mild": None,
        "severe": 1.5,
    },
)

severity_strat.add_infectiousness_adjustments(
    "I",
    {
        "asymptomatic": 0.5,
        "mild": None,
        "severe": 1.5,
    },
)

# Build and run model with the stratification we just defined
partial_strat_model = build_model_with_flows(user_params)
partial_strat_model.stratify_with(severity_strat)

In [None]:
partial_strat_model.run()
outputs_df = partial_strat_model.get_outputs_df()
outputs_df.plot()

## Multiple stratifications

A model can have multiple stratifications applied in series. For example, we can add an 'age' stratification, followed by a 'severity' one.
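To see how many compartments result from applying a full 'age' stratification followed by a partial 'severity' one, here is a small bookkeeping sketch. The `stratify` helper and the tuple representation are hypothetical and are not how _summer_ stores compartments.

```python
def stratify(compartments, name, strata, targets):
    """Split each targeted compartment into one copy per stratum."""
    result = []
    for base, labels in compartments:
        if base in targets:
            result.extend((base, {**labels, name: stratum}) for stratum in strata)
        else:
            result.append((base, labels))
    return result


# Start from the unstratified SEIR compartments
model_compartments = [(comp, {}) for comp in ["S", "E", "I", "R"]]

# Age applies to every compartment; severity applies to 'I' only
model_compartments = stratify(
    model_compartments, "age", ["young", "old"], ["S", "E", "I", "R"]
)
model_compartments = stratify(
    model_compartments, "severity", ["asymptomatic", "mild", "severe"], ["I"]
)

infectious = [labels for base, labels in model_compartments if base == "I"]
print(len(model_compartments), len(infectious))
```

The S, E and R compartments each contribute two entries (one per age group), while I contributes six (two age groups times three severity levels), giving twelve in total.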

In [None]:
### Age stratification

# Get the age stratification
age_strat = get_age_stratification()

# Add an adjustment to the 'infection' flow
age_strat.set_flow_adjustments(
    "infection",
    {
        "old": None,  # No adjustment for old people, use unstratified parameter value
        "young": 2.0,  # Young people are twice as susceptible
    },
)

# Add an adjustment to infectiousness levels for young people in the 'I' compartment
age_strat.add_infectiousness_adjustments(
    "I",
    {
        "old": None,  # No adjustment for old people, use unstratified parameter value
        "young": 1.2,  # Young people are 1.2x as infectious
    },
)


### Disease severity stratification

# Get our severity stratification using the previously defined function
severity_strat = get_severity_strat()

# Add an adjustment to the 'progression' flow (overriding the default split of one third to each stratum)
severity_strat.set_flow_adjustments(
    "progression",
    {
        "asymptomatic": 0.3,  # 30% of cases are asympt.
        "mild": 0.5,  # 50% of cases are mild.
        "severe": 0.2,  # 20% of cases are severe.
    },
)

# Add an adjustment to the 'infection_death' flow
severity_strat.set_flow_adjustments(
    "infection_death",
    {
        "asymptomatic": 0.5,
        "mild": None,
        "severe": 1.5,
    },
)

severity_strat.add_infectiousness_adjustments(
    "I",
    {
        "asymptomatic": 0.5,
        "mild": None,
        "severe": 1.5,
    },
)


# Build and run model with the stratifications we just defined
multi_strat_model = build_model_with_flows(user_params)
# Apply age, then severity stratifications
multi_strat_model.stratify_with(age_strat)
multi_strat_model.stratify_with(severity_strat)

In [None]:
multi_strat_model.run()
outputs_df = multi_strat_model.get_outputs_df()
outputs_df.plot()

## Multiple interdependent stratifications

In the previous example we assumed that the age and severity stratifications were independent. For example, we assumed that the proportion of infected people who have a disease severity of asymptomatic, mild and severe is the same for both young and old people. Perhaps, for a given disease, this is not true: it's easy to imagine an infection for which younger people tend towards being more asymptomatic, and older people tend towards having a more severe infection.

This interdependency between stratifications can be modelled using Summer, where a flow adjustment for a stratification can selectively refer to strata used for previous stratifications. You can refer to the API reference for [set_flow_adjustments](http://summerepi.com/api/stratification.html#summer.stratification.Stratification.set_flow_adjustments) for more details.
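Conceptually, an age-dependent severity split is a lookup from the source age group to a set of proportions. The sketch below uses the proportions from the cell that follows and this notebook's progression rate; the nested dictionary is purely illustrative and is not a _summer_ data structure.

```python
import math

progression_rate = 1 / 3  # From user_params in this notebook

# Severity proportions that depend on the age group of the person progressing
severity_by_age = {
    "young": {"asymptomatic": 0.5, "mild": 0.4, "severe": 0.1},
    "old": {"asymptomatic": 0.1, "mild": 0.4, "severe": 0.5},
}

# Rate of progression from E (of a given age) into each I severity stratum
rates = {
    (age, severity): progression_rate * prop
    for age, proportions in severity_by_age.items()
    for severity, prop in proportions.items()
}

for key, rate in rates.items():
    print(key, round(rate, 4))
```

Within each age group the rates still add up to the original progression rate, so only the destination of the progressing people changes, not how quickly they leave the exposed compartment.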

To clarify, let's consider the example described above:

In [None]:
### Age stratification

# Get the age stratification
age_strat = get_age_stratification()

### Disease severity stratification (depends on the age stratification)
# Get the severity stratification
severity_strat = get_severity_strat()

# Add an adjustment to the 'progression' flow for young people
# where younger people tend towards asymptomatic infection
young_progression_adjustments = {
    "asymptomatic": 0.5,  # 50% of cases are asympt.
    "mild": 0.4,  # 40% of cases are mild.
    "severe": 0.1,  # 10% of cases are severe.
}

severity_strat.set_flow_adjustments(
    "progression",
    young_progression_adjustments,
    source_strata={
        "age": "young"
    },  # Only apply this adjustment to flows of young people
)

# Add an adjustment to the 'progression' flow for old people
# where older people tend towards severe infection
old_progression_adjustments = {
    "asymptomatic": 0.1,  # 10% of cases are asympt.
    "mild": 0.4,  # 40% of cases are mild.
    "severe": 0.5,  # 50% of cases are severe.
}

severity_strat.set_flow_adjustments(
    "progression",
    old_progression_adjustments,
    source_strata={"age": "old"},  # Only apply this adjustment to flows of old people
)

# Add an adjustment to the 'infection_death' flow (for all age groups)
severity_strat.set_flow_adjustments(
    "infection_death",
    {
        "asymptomatic": 0.5,
        "mild": None,
        "severe": 1.5,
    },
)

# Adjust infectiousness levels (for all age groups)
severity_strat.add_infectiousness_adjustments(
    "I",
    {
        "asymptomatic": 0.5,
        "mild": None,
        "severe": 1.5,
    },
)


# Build and run model with the stratifications we just defined
interact_strat_model = build_model_with_flows(user_params)
# Apply age, then severity stratifications
interact_strat_model.stratify_with(age_strat)
interact_strat_model.stratify_with(severity_strat)

In [None]:
interact_strat_model.run()
outputs_df = interact_strat_model.get_outputs_df()
outputs_df.plot()