# Capacity Building
## Prerequisites
- Some basic understanding of Python variables, data types, looping, conditionals and functions will be of benefit.
- Completion of 01-basic-model.ipynb

## Introduction
Once a suitable model structure has been identified to describe infectious disease dynamics, parameter values need to be inferred to describe the flows between the compartments. These flows determine how many individuals move into and out of each compartment per unit time. Parameter values can be estimated from existing knowledge, by fitting the model to available data, or through some combination of the two. They govern the rate of change in the compartments (e.g. the rate of change in the number of susceptible or immune individuals). When estimating these rates, the key consideration is the average time an individual waits before an event occurs, which is the reciprocal of the rate at which the event occurs. For example, average life expectancy and the mortality rate are related as follows:
<br><center>Average life expectancy = 1 / mortality rate.</center>
<br>The ODE system will be numerically solved behind the scenes using an ODE solver.
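
The reciprocal relationship between an average duration and a rate can be checked with a few lines of Python. This is only an illustrative sketch: the helper name `rate_from_duration` and the example durations are assumptions made here, not values estimated from any dataset.

```python
def rate_from_duration(mean_duration_days: float) -> float:
    """Convert an average time-to-event (in days) into a per capita daily rate."""
    if mean_duration_days <= 0:
        raise ValueError("The mean duration must be positive")
    return 1.0 / mean_duration_days


# Illustrative (assumed) durations, not estimates from data
mean_infectious_period = 10.0  # days
mean_life_expectancy = 75.0 * 365.0  # days

recovery = rate_from_duration(mean_infectious_period)
mortality = rate_from_duration(mean_life_expectancy)

print(f"Recovery rate: {recovery:.3f} per day")
print(f"Mortality rate: {mortality:.6f} per day")
```

A shorter average duration gives a larger rate, so individuals leave the compartment more quickly.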

## Data inputs
### Imports

Let's import some modules. A module is a library of Python code that we can leverage to provide useful functionality.<br> Modules may be part of the Python standard library, or they may come from external packages.

In [None]:
# Install the summer package
# Pip is Python's standard package manager

%pip install summerepi

In [None]:
# Python standard library imports come first
from datetime import datetime, timedelta 
from typing import List

# Then external package imports
import pandas as pd
import numpy as np
from summer import CompartmentalModel
from matplotlib import pyplot as plt 

plt.style.use("ggplot")

# Define constants
GITHUB_MOH = (
    "https://raw.githubusercontent.com/MoH-Malaysia/covid19-public/main/epidemic/"
)

MOH_FILES = [
    "cases_malaysia",
    "deaths_malaysia",
    "hospital",
    "icu",
    "cases_state",
    "deaths_state",
]

COVID_BASE_DATE = datetime(2019, 12, 31)

region = "Malaysia"

### Utility functions

In [None]:
def fetch_mys_data(base_url: str, file_list: List[str]) -> pd.DataFrame:
    """
    Request files from MoH and combine them into one data frame.
    """
    a_list = []
    for file in file_list:
        data_type = file.split('_')[0]
        df = pd.read_csv(base_url + file + ".csv")
        df['type']  = data_type
        a_list.append(df)
    df = pd.concat(a_list) 
    
    return df

Now call the function and pass it the MoH URL and the list of files.

In [None]:
df = fetch_mys_data(GITHUB_MOH, MOH_FILES)

# Same preprocessing steps as in notebook 1
df.loc[df['state'].isna(), 'state'] = 'Malaysia' 
df['date'] = pd.to_datetime(df['date'])
df['date_index'] = (df['date'] - COVID_BASE_DATE).dt.days

# Configure mask for analysis
mask = (df['state'] == region) & (df['type'] == 'cases')

Let's also download the latest population distributions from the MoH GitHub repository.

In [None]:
population_url = 'https://raw.githubusercontent.com/MoH-Malaysia/covid19-public/main/static/population.csv'
df_pop = pd.read_csv(population_url)
# Use .iloc[0] to take the first matching row, whatever its index label is
initial_population = df_pop.loc[df_pop['state'] == region, 'pop'].iloc[0]

## Build a model

In [None]:
start_date = datetime(2021,1,1)  # Define the start date
end_date = start_date + timedelta(days=300)  # Define the duration

# Integer representation of the start and end dates.
start_date_int = (start_date - COVID_BASE_DATE).days
end_date_int = (end_date - COVID_BASE_DATE).days
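
The model works in integer days counted from the reference date, so it is often handy to convert in both directions. The sketch below assumes the same reference date as `COVID_BASE_DATE`; the helper names `date_to_index` and `index_to_date` are hypothetical and are not part of summer.

```python
from datetime import datetime, timedelta

# Same reference date as COVID_BASE_DATE in the notebook
BASE_DATE = datetime(2019, 12, 31)


def date_to_index(date: datetime) -> int:
    """Number of whole days between the reference date and the given date."""
    return (date - BASE_DATE).days


def index_to_date(index: int) -> datetime:
    """Inverse of date_to_index."""
    return BASE_DATE + timedelta(days=index)


start_index = date_to_index(datetime(2021, 1, 1))
end_index = start_index + 300

print(start_index, end_index)
print(index_to_date(end_index).date())
```

With a start date of 1 January 2021 and a 300 day run, the model times span day 367 to day 667.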

Extract the target data from the MoH dataframe. We'll use this later.

In [None]:
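# Select rows by date_index (not by row position) so that the target series
# lines up with the model times
notifications_target = df[mask].set_index('date_index').loc[start_date_int:end_date_int, 'cases_new']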

## Structuring Model Code

We will wrap the model-building code in a function. Any parameterised input can be passed into this function, so the model can be easily modified.

In [None]:
# This is a very basic starting function that simply creates a CompartmentalModel
# and initialises it with the initial_population

# Note that we are not passing anything into this function yet;
# the start and end dates, and the population are 'captured' from the external scope
# i.e. the global program environment

def build_base_model() -> CompartmentalModel:
    model = CompartmentalModel(
        times=(start_date_int, end_date_int),
        compartments=["S", "E", "I", "R"],
        infectious_compartments=["I"],
        timestep=1.0,
    )

    model.set_initial_population(distribution={"S": initial_population - 100, "E": 0, "I": 100})
    return model

### Time varying parameters (transition flow)
The rate at which people transition can be set as a constant, or it can be defined as a function of time. This applies to all of the flows: every parameter can be either a constant or a function of time. A function used as a parameter also takes a ‘computed_values’ argument, which is a dictionary of values computed at runtime that are not specific to any individual flow.

In [None]:
# We define this function globally - 
# it will be used as an argument later when we create flows in the model

def recovery_rate(time, computed_values):
    """
    Returns the recovery rate for a given time.
    People recover faster from day 517 onwards due to a magic drug.
    """
    if time < 517:  # halfway through our analysis period (days 367 to 667)
        return 0.1
    else:
        return 0.4
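
If several parameters need this kind of step change, the pattern can be wrapped in a small factory. This is a sketch only: `make_step_rate` is a hypothetical helper, and it simply mirrors the two-argument `(time, computed_values)` signature used by `recovery_rate` above.

```python
from typing import Callable


def make_step_rate(change_time: float, rate_before: float, rate_after: float) -> Callable:
    """Build a time-varying parameter that switches value at change_time."""

    def rate(time, computed_values):
        # computed_values is accepted for signature compatibility but not used
        return rate_before if time < change_time else rate_after

    return rate


# Reproduce the behaviour of recovery_rate: 0.1 before day 517, 0.4 afterwards
step_recovery_rate = make_step_rate(change_time=517, rate_before=0.1, rate_after=0.4)

for t in (367, 516, 517, 667):
    print(t, step_recovery_rate(t, None))
```

The returned function could then be passed wherever `recovery_rate` is used, assuming summer accepts any callable with that signature.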

### Adding inter-compartmental flows 

Now, let's add some flows for people to transition between the compartments. These flows will define the dynamics of our infection. We will add:

- an infection flow from S to E (using frequency-dependent transmission)
- a progression flow from E to I (exposed individuals become infectious)
- a recovery flow from I to R
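
Taken together, these three flows correspond to the system of ordinary differential equations sketched below. The symbols are notation introduced here rather than names used by summer: β is the contact rate, σ the progression rate, γ(t) the (possibly time-varying) recovery rate and N the total population. The infection term assumes the standard frequency-dependent form.

```latex
\begin{aligned}
\frac{dS}{dt} &= -\beta \frac{S I}{N} \\
\frac{dE}{dt} &= \beta \frac{S I}{N} - \sigma E \\
\frac{dI}{dt} &= \sigma E - \gamma(t) I \\
\frac{dR}{dt} &= \gamma(t) I
\end{aligned}
```

The four right-hand sides sum to zero, so the total population stays constant.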

In [None]:
def build_model_with_hardcoded_flows() -> CompartmentalModel:
    
    # We start by calling our base model building function from above
    # This allows us to easily create variations, always building on the same base
    malaysia_model = build_base_model()
    # Susceptible people can get infected.
    malaysia_model.add_infection_frequency_flow(name="infection", contact_rate=0.18, source="S", dest="E")

    # Exposed people transition to infectious.
    malaysia_model.add_transition_flow(name="progression", fractional_rate=1/10, source="E", dest="I")

    # Infectious people recover.
    # Note that we are passing our 'recovery_rate' function, defined above,
    # as an argument to this flow. Many aspects of summer can be customised and extended 
    # in this way, by passing a function rather than a scalar
    malaysia_model.add_transition_flow(name="recovery", fractional_rate=recovery_rate, source="I", dest="R")

    # Importantly, we will also request an output for the 'progression' flow, and name this 'notifications'
    # This will be available after a model run using the get_derived_outputs_df() method

    malaysia_model.request_output_for_flow("notifications", "progression")

    return malaysia_model

In [None]:
model_with_flows = build_model_with_hardcoded_flows() 
# Inspect the new flows, which we just added to the model.
model_with_flows._flows

## Writing a better build_model - parameterization

In the function above, you can see several hardcoded constants inside the flow arguments.

Let's extract these to a dictionary, and rewrite the function to take this dictionary
as an argument.

In [None]:
parameters = {
    "contact_rate": 0.18,
    "progression_rate": 1/10
}
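
One benefit of keeping parameters in a dictionary is that variations can be derived without altering the original. The sketch below shows one way this might be done; `with_overrides` is a hypothetical helper and the alternative contact rate is an arbitrary illustrative value.

```python
def with_overrides(base: dict, **overrides) -> dict:
    """Return a copy of base with some values replaced; base is left untouched."""
    unknown = set(overrides) - set(base)
    if unknown:
        # Catch typos in parameter names early
        raise KeyError(f"Unknown parameter(s): {sorted(unknown)}")
    return {**base, **overrides}


base_parameters = {"contact_rate": 0.18, "progression_rate": 1 / 10}

# Illustrative alternative value, chosen arbitrarily
faster_spread = with_overrides(base_parameters, contact_rate=0.25)

print(base_parameters)
print(faster_spread)
```

Rejecting unknown keys is a design choice: a misspelled parameter name raises an error instead of being silently ignored.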

In [None]:
def build_model_with_flows(parameters: dict) -> CompartmentalModel:
    
    # Call the base model as before
    # This base model does not take parameters, but have a think about how it might...
    malaysia_model = build_base_model()
    # Susceptible people can get infected.
    # Note that we now look up the parameters dictionary instead of hardcoding a constant
    malaysia_model.add_infection_frequency_flow(
        name="infection", 
        contact_rate=parameters["contact_rate"], 
        source="S", 
        dest="E"
    )

    # Exposed people transition to infectious.
    malaysia_model.add_transition_flow(
        name="progression", 
        fractional_rate=parameters["progression_rate"], 
        source="E", 
        dest="I"
    )

    # Infectious people recover.
    malaysia_model.add_transition_flow(name="recovery", fractional_rate=recovery_rate, source="I", dest="R")

    # Importantly, we will also request an output for the 'progression' flow, and name this 'notifications'
    # This will be available after a model run using the get_derived_outputs_df() method

    malaysia_model.request_output_for_flow("notifications", "progression")

    return malaysia_model

In [None]:
malaysia_model = build_model_with_flows(parameters=parameters)

In [None]:
# Iterate through the flows in the model
# As you can see, they have the values we passed in from the dictionary
for f in malaysia_model._flows:
    print(f, f.param)

In [None]:
# Actually, that's handy - let's make it a function

def inspect_flows(model: CompartmentalModel):
    for f in model._flows:
        print(f, f.param)

In [None]:
# Exercise : build a model with some different parameter values
new_parameters = "create a dictionary here"

# What arguments does this need?
new_params_model = build_model_with_flows()

# Inspect the flows for this model
# Enter your code here...


### Running the model

Now we can calculate the outputs for the model over the requested time period. 
The model calculates the compartment sizes by solving a system of differential equations (defined by the flows we just added) over the requested time period.
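
To get a feel for what solving the system numerically means, here is a minimal fixed-step forward Euler sketch of a basic SEIR model in plain Python. It is not how summer solves the equations: the function name `euler_seir`, the step size, the population size and the constant recovery rate are all simplifying assumptions made for illustration.

```python
def euler_seir(population, initial_infectious, contact_rate, progression_rate,
               recovery_rate, n_days, step=0.1):
    """Integrate a basic SEIR model with fixed-step forward Euler."""
    s = float(population - initial_infectious)
    e, i, r = 0.0, float(initial_infectious), 0.0
    steps_per_day = round(1 / step)
    daily = [(s, e, i, r)]  # one record per whole day, starting at day 0
    for _ in range(n_days):
        for _ in range(steps_per_day):
            # Flow rates (people per day) at the current state
            infection = contact_rate * s * i / population
            progression = progression_rate * e
            recovery = recovery_rate * i
            # Move people between compartments for one small time step
            s -= infection * step
            e += (infection - progression) * step
            i += (progression - recovery) * step
            r += recovery * step
        daily.append((s, e, i, r))
    return daily


# Illustrative population size; the rates match the constants used above,
# except that the recovery rate is held constant for simplicity
trajectory = euler_seir(
    population=1_000_000,
    initial_infectious=100,
    contact_rate=0.18,
    progression_rate=0.1,
    recovery_rate=0.1,
    n_days=300,
)

final_s, final_e, final_i, final_r = trajectory[-1]
print(f"Recovered after 300 days: {final_r:,.0f}")
```

Each small step moves people between compartments according to the current flow rates, so the total population is conserved up to floating-point rounding.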

In [None]:
malaysia_model.run()

### Display the model outputs

The recommended way to access the model's results is via the get_outputs_df() method, which returns the compartment sizes over time as a pandas DataFrame.

In [None]:
mm_outputs_df = malaysia_model.get_outputs_df()
mm_outputs_df[["E","I","R"]].plot(figsize=(10,5)); # Don't plot the susceptible compartment because of y-axis scale

### Accessing derived outputs

Derived outputs are accessed in much the same way as the raw compartment outputs, via the get_derived_outputs_df() method.

**Question: Which flow control contributes the most to notifications? Would you increase it or decrease it?**

In [None]:
mm_derived_df = malaysia_model.get_derived_outputs_df()
mm_derived_df.plot(figsize=(10,5));

### Plot the outputs

Sometimes it can be useful to write more customised plotting code using matplotlib.

**Exercise: Modify the code below to show the susceptible individuals over time.**

In [None]:
# Visualise the results
subplot = {"title": "SEIR Model Outputs", "xlabel": "Days", "ylabel": "Compartment size"}
fig, ax = plt.subplots(1, 1, figsize=(10,5), subplot_kw=subplot)

for compartment in mm_outputs_df[["E","I","R"]]: # Loop over each compartment. 
    ax.plot(malaysia_model.times, mm_outputs_df[compartment]) # Plot the times and compartment values

ax.legend(["E", "I", "R"]);

### Function flow

A function flow gives you more control over how a flow should work. Use one when you need to include more complex behaviour in your model that cannot be expressed using the built-in flows above.


In [None]:
def get_vaccination_flow_rate(flow, comp_names, comp_vals, flows, flow_rates, derived_values, time):
    """
    Returns the flow-rate of susceptible people who get vaccinated and become recovered.

    Args:
        flow: The flow object being run
        comp_names: List of compartment names (Compartment)
        comp_vals: Array of compartment values at this timestep
        flows: List of flow objects (used to calculate flow rates)
        flow_rates: Calculated flow rate for each non-function flow at this timestep
        derived_values: Values computed at runtime for this timestep (not used here)
        time: Current timestep

    Returns: The flow rate (float)
    """
    if time < 450:
        # Vaccinate 500 people per day until day 450
        return 500
    elif 450 <= time < 500:
        # Vaccinate a tiny fraction of the population per day until day 500
        return 0.00001 * comp_vals.sum()
    else:
        # After day 500 stop vaccinations, because we ran out of money
        return 0
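
To see what this schedule implies over the model run, the daily rates can be added up phase by phase. The sketch below restates the schedule as a plain function so that it runs on its own; `vaccinations_per_day` and the round population figure are assumptions for illustration, and summing daily values only approximates what the ODE solver integrates.

```python
def vaccinations_per_day(time: float, total_population: float) -> float:
    """Same schedule as get_vaccination_flow_rate, written as a plain function."""
    if time < 450:
        return 500.0
    elif time < 500:
        return 0.00001 * total_population
    return 0.0


# Assumed round number for illustration, not the MoH population figure
assumed_population = 32_000_000

# Model days run from 367 (1 January 2021) up to 667
doses_by_phase = {"fixed": 0.0, "proportional": 0.0, "stopped": 0.0}
for day in range(367, 667):
    if day < 450:
        phase = "fixed"
    elif day < 500:
        phase = "proportional"
    else:
        phase = "stopped"
    doses_by_phase[phase] += vaccinations_per_day(day, assumed_population)

print(doses_by_phase)
```

Under these assumptions, most vaccinations are delivered during the fixed-rate phase, and the total is small relative to the population.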


**Question: Between which two compartments should a vaccinated individual transition?**

In [None]:
# Use a custom function to model vaccinations
# We could do it like this, operating directly on our malaysia_model object...

malaysia_model.add_function_flow("vaccination", flow_rate_func=get_vaccination_flow_rate, source="S", dest="R")

In [None]:
malaysia_model.run()
malaysia_model.get_outputs_df()[["E","I","R"]].plot()

In [None]:
# Now, let's rerun the last 2 cells a few times - notice anything happening?

In [None]:
# That's obviously a problem!
# The answer is again to rewrite this in a functional style...
# We can then use different parameters for the vaccination model, and
# as a bonus, we can use a single function that turns vaccination on or off
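
The difference between mutating a shared object and building a fresh one can be shown with a toy example. `ToyModel` below is a hypothetical stand-in that only records flow names; it is not summer's `CompartmentalModel`, and the sketch assumes that each rerun of the cell appends another flow to the same object.

```python
class ToyModel:
    """Hypothetical stand-in for a model object; it only records flow names."""

    def __init__(self):
        self.flows = []

    def add_flow(self, name: str):
        self.flows.append(name)


# Mutating one shared object: every rerun of the "cell" adds another copy
shared_model = ToyModel()
for _ in range(3):  # simulate running the same cell three times
    shared_model.add_flow("vaccination")


def build_toy_model(vaccination: bool) -> ToyModel:
    """Functional style: always start from a fresh object."""
    model = ToyModel()
    if vaccination:
        model.add_flow("vaccination")
    return model


# Calling the builder three times gives three independent models
fresh_models = [build_toy_model(vaccination=True) for _ in range(3)]

print(len(shared_model.flows))
print([len(m.flows) for m in fresh_models])
```

The shared object accumulates one flow per rerun, whereas each call to the builder returns a model with exactly one vaccination flow.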

In [None]:
def build_vaccination_model(parameters: dict, vaccination: bool) -> CompartmentalModel:
    model = build_model_with_flows(parameters)
    if vaccination:
        model.add_function_flow(
            "vaccination", 
            flow_rate_func=get_vaccination_flow_rate, 
            source="S", 
            dest="R"
        )
    return model

In [None]:
# We already created this earlier in the notebook,
# but let's specify it again here, just for clarity
parameters = {
    "contact_rate": 0.18,
    "progression_rate": 1/10
}

In [None]:
model_vacc = build_vaccination_model(parameters, True)
model_unvacc = build_vaccination_model(parameters, False)

In [None]:
model_vacc.run()
model_unvacc.run()

In [None]:
# Let's see how this affected the notifications

vacc_derived_outputs = model_vacc.get_derived_outputs_df()
unvacc_derived_outputs = model_unvacc.get_derived_outputs_df()

vacc_outputs = model_vacc.get_outputs_df()
unvacc_outputs = model_unvacc.get_outputs_df()

vacc_derived_outputs.plot()
unvacc_derived_outputs.plot()

In [None]:
# That's probably a little hard to see - let's write a function we can reuse
# for model comparison

def plot_comparison(label_a: str, data_a: pd.Series, label_b: str, data_b: pd.Series, title: str):
    # Visualize the results.
    subplot = {"title": title, "xlabel": "Days", "ylabel": "Value"}
    fig, ax = plt.subplots(1, 1, figsize=(10,5), subplot_kw=subplot)

    ax.plot(data_a.index, data_a) 
    ax.plot(data_b.index, data_b)

    ax.legend([label_a, label_b]);

In [None]:
plot_comparison(
    "Vacc", vacc_derived_outputs["notifications"],
    "Unvacc", unvacc_derived_outputs["notifications"],
    "Notifications"
)

In [None]:
# How would you use the plot_comparison to view other outputs..? 

In [None]:
# Let's allow for the fact that case detection is never complete,
# by multiplying the model outputs through by a constant value
proportion_of_cases_detected = 0.5

plot_comparison("Notification", 
                notifications_target, 
                "Modelled",
                vacc_derived_outputs["notifications"] * proportion_of_cases_detected,
                "Modelled vs data")

In [None]:
# This cell pulls all of the above together; 
# Try changing the various values

parameters = {
    "contact_rate": 0.18, 
    "progression_rate": 0.1
}

vacc_model = build_vaccination_model(parameters, True)
vacc_model.run()
vacc_derived_outputs = vacc_model.get_derived_outputs_df()

proportion_of_cases_detected = 0.5

plot_comparison("Notification", 
                notifications_target, 
                "Modelled",
                vacc_derived_outputs["notifications"] * proportion_of_cases_detected,
                "Modelled vs data")

## Summary

That's it for now. You now know how to:

- Define compartmental flows
- Understand the different flow types in summer
- Build reusable model code with a basic expression of parameters

A detailed API reference for the flow types can be found [here](http://summerepi.com/api/flows.html)
