# Capacity Building
## Prerequisites
- A basic understanding of Python variables, data types, loops, conditionals and functions will help.
- Completion of 01-basic-model.ipynb
## Data inputs
### Imports

Let's import some modules. A module is a library of Python code that provides useful functionality.<br> Modules may be part of the Python standard library or come from external packages.

In [None]:
# Install the summer package
# Pip is Python's standard package manager

%pip install summerepi

In [None]:
# Python standard library imports come first
from datetime import datetime, timedelta 
from typing import List

# Then external package imports
import pandas as pd
import numpy as np
from summer import CompartmentalModel
from matplotlib import pyplot as plt 

plt.style.use("ggplot")

# Define constants
GITHUB_MOH = (
    "https://raw.githubusercontent.com/MoH-Malaysia/covid19-public/main/epidemic/"
)

MOH_FILES = [
    "cases_malaysia",
    "deaths_malaysia",
    "hospital",
    "icu",
    "cases_state",
    "deaths_state",
]

COVID_BASE_DATE = datetime(2019, 12, 31)

region = "Malaysia"

### Utility functions

In [None]:
def fetch_mys_data(base_url: str, file_list: List[str]) -> pd.DataFrame:
    """
    Request files from MoH and combine them into one data frame.
    """
    a_list = []
    for file in file_list:
        data_type = file.split('_')[0]
        df = pd.read_csv(base_url + file + ".csv")
        df['type']  = data_type
        a_list.append(df)
    df = pd.concat(a_list) 
    
    return df
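The type label assigned inside fetch_mys_data is simply the part of each file name before the first underscore. The short sketch below uses only the file names from MOH_FILES, with no network access, to show the labels that result. Names without an underscore, such as hospital and icu, are used whole.

```python
# File names copied from MOH_FILES above.
moh_files = [
    "cases_malaysia",
    "deaths_malaysia",
    "hospital",
    "icu",
    "cases_state",
    "deaths_state",
]

# Same rule as in fetch_mys_data: keep the text before the first underscore.
labels = {name: name.split("_")[0] for name in moh_files}

for name, label in labels.items():
    print(f"{name:>16} -> {label}")
```

This is why the mask used later can select both national and state-level case counts with the single label 'cases'.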

Now call the function and pass it the MoH URL.<br> Well done! We have loaded Malaysia's entire national and state-level Covid-19 dataset into one dataframe.

In [None]:
df = fetch_mys_data(GITHUB_MOH, MOH_FILES)

# Some preprocessing steps
df.loc[df['state'].isna(), 'state'] = 'Malaysia' 
df['date'] = pd.to_datetime(df['date'])
df['date_index'] = (df['date'] - COVID_BASE_DATE).dt.days

# Configure mask for analysis.
mask = (df['state'] == region) & (df['type'] == 'cases')

Let's also download the latest population distributions from the MoH GitHub repository.

In [None]:
population_url = 'https://raw.githubusercontent.com/MoH-Malaysia/covid19-public/main/static/population.csv'
df_pop = pd.read_csv(population_url)
# Use .iloc[0] to take the first matching row regardless of its index label.
initial_population = df_pop[df_pop['state'] == region]['pop'].iloc[0]

## Build a model

In [None]:
start_date = datetime(2021,1,1)  # Define the start date
end_date = start_date + timedelta(days=300)  # Define the duration

# Integer representation of the start and end dates.
start_date_int = (start_date - COVID_BASE_DATE).days
end_date_int = (end_date- COVID_BASE_DATE).days
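Because the model works with integer day indices, it can help to convert in both directions. The sketch below is a minimal illustration using only the standard library; the helper names to_date_index and from_date_index are hypothetical and not part of summer.

```python
from datetime import datetime, timedelta

COVID_BASE_DATE = datetime(2019, 12, 31)  # Same reference date as above


def to_date_index(date: datetime) -> int:
    """Number of whole days between the base date and the given date."""
    return (date - COVID_BASE_DATE).days


def from_date_index(index: int) -> datetime:
    """Inverse of to_date_index: recover the calendar date."""
    return COVID_BASE_DATE + timedelta(days=index)


start_index = to_date_index(datetime(2021, 1, 1))
end_index = start_index + 300

print(start_index, end_index)  # 367 667
print(from_date_index(end_index).date())  # 2021-10-28
```

With these values the run covers days 367 to 667, so day 517 is the midpoint.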

Extract the target data from the MoH dataframe.

In [None]:
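# Select rows by date_index rather than by row position, so the target lines up with the model times.
in_window = df['date_index'].between(start_date_int, end_date_int - 1)
target = df[mask & in_window]['cases_new']
x_range = df[mask & in_window]['date_index']  # Integer day indices matching the target values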

In [None]:
# Define the model
malaysia_model = CompartmentalModel(
    times=(start_date_int, end_date_int),
    compartments=["S", "E", "I", "R"],
    infectious_compartments=["I"],
    timestep=1.0,
)

malaysia_model.set_initial_population(distribution={"S": initial_population - 100, "E": 0, "I": 100})

### Time varying parameters (transition flow)
The rate at which people transition can be set as a constant or defined as a function of time. This is true of every flow: each parameter can be either a constant or a function of time. A parameter function also takes a computed_values argument, a dictionary of values computed at runtime that are not specific to any individual flow.

In [None]:
def recovery_rate(time, computed_values):
    """
    Returns the recovery rate for a given time.
    People recover faster from day 517 onwards due to a magic drug.
    """
    if time < 517:  # Halfway through our analysis (days 367 to 667)
        return 0.1
    else:
        return 0.4
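For a constant fractional rate, the average time spent in a compartment is the reciprocal of the rate, so the two values above imply very different infectious periods. The sketch below repeats the recovery_rate step function so that it runs on its own; mean_infectious_days is a hypothetical helper added for illustration.

```python
def recovery_rate(time, computed_values):
    """Same step function as above: 0.1 per day before day 517, 0.4 from then on."""
    return 0.1 if time < 517 else 0.4


def mean_infectious_days(rate: float) -> float:
    """Average duration implied by a constant per-day exit rate."""
    return 1.0 / rate


for day in (400, 516, 517, 600):
    rate = recovery_rate(day, None)
    print(day, rate, mean_infectious_days(rate))
```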

### Adding inter-compartmental flows 

Now, let's add some flows for people to transition between the compartments. These flows will define the dynamics of our infection. We will add:

- an infection flow from S to E (using frequency-dependent transmission)
- a progression flow from E to I (exposed individuals become infectious)
- a recovery flow from I to R
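To make the link with differential equations concrete, the three flows listed above can be written out by hand. The following is a minimal plain-Python sketch, not summer's implementation; the function name seir_derivatives is hypothetical and the default parameter values are for illustration only. Frequency-dependent transmission is assumed here to mean that the infection rate scales with the infectious fraction I / N.

```python
def seir_derivatives(s, e, i, r, contact_rate=0.18, progression_rate=0.1, recovery_rate=0.1):
    """Return (dS/dt, dE/dt, dI/dt, dR/dt) for the three flows."""
    n = s + e + i + r
    infection = contact_rate * s * i / n  # S -> E
    progression = progression_rate * e    # E -> I
    recovery = recovery_rate * i          # I -> R
    return (-infection, infection - progression, progression - recovery, recovery)


# Hypothetical starting state: 100 infectious people in a population of one million.
ds, de, di, dr = seir_derivatives(s=999_900, e=0, i=100, r=0)
print(ds, de, di, dr)
```

Each flow is subtracted from its source compartment and added to its destination, so the four derivatives sum to zero and the total population is conserved.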

In [None]:
# Susceptible people can get infected.
malaysia_model.add_infection_frequency_flow(name="infection", contact_rate=0.18, source="S", dest="E")

# Exposed people become infectious.
malaysia_model.add_transition_flow(name="progression", fractional_rate=1/10, source="E", dest="I")

# Infectious people recover.
malaysia_model.add_transition_flow(name="recovery", fractional_rate=recovery_rate, source="I", dest="R")

# Importantly, we will also request an output for the 'progression' flow, and name this 'notifications'
# This will be available after a model run using the get_derived_outputs_df() method

malaysia_model.request_output_for_flow("notifications", "progression")

# Inspect the new flows, which we just added to the model.
malaysia_model._flows



### Running the model

Now we can calculate the outputs for the model over the requested time period. 
The model calculates the compartment sizes by solving a system of differential equations (defined by the flows we just added) over the requested time period.

In [None]:
# Use Runge-Kutta 4 solver to better capture sharp discontinuity.
malaysia_model.run(solver="rk4")
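Solver choice affects accuracy at a fixed time step. The sketch below compares a single forward Euler step with a single classical fourth-order Runge-Kutta step on a simple decay equation, dy/dt = -k y, whose exact solution is known. It illustrates the numerical method only and makes no claim about how summer implements its solvers; all names are hypothetical.

```python
import math


def decay(t, y, k=0.4):
    """Right-hand side of dy/dt = -k * y."""
    return -k * y


def euler_step(f, t, y, h):
    """One forward Euler step of size h."""
    return y + h * f(t, y)


def rk4_step(f, t, y, h):
    """One classical fourth-order Runge-Kutta step of size h."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h * k1 / 2)
    k3 = f(t + h / 2, y + h * k2 / 2)
    k4 = f(t + h, y + h * k3)
    return y + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6


y0, h = 100.0, 1.0
exact = y0 * math.exp(-0.4 * h)
euler_error = abs(euler_step(decay, 0.0, y0, h) - exact)
rk4_error = abs(rk4_step(decay, 0.0, y0, h) - exact)
print(euler_error, rk4_error)
```

With a one-day step, the Runge-Kutta error is far smaller than the Euler error for this equation.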


### Print the model outputs

The recommended way to view the model's results is via the get_outputs_df() method.

In [None]:
mm_outputs_df = malaysia_model.get_outputs_df()
mm_outputs_df[["E", "I", "R"]].plot(figsize=(10,5)); # We don't plot the susceptible compartment because it would dominate the y-axis scale.

### Accessing derived outputs

Derived outputs are accessed in much the same way as the raw compartment outputs, via the get_derived_outputs_df() method.

**Question: Which flow parameter contributes the most to notifications? Would you increase it or decrease it?**

In [None]:
mm_derived_df = malaysia_model.get_derived_outputs_df()
mm_derived_df.plot(figsize=(10,5));

### Plot the outputs

You can get a better idea of what is going on inside the model by visualising how the compartment sizes change over time.

**Exercise: Modify the code below to show the susceptible individuals over time.**

In [None]:
# Visualize the results.
subplot = {"title": "SEIR Model Outputs", "xlabel": "Days", "ylabel": "Compartment size"}
fig, ax = plt.subplots(1, 1, figsize=(10,5), subplot_kw=subplot)

for compartment in mm_outputs_df[['E',"I","R"]]: # Loop over each compartment. 
    ax.plot(malaysia_model.times, mm_outputs_df[compartment]) # Plot the times and compartment values

ax.legend(["E", "I", "R"]);

### Function flow

A function flow gives you more control over how a flow works. Use one when you need to include more complex behaviour in your model that cannot be expressed using the built-in flows above.


In [None]:
def get_vaccination_flow_rate(flow, comp_names, comp_vals, flows, flow_rates, derived_values, time):
    """
    Returns the flow-rate of susceptible people who get vaccinated and become recovered.

    Args:
        flow: The flow object being run
        comp_names: List of compartment names (Compartment)
        comp_vals: Array of compartment values at this timestep
        flows: List of flow objects (used to calculate flow rates)
        flow_rates: Calculated flow rate for each non-function flow at this timestep
        derived_values: Values computed at runtime (not used here)
        time: Current timestep

    Returns: The flow rate (float)
    """
    if time < 450:
        # Vaccinate 100 people per day until day 450
        return 100
    elif 450 <= time < 500:
        # Vaccinate a tiny fraction of the population per day until day 500
        return 0.000001 * comp_vals.sum()
    else:
        # After day 500 stop vaccinations, because we ran out of money
        return 0
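A quick way to check a piecewise schedule like this is to evaluate it at a few times on either side of each threshold. The sketch below uses a simplified stand-in, vaccination_rate, which takes the total population directly rather than the full set of function-flow arguments. This name and signature are hypothetical, the population figure is made up for the example, and day 450 itself is assumed to belong to the second phase.

```python
def vaccination_rate(time: float, total_population: float) -> float:
    """Simplified stand-in for the three-phase schedule (assumption: day 450 starts phase two)."""
    if time < 450:
        # Phase one: a fixed number of people per day
        return 100.0
    elif time < 500:
        # Phase two: a tiny fraction of the population per day
        return 0.000001 * total_population
    # Phase three: vaccinations have stopped
    return 0.0


population = 30_000_000  # Made-up round number for illustration
checks = {t: vaccination_rate(t, population) for t in (449, 450, 499, 500)}
print(checks)
```

Checking the boundary days explicitly helps catch a threshold that falls through to the wrong branch.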


**Question: A vaccinated individual should transition between which two compartments?**

In [None]:
# Use a custom function to model vaccinations
malaysia_model.add_function_flow("vaccination", flow_rate_func=get_vaccination_flow_rate, source="S", dest="R")

# Use Runge-Kutta 4 solver to better capture sharp discontinuity.
malaysia_model.run(solver="rk4")

In [None]:
mm_derived_df = malaysia_model.get_derived_outputs_df()
mm_derived_df.plot(figsize=(10,5));

In [None]:
# Let's allow for the fact that case detection is never complete,
# by multiplying the model outputs through by a constant value
proportion_of_cases_detected = 0.5

fig, ax = plt.subplots(1, 1, figsize=(12, 6), dpi=120)
ax.plot(x_range, target)  # Plot the MoH target values
ax.plot(malaysia_model.times, mm_derived_df["notifications"] * proportion_of_cases_detected)
ax.legend(["Reported cases", "Modelled notifications"]);

## Summary

That's it for now. You now know how to:

- Define compartmental flows
- Use the different flow types in summer
- Relate these flows to differential equations


A detailed API reference for the flow types can be found [here](http://summerepi.com/api/flows.html).
