# Capacity Building
## Prerequisites
Some basic understanding of Python variables, data types, looping, conditionals and functions is required.
## Data inputs
### Import modules/packages

Let's import some modules/packages. A module or package is pre-built Python code that we can leverage to provide useful functionality.<br> Generally, it will have the form of *module_name.function*
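
As a small illustration of the *module_name.function* form, the sketch below uses `math`, a module from Python's standard library that is not otherwise used in this notebook.

```python
import math  # A standard-library module, so no installation is needed.

# Call the sqrt function that lives inside the math module.
root = math.sqrt(16)
print(root)  # 4.0
```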

In [None]:
%pip install summerepi

In [None]:
import pandas as pd # pd is an alias for pandas, which provides DataFrames similar to data frames in R.
import matplotlib.pyplot as plt # matplotlib is the de facto visualisation package for Python.
from datetime import datetime, timedelta # We also use datetime to manipulate date-time indexes.

plt.style.use('ggplot') # This sets the style of the plots. 


Try: `plt.style` has an attribute that lists the available styles. Change the plotting style to something you like.

### Define constants
Defining constants at the start of a Python script, with capitalised names, is recommended.

In [None]:
# URL to the Ministry of Health's GitHub repository.
# What is the data type here, a tuple or string? Do you know how to check for the type?
GITHUB_MOH = (
    "https://raw.githubusercontent.com/MoH-Malaysia/covid19-public/main/epidemic/"
)

# A list containing the files to download.
FILES = [
    "cases_malaysia",
    "deaths_malaysia",
    "hospital",
    "icu",
    "cases_state",
    "deaths_state",
]

# By defining a region parameter, we can easily change the analysis later.
REGION = "Malaysia"

# We define a day zero for the analysis.
COVID_BASE_DATE = datetime(2019, 12, 31)


### Utility functions

In [None]:
def fetch_mys_data(a_url:str)->pd.DataFrame:
    """Request files from MoH and combine them into one data frame.
    Args:
        a_url (str): A url to fetch data from.

    Returns:
        pd.DataFrame: A data frame containing all the files.
    """
    a_list = [] # An empty list to hold each data frame. (A list can hold any Python object.)
    for file in FILES: # Loop over each file name.
        data_type = file.split('_')[0] # Split the file name on '_' and take the first part.
        df = pd.read_csv(a_url + file + ".csv") # Build the full url path to the file and ask pandas to download it. 
        df['type']  = data_type # Create a new column 'type' and enter the data_type.

        a_list.append(df) # Place this dataframe into the list. 

    # We have looped over all the files, downloaded them and placed them in a list of shape [df1,df2,df3,...].
    
    # Pandas will automatically combine this list into a single dataframe. It will expand the rows and columns as necessary.
    df = pd.concat(a_list) 
    
    return df # The function returns the dataframe.

Now call the function and pass it the MoH URL.<br> Well done! We have downloaded Malaysia's entire national and regional Covid-19 dataset into one dataframe.

In [None]:
df = fetch_mys_data(GITHUB_MOH)
df
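
The row and column expansion that `pd.concat` performs inside `fetch_mys_data` can be seen on a small offline example. This is only a sketch: the two toy frames and their numbers are made up for illustration and are not MoH data.

```python
import pandas as pd

# Two toy frames that share 'date' but differ in their other column.
cases = pd.DataFrame({"date": ["2021-01-01", "2021-01-02"], "cases_new": [10, 12]})
deaths = pd.DataFrame({"date": ["2021-01-01", "2021-01-02"], "deaths_new": [1, 0]})
cases["type"] = "cases"
deaths["type"] = "deaths"

# Rows are stacked; a column missing from one frame is filled with NaN for its rows.
combined = pd.concat([cases, deaths])
print(combined)
```

The original row labels are kept (0, 1, 0, 1 here). Passing `ignore_index=True` to `pd.concat` would renumber them instead.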

In [None]:
df.columns

In [None]:
df['state']

We need to do some housekeeping.
- Fill the missing state values with 'Malaysia'
- Ensure the date type is correct and not a string '10-06-2022'
- Create an integer offset from COVID_BASE_DATE. 

In [None]:
df.loc[df['state'].isna(), 'state'] = 'Malaysia' 
df['date'] = pd.to_datetime(df['date'])
df['date_index'] = (df['date'] - COVID_BASE_DATE).dt.days
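
To see what these three housekeeping steps do, here is a minimal sketch on a two-row toy frame. The rows are made up, and `BASE_DATE` is a stand-in with the same value as `COVID_BASE_DATE`.

```python
import pandas as pd
from datetime import datetime

BASE_DATE = datetime(2019, 12, 31)  # Same day zero as COVID_BASE_DATE above.

# Toy data (made up): one national row with no state, and one state row.
toy = pd.DataFrame({"date": ["2020-01-01", "2021-01-01"], "state": [None, "Johor"]})

toy.loc[toy["state"].isna(), "state"] = "Malaysia"     # Fill the missing state.
toy["date"] = pd.to_datetime(toy["date"])              # Convert strings to datetimes.
toy["date_index"] = (toy["date"] - BASE_DATE).dt.days  # Whole days since day zero.
print(toy)
```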

Let's create a boolean mask to aid with our analysis. Recall the 'REGION' variable we set at the beginning and the type column we created while downloading the data.<br>

We define a FILTER. In this example, it's for Malaysia's cases. By changing the 'REGION' variable and/or the value compared against the type column, we can change the focus of the analysis.

In [None]:
FILTER = (df['state'] == REGION) & (df['type'] == 'cases')

In [None]:
df[FILTER][['date', 'cases_new','deaths_new']] # Notice how the death data is NaN due to the filtering.
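
The way such a boolean mask is built and applied can be sketched on a toy frame. The data below is made up and only mimics the `state` and `type` columns of the combined dataframe.

```python
import pandas as pd

# Toy data (made up) in the same long layout as the combined MoH frame.
toy = pd.DataFrame({
    "state": ["Malaysia", "Malaysia", "Johor", "Johor"],
    "type": ["cases", "deaths", "cases", "deaths"],
    "value": [100, 5, 20, 1],
})

# Each comparison yields a boolean Series; & combines them element-wise.
# The parentheses are required because & binds more tightly than ==.
mask = (toy["state"] == "Malaysia") & (toy["type"] == "cases")
print(mask.tolist())  # [True, False, False, False]
print(toy[mask])      # Only the rows where the mask is True.
```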

After all that work, let's look at the results.<br />
Pandas has a .plot() function. Here is a [quick](https://pandas.pydata.org/docs/getting_started/intro_tutorials/04_plotting.html) or [detailed](https://pandas.pydata.org/pandas-docs/stable/user_guide/visualization.html?highlight=plot) tutorial.<br />
We can also use `x='date'` instead of `x='date_index'` and change `y` to any `cases_` column.

In [None]:
df[FILTER].plot(x='date_index', y='cases_new', figsize=(20, 10));

Let's also download the latest population distributions from the MoH GitHub repository.

In [None]:
POPULATION = 'https://raw.githubusercontent.com/MoH-Malaysia/covid19-public/main/static/population.csv'
df_pop = pd.read_csv(POPULATION)

In [None]:
df_pop

In [None]:
initial_population = df_pop[df_pop['state'] == REGION]['pop'].iloc[0] # .iloc[0] takes the first matching row, whatever its row label.

## Basic model introduction

This page introduces the processes for building and running a simple compartmental disease model with Summer.
In the following example, we will create an SEIR compartmental model for a general, unspecified emerging infectious disease spreading through a fully susceptible population. In this model there will be:

- four compartments: susceptible (S), exposed (E), infected (I) and recovered (R)
- a starting population of 100,000 susceptible people (a round placeholder; `initial_population` for the REGION could be used instead), with a further 100 infected (and infectious)
- an evaluation timespan from START_DATE to END_DATE in 0.1 day steps
- inter-compartmental flows for exposure, infection and recovery (a death flow is included in the code but commented out)
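
Written as differential equations, the flows listed above take the standard SEIR form shown below. This is a sketch under stated assumptions: transmission is frequency-dependent with contact rate $\beta$, exposed people progress to infected at rate $\sigma$, and infected people recover at rate $\gamma$. These symbols are introduced here for illustration only and do not appear in the Summer code; $N = S + E + I + R$ is the total population.

$$
\begin{aligned}
\frac{dS}{dt} &= -\beta \frac{S I}{N} \\
\frac{dE}{dt} &= \beta \frac{S I}{N} - \sigma E \\
\frac{dI}{dt} &= \sigma E - \gamma I \\
\frac{dR}{dt} &= \gamma I
\end{aligned}
$$

In the example code, $\beta = 0.12$, $\sigma = 1/15$ and $\gamma = 0.04$ per day.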

First, let's look at a complete example of this model in action, and then examine the details of each step. This is the complete example model that we will be working with:

In [None]:
import numpy as np
from summer import CompartmentalModel

START_DATE = datetime(2021,1,1) # Define the start date.
END_DATE = START_DATE + timedelta(days=300) # Define the duration.

# Integer representation of the start and end dates.
START_DATE_INT = (START_DATE - COVID_BASE_DATE).days
END_DATE_INT = (END_DATE - COVID_BASE_DATE).days
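
The conversion between dates and integer day offsets can be sketched with two small helpers. `date_to_int` and `int_to_date` are hypothetical names used only for this illustration, and `BASE_DATE` mirrors `COVID_BASE_DATE`.

```python
from datetime import datetime, timedelta

BASE_DATE = datetime(2019, 12, 31)  # Same day zero as COVID_BASE_DATE above.

def date_to_int(date: datetime) -> int:
    """Hypothetical helper: whole days elapsed since BASE_DATE."""
    return (date - BASE_DATE).days

def int_to_date(offset: int) -> datetime:
    """Hypothetical helper: the inverse of date_to_int."""
    return BASE_DATE + timedelta(days=offset)

start_int = date_to_int(datetime(2021, 1, 1))
print(start_int)               # 367 (2020 was a leap year)
print(int_to_date(start_int))  # 2021-01-01 00:00:00
```

Converting back to dates is handy when labelling a plot whose x axis is in integer days.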

In [None]:
# Define the model compartments and time step.
model = CompartmentalModel(
    times=(START_DATE_INT, END_DATE_INT),
    compartments=["S", "E", "I", "R"],
    infectious_compartments=["I"],
    timestep=0.1,
)

In [None]:
# Define the initial population and compartmental flows.
model.set_initial_population(distribution={"S": 100000, "E": 0, "I": 100})
model.add_infection_frequency_flow(name="exposure", contact_rate=0.12, source="S", dest="E")
model.add_transition_flow(name="infection", fractional_rate=1/15, source="E", dest="I")
model.add_transition_flow(name="recovery", fractional_rate=0.04, source="I", dest="R")
#model.add_death_flow(name="infection_death", death_rate=0.05, source="I")

# Run the model
model.run()


Our `model` object has many methods attached to it, called as `model.method_name()`. You are encouraged to explore them, as this object is integral to the platform.

In [None]:
output_df = model.get_outputs_df()

We now have a Pandas dataframe of compartment sizes at each time step.

In [None]:
output_df.head(20)
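
A dataframe of this kind can be summarised with ordinary pandas operations. The sketch below uses a small made-up frame and assumes the same layout as `output_df`: one row per time step and one column per compartment.

```python
import pandas as pd

# Toy stand-in (made-up numbers): index = time, columns = compartments.
toy_outputs = pd.DataFrame(
    {
        "S": [990.0, 900.0, 700.0, 500.0],
        "I": [10.0, 80.0, 150.0, 120.0],
        "R": [0.0, 20.0, 150.0, 380.0],
    },
    index=[0.0, 1.0, 2.0, 3.0],
)

peak_time = toy_outputs["I"].idxmax()  # Index label (time) at which I is largest.
peak_size = toy_outputs["I"].max()     # Size of I at that time.
totals = toy_outputs.sum(axis=1)       # Total population at each time step.
print(peak_time, peak_size)  # 2.0 150.0
```

Checking that `totals` stays constant is a quick sanity test for a model without births or deaths.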

Extract the target data from the MoH dataframe.

In [None]:
# Select rows by date_index (not by row position) so the target lines up with the model times.
target = df[FILTER].set_index('date_index')['cases_new'].loc[START_DATE_INT:END_DATE_INT - 1]
xrange = target.index # The integer day offsets of the selected rows.

Useful Matplotlib [guide](https://matplotlib.org/stable/tutorials/introductory/usage.html#sphx-glr-tutorials-introductory-usage-py)

In [None]:
# Visualize the results.
subplot = {"title": "SEIR Model Outputs", "xlabel": "Days", "ylabel": "Compartment size"} # A dictionary of key: value pairs that matplotlib will use to label items.
fig, ax = plt.subplots(1, 1, figsize=(12, 6), dpi=120, subplot_kw=subplot) # Create a subplot object.

for compartment in output_df: # Loop over each compartment. 
    ax.plot(model.times, output_df[compartment]) # Plot the times and compartment values

ax.plot(xrange, target) # Also plot the MoH target values.

ax.legend(["S", "E","I", "R","Cases"])
plt.show();


Now let's inspect each step of the example in more detail. To start, here's how to create a new model: let's import the summer library and create a new [CompartmentalModel](/api/model.html) object. You can see that our model has an attribute called `compartments`, which contains a description of each modelled compartment.

In [None]:
# Define the model
model = CompartmentalModel(
    times=(START_DATE_INT, END_DATE_INT),
    compartments=["S", "E", "I", "R"],
    infectious_compartments=["I"],
    timestep=0.1,
)

### Adding a population 

Initially the model compartments are all empty. Let's add:

- 100000 people to the susceptible (S) compartment, plus
- 100 in the infectious (I) compartment.

In [None]:
# Add people to the model
model.set_initial_population(distribution={"S": 100000, "E": 0, "I": 100})

# View the initial population
model.initial_population

### Adding inter-compartmental flows 

Now, let's add some flows for people to transition between the compartments. These flows will define the dynamics of our infection. We will add:

- an exposure flow from S to E (using frequency-dependent transmission)
- an infection flow from E to I (exposed individuals become infected)
- a recovery flow from I to R

In [None]:
# Susceptible people can become exposed.
model.add_infection_frequency_flow(name="exposure", contact_rate=0.12, source="S", dest="E")

# Exposed people transition to infected.
model.add_transition_flow(name="infection", fractional_rate=1/15, source="E", dest="I")

# Infectious people recover.
model.add_transition_flow(name="recovery", fractional_rate=0.04, source="I", dest="R")

# Inspect the new flows, which we just added to the model.
model._flows
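
The rates passed to these flows can be read as timescales. The sketch below is plain arithmetic, not Summer code, and rests on two standard assumptions: a constant fractional rate implies a mean stay of 1 / rate in the compartment, and, with no deaths, the basic reproduction number of this SEIR structure is the contact rate multiplied by the mean infectious period.

```python
contact_rate = 0.12        # Per day, as passed to add_infection_frequency_flow above.
progression_rate = 1 / 15  # Per day, E -> I.
recovery_rate = 0.04       # Per day, I -> R.

# Mean time spent in a compartment that is left at a constant fractional rate.
mean_days_exposed = 1 / progression_rate     # About 15 days.
mean_days_infectious = 1 / recovery_rate     # About 25 days.

# Basic reproduction number under the assumptions stated above.
r0 = contact_rate * mean_days_infectious     # About 3.
print(round(mean_days_exposed, 2), round(mean_days_infectious, 2), round(r0, 2))
```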

### Running the model

Now we can calculate the outputs for the model over the requested time period. 
The model calculates the compartment sizes by solving a system of differential equations (defined by the flows we just added) over the requested time period.

In [None]:
model.run()

### Print the model outputs

The model's results are available in a NumPy array named `model.outputs`. 
This array is available after the model has been run. Let's have a look at what's inside:

In [None]:
# Force NumPy to format the output array nicely. 
import numpy as np
np.set_printoptions(formatter={'all': lambda f: f"{f:0.2f}"})

# View the first 25 timesteps of the output array.
model.outputs[:25]
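
Because the model uses 0.1 day steps, ten consecutive rows cover one day. The sketch below shows how to pick out whole days by slicing. It uses a made-up array and assumes the same layout as `model.outputs`: one row per time step and one column per compartment.

```python
import numpy as np

TIMESTEP = 0.1
steps_per_day = round(1 / TIMESTEP)  # 10 rows per simulated day.

# Toy stand-in for model.outputs: 31 rows (days 0 to 3 in 0.1 day steps), 4 compartments.
toy_outputs = np.arange(31 * 4, dtype=float).reshape(31, 4)

daily = toy_outputs[::steps_per_day]  # Rows 0, 10, 20 and 30, i.e. days 0, 1, 2 and 3.
final_sizes = toy_outputs[-1]         # Compartment sizes at the last time step.
print(daily.shape)  # (4, 4)
```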

### Plot the outputs

You can get a better idea of what is going on inside the model by visualising how the compartment sizes change over time.

In [None]:
# Visualize the results.
subplot = {"title": "SEIR Model Outputs", "xlabel": "Days", "ylabel": "Compartment size"}
fig, ax = plt.subplots(1, 1, figsize=(12, 6), dpi=120, subplot_kw=subplot)

for compartment in output_df: # Loop over each compartment. 
    ax.plot(model.times, output_df[compartment]) # Plot the times and compartment values

ax.plot(xrange, target) # Also plot the MoH target values.

ax.legend(["S", "E","I", "R","Cases"])
plt.show();

## Summary

That's it for now. You now know how to:

- Create a model
- Add a population
- Add flows
- Run the model
- Access and visualise the outputs

A detailed API reference for the CompartmentalModel class can be found [here](http://summerepi.com/api/model.html).

## Bonus: how the model works inside

This section presents a code snippet that shows an approximation of what is happening inside the model we just built and ran.

In the example code below we use the [Euler method](https://en.wikipedia.org/wiki/Euler_method) to solve an ordinary differential equation (ODE) which is defined by the model's flows. Summer does not actually use the Euler method; you can read more about the ODE solvers available to evaluate models [here](http://summerepi.com/examples/4-ode-solvers.html).
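
For reference, a single Euler step can be written as follows. The notation is introduced here for illustration: $y_n$ is the vector of compartment sizes at time $t_n$, $f$ returns the net flow rate into each compartment, and $h$ is the time step (`TIMESTEP` in the code).

$$
y_{n+1} = y_n + h \, f(t_n, y_n)
$$

This corresponds to the line in the loop that sets `outputs[t_idx]` to `compartment_sizes + flow_rates * TIMESTEP`.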

In [None]:
import numpy as np
import matplotlib.pyplot as plt

TIMESTEP = 0.1
START_TIME = 0
END_TIME = 20

# Get times
num_steps = int(round((END_TIME - START_TIME) / TIMESTEP)) + 1 # Number of time points, including both ends.
times = np.linspace(START_TIME, END_TIME, num=num_steps) # Points spaced exactly TIMESTEP apart.

# Define initial conditions
initial_conditions = np.array([990.0, 10.0, 0.0])  # S, I, R

# Define outputs
outputs = np.zeros((int(num_steps), 3))
outputs[0] = initial_conditions

# Model parameters
contact_rate = 1.0
sojourn_time = 3.0
death_rate = 0.05

# Calculate outputs for each timestep
for t_idx, t in enumerate(times):
    if t_idx == 0:
        continue

    flow_rates = np.zeros(3)
    compartment_sizes = outputs[t_idx - 1 ]

    # Susceptible people can get infected (frequency-dependent).
    num_sus = compartment_sizes[0]
    num_inf = compartment_sizes[1]
    num_pop = compartment_sizes.sum()
    force_of_infection = contact_rate * num_inf / num_pop
    infection_flow_rate = force_of_infection * num_sus
    flow_rates[0] -= infection_flow_rate
    flow_rates[1] += infection_flow_rate

    # Infectious take some time to recover.
    num_inf = compartment_sizes[1]
    recovery_flow_rate = num_inf / sojourn_time
    flow_rates[1] -= recovery_flow_rate
    flow_rates[2] += recovery_flow_rate
    
    # Add an infection-specific death flow out of the I compartment.
    num_inf = compartment_sizes[1]
    death_flow_rate = num_inf * death_rate
    flow_rates[1] -= death_flow_rate
    
    # Calculate compartment sizes at next timestep given flowrates.
    outputs[t_idx] = compartment_sizes + flow_rates * TIMESTEP  
    
# Plot the results as a function of time for S, I, R respectively.
fig, ax = plt.subplots(1, 1, figsize=(12, 6), dpi=120)

# Add each compartment to the plot.
for i in range(outputs.shape[1]):
    ax.plot(times, outputs.T[i])

ax.set_title("SIR Model Outputs")
ax.set_xlabel("Days")
ax.set_ylabel("Compartment size")
ax.legend(["S", "I", "R"])
start, end = ax.get_xlim()
ax.xaxis.set_ticks(np.arange(start + 1, end, 5))
plt.show();