# PoPS Global Model: Calibration 
Use this notebook to run and evaluate a parameter grid search. The calibration process uses the parameter ranges defined in notebook 2 to generate a grid of parameter samples. The results of multiple stochastic model runs for each parameter set are then evaluated, and the top-performing sets are sampled in notebook 3c to generate a forecast.

Run this notebook after notebooks 0, 1, and 2. We also recommend running notebook 3a first to check for and troubleshoot issues.

In [None]:
import os
import dotenv
import json

import pandas as pd
import numpy as np
import itertools

import matplotlib.pyplot as plt
import seaborn as sns

import warnings
warnings.filterwarnings(action='once')

## Set up workspace from env and configuration files 

Navigate to the main repository.

In [None]:
# Navigate one level up to the main repository
os.chdir("..")

Import needed PoPS Global functions.

In [None]:
from pandemic.multirun_helpers import write_commands

Read in path variables from .env.

In [None]:
# Read environmental variables
env_file = os.path.join(".env")
dotenv.load_dotenv(env_file)

input_dir = os.getenv("INPUT_PATH")
out_dir = os.getenv("OUTPUT_PATH")
sim_name = os.getenv("SIM_NAME")
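If any of these variables is missing from .env, os.getenv returns None and later paths silently become strings such as "None/config_None.json". A minimal sketch of an up-front check, assuming all three variables are required (the helper name require_env is hypothetical, not part of PoPS Global):

```python
import os

REQUIRED_ENV_VARS = ("INPUT_PATH", "OUTPUT_PATH", "SIM_NAME")


def require_env(names=REQUIRED_ENV_VARS):
    """Return the requested environment variables, or raise if any are unset."""
    values = {name: os.getenv(name) for name in names}
    missing = [name for name, value in values.items() if value is None]
    if missing:
        raise RuntimeError(f"Missing variables in .env: {', '.join(missing)}")
    return values
```

Calling require_env() right after dotenv.load_dotenv(env_file) would surface a misconfigured .env before any model runs are launched.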

Read in parameter variables from the simulation config file (config_{sim_name}.json in the output directory).

In [None]:
config_json_path = f"{out_dir}/config_{sim_name}.json"

with open(config_json_path) as json_file:
    config = json.load(json_file)

# Read in parameters to calibrate

alphas = config["alphas"]
betas = config["betas"]
lamdas = config["lamdas"]
start_years = config["start_years"]

start_run = config["start_run"]
end_run = config["end_run"]

validation_method = config["validation_method"]
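The cell above reads seven keys from the config file and stops at the first one that is missing. A small sketch (the helper name check_config is hypothetical) that reports every missing key at once:

```python
REQUIRED_CONFIG_KEYS = {
    "alphas", "betas", "lamdas", "start_years",
    "start_run", "end_run", "validation_method",
}


def check_config(config: dict) -> None:
    """Raise a KeyError naming all calibration keys missing from config."""
    missing = sorted(REQUIRED_CONFIG_KEYS - set(config))
    if missing:
        raise KeyError(f"config is missing keys: {missing}")
```

Running check_config(config) immediately after json.load gives one clear error listing every absent key.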

## Run model in parallel

Run the model over a range of parameter values to perform the calibration grid search. The model runs are 
launched from the command line and run in parallel. The output of each command prints as it runs. 

The cell below builds every combination of the parameter values (alpha, beta, lamda, and start year) and 
writes out one model command per combination. 

In [None]:
# Create all parameter combinations

param_list = [alphas, betas, lamdas, start_years]
param_sets = list(itertools.product(*param_list))

# Write out commands

commands_calibrate = ""

# One command per parameter set (avoid shadowing the built-in name `set`)
for param_set in param_sets:
    commands_calibrate += write_commands(
        param_set, start_run=start_run, end_run=end_run, run_type="calibrate"
    )
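Because the grid is a Cartesian product, the number of parameter sets is the product of the lengths of the four ranges, and the total number of model runs is that count times the number of runs per set. A small illustration with hypothetical ranges (the real values come from the config file):

```python
import itertools

# Hypothetical example ranges; the real values come from config.json
alphas = [0.1, 0.2, 0.3]
betas = [0.5, 0.7]
lamdas = [1, 2, 3, 4]
start_years = [1990, 2000]

param_sets = list(itertools.product(alphas, betas, lamdas, start_years))

# Each tuple is one (alpha, beta, lamda, start_year) combination
n_sets = len(param_sets)
print(n_sets)         # 48  (3 * 2 * 4 * 2)
print(param_sets[0])  # (0.1, 0.5, 1, 1990)
```

Adding one more value to any range multiplies, rather than adds to, the number of runs, which is why wide grids get expensive quickly.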

If you plan to run the model on an HPC cluster or at a later time, write these commands to file (uncomment the code below).

In [None]:
# with open(os.path.join(input_dir, "commands_calibrate.txt"), "w") as f1:
#     f1.write(commands_calibrate)

Run the cell below to execute all model runs. These must complete before you can calculate the summary 
statistics. If your grid contains many parameter sets or you requested many runs per set, this may take 
a while (approximately 2 to 5 minutes per run per core, depending on your computer and the number of 
time steps in your simulation), so plan accordingly!
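As a rough sketch of that arithmetic, assuming every run takes about the same time and runs are spread evenly across cores (the helper name and the counts below are hypothetical):

```python
def estimate_hours(n_param_sets, runs_per_set, minutes_per_run, n_cores):
    """Rough wall-clock estimate: total runs x minutes per run, split over cores."""
    total_runs = n_param_sets * runs_per_set
    return total_runs * minutes_per_run / n_cores / 60


# Example: 48 parameter sets x 10 runs each on 8 cores,
# at the low (2 min) and high (5 min) ends of the per-run estimate
low = estimate_hours(48, 10, 2, 8)   # 2.0 hours
high = estimate_hours(48, 10, 5, 8)  # 5.0 hours
```

Real timings will also depend on I/O and on how evenly the runs are scheduled, so treat this only as an order-of-magnitude guide.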

In [None]:
# Run the model from script

for command in commands_calibrate.split('\n'):
    print(f"Running command: {command}")
    ! {command}


These runs will write out to "outputs/{sim_name}_calibrate/". 

Calculate summary statistics on completed runs. This is also run in parallel, so time will vary depending
on how many cores you use. If you have many runs or many parameter samples, this may take some time as 
well (approximately 1 hour per 50,000 runs included, when run on 8 cores).

In [None]:
# Calculate summary statistics
# Note: Computing the summary stats may raise a warning from the pandas library. This should not cause any errors.

! python pandemic/get_stats.py calibrate


## Evaluate grid performance
F-beta is the primary metric used to evaluate model run performance. The below visualizations help 
evaluate parameter sample convergence and performance according to this metric.
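For reference, the standard F-beta score combines precision P and recall R as F_beta = (1 + beta^2) * P * R / (beta^2 * P + R), so beta > 1 weights recall more heavily and beta < 1 weights precision. Note that this beta is the F-score weighting, not the model parameter beta being calibrated. A minimal sketch of the standard definition from confusion counts (which beta value and which counts PoPS Global uses are not stated here; the numbers below are hypothetical):

```python
def fbeta_score(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """Standard F-beta from true positives, false positives, and false negatives.

    beta > 1 weights recall more heavily; beta < 1 weights precision.
    Returns 0.0 when there are no true positives.
    """
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


print(fbeta_score(tp=6, fp=2, fn=4, beta=1.0))  # approx. 0.667
print(fbeta_score(tp=6, fp=2, fn=4, beta=2.0))  # approx. 0.625
```

With precision 0.75 and recall 0.6, raising beta from 1 to 2 pulls the score toward the lower recall value.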

In [None]:
stats_dir = f"{out_dir}/summary_stats/{sim_name}_calibrate"

validation_df = pd.read_csv(
    input_dir + "/first_records_validation.csv", header=0, index_col=0,
)

col_dict = {
    "start_max": "start",
    "alpha_max": "alpha",
    "beta_max": "beta",
    "lamda_max": "lamda",
}

agg_df = pd.read_csv(f"{stats_dir}/summary_stats_bySample.csv").rename(columns=col_dict)
stats = pd.read_csv(
    f"{stats_dir}/summary_stats_wPrecisionRecallF1FBetaAggProb.csv")

In [None]:
# Create folder to save calibration figures

fig_dir = f"{stats_dir}/figs/calibration"
os.makedirs(fig_dir, exist_ok=True)

### Data visualization 

- Assess run convergence: Evaluate the top-performing parameter samples to see whether the F-beta score 
has converged. If it hasn't, you may need to conduct more runs. You can add runs by updating start_run 
(to continue after the previous end_run), end_run, and run_count. 
- Assess parameter set performance: Visualize F-beta across values of alpha, lamda, beta, and start 
year. The highest-performing values should generally cluster within a limited range for each parameter. 
If the highest-performing values sit at an extreme of a parameter range (e.g., at the highest lamda 
value or the earliest start year), you may need to expand your grid search. 

#### Assess run convergence
Do the individual lines (parameter samples) sufficiently converge with respect to F-beta over the 
number of runs conducted, or are more runs needed?
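Judging "sufficient" convergence here is visual. If you want a numeric rule of thumb as well, one hypothetical criterion (not part of PoPS Global) is that a sample's running mean F-beta moved by less than a tolerance over its last few runs:

```python
import math


def has_converged(running_means, window=10, tol=0.01):
    """Hypothetical convergence check: True if the running mean F-beta varied
    by less than `tol` (max minus min) over the last `window` values.
    NaN values are ignored; too few values counts as not converged."""
    values = [float(m) for m in running_means if not math.isnan(m)]
    recent = values[-window:]
    if len(recent) < window:
        return False
    return max(recent) - min(recent) < tol
```

Once the running means are computed below, this could be applied per sample, e.g. has_converged(samples_df["sample 1"]). The window and tolerance are arbitrary defaults to tune for your study.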

In [None]:
# Top N parameter samples

n_samples = 5

top_runs = agg_df.sort_values("fbeta_mean", ascending=False).head(n_samples)
fbeta_range = [min(top_runs.fbeta_mean) * 0.75, max(top_runs.fbeta_mean) * 1.25]

Calculate the running mean, standard deviation, and standard error of F-beta for each top sample as runs are added.

In [None]:
samples = top_runs["sample"].tolist()
# Run counts 1..(max run_num + 1); run_num is assumed to start at 0
runs = list(range(1, stats["run_num"].max() + 2))

samples_df = pd.DataFrame({"runs": runs})
for i, sample in enumerate(samples, start=1):
    sample_fbeta = []
    stdev = []
    sterr = []
    for run in runs:
        # Keep the first `run` runs of this parameter sample
        filtered_stats = stats.loc[
            (stats["run_num"] <= run - 1) & (stats["sample"] == sample)
        ]
        fbeta = filtered_stats["fbeta"]
        sample_fbeta.append(fbeta.mean())  # running mean of F-beta
        stdev.append(fbeta.std())  # running standard deviation
        sterr.append(fbeta.sem())  # standard error of the mean (std / sqrt(n))
    samples_df[f"sample {i}"] = sample_fbeta
    samples_df[f"stdev {i}"] = stdev
    samples_df[f"sterr {i}"] = sterr

samples_df.set_index("runs", inplace=True)
# Average only the running-mean columns, not the stdev/sterr columns
sample_cols = [f"sample {i}" for i in range(1, len(samples) + 1)]
samples_df["all samples"] = samples_df[sample_cols].mean(axis=1)
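The loop above re-filters stats once per run count. A vectorized sketch of the same running statistics uses pandas expanding windows, assuming stats has one row per (sample, run_num) with an fbeta column, as the code above implies. Here the standard error is the running standard deviation divided by the square root of the number of runs; the helper name running_fbeta_stats is hypothetical:

```python
import numpy as np
import pandas as pd


def running_fbeta_stats(stats: pd.DataFrame, sample) -> pd.DataFrame:
    """Cumulative mean, std, and standard error of F-beta for one sample,
    indexed by the number of runs included (1, 2, ..., n)."""
    fbeta = (
        stats.loc[stats["sample"] == sample]
        .sort_values("run_num")["fbeta"]
        .reset_index(drop=True)
    )
    expanding = fbeta.expanding()
    out = pd.DataFrame({"mean": expanding.mean(), "stdev": expanding.std()})
    n = np.arange(1, len(fbeta) + 1)
    out["sterr"] = out["stdev"] / np.sqrt(n)  # std is NaN for a single run
    out.index = n
    out.index.name = "runs"
    return out
```

For example, running_fbeta_stats(stats, samples[0]) returns one row per run count, which avoids rescanning the full table for every run.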

Plot to visualize.

In [None]:
plt.style.use("ggplot")

# Plot only the running-mean column of each top sample
sample_cols = [f"sample {i}" for i in range(1, len(samples) + 1)]
ax = samples_df[sample_cols].plot(
    ylim=fbeta_range,
    cmap="mako",
    ylabel="fbeta",
    title=f"Mean fbeta convergence \n for the top {n_samples} parameter samples",
    legend=False,
)
# Shade a band of +/- sterr around every sample's running mean (including the last one)
for i in range(1, len(samples) + 1):
    ax.fill_between(
        samples_df.index,
        samples_df[f"sample {i}"] + samples_df[f"sterr {i}"],
        samples_df[f"sample {i}"] - samples_df[f"sterr {i}"],
        color="#366da0",
        alpha=0.15,
    )
ax.set_xlabel("# of Runs", fontsize=16)
ax.set_ylabel("Fbeta mean", fontsize=16)
ax.tick_params(labelsize=13)
plt.savefig(f"{fig_dir}/run_convergence.png")
plt.show()

#### Assessing parameter set performance 
Once you have established that the sample F-beta has converged, you can assess the overall performance 
of different parameter sets to identify the highest-performing sets. These visuals can also help you 
identify issues in your parameter sampling range. For instance, if your highest-performing parameter 
samples all have the maximum lamda value in your sample range, you may need to increase the upper 
bound for that parameter.
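The edge check described above can also be done numerically. A minimal sketch, assuming agg_df has the renamed alpha, beta, lamda, and start columns and that every grid value appears in agg_df (the helper name params_at_grid_edge is hypothetical):

```python
import pandas as pd


def params_at_grid_edge(agg_df, params, fbeta_col="fbeta_mean", top_n=5):
    """For each parameter, report the share of the top_n samples (ranked by
    fbeta_col) that sit at the minimum or maximum value tried in the grid."""
    top = agg_df.nlargest(top_n, fbeta_col)
    flags = {}
    for param in params:
        lo, hi = agg_df[param].min(), agg_df[param].max()
        flags[param] = {
            "share_at_min": float((top[param] == lo).mean()),
            "share_at_max": float((top[param] == hi).mean()),
        }
    return flags
```

For example, params_at_grid_edge(agg_df, ["alpha", "beta", "lamda", "start"]) returning a share_at_max of 1.0 for lamda would match the situation described above, where the upper bound likely needs to increase.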

If leave-one-out cross-validation is used, each plot below is produced once per omitted validation location.
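In general terms, leave-one-out cross-validation evaluates the model once per location, each time omitting that location and scoring against the rest. A minimal sketch of generating those splits (the location names are hypothetical, and how PoPS Global computes its per-location F-beta columns is not shown in this notebook):

```python
def leave_one_out_splits(locations):
    """Yield (held_out, kept) pairs: each location is omitted once, and the
    remaining locations are the ones scored against."""
    locations = list(locations)
    for i, held_out in enumerate(locations):
        kept = locations[:i] + locations[i + 1:]
        yield held_out, kept


for held_out, kept in leave_one_out_splits(["A", "B", "C"]):
    print(held_out, kept)
```

With n validation locations this produces n splits, which is why there is one figure per omitted location.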

In [None]:
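# font_scale is ignored unless a named context is given; the plots below
# also set their font sizes explicitly
sns.set_context("notebook")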

In [None]:
# Select all columns with an Fbeta mean

fbeta_cols = [col for col in agg_df.columns if "fbeta" in col and "mean" in col] 
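Each plotting cell below repeats the same expression to turn a column name into a title and axis label. A pair of small helpers could factor that out; the helper names and the example column names are hypothetical:

```python
def fbeta_columns(columns):
    """Columns whose name contains both 'fbeta' and 'mean'."""
    return [col for col in columns if "fbeta" in col and "mean" in col]


def fbeta_label(col):
    """Turn e.g. 'fbeta_mean' into 'Fbeta mean' for titles and axis labels."""
    return f"Fbeta {' '.join(col.split('_')[1:])}"


cols = fbeta_columns(["sample", "alpha", "fbeta_mean", "fbeta_std"])
print(cols)                       # ['fbeta_mean']
print(fbeta_label("fbeta_mean"))  # Fbeta mean
```

Each set_title and set_ylabel call could then use fbeta_label(fbeta_col) instead of repeating the join expression.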

Sample performance (F-beta) by alpha value:

In [None]:
for fbeta_col in fbeta_cols:

    ax = sns.stripplot(
        x="alpha",
        y=fbeta_col,
        hue="start",
        palette="mako",
        linewidth=0.2,
        data=agg_df,
        jitter=0.4,
        alpha=0.8,
    )
    handles, labels = ax.get_legend_handles_labels()
    ax.legend(
        handles[::-1],
        labels[::-1],
        bbox_to_anchor=(1.25, 1),
        loc="upper right",
        borderaxespad=0,
        title="start year",
    )
    ax.set(ylim=(0, 1))
    ax.axes.set_title(f"Fbeta {' '.join(fbeta_col.split('_')[1:])}, by Alpha Value\n (Color = Year)", fontsize=16)
    ax.set_xlabel("Alpha", fontsize=16)
    ax.set_ylabel(f"Fbeta {' '.join(fbeta_col.split('_')[1:])}", fontsize=16)
    ax.tick_params(labelsize=13)
    plt.setp(ax.get_legend().get_texts(), fontsize="15")  # for legend text
    plt.setp(ax.get_legend().get_title(), fontsize="15")  # for legend title
    plt.savefig(f"{fig_dir}/{fbeta_col}_alpha.png", bbox_inches="tight")
    plt.show()

Sample performance by lamda value: 

In [None]:
for fbeta_col in fbeta_cols:

    ax = sns.scatterplot(
        x="lamda",
        y=fbeta_col,
        hue="start",
        data=agg_df,
        palette="mako",
        edgecolor="black",
        linewidth=0.2,
        legend="full",
        alpha=0.8
    )
    ax.set(ylim=(0, 1))
    handles, labels = ax.get_legend_handles_labels()
    ax.legend(
        handles[::-1],
        labels[::-1],
        bbox_to_anchor=(1.25, 1),
        loc="upper right",
        borderaxespad=0,
        title="start year",
    )
    ax.axes.set_title(f"Fbeta {' '.join(fbeta_col.split('_')[1:])}, by Lambda Value\n (Color = Year)", fontsize=16)
    ax.set_xlabel("Lambda", fontsize=16)
    ax.set_ylabel(f"Fbeta {' '.join(fbeta_col.split('_')[1:])}", fontsize=16)
    ax.tick_params(labelsize=13)
    plt.setp(ax.get_legend().get_texts(), fontsize="15")  # for legend text
    plt.setp(ax.get_legend().get_title(), fontsize="15")  # for legend title
    plt.savefig(f"{fig_dir}/{fbeta_col}_lambda.png", bbox_inches="tight")
    plt.show()

Sample performance by beta value:

In [None]:
for fbeta_col in fbeta_cols:

    ax = sns.scatterplot(
        x="beta",
        y=fbeta_col,
        hue="start",
        data=agg_df,
        palette="mako",
        edgecolor="black",
        linewidth=0.2,
        legend="full",
        alpha=0.8
    )
    ax.set(ylim=(0, 1))
    handles, labels = ax.get_legend_handles_labels()
    ax.legend(
        handles[::-1],
        labels[::-1],
        bbox_to_anchor=(1.25, 1),
        loc="upper right",
        borderaxespad=0,
        title="start year",
    )
    ax.axes.set_title(f"Fbeta {' '.join(fbeta_col.split('_')[1:])}, by Beta Value\n (Color = Year)", fontsize=16)
    ax.set_xlabel("Beta", fontsize=16)
    ax.set_ylabel(f"Fbeta {' '.join(fbeta_col.split('_')[1:])}", fontsize=16)
    ax.tick_params(labelsize=13)
    plt.setp(ax.get_legend().get_texts(), fontsize="15")  # for legend text
    plt.setp(ax.get_legend().get_title(), fontsize="15")  # for legend title
    plt.savefig(f"{fig_dir}/{fbeta_col}_beta.png", bbox_inches="tight")
    plt.show()

Sample performance by start year:

In [None]:
for fbeta_col in fbeta_cols:

    ax = sns.stripplot(
        x="start",
        y=fbeta_col,
        hue="alpha",
        palette="mako",
        linewidth=0.2,
        data=agg_df,
        jitter=0.3,
        alpha=0.8,
    )
    handles, labels = ax.get_legend_handles_labels()
    ax.legend(
        handles[::-1],
        labels[::-1],
        bbox_to_anchor=(1.25, 1),
        loc="upper right",
        borderaxespad=0,
        title="alpha",
    )
    ax.set(ylim=(0, 1))
    ax.axes.set_title(f"Fbeta {' '.join(fbeta_col.split('_')[1:])}, by Start Year\n (Color = Alpha)", fontsize=16)
    ax.set_xlabel("Start year", fontsize=16)
    ax.set_ylabel(f"Fbeta {' '.join(fbeta_col.split('_')[1:])}", fontsize=16)
    ax.tick_params(labelsize=13)
    plt.setp(ax.get_legend().get_texts(), fontsize="15")  # for legend text
    plt.setp(ax.get_legend().get_title(), fontsize="15")  # for legend title
    plt.savefig(f"{fig_dir}/{fbeta_col}_start.png", bbox_inches="tight")
    plt.show()

## Next: Model run - Forecast

If you are satisfied with the convergence and the parameter performance within the range of your grid, 
you can use these values to fit a parameter distribution and sample from it to conduct 
the forecast. 
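As one illustration of that idea, the sketch below fits a multivariate normal to the top-performing parameter sets and draws new samples from it. This is a hypothetical approach, not necessarily the method used in the forecast notebook; normal draws can fall outside valid ranges (e.g., negative values) and may need clipping or a different distribution. It also assumes the chosen parameters vary among the top sets, since otherwise the covariance matrix is degenerate:

```python
import numpy as np
import pandas as pd


def sample_from_top_sets(agg_df, params, n_draws, fbeta_col="fbeta_mean", top_n=5, seed=0):
    """Fit a multivariate normal to the top_n parameter sets (by fbeta_col)
    and return n_draws samples as a DataFrame with one column per parameter."""
    params = list(params)
    top = agg_df.nlargest(top_n, fbeta_col)[params].to_numpy(dtype=float)
    mean = top.mean(axis=0)
    cov = np.atleast_2d(np.cov(top, rowvar=False))  # 2-D even for one parameter
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(mean, cov, size=n_draws)
    return pd.DataFrame(draws, columns=params)
```

For example, sample_from_top_sets(agg_df, ["alpha", "beta", "lamda"], 1000) would return 1000 candidate parameter sets centered on the best-performing region of the grid.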