# PoPS Global Model: Interactive test run
## Predicting the spread of plant pests or pathogens using trade, environment, and pest ecology open data

Notebooks 3a - 3c provide the workflow for running the PoPS Global Model. These notebooks assume 
that you have:
- Cloned the PoPS-Global GitHub repository (git clone 
https://github.com/ncsu-landscape-dynamics/PoPS-Global.git)
- Launched this notebook from the notebook folder of the cloned repo
- Created the environment file (see 0_create_env_file notebook), downloaded and formatted the 
required data (see 1_data_acquisition_format notebook), and set up the model 
configuration (see 2_create_model_config notebook).

## Imports

Import the required Python libraries.

In [None]:
import os
import glob
import json
import numpy as np
import pandas as pd
import geopandas
import dotenv

Navigate to the repository main directory.

In [None]:
os.chdir("../")

Import relevant PoPS Global functions.

In [None]:
from pandemic.helpers import create_trades_list
from pandemic.model_equations import pandemic_multiple_time_steps
from pandemic.output_files import (
    create_model_dirs,
    save_model_output,
    aggregate_monthly_output_to_annual,
    write_model_metadata,
    write_annual_output,
)

## Set Paths and Environment Variables
Read in variables from the .env file.


In [None]:
# Load variables and paths from .env
dotenv.load_dotenv(".env")

# Read environmental variables
input_dir = os.getenv("INPUT_PATH")
out_dir = os.getenv("OUTPUT_PATH")
countries_path = os.getenv("COUNTRIES_PATH")
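
If one of these variables is missing from the .env file, `os.getenv` returns `None`, and the problem only surfaces later when a path is built from it. Below is a minimal sketch of an early check, assuming only the three variable names read above. The helper `missing_env_vars` is hypothetical and not part of PoPS Global.

```python
import os


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given mapping.

    Hypothetical helper for illustration; defaults to the process environment.
    """
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# Variable names taken from the cell above
required = ["INPUT_PATH", "OUTPUT_PATH", "COUNTRIES_PATH"]
missing = missing_env_vars(required)
if missing:
    print("Missing from .env:", ", ".join(missing))
```

Running this right after `dotenv.load_dotenv(".env")` reports every missing variable at once rather than one failure at a time.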

## Test run the model in this notebook

After configuring your model parameters in notebook 2, you can run the model interactively in this 
notebook as a test run. A test run still writes model outputs, one run at a time.

Once you have confirmed that the model functions as expected (no errors due to missing input data, 
misformatted parameters, etc.), you can launch full calibration runs using grid search, and the 
subsequent forecast using sampled parameters, from notebooks 3b and 3c. In those notebooks, multiple 
model runs execute in parallel and outputs are written directly to file. 

### Read in config parameters

In [None]:
# Read model arguments from configuration file
with open("config.json") as json_file:
    config = json.load(json_file)

sim_name = config["sim_name"]

commodity_path = config["commodity_path"]
commodity_forecast_path = config["commodity_forecast_path"]
commodity_list = config["commodity_list"]
native_countries_list = config["native_countries_list"]
season_dict = config["season_dict"]
alpha = config["alpha"]
beta = config["beta"]
mu = config["mu"]
lamda_c_list = config["lamda_c_list"]
phi = config["phi"]
w_phi = config["w_phi"]
start_year = config["start_year"]
stop_year = config["stop_year"]
random_seed = config["random_seed"]
# cols_to_drop = config["columns_to_drop"] # Not sure where and how this gets defined
time_infect_units = config["transmission_lag_unit"]
transmission_lag_type = config["transmission_lag_type"]
time_infect = config["time_to_infectivity"]
gamma_shape = config["transmission_lag_shape"]
gamma_scale = config["transmission_lag_scale"]
save_entry = config["save_entry"]
save_estab = config["save_estab"]
save_intro = config["save_intro"]
save_country_intros = config["save_country_intros"]
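
A missing key in config.json raises a `KeyError` at the first lookup that needs it. The sketch below lists every absent key up front, assuming the key names read in the cell above. The helper `missing_config_keys` and the example values are hypothetical.

```python
import json

# Keys read from config.json in the cell above
REQUIRED_KEYS = [
    "sim_name", "commodity_path", "commodity_forecast_path", "commodity_list",
    "native_countries_list", "season_dict", "alpha", "beta", "mu",
    "lamda_c_list", "phi", "w_phi", "start_year", "stop_year", "random_seed",
    "transmission_lag_unit", "transmission_lag_type", "time_to_infectivity",
    "transmission_lag_shape", "transmission_lag_scale", "save_entry",
    "save_estab", "save_intro", "save_country_intros",
]


def missing_config_keys(config, required=REQUIRED_KEYS):
    """Return the required keys that are absent from the config dict."""
    return [key for key in required if key not in config]


# Hypothetical, deliberately incomplete configuration for illustration
example_config = json.loads('{"sim_name": "demo", "alpha": 0.2}')
absent = missing_config_keys(example_config, ["sim_name", "alpha", "beta"])
print("Missing keys:", absent)
```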

### Load Model Input Data

Open the countries, distance, and climate similarity files created during data acquisition.

In [None]:
# Read formatted countries geopackage, distance matrix, and climate similarities matrix
countries = geopandas.read_file(countries_path, driver="GPKG")
distances = np.load(input_dir + "/distance_matrix.npy")
climate_similarities = np.load(input_dir + "/climate_similarities_hiiMask16.npy")

Format trade data using the `create_trades_list` model function.

In [None]:
# Read & format trade data
trades_list, file_list_filtered, code_list, commodities_available = create_trades_list(
    commodity_path=commodity_path,
    commodity_forecast_path=commodity_forecast_path,
    commodity_list=commodity_list,
    start_year=start_year,
    stop_year=stop_year,
    distances=distances,
)

Get the dimensions of the trade data for the simulation from the trade files.

In [None]:
# Create list of unique dates from trade data
date_list = []
for f in file_list_filtered:
    fn = os.path.split(f)[1]
    ts = str.split(os.path.splitext(fn)[0], "_")[-1]
    date_list.append(ts)
date_list.sort()
end_sim_year = date_list[-1][:4]

# Example trade array for formatting outputs
traded = pd.read_csv(
    file_list_filtered[0], sep=",", header=0, index_col=0, encoding="latin1"
)

# Checking trade array shapes
print("Length of trades list: ", len(trades_list))
for trade_array in trades_list:
    print("\tcommodity array shape: ", trade_array.shape)
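
The date list is built from the last underscore-separated token of each trade file name. A six-character stamp (year plus month) implies monthly time steps, and a four-character stamp implies annual ones. The sketch below isolates that parsing step. The file names are hypothetical and only illustrate the assumed `<name>_<timestamp>.csv` pattern.

```python
import os


def timestamps_from_paths(paths):
    """Extract the trailing '_<timestamp>' token from each file name, sorted."""
    stamps = []
    for path in paths:
        stem = os.path.splitext(os.path.basename(path))[0]
        stamps.append(stem.split("_")[-1])
    return sorted(stamps)


# Hypothetical file names; real names come from file_list_filtered
example_paths = ["data/trades_200102.csv", "data/trades_200012.csv"]
stamps = timestamps_from_paths(example_paths)

is_monthly = len(stamps[0]) > 4  # YYYYMM has six characters, YYYY has four
end_year = stamps[-1][:4]        # year of the last time step
print(stamps, is_monthly, end_year)
```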

### Run the Model 

After running the above cells to import data, the following cell runs the model.

To generate multiple stochastic runs of the same model configuration with test_run, re-run just this cell. 

In [None]:
print("Number of commodities: ", len([c for c in lamda_c_list if c > 0]))
print("Number of time steps: ", trades_list[0].shape[0])
for i in range(len(trades_list)):
    if len(trades_list) > 1:
        code = code_list[i]
        print("\nRunning model for commodity: ", code)
    else:
        code = code_list[0]
        print(
            "\nRunning model for commodity: ",
            os.path.basename(commodities_available[0]),
        )
    trades = trades_list[i]
    distances = distances
    locations = countries
    prob = np.zeros(len(countries.index))
    pres_ts0 = [False] * len(prob)
    infect_ts0 = np.empty(locations.shape[0], dtype="object")
    for country in native_countries_list:
        country_index = countries.index[countries["ISO3"] == country][0]
        pres_ts0[country_index] = True
        # if time steps are monthly and time to infectivity is in years
        if len(date_list[0]) > 4:
            infect_ts0[country_index] = str(start_year) + "01"
        # else if time steps are annual and time to infectivity is in years
        else:
            infect_ts0[country_index] = str(start_year)
    locations["Presence"] = pres_ts0
    locations["Infective"] = infect_ts0

    sigma_h = (1 - countries["Host Percent Area"]).std()

    if len(climate_similarities.shape) == 1:
        sigma_kappa = np.std(1 - climate_similarities)
    else:
        iu1 = np.triu_indices(climate_similarities.shape[0], 1)
        sigma_kappa = np.std(1 - climate_similarities[iu1])

    np.random.seed(random_seed)
    lamda_c = lamda_c_list[i]

    if lamda_c > 0:
        e = pandemic_multiple_time_steps(
            trades=trades,
            distances=distances,
            locations=locations,
            climate_similarities=climate_similarities,
            alpha=alpha,
            beta=beta,
            mu=mu,
            lamda_c=lamda_c,
            phi=phi,
            sigma_h=sigma_h,
            sigma_kappa=sigma_kappa,
            w_phi=w_phi,
            start_year=start_year,
            date_list=date_list,
            season_dict=season_dict,
            transmission_lag_type=transmission_lag_type,
            # time_infect_units=time_infect_units,
            time_infect=time_infect,
            gamma_shape=gamma_shape,
            gamma_scale=gamma_scale,
            # scenario_list=scenario_list
        )

        run_prefix = f"{sim_name}_{code}"

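        # Number this run one past the highest existing run_<n> directory.
        # glob returns matches in arbitrary order, so inspect all of them.
        existing_runs = [
            int(os.path.basename(os.path.normpath(run_dir)).split("_")[1])
            for run_dir in glob.glob(f"{out_dir}/{sim_name}/{run_prefix}/run*/")
        ]
        run_num = max(existing_runs) + 1 if existing_runs else 0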

        print(f"Starting run {run_num}...")

        arr_dict = {
            "prob_entry": "probability_of_entry",
            "prob_intro": "probability_of_introduction",
            "prob_est": "probability_of_establishment",
            "country_introduction": "country_introduction",
        }

        outpath = out_dir + f"/{sim_name}/{run_prefix}/run_{run_num}/"
        create_model_dirs(
            outpath=outpath,
            output_dict=arr_dict,
            write_entry_probs=save_entry,
            write_estab_probs=save_estab,
            write_intro_probs=save_intro,
            write_country_intros=save_country_intros,
        )
        print("saving model outputs: ", outpath)
        full_out_df = save_model_output(
            model_output_object=e,
            example_trade_matrix=traded,
            outpath=outpath,
            date_list=date_list,
            write_entry_probs=save_entry,
            write_estab_probs=save_estab,
            write_intro_probs=save_intro,
            write_country_intros=save_country_intros,
            columns_to_drop=None,
        )

        # If time steps are monthly, aggregate predictions to
        # annual for dashboard display
        if len(date_list[0]) > 4:
            print("aggregating monthly predictions to annual time steps...")
            aggregate_monthly_output_to_annual(
                formatted_geojson=full_out_df, outpath=outpath
            )

        # If time steps are annual, export the predictions
        elif len(date_list[0]) == 4:
            print("exporting annual predictions...")
            write_annual_output(formatted_geojson=full_out_df, outpath=outpath)

        # Save model metadata to text file
        print("writing model metadata...")
        write_model_metadata(
            main_model_output=e[0],
            alpha=alpha,
            beta=beta,
            mu=mu,
            lamda_c_list=lamda_c_list,
            phi=phi,
            sigma_h=sigma_h,
            sigma_kappa=sigma_kappa,
            w_phi=w_phi,
            start_year=start_year,
            end_sim_year=end_sim_year,
            transmission_lag_type=transmission_lag_type,
            time_infect_units=time_infect_units,
            gamma_shape=gamma_shape,
            gamma_scale=gamma_scale,
            random_seed=random_seed,
            time_infect=time_infect,
            native_countries_list=native_countries_list,
            countries_path=countries_path,
            commodities_available=commodities_available[i],
            commodity_forecast_path=commodity_forecast_path,
            phyto_weights=list(locations["Phytosanitary Capacity"].unique()),
            outpath=outpath,
            run_num=run_num,
        )

    else:
        print("\tskipping as pest is not transported with this commodity")
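
Each execution of the cell above writes to its own `run_<n>` directory under the simulation's output folder. The sketch below shows one way to pick the next number, assuming directories are named `run_<n>`. The helper `next_run_number` is hypothetical, and it uses a temporary directory so that it does not touch real outputs.

```python
import glob
import os
import tempfile


def next_run_number(run_parent):
    """Return one more than the highest run_<n> directory in run_parent, or 0."""
    numbers = []
    for path in glob.glob(os.path.join(run_parent, "run_*")):
        suffix = os.path.basename(path).split("_")[-1]
        if os.path.isdir(path) and suffix.isdigit():
            numbers.append(int(suffix))
    return max(numbers) + 1 if numbers else 0


with tempfile.TemporaryDirectory() as parent:
    first = next_run_number(parent)      # no runs exist yet
    for n in (0, 1, 10):
        os.mkdir(os.path.join(parent, f"run_{n}"))
    following = next_run_number(parent)  # one past the highest number

print(first, following)
```

Taking the maximum over all matches, rather than the first match, keeps the numbering correct regardless of the order in which the file system lists the directories.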

## Next: Calibration (Notebook 3b)

After you've ensured that the model runs with no errors and the outputs are what you would expect, 
continue to notebook 3b to run and evaluate a full parameter grid search to calibrate the model.