# Generate a Configuration File to Run the Pandemic Model

This notebook creates a configuration file to run a particular model scenario. Some parameters are fixed across your case study. The calibrated parameters (alpha, lamda, beta, start_year) can instead be set as ranges, which are calibrated and later sampled to run the complete forecasts.

## Imports

In [None]:
import os
import json
import dotenv

## Load Environment Variables and Set Paths

In [None]:
# Navigate one level up to the main repository
os.chdir('..')

In [None]:
# Read environment variables from the .env file in the repository root
env_file = '.env'
dotenv.load_dotenv(env_file)

input_dir = os.getenv('INPUT_PATH')
out_dir = os.getenv('OUTPUT_PATH')
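
`os.getenv` returns `None` when a variable is missing, which only surfaces later as a confusing path error. A minimal sketch of an early check follows; `require_env` and the example paths are hypothetical and not part of the repository.

```python
import os


def require_env(name, environ=os.environ):
    """Return the value of an environment variable or raise a clear error."""
    value = environ.get(name)
    if not value:
        raise KeyError(f"{name} is not set; check your .env file")
    return value


# Example with an explicit mapping so the sketch runs anywhere;
# in the notebook you would call require_env("INPUT_PATH") directly.
example_env = {"INPUT_PATH": "/data/inputs/", "OUTPUT_PATH": "/data/outputs/"}
input_dir_checked = require_env("INPUT_PATH", example_env)
out_dir_checked = require_env("OUTPUT_PATH", example_env)
```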

In [None]:
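config_json_path = "config.json"

# Start from an empty config so the later `if prev_config:` checks work
# even when no config file exists yet
prev_config = {}

if os.path.isfile(config_json_path):
    with open(config_json_path) as file:
        prev_config = json.load(file)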

## Set Model Parameters
Depending on how you will be running the model, arguments will be provided in different ways.

- **To run the model once** (e.g. to conduct a test run): Set the single-sample test-run values (below), and run the model in notebook 3a (interactive). We recommend running the model interactively first to test that your input data and model configuration are correct and working.
- **To calibrate the model over a range of possible parameter values** (i.e. to fit key model parameters to the data): Set the range of calibrated parameters (below), and run the model in notebook 3b.
- **To provide a sample of parameter sets**: After conducting the grid search and evaluating the results (3b), use notebook 3c to generate sampled parameter sets and run a forecast of the model.

## Define Argument Values

In [None]:
args = {}

### Name your configuration scenario

In [None]:
# What should this model scenario be called
short_name = 'slf'

# What additional description defines this scenario
# e.g., specific parameter/value of interest + commodity range
add_descript = 'ensemble_rerun'

args["sim_name"] = f"{short_name}_{add_descript}"

### Commodity Data

In [None]:
# Which temporal resolution, commodities or aggregation should be used
# for historical trade data:

if prev_config:
    timestep = prev_config["timestep"]
    trade_type = prev_config["trade_type"]
    commodity_list = prev_config["commodity_list"]

    print(f"Timestep: {timestep}, Trade Type: {trade_type}, Commodity list: {commodity_list}")

# Pulled from Data Acquisition. Option to modify below (NOT recommended)

# timestep = "monthly" # "monthly" or "annual"
# trade_type = "adjusted" # "adjusted" (for individual commodities) or "agg"
# commodity_list = ["6802-6803"]

In [None]:
# Create paths
commodity_data_path = os.path.join(input_dir, f"comtrade/{timestep}_{trade_type}/")

# If no forecast is desired, set argument to None.
commodity_forecast_path = os.path.join(input_dir, f"comtrade/trade_forecast/{timestep}_{trade_type}/")

# Write to args 
args["timestep"] = timestep # "monthly" or "annual"
args["trade_type"] = trade_type # "adjusted" (for individual commodities) or "agg"
args["commodity_list"] = commodity_list

args["commodity_path"] = commodity_data_path
args["commodity_forecast_path"] = commodity_forecast_path
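
When composing these paths, note that joining components with a path-join function is not the same as concatenating strings: the two only agree when `INPUT_PATH` ends with a separator. The sketch below illustrates the difference with a hypothetical `INPUT_PATH` value; it uses `posixpath` (the POSIX flavour of `os.path`) only so that the result is identical on every operating system.

```python
import posixpath  # POSIX flavour of os.path, so results do not depend on the OS

# Hypothetical INPUT_PATH value without a trailing slash
input_dir_example = "/data/inputs"
timestep_example, trade_type_example = "monthly", "adjusted"
subdir = f"comtrade/{timestep_example}_{trade_type_example}/"

# join() inserts the missing separator between the two components
joined = posixpath.join(input_dir_example, subdir)

# Plain concatenation silently fuses the directory names
concatenated = input_dir_example + subdir

print(joined)        # /data/inputs/comtrade/monthly_adjusted/
print(concatenated)  # /data/inputscomtrade/monthly_adjusted/
```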

In [None]:
# Option to include the path to a lamda weights .csv

args["lamda_weights_path"] = None

### Pest Native to Which Countries?

In [None]:
# Countries where the pest is native or present at first time step of the model run

if prev_config:
    args["native_countries_list"] = prev_config["native_countries_list"]
    print(f'Native countries list: {args["native_countries_list"]}')

# Pulled from Data Acquisition. Option to modify below (NOT recommended)

# args["native_countries_list"] = ["China", "Viet Nam"]

### During which months can the pest be present in the shipment?

In [None]:
# List of months when pest can be transported
args["season_dict"] = {
    "NH_season": ["09", "10", "11", "12", "01", "02", "03", "04"],
    "SH_season": ["04", "05", "06", "07", "08", "09", "10"],
}
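
To make the structure of `season_dict` concrete, here is a small sketch of how a month could be checked against it. It assumes that `NH` and `SH` stand for the Northern and Southern Hemisphere and that months are zero-padded strings; `can_be_transported` is a hypothetical helper, and the model itself may consume the dictionary differently.

```python
season_dict = {
    "NH_season": ["09", "10", "11", "12", "01", "02", "03", "04"],
    "SH_season": ["04", "05", "06", "07", "08", "09", "10"],
}


def can_be_transported(month, hemisphere, seasons=season_dict):
    """Return True if `month` ("01".."12") is in season for "NH" or "SH"."""
    return month in seasons[f"{hemisphere}_season"]


print(can_be_transported("01", "NH"))  # True
print(can_be_transported("06", "NH"))  # False
print(can_be_transported("06", "SH"))  # True
```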

### Model Parameter Values

In [None]:
# Define test-run parameter values (single sample)

args["alpha"] = 0.7
args["beta"] = 0.5
args["lamda_c_list"] = [0.8]  # list length matches number of commodities
args["start_year"] = 2005


In [None]:
# Define grid-search range of calibrated parameter values

args["alphas"] = [0.7]
args["betas"] = [0.5]
args["lamdas"] = [0.8]  # list length matches number of commodities
# NOTE: grid search is currently configured to handle only a single lambda
# value (i.e. a single commodity).
# TODO: decide how to handle the calibration stats when there are multiple
# lambda values. One option: concatenate the introductions across all params
# plus run number. The write-out would need to include both sets so that the
# pairs can be identified. That may be a best-case scenario, because
# calibration is only 2x instead of len(lambda).

args["start_years"] = [2005]

args["run_count"] = 80 # How many runs to complete for each parameter set
args["start_run"] = 0 
args["end_run"] = 79 
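
To see how many model runs a grid implies, the ranges can be expanded into their Cartesian product. This is only an illustrative sketch using `itertools.product`; the dictionary `grid_args` mirrors the values above, and the actual expansion performed in notebook 3b may differ.

```python
from itertools import product

# Mirrors the grid-search values set in this notebook
grid_args = {
    "alphas": [0.7],
    "betas": [0.5],
    "lamdas": [0.8],
    "start_years": [2005],
    "run_count": 80,
}

# One dictionary per combination of calibrated parameter values
param_sets = [
    {"alpha": alpha, "beta": beta, "lamda": lamda, "start_year": start_year}
    for alpha, beta, lamda, start_year in product(
        grid_args["alphas"],
        grid_args["betas"],
        grid_args["lamdas"],
        grid_args["start_years"],
    )
]

total_runs = len(param_sets) * grid_args["run_count"]
print(len(param_sets), total_runs)  # 1 80
```

Adding a second value to any one of the lists doubles both numbers, so it is worth checking `total_runs` before starting a long calibration.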


In [None]:
# Define static parameter values 
# (Note to self - add definitions in comments)

args["mu"] = 0.0
args["phi"] = 1
args["w_phi"] = 1
args["sigma_epsilon"] = 0.5
args["sigma_phi"] = 1
args["start_year"] = 2005
args["stop_year"] = 2019

# Set random seed (optional)

args["random_seed"] = None

In [None]:
# Define transmission lag values

args["transmission_lag_unit"] = "year"
args["time_to_infectivity"] = None

args["transmission_lag_type"] = "stochastic"
args["gamma_shape"] = 4
args["gamma_scale"] = 1
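
For intuition about the stochastic lag settings, the sketch below draws samples from a gamma distribution with the shape and scale given above, using the standard library's `random.gammavariate`. A gamma distribution has mean shape × scale, so these values imply an average lag of about 4 units (years here). How the model actually samples and applies the lag is not shown in this notebook, so treat this as an assumption-laden illustration.

```python
import random

gamma_shape = 4
gamma_scale = 1

# Seeded generator so the illustration is reproducible
rng = random.Random(42)

# random.gammavariate(alpha, beta): alpha is the shape, beta is the scale
lags = [rng.gammavariate(gamma_shape, gamma_scale) for _ in range(10_000)]

mean_lag = sum(lags) / len(lags)
print(f"mean lag: {mean_lag:.2f} (theoretical mean: {gamma_shape * gamma_scale})")
```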


In [None]:
# Save n x n matrices for each time step where n is the number of countries, 
# and values represent the origin-destination probability of entry or 
# probability of establishment 

args["save_entry"] = False
args["save_estab"] = False
args["save_intro"] = False
args["save_country_intros"] = False

### Define scenarios (optional)

In [None]:
# scenario_list = []

# for i in range(2010, 2030):
#     start_scenario = [2010, 'CHN', 'USA', 'decrease', 1]
#     new_scenario = start_scenario
#     new_scenario[0] = i
#     scenario_list.append(new_scenario)
    
# for i in range(2014, 2030):
#     start_scenario = [2014, 'JPN', 'USA', 'decrease', 0.8]
#     new_scenario = start_scenario
#     new_scenario[0] = i
#     scenario_list.append(new_scenario)
    
# for i in range(2014, 2030):
#     start_scenario = [2014, 'KOR', 'USA', 'decrease', 0.8]
#     new_scenario = start_scenario
#     new_scenario[0] = i
#     scenario_list.append(new_scenario)
    
# for i in range(2020, 2030):
#     start_scenario = [2020, 'ITA', 'USA', 'decrease', 0.8]
#     new_scenario = start_scenario
#     new_scenario[0] = i
#     scenario_list.append(new_scenario)
    
# for i in range(2020, 2030):
#     start_scenario = [2020, 'TUR', 'USA', 'decrease', 0.8]
#     new_scenario = start_scenario
#     new_scenario[0] = i
#     scenario_list.append(new_scenario)

## Configure summary statistics

In [None]:
# What is your primary country of interest? (format = ISO3 code)
args["coi"] = "USA"

# Calculate cumulative probability of intro to COI for the following years:

args["sim_years"] = [2014, 2020]

# End-valid year: At what year do you want to stop evaluating the summary statistics? 
# E.g. often, present year or year near the last known introduction

args["end_valid_year"] = 2019

# How many years BEFORE and AFTER the first record do you consider a presence to be accurate? 

# BEFORE: e.g. Model predicts 2013 introduction, but first documented record was 2015. This may still be 
# considered accurate if you (1) consider that first documented records may have a detection lag 
# from the true first introduction date and (2) if you prefer to favor risk-conservative forecasts 
# (avoid penalizing reasonably early predictions that are useful for preventative measures)

args["years_before_firstRecord"] = 4

# AFTER: e.g. Model predicts 2016 introduction, but first documented record was 2015. This may be considered
# accurate if you are less concerned with capturing the exact temporal window, but could be considered
# not accurate if you are mainly concerned with preventative measures (set to 0).

args["years_after_firstRecord"] = 0
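
The BEFORE/AFTER window can be read as a simple interval check around the first documented record. The sketch below assumes inclusive bounds; `is_accurate` is a hypothetical helper, and the model's own summary statistics may be computed differently.

```python
def is_accurate(predicted_year, first_record_year, years_before, years_after):
    """Return True if the prediction falls inside the accepted window."""
    earliest = first_record_year - years_before
    latest = first_record_year + years_after
    return earliest <= predicted_year <= latest


# The two examples from the comments above, with the configured values
# years_before_firstRecord = 4 and years_after_firstRecord = 0
print(is_accurate(2013, 2015, years_before=4, years_after=0))  # True
print(is_accurate(2016, 2015, years_before=4, years_after=0))  # False
```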

# Cores to use for parallel processing

In [None]:
args["cores"] = 4

## Write and save configuration file

In [None]:
if os.path.isfile(config_json_path):
    with open(config_json_path) as file:
        prev_config = json.load(file)

    prev_config.update(args)

    with open(config_json_path, mode='w') as f:
        f.write(json.dumps(prev_config, indent=4))

else: 
    with open(config_json_path, "w") as file:
        json.dump(args, file, indent=4)

print("\tSaved ", config_json_path)
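
Because the existing configuration is merged with `dict.update`, keys set in this notebook overwrite earlier values, while keys that only exist in the previous file are kept. The sketch below demonstrates this with made-up values in a temporary directory, so it does not touch your real `config.json`.

```python
import json
import os
import tempfile

# Made-up example values, for illustration only
previous = {"timestep": "monthly", "alpha": 0.3}
new_args = {"alpha": 0.7, "cores": 4}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "config.json")

    # Simulate a config file left behind by an earlier notebook
    with open(path, "w") as file:
        json.dump(previous, file, indent=4)

    # Load, merge (new values win), and write back
    with open(path) as file:
        merged = json.load(file)
    merged.update(new_args)
    with open(path, "w") as file:
        json.dump(merged, file, indent=4)

    # Read the file again to confirm what was saved
    with open(path) as file:
        saved = json.load(file)

print(saved)  # {'timestep': 'monthly', 'alpha': 0.7, 'cores': 4}
```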

## Next: Run model (interactive)

Run the model interactively first to test that your data is complete, your parameters are sensible, and the model is configured to run. Then go on to calibrate and forecast with the model.