# Ensemble Challenge
Goal: to capture the complexity and nuances around the evolution of the pandemic at various stages and locations.

## Consider the following settings:
1. *Timepoint 1*: May 1st, 2020. Setting: the state of Michigan at the beginning of the pandemic, when masking was the main preventative measure. No vaccines available.
2. *Timepoint 2*: May 1st, 2021. Setting: the state of Michigan prior to the arrival of the Delta variant. Vaccines available.
3. *Timepoint 3*: December 15th, 2021. Setting: the state of Michigan at the start of the first Omicron wave.

4. *BONUS*: Consider the same three time points, but change the setting to Louisiana, which had different COVID-19 dynamics compared to the Northern and Northeastern states.

## ...and related questions for each:
1. What is the most relevant data to use for model calibration?
2. What was our understanding of COVID-19 viral mechanisms at the time? For example, early in the pandemic, we didn't know whether reinfection was common, or even possible.
3. What are the parameters related to contagiousness/transmissibility and severity of the dominant strain at the time?
4. What policies were in place for a stated location, and how can this information be incorporated into models? (See https://www.bsg.ox.ac.uk/research/covid-19-government-response-tracker for time series of interventions.)

## For each setting:
1. (a) Take a single model, calibrate it using historical data prior to the given date, and create a 4-week forecast for cases, hospitalizations, and deaths beginning on the given date. (b) Evaluate the forecast using the COVID-19 Forecasting Hub Error Metrics (WIS, MAE). The single model evaluation should be done in the same way as the ensemble.

2. Repeat (1), but with an ensemble of different models.

    a. It is fine to calibrate each model independently and weight naively.
    
    b. It would also be fine to calibrate the ensemble as a whole, assigning weights to the different component models, so that you minimize the error of the ensemble vs. historical data.
    
    c. Use the calibration scores and error metrics computed by the CDC Forecasting Hub. As stated on their [website](https://covid19forecasthub.org/doc/reports/): 
    
    “Periodically, we evaluate the accuracy and precision of the [ensemble forecast](https://covid19forecasthub.org/doc/ensemble/) and component models over recent and historical forecasting periods. Models forecasting incident hospitalizations at a national and state level are evaluated using [adjusted relative weighted interval scores (WIS, a measure of distributional accuracy)](https://arxiv.org/abs/2005.12881), and adjusted relative mean absolute error (MAE), and calibration scores. Scores are evaluated across weeks, locations, and targets. You can read [a paper explaining these procedures in more detail](https://www.medrxiv.org/content/10.1101/2021.02.03.21250974v1), and look at [the most recent monthly evaluation reports](https://covid19forecasthub.org/eval-reports). The final report that includes case and death forecast evaluations is 2023-03-13.” 

3. Produce the forecast outputs in the format specified by the CDC forecasting challenge, including the specified quantiles.
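
As a rough illustration of the error metrics named above, the following sketch computes a raw weighted interval score (WIS) and a mean absolute error (MAE) for a toy forecast. It is a minimal sketch, not the Hub's evaluation code: the function names and toy numbers are made up for illustration, and the Hub reports *adjusted relative* versions of these scores, which this sketch does not attempt to reproduce.

```python
from typing import Dict, Sequence, Tuple


def interval_score(lower: float, upper: float, y: float, alpha: float) -> float:
    """Interval score of a central (1 - alpha) prediction interval [lower, upper]."""
    score = upper - lower  # wide intervals are penalized
    if y < lower:
        score += (2.0 / alpha) * (lower - y)  # observation fell below the interval
    elif y > upper:
        score += (2.0 / alpha) * (y - upper)  # observation fell above the interval
    return score


def weighted_interval_score(
    median: float, intervals: Dict[float, Tuple[float, float]], y: float
) -> float:
    """WIS from a median and a dict mapping alpha -> (lower, upper)."""
    total = 0.5 * abs(y - median)
    for alpha, (lower, upper) in intervals.items():
        total += (alpha / 2.0) * interval_score(lower, upper, y, alpha)
    return total / (len(intervals) + 0.5)


def mean_absolute_error(
    point_forecasts: Sequence[float], observations: Sequence[float]
) -> float:
    """Average absolute difference between point forecasts and observations."""
    return sum(abs(f - o) for f, o in zip(point_forecasts, observations)) / len(observations)


# Toy forecast: median 10, a 50% interval (alpha = 0.5) and a 90% interval (alpha = 0.1).
toy_intervals = {0.5: (8.0, 12.0), 0.1: (5.0, 15.0)}
wis_inside = weighted_interval_score(10.0, toy_intervals, 10.0)
wis_outside = weighted_interval_score(10.0, toy_intervals, 14.0)
toy_mae = mean_absolute_error([1.0, 2.0, 3.0], [2.0, 2.0, 5.0])
print(wis_inside, wis_outside, toy_mae)
```

Lower is better for both metrics. The WIS grows when the observation falls outside an interval and also when intervals are needlessly wide.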

## Data
Use the following data sources:
1. Cases: [Johns Hopkins](https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_US.csv), [Reich Lab](https://github.com/reichlab/covid19-forecast-hub/blob/master/data-truth/truth-Incident%20Cases.csv) (pulled from Johns Hopkins, but formatted)

2. Hospitalizations: [HealthData.gov](https://healthdata.gov/Hospital/COVID-19-Reported-Patient-Impact-and-Hospital-Capa/g62h-syeh)

3. Deaths: [Johns Hopkins](https://github.com/reichlab/covid19-forecast-hub/blob/master/data-truth/truth-Incident%20Deaths.csv), [Reich Lab](https://github.com/reichlab/covid19-forecast-hub/blob/master/data-truth/truth-Cumulative%20Deaths.csv)

In [1]:
# Load dependencies and functions from utils file
from pyciemss.utils.toronto_hackathon_utils.toronto_ensemble_challenge_utils import *

### Set the region of interest and infectious period, load the case, hospital census, and death data for that region, and optionally plot the data

In [2]:
# Set the region of interest and infectious period, get the DataFrame for that region
US_region = "MI" # 2-letter state abbreviation string (or "US")
regional_population = 10050000 # Michigan: 10,050,000 / Louisiana: 4,624,000
infectious_period = 7 # duration of infectious period (in days)
plot_data = False # plot the data when true

# Note: source datasets are quite large, so this will take a minute to run
data = get_case_hosp_death_data(US_region = US_region, infectious_period = infectious_period, make_csv=False)
data = data.reset_index()
# print(data)

# FYI: hosp data starts around 07/14/2020 and is NaN before, case and death data ends 03/04/2023 and is NaN after

if plot_data:
    # Plot case census data
    plt.subplot(1, 3, 1)
    plt.plot(data.index, data["case_census"], 'o')
    plt.title("Case Census")

    # Plot hosp census data
    plt.subplot(1, 3, 2)
    plt.plot(data.index, data["hosp_census"], 'o')
    plt.title("Hospital Census")

    # Plot cumulative deaths
    plt.subplot(1, 3, 3)
    plt.plot(data.index, data["cumulative_deaths"], 'o')
    plt.title("Cumulative Deaths")
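
The cell above passes `infectious_period` to `get_case_hosp_death_data` and later reads a `case_census` column. How that column is computed is not shown here. One plausible reading, and it is only an assumption, is that the census counts everyone reported as a case within the last `infectious_period` days. The hypothetical helper below sketches that idea on toy numbers.

```python
def rolling_census(incident_counts, infectious_period):
    """Sum daily incident counts over a trailing window of `infectious_period` days.

    Assumption for illustration only: a reported case stays in the census for
    exactly `infectious_period` days, including the day it was reported.
    """
    census = []
    for day in range(len(incident_counts)):
        window_start = max(0, day - infectious_period + 1)
        census.append(sum(incident_counts[window_start:day + 1]))
    return census


toy_incident_cases = [1, 0, 2, 3, 0, 0, 1, 4, 0]  # made-up daily new cases
toy_census = rolling_census(toy_incident_cases, infectious_period=7)
print(toy_census)
```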



### Set relevant dates, test and train intervals

In [3]:
# Set train start date at the 55th day of data to use most/best historic data available
train_start_date = str(data["date"][49]) # this is 03/17/2020

# Given timepoints will act as test start dates
timepoint1 = "2020-05-01" 
timepoint2 = "2021-05-01"
timepoint3 = "2021-12-15"

# Set test end dates 4 weeks after timepoints
test_end_date1 = "2020-05-29"
test_end_date2 = "2021-05-29"
test_end_date3 = "2022-01-12"
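
The test end dates above are typed in by hand. As a small sanity check, they could instead be derived from the timepoints with the standard library. The variable names below are illustrative and are not used elsewhere in this notebook.

```python
from datetime import date, timedelta

FORECAST_HORIZON = timedelta(weeks=4)  # the 4-week forecast window from the challenge

forecast_start_dates = ["2020-05-01", "2021-05-01", "2021-12-15"]
derived_end_dates = [
    (date.fromisoformat(start) + FORECAST_HORIZON).isoformat()
    for start in forecast_start_dates
]
print(derived_end_dates)
```

The derived dates agree with `test_end_date1`, `test_end_date2`, and `test_end_date3` as set above.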

### Set up train and test data

In [4]:
# Gather train and test data corresponding to Timepoint 1
train_data1, train_cases1, train_timepoints1, test_cases1, test_timepoints1, all_timepoints1 = \
get_train_test_data(data, train_start_date, timepoint1, test_end_date1)
# all_timepoints1 = get_all_timepoints_forreal(all_timepoints1)
data_file1 = US_region + "_train_data_1.csv"
train_data_to_csv(train_data1, data_file1)

# Gather train and test data corresponding to Timepoint 2
train_data2, train_cases2, train_timepoints2, test_cases2, test_timepoints2, all_timepoints2 = \
get_train_test_data(data, train_start_date, timepoint2, test_end_date2)
# all_timepoints2 = get_all_timepoints_forreal(all_timepoints2)
data_file2 = US_region + "_train_data_2.csv"
train_data_to_csv(train_data2, data_file2)

# Gather train and test data corresponding to Timepoint 3
train_data3, train_cases3, train_timepoints3, test_cases3, test_timepoints3, all_timepoints3 = \
get_train_test_data(data, train_start_date, timepoint3, test_end_date3)
# all_timepoints3 = get_all_timepoints_forreal(all_timepoints3)
data_file3 = US_region + "_train_data_3.csv"
train_data_to_csv(train_data3, data_file3)

## Models:
1. You may consider any of the models you have seen in the starter kit, or in the 6-month hackathon and evaluation scenarios.

2. You may search for new models in the literature, or use TA2 model extension/transformation capabilities to modify models already in Terarium.

### Load dependencies for ensembling

In [5]:
# Load ensembling dependencies
import os
from pyciemss.PetriNetODE.interfaces import load_petri_model, load_and_calibrate_and_sample_petri_model
from pyciemss.Ensemble.interfaces import load_and_sample_petri_ensemble, load_and_calibrate_and_sample_ensemble_model
import pyciemss.visuals.plots as plots

### Get models to be ensembled

In [10]:
FIRST_PATH = "../Examples_for_TA2_Model_Representation/"

##### CUSTOM MODELS

# Model 1
filename1 = "SEIARHDS_AMR.json"
filename1 = os.path.join(FIRST_PATH, filename1)
model1 = load_petri_model(filename1, add_uncertainty=True)

# Model 2
filename2 = "SEIARHD_AMR.json"
filename2 = os.path.join(FIRST_PATH, filename2)
model2 = load_petri_model(filename2, add_uncertainty=True)

# Model 3
filename3 = "SIRHD_AMR.json"
filename3 = os.path.join(FIRST_PATH, filename3)
model3 = load_petri_model(filename3, add_uncertainty=True)

##### HACKATHON MODELS

# # Model 1
# filename1 = "scenario1_a.json"
# model1 = load_petri_model(filename1, add_uncertainty=True)

# # Model 2
# filename2 = "scenario1_c.json"
# model2 = load_petri_model(filename2, add_uncertainty=True)

# # Model 3
# filename3 = "scenario1_d.json"
# model3 = load_petri_model(filename3, add_uncertainty=True)

model_paths = [filename1, filename2, filename3]

In [11]:
import urllib.request, json 
def change_model_parameters(filename, new_params):
    # new params = [(param, value), (param, value)]
    with open(filename, 'r') as f:
        model = json.load(f)
        # Change initial parameters
        for (param, value) in new_params:
            for idx in model["semantics"]["ode"]["parameters"]:
                if idx["id"] == param:
                    idx["value"] = value
    return model
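
To see what `change_model_parameters` does without needing one of the AMR files, the sketch below restates the function and runs it on a heavily trimmed, made-up stand-in for a model file. The toy JSON only contains the `semantics.ode.parameters` path that the function reads; real AMR files carry many more fields.

```python
import json
import os
import tempfile


def change_model_parameters(filename, new_params):
    """Return the model JSON with the listed (parameter id, value) pairs overridden."""
    with open(filename, "r") as f:
        model = json.load(f)
    for param, value in new_params:
        for entry in model["semantics"]["ode"]["parameters"]:
            if entry["id"] == param:
                entry["value"] = value
    return model


# Hypothetical stand-in for an AMR file, for illustration only.
toy_model = {
    "semantics": {"ode": {"parameters": [
        {"id": "beta", "value": 0.4},
        {"id": "gamma", "value": 0.1},
    ]}}
}

with tempfile.TemporaryDirectory() as tmp_dir:
    toy_path = os.path.join(tmp_dir, "toy_AMR.json")
    with open(toy_path, "w") as f:
        json.dump(toy_model, f)
    # "not_a_param" does not exist in the toy model.
    updated = change_model_parameters(toy_path, [("beta", 0.55), ("not_a_param", 1.0)])

updated_values = {p["id"]: p["value"] for p in updated["semantics"]["ode"]["parameters"]}
print(updated_values)
```

Note that a parameter id missing from the file is skipped silently, so a misspelled name would leave the model unchanged without any warning. The file on disk is also left untouched; only the returned dictionary carries the new values.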

In [38]:
model1 = change_model_parameters(filename1, [("beta", 0.55), ("total_population", 1.0), ("delta", 1.5), ("pS", 0.7), ("alpha", 4), ("gamma", 0.2), 
                                             ("hosp", 0.1), ("dnh", 0.0001), ("dh", 0.04), ("los", 7), ("tau", 30)])
model2 = change_model_parameters(filename2, [("beta", 0.55), ("total_population", 1.0), ("delta", 1.5), ("pS", 0.7), ("alpha", 4), ("gamma", 0.2), 
                                             ("hosp", 0.1), ("dnh", 0.0001), ("dh", 0.04), ("los", 7)])
model3 = change_model_parameters(filename3, [("beta", 0.55), ("total_population", 1.0), ("gamma", 0.2), ("hosp", 0.1), ("dnh", 0.0001), ("dh", 0.04), ("los", 7)])
model_list = [model1, model2, model3]

### Create functions to define solution mapping dictionaries

In [45]:
# Define type of solution mapping required by each model

##### CUSTOM MODELS

def solution_mapping1(model_solution: dict) -> dict:
    # solution mapping for model 1: SEIARHDS
    mapped_dict = {}
    mapped_dict["Cases"] = model_solution["symptomatic_population"] + model_solution["asymptomatic_population"]
    mapped_dict["Deaths"] = model_solution["deceased_population"]
    mapped_dict["Hospitalizations"] = model_solution["hospitalized_population"]
    return mapped_dict

def solution_mapping2(model_solution: dict) -> dict:
    # solution mapping for model 2: SEIARHD
    mapped_dict = {}
    mapped_dict["Cases"] = model_solution["symptomatic_population"] + model_solution["asymptomatic_population"]
    mapped_dict["Deaths"] = model_solution["deceased_population"]
    mapped_dict["Hospitalizations"] = model_solution["hospitalized_population"]
    return mapped_dict

def solution_mapping3(model_solution: dict) -> dict:
    # solution mapping for model 3: SIRHD
    mapped_dict = {}
    mapped_dict["Cases"] = model_solution["infectious_population"]
    mapped_dict["Deaths"] = model_solution["deceased_population"]
    mapped_dict["Hospitalizations"] = model_solution["hospitalized_population"]
    return mapped_dict

##### HACKATHON MODELS S E I R D

# def solution_mapping2(model_solution: dict) -> dict:
#     # solution mapping for model 2: SEIARHD
#     mapped_dict = {}
#     mapped_dict["Cases"] = model_solution["I"]
#     mapped_dict["Hospitalizations"] = model_solution["I"]*0.05
#     mapped_dict["Deaths"] = model_solution["D"]
#     return mapped_dict

solution_mappings = [solution_mapping1, solution_mapping2, solution_mapping3]
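
Each solution mapping translates a model's own state names into the three shared observables (`Cases`, `Deaths`, `Hospitalizations`) so that different models can be compared and combined. The sketch below applies the Model 1 mapping to a toy solution. Plain floats are used here as a simplifying assumption; in the notebook the values come from the simulator and are array-like.

```python
def solution_mapping1(model_solution: dict) -> dict:
    """Map SEIARHDS state names onto the shared observables."""
    mapped_dict = {}
    mapped_dict["Cases"] = (
        model_solution["symptomatic_population"]
        + model_solution["asymptomatic_population"]
    )
    mapped_dict["Deaths"] = model_solution["deceased_population"]
    mapped_dict["Hospitalizations"] = model_solution["hospitalized_population"]
    return mapped_dict


# Made-up state values for a single time point.
toy_solution = {
    "susceptible_population": 943.0,
    "symptomatic_population": 30.0,
    "asymptomatic_population": 20.0,
    "hospitalized_population": 5.0,
    "deceased_population": 2.0,
}
toy_mapped = solution_mapping1(toy_solution)
print(toy_mapped)
```

States that no observable depends on, such as `susceptible_population` here, are simply dropped by the mapping.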

### Create start states for each model at designated time points

In [46]:
# Define start states for each model

##### CUSTOM MODELS

def create_start_state1(data, t_0, regional_population):
    '''Create the start state for Model 1 from data using our best guesses for
    mapping from observed variables to model state variables.'''
    
    start_state = data.set_index('date').loc[t_0].to_dict()
    returned_state = {}
    returned_state["exposed_population"] = start_state['case_census'] / 2
    if start_state['case_census'] <= 1:
        returned_state["symptomatic_population"] = 1
    else:
        returned_state["symptomatic_population"] = start_state['case_census'] / 2
    returned_state["asymptomatic_population"] = start_state['case_census'] / 2
    returned_state["recovered_population"] = 2 * start_state['case_census']
    
    if start_state["hosp_census"] > 0:
        returned_state["hospitalized_population"] = start_state["hosp_census"]
    else:
        returned_state["hospitalized_population"] = 0
    
    returned_state["deceased_population"] = start_state["cumulative_deaths"]
    returned_state["susceptible_population"] = regional_population - sum(returned_state.values())
    
    assert(returned_state["susceptible_population"] > 0)
    return {k:v/regional_population for k, v in returned_state.items()}

def create_start_state2(data, t_0, regional_population):
    '''Create the start state for Model 2 from data using our best guesses for
    mapping from observed variables to model state variables.'''
    
    start_state = data.set_index('date').loc[t_0].to_dict()
    returned_state = {}
    returned_state["exposed_population"] = start_state['case_census'] / 2
    if start_state['case_census'] <= 1:
        returned_state["symptomatic_population"] = 1
    else:
        returned_state["symptomatic_population"] = start_state['case_census'] / 2
    returned_state["asymptomatic_population"] = start_state['case_census'] / 2
    returned_state["recovered_population"] = 2 * start_state['case_census']
    
    if start_state["hosp_census"] > 0:
        returned_state["hospitalized_population"] = start_state["hosp_census"]
    else:
        returned_state["hospitalized_population"] = 0
    
    returned_state["deceased_population"] = start_state["cumulative_deaths"]
    returned_state["susceptible_population"] = regional_population - sum(returned_state.values())
    
    assert(returned_state["susceptible_population"] > 0)
    return {k:v/regional_population for k, v in returned_state.items()}

def create_start_state3(data, t_0, regional_population):
    '''Create the start state for Model 3 from data using our best guesses for
    mapping from observed variables to model state variables.'''
    
    start_state = data.set_index('date').loc[t_0].to_dict()
    returned_state = {}
    if start_state['case_census'] <= 1:
        returned_state["infectious_population"] = 1
    else:
        returned_state["infectious_population"] = start_state['case_census']
    returned_state["recovered_population"] = 2 * start_state['case_census']
    
    if start_state["hosp_census"] > 0:
        returned_state["hospitalized_population"] = start_state["hosp_census"]
    else:
        returned_state["hospitalized_population"] = 0
    
    returned_state["deceased_population"] = start_state["cumulative_deaths"]
    returned_state["susceptible_population"] = regional_population - sum(returned_state.values())
    
    assert(returned_state["susceptible_population"] > 0)
    return {k:v/regional_population for k, v in returned_state.items()}

##### HACKATHON MODELS

# def create_start_state2(data, t_0, regional_population):
#     '''Create the start state for Model 2 from data using our best guesses for
#     mapping from observed variables to model state variables.'''
    
#     start_state = data.set_index('date').loc[t_0].to_dict()
#     returned_state = {}
#     returned_state["E"] = start_state['case_census'] / 2
#     if start_state['case_census'] <= 0:
#         returned_state["I"] = 1
#     else:
#         returned_state["I"] = start_state['case_census']
#     returned_state["R"] = 2 * start_state['case_census']
    
#     returned_state["D"] = start_state["cumulative_deaths"]
#     returned_state["S"] = regional_population - sum(returned_state.values())
    
#     assert(returned_state["S"] > 0)
#     return returned_state #{k:v/regional_population for k, v in returned_state.items()}

start_states = [create_start_state1(data, train_start_date, regional_population), 
               create_start_state2(data, train_start_date, regional_population), 
               create_start_state3(data, train_start_date, regional_population)]

In [47]:
start_states

[{'exposed_population': 9.950248756218906e-08,
  'symptomatic_population': 9.950248756218906e-08,
  'asymptomatic_population': 9.950248756218906e-08,
  'recovered_population': 3.9800995024875624e-07,
  'hospitalized_population': 0.0,
  'deceased_population': 0.0,
  'susceptible_population': 0.999999303482587},
 {'exposed_population': 9.950248756218906e-08,
  'symptomatic_population': 9.950248756218906e-08,
  'asymptomatic_population': 9.950248756218906e-08,
  'recovered_population': 3.9800995024875624e-07,
  'hospitalized_population': 0.0,
  'deceased_population': 0.0,
  'susceptible_population': 0.999999303482587},
 {'infectious_population': 1.9900497512437812e-07,
  'recovered_population': 3.9800995024875624e-07,
  'hospitalized_population': 0.0,
  'deceased_population': 0.0,
  'susceptible_population': 0.9999994029850746}]
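
The printed start states can be checked by hand. The sketch below restates the Model 1 logic as a hypothetical helper that takes the observed counts directly instead of a DataFrame row. The inputs (a case census of 2, no hospitalizations, no deaths) are not read from the data; they were chosen because they are consistent with the values printed above for a population of 10,050,000.

```python
def start_state_from_counts(case_census, hosp_census, cumulative_deaths, population):
    """Model 1 start state as population fractions, mirroring create_start_state1."""
    state = {}
    state["exposed_population"] = case_census / 2
    state["symptomatic_population"] = 1 if case_census <= 1 else case_census / 2
    state["asymptomatic_population"] = case_census / 2
    state["recovered_population"] = 2 * case_census
    state["hospitalized_population"] = hosp_census if hosp_census > 0 else 0
    state["deceased_population"] = cumulative_deaths
    # Everyone not yet assigned to a compartment is treated as susceptible.
    state["susceptible_population"] = population - sum(state.values())
    assert state["susceptible_population"] > 0
    return {name: count / population for name, count in state.items()}


toy_state = start_state_from_counts(
    case_census=2, hosp_census=0, cumulative_deaths=0, population=10_050_000
)
total_fraction = sum(toy_state.values())
print(toy_state["exposed_population"], toy_state["susceptible_population"], total_fraction)
```

Because the susceptible compartment is defined as the remainder, the fractions sum to 1 (up to floating-point rounding), which matches the `total_population = 1.0` convention used for sampling.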

### Set up common ensembling inputs

In [42]:
num_samples = 10
timepoints = all_timepoints3
total_population = 1.0 # Double check that population is normalized to 1.0
start_time = train_timepoints1[0] - 1e-5 # Start time (for all simulations)
DATA_PATH = "../hackathon_prep/"
data_filename = data_file3 # data_file number must be consistent with all_timepoints number
data_path = os.path.join(DATA_PATH, data_filename)
num_iterations = 100

### Set up and sample an ensemble of 1-3 models

In [52]:
ensemble_samples = load_and_sample_petri_ensemble(
    [model_list[0]], [1], [solution_mapping1], num_samples, timepoints, 
    # start_states=[start_states[0]], 
    total_population=total_population, start_time=start_time,
)

display(ensemble_samples)
schema = plots.trajectories(ensemble_samples, subset=".*_sol", title="SEIARHDS Model Samples")
schema = plots.pad(schema, 5)
plots.ipy_display(schema)



Unnamed: 0,timepoint_id,sample_id,model_0/beta_param,model_0/delta_param,model_0/total_population_param,"model_0/(('susceptible_population', ('identity', 'ido:0000514')), ('exposed_population', ('identity', 'ido:0000594')), 'NaturalConversion', 'rate')_param",model_0/alpha_param,model_0/pS_param,"model_0/(('exposed_population', ('identity', 'ido:0000594')), ('symptomatic_population', ('identity', 'ido:0000573')), 'NaturalConversion', 'rate')_param","model_0/(('exposed_population', ('identity', 'ido:0000594')), ('asymptomatic_population', ('identity', 'ido:0000569')), 'NaturalConversion', 'rate')_param",...,"model_0/(('symptomatic_population', ('identity', 'ido:0000573')), ('deceased_population', ('identity', 'ncit:C168970')), 'NaturalConversion', 'rate')_param",model_0/dh_param,model_0/los_param,"model_0/(('hospitalized_population', ('identity', 'ncit:C25179')), ('recovered_population', ('identity', 'ido:0000592')), 'NaturalConversion', 'rate')_param","model_0/(('hospitalized_population', ('identity', 'ncit:C25179')), ('deceased_population', ('identity', 'ncit:C168970')), 'NaturalConversion', 'rate')_param",model_0/tau_param,model_0_weight,Cases_sol,Deaths_sol,Hospitalizations_sol
0,0,0,0.516277,1.636600,1.053009,0.074719,3.680708,0.695002,0.251517,0.229817,...,0.428831,0.042099,7.558776,0.932308,0.703642,28.415672,1.0,0.999981,0.000004,0.000006
1,1,0,0.516277,1.636600,1.053009,0.074719,3.680708,0.695002,0.251517,0.229817,...,0.428831,0.042099,7.558776,0.932308,0.703642,28.415672,1.0,1120.621582,92.368370,71.360008
2,2,0,0.516277,1.636600,1.053009,0.074719,3.680708,0.695002,0.251517,0.229817,...,0.428831,0.042099,7.558776,0.932308,0.703642,28.415672,1.0,3094.440918,509.976257,243.937973
3,3,0,0.516277,1.636600,1.053009,0.074719,3.680708,0.695002,0.251517,0.229817,...,0.428831,0.042099,7.558776,0.932308,0.703642,28.415672,1.0,5114.402832,1221.806519,391.174988
4,4,0,0.516277,1.636600,1.053009,0.074719,3.680708,0.695002,0.251517,0.229817,...,0.428831,0.042099,7.558776,0.932308,0.703642,28.415672,1.0,6945.812500,2118.313721,483.040588
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6715,667,9,0.517935,1.448367,0.944830,0.144901,4.103091,0.684921,0.363927,0.903043,...,0.762058,0.042573,6.376356,0.066263,0.711876,31.905844,1.0,0.687807,99998.570312,0.005164
6716,668,9,0.517935,1.448367,0.944830,0.144901,4.103091,0.684921,0.363927,0.903043,...,0.762058,0.042573,6.376356,0.066263,0.711876,31.905844,1.0,0.676700,99998.585938,0.005081
6717,669,9,0.517935,1.448367,0.944830,0.144901,4.103091,0.684921,0.363927,0.903043,...,0.762058,0.042573,6.376356,0.066263,0.711876,31.905844,1.0,0.665772,99998.601562,0.004999
6718,670,9,0.517935,1.448367,0.944830,0.144901,4.103091,0.684921,0.363927,0.903043,...,0.762058,0.042573,6.376356,0.066263,0.711876,31.905844,1.0,0.655021,99998.609375,0.004918


In [23]:
ensemble_samples = load_and_sample_petri_ensemble(
    [model_list[1]], [1], [solution_mapping2], num_samples, timepoints, 
    start_states=[start_states[1]], total_population=total_population, start_time=start_time,
)

display(ensemble_samples)
schema = plots.trajectories(ensemble_samples, subset=".*_sol", title="SEIARHD Model Samples")
schema = plots.pad(schema, 5)
plots.ipy_display(schema)



Unnamed: 0,timepoint_id,sample_id,model_0/beta_param,model_0/delta_param,model_0/total_population_param,"model_0/(('susceptible_population', ('identity', 'ido:0000514')), ('exposed_population', ('identity', 'ido:0000594')), (('symptomatic_population', ('identity', 'ido:0000573')), ('asymptomatic_population', ('identity', 'ido:0000569'))), 'GroupedControlledConversion', 'rate')_param",model_0/alpha_param,model_0/pS_param,"model_0/(('exposed_population', ('identity', 'ido:0000594')), ('symptomatic_population', ('identity', 'ido:0000573')), 'NaturalConversion', 'rate')_param","model_0/(('exposed_population', ('identity', 'ido:0000594')), ('asymptomatic_population', ('identity', 'ido:0000569')), 'NaturalConversion', 'rate')_param",...,"model_0/(('symptomatic_population', ('identity', 'ido:0000573')), ('hospitalized_population', ('identity', 'ncit:C25179')), 'NaturalConversion', 'rate')_param","model_0/(('symptomatic_population', ('identity', 'ido:0000573')), ('deceased_population', ('identity', 'ncit:C168970')), 'NaturalConversion', 'rate')_param",model_0/dh_param,model_0/los_param,"model_0/(('hospitalized_population', ('identity', 'ncit:C25179')), ('recovered_population', ('identity', 'ido:0000592')), 'NaturalConversion', 'rate')_param","model_0/(('hospitalized_population', ('identity', 'ncit:C25179')), ('deceased_population', ('identity', 'ncit:C168970')), 'NaturalConversion', 'rate')_param",model_0_weight,Cases_sol,Deaths_sol,Hospitalizations_sol
0,0,0,0.497580,1.477087,93369.429688,0.012556,4.005779,0.758597,0.960260,0.034283,...,0.409549,0.080003,0.041720,7.083000,0.572597,0.395049,1.0,1.999996,8.000328e-07,0.000004
1,1,0,0.497580,1.477087,93369.429688,0.012556,4.005779,0.758597,0.960260,0.034283,...,0.409549,0.080003,0.041720,7.083000,0.572597,0.395049,1.0,1.496482,1.214310e-01,0.214194
2,2,0,0.497580,1.477087,93369.429688,0.012556,4.005779,0.758597,0.960260,0.034283,...,0.409549,0.080003,0.041720,7.083000,0.572597,0.395049,1.0,1.029907,2.472293e-01,0.205150
3,3,0,0.497580,1.477087,93369.429688,0.012556,4.005779,0.758597,0.960260,0.034283,...,0.409549,0.080003,0.041720,7.083000,0.572597,0.395049,1.0,0.726213,3.347526e-01,0.138620
4,4,0,0.497580,1.477087,93369.429688,0.012556,4.005779,0.758597,0.960260,0.034283,...,0.409549,0.080003,0.041720,7.083000,0.572597,0.395049,1.0,0.532601,3.869208e-01,0.080894
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6715,667,9,0.572592,1.639211,97899.921875,0.327887,3.739211,0.674133,0.213887,0.644825,...,0.612015,0.508807,0.037933,6.850338,0.798976,0.165442,1.0,0.032925,5.364592e+05,-0.002323
6716,668,9,0.572592,1.639211,97899.921875,0.327887,3.739211,0.674133,0.213887,0.644825,...,0.612015,0.508807,0.037933,6.850338,0.798976,0.165442,1.0,0.022909,5.364596e+05,0.009162
6717,669,9,0.572592,1.639211,97899.921875,0.327887,3.739211,0.674133,0.213887,0.644825,...,0.612015,0.508807,0.037933,6.850338,0.798976,0.165442,1.0,0.029444,5.364591e+05,-0.001125
6718,670,9,0.572592,1.639211,97899.921875,0.327887,3.739211,0.674133,0.213887,0.644825,...,0.612015,0.508807,0.037933,6.850338,0.798976,0.165442,1.0,0.024744,5.364594e+05,0.003487


In [24]:
ensemble_samples = load_and_sample_petri_ensemble(
    [model_list[2]], [1], [solution_mapping3], num_samples, timepoints, 
    start_states=[start_states[2]], total_population=total_population, start_time=start_time,
)

display(ensemble_samples)
schema = plots.trajectories(ensemble_samples, subset=".*_sol", title="SIRHD Model Samples")
schema = plots.pad(schema, 5)
plots.ipy_display(schema)



Unnamed: 0,timepoint_id,sample_id,model_0/beta_param,model_0/total_population_param,"model_0/(('susceptible_population', ('identity', 'ido:0000514')), ('infectious_population', ('identity', 'ido:0000513')), ('infectious_population', ('identity', 'ido:0000513')), 'ControlledConversion', 'rate')_param",model_0/dnh_param,model_0/gamma_param,model_0/hosp_param,"model_0/(('infectious_population', ('identity', 'ido:0000513')), ('recovered_population', ('identity', 'ido:0000592')), 'NaturalConversion', 'rate')_param","model_0/(('infectious_population', ('identity', 'ido:0000513')), ('hospitalized_population', ('identity', 'ncit:C25179')), 'NaturalConversion', 'rate')_param","model_0/(('infectious_population', ('identity', 'ido:0000513')), ('deceased_population', ('identity', 'ncit:C168970')), 'NaturalConversion', 'rate')_param",model_0/dh_param,model_0/los_param,"model_0/(('hospitalized_population', ('identity', 'ncit:C25179')), ('recovered_population', ('identity', 'ido:0000592')), 'NaturalConversion', 'rate')_param","model_0/(('hospitalized_population', ('identity', 'ncit:C25179')), ('deceased_population', ('identity', 'ncit:C168970')), 'NaturalConversion', 'rate')_param",model_0_weight,Cases_sol,Deaths_sol,Hospitalizations_sol
0,0,0,0.509694,96994.804688,0.618401,0.000102,0.180141,0.109857,0.288874,0.804331,0.532224,0.037002,7.159360,0.463071,0.340039,1.0,1.999980,0.000011,0.000016
1,1,0,0.509694,96994.804688,0.618401,0.000102,0.180141,0.109857,0.288874,0.804331,0.532224,0.037002,7.159360,0.463071,0.340039,1.0,0.730674,0.824284,0.651452
2,2,0,0.509694,96994.804688,0.618401,0.000102,0.180141,0.109857,0.288874,0.804331,0.532224,0.037002,7.159360,0.463071,0.340039,1.0,0.267315,1.277829,0.528741
3,3,0,0.509694,96994.804688,0.618401,0.000102,0.180141,0.109857,0.288874,0.804331,0.532224,0.037002,7.159360,0.463071,0.340039,1.0,0.097218,1.511590,0.324819
4,4,0,0.509694,96994.804688,0.618401,0.000102,0.180141,0.109857,0.288874,0.804331,0.532224,0.037002,7.159360,0.463071,0.340039,1.0,0.036152,1.627650,0.175708
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6715,667,9,0.542790,108594.554688,0.116782,0.000107,0.189419,0.104170,0.624974,0.779853,0.902838,0.036240,6.371753,0.212969,0.126387,1.0,-0.004354,1.091002,0.001834
6716,668,9,0.542790,108594.554688,0.116782,0.000107,0.189419,0.104170,0.624974,0.779853,0.902838,0.036240,6.371753,0.212969,0.126387,1.0,0.008467,1.086030,-0.003566
6717,669,9,0.542790,108594.554688,0.116782,0.000107,0.189419,0.104170,0.624974,0.779853,0.902838,0.036240,6.371753,0.212969,0.126387,1.0,-0.006250,1.091738,0.002633
6718,670,9,0.542790,108594.554688,0.116782,0.000107,0.189419,0.104170,0.624974,0.779853,0.902838,0.036240,6.371753,0.212969,0.126387,1.0,0.002393,1.088386,-0.001008


### Load, calibrate, and sample an ensemble of 1 model

In [34]:
# Load, calibrate, and sample an ensemble containing a single model (Model 1)
calibrated_ensemble_of1 = load_and_calibrate_and_sample_ensemble_model(
    [model_list[0]],
    data_path,
    [1],
    [solution_mapping1],
    num_samples,
    timepoints,
    start_states=[start_states[0]],
    total_population=total_population,
    start_time=start_time,
    num_iterations=num_iterations,
    verbose=True,
)

display(calibrated_ensemble_of1)
schema = plots.trajectories(calibrated_ensemble_of1, subset=".*_sol", title="SEIARHDS Model Samples")
schema = plots.pad(schema, 5)
plots.ipy_display(schema)

AssertionError: 

In [35]:
start_states

[{'exposed_population': 9.950248756218906e-08,
  'symptomatic_population': 9.950248756218906e-08,
  'asymptomatic_population': 9.950248756218906e-08,
  'recovered_population': 3.9800995024875624e-07,
  'hospitalized_population': 0.0,
  'deceased_population': 0.0,
  'susceptible_population': 0.999999303482587},
 {'exposed_population': 9.950248756218906e-08,
  'symptomatic_population': 9.950248756218906e-08,
  'asymptomatic_population': 9.950248756218906e-08,
  'recovered_population': 3.9800995024875624e-07,
  'hospitalized_population': 0.0,
  'deceased_population': 0.0,
  'susceptible_population': 0.999999303482587},
 {'infectious_population': 1.9900497512437812e-07,
  'recovered_population': 3.9800995024875624e-07,
  'hospitalized_population': 0.0,
  'deceased_population': 0.0,
  'susceptible_population': 0.9999994029850746}]

### Load, calibrate, and sample an ensemble of 3 models

In [None]:
# Load, calibrate, and sample an ensemble of all three models
DATA_PATH = "../hackathon_prep/"
data_filename = data_file3
data_path = os.path.join(DATA_PATH, data_filename)
weights = [1 / len(model_paths)] * len(model_paths) # set equal weights initially
num_samples = 10
timepoints = all_timepoints3
num_iterations = 300

calibrated_ensemble_of3 = load_and_calibrate_and_sample_ensemble_model(
    model_paths,
    data_path,
    weights,
    solution_mappings,
    num_samples,
    timepoints,
    start_states=start_states,
    total_population=total_population, # start states are normalized, so use the normalized population (1.0)
    start_time=start_time,
    num_iterations=num_iterations,
    verbose=True,
)
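
The cell above starts from equal weights, which corresponds to the "weight naively" option in the challenge description. If each model has already been scored on historical data, one simple alternative is to weight models in inverse proportion to their error. This is a common heuristic rather than anything prescribed by the challenge or by pyciemss, and the helper name and error values below are made up for illustration.

```python
def inverse_error_weights(errors):
    """Turn per-model errors (lower is better) into weights that sum to 1."""
    inverse = [1.0 / error for error in errors]
    total = sum(inverse)
    return [value / total for value in inverse]


# Hypothetical historical errors (for example, a WIS or MAE) for three models.
toy_errors = [2.0, 4.0, 4.0]
toy_weights = inverse_error_weights(toy_errors)
print(toy_weights)
```

The model with half the error of the others receives twice their weight. Such weights would need to be recomputed for each timepoint using only data available before the forecast date.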

In [None]:
nice_labels = {"Cases_sol": "Infectious", "Deaths_sol": "Deaths", "Hospitalizations_sol": "Hospitalized"}
schema = plots.trajectories(calibrated_ensemble_of3, points=pd.read_csv(data_path), subset=".*_sol", title="Calibrated Ensemble of Three Models", relabel=nice_labels)
schema = plots.pad(schema, 5)
plots.ipy_display(schema)

In [None]:
data_path

In [None]:
a = pd.read_csv(data_path)
observed_data = a.drop(columns = ["Timestep", "Hospitalizations"])
observed_data

In [None]:
nice_labels = {"Cases_sol": "Infectious", "Deaths_sol": "Deaths", "Hospitalizations_sol": "Hospitalized"}
schema = plots.trajectories(calibrated_ensemble_of3, points=observed_data, subset=".*_sol", title="Calibrated Ensemble of Three Models", relabel=nice_labels)
schema = plots.pad(schema, 5)
plots.ipy_display(schema)
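
The challenge asks for forecasts in the Forecast Hub's quantile format. As a rough sketch of that last step, the code below turns a set of sampled values for one target and date into quantile rows. The quantile levels are the 23 levels the Hub uses for death and hospitalization targets (case targets use a smaller set), and should be checked against the Hub documentation. The sample values are made up, the helper names are illustrative, and the identifying columns the Hub also requires (dates, target, location) are left out.

```python
def empirical_quantile(sorted_values, q):
    """Quantile of pre-sorted values using linear interpolation between order statistics."""
    position = q * (len(sorted_values) - 1)
    lower_index = int(position)
    upper_index = min(lower_index + 1, len(sorted_values) - 1)
    fraction = position - lower_index
    lower_value = sorted_values[lower_index]
    return lower_value + fraction * (sorted_values[upper_index] - lower_value)


# 0.01, 0.025, 0.05, 0.10, ..., 0.95, 0.975, 0.99
HUB_QUANTILE_LEVELS = (
    [0.01, 0.025] + [round(0.05 * k, 3) for k in range(1, 20)] + [0.975, 0.99]
)


def samples_to_quantile_rows(samples, levels=HUB_QUANTILE_LEVELS):
    """Summarize sampled forecast values as one row per quantile level."""
    ordered = sorted(samples)
    return [
        {"type": "quantile", "quantile": level, "value": empirical_quantile(ordered, level)}
        for level in levels
    ]


# Made-up sampled values for a single target on a single date (10 samples).
toy_samples = [120.0, 95.0, 130.0, 110.0, 105.0, 140.0, 100.0, 125.0, 115.0, 135.0]
forecast_rows = samples_to_quantile_rows(toy_samples)
print(len(forecast_rows), forecast_rows[11])
```

With only 10 samples, as in `num_samples = 10` above, the extreme quantiles are poorly resolved, so a larger sample count would likely be needed before submitting tail quantiles.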