# Simple data parsing + plotting notebook
---

This notebook aggregates summary data from a set of corvid runs, then saves it in hdf5 format and/or plots statistics.

## Generating data
To generate a set of runs to analyze, run the `batchrun_haswell.slr` slurm script (currently configured for the haswell node debug queue, which works as long as the job fits into <30 mins). This takes one haswell node and runs corvid with 64 different random seeds in parallel. To do more concurrent runs, use the `--array=0-M` option when submitting via `sbatch` to submit a job array of M+1 jobs, each running 64 unique random seeds.
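
As a quick sanity check on job-array sizing, the sketch below computes the total number of runs produced by `--array=0-M`. The helper name `total_runs` is hypothetical and not part of this repo; the default of 64 runs per job is taken from the description above and should be changed if your batch script uses a different task count.

```python
def total_runs(M, seeds_per_job=64):
    """Total corvid runs for `sbatch --array=0-M` (hypothetical helper).

    The array indices 0..M give M+1 jobs, each assumed to run
    `seeds_per_job` unique random seeds.
    """
    return (M + 1) * seeds_per_job


print(total_runs(0))   # single job: 64 runs
print(total_runs(12))  # 13 jobs: 832 runs
```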

In the batch script, adjust the `dirName` field to point to your desired storage directory in `$SCRATCH` (the output path is currently set to `$SCRATCH/corvid_demo/$dirName`). Then set `configFile` to the config file you'd like to use.

## Formats/Info
Array shapes are (number of runs, simulation length in days, d), where d is 1 for scalar data or 5 for age-binned data. The parser uses a sample summary from the first run in the job (`${dataDir}-0/out_0/${summfname}`) to get variable names and the corresponding shapes.
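
To make the shape convention concrete, here is a minimal sketch using synthetic NumPy arrays in place of parsed data. The variable names (`scalar_var`, `age_var`) and the random values are made up for illustration; only the (runs, days, d) layout comes from the description above.

```python
import numpy as np

n_runs, n_days = 4, 180
rng = np.random.default_rng(0)

# Synthetic stand-ins for parsed variables (not real corvid output)
scalar_var = rng.poisson(10, size=(n_runs, n_days, 1))  # d = 1: scalar data
age_var = rng.poisson(2, size=(n_runs, n_days, 5))      # d = 5: age-binned data

# Average over runs (axis 0) to get one timeseries per column
mean_over_runs = scalar_var.mean(axis=0)      # shape (180, 1)

# Sum over age bins (last axis) to get a per-run daily total
total_over_ages = age_var.sum(axis=-1)        # shape (4, 180)

# Day index of the peak in each run for the scalar variable
peak_day = scalar_var[..., 0].argmax(axis=1)  # shape (4,)

print(mean_over_runs.shape, total_over_ages.shape, peak_day.shape)
```

Reducing over axis 0 gives across-run statistics, while reducing over the last axis collapses the age bins.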

If job outputs are in `$SCRATCH`, parsing takes around 2-3s per run for 180-day runs, though this varies with file system load and the size of the summary files. So to parse O(1000) runs, start the parsing cell and then go make some coffee ;)
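
For a rough idea of how long that coffee break should be, the sketch below scales the 2-3s per-run figure to a given number of runs. The helper `parse_time_minutes` is hypothetical, and the estimate assumes the per-run cost stays roughly constant, which may not hold on a busy file system.

```python
def parse_time_minutes(n_runs, sec_per_run=(2.0, 3.0)):
    """Rough (low, high) parsing time in minutes (hypothetical helper).

    Assumes a constant per-run cost within `sec_per_run` seconds.
    """
    low, high = sec_per_run
    return n_runs * low / 60, n_runs * high / 60


low, high = parse_time_minutes(1000)
print(f"{low:.0f}-{high:.0f} min")  # 33-50 min
```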

In [None]:
import numpy as np
from plotting import *
from utils import RunParser
import os

# Directory where hdf5 summaries are saved
saveDir = '/global/cfs/projectdirs/covid19/sys_uncertainty/seattle-26/'

### Can parse a set of runs generated by the batch submission script:

In [None]:
dataDir = '/global/cscratch1/sd/pharring/corvid_demo/schoolWFH-long' # Where output of slurm job array lives
N_jobs_array = 13 # Number of jobs in job array
N_per_job = 80 # number of runs per job (equals srun --ntasks in batch script)
dat = RunParser(dataDir, Narr=N_jobs_array, Nperjob=N_per_job)
dat.parse_runs()
h5name = os.path.join(saveDir, dataDir.split('/')[-1]+'.h5')
dat.save_h5(h5name)

### Or load from an hdf5 file if you have already done the above:

In [None]:
# hdf5 filename to load from
h5name = os.path.join(saveDir,'schoolWFH-long.h5')
dat = RunParser(load_from_h5=h5name)

### Plot daily new symptomatic individuals

In [None]:
plot_daily_new_symptomatic(dat.agg)

In [None]:
plot_peakdist(dat.agg)

### Plot timeseries:

In [None]:
plot_timeseries(dat.agg, figsz=14, agebins=dat.age_bins)

### Plot end-of-sim stats


In [None]:
plot_end(dat.agg, figsz=14, agebins=dat.age_bins)