# Test mat2py module

This notebook tests the mat2py module. It uses the Epidem class to extract data from a MATLAB output file and Bokeh to plot the data. Several plotting and data extraction options are shown. The first cell shows how to instantiate the Epidem class and extract metadata about the run.

Notice that the number of available policies is different from the total number of policies. As of this writing the simulation sorts the output in this fashion. The indices of the available policies are stored in the `Scenarios`, `SchoolPolicies`, and `SocialDistancePolicies` properties of the Epidem class. Arguments to the `get_outcome` method must be chosen from among those available policies.
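The rule above can be enforced with a small guard before calling `get_outcome`. The sketch below is not part of mat2py: `require_available` is a hypothetical helper, and the list of indices is a made-up stand-in for what `run.SchoolPolicies` might hold.

```python
def require_available(value, available, name):
    """Return value unchanged, or raise ValueError if it is not an available index."""
    if value not in available:
        raise ValueError(
            f"{name}={value!r} is not available; choose from {list(available)}"
        )
    return value


# Hypothetical stand-in for run.SchoolPolicies (3 available out of 13 total).
available_school_policies = [0, 4, 12]

# Picking from the available list always passes the check.
school_policy = require_available(
    available_school_policies[-1], available_school_policies, "school_policy"
)
print(school_policy)
```

In the notebook the same check could be applied to the scenario and social distancing arguments, using `run.Scenarios` and `run.SocialDistancePolicies` as the available lists.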

In [1]:
#mat2py test code 
#import modules
from bokeh.plotting import figure, output_notebook, show
from bokeh.models import ColumnDataSource
import sys,os
import pandas as pd
#mat2py is located relative to this notebook in the repo.
#If the import below fails, uncomment the sys.path.append
#line so that the module is found.
modulepath = os.getcwd()
print(f'modulepath: {modulepath}')
#sys.path.append(modulepath)
#import the Epidem module
from Mat2Py.Epidem import Epidem
output_notebook()
#The input file location relative to the location of this
#notebook. 
#A raw string keeps the Windows backslashes from being read as escape sequences.
filespec = r"..\DATA\Matlab\Multi_v2_Full_COVID_cpuNum_30_March_24_737874.2655.mat"
#instantiate the Epidem class object. 
run = Epidem(filepath = filespec)
# print some things about the run
print(f'Input file:\n{run.Filename}\n')
#
print(run)


modulepath: C:\Users\semeraro-la\Documents\Projects\covid\Code


Input file:
Multi_v2_Full_COVID_cpuNum_30_March_24_737874.2655.mat

Stats: 
	Outcomes                      3        
	Risks                         2
	Age Groups                    5        
	Cities                      217
	Data Points                 363        
	Stochastic Iterations         1        
Policies: 
	School                       13        
	Social Distance               4
	Scenarios                     7        
Available Policies:  
	School                        3        
	Social Distance               4
	Scenarios                     2


## Selecting Data
The following code cell demonstrates how to access data using the mat2py package. Once the Epidem class is instantiated, the main entry point for data extraction is the `get_outcome` method. The method takes several arguments that control the amount and form of the output. It returns a pandas dataframe with time on the first axis and categories on the second, so each category corresponds to a column of the dataframe. For example, one could request counts for each age group for a particular outcome type, risk factor, and scenario/policy set.
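To make the layout concrete, here is a minimal sketch of a frame shaped like the one described above: one row per time step and one column per category. The frame is built by hand, and the column labels and counts are invented for illustration; they are not the labels mat2py produces.

```python
import pandas as pd

# Hypothetical stand-in for a get_outcome result:
# rows are time steps, columns are categories (here, two outcomes).
frame = pd.DataFrame(
    {"outcome_a": [0, 2, 5, 9], "outcome_b": [0, 1, 1, 3]},
    index=pd.RangeIndex(4, name="time"),
)

print(frame.shape)                        # (time steps, categories)
peak_time = frame["outcome_a"].idxmax()   # time step with the largest count
totals = frame.sum(axis=0)                # total count per category
print(peak_time)
print(totals)
```

Because time is the row index, ordinary pandas operations such as column selection and column-wise reductions apply directly to the result.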

### Outcomes alone

The best way to illustrate is through examples. The code cell below extracts counts for each of the outcomes for a particular scenario, policy, risk, and location combination. The result consists of the sum across age groups for each of the three available outcomes.

In [2]:
# Set the arguments to the get_outcome method. Arbitrarily chosen.
# The scenario, policy arguments are single integers. A set of these 
# values defines a simulation and the output from that simulation is what you get.
scenario          = run.Scenarios[0]             #the first available
school_policy     = run.SchoolPolicies[-1]       #the last available
social_distancing = run.SocialDistancePolicies[0] #the first available
risk              = run.NumberOfRisks - 1        #the last risk
location          = 35620                        #CBSA for New York
# The following variables define how the output is delivered. The
# outcome and the age_group can have iterable values. If age_group is
# None then the sum across all age groups is returned for each of the
# requested outcomes. If age_group is not None then the results are
# returned per age group and get_outcome returns a dataframe with
# multi-indexed columns.
outcome           = list(range(run.NumberOfOutcomes))
age_group         = None                            #Sum across all age groups
run_data = run.get_outcome(outcome,scenario,school_policy,social_distancing,risk,age_group,location)
#Bokeh plotting foo
colorset = ['red','blue','purple','olivedrab','brown']
source = ColumnDataSource(run_data)
p = figure(title="Test Outcome Plot")
p.xaxis.axis_label = "Time"
p.yaxis.axis_label = "Count"
# Draw one line per column, cycling through the color list by position.
for ii, item in enumerate(run_data.columns):
    p.line(x='index',y=item,source = source,legend_label=item,line_width=2,color=colorset[ii])
show(p)

### With Age Groups

The next code cell shows what happens when you specify age groups in addition to outcomes. We examine all outcomes and all age groups. The returned pandas dataframe has multi-indexed columns, with outcomes as the level 0 column index and age groups as the level 1 column index.
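A two-level column index can be explored without the simulation data. The sketch below builds a small stand-in frame by hand (the outcome and age labels are hypothetical strings), selects every age group for one outcome, and flattens the column names by joining the levels with an underscore, which is the naming pattern used when plotting from a flattened source.

```python
import pandas as pd

# Hypothetical labels; the real ones come from get_outcome.
outcomes = ["outcome_a", "outcome_b"]
ages = ["0", "1", "2"]

columns = pd.MultiIndex.from_product(
    [outcomes, ages], names=["outcome", "age_group"]
)
frame = pd.DataFrame(
    [[1, 2, 3, 4, 5, 6],
     [2, 3, 4, 5, 6, 7]],
    columns=columns,
)

# Selecting a level 0 label returns all age groups for that outcome.
one_outcome = frame["outcome_a"]

# Flatten the two levels into single names such as "outcome_a_0".
flat = frame.copy()
flat.columns = ["_".join(col) for col in frame.columns]
print(list(flat.columns))
```

The `"_".join` step assumes both levels hold strings; integer labels would need `str()` applied first.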

In [3]:
# Use the same scenario and policy data as above and just change the age groups. 
age_group = (0,1,2,3,4)
# Get the data
run_data = run.get_outcome(outcome,scenario,school_policy,social_distancing,risk,age_group,location)
source=ColumnDataSource(run_data)
# The ColumnDataSource flattens the column multi index by joining the
# levels with an underscore, so each column is named <outcome>_<age group>.
# Plot counts per age group for each outcome.
for case in run_data.columns.levels[0]:
    p = figure(title=f'Age Groups for {case}')
    p.xaxis.axis_label='Time'
    p.yaxis.axis_label = 'Count'
    for ii, age in enumerate(run_data.columns.levels[1]):
        y_data = f'{case}_{age}'
        # Legend labels must be strings, so convert the age label explicitly.
        p.line(x='index',y=y_data,source = source,legend_label=str(age),line_width=2,color=colorset[ii])
    show(p)

## Fin

This notebook illustrates the two extremes of mat2py usage: outcomes summed across all age groups, and every outcome broken out by every age group.