# hillmaker - OO design ideas

https://github.com/misken/hillmaker/tree/develop

**TODO - Add material from previous explainer notebooks that covers horizon, warmup, use of CLI details and more.**

Overall application design goals and objectives

- should be easy to run a scenario and get all the standard outputs
- scenario specific settings should be persistable as something like a json file
- should be possible to generate only outputs wanted
- should have a CLI
- should be importable so that it can be used from notebook or other custom Python scripts
- would be nice to have a GUI for non-technical users
- should be easy to explore multiple scenarios
- global and scenario specific settings can be managed through settings files, command line args or function args
- current occupancy, arrival and departure stats all still desirable
- los summary would be nice
- outputs should be in formats that lend themselves to further analysis and reporting such as csvs for the occ stats (bydatetime and summary), standard graphic file formats, perhaps JSON for los summary and occ stats
- dataset profiling should be done to identify potential issues with horizon effects, warmup effects, missing data periods, or other anomalies.


Should hillmaker be redesigned as an OO based application?

- does OO design make for a better analyst experience? For example, does OO make it easier to create and manage a bunch of scenarios in which each is a separate hillmaker run? OO would make it easier to document scenarios through their settings (e.g. as json file).
- does OO lead to potential performance gains by making it easier to only run the parts we want to run? For example, maybe we don't want individual day of week plots.
- right now hillmaker is an (almost) all or nothing experience with each run standing alone. 
- OO would likely be better for those using hillmaker programmatically. 
- no matter what the design, there will always be a CLI.
- not sure how OO or not affects GUI dev

How should hillmaker be redesigned as an OO based application?

## Use case 1 - overall and by patient type summaries

Patients flow through a short stay unit for a variety of procedures, tests or therapies. Let's assume patients can be classified into one of five categories of patient types: ART (arterialgram), CAT (post cardiac-cath), MYE (myelogram), IVT (IV therapy), and OTH (other). From one of our hospital information systems we were able to get raw data about the entry and exit times of each patient and exported the data to a csv file. We call each row of such data a *stop* (as in, the patient stopped here for a while). 

- We want to generate summaries of occupancy as well as arrivals and discharges to go into a summary report for hospital administration. 
- We want these overall and by patient type. 
- We also want LOS summaries by patient type. 
- We also want volume and occupancy trends over time.

In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
from pathlib import Path
from pprint import pprint

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter
from IPython.display import Image

from datetime import datetime, date
from typing import Dict, List, Optional, Tuple, Union

In [None]:
ssu_stopdata = '../data/ShortStay.csv'
ssu_stops_df = pd.read_csv(ssu_stopdata, parse_dates=['InRoomTS','OutRoomTS'])
ssu_stops_df.info() # Check out the structure of the resulting DataFrame

In [None]:
ssu_stops_df.head()

## An OO version of hillmaker (0.5.0)

This is a work in progress and I just wanted to share some early ideas. To summarize:

- added a `Scenario` class which has methods for running hillmaker (`make_hills()`) and for retrieving plots and dataframes from the results dictionary (`get_plot()`, `get_summary_df()`, and `get_bydatetime_df()`). 
- the plots and dataframes produced by `make_hills` are stored in a dictionary called `hills` that is an attribute of the `Scenario` class
- the methods of the `Scenario` class are just wrappers that call module-level functions of the same name that do the actual work. By doing this, I haven't broken the existing hillmaker API - you can still call a `make_hills` function, pass in a bunch of keyword arguments, and run things. The `make_hills()` function returns a `hills` dictionary, and the `get_plot` and `get_*_df` functions can be used to extract plots and dataframes from `hills` in a simple way.
- the `Scenario` class is actually a `pydantic` model which handles input validation.

So, hillmaker can be used in either an object-oriented way or via standard function calls. It also has (or will have) a CLI. For now, this lets us try it out and see what makes the most sense for future development.

## Usage example 1 - the OO approach

In [None]:
import hillmaker as hm

Here is a collection of inputs that we'll use to create a scenario. Notice that I purposely set one of the input dates to a string and the other to a `Timestamp` just to show that the `pydantic` model can handle the automatic conversion to a `datetime` for us.

In [None]:
# Required inputs
scenario_name = 'ssu_1'
stops_df = ssu_stops_df
in_field_name = 'InRoomTS'
out_field_name = 'OutRoomTS'
start_date = '1996-01-01'
end_date = pd.Timestamp('9/30/1996')

# Optional inputs

cat_field_name = 'PatType'
verbosity = 1 # INFO level logging
output_path = './output'
bin_size_minutes = 60

I created a `Scenario` class that is a Pydantic model. It handles a bunch of type constraints, validation, and default values. What does the `Scenario` class look like?

In [None]:
hm.Scenario?

We can create scenarios a few different ways.

- instantiate an instance of `Scenario` by passing in keyword args
- if the args are in a dict, can use dictionary unpacking to do the same
- there's a `create_scenario` function in the `utils` module that can take any of a dict, a TOML path or keyword args and returns a `Scenario` object (precedence is in the reverse order - kwargs get the final say)
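
The precedence rule in the last bullet can be illustrated with plain dictionaries. This is only a sketch of the idea, not hillmaker's actual implementation: `merge_scenario_params` is a hypothetical helper, and it assumes the TOML file has already been read into a flat dictionary.

```python
from typing import Any, Dict, Optional


def merge_scenario_params(params_dict: Optional[Dict[str, Any]] = None,
                          toml_params: Optional[Dict[str, Any]] = None,
                          **kwargs: Any) -> Dict[str, Any]:
    """Hypothetical helper: merge three input sources, later ones override earlier ones."""
    merged: Dict[str, Any] = {}
    merged.update(params_dict or {})   # lowest precedence
    merged.update(toml_params or {})   # overrides the dictionary
    merged.update(kwargs)              # keyword args get the final say
    return merged


merged = merge_scenario_params(
    params_dict={'scenario_name': 'ssu_a', 'bin_size_minutes': 60},
    toml_params={'bin_size_minutes': 120, 'verbosity': 1},
    bin_size_minutes=30,
)
print(merged)
```

Here `bin_size_minutes` appears in all three sources and the keyword argument value wins, while settings given in only one source pass through unchanged.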

### Create a new scenario with keyword arguments

You can create an instance of `Scenario` by passing in keyword arguments.

In [None]:
scenario_1 = hm.Scenario(scenario_name=scenario_name, 
                         stops_df=stops_df,
                         in_field=in_field_name,
                         out_field=out_field_name,
                         start_analysis_dt=start_date,
                         end_analysis_dt=end_date,
                         cat_field=cat_field_name,
                         output_path=Path('./output'),
                         verbosity=verbosity)

In [None]:
pprint(scenario_1.model_dump())

You can use pydantic's `model_dump` method to create a dictionary from a pydantic model. I'll do that, make a few changes, and then create a new scenario from the modified dict, using dictionary unpacking to pass in the keyword arguments to `Scenario`.

In [None]:
# Dump pydantic model to dict (could wrap this with a to_dict() method)
scenario_2_dict = scenario_1.model_dump()

# Make some changes
scenario_2_dict['scenario_name'] = 'ssu_2'
scenario_2_dict['bin_size_minutes'] = 30

# Make a new scenario using dictionary unpacking
scenario_2 = hm.Scenario(**scenario_2_dict)

pprint(scenario_2)

Since a scenario object is really just an instance of a Python class, its attributes can be read and set in the usual way.

### Create a new scenario with a TOML config file

To use a TOML configuration file to create a scenario, we can use the `create_scenario` function in the `utils` module.
Notice here that instead of specifying a pandas `DataFrame`, we are specifying a path to a csv file which will be read to create the stops `DataFrame`. It would also be possible to allow a string corresponding to the name of an existing `DataFrame` to be specified - we could use the `globals()[<string_name_of_dataframe>]` construct to access the actual object.

Here's an example config file:

```
[scenario_data]
scenario_name = "ssu_3"
stop_data_csv = "../data/ShortStay.csv"

[fields]
in_field = "InRoomTS"
out_field = "OutRoomTS"
# Just remove the following line if no category field
cat_field = "PatType"

[analysis_dates]
start_analysis_dt = 1996-01-01
end_analysis_dt = 1996-09-30

[settings]
bin_size_minutes = 60
verbosity = 1
output_path = "./output"

# Add any additional arguments here
# Strings should be surrounded in double quotes
# Floats and ints are specified in the normal way as values
# Dates are specified as shown above

# For arguments that take lists, the entries look
# just like Python lists and follow the other rules above

# cats_to_exclude = ["IVT", "OTH"]
# percentiles = [0.5, 0.8, 0.9]
```

In [None]:
config_file = Path('ssu_3.toml')
scenario_3 = hm.create_scenario(toml_path=config_file)
print(scenario_3)
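
The config file groups settings into tables (`[fields]`, `[settings]`, and so on), whereas `Scenario` takes flat keyword arguments. A TOML parser such as Python's `tomllib` (3.11+) returns one nested dictionary per table and reads a bare date like `1996-01-01` as a `datetime.date`. The sketch below shows one way such a nested result could be flattened. It is an assumption about the general approach, not a description of what `create_scenario` does internally; `flatten_tables` is a hypothetical helper and `parsed_toml` is a hand-built stand-in for the parser output.

```python
from datetime import date
from typing import Any, Dict

# Hand-built stand-in for what a TOML parser would return for the file above
parsed_toml: Dict[str, Dict[str, Any]] = {
    'scenario_data': {'scenario_name': 'ssu_3',
                      'stop_data_csv': '../data/ShortStay.csv'},
    'fields': {'in_field': 'InRoomTS', 'out_field': 'OutRoomTS',
               'cat_field': 'PatType'},
    'analysis_dates': {'start_analysis_dt': date(1996, 1, 1),
                       'end_analysis_dt': date(1996, 9, 30)},
    'settings': {'bin_size_minutes': 60, 'verbosity': 1,
                 'output_path': './output'},
}


def flatten_tables(tables: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical helper: drop the table level and return one flat dict."""
    flat: Dict[str, Any] = {}
    for table_name, entries in tables.items():
        for key, value in entries.items():
            if key in flat:
                # The same setting in two tables would be ambiguous
                raise ValueError(f"'{key}' appears in more than one table")
            flat[key] = value
    return flat


flat_params = flatten_tables(parsed_toml)
print(sorted(flat_params))
```

With this approach the table names act purely as documentation for whoever edits the file, since only the keys inside the tables reach the scenario.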

In [None]:
scenario_1.bin_size_minutes

While setting attributes directly is convenient, it does mean that, as the programmer, you can get around some of the validation checks that were done when the scenario was created. For example, `bin_size_minutes` must divide evenly into 1440 (the number of minutes in a day). I can add code to revalidate the model before allowing `make_hills` to run but I'm not going to bother for now. The standard Python error system will catch bad things.

In [None]:
# Bad
scenario_1.bin_size_minutes = 17

# Set it back to a valid value
scenario_1.bin_size_minutes = 60
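
For reference, here is a minimal sketch of what revalidating on assignment could look like, using only plain Python. `ToyScenario` and `check_bin_size` are hypothetical names made up for this example and are not part of hillmaker. Pydantic models can get similar behavior through the `validate_assignment` configuration option, though whether `Scenario` should enable it is still an open choice.

```python
MINUTES_PER_DAY = 1440


def check_bin_size(bin_size_minutes: int) -> int:
    """Return the value unchanged if it is valid, otherwise raise ValueError."""
    if bin_size_minutes <= 0 or MINUTES_PER_DAY % bin_size_minutes != 0:
        raise ValueError(
            f'bin_size_minutes={bin_size_minutes} does not divide evenly '
            f'into {MINUTES_PER_DAY}')
    return bin_size_minutes


class ToyScenario:
    """Hypothetical stand-in for a scenario; re-checks the value on every assignment."""

    def __init__(self, bin_size_minutes: int = 60) -> None:
        self.bin_size_minutes = bin_size_minutes  # goes through the setter

    @property
    def bin_size_minutes(self) -> int:
        return self._bin_size_minutes

    @bin_size_minutes.setter
    def bin_size_minutes(self, value: int) -> None:
        # Validate first so a bad value never replaces a good one
        self._bin_size_minutes = check_bin_size(value)


toy = ToyScenario(bin_size_minutes=60)
toy.bin_size_minutes = 30      # accepted: 48 bins per day
try:
    toy.bin_size_minutes = 17  # rejected: 1440 is not a multiple of 17
except ValueError as err:
    print(err)
print(toy.bin_size_minutes)    # still the last valid value
```

Because the check runs before the private attribute is updated, a rejected assignment leaves the previous valid setting in place.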

### Create a new scenario using `create_scenario()` and a dictionary

The `create_scenario` function can also take a dictionary of input arguments. Notice in the example below that I've used strings for the dates but I could just as well have used `datetime` or `Timestamp` objects - anything that can be converted to a pandas `Timestamp` is allowed. I've only included the required parameters and two optional parameters - `cat_field` and `bin_size_minutes`.

In [None]:
hm.create_scenario?

In [None]:
scenario_4_dict = {
    'scenario_name': 'ssu_4',
    'stops_df': ssu_stops_df,
    'in_field': 'InRoomTS',
    'out_field': 'OutRoomTS',
    'start_analysis_dt': '1996-01-01',
    'end_analysis_dt': '1996-09-30',
    'cat_field': 'PatType',
    'bin_size_minutes': 60
}

scenario_4 = hm.create_scenario(params_dict=scenario_4_dict)
print(scenario_4)


With `create_scenario`, you can also include keyword arguments that will take precedence over those specified in either a TOML file or a dictionary. 

In [None]:
scenario_5_dict = {
    'scenario_name': 'ssu_5',
    'stops_df': ssu_stops_df,
    'in_field': 'InRoomTS',
    'out_field': 'OutRoomTS',
    'start_analysis_dt': '1996-01-01',
    'end_analysis_dt': '1996-09-30',
    'cat_field': 'PatType',
    'bin_size_minutes': 60
}

scenario_5 = hm.create_scenario(params_dict=scenario_5_dict, 
                                export_all_week_plots=True, bin_size_minutes=30)
print(scenario_5)

Now let's generate hills by using the `make_hills` method of one of the scenario instances.

In [None]:
scenario_1.make_hills()

All of the outputs get stored in the `hills` dictionary attribute. It's a nested dictionary and it can be cumbersome to pull out specific items. Later in this notebook we'll describe "getter" methods to make it easier to pull out specific plots or dataframes.

In [None]:
scenario_1.hills.keys()

In [None]:
scenario_1.hills['bydatetime'].keys()

In [None]:
scenario_1.hills['bydatetime']['PatType_datetime'].head()

In [None]:
scenario_1.hills['summaries'].keys()

In [None]:
scenario_1.hills['summaries']['nonstationary'].keys()

In [None]:
scenario_1.hills['summaries']['nonstationary']['PatType_dow_binofday'].keys()

In [None]:
scenario_1.hills['summaries']['nonstationary']['PatType_dow_binofday']['occupancy'].head()

In [None]:
scenario_1.hills['settings']
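
The chained lookups above all follow the same pattern: walk down a list of keys one level at a time. A small generic helper makes that explicit. This is only an illustrative sketch; `get_nested` is a hypothetical function and `toy_hills` is a toy dictionary that mirrors the key nesting shown above with a placeholder string where the real `DataFrame` would be.

```python
from functools import reduce
from typing import Any, Mapping, Sequence


def get_nested(results: Mapping[str, Any], key_path: Sequence[str]) -> Any:
    """Follow key_path down through nested mappings and return the final value."""
    return reduce(lambda node, key: node[key], key_path, results)


# Toy dictionary mirroring the nesting above; the leaf is a placeholder
toy_hills = {
    'summaries': {
        'nonstationary': {
            'PatType_dow_binofday': {
                'occupancy': 'occupancy summary placeholder'
            }
        }
    }
}

leaf = get_nested(
    toy_hills,
    ['summaries', 'nonstationary', 'PatType_dow_binofday', 'occupancy'])
print(leaf)
```

A helper like this still requires knowing the exact key names, which is the main reason dedicated getter methods with friendlier arguments are attractive.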

## Retrieving specific plots and/or DataFrames

As previously pointed out, it's clunky to have to traverse these dictionaries to pull out plots and dataframes. Seems like we'd want to be able to quickly view a specific plot or `DataFrame`.

Methods `get_plot()`, `get_summary_df()`, and `get_bydatetime_df()` were added to the `Scenario` class. These are actually just wrappers for module-level functions of the same name.

In [None]:
scenario_1.get_plot??

In [None]:
plot = scenario_1.get_plot('occupancy')

In [None]:
plot

In [None]:
scenario_1.get_plot('occ')

In [None]:
scenario_1.get_plot('departures', 'Mon')

In [None]:
scenario_2.get_summary_df?

In [None]:
occ_summary_df = scenario_1.get_summary_df('o')
occ_summary_df

In [None]:
overall_occ_summary_df = scenario_1.get_summary_df('o', by_category=False)
overall_occ_summary_df

In [None]:
overall_stationary_occ_summary_df = scenario_1.get_summary_df('o', by_category=False, stationary=True)
overall_stationary_occ_summary_df

## Exporting summary or datetime csv files

Currently, if you use the `make_hills` method (or the `make_hills` legacy function - more on this below), you can use the following arguments to control exporting of plots and dataframes:

```
# Exporting dataframes
export_bydatetime_csv : bool, optional
       If True, bydatetime DataFrames are exported to csv files. Default is False.
export_summaries_csv : bool, optional
       If True, summary DataFrames are exported to csv files. Default is False.

# Exporting plots       
export_all_dow_plots : bool, optional
   If True, day of week plots are exported for occupancy, arrival, and departure. Default is False.
export_all_week_plots : bool, optional
   If True, full week plots are exported for occupancy, arrival, and departure. Default is False.
 
```

It's an "all or none" kind of decision with respect to each argument. Of course, you can always use `get_plot`, `get_summary_df`, or `get_bydatetime_df` and then manually export the result yourself. See cell below.

**QUESTION** What might an API look like that supported more fine grained control of plot and dataframe exporting but didn't rely on the user doing the manual kind of thing below?



In [None]:
# Fetch the dataframe of interest
overall_occ_summary_df = scenario_1.get_summary_df('o', by_category=False)

# Create output filename with path
export_path = Path('./output')
file_summary_csv = 'scenario_1_occ.csv'
csv_wpath = Path(export_path, file_summary_csv)

# Export the dataframe
overall_occ_summary_df.to_csv(csv_wpath, index=True, float_format='%.6f')
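
As one possible starting point for the question above, exports could be selected by name pattern rather than by all-or-nothing flags. The sketch below is purely hypothetical: `plan_exports` and the output names in `available_outputs` are made up for illustration and do not correspond to anything in hillmaker. It only plans which files would be written and does no actual exporting.

```python
from fnmatch import fnmatchcase
from pathlib import Path
from typing import Dict, Iterable


def plan_exports(available: Iterable[str], patterns: Iterable[str],
                 output_path: Path, scenario_name: str) -> Dict[str, Path]:
    """Hypothetical helper: map each output name matching any pattern to a csv path."""
    patterns = list(patterns)
    return {
        name: output_path / f'{scenario_name}_{name}.csv'
        for name in available
        if any(fnmatchcase(name, pattern) for pattern in patterns)
    }


# Made-up output names for this sketch
available_outputs = ['occupancy_overall', 'occupancy_by_category',
                     'arrivals_overall', 'departures_overall']

planned = plan_exports(available_outputs, ['occupancy_*'],
                       Path('./output'), 'scenario_1')
for name, path in planned.items():
    print(name, '->', path)
```

A list of patterns like this could also live in the TOML settings file, which would keep export choices documented alongside the rest of the scenario.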

## Computing statistics with no plotting

The default DOW and weekly plots can be suppressed through keyword arguments when creating a `Scenario` instance by setting `make_all_dow_plots=False` and `make_all_week_plots=False`.



In [None]:
start_date_ts = pd.Timestamp(start_date)
scenario_6 = hm.Scenario(scenario_name=scenario_name, 
                         stops_df=stops_df,
                         in_field=in_field_name,
                         out_field=out_field_name,
                         start_analysis_dt=start_date_ts,
                         end_analysis_dt=start_date_ts + pd.Timedelta(90, 'd'),
                         cat_field=cat_field_name,
                         output_path=Path('./output'),
                         verbosity=0,
                         make_all_dow_plots=False,
                         make_all_week_plots=False)

scenario_6.make_hills()

print(scenario_6.hills.keys())


I also added a `compute_hills_stats` method that just does the bydatetime and summary stats, but does NOT create plots or export anything. It populates the `hills` attribute (a dict). 

In [None]:
start_date_ts = pd.Timestamp(start_date)
scenario_7 = hm.Scenario(scenario_name=scenario_name, 
                         stops_df=stops_df,
                         in_field=in_field_name,
                         out_field=out_field_name,
                         start_analysis_dt=start_date_ts,
                         end_analysis_dt=start_date_ts + pd.Timedelta(90, 'd'),
                         cat_field=cat_field_name,
                         output_path=Path('./output'),
                         verbosity=0)

scenario_7.compute_hills_stats()

print(scenario_7.hills.keys())


The plan is to add additional plot types and plotting related input arguments to allow better plot customization. For now, I've just added a function, `make_week_dow_plots()` to the `plotting` module that creates all of the DOW and weekly plots that are currently created. So, if you want, you can call `make_week_dow_plots()` after computing statistics with `compute_hills_stats` by passing in the resulting `hills` dictionary. **This feels kludgy in that the `scenario.hills` dict isn't updated which means that the `get_plot` method won't work.**

In [None]:
scenario_7_plots = hm.plotting.make_week_dow_plots(scenario_7, scenario_7.hills)

In [None]:
scenario_7_plots.keys()

**Just as with the dataframes, we need to design an API for fine grained plotting control.**

## Usage example 2 - the non-OO approach

This is just the way that `make_hills` has been used in recent versions of hillmaker. I wanted to keep this around as a "legacy" function that still works. The way I ended up doing it was to:

- create a `legacy.make_hills()` function that creates a `Scenario` object from the user-specified input arguments, which...
- then calls `hills.make_hills(<scenario object>)` to actually do the work

By creating the legacy wrapper function, the inputs can be validated with the Pydantic `Scenario` model class. The user never knows that they are actually using a `Scenario` object and `hills.make_hills()` returns the same dictionary that gets stored in the `hills` attribute of a `Scenario` instance.
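
The delegation pattern described here can be shown with a toy example. Everything in this sketch is a simplified stand-in: `ToyScenario`, `make_hills_from_scenario`, and `legacy_make_hills` are hypothetical names, a dataclass takes the place of the Pydantic model, and the "work" is just echoing the settings back.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class ToyScenario:
    """Hypothetical stand-in for the validated scenario model."""
    scenario_name: str
    bin_size_minutes: int = 60
    hills: Dict[str, Any] = field(default_factory=dict)

    def make_hills(self) -> None:
        # The method is a thin wrapper around the module-level function
        self.hills = make_hills_from_scenario(self)


def make_hills_from_scenario(scenario: ToyScenario) -> Dict[str, Any]:
    """Module-level worker; in this toy version it only echoes the settings."""
    return {'settings': {'scenario_name': scenario.scenario_name,
                         'bin_size_minutes': scenario.bin_size_minutes}}


def legacy_make_hills(**kwargs: Any) -> Dict[str, Any]:
    """Function-style entry point: build the object, then delegate to the worker."""
    scenario = ToyScenario(**kwargs)
    return make_hills_from_scenario(scenario)


# OO route: results end up in the object's hills attribute
oo_scenario = ToyScenario(scenario_name='toy', bin_size_minutes=30)
oo_scenario.make_hills()

# Function route: the same dictionary is returned directly
legacy_result = legacy_make_hills(scenario_name='toy', bin_size_minutes=30)
print(oo_scenario.hills == legacy_result)
```

Since both routes end in the same worker function, any validation done when the object is constructed applies equally to function-style callers.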



In [None]:
# Required inputs
scenario_name = 'legacy_example_1'
in_field_name = 'InRoomTS'
out_field_name = 'OutRoomTS'
start_date = '1996-01-01'
end_date = pd.Timestamp('9/30/1996')

# Optional inputs

cat_field_name = 'PatType'
verbosity = 1 # INFO level logging
output_path = './output'
bin_size_minutes = 60


hills_legacy_example_1 = hm.make_hills(scenario_name=scenario_name, stops_df=ssu_stops_df,
              in_field=in_field_name, out_field=out_field_name,
              start_analysis_dt=start_date, end_analysis_dt=end_date,
              cat_field=cat_field_name,
              bin_size_minutes=bin_size_minutes,
              output_path='./output', verbosity=verbosity)

Now to get a plot we call the module level `get_plot` and pass in the hills dictionary. The `hills` dictionary contains a `'settings'` key that is used to store things needed for plots and dataframes.

In [None]:
hills_legacy_example_1['settings']

In [None]:
plot = hm.get_plot(hills_legacy_example_1, 'o', 'week')
plot

## Usage example 3 - the CLI

I updated some of the names of the input arguments to match the current arguments used by the `Scenario` model.



In [None]:
!hillmaker --help

In [None]:
!hillmaker --scenario_name cli_example_1 --stop_data_csv '../data/ShortStay.csv' \
--in_field 'InRoomTS' --out_field 'OutRoomTS' --cat_field 'PatType' \
--start_analysis_dt '1996-01-01' --end_analysis_dt '1996-09-30' \
--export_all_dow_plots --export_all_week_plots --output_path './output' --verbosity 1