# hillmaker - OO design ideas

Overall application design goals and objectives

- should be easy to run a scenario and get all the standard outputs
- scenario specific settings should be persistable as something like a json file
- should be possible to generate only outputs wanted
- should have a CLI
- should be importable so that it can be used from notebook or other custom Python scripts
- would be nice to have a GUI for non-techie users
- should be easy to explore multiple scenarios
- global and scenario specific settings can be managed through settings files, command line args or function args
- current occupancy, arrival and departure stats all still desirable
- los summary would be nice
- outputs should be in formats that lend themselves to further analysis and reporting: csv files for the occ stats (bydatetime and summary), standard graphic file formats for plots, and perhaps JSON for the los summary and occ stats
- dataset profiling should be done to identify potential issues with horizon effects, warmup effects, missing data periods, or other anomalies.
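
As a minimal sketch of how the settings goals above might fit together, the following uses only the standard library. The `ScenarioSettings` class, its field names, and `resolve_settings` are hypothetical, and the precedence order (settings file values first, then explicit function arguments) is an assumption rather than a decided design.

```python
import json
from dataclasses import asdict, dataclass, fields
from typing import Optional


@dataclass
class ScenarioSettings:
    """Hypothetical container for scenario specific settings."""
    scenario_name: str
    in_field: str
    out_field: str
    cat_field: Optional[str] = None
    bin_size_minutes: int = 60

    def to_json(self) -> str:
        # Serialize the settings so the scenario documents itself
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, text: str) -> "ScenarioSettings":
        return cls(**json.loads(text))


def resolve_settings(file_settings: dict, **overrides) -> ScenarioSettings:
    """Merge settings file values with explicit overrides (assumed precedence)."""
    valid = {f.name for f in fields(ScenarioSettings)}
    # Overrides left as None are treated as "not supplied"
    supplied = {k: v for k, v in overrides.items() if v is not None}
    merged = {**file_settings, **supplied}
    unknown = set(merged) - valid
    if unknown:
        raise ValueError(f"Unknown settings: {sorted(unknown)}")
    return ScenarioSettings(**merged)


settings = ScenarioSettings("ss_example_1", "InRoomTS", "OutRoomTS", cat_field="PatType")
restored = ScenarioSettings.from_json(settings.to_json())
```

The same merge function could sit behind both a CLI and a function call, so that every entry point resolves settings in one place.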


Should hillmaker be redesigned as an OO based application?

- does OO design make for a better analyst experience? For example, does OO make it easier to create and manage a bunch of scenarios in which each is a separate hillmaker run? OO would make it easier to document scenarios through their settings (e.g. as json file).
- does OO lead to potential performance gains by making it easier to run only the parts we want? For example, maybe we don't want individual day of week plots.
- right now hillmaker is an (almost) all or nothing experience with each run standing alone. 
- OO would likely be better for those using hillmaker programmatically. 
- no matter what the design, there will always be a CLI.
- not sure how OO or not affects GUI dev
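
To make the "always a CLI" and "only run the parts we want" points concrete, here is a rough sketch of a thin `argparse` wrapper over an importable function. The names `run_scenario`, `build_parser`, `main`, and the `--no-dow-plots` flag are all hypothetical and are not part of the current hillmaker CLI.

```python
import argparse
from typing import List, Optional


def run_scenario(scenario_name: str, bin_size_minutes: int = 60,
                 make_dow_plots: bool = True) -> dict:
    """Hypothetical importable entry point; reports which outputs were requested."""
    outputs = ["bydatetime", "summaries", "week_plot"]
    if make_dow_plots:
        outputs.append("dow_plots")
    return {"scenario_name": scenario_name,
            "bin_size_minutes": bin_size_minutes,
            "outputs": outputs}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="hillmaker-sketch")
    parser.add_argument("--scenario-name", required=True)
    parser.add_argument("--bin-size-minutes", type=int, default=60)
    parser.add_argument("--no-dow-plots", action="store_true",
                        help="skip the individual day of week plots")
    return parser


def main(argv: Optional[List[str]] = None) -> dict:
    # The CLI only parses arguments; all of the work happens in run_scenario
    args = build_parser().parse_args(argv)
    return run_scenario(args.scenario_name, args.bin_size_minutes,
                        make_dow_plots=not args.no_dow_plots)


result = main(["--scenario-name", "ss_example_1", "--no-dow-plots"])
```

Keeping the CLI this thin means notebook users and command line users exercise the same code path.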

How should hillmaker be redesigned as an OO based application?

## Other similar projects

The [pandas-profiling](https://pandas-profiling.ydata.ai/docs/master/index.html) project has some similarities and has high quality code (certainly better than what I write).

- Similar flow of doing analysis on a dataframe and producing various visualizations, reports, and other outputs
- Produces plots, html reports, jupyter based report as well as providing results in json format
- Uses pydantic to help with config settings management and input validation
- Very focused use case - analyze dataframes
- Very thorough documentation
- The docs on [Changing Settings](https://pandas-profiling.ydata.ai/docs/master/pages/advanced_usage/changing_settings.html) are pretty much what we want to do (except we don't need the env vars option)
- the CLI code is in console.py and it's the `Settings` class that subclasses Pydantic's `BaseModel` class


The [pyfolio](https://github.com/quantopian/pyfolio) project is also good for ideas.

- financial analysis of a range of dates for a single stock - see tutorial at https://quantopian.github.io/pyfolio/notebooks/single_stock_example/
- other more elaborate analyses
- uses a `plotting.context` decorator function to allow plot customization. Matplotlib and seaborn support context managers for temporary changes to plot settings. The matplotlib context handles all the plot details whereas the Seaborn context manager is for higher level changes like plot scaling for different output targets such as notebook, paper or poster.
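
The decorator-plus-context-manager pattern described above can be sketched without any plotting library. This is not pyfolio's code: `PLOT_SETTINGS`, `plot_context`, `with_plot_context`, and `make_week_plot` are hypothetical stand-ins that only illustrate the idea of temporarily overriding settings around a plotting call.

```python
from contextlib import contextmanager
from functools import wraps

# Stand-in for a plotting library's global settings (hypothetical values)
PLOT_SETTINGS = {"font_scale": 1.0, "style": "default"}


@contextmanager
def plot_context(**overrides):
    """Temporarily override PLOT_SETTINGS, restoring them on exit."""
    saved = dict(PLOT_SETTINGS)
    PLOT_SETTINGS.update(overrides)
    try:
        yield PLOT_SETTINGS
    finally:
        PLOT_SETTINGS.clear()
        PLOT_SETTINGS.update(saved)


def with_plot_context(**overrides):
    """Decorator form: run the wrapped plotting function inside plot_context."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            with plot_context(**overrides):
                return func(*args, **kwargs)
        return wrapper
    return decorator


@with_plot_context(font_scale=1.5)
def make_week_plot():
    # A real plotting function would draw here; this one reports the settings it saw
    return dict(PLOT_SETTINGS)


seen = make_week_plot()
```

In practice the built-in equivalents would likely be used instead, such as `matplotlib.rc_context` and `seaborn.plotting_context`.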

An Apache Thrift sniffer tool called [thrift-tools](https://github.com/pinterest/thrift-tools)

- simple, clean interface
- CLI or library

## Use case 1 - overall and by patient type summaries

Patients flow through a short stay unit for a variety of procedures, tests or therapies. Let's assume patients can be classified into one of five categories of patient types: ART (arterialgram), CAT (post cardiac-cath), MYE (myelogram), IVT (IV therapy), and OTH (other). From one of our hospital information systems we were able to get raw data about the entry and exit times of each patient and exported the data to a csv file. We call each row of such data a *stop* (as in, the patient stopped here for a while). 

- We want to generate summaries of occupancy as well as arrivals and discharges to go into a summary report for hospital administration. 
- We want these overall and by patient type. 
- We also want LOS summaries by patient type. 
- We want volume and occupancy trends over time.

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
from pathlib import Path
from pprint import pprint

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter
from IPython.display import Image

from datetime import datetime, date
from typing import Dict, List, Optional, Tuple, Union

In [3]:
ssu_stopdata = '../data/ShortStay.csv'
stops_df = pd.read_csv(ssu_stopdata, parse_dates=['InRoomTS','OutRoomTS'])
stops_df.info() # Check out the structure of the resulting DataFrame

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 59877 entries, 0 to 59876
Data columns (total 4 columns):
 #   Column     Non-Null Count  Dtype         
---  ------     --------------  -----         
 0   PatID      59877 non-null  int64         
 1   InRoomTS   59877 non-null  datetime64[ns]
 2   OutRoomTS  59877 non-null  datetime64[ns]
 3   PatType    59877 non-null  object        
dtypes: datetime64[ns](2), int64(1), object(1)
memory usage: 1.8+ MB


In [4]:
stops_df.head()

   PatID            InRoomTS           OutRoomTS PatType
0      1 1996-01-01 07:44:00 1996-01-01 08:50:00     IVT
1      2 1996-01-01 08:28:00 1996-01-01 09:20:00     IVT
2      3 1996-01-01 11:44:00 1996-01-01 13:30:00     MYE
3      4 1996-01-01 11:51:00 1996-01-01 12:55:00     CAT
4      5 1996-01-01 12:10:00 1996-01-01 13:00:00     IVT


Create a new hills scenario

In [7]:
import hillmaker as hm
from hillmaker import hmlib

In [8]:
# Required inputs
scenario_name = 'ss_example_1'
in_field_name = 'InRoomTS'
out_field_name = 'OutRoomTS'
start_date = '1996-01-01'
end_date = pd.Timestamp('9/30/1996')

# Optional inputs

cat_field_name = 'PatType'
verbosity = 1 # INFO level logging
output_path = './output'
bin_size_minutes = 60

In [9]:
hmlib.bin_of_analysis_range(np.datetime64('1996-01-01 12:30'), np.datetime64(start_date), bin_size_minutes)

12
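
The result above is consistent with a simple rule: count the minutes elapsed since the start of the analysis range and floor-divide by the bin size. The sketch below is only an assumption about what `bin_of_analysis_range` computes, not its actual implementation, and `bin_of_range` is a hypothetical name.

```python
from datetime import datetime


def bin_of_range(ts: datetime, start: datetime, bin_size_minutes: int) -> int:
    """Zero-based index of the time bin containing ts, counted from start."""
    minutes_since_start = (ts - start).total_seconds() / 60
    return int(minutes_since_start // bin_size_minutes)


# 12:30 is 750 minutes after midnight, and 750 // 60 = 12
hourly_bin = bin_of_range(datetime(1996, 1, 1, 12, 30), datetime(1996, 1, 1), 60)
```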

In [18]:
scenario_1 = hm.Scenario(scenario_name=scenario_name,
                         stops_df=stops_df,
                         in_field=in_field_name,
                         out_field=out_field_name,
                         start_analysis_dt=start_date,
                         end_analysis_dt=end_date,
                         cat_field=cat_field_name)

Need a pretty print method.

In [19]:
pprint(scenario_1.dict())

{'bin_size_minutes': 60,
 'cap': None,
 'cat_field': 'PatType',
 'cats_to_exclude': None,
 'edge_bins': <EdgeBinsEnum.FRACTIONAL: 1>,
 'end_analysis_dt': datetime.date(1996, 9, 30),
 'export_bydatetime_csv': True,
 'export_dow_plot': True,
 'export_summaries_csv': True,
 'export_week_plot': True,
 'hills': None,
 'in_field': 'InRoomTS',
 'make_dow_plot': True,
 'make_week_plot': True,
 'nonstationary_stats': True,
 'occ_weight_field': None,
 'out_field': 'OutRoomTS',
 'output_path': PosixPath('.'),
 'percentiles': (0.25, 0.5, 0.75, 0.95, 0.99),
 'scenario_name': 'ss_example_1',
 'start_analysis_dt': datetime.date(1996, 1, 1),
 'stationary_stats': True,
 'stops_df':        PatID            InRoomTS           OutRoomTS PatType
0          1 1996-01-01 07:44:00 1996-01-01 08:50:00     IVT
1          2 1996-01-01 08:28:00 1996-01-01 09:20:00     IVT
2          3 1996-01-01 11:44:00 1996-01-01 13:30:00     MYE
3          4 1996-01-01 11:51:00 1996-01-01 12:55:00     CAT
4          5 1996-01-

/tmp/ipykernel_133558/4280743258.py:1: PydanticDeprecatedSince20: The `dict` method is deprecated; use `model_dump` instead. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.1.1/migration/
  pprint(scenario_1.dict())


In [20]:
pprint(scenario_1)

Scenario(scenario_name='ss_example_1', stops_df=       PatID            InRoomTS           OutRoomTS PatType
0          1 1996-01-01 07:44:00 1996-01-01 08:50:00     IVT
1          2 1996-01-01 08:28:00 1996-01-01 09:20:00     IVT
2          3 1996-01-01 11:44:00 1996-01-01 13:30:00     MYE
3          4 1996-01-01 11:51:00 1996-01-01 12:55:00     CAT
4          5 1996-01-01 12:10:00 1996-01-01 13:00:00     IVT
...      ...                 ...                 ...     ...
59872  59873 1996-09-30 19:31:00 1996-09-30 20:15:00     IVT
59873  59874 1996-09-30 20:23:00 1996-09-30 21:30:00     IVT
59874  59875 1996-09-30 21:00:00 1996-09-30 22:45:00     CAT
59875  59876 1996-09-30 21:57:00 1996-09-30 22:40:00     IVT
59876  59877 1996-09-30 22:45:00 1996-09-30 23:35:00     CAT



In [21]:
scenario_1.bin_size_minutes

60

Now let's generate hills.

In [24]:
scenario_1.make_hills()

KeyError: "None of [DatetimeIndex(['NaT', 'NaT', 'NaT', 'NaT', 'NaT', 'NaT', 'NaT', 'NaT', 'NaT',\n               'NaT',\n               ...\n               'NaT', 'NaT', 'NaT', 'NaT', 'NaT', 'NaT', 'NaT', 'NaT', 'NaT',\n               'NaT'],\n              dtype='datetime64[ns]', length=59877, freq=None)] are in the [columns]"

In [None]:
scenario_1.hills.keys()

In [None]:
pprint(scenario_1.hills['plots'].keys())

Seems like we'd want to be able to quickly view a specific plot

In [None]:
def get_plot(scenario_obj, flow_metric, day_of_week='week'):
    """Return the plot for a flow metric and day of week ('week' is the full week plot)."""
    flow_metric_str = flow_metric # Possibly allow abbreviations as inputs and then convert to full metric name
    day_of_week_str = day_of_week # Again, need to decide on API to make easy on user
    # scenario_name is a top-level field of Scenario (see the dict() output above)
    plot_name = f'{scenario_obj.scenario_name}_{flow_metric_str}_plot_{day_of_week_str}'
    return scenario_obj.hills['plots'][plot_name]

In [None]:
get_plot(scenario_1, 'occupancy')

In [None]:
get_plot(scenario_1, 'occupancy', 'Wed')

I added `get_plot()` as a method to the `Scenario` class.

In [None]:
scenario_1.get_plot('occupancy')

In [None]:
scenario_1.get_plot('occ')

In [None]:
scenario_1.get_plot('departures')
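
The calls above use both `'occupancy'` and `'occ'`, so some rule for expanding abbreviations is needed. One possible rule, sketched here as an assumption, is unambiguous prefix matching against the known flow metrics. `FLOW_METRICS`, `resolve_flow_metric`, and `plot_key` are hypothetical names; the key format follows the f-string used in `get_plot()`.

```python
FLOW_METRICS = ("occupancy", "arrivals", "departures")


def resolve_flow_metric(name: str) -> str:
    """Expand an abbreviation such as 'occ' to the full flow metric name."""
    key = name.strip().lower()
    matches = [m for m in FLOW_METRICS if m.startswith(key)]
    # Reject empty input and anything that does not match exactly one metric
    if not key or len(matches) != 1:
        raise ValueError(f"Unknown or ambiguous flow metric: {name!r}")
    return matches[0]


def plot_key(scenario_name: str, flow_metric: str, day_of_week: str = "week") -> str:
    """Build the dictionary key under which a plot is stored."""
    return f"{scenario_name}_{resolve_flow_metric(flow_metric)}_plot_{day_of_week}"


key = plot_key("ss_example_1", "occ")
```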

In [None]:
scenario_1.hills['plots']['ss_example_1_arrivals_plot_Tue']

In [None]:
scenario_1.hills['summaries'].keys()

In [None]:
scenario_1.hills['summaries']['nonstationary'].keys()

In [None]:
scenario_1.hills['summaries']['nonstationary']['dow_binofday'].keys()

In [None]:
scenario_1.hills['summaries']['nonstationary']['dow_binofday']['occupancy']

## Use case 2 - partition patient types into two holding areas
The hospital is considering sending some patient types to a new dedicated holding area. We want to be able to generate hillmaker outputs for various subsets of patients going to each of the two units.

In [None]:
stops_df['PatType'].unique()

In [None]:
a_subset = ['IVT', 'ART']

In [None]:
# Use pat_type rather than type to avoid shadowing the built-in
b_subset = [pat_type for pat_type in stops_df['PatType'].unique() if pat_type not in a_subset]
b_subset

In [None]:
def which_holding_area(pat_type):
    if pat_type in a_subset:
        return 'unitA'
    else:
        return 'unitB'

In [None]:
stops_df['new_hold_area'] = stops_df['PatType'].map(which_holding_area)

In [None]:
stops_df.head()

In [None]:
stops_df['los'] = stops_df['OutRoomTS'] - stops_df['InRoomTS']
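
The `los` column computed above is what a LOS summary by patient type would be built from. As a library-free sketch of that summary, the following uses the five stops shown by `stops_df.head()` earlier; the `los_minutes` helper and the choice of statistics (count, mean, max) are assumptions for illustration only.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# The five stops shown by stops_df.head(): (InRoomTS, OutRoomTS, PatType)
stops = [
    ("1996-01-01 07:44", "1996-01-01 08:50", "IVT"),
    ("1996-01-01 08:28", "1996-01-01 09:20", "IVT"),
    ("1996-01-01 11:44", "1996-01-01 13:30", "MYE"),
    ("1996-01-01 11:51", "1996-01-01 12:55", "CAT"),
    ("1996-01-01 12:10", "1996-01-01 13:00", "IVT"),
]


def los_minutes(in_ts: str, out_ts: str) -> float:
    """Length of stay in minutes between two timestamp strings."""
    fmt = "%Y-%m-%d %H:%M"
    delta = datetime.strptime(out_ts, fmt) - datetime.strptime(in_ts, fmt)
    return delta.total_seconds() / 60


# Group the LOS values by patient type
los_by_type = defaultdict(list)
for in_ts, out_ts, pat_type in stops:
    los_by_type[pat_type].append(los_minutes(in_ts, out_ts))

los_summary = {
    pat_type: {"n": len(values), "mean_minutes": mean(values), "max_minutes": max(values)}
    for pat_type, values in sorted(los_by_type.items())
}
```

With pandas the equivalent would be a `groupby` on the category field followed by an aggregation of the `los` column.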

In [None]:
# Same Scenario class as in use case 1, with the new holding area as the category field
scenario02 = hm.Scenario(stops_df=stops_df, scenario_name='scenario02',
                         in_field=in_field_name, out_field=out_field_name,
                         start_analysis_dt=start_date, end_analysis_dt=end_date,
                         cat_field='new_hold_area')

In [None]:
scenario02.make_hills()

In [None]:
scenario02.hills['plots'].keys()

In [None]:
scenario02.hills['plots']['scenario02_occupancy_plot_week']

As part of an operational analysis we would like to compute a number of relevant statistics, such as:

- mean and 95th percentile of overall SSU occupancy by hour of day and day of week,
- similar hourly statistics for patient arrivals and departures,
- all of the above but by patient type as well.
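
To illustrate the kind of computation behind these statistics, here is a small standard-library sketch. It is not hillmaker's algorithm: the function names are hypothetical, the fractional treatment of partially occupied bins is an assumption, and the percentile uses a simple nearest-rank rule. The input is the five stops shown by `stops_df.head()` earlier.

```python
import math
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean

BIN = timedelta(minutes=60)  # hourly bins


def occupancy_by_bin(stops, start, end):
    """Fractional occupancy for each hourly bin in [start, end)."""
    n_bins = int((end - start) / BIN)
    occ = [0.0] * n_bins
    for in_ts, out_ts in stops:
        first = max(0, (in_ts - start) // BIN)
        last = min(n_bins - 1, (out_ts - start) // BIN)
        for b in range(first, last + 1):
            bin_start = start + b * BIN
            overlap = min(out_ts, bin_start + BIN) - max(in_ts, bin_start)
            # A stop contributes the fraction of the bin it was present for
            occ[b] += max(overlap, timedelta(0)) / BIN
    return occ


def summarize(occ, start, p=0.95):
    """Mean and nearest-rank percentile of occupancy by (weekday, hour)."""
    groups = defaultdict(list)
    for b, value in enumerate(occ):
        bin_start = start + b * BIN
        groups[(bin_start.weekday(), bin_start.hour)].append(value)  # weekday 0 = Monday
    summary = {}
    for key, values in groups.items():
        ranked = sorted(values)
        rank = max(1, math.ceil(p * len(ranked)))
        summary[key] = {"mean": mean(ranked), "percentile": ranked[rank - 1]}
    return summary


def parse_ts(text):
    return datetime.strptime(text, "%Y-%m-%d %H:%M")


stops = [
    (parse_ts("1996-01-01 07:44"), parse_ts("1996-01-01 08:50")),
    (parse_ts("1996-01-01 08:28"), parse_ts("1996-01-01 09:20")),
    (parse_ts("1996-01-01 11:44"), parse_ts("1996-01-01 13:30")),
    (parse_ts("1996-01-01 11:51"), parse_ts("1996-01-01 12:55")),
    (parse_ts("1996-01-01 12:10"), parse_ts("1996-01-01 13:00")),
]
start, end = parse_ts("1996-01-01 00:00"), parse_ts("1996-01-02 00:00")
occ = occupancy_by_bin(stops, start, end)
summary = summarize(occ, start)
```

With a single day of data each (weekday, hour) group holds one observation, so the mean and percentile coincide; over many weeks each group would hold one value per week. Doing the same by patient type amounts to running this once per category.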

In addition to tabular summaries, plots such as the following are needed:

In [None]:
Image(filename="images/ssu-occ.png")

Hillmaker was designed for precisely this type of problem. In fact, the very first version of hillmaker was written for analyzing an SSU when the author was an undergraduate interning at a large health care system. That very first version was written in BASIC on a [DECwriter](https://en.wikipedia.org/wiki/DECwriter)!

In [None]:
Image(filename="images/DECwriter,_Tektronix,_PDP-11_(192826605).jpg")

<p align = "center">
<font size="-2">Source: By Wolfgang Stief from Tittmoning, Germany - DECwriter, Tektronix, PDP-11, CC0, https://commons.wikimedia.org/w/index.php?curid=105322423</font>
</p>

Over the years, hillmaker was migrated to [FoxPro](https://en.wikipedia.org/wiki/FoxPro), and then to MS Access where it [lived for many years](http://hillmaker.sourceforge.net/). In 2016, I [moved it to Python](https://misken.github.io/blog/hillmaker-python-released/).