# Hillmaker - basic usage

In this notebook we'll focus on basic use of hillmaker for analyzing arrivals, departures, and occupancy by time of day and day of week for a typical *discrete entity flow system*. A few examples of such systems include:

- patients arriving, undergoing some sort of care process and departing some healthcare system (e.g. emergency department, surgical recovery, nursing unit, outpatient clinic, and many more)
- customers renting, using, and returning bikes in a bike share system,
- users of licensed software checking out, using, and checking back in a software license,
- products undergoing some sort of manufacturing or assembly process - occupancy is WIP,
- patrons arriving, dining and leaving a restaurant,
- travelers renting, residing in, and checking out of a hotel,
- flights taking off and arriving at their destination,
- ...

Basically, any sort of discrete [stock and flow system](https://en.wikipedia.org/wiki/Stock_and_flow) for which you are interested in time of day and day of week specific statistical summaries of occupancy, arrivals and departures, and have raw data on the arrival and departure times, is fair game for hillmaker.

## A prototypical example of a hillmaker use case

Patients flow through a short stay unit (SSU) for a variety of procedures, tests or therapies. Let's assume patients can be classified into one of five patient types: ART (arteriogram), CAT (post cardiac-cath), MYE (myelogram), IVT (IV therapy), and OTH (other). From one of our hospital information systems we were able to get raw data about the entry and exit times of each patient and exported the data to a csv file. Let's take a peek at the data.

In [2]:
%load_ext autoreload
%autoreload 2

In [3]:
import pandas as pd

ssu_stopdata = './input/ShortStay.csv'
stops_df = pd.read_csv(ssu_stopdata, parse_dates=['InRoomTS','OutRoomTS'])
stops_df.info() # Check out the structure of the resulting DataFrame

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 59877 entries, 0 to 59876
Data columns (total 4 columns):
 #   Column     Non-Null Count  Dtype         
---  ------     --------------  -----         
 0   PatID      59877 non-null  int64         
 1   InRoomTS   59877 non-null  datetime64[ns]
 2   OutRoomTS  59877 non-null  datetime64[ns]
 3   PatType    59877 non-null  object        
dtypes: datetime64[ns](2), int64(1), object(1)
memory usage: 1.8+ MB


In [4]:
stops_df.head()

|   | PatID | InRoomTS | OutRoomTS | PatType |
|---|---|---|---|---|
| 0 | 1 | 1996-01-01 07:44:00 | 1996-01-01 08:50:00 | IVT |
| 1 | 2 | 1996-01-01 08:28:00 | 1996-01-01 09:20:00 | IVT |
| 2 | 3 | 1996-01-01 11:44:00 | 1996-01-01 13:30:00 | MYE |
| 3 | 4 | 1996-01-01 11:51:00 | 1996-01-01 12:55:00 | CAT |
| 4 | 5 | 1996-01-01 12:10:00 | 1996-01-01 13:00:00 | IVT |


As part of an operational analysis we would like to compute a number of relevant statistics, such as:

- mean and 95th percentile of overall SSU occupancy by hour of day and day of week,
- similar hourly statistics for patient arrivals and departures,
- all of the above but by patient type as well.
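To make these quantities concrete, here is a minimal sketch of what arrivals, departures, and occupancy mean for a single time bin. It uses three made-up stops (not rows from `ShortStay.csv`) and a hypothetical helper named `bin_stats`. It is only an illustration of the idea, not hillmaker's actual implementation: each stop contributes to a bin's occupancy in proportion to the fraction of the bin it overlaps.

```python
from datetime import datetime, timedelta

# Three made-up stops as (arrival, departure) pairs
stops = [
    (datetime(1996, 1, 1, 7, 30), datetime(1996, 1, 1, 9, 0)),
    (datetime(1996, 1, 1, 8, 15), datetime(1996, 1, 1, 8, 45)),
    (datetime(1996, 1, 1, 8, 45), datetime(1996, 1, 1, 10, 30)),
]


def bin_stats(stops, bin_start, bin_width):
    """Return (arrivals, departures, occupancy) for the bin [bin_start, bin_start + bin_width)."""
    bin_end = bin_start + bin_width
    arrivals = sum(bin_start <= t_in < bin_end for t_in, _ in stops)
    departures = sum(bin_start <= t_out < bin_end for _, t_out in stops)
    occupancy = 0.0
    for t_in, t_out in stops:
        # Time this stop spends inside the bin (negative means no overlap)
        overlap = min(t_out, bin_end) - max(t_in, bin_start)
        if overlap > timedelta(0):
            occupancy += overlap / bin_width
    return arrivals, departures, occupancy


hour = timedelta(hours=1)
first_bin = datetime(1996, 1, 1, 7, 0)
stats_by_bin = [bin_stats(stops, first_bin + i * hour, hour) for i in range(4)]

for i, (arr, dep, occ) in enumerate(stats_by_bin):
    print(f'{7 + i}:00  arrivals={arr}  departures={dep}  occupancy={occ:.2f}')

# With hourly bins, total occupancy equals total length of stay in hours
total_los_hours = sum((t_out - t_in) / hour for t_in, t_out in stops)
total_occupancy = sum(occ for _, _, occ in stats_by_bin)
```

Repeating this for every bin of every day, and then summarizing the per-bin values by day of week and time of day, gives statistics of the kind listed above.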

In addition to tabular summaries, plots are needed. Like this:

![SSU occupancy plot](images/ssu-occ.png)

Hillmaker was designed for precisely this type of problem. In fact, the very first version of hillmaker was written for analyzing an SSU when the author was an undergraduate interning at a large health care system. That very first version was written in BASIC on a [DECwriter](https://en.wikipedia.org/wiki/DECwriter)!

![DECwriter](images/DECwriter,_Tektronix,_PDP-11_(192826605).jpg)
<p align = "center">
<font size="-2">Source: By Wolfgang Stief from Tittmoning, Germany - DECwriter, Tektronix, PDP-11, CC0, https://commons.wikimedia.org/w/index.php?curid=105322423</font>
</p>

Over the years, hillmaker was migrated to [FoxPro](https://en.wikipedia.org/wiki/FoxPro), and then to MS Access where it [lived for many years](http://hillmaker.sourceforge.net/). In 2016, I [moved it to Python](https://misken.github.io/blog/hillmaker-python-released/).

## Current (2022-06-28) status of code

Version 0.4.0 was just released and is available on [PyPI](https://pypi.org/project/hillmaker/) or source from https://github.com/misken/hillmaker. This version is much faster than previous versions (thank you numpy) and includes a CLI, flow conservation checks and better logging. It does however revert back to only allowing a single category field (multiple category fields can easily be handled by constructing composite category strings). You can read more about this latest release of Hillmaker at [https://misken.github.io/blog/hillmaker-python-released/](https://misken.github.io/blog/hillmaker-python-released/). It's free and open source.
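As a quick sketch of the composite category idea mentioned above: if your data had a second classification column, you could combine it with the first into a single string column and use that as the category field. The `Severity` column and the `_` separator below are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Made-up data; 'Severity' is a hypothetical second category field
df = pd.DataFrame({
    'PatType': ['IVT', 'CAT', 'IVT'],
    'Severity': [1, 2, 2],
})

# Build one composite category string per row, e.g. 'IVT_1'
df['PatType_Severity'] = df['PatType'] + '_' + df['Severity'].astype(str)
print(df)
```

The new `PatType_Severity` column could then be supplied as the single category field.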

Hillmaker is implemented as an importable Python module and as a runnable script with a simple CLI. This new version of Hillmaker is still in what I'd call a pre-release state. The output does match the Access version for the ShortStay database that I included in the original Hillmaker. I've been actively using it to process thousands of simulation output log files as part of a research project on OB patient flow. More testing is needed before I release it publicly as version 1.0, but it does appear to be doing its primary job correctly. Please open an issue on GitHub if you think it's computing something incorrectly. Before using for any real project work, you should do your own testing to confirm that it is working appropriately for your needs. Use at your own risk - see [LICENSE file in GitHub](https://github.com/misken/hillmaker/blob/master/LICENSE).

## User interface plans
Over the years, I (and many others) have used the old Hillmaker in a variety of ways, including:

- MS Access form based GUI
- run main Hillmaker sub from Access VBA Immediate Window
- run Hillmaker main sub (and/or components subs) via custom VBA procedures

I'd like users to be able to use the new Python based version in a number of different ways as well. As I'll show in this Jupyter notebook, it can be used by importing the `hillmaker` module and then calling Hillmaker functions via:

- a Jupyter notebook (or any Python terminal such as an IPython shell or QT console, or IDLE)
- a Python script with the input arguments set and passed via Python statements

### A CLI

While these two options provide tons of flexibility for power users, I have also added a CLI. The CLI is demo'd in this notebook as well.

### A GUI for hillmaker
This is uncharted territory for me. Python has [a number of frameworks/toolkits for creating GUI apps](https://wiki.python.org/moin/GuiProgramming). This is not the highest priority for me but I do plan on creating a GUI for Hillmaker. If anyone wants to help with this, let me know.



## Installing Hillmaker

Whereas the old Hillmaker required MS Access, the new one requires an installation of Python 3.7+ along with several Python modules that are widely used for analytics and data science work. The free and open source [Anaconda Distribution of Python](https://www.anaconda.com/products/distribution) is a great way to get started with Python for analytics work. It is available on all platforms. Once you've got a working version of Python, you can install into a Python (or Conda) virtual environment with:

```
pip install hillmaker
```

This latest version (0.4.0) should be available on `conda-forge` soon.

If you are comfortable working with source code, you can also install `hillmaker` from its GitHub repo or a clone/fork of it.

## Module imports
To run Hillmaker we only need to import a few modules. Since the main hillmaker function uses pandas DataFrames for both data input and output, we need to import `pandas` in addition to `hillmaker`.

In [5]:
# import pandas as pd  (already imported earlier in this notebook)
import hillmaker as hm

## Read main stop data file
Here are the first few lines from our csv file containing the patient stop data:

    PatID,InRoomTS,OutRoomTS,PatType
    1,1/1/1996 7:44,1/1/1996 8:50,IVT
    2,1/1/1996 8:28,1/1/1996 9:20,IVT
    3,1/1/1996 11:44,1/1/1996 13:30,MYE
    4,1/1/1996 11:51,1/1/1996 12:55,CAT
    5,1/1/1996 12:10,1/1/1996 13:00,IVT
    6,1/1/1996 14:16,1/1/1996 15:35,IVT
    7,1/1/1996 14:40,1/1/1996 15:25,IVT


We have already read this data into a pandas DataFrame named `stops_df`. Each record is a "stop" at the SSU.

Check out the top and bottom of `stops_df`. 

In [6]:
stops_df.head(7)

|   | PatID | InRoomTS | OutRoomTS | PatType |
|---|---|---|---|---|
| 0 | 1 | 1996-01-01 07:44:00 | 1996-01-01 08:50:00 | IVT |
| 1 | 2 | 1996-01-01 08:28:00 | 1996-01-01 09:20:00 | IVT |
| 2 | 3 | 1996-01-01 11:44:00 | 1996-01-01 13:30:00 | MYE |
| 3 | 4 | 1996-01-01 11:51:00 | 1996-01-01 12:55:00 | CAT |
| 4 | 5 | 1996-01-01 12:10:00 | 1996-01-01 13:00:00 | IVT |
| 5 | 6 | 1996-01-01 14:16:00 | 1996-01-01 15:35:00 | IVT |
| 6 | 7 | 1996-01-01 14:40:00 | 1996-01-01 15:25:00 | IVT |


In [7]:
stops_df.tail(7)

|   | PatID | InRoomTS | OutRoomTS | PatType |
|---|---|---|---|---|
| 59870 | 59871 | 1996-09-30 19:20:00 | 1996-09-30 20:20:00 | IVT |
| 59871 | 59872 | 1996-09-30 19:26:00 | 1996-09-30 21:05:00 | CAT |
| 59872 | 59873 | 1996-09-30 19:31:00 | 1996-09-30 20:15:00 | IVT |
| 59873 | 59874 | 1996-09-30 20:23:00 | 1996-09-30 21:30:00 | IVT |
| 59874 | 59875 | 1996-09-30 21:00:00 | 1996-09-30 22:45:00 | CAT |
| 59875 | 59876 | 1996-09-30 21:57:00 | 1996-09-30 22:40:00 | IVT |
| 59876 | 59877 | 1996-09-30 22:45:00 | 1996-09-30 23:35:00 | CAT |


Let's compute some basic summary statistics such as the earliest arrival and the latest departure, as well as counts by patient type.

In [8]:
print(f'Earliest arrival = {stops_df["InRoomTS"].min()}')
print(f'Latest departure = {stops_df["OutRoomTS"].max()}')

Earliest arrival = 1996-01-01 07:44:00
Latest departure = 1996-09-30 23:35:00


In [9]:
stops_df.groupby(['PatType'])['PatID'].count()

PatType
ART     5761
CAT    10692
IVT    33179
MYE     6478
OTH     3767
Name: PatID, dtype: int64

Let's get a sense of the number of patient visits by month.

In [10]:
stops_df['InRoomTS'].groupby(stops_df.InRoomTS.dt.to_period("M")).agg('count')

InRoomTS
1996-01    6802
1996-02    6371
1996-03    6628
1996-04    6778
1996-05    6982
1996-06    6580
1996-07    6719
1996-08    6935
1996-09    6082
Freq: M, Name: InRoomTS, dtype: int64

You probably want to do some length of stay analysis, so let's compute it in hours and then do `describe` by patient type.

In [11]:
stops_df['LOS'] = (stops_df['OutRoomTS'] - stops_df['InRoomTS']) / pd.Timedelta(1, "h")

In [12]:
stops_df.groupby(['PatType'])['LOS'].describe()

| PatType | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| ART | 5761.0 | 1.479483 | 0.375888 | 0.533333 | 1.183333 | 1.416667 | 1.75 | 2.483333 |
| CAT | 10692.0 | 1.043914 | 0.567298 | 0.066667 | 0.666667 | 0.983333 | 1.25 | 5.983333 |
| IVT | 33179.0 | 1.147418 | 0.7026 | 0.0 | 0.733333 | 1.0 | 1.4 | 10.916667 |
| MYE | 6478.0 | 1.429021 | 0.397153 | 0.533333 | 1.133333 | 1.366667 | 1.7 | 2.483333 |
| OTH | 3767.0 | 1.502048 | 0.349509 | 1.016667 | 1.2 | 1.45 | 1.75 | 2.483333 |


No obvious problems. We'll assume the data was all read in correctly.
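If you want something a bit more explicit than eyeballing the `describe` output, a few simple checks can flag suspicious records. The sketch below uses a small made-up DataFrame (with one deliberately bad row) rather than the real `stops_df`; the variable names are just for illustration.

```python
import pandas as pd

# Made-up records: row 1 departs before it arrives, row 2 has zero length of stay
check_df = pd.DataFrame({
    'InRoomTS': pd.to_datetime(['1996-01-01 07:44', '1996-01-01 08:28', '1996-01-01 11:44']),
    'OutRoomTS': pd.to_datetime(['1996-01-01 08:50', '1996-01-01 08:00', '1996-01-01 11:44']),
})

los_hours = (check_df['OutRoomTS'] - check_df['InRoomTS']) / pd.Timedelta(1, 'h')

n_missing = int(check_df[['InRoomTS', 'OutRoomTS']].isna().sum().sum())
n_negative = int((los_hours < 0).sum())   # departure before arrival
n_zero = int((los_hours == 0).sum())      # arrival and departure at the same minute

print(f'missing timestamps: {n_missing}, negative LOS: {n_negative}, zero LOS: {n_zero}')
```

Zero-length stays are not necessarily errors (the IVT minimum LOS above is 0.0), but negative values or missing timestamps would be worth investigating before going further.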

## Creating occupancy summaries
The primary function in hillmaker is called `make_hills` and plays the same role as the `Hillmaker` function in the original Access VBA version of Hillmaker. Let's get a little help on this function.

In [13]:
help(hm.make_hills)

Help on function make_hills in module hillmaker.hills:

make_hills(scenario_name, stops_df, in_field, out_field, start_analysis_dt, end_analysis_dt, cat_field=None, bin_size_minutes=60, percentiles=(0.25, 0.5, 0.75, 0.95, 0.99), cat_to_exclude=None, occ_weight_field=None, totals=1, nonstationary_stats=True, stationary_stats=True, export_bydatetime_csv=True, export_summaries_csv=True, export_path=PosixPath('.'), edge_bins=1, verbose=0)
    Compute occupancy, arrival, and departure statistics by category, time bin of day and day of week.
    
    Main function that first calls `bydatetime.make_bydatetime` to calculate occupancy, arrival
    and departure values by date by time bin and then calls `summarize.summarize`
    to compute the summary statistics.
    
    Parameters
    ----------
    
    scenario_name : str
        Used in output filenames
    stops_df : DataFrame
        Base data containing one row per visit
    in_field : str
        Column name corresponding to the arrival

Most of the parameters are similar to those in the original VBA version, though a few new ones have been added. For example, the `cat_to_exclude` parameter allows you to specify a list of category values for which you do not want occupancy statistics computed. Also, since the VBA version used an Access database as the container for its output, new parameters were added to control output to csv files instead.
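The help text describes a two-stage process: first build a table of values by date and time bin, then summarize it. As a rough sketch of what the second stage amounts to (using plain pandas on made-up data, not hillmaker's own `summarize` code or its output column names), you can group a per-bin occupancy table by day of week and hour and compute a mean and a 95th percentile:

```python
import pandas as pd

# Hypothetical by-datetime table: one row per hourly bin over two weeks
bydatetime = pd.DataFrame({
    'datetime': pd.date_range('1996-01-01', periods=24 * 14, freq='60min'),
})

# Made-up occupancy: depends on hour of day, and is 2 higher in the second week
week_index = (bydatetime['datetime'] - bydatetime['datetime'].iloc[0]).dt.days // 7
bydatetime['occupancy'] = (bydatetime['datetime'].dt.hour % 12) + 2.0 * week_index

# Keys for the time of day and day of week summary
bydatetime['day_of_week'] = bydatetime['datetime'].dt.dayofweek  # Monday = 0
bydatetime['hour'] = bydatetime['datetime'].dt.hour

summary = (
    bydatetime.groupby(['day_of_week', 'hour'])['occupancy']
    .agg(mean='mean', p95=lambda s: s.quantile(0.95))
)
print(summary.head())
```

Each row of `summary` corresponds to one (day of week, hour) combination, so there are 168 rows for hourly bins.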

### Example 1: 60 minute bins, all categories, export to csv
Specify values for all the required inputs:

In [14]:
# Required inputs
scenario = 'ss_example_1'
in_fld_name = 'InRoomTS'
out_fld_name = 'OutRoomTS'
cat_fld_name = 'PatType'
start = '1/1/1996'
end = '9/30/1996 23:45'

# Optional inputs
verbose = 1 # INFO level logging


Now we'll call the main `make_hills` function. We assign the return value to `results` but won't use it here; instead we'll rely on the default behavior of having the summaries exported to csv files. You'll see that the filenames contain the scenario value.

In [20]:
results = hm.make_hills(scenario, stops_df, in_fld_name, out_fld_name, start, end, cat_fld_name, verbose=verbose)

2022-07-07 11:31:54,829 - hillmaker.bydatetime - INFO - min of intime: 1996-01-01 07:44:00
2022-07-07 11:31:54,830 - hillmaker.bydatetime - INFO - max of intime: 1996-09-30 22:45:00
2022-07-07 11:31:54,831 - hillmaker.bydatetime - INFO - min of outtime: 1996-01-01 08:50:00
2022-07-07 11:31:54,832 - hillmaker.bydatetime - INFO - max of outtime: 1996-09-30 23:35:00
2022-07-07 11:31:56,157 - hillmaker.bydatetime - INFO - cat IVT {'inner': 33179}
2022-07-07 11:31:56,218 - hillmaker.bydatetime - INFO - cat IVT num_arrivals_hm 33179 num_arrivals_stops 33179
2022-07-07 11:31:56,218 - hillmaker.bydatetime - INFO - cat IVT num_departures_hm 33179 num_departures_stops 33179
2022-07-07 11:31:56,220 - hillmaker.bydatetime - INFO - cat IVT tot_occ_hm 38070.18 tot_occ_stops 38070.18
2022-07-07 11:31:56,506 - hillmaker.bydatetime - INFO - cat MYE {'inner': 6478}
2022-07-07 11:31:56,519 - hillmaker.bydatetime - INFO - cat MYE num_arrivals_hm 6478 num_arrivals_stops 6478
2022-07-07 11:31:56,520 - hillm
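The `tot_occ_hm` and `tot_occ_stops` values in the log are part of the flow conservation checks. As a back-of-the-envelope illustration, we can reproduce the IVT figure from numbers already shown in this notebook, assuming (my reading of the log, not something confirmed from the source code) that total occupancy with 60 minute bins is measured in hours, so it should equal the total length of stay.

```python
# Numbers copied from outputs shown earlier in this notebook
ivt_stops = 33179              # IVT count from the groupby output
ivt_mean_los_hours = 1.147418  # IVT mean LOS (hours) from describe()
tot_occ_logged = 38070.18      # tot_occ_hm and tot_occ_stops for IVT in the log

# Total length of stay = number of stops * mean length of stay
tot_los_hours = ivt_stops * ivt_mean_los_hours
print(f'total IVT LOS in hours: {tot_los_hours:.2f} (log reports {tot_occ_logged})')
```

The two values agree to within the rounding of the mean LOS, which is consistent with occupancy being conserved across the binning step.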

In [15]:
!conda list

# packages in environment at /home/mark/anaconda3/envs/hm2_p37:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main  
_openmp_mutex             5.1                       1_gnu  
backcall                  0.2.0              pyhd3eb1b0_0  
blas                      1.0                         mkl  
bottleneck                1.3.4            py37hce1f21e_0  
brotli                    1.0.9                he6710b0_2  
ca-certificates           2022.4.26            h06a4308_0  
certifi                   2022.6.15        py37h06a4308_0  
cycler                    0.11.0             pyhd3eb1b0_0  
dbus                      1.13.18              hb2f20db_0  
debugpy                   1.5.1            py37h295c915_0  
decorator                 5.1.1              pyhd3eb1b0_0  
entrypoints               0.4              py37h06a4308_0  
expat                     2.4.4                h295c915_0  
fontconfig                2

In [15]:
!hillmaker cli_test_ssu input/ShortStay.csv InRoomTS OutRoomTS 1996-01-01 1996-09-30 --cat_field PatType --verbose 1

2022-07-07 11:14:45,247 - hillmaker.bydatetime - INFO - min of intime: 1996-01-01 07:44:00
2022-07-07 11:14:45,247 - hillmaker.bydatetime - INFO - max of intime: 1996-09-29 19:51:00
2022-07-07 11:14:45,247 - hillmaker.bydatetime - INFO - min of outtime: 1996-01-01 08:50:00
2022-07-07 11:14:45,247 - hillmaker.bydatetime - INFO - max of outtime: 1996-09-29 20:15:00
2022-07-07 11:14:46,320 - hillmaker.bydatetime - INFO - cat IVT {'inner': 33019}
2022-07-07 11:14:46,372 - hillmaker.bydatetime - INFO - cat IVT num_arrivals_hm 33019 num_arrivals_stops 33019
2022-07-07 11:14:46,372 - hillmaker.bydatetime - INFO - cat IVT num_departures_hm 33019 num_departures_stops 33019
2022-07-07 11:14:46,373 - hillmaker.bydatetime - INFO - cat IVT tot_occ_hm 37871.13 tot_occ_stops 37871.13
2022-07-07 11:14:46,590 - hillmaker.bydatetime - INFO - cat MYE {'inner': 6447}
2022-07-07 11:14:46,600 - hillmaker.bydatetime - INFO - cat MYE num_arrivals_hm 6447 num_arrivals_stops 6447
2022-07-07 11:14:46,600 - hillm
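Compare this log with the one from the notebook call in Example 1: the CLI run reports a max intime of 1996-09-29 19:51 and 33019 IVT stops, versus 1996-09-30 22:45 and 33179 IVT stops. The notebook call used an end value of `'9/30/1996 23:45'` while the CLI call used `1996-09-30`. A plausible explanation (an assumption based on the logs, not verified against the hillmaker source) is that a date-only end value is interpreted as midnight at the start of that day, so stops on the final day fall outside the analysis range. The pandas behavior behind that guess is easy to see:

```python
import pandas as pd

end_date_only = pd.Timestamp('1996-09-30')        # as passed to the CLI
end_with_time = pd.Timestamp('9/30/1996 23:45')   # as passed in the notebook call
last_arrival = pd.Timestamp('1996-09-30 22:45:00')  # max intime from the notebook log

print(end_date_only)                   # a date-only string becomes midnight
print(last_arrival <= end_date_only)   # is the last arrival on or before the date-only end?
print(last_arrival <= end_with_time)   # ... and on or before the end with a time?
```

If you want the last day included when using the CLI, it may be worth checking how the end date argument is handled and comparing the logged min/max times with what you expect.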

Here's a screenshot of the current folder containing this IPython notebook (**basic_usage_shortstay_unit.ipynb**) and the csv files created by Hillmaker. 

![folder with output csv files](example_1_files.png)

If you've used the previous version of Hillmaker, you'll recognize these files. A few more statistics have been added, but otherwise they are the same. These csv files can be imported into a spreadsheet application for plot creation. Of course, we can also make plots in Python. We'll do that in the next example. 

![folder with output csv files](example_1_occ.png)

The files with 'cat' in their name are new. They contain overall summary statistics by category. In other words, they are NOT by time of day and day of week.

![folder with output csv files](example_1_occ_cat.png)

### Example 2: 30 minute bins, only CAT and IVT, return values to DataFrames

In [None]:
# Required inputs - same as Example 1 except for scenario name
scenario = 'ss_example_2'
in_fld_name = 'InRoomTS'
out_fld_name = 'OutRoomTS'
cat_fld_name = 'PatType'
start = '1/1/1996'
end = '3/30/1996 23:45'

# Optional inputs
bin_mins = 30 # Half-hour time bins
exclude = ['ART','MYE','OTH'] # Tell Hillmaker to ignore these patient types


Now we'll call `make_hills` and tuck the results (a dictionary of DataFrames) into a local variable. Then we can explore them a bit with Pandas.

In [None]:
# Only keyword arguments listed in the make_hills signature shown by help() above are passed
results_ex2 = hm.make_hills(scenario, stops_df, in_fld_name, out_fld_name, start, end, cat_fld_name,
                            bin_size_minutes=bin_mins, cat_to_exclude=exclude)

In [None]:
results_ex2.keys()

In [None]:
occ_df = results_ex2['occupancy']

In [None]:
occ_df.head()

In [None]:
occ_df.tail()

In [None]:
occ_df.info()

### Example 3 - Running via a Python script
Of course, you don't have to run Python statements through an IPython notebook. You can simply create a short Python script and run that directly in a terminal. An example, `test_shortstay.py`, can be found in the `scripts` subfolder of the hillmaker-examples project. Here's what it looks like - you can modify as necessary for your needs. There is another example in that folder as well, `test_obsim_log.py`, that is slightly more complex in that the input data has raw simulation times (i.e. minutes past t=0) and we need to do some datetime math to turn them into calendar based inputs.

In [None]:
import pandas as pd

import hillmaker as hm

file_stopdata = '../data/ShortStay.csv'

# Required inputs
scenario = 'sstest_60'
in_fld_name = 'InRoomTS'
out_fld_name = 'OutRoomTS'
cat_fld_name = 'PatType'
start = '1/1/1996'
end = '3/30/1996 23:45'

# Optional inputs
bin_mins = 60


df = pd.read_csv(file_stopdata, parse_dates=[in_fld_name, out_fld_name])

# Pass optional arguments by keyword so they line up with the make_hills signature
hm.make_hills(scenario, df, in_fld_name, out_fld_name,
              start, end, cat_fld_name,
              bin_size_minutes=bin_mins,
              cat_to_exclude=None,
              verbose=1)
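For the simulation log case mentioned above, the datetime math boils down to adding the simulation clock times to some base date. Here's a minimal sketch with made-up data; the column names (`entry_min`, `exit_min`) and the base date are hypothetical and are not taken from `test_obsim_log.py`.

```python
import pandas as pd

# Made-up simulation log with times in minutes past t=0
sim_log = pd.DataFrame({
    'entry_min': [0.0, 90.0, 1500.0],
    'exit_min': [45.0, 200.0, 1600.5],
})

# Arbitrary calendar date to treat as simulation time t=0
base_date = pd.Timestamp('2022-01-01')

# Convert minutes past t=0 into calendar timestamps
sim_log['entry_ts'] = base_date + pd.to_timedelta(sim_log['entry_min'], unit='min')
sim_log['exit_ts'] = base_date + pd.to_timedelta(sim_log['exit_min'], unit='min')
print(sim_log)
```

The resulting `entry_ts` and `exit_ts` columns can then play the same role as `InRoomTS` and `OutRoomTS` do in the short stay example.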

More elaborate versions of scripts like `test_shortstay.py` can be envisioned. For example, an entire folder of input data files could be processed by simply enclosing the `hm.make_hills` call inside a loop over the collection of input files:

In [None]:
import glob
from pathlib import Path

for log_fn in glob.glob('logs/*.csv'):

    # Read the log file
    stops_df = pd.read_csv(log_fn, parse_dates=[in_fld_name, out_fld_name])

    # Use the file name as the scenario so each input file gets its own output files
    scenario = Path(log_fn).stem
    hm.make_hills(scenario, stops_df, in_fld_name, out_fld_name, start, end, cat_fld_name)
    ...