# Using hillmaker (v0.2.0)

In this notebook we'll focus on basic use of hillmaker for analyzing occupancy in a typical hospital setting. The data is fictitious and comes from a hospital short stay unit (SSU). Patients flow through an SSU for a variety of procedures, tests or therapies. Let's assume patients can be classified into one of five patient types: ART (arteriogram), CAT (post cardiac-cath), MYE (myelogram), IVT (IV therapy), and OTH (other). In addition, patients are given a severity score of 1 or 2, which is related to the amount of time required in the SSU and the level of resources required. From one of our hospital information systems we were able to get raw data about the entry and exit times of each patient along with their patient type and severity values. For simplicity, the data is in a csv file. We are interested in occupancy statistics (e.g. mean, standard deviation, percentiles) by time of day and by day of week. While overall occupancy statistics are important, we are also interested in occupancy statistics for different patient types and severity levels. Since we are also interested in required staffing for this unit, we'll use hillmaker to analyze workload levels as well.

This example assumes you are already familiar with statistical occupancy analysis using the old version of [Hillmaker](http://hillmaker.sourceforge.net/) or a similar tool. It also assumes some knowledge of using Python for analytical work.

The following blog posts are helpful if you are not familiar with occupancy analysis:

* [New version of hillmaker (finally) released - and it's Python ](http://hselab.org/hillmaker-python-released.html)
* [Using hillmaker from R with reticulate to analyze time of day patterns in bike share data ](http://hselab.org/r_hillmaker_reticulate.html)
* [Computing occupancy statistics with Python - Part 1 of 3](http://nbviewer.ipython.org/github/misken/hselab-tutorials/blob/master/hillpy_bydate_demo.ipynb)
* [Computing occupancy statistics with Python - Part 2 of 3](http://nbviewer.ipython.org/github/misken/hselab-tutorials/blob/master/hillpy_occstats_demo.ipynb)

## Current status of code
The new hillmaker is implemented as a Python module which can be used by importing `hillmaker` and then calling the main hillmaker function, `make_hills()` (or any component function included in the module).  This new version of hillmaker is in what I'd call an alpha state. The output does match the Access version for the ShortStay database that I included in the original Hillmaker. Use at your own risk.

It is licensed under an [Apache 2.0 license](http://www.apache.org/licenses/LICENSE-2.0). It is a widely used permissive free software license. See https://en.wikipedia.org/wiki/Apache_License for additional information.



# Getting Started
In order to use hillmaker, the major steps are:

* make sure you have Python and necessary packages installed,
* download and install hillmaker,
* load hillmaker and start using it from either a Jupyter notebook, Python terminal or Python script.

I'll go through each of these in more detail. As a big part of the audience for this post is former users of the MS Access version of Hillmaker using the Windows OS, many of whom have little experience with tools like Python, I'll try to make the transition as easy as possible.

## Dependencies
Whereas the old Hillmaker required MS Access, the new one requires an installation of 
Python 3 (3.7+) along 
with several Python modules that are widely used for analytics and data science work. 

Most importantly, hillmaker 0.2.0 requires pandas 1.0.0 or later.

### Getting Python and many analytical packages via Anaconda
A very easy way to get Python 3 pre-configured with tons of analytical Python packages is to use the Anaconda distro for Python. From their [Downloads page](https://www.continuum.io/downloads):

> Anaconda is a completely free Python distribution (including for commercial use and redistribution). 
> It includes more than 300 of the most popular Python packages for science, math, engineering, and 
> data analysis. See the packages included with Anaconda and the Anaconda changelog.
    
There are several really nice reasons to use the Anaconda Python distro for data science work:

- it comes preconfigured with hundreds of the most popular data science Python packages installed and they just work
- it has a large, vibrant community of data science users on places like StackOverflow
- it has a companion package manager called Conda which makes it easy to install new packages as well as to create and manage virtual environments

If you use Anaconda, you already have all of the necessary libraries for using hillmaker other than hillmaker itself.

### Getting Hillmaker

Since 2016, hillmaker has been freely available from the Python Package Index, known as [PyPI](https://pypi.python.org/pypi), as well as from [Anaconda Cloud](http://anaconda.org/). They are similar to CRAN for R. Source code is also available from my GitHub site https://github.com/misken/hillmaker and it is an open-source project. If you work with Python, you should know a little bit about [Python package installation](https://docs.python.org/3/installing/). There is already a companion project on GitHub called `hillmaker-examples` which contains, well, examples of hillmaker use cases. 

### Installing Hillmaker

You can use either `pip` or `conda` to install hillmaker. I suggest learning about Python virtual environments and either using `pyenv`, `virtualenv` or `conda` (preferred) to create a Python virtual environment and then install hillmaker into it. This way you avoid mixing developmental third-party packages like hillmaker with your base Anaconda Python environment. 


#### Step 1 - Open a terminal and install using Conda or Pip

To install using  `conda`:

```sh
conda install -c https://conda.anaconda.org/hselab hillmaker
```

OR

To install using  `pip`:

```sh
pip install hillmaker
```

#### Step 2 - Confirm that hillmaker was installed

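Use the `conda list` command to see all the installed packages in the currently active conda environment (your Anaconda3 root if you haven't created a separate one). If you installed with `pip` outside of conda, `pip list` does the same job.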

```sh
conda list
```

You should see hillmaker in the listing.

#### Step 3 - Confirm that hillmaker can be loaded

Now fire up a Python session (just type python at a Linux/Mac shell or a Windows Anaconda command prompt) and try:

    import hillmaker as hm 

If the install went well, you shouldn't get any errors when you import hillmaker. To see the main help docstring, do the following at your Python prompt:

    help(hm.make_hills)

## Using hillmaker
The rest of this Jupyter notebook will illustrate a few ways to use the `hillmaker` package to analyze occupancy in our SSU.

### Module imports
To run Hillmaker we only need to import a few modules. Since the main Hillmaker function uses Pandas DataFrames for both data input and output, we need to import `pandas` in addition to `hillmaker`.

In [20]:
import pandas as pd
import hillmaker as hm

### Read main data file containing patient visits to short stay unit
Here's the first few lines from our csv file containing the patient stop data:

    PatID,InRoomTS,OutRoomTS,PatType,Severity,PatTypeSeverity
    1,01/01/96 07:44 AM,01/01/96 08:50 AM,IVT,1,IVT_1
    2,01/01/96 08:28 AM,01/01/96 09:20 AM,IVT,1,IVT_1
    3,01/01/96 11:44 AM,01/01/96 01:30 PM,MYE,1,MYE_1
    4,01/01/96 11:51 AM,01/01/96 12:55 PM,CAT,1,CAT_1
    5,01/01/96 12:10 PM,01/01/96 01:00 PM,IVT,2,IVT_2


Read the short stay data from a csv file into a DataFrame and tell Pandas which fields to treat as dates. 

In [21]:
file_stopdata = '../data/ShortStay2.csv'
stops_df = pd.read_csv(file_stopdata, parse_dates=['InRoomTS','OutRoomTS'])
stops_df.info() 

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 59877 entries, 0 to 59876
Data columns (total 6 columns):
 #   Column           Non-Null Count  Dtype         
---  ------           --------------  -----         
 0   PatID            59877 non-null  int64         
 1   InRoomTS         59877 non-null  datetime64[ns]
 2   OutRoomTS        59877 non-null  datetime64[ns]
 3   PatType          59877 non-null  object        
 4   Severity         59877 non-null  int64         
 5   PatTypeSeverity  59877 non-null  object        
dtypes: datetime64[ns](2), int64(2), object(2)
memory usage: 2.7+ MB
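
The raw timestamps in the csv file use a two-digit year and a 12-hour clock (e.g. `01/01/96 01:30 PM`), and pandas worked out that format on its own here. As a quick sanity check of what that format means, here is a minimal sketch using only the standard library and two of the sample rows shown earlier. The format string `TS_FORMAT` is my reading of the sample rows, not something hillmaker or pandas requires.

```python
import csv
import io
from datetime import datetime

# Two sample rows copied from the csv excerpt shown earlier
sample = """PatID,InRoomTS,OutRoomTS,PatType,Severity,PatTypeSeverity
1,01/01/96 07:44 AM,01/01/96 08:50 AM,IVT,1,IVT_1
3,01/01/96 11:44 AM,01/01/96 01:30 PM,MYE,1,MYE_1
"""

# Assumed format: month/day/two-digit year, 12-hour clock with AM/PM
TS_FORMAT = "%m/%d/%y %I:%M %p"

rows = []
for rec in csv.DictReader(io.StringIO(sample)):
    rec["InRoomTS"] = datetime.strptime(rec["InRoomTS"], TS_FORMAT)
    rec["OutRoomTS"] = datetime.strptime(rec["OutRoomTS"], TS_FORMAT)
    rows.append(rec)

# Length of stay in minutes for each sample row
stay_minutes = [(r["OutRoomTS"] - r["InRoomTS"]).total_seconds() / 60 for r in rows]
print(stay_minutes)
```

If pandas ever guesses wrong on your own data, the same format string can be used to parse the columns explicitly after reading them in as text.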


Check out the top and bottom of `stops_df`. 

In [22]:
stops_df.head(7)

Unnamed: 0,PatID,InRoomTS,OutRoomTS,PatType,Severity,PatTypeSeverity
0,1,1996-01-01 07:44:00,1996-01-01 08:50:00,IVT,1,IVT_1
1,2,1996-01-01 08:28:00,1996-01-01 09:20:00,IVT,1,IVT_1
2,3,1996-01-01 11:44:00,1996-01-01 13:30:00,MYE,1,MYE_1
3,4,1996-01-01 11:51:00,1996-01-01 12:55:00,CAT,1,CAT_1
4,5,1996-01-01 12:10:00,1996-01-01 13:00:00,IVT,2,IVT_2
5,6,1996-01-01 14:16:00,1996-01-01 15:35:00,IVT,2,IVT_2
6,7,1996-01-01 14:40:00,1996-01-01 15:25:00,IVT,2,IVT_2


In [23]:
stops_df.tail(5)

Unnamed: 0,PatID,InRoomTS,OutRoomTS,PatType,Severity,PatTypeSeverity
59872,59873,1996-09-30 19:31:00,1996-09-30 20:15:00,IVT,1,IVT_1
59873,59874,1996-09-30 20:23:00,1996-09-30 21:30:00,IVT,2,IVT_2
59874,59875,1996-09-30 21:00:00,1996-09-30 22:45:00,CAT,1,CAT_1
59875,59876,1996-09-30 21:57:00,1996-09-30 22:40:00,IVT,2,IVT_2
59876,59877,1996-09-30 22:45:00,1996-09-30 23:35:00,CAT,1,CAT_1


## Enhancement to handle multiple categorical fields

Notice that the `PatType` field contains strings while `Severity` is integer data. In the previous version of hillmaker (v0.1.1), you could only specify a single category field and it needed to be of type string. So, computing occupancy statistics by `Severity` required some data wrangling (converting int to string), and analyzing occupancy by `PatType` and `Severity` required further wrangling to concatenate the two fields into a single field that we could feed to hillmaker. Note in the output above that I've included an example of such a concatenation just for illustration purposes. 

In this latest version, you can specify zero or more categorical fields which can either be string or integer data types. There is no need to create a concatenated version such as the `PatTypeSeverity` field above. We'll see that you also have finer control over category field subtotaling.

Let's do some counts of patients by the two categorical fields.

In [24]:
stops_df.groupby('PatType')['PatID'].count()

PatType
ART     5761
CAT    10692
IVT    33179
MYE     6478
OTH     3767
Name: PatID, dtype: int64

In [25]:
stops_df.groupby('Severity')['PatID'].count()

Severity
1    23803
2    36074
Name: PatID, dtype: int64

No obvious problems. We'll assume the data was all read in correctly.

### Creating occupancy summaries
The primary function in Hillmaker is called `make_hills` and plays the same role as the `Hillmaker` function in the original Access VBA version of Hillmaker. Let's get a little help on this function.

In [26]:
help(hm.make_hills)

Help on function make_hills in module hillmaker.hills:

make_hills(scenario_name, stops_df, infield, outfield, start_analysis, end_analysis, catfield=None, bin_size_minutes=60, percentiles=(0.25, 0.5, 0.75, 0.95, 0.99), cat_to_exclude=None, occ_weight_field=None, totals=1, nonstationary_stats=True, stationary_stats=True, export_bydatetime_csv=True, export_summaries_csv=True, export_path='.', edge_bins=1, verbose=0)
    Compute occupancy, arrival, and departure statistics by time bin of day and day of week.
    
    Main function that first calls `bydatetime.make_bydatetime` to calculate occupancy, arrival
    and departure values by date by time bin and then calls `summarize.summarize`
    to compute the summary statistics.
    
    Parameters
    ----------
    
    scenario_name : string
        Used in output filenames
    stops_df : DataFrame
        Base data containing one row per visit
    infield : string
        Column name corresponding to the arrival times
    outfield : str

Most of the parameters are similar to those in the original VBA version, though a few new ones have been added. Since the VBA version used an Access database as the container for its output, new parameters were added to control output to csv files  and/or pandas DataFrames instead.

#### Example 1: 60 minute bins, PatientType and Severity, export to csv
Specify values for all the required inputs:

In [27]:
# Required inputs
scenario = 'example1'
in_fld_name = 'InRoomTS'
out_fld_name = 'OutRoomTS'
start = '1/1/1996'
end = '3/30/1996 23:45'

# Optional inputs
cat_fld_name = ['PatType', 'Severity']
verbose = 1
output = './output'


Now we'll call the main `make_hills` function. We won't capture the return values but will simply take the default behavior of having the summaries exported to csv files. You'll see that the filenames will contain the scenario value.

In [28]:
hm.make_hills(scenario, stops_df, in_fld_name, out_fld_name, start, end, 
              catfield=cat_fld_name, 
              export_path = output, verbose=verbose)

min of intime: 1996-01-01 07:44:00
max of intime: 1996-09-30 22:45:00
min of outtime: 1996-01-01 08:50:00
max of outtime: 1996-09-30 23:35:00
19795 stop records processed.
Datetime DataFrame created (seconds): 23.1437
Created nonstationary summaries - ['PatType', 'Severity']
Created nonstationary summaries - []
Created stationary summaries - ['PatType', 'Severity']
Created stationary summaries - []
Summaries by datetime created (seconds): 57.4191
By datetime exported to csv (seconds): 0.1106
Summaries exported to csv (seconds): 0.1095
Total time (seconds): 80.7830


{'bydatetime': {'PatType_Severity_datetime':                                       arrivals  departures  occupancy  \
  PatType Severity datetime                                               
  ART     1        1996-01-01 00:00:00       0.0         0.0        0.0   
                   1996-01-01 01:00:00       0.0         0.0        0.0   
                   1996-01-01 02:00:00       0.0         0.0        0.0   
                   1996-01-01 03:00:00       0.0         0.0        0.0   
                   1996-01-01 04:00:00       0.0         0.0        0.0   
  ...                                        ...         ...        ...   
  OTH     2        1996-03-30 19:00:00       0.0         0.0        0.0   
                   1996-03-30 20:00:00       0.0         0.0        0.0   
                   1996-03-30 21:00:00       0.0         0.0        0.0   
                   1996-03-30 22:00:00       0.0         0.0        0.0   
                   1996-03-30 23:00:00       0.0         

Let's list the contents of the output folder containing the csv files created by hillmaker.  For Windows users, the following is the Linux `ls` command. The leading exclamation point tells Jupyter that this is an operating system command. To list the files in Windows, the equivalent would be:

    !dir output\example1*.csv

In [29]:
!ls ./output/example1*.csv

./output/example1_arrivals.csv
./output/example1_arrivals_dow_binofday.csv
./output/example1_arrivals_PatType_Severity.csv
./output/example1_arrivals_PatType_Severity_dow_binofday.csv
./output/example1_bydatetime_datetime.csv
./output/example1_bydatetime_PatType_Severity_datetime.csv
./output/example1_departures.csv
./output/example1_departures_dow_binofday.csv
./output/example1_departures_PatType_Severity.csv
./output/example1_departures_PatType_Severity_dow_binofday.csv
./output/example1_occupancy.csv
./output/example1_occupancy_dow_binofday.csv
./output/example1_occupancy_PatType_Severity.csv
./output/example1_occupancy_PatType_Severity_dow_binofday.csv


There are three groups of statistical summary files related to arrivals, departures and occupancy. In addition, the intermediate "bydatetime" files are also included. The filenames indicate whether or not the statistics are by category as well as whether they are by day of week and time of day. 
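
To make the naming pattern concrete, here is a small sketch that rebuilds the list of filenames from the scenario name and the category fields. The pattern is inferred purely from the directory listing above (default `totals` setting); the function `expected_output_files` is a hypothetical helper of mine, not part of hillmaker.

```python
def expected_output_files(scenario, cat_fields):
    """Rebuild the csv filenames seen in the listing above (inferred pattern)."""
    cat_part = "_".join(cat_fields)
    names = []
    for measure in ("arrivals", "departures", "occupancy"):
        # Overall, then by day of week and bin of day
        names.append(f"{scenario}_{measure}.csv")
        names.append(f"{scenario}_{measure}_dow_binofday.csv")
        # Same two summaries, broken out by category
        names.append(f"{scenario}_{measure}_{cat_part}.csv")
        names.append(f"{scenario}_{measure}_{cat_part}_dow_binofday.csv")
    # Intermediate bydatetime files, overall and by category
    names.append(f"{scenario}_bydatetime_datetime.csv")
    names.append(f"{scenario}_bydatetime_{cat_part}_datetime.csv")
    return sorted(names)


files = expected_output_files("example1", ["PatType", "Severity"])
for name in files:
    print(name)
```

A helper like this can be handy for checking that a run produced everything you expected before moving on to the analysis.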

### Occupancy, arrival and departure summaries
Let's look at the occupancy summaries (the structure is identical for arrivals and departures.) Here's a peek into the middle of **example1_occupancy_PatType_Severity_dow_binofday.csv**.

In [30]:
pd.set_option('display.precision', 2)  # full option name; the short 'precision' form is ambiguous in newer pandas
pd.read_csv("./output/example1_occupancy_PatType_Severity_dow_binofday.csv").iloc[100:110]

Unnamed: 0,PatType,Severity,day_of_week,dow_name,bin_of_day,count,mean,min,max,stdev,sem,var,cv,skew,kurt,p25,p50,p75,p95,p99
100,ART,1,4,Friday,4,13.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
101,ART,1,4,Friday,5,13.0,0.02,0.0,0.18,0.05,0.01,0.00259,3.06,3.42,11.92,0.0,0.0,0.0,0.09,0.17
102,ART,1,4,Friday,6,13.0,0.85,0.0,2.5,0.72,0.2,0.524,0.85,1.07,0.96,0.37,0.63,1.15,2.13,2.43
103,ART,1,4,Friday,7,13.0,2.83,0.0,6.35,1.5,0.42,2.26,0.53,0.59,2.12,1.93,2.75,3.55,4.94,6.07
104,ART,1,4,Friday,8,13.0,2.72,0.0,4.18,1.28,0.35,1.63,0.47,-0.85,0.1,2.1,2.83,3.83,4.08,4.16
105,ART,1,4,Friday,9,13.0,1.89,0.13,3.15,0.87,0.24,0.753,0.46,-0.46,-0.28,1.15,1.83,2.67,2.96,3.11
106,ART,1,4,Friday,10,13.0,1.72,0.42,4.08,1.03,0.29,1.07,0.6,1.31,1.4,1.0,1.48,2.13,3.7,4.01
107,ART,1,4,Friday,11,13.0,2.07,0.05,3.35,1.1,0.31,1.22,0.53,-0.56,-0.77,1.35,2.25,3.02,3.35,3.35
108,ART,1,4,Friday,12,13.0,2.01,0.95,3.3,0.6,0.17,0.361,0.3,0.5,0.78,1.75,1.82,2.33,2.9,3.22
109,ART,1,4,Friday,13,13.0,1.6,0.1,3.57,0.93,0.26,0.863,0.58,0.5,0.47,1.08,1.68,2.0,3.08,3.47


Statistics by day and time but aggregated over all the categories are also available.

In [31]:
pd.read_csv("./output/example1_occupancy_dow_binofday.csv").iloc[20:40]

Unnamed: 0,day_of_week,dow_name,bin_of_day,count,mean,min,max,stdev,sem,var,cv,skew,kurt,p25,p50,p75,p95,p99
20,0,Monday,20,13.0,2.66,0.0,4.92,1.52,0.42,2.31,0.57,0.08,-0.76,1.67,2.63,3.67,4.92,4.92
21,0,Monday,21,13.0,1.69,0.0,3.47,1.08,0.3,1.16,0.64,-0.15,-0.76,0.83,1.98,2.18,3.24,3.42
22,0,Monday,22,13.0,0.83,0.0,1.98,0.72,0.2,0.52,0.87,0.24,-1.43,0.08,0.9,1.4,1.84,1.96
23,0,Monday,23,13.0,0.75,0.0,2.08,0.81,0.23,0.66,1.08,0.6,-1.22,0.0,0.42,1.13,2.06,2.08
24,1,Tuesday,0,13.0,0.37,0.0,1.42,0.45,0.12,0.2,1.21,1.14,0.83,0.0,0.25,0.58,1.12,1.36
25,1,Tuesday,1,13.0,0.15,0.0,0.75,0.26,0.07,0.07,1.76,1.58,1.26,0.0,0.0,0.17,0.6,0.72
26,1,Tuesday,2,13.0,0.05,0.0,0.67,0.18,0.05,0.03,3.61,3.61,13.0,0.0,0.0,0.0,0.27,0.59
27,1,Tuesday,3,13.0,0.07,0.0,0.73,0.2,0.06,0.04,3.06,3.42,11.92,0.0,0.0,0.0,0.37,0.66
28,1,Tuesday,4,13.0,0.11,0.0,0.58,0.22,0.06,0.05,2.03,1.84,1.98,0.0,0.0,0.0,0.58,0.58
29,1,Tuesday,5,13.0,0.29,0.0,1.3,0.44,0.12,0.19,1.51,1.37,0.95,0.0,0.0,0.58,1.07,1.25


For those files without "dow_binofday" in their name, the statistics are by category only.

In [32]:
pd.read_csv("./output/example1_occupancy_PatType_Severity.csv").head(20)

Unnamed: 0,PatType,Severity,count,mean,min,max,stdev,sem,var,cv,skew,kurt,p25,p50,p75,p95,p99
0,ART,1,2160.0,0.49,0.0,6.35,1.0,0.02,1.01,2.04,2.25,4.85,0.0,0.0,0.34,2.88,4.04
1,ART,2,2160.0,0.84,0.0,10.35,1.57,0.03,2.45,1.87,1.91,3.05,0.0,0.0,1.0,4.42,5.92
2,CAT,1,2160.0,0.72,0.0,7.67,1.12,0.02,1.26,1.56,1.95,4.05,0.0,0.0,1.05,3.2,4.84
3,CAT,2,2160.0,1.04,0.0,8.35,1.5,0.03,2.24,1.44,1.61,1.98,0.0,0.25,1.65,4.3,5.97
4,IVT,1,2160.0,2.32,0.0,16.42,3.33,0.07,11.09,1.43,1.43,1.06,0.0,0.42,3.95,9.43,12.01
5,IVT,2,2160.0,3.54,0.0,21.47,4.99,0.11,24.92,1.41,1.36,0.74,0.0,0.67,6.25,14.22,18.35
6,MYE,1,2160.0,0.55,0.0,5.83,1.01,0.02,1.03,1.85,2.06,3.76,0.0,0.0,0.75,2.92,4.08
7,MYE,2,2160.0,0.85,0.0,8.33,1.46,0.03,2.12,1.72,1.92,3.27,0.0,0.0,1.12,4.15,5.71
8,OTH,1,2160.0,0.34,0.0,6.27,0.78,0.02,0.61,2.27,2.95,10.42,0.0,0.0,0.07,2.0,3.4
9,OTH,2,2160.0,0.53,0.0,6.1,1.11,0.02,1.24,2.08,2.44,5.83,0.0,0.0,0.5,3.07,5.06


There's even a summary that aggregates over categories and time. Obviously, it contains a single row.

In [33]:
pd.read_csv("./output/example1_occupancy.csv")

Unnamed: 0,count,mean,min,max,stdev,sem,var,cv,skew,kurt,p25,p50,p75,p95,p99
0,2160.0,11.22,0.0,55.17,14.54,0.31,211.49,1.3,1.07,-0.26,0.33,2.0,22.25,41.03,47.46
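
The `count` values in these summaries are consistent with one observation per time bin in the analysis period. Here is a quick check with the standard library, assuming hourly bins that start at the beginning of the analysis range (my assumption for this sketch, not a statement about hillmaker internals).

```python
from collections import Counter
from datetime import datetime, timedelta

start = datetime(1996, 1, 1)
end = datetime(1996, 3, 30, 23, 45)
bin_size = timedelta(minutes=60)

# Collect the start time of every bin that begins on or before `end`
bin_starts = []
t = start
while t <= end:
    bin_starts.append(t)
    t += bin_size

total_bins = len(bin_starts)

# Count how many calendar dates fall on each weekday (0=Monday ... 6=Sunday)
dates = sorted({b.date() for b in bin_starts})
dates_per_weekday = Counter(d.weekday() for d in dates)

print(total_bins)
print(dict(sorted(dates_per_weekday.items())))
```

The 90 days in the analysis period give 2160 hourly bins, and each weekday occurs 13 times except Sunday, which occurs 12 times. That lines up with the `count` columns in the day of week summaries.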


### Intermediate bydatetime files
The intermediate tables used to compute the summaries we just looked at are also available, both by category and overall. Each row is a single time bin (e.g. date and hour of day). Note that the occupancy values are not necessarily integer since hillmaker's default behavior is to use fractional occupancy contributions for the bins in which the patient arrives and departs (e.g. if the patient arrived half-way through the time bin, they contribute 0.5 to total occupancy during that time bin). This behavior can be changed by specifying `edge_bins=2` when calling `make_hills`.
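
To illustrate the idea of fractional contributions, here is a minimal sketch for a single stay. This is my own illustration and not hillmaker's implementation; the function name and the `whole_edge_bins` flag are hypothetical, and bins are assumed to be aligned to midnight.

```python
from datetime import datetime, timedelta


def bin_contributions(arrival, departure, bin_minutes=60, whole_edge_bins=False):
    """Return {bin_start: occupancy contribution} for a single stay."""
    bin_size = timedelta(minutes=bin_minutes)
    # Align the first bin to midnight of the arrival date
    midnight = arrival.replace(hour=0, minute=0, second=0, microsecond=0)
    bin_start = midnight + ((arrival - midnight) // bin_size) * bin_size
    contributions = {}
    while bin_start < departure:
        bin_end = bin_start + bin_size
        # Portion of this bin during which the patient was present
        overlap = min(departure, bin_end) - max(arrival, bin_start)
        contributions[bin_start] = 1.0 if whole_edge_bins else overlap / bin_size
        bin_start = bin_end
    return contributions


# First patient in the data: in at 07:44, out at 08:50
arrival = datetime(1996, 1, 1, 7, 44)
departure = datetime(1996, 1, 1, 8, 50)
frac = bin_contributions(arrival, departure)
whole = bin_contributions(arrival, departure, whole_edge_bins=True)
print(frac)
print(whole)
```

With fractional edge bins this patient contributes 16/60 to the 07:00 bin and 50/60 to the 08:00 bin. With whole edge bins the patient would count as 1.0 in both.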

In [34]:
pd.read_csv("./output/example1_bydatetime_datetime.csv").iloc[100:125]

Unnamed: 0,datetime,arrivals,departures,occupancy,day_of_week,dow_name,bin_of_day,bin_of_week
100,1996-01-05 04:00:00,0.0,0.0,0.0,4,Friday,4,100
101,1996-01-05 05:00:00,4.0,0.0,1.03,4,Friday,5,101
102,1996-01-05 06:00:00,23.0,0.0,15.28,4,Friday,6,102
103,1996-01-05 07:00:00,18.0,13.0,30.38,4,Friday,7,103
104,1996-01-05 08:00:00,29.0,34.0,24.07,4,Friday,8,104
105,1996-01-05 09:00:00,37.0,18.0,37.98,4,Friday,9,105
106,1996-01-05 10:00:00,38.0,39.0,41.67,4,Friday,10,106
107,1996-01-05 11:00:00,37.0,32.0,46.93,4,Friday,11,107
108,1996-01-05 12:00:00,40.0,46.0,47.42,4,Friday,12,108
109,1996-01-05 13:00:00,28.0,33.0,44.47,4,Friday,13,109


In [35]:
pd.read_csv("./output/example1_bydatetime_PatType_Severity_datetime.csv").iloc[100:125]

Unnamed: 0,PatType,Severity,datetime,arrivals,departures,occupancy,day_of_week,dow_name,bin_of_day,bin_of_week
100,ART,1,1996-01-05 04:00:00,0.0,0.0,0.0,4,Friday,4,100
101,ART,1,1996-01-05 05:00:00,1.0,0.0,0.03,4,Friday,5,101
102,ART,1,1996-01-05 06:00:00,2.0,0.0,1.88,4,Friday,6,102
103,ART,1,1996-01-05 07:00:00,4.0,2.0,4.0,4,Friday,7,103
104,ART,1,1996-01-05 08:00:00,2.0,4.0,3.87,4,Friday,8,104
105,ART,1,1996-01-05 09:00:00,2.0,1.0,2.83,4,Friday,9,105
106,ART,1,1996-01-05 10:00:00,3.0,4.0,4.08,4,Friday,10,106
107,ART,1,1996-01-05 11:00:00,1.0,1.0,3.35,4,Friday,11,107
108,ART,1,1996-01-05 12:00:00,2.0,3.0,1.65,4,Friday,12,108
109,ART,1,1996-01-05 13:00:00,1.0,1.0,1.98,4,Friday,13,109


If you've used the previous version of Hillmaker, you'll recognize these files. The default behavior has changed to compute fewer percentiles but any percentiles you want can be computed by specifying them in the `percentiles` argument to `make_hills`. 

#### Example 2: Compute totals for individual category fields, select percentiles, output to DataFrames
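We'll repeat the example above but use `totals=2` so that we get totals computed for each of the category fields in addition to overall totals. A custom list of percentiles is also defined below. Note, however, that the `make_hills` call as written does not pass it via the `percentiles` argument, so the output that follows shows the default percentiles. In addition to exporting CSV files, we'll capture the results as a dictionary of DataFrames.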

In [52]:
# Required inputs
scenario = 'example2'
in_fld_name = 'InRoomTS'
out_fld_name = 'OutRoomTS'
start = '1/1/1996'
end = '3/30/1996 23:45'

# Optional inputs
cat_fld_name = ['PatType', 'Severity']
totals= 2
percentiles=[0.5, 0.95]
verbose = 0 # Silent mode
output = './output'
export_bydatetime_csv = True
export_summaries_csv = True


Now we'll call `make_hills` and tuck the results (a dictionary of DataFrames) into a local variable. Then we can explore them a bit with Pandas.

In [53]:
example2_dfs = hm.make_hills(scenario, stops_df, in_fld_name, out_fld_name, start, end, cat_fld_name, 
                             totals=totals, export_path=output, verbose=verbose,
                             export_bydatetime_csv=export_bydatetime_csv, 
                             export_summaries_csv=export_summaries_csv)

The `example2_dfs` return value is several nested dictionaries eventually leading to pandas DataFrames as values. Let's explore the key structure. It's pretty simple.

In [38]:
example2_dfs.keys()

dict_keys(['bydatetime', 'summaries'])

Let's explore the 'summaries' key first. As you might guess, this will eventually lead to the statistical summary DataFrames.

In [39]:
example2_dfs['summaries'].keys()

dict_keys(['nonstationary', 'stationary'])

In [40]:
example2_dfs['summaries']['nonstationary'].keys()

dict_keys(['PatType_Severity_dow_binofday', 'dow_binofday', 'PatType_dow_binofday', 'Severity_dow_binofday'])

In [41]:
example2_dfs['summaries']['nonstationary']['Severity_dow_binofday'].keys()

dict_keys(['occupancy', 'arrivals', 'departures'])

In [42]:
example2_dfs['summaries']['nonstationary']['Severity_dow_binofday']['occupancy']

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,count,mean,min,max,stdev,sem,var,cv,skew,kurt,p25,p50,p75,p95,p99
Severity,day_of_week,dow_name,bin_of_day,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1
1,0,Monday,0,13.0,0.21,0.0,0.67,0.26,0.07,0.07,1.22,0.63,-1.33,0.0,0.0,0.42,0.63,0.66
1,0,Monday,1,13.0,0.10,0.0,0.50,0.19,0.05,0.03,1.82,1.76,1.84,0.0,0.0,0.17,0.50,0.50
1,0,Monday,2,13.0,0.16,0.0,0.90,0.28,0.08,0.08,1.79,1.90,3.37,0.0,0.0,0.32,0.66,0.85
1,0,Monday,3,13.0,0.06,0.0,0.42,0.13,0.04,0.02,2.24,2.36,5.07,0.0,0.0,0.00,0.32,0.40
1,0,Monday,4,13.0,0.00,0.0,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.0,0.0,0.00,0.00,0.00
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2,6,Sunday,19,12.0,0.42,0.0,2.55,0.74,0.21,0.55,1.74,2.46,6.83,0.0,0.0,0.60,1.60,2.36
2,6,Sunday,20,12.0,0.17,0.0,0.83,0.28,0.08,0.08,1.66,1.57,1.68,0.0,0.0,0.30,0.65,0.80
2,6,Sunday,21,12.0,0.19,0.0,0.75,0.30,0.09,0.09,1.60,1.29,0.12,0.0,0.0,0.35,0.75,0.75
2,6,Sunday,22,12.0,0.21,0.0,1.33,0.42,0.12,0.18,2.05,2.18,4.44,0.0,0.0,0.10,0.99,1.27


The stationary summaries are similar except that they are not broken out by day of week and time bin of day.
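
Rather than drilling down one `keys()` call at a time, the nested structure can also be walked programmatically. Here is a small sketch. The `demo` dictionary is a stand-in that only mimics the key layout seen in this notebook, with placeholder strings where the real result holds DataFrames, and `key_paths` is a hypothetical helper, not a hillmaker function.

```python
def key_paths(nested, prefix=()):
    """Yield the tuple of keys leading to every non-dict leaf."""
    for key, value in nested.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            yield from key_paths(value, path)
        else:
            yield path


# Stand-in for a small part of the make_hills return value
demo = {
    "bydatetime": {"datetime": "<DataFrame>"},
    "summaries": {
        "nonstationary": {
            "Severity_dow_binofday": {
                "occupancy": "<DataFrame>",
                "arrivals": "<DataFrame>",
                "departures": "<DataFrame>",
            }
        },
        "stationary": {"Severity": {"occupancy": "<DataFrame>"}},
    },
}

paths = list(key_paths(demo))
for path in paths:
    print(" -> ".join(path))
```

Since a DataFrame is not a `dict`, the same helper should stop at the DataFrames when applied to an actual return value such as `example2_dfs`.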

Now let's look at the 'bydatetime' key at the top level. Yep, gonna lead to bydatetime DataFrames.

In [43]:
example2_dfs['bydatetime'].keys()

dict_keys(['PatType_Severity_datetime', 'datetime', 'PatType_datetime', 'Severity_datetime'])

In [44]:
example2_dfs['bydatetime']['PatType_Severity_datetime']

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,arrivals,departures,occupancy,day_of_week,dow_name,bin_of_day,bin_of_week
PatType,Severity,datetime,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
ART,1,1996-01-01 00:00:00,0.0,0.0,0.0,0,Monday,0,0
ART,1,1996-01-01 01:00:00,0.0,0.0,0.0,0,Monday,1,1
ART,1,1996-01-01 02:00:00,0.0,0.0,0.0,0,Monday,2,2
ART,1,1996-01-01 03:00:00,0.0,0.0,0.0,0,Monday,3,3
ART,1,1996-01-01 04:00:00,0.0,0.0,0.0,0,Monday,4,4
...,...,...,...,...,...,...,...,...,...
OTH,2,1996-03-30 19:00:00,0.0,0.0,0.0,5,Saturday,19,139
OTH,2,1996-03-30 20:00:00,0.0,0.0,0.0,5,Saturday,20,140
OTH,2,1996-03-30 21:00:00,0.0,0.0,0.0,5,Saturday,21,141
OTH,2,1996-03-30 22:00:00,0.0,0.0,0.0,5,Saturday,22,142


#### Example 3 - Workload hills instead of occupancy
Assume that we are doing a staffing analysis and want to look at the distribution of workload by time of day and day of week. In order to translate patients to workload, we'll use simple staff to patient ratios based on severity. For example, let's assume that for `Severity=1` we want to have a 1:4 staff to patient ratio and for `Severity=2` we need a 1:2 ratio. Let's create a new field called `workload` using these ratios.

In [45]:
severity_to_workload = {'1':0.25, '2':0.5}
stops_df['workload'] = stops_df['Severity'].map(lambda x: severity_to_workload[str(x)])

In [46]:
stops_df.head(10)

Unnamed: 0,PatID,InRoomTS,OutRoomTS,PatType,Severity,PatTypeSeverity,workload
0,1,1996-01-01 07:44:00,1996-01-01 08:50:00,IVT,1,IVT_1,0.25
1,2,1996-01-01 08:28:00,1996-01-01 09:20:00,IVT,1,IVT_1,0.25
2,3,1996-01-01 11:44:00,1996-01-01 13:30:00,MYE,1,MYE_1,0.25
3,4,1996-01-01 11:51:00,1996-01-01 12:55:00,CAT,1,CAT_1,0.25
4,5,1996-01-01 12:10:00,1996-01-01 13:00:00,IVT,2,IVT_2,0.5
5,6,1996-01-01 14:16:00,1996-01-01 15:35:00,IVT,2,IVT_2,0.5
6,7,1996-01-01 14:40:00,1996-01-01 15:25:00,IVT,2,IVT_2,0.5
7,8,1996-01-01 17:25:00,1996-01-01 19:00:00,CAT,2,CAT_2,0.5
8,9,1996-01-02 06:17:00,1996-01-02 08:25:00,MYE,1,MYE_1,0.25
9,10,1996-01-02 06:35:00,1996-01-02 08:30:00,ART,2,ART_2,0.5
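
The mapping above converts each integer severity to a string before looking it up. An alternative sketch, shown here on a tiny made-up DataFrame, keys the dictionary by the integer values and passes it straight to `Series.map`. The names `workload_by_severity` and `demo_df` are mine, used only for illustration. Checking for unmapped values is a cheap safeguard, since `map` returns `NaN` for any severity that is missing from the dictionary.

```python
import pandas as pd

# Staff-to-patient ratios from the text: 1:4 for Severity 1, 1:2 for Severity 2
workload_by_severity = {1: 0.25, 2: 0.5}

# Tiny illustrative DataFrame (not the real stop data)
demo_df = pd.DataFrame({"PatID": [1, 5, 8], "Severity": [1, 2, 2]})

# Series.map accepts a dict directly; unmatched keys become NaN
demo_df["workload"] = demo_df["Severity"].map(workload_by_severity)

# Count rows whose severity had no entry in the dictionary
unmapped = int(demo_df["workload"].isna().sum())
print(demo_df)
print(unmapped)
```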


Now we can create workload hills. I'm just going to compute overall workload by not specifying a category field. Notice the use of the `occ_weight_field` argument.

In [54]:
# Required inputs
scenario = 'example3'
in_fld_name = 'InRoomTS'
out_fld_name = 'OutRoomTS'
start = '1/1/1996'
end = '3/30/1996 23:45'

# Optional inputs
occ_weight_field = 'workload'
verbose = 0
output = './output'

In [55]:
example3_dfs = hm.make_hills(scenario, stops_df, in_fld_name, out_fld_name, start, end, 
              occ_weight_field=occ_weight_field, 
              export_path = output, verbose=verbose)

In [50]:
example2_dfs['summaries']['stationary']['Severity']['occupancy']

Unnamed: 0_level_0,count,mean,min,max,stdev,sem,var,cv,skew,kurt,p25,p50,p75,p95,p99
Severity,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1
1,2160.0,4.43,0.0,26.3,5.89,0.13,34.69,1.33,1.2,0.25,0.0,0.92,8.39,16.35,20.63
2,2160.0,6.8,0.0,36.6,8.94,0.19,79.92,1.32,1.13,-0.04,0.0,1.33,13.06,25.08,30.18


In [61]:
example3_dfs['summaries']['stationary']['']['occupancy']

Unnamed: 0,count,mean,min,max,stdev,sem,var,cv,skew,kurt,p25,p50,p75,p95,p99
1,2160.0,4.5,0.0,22.33,5.85,0.13,34.26,1.3,1.08,-0.23,0.12,0.83,8.87,16.55,19.25


We can check the overall mean workload in example3 by computing a weighted sum of the mean occupancies by Severity from example2, with the workload ratios as weights.

In [66]:
import numpy as np

In [67]:
mean_occ = np.asarray(example2_dfs['summaries']['stationary']['Severity']['occupancy'].loc[:,'mean'])
mean_occ

array([4.42507716, 6.79535494])

In [74]:
ratios = [severity_to_workload[str(i+1)] for i in range(2)]
ratios

[0.25, 0.5]

In [75]:
overall_mean_workload = np.dot(mean_occ, ratios)
overall_mean_workload

4.503946759259259

#### Example 4 - Running via a Python script
Of course, you don't have to run Python statements through a Jupyter notebook. You can create a  Python script and run that directly in a terminal. An example, `test_shortstay2_multicats.py`, can be found in the `scripts` subfolder of the hillmaker-examples project. You can run it from a command prompt like this:

```sh
python test_shortstay2_multicats.py
```

There is another example in that folder as well, `test_obsim_log.py`, that is slightly more complex in that the input data has raw simulation times (i.e. minutes past t=0) and we need to do some datetime math to turn them into calendar based inputs.

More elaborate versions of scripts like `test_shortstay2_multicats.py` can be envisioned. For example, an entire folder of input data files could be processed by enclosing the `hm.make_hills` call inside a loop over the collection of input files:

    import glob

    for log_fn in glob.glob('logs/*.csv'):
        # Read the log file, parsing the in and out timestamp columns as dates
        stops_df = pd.read_csv(log_fn, parse_dates=[in_fld_name, out_fld_name])

        # Pass the DataFrame we just read to make_hills
        hm.make_hills(scenario, stops_df, in_fld_name, out_fld_name, start, end, cat_fld_name)
        ...
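
One thing to watch for in a loop like this: the output filenames contain the scenario value, so reusing a single scenario name for every input file would give every run the same filenames. A simple way around that is to derive the scenario name from each input filename. Here is a minimal sketch. The `scenario_from_path` helper, the `ss` prefix and the file names are all hypothetical.

```python
from pathlib import Path


def scenario_from_path(log_fn, prefix="ss"):
    """Build a scenario name from a log file name (hypothetical convention)."""
    return f"{prefix}_{Path(log_fn).stem}"


# Made-up input file names, standing in for the result of glob.glob('logs/*.csv')
log_files = ["logs/unit_a.csv", "logs/unit_b.csv"]
scenarios = [scenario_from_path(fn) for fn in log_files]
print(scenarios)
```

Inside the loop, the derived name would then be passed as the first argument to `hm.make_hills` in place of a fixed `scenario`.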

## User interface plans
Over the years, I (and many others) have used Hillmaker in a variety of ways, including:

- MS Access form based GUI
- run main Hillmaker sub from Access VBA Immediate Window
- run Hillmaker main sub (and/or components subs) via custom VBA procedures

I'd like users to be able to use the new Python based version in a number of different ways as well. As I've shown in this Jupyter notebook, it can be used by importing the `hillmaker` module and then calling Hillmaker functions via:

- a Jupyter notebook (or any Python terminal such as an IPython shell or QT console, or IDLE)
- a Python script with the input arguments set and passed via Python statements

While these two options provide tons of flexibility for power users, I also want to create other interfaces that don't require users to write Python code. At a minimum, I plan to create a command line interface (CLI) as well as a GUI that is similar to the old Access version.

### A CLI for Hillmaker
Python has several nice tools for creating CLIs. The standard library includes `argparse`, and third-party packages such as `docopt` and [Click](http://click.pocoo.org/5/) offer alternative approaches. See http://docs.python-guide.org/en/latest/scenarios/cli/ for more. A well designed CLI will make it easy to use hillmaker from the command line in either Windows or Linux. 
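
To give a feel for what such a CLI might look like, here is a rough `argparse` sketch. To be clear, this is not an existing hillmaker interface. The argument and option names are hypothetical and simply mirror some of the `make_hills` parameters used in this notebook.

```python
import argparse


def build_parser():
    """Build a parser for a hypothetical hillmaker command line interface."""
    parser = argparse.ArgumentParser(
        prog="hillmaker",
        description="Hypothetical CLI sketch; not an actual hillmaker interface.",
    )
    # Positional arguments mirroring the required make_hills inputs
    parser.add_argument("scenario", help="Used in output filenames")
    parser.add_argument("stop_data_csv", help="Path to csv file of stop data")
    parser.add_argument("infield", help="Column name for arrival times")
    parser.add_argument("outfield", help="Column name for departure times")
    parser.add_argument("start_analysis", help="Start of analysis period")
    parser.add_argument("end_analysis", help="End of analysis period")
    # Optional arguments mirroring a few of the keyword parameters
    parser.add_argument("--catfield", nargs="*", default=None)
    parser.add_argument("--bin-size-minutes", type=int, default=60)
    parser.add_argument("--export-path", default=".")
    return parser


# Parse an example argument list instead of the real command line
args = build_parser().parse_args([
    "example1", "../data/ShortStay2.csv", "InRoomTS", "OutRoomTS",
    "1/1/1996", "3/30/1996 23:45",
    "--catfield", "PatType", "Severity",
])
print(args.scenario, args.catfield, args.bin_size_minutes)
```

The parsed values could then be used to read the csv file with pandas and handed off to `hm.make_hills`.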

### A GUI for Hillmaker
This is uncharted territory for me. Python has [a number of frameworks/toolkits for creating GUI apps](https://wiki.python.org/moin/GuiProgramming). This is not the highest priority for me but I do plan on creating a GUI for Hillmaker. If anyone wants to help with this, awesome.

