In [1]:
# Set up autoreloading of modules so that I can debug code in external files
%load_ext autoreload
%autoreload 2

# Hillmaker - basic usage

In this notebook we'll focus on basic use of Hillmaker for analyzing occupancy in a typical hospital setting. The data is fictitious and represents a hospital short stay unit. Patients flow through a short stay unit for a variety of procedures, tests, or therapies. Let's assume patients can be classified into one of five patient types: ART (arteriogram), CAT (post cardiac-cath), MYE (myelogram), IVT (IV therapy), and OTH (other). From one of our hospital information systems we were able to get raw data about the entry and exit times of each patient. For simplicity, the data is in a csv file.

This example assumes you are already familiar with statistical occupancy analysis using the old version of Hillmaker or a similar tool. If not, see the Hillmaker documentation or the following blog posts:

[Computing occupancy statistics with Python - Part 1 of 3](http://nbviewer.ipython.org/github/misken/hselab-tutorials/blob/master/hillpy_bydate_demo.ipynb)

[Computing occupancy statistics with Python - Part 2 of 3](http://nbviewer.ipython.org/github/misken/hselab-tutorials/blob/master/hillpy_occstats_demo.ipynb)

## Module imports
To run Hillmaker we only need to import a few modules. Since the main Hillmaker function uses Pandas DataFrames for both data input and output, we need to import `pandas` in addition to `hillmaker`.

In [7]:
import pandas as pd
import hillmaker as hm

## Read main stop data file
Here are the first few lines from our csv file containing the patient stop data:

    PatID,InRoomTS,OutRoomTS,PatType
    1,1/1/1996 7:44,1/1/1996 8:50,IVT
    2,1/1/1996 8:28,1/1/1996 9:20,IVT
    3,1/1/1996 11:44,1/1/1996 13:30,MYE
    4,1/1/1996 11:51,1/1/1996 12:55,CAT
    5,1/1/1996 12:10,1/1/1996 13:00,IVT
    6,1/1/1996 14:16,1/1/1996 15:35,IVT
    7,1/1/1996 14:40,1/1/1996 15:25,IVT


Read the short stay data from a csv file into a DataFrame and tell Pandas which fields to treat as dates. 

In [9]:
file_stopdata = '../data/ShortStay.csv'
stops_df = pd.read_csv(file_stopdata, parse_dates=['InRoomTS','OutRoomTS'])
stops_df.info() # Check out the structure of the resulting DataFrame

<class 'pandas.core.frame.DataFrame'>
Int64Index: 59877 entries, 0 to 59876
Data columns (total 4 columns):
PatID        59877 non-null int64
InRoomTS     59877 non-null datetime64[ns]
OutRoomTS    59877 non-null datetime64[ns]
PatType      59877 non-null object
dtypes: datetime64[ns](2), int64(1), object(1)
memory usage: 2.3+ MB


Check out the top and bottom of `stops_df`. 

In [12]:
stops_df.head(7)

,PatID,InRoomTS,OutRoomTS,PatType
0,1,1996-01-01 07:44:00,1996-01-01 08:50:00,IVT
1,2,1996-01-01 08:28:00,1996-01-01 09:20:00,IVT
2,3,1996-01-01 11:44:00,1996-01-01 13:30:00,MYE
3,4,1996-01-01 11:51:00,1996-01-01 12:55:00,CAT
4,5,1996-01-01 12:10:00,1996-01-01 13:00:00,IVT
5,6,1996-01-01 14:16:00,1996-01-01 15:35:00,IVT
6,7,1996-01-01 14:40:00,1996-01-01 15:25:00,IVT


In [11]:
stops_df.tail(5)

,PatID,InRoomTS,OutRoomTS,PatType
59872,59873,1996-09-30 19:31:00,1996-09-30 20:15:00,IVT
59873,59874,1996-09-30 20:23:00,1996-09-30 21:30:00,IVT
59874,59875,1996-09-30 21:00:00,1996-09-30 22:45:00,CAT
59875,59876,1996-09-30 21:57:00,1996-09-30 22:40:00,IVT
59876,59877,1996-09-30 22:45:00,1996-09-30 23:35:00,CAT


No obvious problems. We'll assume the data was all read in correctly.
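A glance at the head and tail can miss problems deeper in the file, so a few programmatic checks are cheap insurance. The following is a minimal sketch, run on the seven sample rows shown earlier so that it works without `ShortStay.csv`. The names `sample_df`, `los_minutes`, and `known_types` are illustrative and not part of Hillmaker; the same checks could be applied to `stops_df`.

```python
import io
import pandas as pd

# The seven sample rows shown earlier, inlined so the sketch is self-contained
sample_csv = """PatID,InRoomTS,OutRoomTS,PatType
1,1/1/1996 7:44,1/1/1996 8:50,IVT
2,1/1/1996 8:28,1/1/1996 9:20,IVT
3,1/1/1996 11:44,1/1/1996 13:30,MYE
4,1/1/1996 11:51,1/1/1996 12:55,CAT
5,1/1/1996 12:10,1/1/1996 13:00,IVT
6,1/1/1996 14:16,1/1/1996 15:35,IVT
7,1/1/1996 14:40,1/1/1996 15:25,IVT
"""
sample_df = pd.read_csv(io.StringIO(sample_csv),
                        parse_dates=['InRoomTS', 'OutRoomTS'])

# Length of stay in minutes; a negative value would mean exit precedes entry
los_minutes = (sample_df['OutRoomTS'] - sample_df['InRoomTS']).dt.total_seconds() / 60
n_negative = int((los_minutes < 0).sum())

# Missing timestamps or patient types
n_missing = int(sample_df[['InRoomTS', 'OutRoomTS', 'PatType']].isna().sum().sum())

# Patient types outside the five categories described at the top of the notebook
known_types = {'ART', 'CAT', 'MYE', 'IVT', 'OTH'}
unexpected_types = set(sample_df['PatType']) - known_types

print(n_negative, n_missing, unexpected_types)
```

If any of the three values is nonzero or non-empty for the full dataset, it is worth investigating before computing occupancy statistics.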

## Creating occupancy summaries
The primary function in Hillmaker is called `make_hills` and plays the same role as the `Hillmaker` function in the original Access VBA version of Hillmaker. Let's get a little help on this function.

In [17]:
help(hm.make_hills)

Help on function make_hills in module hillmaker.hills:

make_hills(scenario_name, stops_df, infield, outfield, start_analysis, end_analysis, catfield='', total_str='Total', bin_size_minutes=60, cat_to_exclude=None, totals=True, export_csv=True, export_path='.', verbose=0)
    Compute occupancy, arrival, and departure statistics by time bin of day and day of week.
    
    Main function that first calls `bydatetime.make_bydatetime` to calculate occupancy, arrival
    and departure values by date by time bin and then calls `summarize.summarize_bydatetime`
    to compute the summary statistics.
    
    Parameters
    ----------
    scenario_name : string
        Used in output filenames
    stops_df : DataFrame
        Base data containing one row per visit
    infield : string
        Column name corresponding to the arrival times
    outfield : string
        Column name corresponding to the departure times
    start_analysis : datetime-like, str
        Starting datetime for the analy

Most of the parameters are similar to those in the original VBA version, though a few new ones have been added. For example, the `cat_to_exclude` parameter allows you to specify a list of category values for which you do not want occupancy statistics computed. Also, since the VBA version used an Access database as the container for its output, new parameters were added to control output to csv files instead.
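To make "occupancy by time bin" concrete, here is a small pure-pandas sketch using three of the sample stays. It is an illustration of the concept only, not Hillmaker's implementation: it assumes each stay contributes the fraction of a 60 minute bin that it overlaps, and the helper `occupancy_by_bin` is a hypothetical name introduced for this example.

```python
import pandas as pd

# Three of the sample stays shown earlier
stays = pd.DataFrame({
    'InRoomTS': pd.to_datetime(['1996-01-01 11:44', '1996-01-01 11:51', '1996-01-01 12:10']),
    'OutRoomTS': pd.to_datetime(['1996-01-01 13:30', '1996-01-01 12:55', '1996-01-01 13:00']),
    'PatType': ['MYE', 'CAT', 'IVT'],
})

bin_size = pd.Timedelta(minutes=60)
bin_starts = pd.date_range('1996-01-01 11:00', periods=3, freq='60min')


def occupancy_by_bin(df, bin_starts, bin_size):
    """Hypothetical helper: total overlap time in each bin divided by bin length."""
    occ = {}
    for b_start in bin_starts:
        b_end = b_start + bin_size
        total = pd.Timedelta(0)
        for row in df.itertuples(index=False):
            # Portion of this stay that falls inside [b_start, b_end)
            overlap = min(row.OutRoomTS, b_end) - max(row.InRoomTS, b_start)
            if overlap > pd.Timedelta(0):
                total += overlap
        occ[b_start] = total / bin_size
    return pd.Series(occ)


occ = occupancy_by_bin(stays, bin_starts, bin_size)

# Arrivals per bin: floor each entry time to the start of its bin and count
arrivals = stays['InRoomTS'].dt.floor('60min').value_counts().sort_index()

print(occ)
print(arrivals)
```

In the 12:00 bin, for example, the three stays overlap the hour by 60, 55, and 50 minutes, giving an occupancy of 165/60 = 2.75 patients under this assumption.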

### Example 1: 60 minute bins, all categories, export to csv
Specify values for the required inputs, along with the optional category field:

In [18]:
# Required inputs, plus the optional category field (cat_fld_name)
scenario = 'ss_example_1'
in_fld_name = 'InRoomTS'
out_fld_name = 'OutRoomTS'
cat_fld_name = 'PatType'
start = '1/1/1996'
end = '3/30/1996 23:45'
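Note that the analysis window covers only part of the data, which runs through 9/30/1996. As a rough check on what the window implies, the sketch below counts the 60 minute bins between the start and end values. It assumes the date strings are interpreted the way `pd.Timestamp` interprets them (month first); the names `start_ts`, `end_ts`, and `bins` are illustrative only.

```python
import pandas as pd

start_ts = pd.Timestamp('1/1/1996')         # midnight, January 1, 1996
end_ts = pd.Timestamp('3/30/1996 23:45')    # late evening, March 30, 1996

# Start times of the 60 minute bins that begin within the analysis window
bins = pd.date_range(start_ts, end_ts, freq='60min')

print(start_ts, end_ts)
print(len(bins), bins[0], bins[-1])
```

The window spans 90 days (1996 is a leap year), so there are 90 × 24 = 2160 hourly bins, the last one starting at 23:00 on March 30.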


Now we'll call the main `make_hills` function. We won't capture the return values but will simply take the default behavior of having the summaries exported to csv files. You'll see that the filenames will contain the scenario value.

In [21]:
hm.make_hills(scenario, stops_df, in_fld_name, out_fld_name, start, end, cat_fld_name)

Here's a screenshot of the current folder containing this IPython notebook (**basic_usage_shortstay_unit.ipynb**) and the csv files created by Hillmaker.

![folder with output csv files](example_1_files.png)