In [1]:
# Set up autoreloading of modules so that I can debug code in external files
%load_ext autoreload
%autoreload 2

# Hillmaker - basic usage

In this notebook we'll focus on basic use of Hillmaker for analyzing occupancy in a typical hospital setting. The data is fictitious and represents a hospital short stay unit. Patients flow through a short stay unit for a variety of procedures, tests, or therapies. Let's assume patients can be classified into one of five patient types: ART (arteriogram), CAT (post cardiac-cath), MYE (myelogram), IVT (IV therapy), and OTH (other). From one of our hospital information systems we were able to get raw data about the entry and exit times of each patient. For simplicity, the data is in a csv file.

This example assumes you are already familiar with statistical occupancy analysis using the old version of Hillmaker or a similar tool. If not, see the Hillmaker documentation or the following blog posts:

[Computing occupancy statistics with Python - Part 1 of 3](http://nbviewer.ipython.org/github/misken/hselab-tutorials/blob/master/hillpy_bydate_demo.ipynb)

[Computing occupancy statistics with Python - Part 2 of 3](http://nbviewer.ipython.org/github/misken/hselab-tutorials/blob/master/hillpy_occstats_demo.ipynb)

## Module imports
To run Hillmaker we only need to import a few modules. Since the main Hillmaker function uses Pandas DataFrames for both data input and output, we need to import `pandas` in addition to `hillmaker`.

In [2]:
import pandas as pd
import hillmaker as hm

## Read main stop data file
Here are the first few lines of our csv file containing the patient stop data:

    PatID,InRoomTS,OutRoomTS,PatType
    1,1/1/1996 7:44,1/1/1996 8:50,IVT
    2,1/1/1996 8:28,1/1/1996 9:20,IVT
    3,1/1/1996 11:44,1/1/1996 13:30,MYE
    4,1/1/1996 11:51,1/1/1996 12:55,CAT
    5,1/1/1996 12:10,1/1/1996 13:00,IVT
    6,1/1/1996 14:16,1/1/1996 15:35,IVT
    7,1/1/1996 14:40,1/1/1996 15:25,IVT


Read the short stay data from a csv file into a DataFrame and tell Pandas which fields to treat as dates. 

In [3]:
file_stopdata = '../data/ShortStay.csv'
stops_df = pd.read_csv(file_stopdata, parse_dates=['InRoomTS','OutRoomTS'])
stops_df.info() # Check out the structure of the resulting DataFrame

<class 'pandas.core.frame.DataFrame'>
Int64Index: 59877 entries, 0 to 59876
Data columns (total 4 columns):
PatID        59877 non-null int64
InRoomTS     59877 non-null datetime64[ns]
OutRoomTS    59877 non-null datetime64[ns]
PatType      59877 non-null object
dtypes: datetime64[ns](2), int64(1), object(1)
memory usage: 2.3+ MB


Check out the top and bottom of `stops_df`. 

In [4]:
stops_df.head(7)

,PatID,InRoomTS,OutRoomTS,PatType
0,1,1996-01-01 07:44:00,1996-01-01 08:50:00,IVT
1,2,1996-01-01 08:28:00,1996-01-01 09:20:00,IVT
2,3,1996-01-01 11:44:00,1996-01-01 13:30:00,MYE
3,4,1996-01-01 11:51:00,1996-01-01 12:55:00,CAT
4,5,1996-01-01 12:10:00,1996-01-01 13:00:00,IVT
5,6,1996-01-01 14:16:00,1996-01-01 15:35:00,IVT
6,7,1996-01-01 14:40:00,1996-01-01 15:25:00,IVT


In [5]:
stops_df.tail(5)

,PatID,InRoomTS,OutRoomTS,PatType
59872,59873,1996-09-30 19:31:00,1996-09-30 20:15:00,IVT
59873,59874,1996-09-30 20:23:00,1996-09-30 21:30:00,IVT
59874,59875,1996-09-30 21:00:00,1996-09-30 22:45:00,CAT
59875,59876,1996-09-30 21:57:00,1996-09-30 22:40:00,IVT
59876,59877,1996-09-30 22:45:00,1996-09-30 23:35:00,CAT


No obvious problems. We'll assume the data was all read in correctly.
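
That assumption can be checked rather than taken on faith. The sketch below uses a tiny hand-made DataFrame (not the real `ShortStay.csv`) and a hypothetical helper, `check_stops`, which is not part of Hillmaker. It counts rows with missing timestamps and stays whose departure precedes their arrival. The same call could be pointed at `stops_df`.

```python
import pandas as pd

def check_stops(df, infield, outfield):
    """Return simple data-quality counts for a stop-data DataFrame.

    Hypothetical helper for illustration; it is not part of Hillmaker.
    """
    # Rows where either timestamp is missing (NaT)
    missing = int(df[[infield, outfield]].isna().any(axis=1).sum())
    # Comparisons involving NaT evaluate to False, so missing rows are not counted here
    negative = int((df[outfield] < df[infield]).sum())
    return {'rows': len(df), 'missing_timestamps': missing, 'negative_stays': negative}

# Made-up sample: row 2 departs before it arrives, row 3 has no departure time
sample_df = pd.DataFrame({
    'PatID': [1, 2, 3],
    'InRoomTS': pd.to_datetime(['1996-01-01 07:44', '1996-01-01 09:20', '1996-01-01 11:44']),
    'OutRoomTS': pd.to_datetime(['1996-01-01 08:50', '1996-01-01 08:28', None]),
    'PatType': ['IVT', 'IVT', 'MYE'],
})

report = check_stops(sample_df, 'InRoomTS', 'OutRoomTS')
print(report)
```

On real data, both `missing_timestamps` and `negative_stays` should come back as zero before the stops are passed on for occupancy analysis.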

## Creating occupancy summaries
The primary function in Hillmaker is called `make_hills` and plays the same role as the `Hillmaker` function in the original Access VBA version of Hillmaker. Let's get a little help on this function.

In [6]:
help(hm.make_hills)

Help on function make_hills in module hillmaker.hills:

make_hills(scenario_name, stops_df, infield, outfield, start_analysis, end_analysis, catfield='', total_str='Total', bin_size_minutes=60, cat_to_exclude=None, totals=True, export_csv=True, export_path='.', return_dataframes=False, verbose=0)
    Compute occupancy, arrival, and departure statistics by time bin of day and day of week.
    
    Main function that first calls `bydatetime.make_bydatetime` to calculate occupancy, arrival
    and departure values by date by time bin and then calls `summarize.summarize_bydatetime`
    to compute the summary statistics.
    
    Parameters
    ----------
    scenario_name : string
        Used in output filenames
    stops_df : DataFrame
        Base data containing one row per visit
    infield : string
        Column name corresponding to the arrival times
    outfield : string
        Column name corresponding to the departure times
    start_analysis : datetime-like, str
        Starti

Most of the parameters are similar to those in the original VBA version, though a few new ones have been added. For example, the `cat_to_exclude` parameter allows you to specify a list of category values for which you do not want occupancy statistics computed. Also, since the VBA version used an Access database as the container for its output, new parameters were added to control output to csv files instead.
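
To make the idea of occupancy by time bin concrete, here is a rough sketch of how a single stay can be spread across time bins, with each bin credited the fraction of the bin during which the patient was present. This only illustrates the general idea. The function `occupancy_by_bin` is hypothetical and is not Hillmaker's code, and Hillmaker's handling of edge cases may differ.

```python
from datetime import datetime, timedelta

def occupancy_by_bin(in_ts, out_ts, bin_size_minutes=60):
    """Map each bin start time to the fraction of that bin the stay overlaps."""
    bin_size = timedelta(minutes=bin_size_minutes)
    # Floor the arrival time to the start of its bin (bins are aligned to midnight)
    midnight = in_ts.replace(hour=0, minute=0, second=0, microsecond=0)
    bin_start = midnight + ((in_ts - midnight) // bin_size) * bin_size
    fractions = {}
    while bin_start < out_ts:
        bin_end = bin_start + bin_size
        # Time the patient is present within this bin
        overlap = min(out_ts, bin_end) - max(in_ts, bin_start)
        fractions[bin_start] = overlap / bin_size
        bin_start = bin_end
    return fractions

# Patient 1 from the sample rows shown earlier: in at 7:44, out at 8:50
stay = occupancy_by_bin(datetime(1996, 1, 1, 7, 44), datetime(1996, 1, 1, 8, 50))
for start, frac in stay.items():
    print(start.strftime('%H:%M'), round(frac, 3))
# 07:00 0.267
# 08:00 0.833
```

Summing such fractions over all stays for a given date and bin gives that bin's occupancy, which is why occupancy statistics can take non-integer values.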

### Example 1: 60 minute bins, all categories, export to csv
Specify values for all the required inputs:

In [7]:
# Required inputs
scenario = 'ss_example_1'
in_fld_name = 'InRoomTS'
out_fld_name = 'OutRoomTS'
cat_fld_name = 'PatType'
start = '1/1/1996'
end = '3/30/1996 23:45'

# Optional inputs
verbose = 1


Now we'll call the main `make_hills` function. We won't capture the return values but will simply take the default behavior of having the summaries exported to csv files. You'll see that the filenames will contain the scenario value.

In [8]:
hm.make_hills(scenario, stops_df, in_fld_name, out_fld_name, start, end, cat_fld_name, verbose=verbose)

min of intime: 1996-01-01 07:44:00
max of outtime: 1996-09-30 23:35:00
max of intime: 1996-09-30 22:45:00
min of outtime: 1996-01-01 08:50:00
rng_bydt created: 0.0042
Seeded bydatetime DataFrame created: 0.0877
dayofweek, bin_of_day, bin_of_week computed: 0.3531
Multi-index on bydatetime DataFrame created: 0.3707
Multi-index fully lexsorted: 0.3898
Done processing 19795 stop recs: 17.6560
{'inner': 19795}
Done adding totals: 17.7446


Here's a screenshot of the current folder containing this IPython notebook (**basic_usage_shortstay_unit.ipynb**) and the csv files created by Hillmaker. 

![folder with output csv files](example_1_files.png)

If you've used the previous version of Hillmaker, you'll recognize these files. A few more statistics have been added, but otherwise they are the same. These csv files can be imported into a spreadsheet application for plot creation. Of course, we can also make plots in Python. We'll do that in the next example. 

![folder with output csv files](example_1_occ.png)

The files with 'cat' in their name are new. They contain overall summary statistics by category. In other words, they are NOT by time of day and day of week.

![folder with output csv files](example_1_occ_cat.png)

### Example 2: 30 minute bins, only CAT and IVT, return values to DataFrames

In [9]:
# Required inputs - same as Example 1 except for scenario name
scenario = 'ss_example_2'
in_fld_name = 'InRoomTS'
out_fld_name = 'OutRoomTS'
cat_fld_name = 'PatType'
start = '1/1/1996'
end = '3/30/1996 23:45'

# Optional inputs
tot_fld_name = 'CAT_IVT' # Just to make it clear that it's only these patient types
bin_mins = 30 # Half-hour time bins
exclude = ['ART','MYE','OTH'] # Tell Hillmaker to ignore these patient types


Now we'll call `make_hills` and tuck the results (a dictionary of DataFrames) into a local variable. Then we can explore them a bit with Pandas.

In [10]:
results_ex2 = hm.make_hills(scenario, stops_df, in_fld_name, out_fld_name, start, end, cat_fld_name,
                            total_str=tot_fld_name, bin_size_minutes=bin_mins, 
                            cat_to_exclude=exclude, return_dataframes=True)

In [11]:
results_ex2.keys()

dict_keys(['tot_occ', 'departures', 'tot_dep', 'tot_arr', 'arrivals', 'bydatetime', 'occupancy'])

In [12]:
occ_df = results_ex2['occupancy']

In [13]:
occ_df.head()

category,day_of_week,bin_of_day,count,cv,kurt,max,mean,min,p50,p55,p60,p65,...,p80,p85,p90,p95,p975,p99,sem,skew,stdev,var
CAT,0,0,13,1.916684,0.509139,0.633333,0.125641,0,0,0,0.0,0.0,...,0.3,0.5,0.5,0.553333,0.593333,0.617333,0.06679,1.522946,0.240814,0.057991
CAT,0,1,13,3.605551,13.0,0.333333,0.025641,0,0,0,0.0,0.0,...,0.0,0.0,0.0,0.133333,0.233333,0.293333,0.025641,3.605551,0.09245,0.008547
CAT,0,2,13,3.605551,13.0,0.966667,0.074359,0,0,0,0.0,0.0,...,0.0,0.0,0.0,0.386667,0.676667,0.850667,0.074359,3.605551,0.268105,0.07188
CAT,0,3,13,3.076846,12.025087,0.966667,0.087179,0,0,0,0.0,0.0,...,0.0,0.033333,0.133333,0.486667,0.726667,0.870667,0.074396,3.436715,0.268238,0.071952
CAT,0,4,13,1.344087,-1.65717,1.0,0.328205,0,0,0,0.126667,0.506667,...,0.82,0.866667,0.966667,1.0,1.0,1.0,0.122349,0.672111,0.441136,0.194601


In [14]:
occ_df.tail()

category,day_of_week,bin_of_day,count,cv,kurt,max,mean,min,p50,p55,p60,p65,...,p80,p85,p90,p95,p975,p99,sem,skew,stdev,var
IVT,6,43,12,1.929904,2.592905,1.0,0.172222,0,0,0,0,0,...,0.4,0.523333,0.56,0.761667,0.880833,0.952333,0.095948,1.835588,0.332372,0.110471
IVT,6,44,12,3.464102,12.0,1.0,0.083333,0,0,0,0,0,...,0.0,0.0,0.0,0.45,0.725,0.89,0.083333,3.464102,0.288675,0.083333
IVT,6,45,12,2.372342,3.667887,0.833333,0.119444,0,0,0,0,0,...,0.0,0.21,0.54,0.705,0.769167,0.807667,0.0818,2.194808,0.283363,0.080295
IVT,6,46,12,2.335497,2.64,1.0,0.166667,0,0,0,0,0,...,0.0,0.35,0.9,1.0,1.0,1.0,0.112367,2.055237,0.389249,0.151515
IVT,6,47,12,3.464102,12.0,0.666667,0.055556,0,0,0,0,0,...,0.0,0.0,0.0,0.3,0.483333,0.593333,0.055556,3.464102,0.19245,0.037037
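
The `count` column shows 13 for the `day_of_week` 0 rows and 12 for the `day_of_week` 6 rows. If `count` is the number of dates contributing to each row, and `day_of_week` follows the usual Python and pandas convention of Monday = 0 (both are assumptions here), we can cross-check these values by counting how often each weekday falls in the analysis range of 1/1/1996 through 3/30/1996:

```python
from collections import Counter
from datetime import date, timedelta

start_date = date(1996, 1, 1)
end_date = date(1996, 3, 30)

# Number of calendar days in the analysis range, inclusive of both ends
num_days = (end_date - start_date).days + 1

# Count how many times each weekday occurs (Monday = 0, ..., Sunday = 6)
weekday_counts = Counter(
    (start_date + timedelta(days=i)).weekday() for i in range(num_days)
)

print(num_days)
print(sorted(weekday_counts.items()))
```

The range covers 90 days starting on a Monday, so Monday through Saturday each occur 13 times and Sunday occurs 12 times, in line with the counts in the output above.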


In [15]:
occ_df.info()

<class 'pandas.core.frame.DataFrame'>
MultiIndex: 1008 entries, (CAT, 0, 0) to (IVT, 6, 47)
Data columns (total 22 columns):
count    1008 non-null float64
cv       1008 non-null float64
kurt     1008 non-null float64
max      1008 non-null float64
mean     1008 non-null float64
min      1008 non-null float64
p50      1008 non-null float64
p55      1008 non-null float64
p60      1008 non-null float64
p65      1008 non-null float64
p70      1008 non-null float64
p75      1008 non-null float64
p80      1008 non-null float64
p85      1008 non-null float64
p90      1008 non-null float64
p95      1008 non-null float64
p975     1008 non-null float64
p99      1008 non-null float64
sem      1008 non-null float64
skew     1008 non-null float64
stdev    1008 non-null float64
var      1008 non-null float64
dtypes: float64(22)
memory usage: 181.1+ KB
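
The 1008 rows reported above are consistent with simple bin arithmetic. Thirty-minute bins give 48 bins per day, there are 7 days of the week, and the index appears to span three categories: CAT, IVT, and (by assumption) the `CAT_IVT` totals, which would sort between the other two. A small check:

```python
MINUTES_PER_DAY = 24 * 60

bin_size_minutes = 30
bins_per_day = MINUTES_PER_DAY // bin_size_minutes  # bin_of_day runs 0..47
days_per_week = 7                                   # day_of_week runs 0..6

# Assumption: the two retained patient types plus one totals category
categories = ['CAT', 'CAT_IVT', 'IVT']

expected_rows = len(categories) * days_per_week * bins_per_day
print(bins_per_day, expected_rows)
```

The same arithmetic gives a quick way to anticipate the size of the summary for other bin sizes or category sets.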
