# Using the `make_hills()` function

The `hillmaker.hills.make_hills` function is the gateway to hillmaker. It is used by the CLI and the object-oriented API, and it can also be called on its own to launch the hillmaking process. It has numerous input arguments for customizing how hillmaker works. In this tutorial we will describe all of the input arguments and discuss their use. This same information applies to the CLI and the object-oriented API's `Scenario.make_hills` method.

In [4]:
%load_ext autoreload
%autoreload 2

In [5]:
import pandas as pd
import hillmaker as hm

In [9]:
ssu_stopdata = './data/ssu_2024.csv'
ssu_stops_df = pd.read_csv(ssu_stopdata, parse_dates=['InRoomTS','OutRoomTS'])
ssu_stops_df.info() # Check out the structure of the resulting DataFrame

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 59877 entries, 0 to 59876
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype         
---  ------     --------------  -----         
 0   PatID      59877 non-null  int64         
 1   InRoomTS   59877 non-null  datetime64[ns]
 2   OutRoomTS  59877 non-null  datetime64[ns]
 3   PatType    59877 non-null  object        
 4   LOS_hours  59877 non-null  float64       
dtypes: datetime64[ns](2), float64(1), int64(1), object(1)
memory usage: 2.3+ MB


In [4]:
hm.make_hills?

Signature:
hm.make_hills(
    scenario_name: str,
    stops_df: pandas.core.frame.DataFrame,
    in_field: str,
    out_field: str,
    start_analysis_dt: str | datetime.date | datetime.datetime | pandas._libs.tslibs.timestamps.Timestamp | numpy.datetime64,
    end_analysis_dt: str | datetime.date

## Required input arguments

### `scenario_name` (*str*)

This is a string that gets used in a few places:

- part of filenames of exported CSV files,
- part of filenames of exported plots,
- plot subtitle default

Since it gets used in filenames, best to avoid spaces and special characters (other than underscore). Any non-alphanumeric characters other than the underscore will get transformed to underscores.
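
To make the transformation concrete, here is a minimal sketch of the kind of clean-up described above. The helper `sanitize_scenario_name` is hypothetical and is not hillmaker's own code; it also assumes that "alphanumeric" means ASCII letters and digits.

```python
import re


def sanitize_scenario_name(name: str) -> str:
    """Replace every character that is not an ASCII letter, digit, or underscore with '_'.

    Hypothetical helper for illustration only; hillmaker's internal logic may differ.
    """
    return re.sub(r"[^A-Za-z0-9_]", "_", name)


print(sanitize_scenario_name("ssu 2024-Q1"))  # ssu_2024_Q1
```

Choosing a name that already satisfies this rule means the filenames you see on disk match the name you passed in.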

### `stops_df` (DataFrame)

The `DataFrame` with each row representing one visit, or stop, by an entity. For example, in the SSU example, each row is a patient who visits the short stay unit. In cycle share data, each row might be a rental of a bike for some period of time. Here are the first few records from `ssu_stops_df`. It is **NOT** necessary to have a field containing the duration of time that the entity spent in the system (e.g. `LOS_hours` below). You only need to have fields representing the arrival and departure times from the system - `InRoomTS` and `OutRoomTS` in this example.

In [10]:
ssu_stops_df.head()

|   | PatID | InRoomTS | OutRoomTS | PatType | LOS_hours |
|---|-------|----------|-----------|---------|-----------|
| 0 | 1 | 2024-01-01 07:44:00 | 2024-01-01 09:20:00 | IVT | 1.6 |
| 1 | 2 | 2024-01-01 08:28:00 | 2024-01-01 11:13:00 | IVT | 2.75 |
| 2 | 3 | 2024-01-01 11:44:00 | 2024-01-01 12:48:00 | MYE | 1.066667 |
| 3 | 4 | 2024-01-01 11:51:00 | 2024-01-01 21:10:00 | CAT | 9.316667 |
| 4 | 5 | 2024-01-01 12:10:00 | 2024-01-01 12:57:00 | IVT | 0.783333 |


### `in_field` (*str*)

The fieldname in `stops_df` containing the arrival times. The datatype for the field itself must be a pandas `Timestamp` (or `datetime64`). 

### `out_field` (*str*)

The fieldname in `stops_df` containing the departure times. The datatype for the field itself must be a pandas `Timestamp` (or `datetime64`). 

In [11]:
ssu_stops_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 59877 entries, 0 to 59876
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype         
---  ------     --------------  -----         
 0   PatID      59877 non-null  int64         
 1   InRoomTS   59877 non-null  datetime64[ns]
 2   OutRoomTS  59877 non-null  datetime64[ns]
 3   PatType    59877 non-null  object        
 4   LOS_hours  59877 non-null  float64       
dtypes: datetime64[ns](2), float64(1), int64(1), object(1)
memory usage: 2.3+ MB
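
If your timestamps were read in as strings (for example, because `parse_dates` was not used), they need converting before they can serve as `in_field` and `out_field`. The following is a small sketch using a made-up two-row stand-in for the stop data; only standard pandas functions are used.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

# Made-up stand-in for stop data whose timestamps were read as plain strings
stops = pd.DataFrame({
    "InRoomTS": ["2024-01-01 07:44:00", "2024-01-01 08:28:00"],
    "OutRoomTS": ["2024-01-01 09:20:00", "2024-01-01 11:13:00"],
})

# Convert any arrival/departure field that is not already a datetime dtype
for field in ["InRoomTS", "OutRoomTS"]:
    if not is_datetime64_any_dtype(stops[field]):
        stops[field] = pd.to_datetime(stops[field])

print(stops.dtypes)
```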


### `start_analysis_dt` and `end_analysis_dt` (*something convertible to a pandas `Timestamp`*)

These two dates define what we call the *analysis date range*. All records in `stops_df` whose `in_field` and `out_field` values overlap this range in any way are included in the hillmaker computations.
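
The overlap rule can be pictured with a plain pandas filter. This is only an illustration of the idea on made-up data, not hillmaker's internal code, and the treatment of the exact boundary instants (inclusive here) is an assumption.

```python
import pandas as pd

# Made-up stops: one straddles the start of the range, one is inside, one is after it
stops = pd.DataFrame({
    "InRoomTS": pd.to_datetime(["2023-12-31 22:00", "2024-01-01 07:44", "2024-01-03 09:00"]),
    "OutRoomTS": pd.to_datetime(["2024-01-01 02:00", "2024-01-01 09:20", "2024-01-03 11:00"]),
})

start_analysis_dt = pd.Timestamp("2024-01-01 00:00")
end_analysis_dt = pd.Timestamp("2024-01-02 23:59")

# A stop overlaps the range if it arrives before the range ends
# and departs after the range starts (boundaries treated as inclusive: an assumption)
overlaps = (stops["InRoomTS"] <= end_analysis_dt) & (stops["OutRoomTS"] >= start_analysis_dt)
in_range = stops[overlaps]
print(in_range)
```

Note that the first stop arrives before the range starts but is still kept, because part of its stay falls inside the range.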

Care must be taken in selecting the analysis date range. In an example like the SSU, where most patients are staying less than 24 hours, we are probably fine with picking a `start_analysis_dt` very close to, or even equal to, the earliest arrival date in our stop data. However, for a system in which the length of stay may be on the order of several days, we need to be cognizant of *warm up* effects. In such a case, if we used the earliest arrival date for the start of the analysis, we are essentially assuming that the system starts out empty on that date. This is certainly not likely to be true in a busy system where entities are staying multiple days. Similarly, the end date should not be after the date of the latest arrival or the system will appear to be emptying out - again, not realistic.

See {doc}`basic_occupancy_analysis` for more on this issue.
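
One possible way to pick dates that respect the warm up concern is sketched below. The heuristic (skip a warm up period equal to the 95th percentile of length of stay, and end on the date of the latest arrival) is an assumption made for illustration, not a hillmaker recommendation, and the data is made up.

```python
import pandas as pd

# Made-up multi-day stays
stops = pd.DataFrame({
    "InRoomTS": pd.to_datetime(["2024-01-01 08:00", "2024-01-02 10:00",
                                "2024-01-05 09:00", "2024-01-09 14:00"]),
    "OutRoomTS": pd.to_datetime(["2024-01-04 08:00", "2024-01-06 10:00",
                                 "2024-01-08 09:00", "2024-01-12 14:00"]),
})

# Length of stay for each stop, as a Timedelta
los = stops["OutRoomTS"] - stops["InRoomTS"]

# Assumed heuristic: treat a high LOS percentile as the warm up period
warmup = los.quantile(0.95)

# Start on the first whole day after the warm up period has elapsed
start_analysis_dt = (stops["InRoomTS"].min() + warmup).ceil("D")

# End no later than the date of the latest arrival
end_analysis_dt = stops["InRoomTS"].max().floor("D")

print(start_analysis_dt, end_analysis_dt)
```

Plotting occupancy over time and checking where it levels off is a useful sanity check on whatever dates you choose.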

## Optional input arguments

### `cat_field` (*str*)

The fieldname in `stops_df` containing some sort of categorical information for which you would like to get hillmaker statistics. In the SSU example, this would be the `PatType` field. If a `cat_field` is specified, then arrival, departure and occupancy statistics are computed by category as well as overall. A common use of the category field is to specify a location. In this way, one hillmaker run can compute occupancy statistics for multiple locations. An example could be the name of the nursing unit visited as inpatients flow through a hospital. In the cycle share data example, a field specifying whether the renter was a subscription holder or a casual renter lets us see the very different bike rental patterns of these two distinct populations.
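
To give a feel for what "by category as well as overall" means, here is a rough pandas sketch that counts hourly arrivals per `PatType` and in total, using the first five arrivals shown earlier. It only mimics the idea; hillmaker's own tables have a different structure and also cover departures and occupancy.

```python
import pandas as pd

# The first five SSU arrivals and their patient types
stops = pd.DataFrame({
    "InRoomTS": pd.to_datetime(["2024-01-01 07:44", "2024-01-01 08:28", "2024-01-01 11:44",
                                "2024-01-01 11:51", "2024-01-01 12:10"]),
    "PatType": ["IVT", "IVT", "MYE", "CAT", "IVT"],
})

# Hour in which each arrival occurred
arrival_hour = stops["InRoomTS"].dt.floor("60min")

# Arrivals per hour for each category (one column per PatType)
by_category = stops.groupby([arrival_hour, "PatType"]).size().unstack(fill_value=0)

# Arrivals per hour across all categories
overall = by_category.sum(axis=1).rename("total")

print(by_category.join(overall))
```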

### `bin_size_minutes` (*int*)

Central to hillmaker is the notion of dividing each day into equally sized time bins such as hours (`bin_size_minutes=60`) or half-hours (`bin_size_minutes=30`). All of the summary tables and plots will use `bin_size_minutes`. Pick a value that makes sense for your study and for the level of time-of-day fluctuations present. Try different values and compare the plots. Large values might obscure important short-term fluctuations in arrivals or occupancy.
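
The effect of the bin size can be seen with a few made-up arrival times. The helper `arrivals_per_bin` below is hypothetical and uses plain pandas rather than hillmaker; unlike hillmaker's tables, it simply omits bins with no arrivals.

```python
import pandas as pd

# Made-up arrival times clustered early in the 08:00 hour
arrivals = pd.Series(pd.to_datetime([
    "2024-01-01 08:05", "2024-01-01 08:10", "2024-01-01 08:20",
    "2024-01-01 08:50", "2024-01-01 09:40",
]))


def arrivals_per_bin(timestamps: pd.Series, bin_size_minutes: int) -> pd.Series:
    """Count arrivals in each time bin of the given width (hypothetical helper)."""
    bins = timestamps.dt.floor(f"{bin_size_minutes}min")
    return bins.value_counts().sort_index()


hourly = arrivals_per_bin(arrivals, 60)
half_hourly = arrivals_per_bin(arrivals, 30)

print(hourly)
print(half_hourly)
```

With 60-minute bins the 08:00 hour shows four arrivals; with 30-minute bins it becomes clear that three of them came in the first half hour.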

### `highres_bin_size_minutes` (*int*)

Number of minutes in each time bin of the day used for initial computation of the number of arrivals, departures, and the occupancy level - i.e. in the creation of the bydatetime table. By default, this is set equal to the value of `bin_size_minutes` since it doesn't affect aggregate arrival, occupancy or departure statistics. So, why would you ever use this parameter?

This value should be <= `bin_size_minutes`. The shorter the duration of stays, the smaller the resolution should be. The current default is 5 minutes.

## Calling `make_hills()`

## Output dictionary