# How is occupancy computed?

To compute aggregate summary statistics for occupancy by time of day and day of week, we first need to determine how many patients (entities, in general) are present during each time bin of each date in the analysis range. In our Short Stay Unit (SSU) example, we have been using an *analysis date range* of 2024-01-02 through 2024-09-30. Let's use hourly time bins. The range spans 273 days, which gives 273 × 24 = 6552 hourly datetime bins, as shown below. Note that we are using a [24-hour clock](https://simple.wikipedia.org/wiki/24-hour_clock).



In [1]:
import pandas as pd

In [2]:
bydatetime_df = pd.read_csv('output/cli_demo_ssu_60_bydatetime_datetime.csv')
bydatetime_df['datetime']

0       2024-01-02 00:00:00
1       2024-01-02 01:00:00
2       2024-01-02 02:00:00
3       2024-01-02 03:00:00
4       2024-01-02 04:00:00
               ...         
6547    2024-09-30 19:00:00
6548    2024-09-30 20:00:00
6549    2024-09-30 21:00:00
6550    2024-09-30 22:00:00
6551    2024-09-30 23:00:00
Name: datetime, Length: 6552, dtype: object

Assume the very first patient arrives at 06:15 on 2024-01-02 and departs at 09:36 the same day. For the time bins starting at 07:00 and 08:00, the patient is in the unit for the entire time bin, so each of those bins gets an occupancy contribution of 1.0. However, for the *arrival bin*, 06:00, the patient is only present for 45 minutes (a contribution of 45/60 = 0.75). Similarly, for the *departure bin*, 09:00, the patient is in the unit for 36 minutes (a contribution of 36/60 = 0.6).
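The arithmetic for this single patient can be sketched in a few lines of pandas. This is only an illustration of the fractional-bin idea, not hillmaker's actual code; the function name `bin_fractions` is hypothetical.

```python
import pandas as pd


def bin_fractions(in_ts, out_ts, bin_size_minutes=60):
    """Fraction of each time bin occupied by one stay (illustrative only)."""
    freq = f"{bin_size_minutes}min"
    bin_width = pd.Timedelta(minutes=bin_size_minutes)
    # Every bin from the arrival bin through the departure bin
    bins = pd.date_range(in_ts.floor(freq), out_ts.floor(freq), freq=freq)
    fractions = []
    for bin_start in bins:
        bin_end = bin_start + bin_width
        # Time the stay overlaps this bin, as a share of the bin width
        overlap = min(out_ts, bin_end) - max(in_ts, bin_start)
        fractions.append(overlap / bin_width)
    return pd.Series(fractions, index=bins)


fractions = bin_fractions(
    pd.Timestamp("2024-01-02 06:15"), pd.Timestamp("2024-01-02 09:36")
)
print(fractions)
```

The four values correspond to the 06:00, 07:00, 08:00 and 09:00 bins, and they sum to the patient's length of stay in hours (3 hours 21 minutes).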

During the hillmaking process, the occupancy contributions of each patient are computed by datetime bin and accumulated in NumPy arrays. Eventually these arrays are converted to a pandas `DataFrame` that we refer to as the *bydatetime* table.
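To make the accumulation step concrete, here is a minimal sketch that adds the contributions of two stays into NumPy arrays and then builds a small bydatetime-like `DataFrame`. It assumes a single day of hourly bins; the second stay is hypothetical, and the loop is a simplified stand-in rather than hillmaker's implementation.

```python
import numpy as np
import pandas as pd

bin_width = pd.Timedelta(minutes=60)
start = pd.Timestamp("2024-01-02 00:00")
bins = pd.date_range(start, periods=24, freq="60min")  # one day, for brevity

# One array per measure, one slot per datetime bin
arrivals = np.zeros(len(bins))
departures = np.zeros(len(bins))
occupancy = np.zeros(len(bins))

# (in, out) pairs: the first is the example patient, the second is hypothetical
stays = [
    (pd.Timestamp("2024-01-02 06:15"), pd.Timestamp("2024-01-02 09:36")),
    (pd.Timestamp("2024-01-02 08:30"), pd.Timestamp("2024-01-02 10:10")),
]

for in_ts, out_ts in stays:
    first = (in_ts - start) // bin_width  # index of the arrival bin
    last = (out_ts - start) // bin_width  # index of the departure bin
    arrivals[first] += 1
    departures[last] += 1
    for i in range(first, last + 1):
        # Fraction of bin i during which this patient is present
        overlap = min(out_ts, bins[i] + bin_width) - max(in_ts, bins[i])
        occupancy[i] += overlap / bin_width

bydatetime = pd.DataFrame(
    {
        "datetime": bins,
        "arrivals": arrivals,
        "departures": departures,
        "occupancy": occupancy,
    }
)
print(bydatetime.iloc[6:11])
```

In the 08:00 bin, for instance, the first patient contributes 1.0 and the second contributes 0.5, so the accumulated occupancy is 1.5.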


## What about the boundaries of the analysis date range?

As long as the records are in the stops dataframe, hillmaker will account for patients who arrived before the start date but were discharged after it. In our SSU example, the start date was 2024-01-02 because we wanted to ignore the impact of the January 1 holiday. However, records from 2024-01-01 are in the stops dataframe.

In [3]:
ssu_stopdata = 'https://raw.githubusercontent.com/misken/hillmaker-examples/main/data/ssu_2024.csv'
# ssu_stopdata = './data/ssu_2024.csv'
stops_df = pd.read_csv(ssu_stopdata, parse_dates=['InRoomTS','OutRoomTS'])
stops_df[stops_df['InRoomTS'] < pd.Timestamp('2024-01-02')]

| Unnamed: 0 | PatID | InRoomTS | OutRoomTS | PatType | LOS_hours |
|---|---|---|---|---|---|
| 0 | 1 | 2024-01-01 07:44:00 | 2024-01-01 09:20:00 | IVT | 1.6 |
| 1 | 2 | 2024-01-01 08:28:00 | 2024-01-01 11:13:00 | IVT | 2.75 |
| 2 | 3 | 2024-01-01 11:44:00 | 2024-01-01 12:48:00 | MYE | 1.066667 |
| 3 | 4 | 2024-01-01 11:51:00 | 2024-01-01 21:10:00 | CAT | 9.316667 |
| 4 | 5 | 2024-01-01 12:10:00 | 2024-01-01 12:57:00 | IVT | 0.783333 |
| 5 | 6 | 2024-01-01 14:16:00 | 2024-01-01 17:35:00 | IVT | 3.316667 |
| 6 | 7 | 2024-01-01 14:40:00 | 2024-01-01 17:24:00 | IVT | 2.733333 |
| 7 | 8 | 2024-01-01 17:25:00 | 2024-01-02 01:53:00 | CAT | 8.466667 |


Notice that the last patient who arrived on 2024-01-01 wasn't discharged until 2024-01-02 01:53. If we look at the bydatetime table, we can see the occupancy contributions of this patient.

In [4]:
bydatetime_df.head(3)

| Unnamed: 0 | datetime | arrivals | departures | occupancy | dow_name | bin_of_day_str | day_of_week | bin_of_day | bin_of_week |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2024-01-02 00:00:00 | 0.0 | 0.0 | 1.0 | Tue | 00:00 | 1 | 0 | 24 |
| 1 | 2024-01-02 01:00:00 | 0.0 | 1.0 | 0.883333 | Tue | 01:00 | 1 | 1 | 25 |
| 2 | 2024-01-02 02:00:00 | 0.0 | 0.0 | 0.0 | Tue | 02:00 | 1 | 2 | 26 |


We see that:

- there is one patient in the system for the entire 00:00-01:00 bin. This patient arrived on 2024-01-01 and had not yet been discharged as of midnight on 2024-01-02.
- during the 01:00-02:00 bin (at 01:53), this patient was discharged. The occupancy value of 0.883333 for that bin means that the patient spent approximately $88\%$ ($53/60$ minutes) of the period in the SSU before being discharged.

Similarly, patients who arrive during the analysis date range but are discharged after the end date are included by hillmaker for the time they spend in the system during the analysis date range.
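One way to picture this boundary handling is to clip each stay to the analysis date range before computing bin fractions. The sketch below does that for the patient who arrived at 17:25 on 2024-01-01 and left at 01:53 on 2024-01-02. The clipping approach and the variable names are assumptions made for illustration; hillmaker's internal logic may be organized differently.

```python
import pandas as pd

bin_width = pd.Timedelta(minutes=60)
analysis_start = pd.Timestamp("2024-01-02 00:00")
analysis_end = pd.Timestamp("2024-10-01 00:00")  # exclusive end of 2024-09-30

in_ts = pd.Timestamp("2024-01-01 17:25")
out_ts = pd.Timestamp("2024-01-02 01:53")

# Keep only the part of the stay that falls inside the analysis range
clipped_in = max(in_ts, analysis_start)
clipped_out = min(out_ts, analysis_end)

contributions = {}
bin_start = clipped_in.floor("60min")
while bin_start < clipped_out:
    bin_end = bin_start + bin_width
    overlap = min(clipped_out, bin_end) - max(clipped_in, bin_start)
    contributions[bin_start] = overlap / bin_width
    bin_start = bin_end

for ts, frac in contributions.items():
    print(ts, f"{frac:.6f}")
```

The two resulting values, 1.0 for the 00:00 bin and about 0.883333 for the 01:00 bin, match the occupancy column in the first two rows of the bydatetime table.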

## Using different bin sizes for bydatetime and summary dataframes

By default, whatever you specify for the `bin_size_minutes` parameter (default is 60 minutes) is the resolution at which the `bydatetime` table is created. However, if for some reason you want to create (and save) a version of the `bydatetime` table with smaller time bin sizes, you can do it. 

There is a `highres_bin_size_minutes` parameter that you can set to a smaller value than `bin_size_minutes` if you would like to compute occupancy in the bydatetime table at a finer resolution but still want to report aggregate statistics using `bin_size_minutes`. For example, you could set `highres_bin_size_minutes=10` but keep `bin_size_minutes=60`. Using the default settings in hillmaker (see next section), this will **NOT** affect the aggregate statistics. However, it allows you to create a separate version of the bydatetime table at this higher resolution for further analysis. In order to save the high resolution version, set `keep_highres_bydatetime=True`.
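A small pandas experiment suggests why a finer resolution need not change the hourly numbers when fractional edge bins are used: averaging the 10-minute occupancy values within each hour reproduces the hourly fractional occupancy. The helper `fractional_occupancy` is hypothetical, and rolling up by averaging is an assumption of this sketch, not a description of hillmaker's internals.

```python
import numpy as np
import pandas as pd


def fractional_occupancy(in_ts, out_ts, start, periods, bin_minutes):
    """Fractional occupancy of one stay over a fixed grid of bins (illustrative)."""
    width = pd.Timedelta(minutes=bin_minutes)
    bins = pd.date_range(start, periods=periods, freq=f"{bin_minutes}min")
    occ = np.zeros(periods)
    for i, bin_start in enumerate(bins):
        overlap = min(out_ts, bin_start + width) - max(in_ts, bin_start)
        occ[i] = max(overlap / width, 0.0)  # no overlap means no contribution
    return pd.Series(occ, index=bins)


start = pd.Timestamp("2024-01-02 00:00")
in_ts = pd.Timestamp("2024-01-02 06:15")
out_ts = pd.Timestamp("2024-01-02 09:36")

hourly = fractional_occupancy(in_ts, out_ts, start, periods=24, bin_minutes=60)
highres = fractional_occupancy(in_ts, out_ts, start, periods=144, bin_minutes=10)

# Average the six 10-minute bins in each hour back up to hourly resolution
rolled_up = highres.resample("60min").mean()

print(np.allclose(hourly.to_numpy(), rolled_up.to_numpy()))
```

The high resolution series still carries extra detail, such as the patient being absent for the first 10 minutes of the 06:00 hour, which is the kind of information you might want to keep for further analysis.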



## The `edge_bins` parameter and its impact on occupancy calculations

Since hillmaker's earliest days, it has been possible to treat the arrival and departure bins in two different ways. The default behavior, `edge_bins=1`, uses the method described in this notebook, in which a fractional occupancy contribution is computed based on the fraction of time the entity was in the system during the arrival and departure bins. However, if you really want to give "full credit" for occupancy during the arrival and departure bins (i.e. use a value of $1.0$ instead of the fraction of the bin occupied), you can set `edge_bins=2`.



```{warning}
Using `edge_bins=2` with coarse time bins and short lengths of stay can lead to dramatic overestimates of occupancy. 
```

If you do use `edge_bins=2` for some reason, you should consider setting the `highres_bin_size_minutes` to a small value to mitigate overestimating occupancy. Quite honestly, we've really only kept this option around for research purposes.
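The size of the overestimate, and the effect of smaller bins, can be illustrated with a hypothetical 15-minute stay from 06:52 to 07:07. The function `occupied_hours` below is a stand-in that mimics the two edge-bin treatments as described in this section; it is not hillmaker code.

```python
import pandas as pd


def occupied_hours(in_ts, out_ts, bin_minutes, edge_bins):
    """Total occupancy credited to one stay, in hours (illustrative only)."""
    freq = f"{bin_minutes}min"
    width = pd.Timedelta(minutes=bin_minutes)
    bins = pd.date_range(in_ts.floor(freq), out_ts.floor(freq), freq=freq)
    total = 0.0
    for k, bin_start in enumerate(bins):
        is_edge = k == 0 or k == len(bins) - 1
        if edge_bins == 2 and is_edge:
            frac = 1.0  # full credit for the arrival and departure bins
        else:
            overlap = min(out_ts, bin_start + width) - max(in_ts, bin_start)
            frac = overlap / width  # fractional credit
        total += frac * bin_minutes / 60
    return total


in_ts = pd.Timestamp("2024-01-02 06:52")
out_ts = pd.Timestamp("2024-01-02 07:07")

fractional_60 = occupied_hours(in_ts, out_ts, bin_minutes=60, edge_bins=1)
full_credit_60 = occupied_hours(in_ts, out_ts, bin_minutes=60, edge_bins=2)
full_credit_10 = occupied_hours(in_ts, out_ts, bin_minutes=10, edge_bins=2)

print(fractional_60, full_credit_60, full_credit_10)
```

With hourly bins, the fractional treatment credits the true 0.25 hours, while the full-credit treatment credits 2 hours because the stay touches two bins. With 10-minute bins, the full-credit treatment credits about 0.33 hours, much closer to the actual time in the unit.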