# Choosing the date range for analysis

### **WARNING** - Horizon and warmup effects

Be careful when specifying the start and end dates for your analysis (we call this the *analysis range*); in particular, you need to consider *horizon and warmup effects*. In the SSU example, each stop is just a few hours and there aren't any patients who arrived before 2024-01-01 and were still in the SSU on 2024-01-01. However, if we were working with data in which the stops are a few days in length (such as on an inpatient nursing unit), we would need to think about what start date to use and exactly how the original dataset was extracted. hillmaker is fully capable of accounting for patients who arrive before the specified analysis start date as well as those who are discharged after the end date. However, it can only work with the stop data provided.

## Example 1 - short length of stay and horizon effects

If you have relatively short lengths of stay (up to several hours), such as in the SSU example, you just need to make sure that you specify an analysis range that is fully contained within the date range of the stop data records you are using. If our data contains all patients **discharged** between 2024-01-01 and 2024-09-30, we can safely use an analysis date range of `start_analysis_dt = 2024-01-01` and `end_analysis_dt = 2024-09-29`. We might even be able to use an end date of 2024-09-30 if we are not concerned about patients who arrived on or before 2024-09-30 but were discharged after this date. Even better, if when pulling the original data we made sure to grab all records for patients who were discharged on or after 2024-01-01 and who arrived on or before 2024-09-30, then we could use `start_analysis_dt = 2024-01-01` and `end_analysis_dt = 2024-09-30`.

With a large number of records and a multi-month timeframe, these types of horizon effects are negligible when the length of stay is relatively short.
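
The end-date reasoning above can be sketched as a small helper. This is only an illustration and not part of hillmaker: the function name `latest_complete_day` is hypothetical, and it assumes the data was extracted by discharge date and that you can put an upper bound on the length of stay.

```python
import math
from datetime import date, timedelta


def latest_complete_day(extract_end: date, max_los: timedelta) -> date:
    """Last calendar day with no missing patients (hypothetical helper).

    Assumes the extract holds every stop discharged on or before
    extract_end. Any missing patient was discharged after extract_end,
    so they arrived no earlier than (end of extract_end) - max_los.
    """
    # Round the longest stay up to whole days and step back that far.
    days_back = math.ceil(max_los / timedelta(days=1))
    return extract_end - timedelta(days=days_back)


# Discharges through 2024-09-30, stays of at most 8 hours (assumed bound)
end_analysis_dt = latest_complete_day(date(2024, 9, 30), timedelta(hours=8))
print(end_analysis_dt)  # 2024-09-29
```

With stays bounded by a few hours this reproduces the `end_analysis_dt = 2024-09-29` choice; with multi-day stays the same helper steps back further.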

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import pandas as pd
import hillmaker as hm

In [4]:
!hillmaker --scenario example1 --data ./data/ssu_2024.csv \
--in_field InRoomTS --out_field OutRoomTS --cat_field PatType --bin_size_minutes 60 \
--start_analysis_dt 2023-01-02 --end_analysis_dt 2024-09-30 --csv_export_path output --plot_export_path output --ylabel Patients 

Traceback (most recent call last):
  File "/home/mark/anaconda3/envs/hm_oo/bin/hillmaker", line 33, in <module>
    sys.exit(load_entry_point('hillmaker', 'console_scripts', 'hillmaker')())
  File "/home/mark/Documents/projects/hillmaker/src/hillmaker/console.py", line 347, in main
    scenario = create_scenario(params_dict=args_dict)
  File "/home/mark/Documents/projects/hillmaker/src/hillmaker/scenario.py", line 706, in create_scenario
    scenario = Scenario(**params)
  File "/home/mark/anaconda3/envs/hm_oo/lib/python3.10/site-packages/pydantic/main.py", line 159, in __init__
    __pydantic_self__.__pydantic_validator__.validate_python(data, self_instance=__pydantic_self__)
pydantic_core._pydantic_core.ValidationError: 1 validation error for Scenario
  Value error, start analysis date of 2024-09-30T23:59:59.000000000 is > 48 hours before earliest arrival of 2024-01-01 07:44:00 [type=value_error, input_value={'scenario_name': 'exampl...rt_summaries_csv': True}, input_type=dict]
    F
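
The validation error above appears to be triggered because the requested start date (2023-01-02) is far more than 48 hours before the earliest arrival in the data (2024-01-01 07:44:00). A similar sanity check can be run before calling hillmaker. The sketch below is not hillmaker's implementation: the function name `start_date_is_plausible` is hypothetical, and the 48 hour threshold is simply taken from the error message.

```python
from datetime import datetime, timedelta


def start_date_is_plausible(start_analysis_dt, earliest_arrival,
                            max_gap=timedelta(hours=48)):
    """Return False if the start date precedes the earliest arrival
    by more than max_gap (hypothetical pre-flight check)."""
    return earliest_arrival - start_analysis_dt <= max_gap


# Earliest arrival as reported in the error message above
earliest_arrival = datetime(2024, 1, 1, 7, 44)

print(start_date_is_plausible(datetime(2023, 1, 2), earliest_arrival))  # False
print(start_date_is_plausible(datetime(2024, 1, 1), earliest_arrival))  # True
```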

With longer lengths of stay, you also need to consider warmup effects: a transient phase as occupancy builds to some stochastic steady state. Assume you know that your stop data was extracted to include, say, all patients discharged between 1/1/2021 and 12/30/2021 and that each stop might last for several days. You wouldn't want to set your hillmaker start date to 1/1/2021, as the system will appear to start out empty and occupancy will go through a transient phase until the system fills to some sort of steady state. The longer the length of stay, the longer this warmup phase will take. You might want to experiment with start dates ranging from a few weeks to a few months **after** the earliest arrival time in your hillmaker stop data to see how long the system takes to reach a steady state.

Similarly, if your criterion for selecting the stop data was discharges in 1/1/2021-12/30/2021, your data will **not** contain records for patients admitted before 12/30/2021 but discharged after 12/30/2021. So, you might want to set your hillmaker end date to be a few weeks before 12/30/2021.

For our SSU data, we don't need to worry about this as patients only stay a few hours and the SSU typically only houses patients between ~6am-10pm.
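
For units with multi-day stays, a quick simulation can give a feel for how long a warmup phase lasts. The following is a minimal sketch using only the standard library, not hillmaker itself. It assumes a made-up unit with 10 arrivals per day and exponentially distributed stays averaging 5 days, and a dataset that contains no patients who arrived before day 0, so the system appears to start out empty. All names and parameter values are illustrative.

```python
import random


def daily_census(stops, n_days):
    """Count patients present at the start of each day (midnight snapshot)."""
    return [sum(1 for arr, dep in stops if arr <= day < dep)
            for day in range(n_days)]


random.seed(42)
arrivals_per_day = 10   # assumed arrival rate
mean_los_days = 5.0     # assumed mean length of stay
n_days = 120

# Generate (arrival, departure) times in days; nobody arrives before day 0
stops = []
for day in range(n_days):
    for _ in range(arrivals_per_day):
        arr = day + random.random()
        dep = arr + random.expovariate(1 / mean_los_days)
        stops.append((arr, dep))

census = daily_census(stops, n_days)

# Little's law gives the long-run average census: rate * mean stay
steady_state = arrivals_per_day * mean_los_days
early_mean = sum(census[:5]) / 5
late_mean = sum(census[60:]) / len(census[60:])

print(f"steady state (Little's law): {steady_state:.1f}")
print(f"mean census, days 0-4:   {early_mean:.1f}")
print(f"mean census, days 60+:   {late_mean:.1f}")
```

The census on day 0 is zero by construction, and the first few days sit well below the long-run level. Starting the analysis range after this ramp avoids biasing the occupancy statistics downward.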

In [None]:
import numpy as np
from matplotlib import pyplot as plt

In [None]:
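alpha = 0.001  # tolerance: remaining fraction of the transient
b = np.linspace(start=0.1, stop=10.0)  # time constant (presumably mean length of stay)

# t_star solves exp(-t_star / b) = alpha
t_star = -np.log(alpha) * b
plt.plot(b, t_star)
plt.xlabel('b')
plt.ylabel('t_star')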
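
One way to read the cell above, under assumptions that are not stated in the notebook: suppose `b` is the mean length of stay, arrivals are Poisson with rate $\lambda$, lengths of stay are exponentially distributed, capacity is effectively unlimited, and the system starts empty. Then the expected occupancy at time $t$ and the resulting warmup time $t^*$ (the time at which occupancy is within a fraction $\alpha$ of its steady state value $\lambda b$) are:

$$
L(t) = \lambda b \left(1 - e^{-t/b}\right), \qquad
\frac{\lambda b - L(t^*)}{\lambda b} = e^{-t^*/b} = \alpha
\;\;\Longrightarrow\;\;
t^* = -b \ln \alpha
$$

With $\alpha = 0.001$, $-\ln \alpha \approx 6.9$, so under these assumptions the warmup phase lasts roughly seven mean lengths of stay, which is the straight line the plot shows. Real length of stay distributions are rarely exponential, so treat this as a rough guide rather than a rule.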
