# Generate new SSU dataset

My existing short stay unit (SSU) dataset has realistic arrival times but rather unrealistic lengths of stay (LOS). More realistic values would be:

- Arteriogram - 1-2 hours for the procedure, 4-6 hours for recovery
- Cardiac cath - 2-6 hours for recovery
- IV Therapy - a few hours
- Myelogram - a few hours for recovery
- Other - a few hours up to 24 hours

Let's generate an updated version of the dataset whose LOS values are on the order of a few hours, in line with the ranges above.

In [1]:
import pandas as pd
from numpy.random import default_rng

In [2]:
ssu_stopdata = './data/ShortStay.csv'
ssu_stops_df = pd.read_csv(ssu_stopdata, parse_dates=['InRoomTS','OutRoomTS'])

In [4]:
ssu_stops_df.head()

| | PatID | InRoomTS | OutRoomTS | PatType |
|---|---|---|---|---|
| 0 | 1 | 1996-01-01 07:44:00 | 1996-01-01 08:50:00 | IVT |
| 1 | 2 | 1996-01-01 08:28:00 | 1996-01-01 09:20:00 | IVT |
| 2 | 3 | 1996-01-01 11:44:00 | 1996-01-01 13:30:00 | MYE |
| 3 | 4 | 1996-01-01 11:51:00 | 1996-01-01 12:55:00 | CAT |
| 4 | 5 | 1996-01-01 12:10:00 | 1996-01-01 13:00:00 | IVT |


Now we'll generate a new LOS, in hours, for each patient type.

In [10]:
def new_los(rg, pat_type):
    """Return a random length of stay (hours) for one patient of type pat_type.

    Each LOS is drawn from a gamma distribution with integer shape k and
    scale = mean / k (an Erlang-k distribution), so its expected value is
    `mean`, and a larger k gives less variability around that mean.
    """
    # (mean LOS in hours, number of stages k) for each patient type
    los_params = {
        'ART': (4, 4),  # arteriogram
        'CAT': (5, 2),  # cardiac cath
        'IVT': (2, 4),  # IV therapy
        'MYE': (2, 3),  # myelogram
    }
    other_params = (8, 6)  # any other patient type

    mean, k = los_params.get(pat_type, other_params)
    # gamma(shape=k, scale=mean/k) has expected value k * (mean / k) = mean
    return rg.gamma(k, mean / k)
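Drawing from a gamma distribution with integer shape k and scale mean/k is the same as summing k exponential stages, each with mean mean/k. The resulting mean is the target mean and the variance is mean²/k, so the coefficient of variation is 1/sqrt(k). The sketch below checks this by simulation for the ART parameters (mean 4 h, k = 4); the fixed seed and sample size are arbitrary choices for the check, not part of the notebook.

```python
import numpy as np
from numpy.random import default_rng

# Check values mirroring the ART parameters: target mean 4 h, k = 4 stages
target_mean = 4.0
k = 4

rg_check = default_rng(seed=12345)  # fixed seed so the check is repeatable
# shape = k, scale = target_mean / k, so the expected value is target_mean
samples = rg_check.gamma(k, target_mean / k, size=200_000)

sample_mean = samples.mean()
sample_cv = samples.std() / sample_mean  # theoretically 1 / sqrt(k) = 0.5 here

print(f"mean ≈ {sample_mean:.3f} h (target {target_mean})")
print(f"CV   ≈ {sample_cv:.3f} (theory {1 / np.sqrt(k):.3f})")
```

With k = 2 (CAT) the stays are much more spread out than with k = 6 (other), even though both are centered on their means.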


Initialize a random number generator from a fresh 128-bit seed.

In [8]:
import secrets

seed = secrets.randbits(128)
rg = default_rng(seed=seed)
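Because the seed comes from `secrets.randbits` and is never printed or saved, this run of the dataset cannot be regenerated later. A minimal sketch of keeping it, assuming you are happy to store the seed next to the output CSV (`saved_seed` is a name chosen here for illustration):

```python
import secrets
from numpy.random import default_rng

# Draw a fresh 128-bit seed, as in the cell above, and keep it
saved_seed = secrets.randbits(128)
print(f"seed = {saved_seed}")  # record this value alongside the generated CSV

# Two generators built from the same seed produce identical streams
rg_a = default_rng(seed=saved_seed)
rg_b = default_rng(seed=saved_seed)
draws_a = rg_a.gamma(4, 0.5, size=5)
draws_b = rg_b.gamma(4, 0.5, size=5)
print((draws_a == draws_b).all())  # True
```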

In [14]:
new_los(rg, 'IVT')

5.117157791063751

In [40]:
ssu_stops_df['new_los_minutes'] = ssu_stops_df.apply(lambda x: int(new_los(rg, x.PatType) * 60), axis=1)
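`DataFrame.apply` with `axis=1` calls `new_los` once per row. A vectorized alternative is sketched below: it looks up (mean, k) for every row and makes a single call to `Generator.gamma`, which broadcasts array-valued shape and scale. The parameter table is copied from `new_los`; `new_los_minutes` (the function) and the `'OTH'` patient type in the demo are hypothetical names used only for illustration.

```python
import numpy as np
import pandas as pd
from numpy.random import default_rng

# (mean hours, k stages) per patient type, copied from new_los()
LOS_PARAMS = {
    'ART': (4, 4),
    'CAT': (5, 2),
    'IVT': (2, 4),
    'MYE': (2, 3),
}
OTHER_PARAMS = (8, 6)  # fallback for any other patient type

def new_los_minutes(rg, pat_types):
    """Draw one Erlang-distributed LOS, in whole minutes, per entry of pat_types."""
    params = [LOS_PARAMS.get(t, OTHER_PARAMS) for t in pat_types]
    means = np.array([m for m, _ in params], dtype=float)
    ks = np.array([k for _, k in params], dtype=float)
    # Generator.gamma broadcasts array-valued shape and scale: one draw per row
    los_hours = rg.gamma(ks, means / ks)
    # Truncate to whole minutes, matching int(... * 60) in the apply() version
    return (los_hours * 60).astype(int)

# Usage on a small hypothetical frame ('OTH' falls through to OTHER_PARAMS)
demo = pd.DataFrame({'PatType': ['IVT', 'CAT', 'MYE', 'ART', 'OTH']})
demo['new_los_minutes'] = new_los_minutes(default_rng(42), demo['PatType'])
print(demo)
```

On the full dataset this would be `ssu_stops_df['new_los_minutes'] = new_los_minutes(rg, ssu_stops_df['PatType'])`.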

In [41]:
ssu_stops_df.head()

| | PatID | InRoomTS | OutRoomTS | PatType | new_los_hours | new_InRoomTS | new_OutRoomTS | new_los_minutes |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1996-01-01 07:44:00 | 1996-01-01 08:50:00 | IVT | 1.481479 | 2024-01-01 07:44:00 | 2024-01-01 09:12:53.323233600 | 96 |
| 1 | 2 | 1996-01-01 08:28:00 | 1996-01-01 09:20:00 | IVT | 1.311995 | 2024-01-01 08:28:00 | 2024-01-01 09:46:43.183224000 | 165 |
| 2 | 3 | 1996-01-01 11:44:00 | 1996-01-01 13:30:00 | MYE | 2.162773 | 2024-01-01 11:44:00 | 2024-01-01 13:53:45.983700000 | 64 |
| 3 | 4 | 1996-01-01 11:51:00 | 1996-01-01 12:55:00 | CAT | 9.93287 | 2024-01-01 11:51:00 | 2024-01-01 21:46:58.332288000 | 559 |
| 4 | 5 | 1996-01-01 12:10:00 | 1996-01-01 13:00:00 | IVT | 1.742876 | 2024-01-01 12:10:00 | 2024-01-01 13:54:34.353366000 | 47 |


In [21]:
day1 = pd.Timestamp('1996-01-01')
day1.weekday()

0

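We need a target year that, like 1996, is a leap year whose January 1 falls on a Monday. Shifting every timestamp by a whole number of weeks into such a year keeps each visit on the same weekday and the same calendar date, including February 29.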
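Rather than checking candidate years by hand, a short search over the standard library's `calendar.isleap` and `datetime.date.weekday` (where Monday is 0) lists every suitable year; the upper bound of 2100 is an arbitrary choice for this sketch.

```python
import calendar
import datetime

# Years after 1996 that are leap years beginning on a Monday (weekday() == 0)
candidates = [
    year
    for year in range(1997, 2100)
    if calendar.isleap(year) and datetime.date(year, 1, 1).weekday() == 0
]
print(candidates)  # [2024, 2052, 2080]
```

Within this range the Gregorian calendar repeats every 28 years, which is why the matches are 28 years apart.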

In [29]:
new_day1 = pd.Timestamp('2024-01-01')
new_day1.weekday()

0

In [33]:
new_day1 - day1

Timedelta('10227 days 00:00:00')

In [34]:
ssu_stops_df['new_InRoomTS'] = ssu_stops_df['InRoomTS'] + (new_day1 - day1)  # shift by 10227 days
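The offset of 10227 days equals 7 × 1461, a whole number of weeks, so every shifted timestamp keeps its weekday; because both years are leap years starting on Monday, the month, day, and time of day are preserved too. The check below uses a few hypothetical timestamps, including the leap day, rather than the real dataset.

```python
import pandas as pd

day1 = pd.Timestamp('1996-01-01')
new_day1 = pd.Timestamp('2024-01-01')
offset = new_day1 - day1  # Timedelta('10227 days 00:00:00')

# A whole number of weeks, so weekdays are unchanged
print(offset.days % 7)  # 0

# Hypothetical timestamps, including the leap day and the last minute of the year
sample_ts = pd.Series(pd.to_datetime(['1996-01-01 07:44', '1996-02-29 12:00', '1996-12-31 23:59']))
shifted = sample_ts + offset
print(shifted.tolist())
print((sample_ts.dt.weekday == shifted.dt.weekday).all())  # True
print((sample_ts.dt.strftime('%m-%d %H:%M') == shifted.dt.strftime('%m-%d %H:%M')).all())  # True
```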

In [35]:
ssu_stops_df.head()

| | PatID | InRoomTS | OutRoomTS | PatType | new_los_hours | new_InRoomTS |
|---|---|---|---|---|---|---|
| 0 | 1 | 1996-01-01 07:44:00 | 1996-01-01 08:50:00 | IVT | 1.481479 | 2024-01-01 07:44:00 |
| 1 | 2 | 1996-01-01 08:28:00 | 1996-01-01 09:20:00 | IVT | 1.311995 | 2024-01-01 08:28:00 |
| 2 | 3 | 1996-01-01 11:44:00 | 1996-01-01 13:30:00 | MYE | 2.162773 | 2024-01-01 11:44:00 |
| 3 | 4 | 1996-01-01 11:51:00 | 1996-01-01 12:55:00 | CAT | 9.93287 | 2024-01-01 11:51:00 |
| 4 | 5 | 1996-01-01 12:10:00 | 1996-01-01 13:00:00 | IVT | 1.742876 | 2024-01-01 12:10:00 |


In [42]:
# Vectorized: add each LOS (whole minutes) to the shifted in-room time
ssu_stops_df['new_OutRoomTS'] = ssu_stops_df['new_InRoomTS'] + pd.to_timedelta(ssu_stops_df['new_los_minutes'], unit='min')

In [43]:
ssu_stops_df.head()

| | PatID | InRoomTS | OutRoomTS | PatType | new_los_hours | new_InRoomTS | new_OutRoomTS | new_los_minutes |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1996-01-01 07:44:00 | 1996-01-01 08:50:00 | IVT | 1.481479 | 2024-01-01 07:44:00 | 2024-01-01 09:20:00 | 96 |
| 1 | 2 | 1996-01-01 08:28:00 | 1996-01-01 09:20:00 | IVT | 1.311995 | 2024-01-01 08:28:00 | 2024-01-01 11:13:00 | 165 |
| 2 | 3 | 1996-01-01 11:44:00 | 1996-01-01 13:30:00 | MYE | 2.162773 | 2024-01-01 11:44:00 | 2024-01-01 12:48:00 | 64 |
| 3 | 4 | 1996-01-01 11:51:00 | 1996-01-01 12:55:00 | CAT | 9.93287 | 2024-01-01 11:51:00 | 2024-01-01 21:10:00 | 559 |
| 4 | 5 | 1996-01-01 12:10:00 | 1996-01-01 13:00:00 | IVT | 1.742876 | 2024-01-01 12:10:00 | 2024-01-01 12:57:00 | 47 |


In [44]:
new_ssu_stops_df = ssu_stops_df[['PatID', 'new_InRoomTS', 'new_OutRoomTS', 'PatType']]

In [45]:
new_ssu_stops_df.head()

| | PatID | new_InRoomTS | new_OutRoomTS | PatType |
|---|---|---|---|---|
| 0 | 1 | 2024-01-01 07:44:00 | 2024-01-01 09:20:00 | IVT |
| 1 | 2 | 2024-01-01 08:28:00 | 2024-01-01 11:13:00 | IVT |
| 2 | 3 | 2024-01-01 11:44:00 | 2024-01-01 12:48:00 | MYE |
| 3 | 4 | 2024-01-01 11:51:00 | 2024-01-01 21:10:00 | CAT |
| 4 | 5 | 2024-01-01 12:10:00 | 2024-01-01 12:57:00 | IVT |


In [50]:
new_col_names = {'new_InRoomTS': 'InRoomTS', 'new_OutRoomTS': 'OutRoomTS'}
new_ssu_stops_df = new_ssu_stops_df.rename(columns=new_col_names)
# LOS in hours: divide each Timedelta by a one-hour Timedelta
new_ssu_stops_df['LOS_hours'] = (new_ssu_stops_df['OutRoomTS'] - new_ssu_stops_df['InRoomTS']) / pd.Timedelta(1, 'h')

In [51]:
new_ssu_stops_df.head()

| | PatID | InRoomTS | OutRoomTS | PatType | LOS_hours |
|---|---|---|---|---|---|
| 0 | 1 | 2024-01-01 07:44:00 | 2024-01-01 09:20:00 | IVT | 1.6 |
| 1 | 2 | 2024-01-01 08:28:00 | 2024-01-01 11:13:00 | IVT | 2.75 |
| 2 | 3 | 2024-01-01 11:44:00 | 2024-01-01 12:48:00 | MYE | 1.066667 |
| 3 | 4 | 2024-01-01 11:51:00 | 2024-01-01 21:10:00 | CAT | 9.316667 |
| 4 | 5 | 2024-01-01 12:10:00 | 2024-01-01 12:57:00 | IVT | 0.783333 |
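Before saving, it can help to compare the simulated LOS per patient type against the target means in `new_los`. The helper `summarize_los` below is a hypothetical sketch of such a check; the demo uses only the five rows shown above, so on the real data you would pass `new_ssu_stops_df` instead.

```python
import pandas as pd

# Target mean LOS in hours per patient type, taken from new_los()
TARGET_MEAN_HOURS = {'ART': 4, 'CAT': 5, 'IVT': 2, 'MYE': 2}

def summarize_los(df):
    """Compare simulated LOS_hours per PatType with the target means."""
    summary = df.groupby('PatType')['LOS_hours'].agg(['count', 'mean', 'min', 'max'])
    # Types not in the dict use the 'other' branch of new_los() (8 h)
    summary['target_mean'] = [TARGET_MEAN_HOURS.get(t, 8) for t in summary.index]
    return summary

# Usage with the five rows displayed above
head_rows = pd.DataFrame({
    'PatType': ['IVT', 'IVT', 'MYE', 'CAT', 'IVT'],
    'LOS_hours': [1.6, 2.75, 1.066667, 9.316667, 0.783333],
})
print(summarize_los(head_rows))
```

Because each LOS is truncated to whole minutes, the simulated means should sit roughly half a minute below the targets on a large dataset.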


In [52]:
new_ssu_stops_df.to_csv('ssu_2024.csv', index=False)
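A quick round trip confirms that the saved file can be read back the same way `ShortStay.csv` was read at the top of the notebook, with `InRoomTS` and `OutRoomTS` parsed as datetimes. This sketch uses a hypothetical two-row frame with the same columns and writes to a temporary directory so it does not overwrite the real output.

```python
import os
import tempfile

import pandas as pd

# Hypothetical two-row frame with the same columns as new_ssu_stops_df
df = pd.DataFrame({
    'PatID': [1, 2],
    'InRoomTS': pd.to_datetime(['2024-01-01 07:44', '2024-01-01 08:28']),
    'OutRoomTS': pd.to_datetime(['2024-01-01 09:20', '2024-01-01 11:13']),
    'PatType': ['IVT', 'IVT'],
})
df['LOS_hours'] = (df['OutRoomTS'] - df['InRoomTS']) / pd.Timedelta(1, 'h')

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'ssu_2024.csv')
    df.to_csv(path, index=False)
    # Read back the same way the original ShortStay.csv was read
    back = pd.read_csv(path, parse_dates=['InRoomTS', 'OutRoomTS'])

print(back.dtypes)
print(back)
```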