In [23]:
import pandas as pd
from datetime import datetime, date
from functions import convertTime, timeDifference

In [24]:
df = pd.read_csv('(1)nonDuplicateData.csv') # Resets the dataframe | FOR TESTING

In [25]:
print(df.to_string()) # Displays dataframe | FOR TESTING

    INPATIENT_DATA_ID_x DEPARTMENT_ID      DEPARTMENT_NAME                 VISIT_TYPE  APPT_LENGTH APPT_STATUS_NAME CONTACT_DATE            APPT_DTTM         CHECKIN_DTTM        CHECKOUT_DTTM APPT_MADE_DATE APPT_CANC_DATE CHAIR_START INFUSION_START INFUSION_END  CHAIR_OUT                                                ORDER_DESCRIPTION               ORDER_STATUS                      CHAIR Unnamed: 19
0               APTT001       DEPT001  INFUSION Department            INFUSION 30 MIN           30          Arrived     10/31/21  2021-10-31  02:00PM  2021-10-31  01:28PM                  NaN        10/3/21            NaN     1:40 PM            NaN          NaN        NaN                                                              NaN  *Unspecified Order Status                    Chair 1         NaN
1               APTT002       DEPT001  INFUSION Department              INFUSION 2 HR          150          Arrived     10/31/21  2021-10-31  12:00PM  2021-10-31  12:21PM                  NaN 

The next cell cleans up the data. First it drops every row that has fewer than two non-missing values among CHECKIN_DTTM, CHAIR_START, and INFUSION_START (thresh=2), so a row that is missing only one of the three is kept. This also cleans out appointments that were cancelled or scheduled but skipped. Next, it applies the convertTime function to these three columns, converting the times to datetime objects so they are easier to add and subtract later.

In [26]:
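# Keep rows with at least two of the three timestamps present
df = df.dropna(subset=['CHECKIN_DTTM', 'CHAIR_START', 'INFUSION_START'], thresh=2)
# Work on an independent copy so the column assignments below do not warn
df = df.copy()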
df['CHECKIN_DTTM'] = df['CHECKIN_DTTM'].apply(convertTime)
df['CHAIR_START'] = df['CHAIR_START'].apply(convertTime)
df['INFUSION_START'] = df['INFUSION_START'].apply(convertTime)
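
The convertTime helper is imported from the functions module, which is not shown in this notebook. As a rough idea of what such a helper could look like, here is a minimal sketch. The name convert_time_sketch and the two format strings are assumptions based only on the sample values printed above ('2021-10-31  01:28PM' and '1:40 PM'); the real helper may behave differently.

```python
from datetime import datetime

# Hypothetical stand-in for functions.convertTime (the real helper is not shown).
# The formats are guessed from the sample rows printed earlier.
_ASSUMED_FORMATS = ("%Y-%m-%d %I:%M%p", "%I:%M %p")


def convert_time_sketch(value):
    """Parse a timestamp string into a datetime; return None if missing or unparseable."""
    if not isinstance(value, str):
        return None  # missing cells arrive from pandas as float NaN
    text = " ".join(value.split())  # collapse repeated spaces between date and time
    for fmt in _ASSUMED_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # try the next assumed format
    return None
```

Note that a time-only value such as '1:40 PM' parses with the default date 1900-01-01, so any later subtraction has to compare clock times rather than full datetimes, or attach the appointment date first.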

The next cell calculates the two wait times. WAIT_ONE is the time between CHECKIN_DTTM and CHAIR_START, the first period the patient spends waiting.  
WAIT_TWO is the time between CHAIR_START and INFUSION_START, the second period the patient spends waiting.

In [27]:
df['WAIT_ONE'] = df.apply(lambda row: timeDifference(row['CHECKIN_DTTM'], row['CHAIR_START']), axis=1)
df['WAIT_TWO'] = df.apply(lambda row: timeDifference(row['CHAIR_START'], row['INFUSION_START']), axis=1)
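
timeDifference also comes from the functions module and is not shown here. The sketch below is one possible version, not the actual helper. It assumes the result is in minutes, that both events happen on the same calendar day (so only the clock time is compared), and that a missing input should give NaN. The name time_difference_sketch is hypothetical.

```python
import math
from datetime import datetime


def _is_missing(value):
    # None, float NaN and pandas NaT all count as missing (NaN and NaT are != themselves)
    return value is None or value != value


def time_difference_sketch(start, end):
    """Minutes from start to end by clock time; NaN if either value is missing."""
    if _is_missing(start) or _is_missing(end):
        return math.nan
    start_minutes = start.hour * 60 + start.minute
    end_minutes = end.hour * 60 + end.minute
    return end_minutes - start_minutes


# First sample row: check-in at 01:28PM, chair start at 1:40 PM
wait_one = time_difference_sketch(
    datetime(2021, 10, 31, 13, 28), datetime(1900, 1, 1, 13, 40)
)
print(wait_one)
```

Under these assumptions a chair start recorded before check-in gives a negative number, which is the kind of row the later filtering step is meant to catch.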

The next cell filters the wait time columns. A negative wait time means the row has invalid data, so I kept only the rows whose wait times are non-negative or missing.  
I also added a TOTAL_WAIT column that sums the two wait times; it is missing whenever either wait time is missing.

In [28]:
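# Keep rows whose wait is non-negative, or missing because a timestamp was absent
df = df[(df['WAIT_ONE'] >= 0) | (df['WAIT_ONE'].isna())]
df = df[(df['WAIT_TWO'] >= 0) | (df['WAIT_TWO'].isna())].copy()
# NaN in either wait propagates to TOTAL_WAIT
df['TOTAL_WAIT'] = df['WAIT_ONE'] + df['WAIT_TWO']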
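
To see what this filter does, here is a small example on made-up numbers (the toy DataFrame is illustrative only and is not taken from the real data). It applies the same two conditions as one combined mask, which is equivalent to filtering twice in a row.

```python
import numpy as np
import pandas as pd

# Made-up wait times in minutes, for illustration only
toy = pd.DataFrame({
    "WAIT_ONE": [12.0, -5.0, np.nan, 20.0],
    "WAIT_TWO": [30.0, 10.0, 15.0, -1.0],
})

# Keep a row only if each wait is non-negative or missing
keep = ((toy["WAIT_ONE"] >= 0) | toy["WAIT_ONE"].isna()) & (
    (toy["WAIT_TWO"] >= 0) | toy["WAIT_TWO"].isna()
)
toy = toy[keep].copy()

# Plain addition propagates NaN, so a missing wait gives a missing total
toy["TOTAL_WAIT"] = toy["WAIT_ONE"] + toy["WAIT_TWO"]
print(toy)
```

Rows 1 and 3 are dropped because one of their waits is negative. Row 2 survives with a missing WAIT_ONE, and its TOTAL_WAIT is NaN rather than 15.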

In [29]:
df.to_csv('(3)finalData.csv', index=False)