# Case Study - Visualize Patient Arrivals in Singapore’s Public Hospitals

## Learning Objectives:
1. Manipulate data through filtering
2. Draw basic visualisations to describe the findings

<i><b>Background</b></i>: Understanding demand is always a key issue in business operations. In healthcare management, patient arrivals are a key driver of the efficiency of hospital and clinic operations. Without enough healthcare professionals to serve patients, waiting times become long and patients’ lives may be put at risk. Hiring more healthcare professionals would certainly shorten waiting times and raise patient satisfaction, but the corresponding labour cost would become a heavy burden on operations. From a managerial point of view, it is therefore important to balance operating cost against patient satisfaction. The first step towards this balance is to know the pattern of patient arrivals as accurately as possible.

The file `EDdata.csv` contains Singaporeans’ arrivals at the emergency departments (EDs) of several major public hospitals in October 2011 and April 2012. The hospitals are Tan Tock Seng Hospital, Singapore General Hospital, National University Hospital, Changi General Hospital, Alexandra Hospital, Khoo Teck Puat Hospital, and KK Women's and Children's Hospital; the `Hospital_Name` column stores them as abbreviations such as `TTSH`, `SGH`, and `KKH`. The data were retrieved from each hospital’s data warehouse and are a random sample of all patients who arrived at these EDs during each study period (one whole month). Please import `EDdata.csv` first and check the data.


In [1]:
import pandas as pd

df = pd.read_csv("EDdata.csv")  
df.head(10)

|   | Case | Hospital_Name | REGIS_DATE | REGIS_TIME | reg_sec | Triage Time | triage_sec | Triage_Class | Age | Gender | Race |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 92408 | KTPH | 7/4/2012 | 9:48:33 | 35313 | 9:58:12 | 35892 | P2 | 40.0 | M | Chinese |
| 1 | 54452 | KKH | 7/10/2011 | 16:21:05 | 58865 | 16:23:42 | 59022 | P2 | 0.0 | M | Chinese |
| 2 | 28303 | CGH | 3/10/2011 | 3:57:45 | 14265 | 4:00:00 | 14400 | P3 | 33.0 | M | Indian |
| 3 | 121169 | SGH | 16/10/2011 | 4:08:47 | 14927 | 4:10:00 | 15000 | P3 | 53.0 | F | Malay |
| 4 | 146488 | TTSH | 24/10/2011 | 3:09:47 | 11387 | 3:14:21 | 11661 | P3 | 23.0 | M | Others |
| 5 | 93761 | KTPH | 11/4/2012 | 0:13:43 | 823 | 2:51:08 | 10268 | P3 | 21.0 | M | Chinese |
| 6 | 95762 | KTPH | 16/04/2012 | 23:14:41 | 83681 | 1:05:13 | 3913 | P2 | 47.0 | M | Chinese |
| 7 | 149941 | TTSH | 2/4/2012 | 10:12:32 | 36752 | 10:15:06 | 36906 | P3 | 48.0 | F | Others |
| 8 | 23665 | AH | 29/10/2011 | 13:43:40 | 49420 | 13:49:00 | 49740 | P2 | 43.0 | M | Malay |
| 9 | 12883 | SGH | 11/10/2011 | 6:23:14 | 22994 | 6:28:00 | 23280 | P2 | 44.0 | M | Malay |


## Task 1-1
<i><b>Do male Singaporeans prefer particular hospitals in an emergency? Please also find the pattern for female Singaporeans, and draw a visualisation that conveys your findings effectively.</b></i>

Please remember to remove patient visits to KKH from the data set. KKH is a women's and children's hospital, so, apart from children, male patients are generally not sent there in an emergency.

- Male Singaporeans

- Female Singaporeans

- A bar chart to show the findings
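Because the sample need not contain equal numbers of male and female patients, one reasonable way to compare preferences is each hospital's share of visits within each gender. The sketch below assumes `df` has the `Hospital_Name` and `Gender` columns shown above; the function names are hypothetical, while `pd.crosstab(..., normalize="columns")` is standard pandas. The demo uses only the ten rows printed by `df.head(10)`, so its numbers are illustrative, not findings.

```python
import pandas as pd


def hospital_share_by_gender(df: pd.DataFrame) -> pd.DataFrame:
    """Share of ED visits per hospital within each gender, with KKH removed."""
    # Remove visits to KKH (a women's and children's hospital) as instructed
    no_kkh = df.loc[df["Hospital_Name"] != "KKH"]
    # Rows: hospitals; columns: genders; each gender column sums to 1
    return pd.crosstab(no_kkh["Hospital_Name"], no_kkh["Gender"], normalize="columns")


def plot_share(share: pd.DataFrame):
    """Grouped bar chart with one bar per gender for each hospital (needs matplotlib)."""
    ax = share.plot(kind="bar", rot=0, title="Share of ED visits by hospital and gender")
    ax.set_xlabel("Hospital")
    ax.set_ylabel("Share of visits within gender")
    return ax


# The ten rows printed by df.head(10) above, for illustration only
sample = pd.DataFrame({
    "Hospital_Name": ["KTPH", "KKH", "CGH", "SGH", "TTSH",
                      "KTPH", "KTPH", "TTSH", "AH", "SGH"],
    "Gender": ["M", "M", "M", "F", "M", "M", "M", "F", "M", "M"],
})
share = hospital_share_by_gender(sample)
print(share.round(3))
```

On the full data, `plot_share(hospital_share_by_gender(df))` would draw the bar chart; comparing shares rather than raw counts keeps the two genders on the same scale.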

## Task 1-2

<i><b>Are patients’ waiting time distributions similar across public hospitals? Please draw a line chart comparing the median waiting time of each public hospital.</b></i>
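- To compute the waiting time (from registration to triage), there are two possible scenarios:
    1. (Case 1) The triage time is later than the registration time (normal cases), so the waiting time in seconds is `triage_sec - reg_sec`.
    2. (Case 2) The triage is conducted after midnight. Because `reg_sec` and `triage_sec` are always computed using `00:00:00` as the origin, `triage_sec` is then smaller than `reg_sec`, so one day (86,400 seconds) must be added: `triage_sec - reg_sec + 86400`.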
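The two cases translate directly into a vectorised rule. The sketch below assumes `reg_sec` and `triage_sec` are integer seconds since `00:00:00`, and it creates `Wait_time` (seconds) and `Wait_min` (minutes), the column names used by the `In [17]` cell. The function names are hypothetical, and the demo data are the ten `df.head(10)` rows.

```python
import numpy as np
import pandas as pd

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def add_wait_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with Wait_time (seconds) and Wait_min (minutes)."""
    out = df.copy()
    diff = out["triage_sec"] - out["reg_sec"]
    # Case 1: triage later on the same day -> plain difference
    # Case 2: triage after midnight -> triage clock restarted at 0, so add one day
    out["Wait_time"] = np.where(diff >= 0, diff, diff + SECONDS_PER_DAY)
    out["Wait_min"] = out["Wait_time"] / 60
    return out


def median_wait_by_hospital(df: pd.DataFrame) -> pd.Series:
    """Median waiting time in minutes for each hospital."""
    return df.groupby("Hospital_Name")["Wait_min"].median()


def plot_median_wait(medians: pd.Series):
    """Line chart of median waiting time per hospital (needs matplotlib)."""
    ax = medians.plot(kind="line", marker="o", title="Median waiting time by hospital")
    ax.set_xticks(range(len(medians)))
    ax.set_xticklabels(medians.index)
    ax.set_xlabel("Hospital")
    ax.set_ylabel("Median waiting time (minutes)")
    return ax


# The ten rows printed by df.head(10) above, for illustration only
sample = pd.DataFrame({
    "Hospital_Name": ["KTPH", "KKH", "CGH", "SGH", "TTSH",
                      "KTPH", "KTPH", "TTSH", "AH", "SGH"],
    "reg_sec": [35313, 58865, 14265, 14927, 11387, 823, 83681, 36752, 49420, 22994],
    "triage_sec": [35892, 59022, 14400, 15000, 11661, 10268, 3913, 36906, 49740, 23280],
})
waits = add_wait_columns(sample)
print(median_wait_by_hospital(waits).round(2))
```

Row 6 of the sample (registered 23:14:41, triaged 1:05:13) falls under Case 2 and gets 6,632 seconds, about 110.5 minutes.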

**Do you notice any anomaly in the table generated?**

- Please filter the records with a waiting time longer than 300 minutes.

In [17]:
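# Keep records whose computed waiting time exceeds 300 minutes (5 hours)
filter_check = df["Wait_min"] > 300
# Show the raw times next to the computed waiting time (in seconds) for inspection
df_check = df.loc[filter_check, ["REGIS_TIME", "Triage Time", "reg_sec", "triage_sec", "Wait_time"]]
df_check.head()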

|   | REGIS_TIME | Triage Time | reg_sec | triage_sec | Wait_time |
|---|---|---|---|---|---|
| 26 | 21:44:24 | 21:44:00 | 78264 | 78240 | 86376 |
| 42 | 14:36:06 | 14:36:00 | 52566 | 52560 | 86394 |
| 50 | 18:42:38 | 18:41:00 | 67358 | 67260 | 86302 |
| 69 | 9:16:01 | 9:14:00 | 33361 | 33240 | 86279 |
| 75 | 6:21:20 | 6:21:00 | 22880 | 22860 | 86380 |


Anomalous data are common in practice. They usually arise for one of two reasons:
1. The way/logic you use to compute values is incorrect. (Logical error!)
2. The data records are not correct. (Data entry error!)
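In the table above, each flagged record has a triage time only a few seconds to about two minutes *before* its registration time, which the Case 2 rule turns into a wait of almost a full day. A genuine after-midnight triage shows a much larger negative gap instead (for example, registration at 23:14:41 and triage at 1:05:13). One possible treatment, assuming these records are data entry errors (reason 2), is to flag small negative gaps and leave them out of the waiting-time statistics. The one-hour tolerance and the function name below are hypothetical choices, not rules from the case study.

```python
import numpy as np
import pandas as pd

SECONDS_PER_DAY = 24 * 60 * 60
# Hypothetical cut-off: triage recorded up to one hour "before" registration
# is treated as a data entry error, not as an after-midnight triage
ENTRY_ERROR_TOLERANCE_SEC = 60 * 60


def wait_min_with_entry_errors(df: pd.DataFrame) -> pd.DataFrame:
    """Add Entry_error (bool) and Wait_min (minutes, NaN for suspected errors)."""
    out = df.copy()
    diff = out["triage_sec"] - out["reg_sec"]
    # Small negative gap: triage clock slightly behind the registration clock
    out["Entry_error"] = (diff < 0) & (diff >= -ENTRY_ERROR_TOLERANCE_SEC)
    # Case 1 / Case 2 rule for the remaining records
    wait_sec = np.where(diff >= 0, diff, diff + SECONDS_PER_DAY)
    out["Wait_min"] = wait_sec / 60
    # Blank out the waiting time of suspected entry errors
    out.loc[out["Entry_error"], "Wait_min"] = np.nan
    return out


# Two rows from the anomaly table and two rows from df.head(10)
sample = pd.DataFrame({
    "reg_sec": [78264, 52566, 83681, 35313],
    "triage_sec": [78240, 52560, 3913, 35892],
})
checked = wait_min_with_entry_errors(sample)
print(checked)
```

Because `median()` skips `NaN` by default, grouping the result by `Hospital_Name` and taking the median ignores the flagged records. Dropping, correcting, or keeping them remains a judgment call.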

## Task 1-3

To make a staffing plan, which decides how many nurses and doctors serve patients, a deep understanding of patient arrivals is crucial. In practice, the staffing plan is made on an hourly basis (24 intervals per day). Thus, please create a new column, `REGIS_HOUR`, in `df`. Moreover, the patients’ arrival pattern may vary by the day of the month. Please also create a new column, `REGIS_DAY`, in `df`.
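A minimal sketch, assuming `REGIS_DATE` is written day/month/year (consistent with dates such as 16/04/2012 in the output above) and that `reg_sec` counts seconds since midnight, as Task 1-2 states. Taking the hour from `reg_sec` avoids parsing `REGIS_TIME`; `pd.to_datetime(df["REGIS_TIME"], format="%H:%M:%S").dt.hour` would be an equivalent alternative. The function name is hypothetical.

```python
import pandas as pd


def add_hour_and_day(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with REGIS_HOUR (0-23) and REGIS_DAY (1-31)."""
    out = df.copy()
    # reg_sec counts seconds from 00:00:00, so the whole hour is reg_sec // 3600
    out["REGIS_HOUR"] = out["reg_sec"] // 3600
    # REGIS_DATE is day/month/year with inconsistent zero padding (7/4/2012, 16/04/2012)
    dates = pd.to_datetime(out["REGIS_DATE"], format="%d/%m/%Y")
    out["REGIS_DAY"] = dates.dt.day
    return out


# Three rows from df.head(10) above
sample = pd.DataFrame({
    "REGIS_DATE": ["7/4/2012", "7/10/2011", "16/04/2012"],
    "reg_sec": [35313, 58865, 83681],  # 9:48:33, 16:21:05, 23:14:41
})
result = add_hour_and_day(sample)
print(result[["REGIS_DATE", "REGIS_HOUR", "REGIS_DAY"]])
```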

## Task 1-4: Understand the hourly trend of patient arrivals in Singapore's public hospitals.

First, find the average number of patient arrivals in each hour of the day. To answer this question, we assume the arrival pattern is similar across different days.

Then, draw a line chart to show the hourly trend of patient arrivals in Singapore's public hospitals. What conclusion can you draw from your chart?
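A minimal sketch under the stated assumption that all days share one pattern: count arrivals for each (calendar date, hour) pair, fill hours with no arrivals as 0, and average over the dates. It groups by the full parsed date rather than by `REGIS_DAY`, because the day of the month alone would merge, for example, 7 October 2011 with 7 April 2012. Only dates present in the data are counted. Function names and the toy rows are hypothetical, and since the data set is a random sample, the averages describe sampled arrivals rather than total ED volumes.

```python
import pandas as pd


def average_hourly_arrivals(df: pd.DataFrame) -> pd.Series:
    """Average number of arrivals in each hour of the day, over all observed dates."""
    dates = pd.to_datetime(df["REGIS_DATE"], format="%d/%m/%Y").rename("DATE")
    hours = (df["reg_sec"] // 3600).rename("REGIS_HOUR")
    # Arrivals per (date, hour); hours with no arrivals on a date are filled with 0
    counts = pd.crosstab(dates, hours).reindex(columns=range(24), fill_value=0)
    # Averaging each hour's column over the dates gives the hourly profile
    return counts.mean(axis=0)


def plot_hourly(avg: pd.Series):
    """Line chart of the hourly arrival profile (needs matplotlib)."""
    ax = avg.plot(kind="line", marker="o", title="Average ED arrivals by hour of day")
    ax.set_xticks(range(24))
    ax.set_xlabel("Hour of day")
    ax.set_ylabel("Average arrivals")
    return ax


# Toy rows (hypothetical): two arrivals on 7 April 2012, one on 8 April 2012
sample = pd.DataFrame({
    "REGIS_DATE": ["7/4/2012", "7/4/2012", "8/4/2012"],
    "reg_sec": [35313, 36000, 35313],  # hours 9, 10 and 9
})
avg = average_hourly_arrivals(sample)
print(avg.loc[[9, 10]])
```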

## Task 1-5

The assumption that the arrival pattern is similar across different days is too strong to be realistic. Let's examine the day-of-week effect (including Saturday and Sunday) on the arrival pattern of patients. Please create a `WEEKDAY` column in `df`. For example, if a patient arrived on 01/10/2011, the corresponding value in the `WEEKDAY` column is Saturday.
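pandas can derive the day name directly from a parsed date with `Series.dt.day_name()`, which returns English names by default. A short sketch, again assuming day/month/year dates; the function name is hypothetical. The first sample date reproduces the example above.

```python
import pandas as pd


def add_weekday(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with WEEKDAY holding the day name (Monday ... Sunday)."""
    out = df.copy()
    dates = pd.to_datetime(out["REGIS_DATE"], format="%d/%m/%Y")
    out["WEEKDAY"] = dates.dt.day_name()
    return out


sample = pd.DataFrame({"REGIS_DATE": ["01/10/2011", "7/4/2012", "16/04/2012"]})
print(add_weekday(sample)["WEEKDAY"].tolist())  # ['Saturday', 'Saturday', 'Monday']
```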

## Task 1-6
    
With the `WEEKDAY` column, please find the average number of patient arrivals in each hour for each weekday. Your answer should be a 7-by-24 table (7 weekdays by 24 hours).
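One way to build the table is to count arrivals per (calendar date, hour), then average over the dates that fall on the same weekday, so that a weekday occurring four times in a month is divided by four. So that the block stands alone, it derives the weekday from the date, which yields the same labels as a `WEEKDAY` column built with `dt.day_name()`. Rows are ordered Monday to Sunday; a weekday with no dates in the input shows up as a row of `NaN`. Function names and toy rows are hypothetical.

```python
import pandas as pd

WEEKDAY_ORDER = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]


def weekday_hour_table(df: pd.DataFrame) -> pd.DataFrame:
    """7-by-24 table: average arrivals per hour (columns) for each weekday (rows)."""
    dates = pd.to_datetime(df["REGIS_DATE"], format="%d/%m/%Y").rename("DATE")
    hours = (df["reg_sec"] // 3600).rename("REGIS_HOUR")
    # Arrivals per calendar date and hour, with empty hours filled as 0
    daily = pd.crosstab(dates, hours).reindex(columns=range(24), fill_value=0)
    # Average over all dates that fall on the same weekday
    weekday = pd.DatetimeIndex(daily.index).day_name()
    table = daily.groupby(weekday).mean()
    return table.reindex(WEEKDAY_ORDER)


# Toy rows (hypothetical): two Saturdays and one Monday in April 2012
sample = pd.DataFrame({
    "REGIS_DATE": ["7/4/2012", "7/4/2012", "14/4/2012", "16/04/2012"],
    "reg_sec": [35313, 36000, 35313, 83681],  # hours 9, 10, 9, 23
})
table = weekday_hour_table(sample)
print(table.shape)  # (7, 24)
```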

## Task 1-7
<n>

Using the result of Task 1-6, draw a chart to show the hourly trend of patient arrivals in Singapore's public hospitals on different weekdays. What conclusion can you draw from your chart?
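A sketch, assuming `table` is the 7-by-24 result of Task 1-6 with weekdays as rows and hours 0-23 as columns. Transposing puts the hours on the x-axis, so pandas draws one line per weekday. `peak_hour_by_weekday` is a hypothetical helper that reports each weekday's busiest hour, which may help phrase the conclusion. The toy table below is not real data.

```python
import pandas as pd

WEEKDAY_ORDER = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]


def plot_weekday_trend(table: pd.DataFrame):
    """One line per weekday across the 24 hours (needs matplotlib)."""
    # After .T, rows are hours (x-axis) and columns are weekdays (one line each)
    ax = table.T.plot(kind="line", title="Average ED arrivals by hour and weekday")
    ax.set_xticks(range(24))
    ax.set_xlabel("Hour of day")
    ax.set_ylabel("Average arrivals")
    ax.legend(title="Weekday")
    return ax


def peak_hour_by_weekday(table: pd.DataFrame) -> pd.Series:
    """Hour (0-23) with the highest average arrivals for each weekday."""
    return table.idxmax(axis=1)


# Toy 7-by-24 table of zeros with two hypothetical peaks, for illustration only
table = pd.DataFrame(0.0, index=WEEKDAY_ORDER, columns=range(24))
table.loc["Monday", 10] = 5.0
table.loc["Sunday", 11] = 4.0
print(peak_hour_by_weekday(table)[["Monday", "Sunday"]])
```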