In [1]:
import pandas as pd
import numpy as np

## 1. Introduction

Emergency departments (EDs) are critical units within healthcare systems, responsible for handling urgent and life-threatening cases. However, high patient volumes, diverse case types, and limited clinical resources can cause long waiting times and increase pressure on emergency services.

This project analyses an Emergency Room dataset consisting of 9,216 patient records. The dataset includes demographic information (e.g., gender, age, race), operational variables (e.g., referral source, admission flag), and outcome attributes such as patient satisfaction scores and waiting time. By examining these features, we aim to understand patterns and potential drivers of ED congestion.

The main goals of this project are:

- To describe and summarise key characteristics of Emergency Room patients.
- To explore relationships between demographics and operational outcomes, such as waiting time and satisfaction scores.
- To investigate whether patient-level attributes (e.g., age, referral type) influence ER waiting time.
- To visualise insights through meaningful charts and a Power BI dashboard.
- To support healthcare decision-making by providing data-driven insights into patient flow and service quality.

This analysis follows core data analytics practices: data cleaning, statistical reasoning, exploratory analysis, hypothesis testing, and storytelling through visualisation.

In [7]:


df = pd.read_csv(r"C:\Users\ABDUL\Documents\ER-Capstone-Project\data\emergency_room.csv")
df.head()

,Patient Id,Patient Admission Date,Patient First Inital,Patient Last Name,Patient Gender,Patient Age,Patient Race,Department Referral,Patient Admission Flag,Patient Satisfaction Score,Patient Waittime,Patients CM
0,145-39-5406,20-03-2024 08:47,H,Glasspool,M,69,White,,False,10.0,39,0
1,316-34-3057,15-06-2024 11:29,X,Methuen,M,4,Native American/Alaska Native,,True,,27,0
2,897-46-3852,20-06-2024 09:13,P,Schubuser,F,56,African American,General Practice,True,9.0,55,0
3,358-31-9711,04-02-2024 22:34,U,Titcombe,F,24,Native American/Alaska Native,General Practice,True,8.0,31,0
4,289-26-0537,04-09-2024 17:48,Y,Gionettitti,M,5,African American,Orthopedics,False,,10,0


In [8]:
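# A bare `df.shape` is not displayed unless it is the last expression in the cell, so print it.
print(df.shape)
df.info()

(9216, 12)
<class 'pandas.core.frame.DataFrame'>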
RangeIndex: 9216 entries, 0 to 9215
Data columns (total 12 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   Patient Id                  9216 non-null   object 
 1   Patient Admission Date      9216 non-null   object 
 2   Patient First Inital        9216 non-null   object 
 3   Patient Last Name           9216 non-null   object 
 4   Patient Gender              9216 non-null   object 
 5   Patient Age                 9216 non-null   int64  
 6   Patient Race                9216 non-null   object 
 7   Department Referral         3816 non-null   object 
 8   Patient Admission Flag      9216 non-null   bool   
 9   Patient Satisfaction Score  2517 non-null   float64
 10  Patient Waittime            9216 non-null   int64  
 11  Patients CM                 9216 non-null   int64  
dtypes: bool(1), float64(1), int64(3), object(7)
memory usage: 801.1+ KB
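
The `df.info()` output lists `Patient Admission Date` as `object`, meaning the timestamps are stored as text. A minimal sketch of converting them follows. It assumes every value uses the day-first `DD-MM-YYYY HH:MM` layout seen in the `df.head()` rows. The series name `raw` and the five sample values are stand-ins copied from that preview, not the full column.

```python
import pandas as pd

# Admission timestamps copied from the df.head() output (stand-in for the full column).
raw = pd.Series(
    [
        "20-03-2024 08:47",
        "15-06-2024 11:29",
        "20-06-2024 09:13",
        "04-02-2024 22:34",
        "04-09-2024 17:48",
    ],
    name="Patient Admission Date",
)

# Assumption: all values are day-first. An explicit format avoids
# day/month ambiguity for values such as 04-02-2024 (4 February).
admitted = pd.to_datetime(raw, format="%d-%m-%Y %H:%M")

print(admitted.dtype)
print(admitted.dt.month.tolist())  # month of each admission
print(admitted.dt.hour.tolist())   # hour of day of each admission
```

In the notebook itself, the same call could be applied as `df["Patient Admission Date"] = pd.to_datetime(df["Patient Admission Date"], format="%d-%m-%Y %H:%M")`. If any row deviates from the assumed format, pandas raises an error instead of silently misreading the date.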


In [9]:
df.isnull().sum()


Patient Id                       0
Patient Admission Date           0
Patient First Inital             0
Patient Last Name                0
Patient Gender                   0
Patient Age                      0
Patient Race                     0
Department Referral           5400
Patient Admission Flag           0
Patient Satisfaction Score    6699
Patient Waittime                 0
Patients CM                      0
dtype: int64

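Two fields have substantial gaps. Department Referral is missing for 5,400 of the 9,216 records (about 59%), and Patient Satisfaction Score is missing for 6,699 (about 73%), so it is populated for only around 27% of records. These missing values require explicit handling decisions before analysis.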

This observation reflects real-world data quality challenges, where patient feedback or documentation is often incomplete.
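
The missing-value counts above can be turned into percentages, and handling options tried, with a short helper. The sketch below is illustrative only. `missing_summary`, the `sample` frame (built from the five `df.head()` rows), and the placeholder label `"No referral recorded"` are assumptions made for this example, not choices made in the original analysis.

```python
import numpy as np
import pandas as pd

# Stand-in frame built from the five rows shown by df.head();
# in the notebook, pass the full `df` instead.
sample = pd.DataFrame(
    {
        "Department Referral": [
            np.nan, np.nan, "General Practice", "General Practice", "Orthopedics",
        ],
        "Patient Satisfaction Score": [10.0, np.nan, 9.0, 8.0, np.nan],
        "Patient Waittime": [39, 27, 55, 31, 10],
    }
)


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return the count and percentage of missing values per column."""
    counts = frame.isnull().sum()
    percent = (counts / len(frame) * 100).round(1)
    summary = pd.DataFrame({"missing": counts, "percent": percent})
    return summary.sort_values("missing", ascending=False)


summary = missing_summary(sample)
print(summary)

# Hypothetical handling choices (assumptions, not the source's decisions):
cleaned = sample.copy()
# 1. Keep rows with no referral, but make the gap an explicit category.
cleaned["Department Referral"] = cleaned["Department Referral"].fillna(
    "No referral recorded"
)
# 2. Analyse satisfaction only on rated visits rather than imputing scores.
rated = cleaned.dropna(subset=["Patient Satisfaction Score"])
print(len(rated), "of", len(cleaned), "sample rows have a satisfaction score")
```

Restricting satisfaction analysis to rated visits avoids inventing scores for roughly 73% of patients, at the cost of possible bias if patients who give feedback differ from those who do not.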