# Feature engineering

## Introduction
This notebook outlines the methodology for building a single feature, Length of Stay, from the provided data.

### Potential features
Given the data, there are a number of features that could be built, including:

#### Readmission Risk
Flag patients who were readmitted within 28 days (Readmission28Days). This could be critical in understanding patterns leading to frequent readmissions and for developing preventive strategies.
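As a minimal sketch of how such a flag could be derived: the encoding of `Readmission28Days` is not shown in this notebook, so the set of "truthy" labels below is an assumption, and both `add_readmission_flag` and the output column `ReadmittedWithin28Days` are hypothetical names used only for illustration. The toy rows are made up and do not come from the dataset.

```python
import pandas as pd

def add_readmission_flag(df: pd.DataFrame,
                         truthy=("yes", "y", "true", "1")) -> pd.DataFrame:
    """Return a copy of df with a boolean ReadmittedWithin28Days column.

    Assumption: Readmission28Days holds yes/no style labels. Adjust
    `truthy` once the real encoding has been inspected.
    """
    out = df.copy()
    # Normalise case and whitespace before comparing against the labels
    normalized = out["Readmission28Days"].astype(str).str.strip().str.lower()
    out["ReadmittedWithin28Days"] = normalized.isin(truthy)
    return out

# Toy rows for illustration only (not taken from the dataset)
toy = pd.DataFrame({"Readmission28Days": ["Yes", "No", "yes ", None]})
flagged = add_readmission_flag(toy)
print(flagged)
```

Missing values fall through to `False` here; whether that is appropriate depends on how the source system records unknown readmission status.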

#### Emergency Admission Indicator
A binary feature indicating whether the admission was an emergency (UrgencyOfAdmission). This could help in predicting resource needs and managing emergency room workloads.
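A rough sketch of the indicator, assuming `UrgencyOfAdmission` is a text column in which emergency admissions carry a label such as "Emergency". That label, the helper `add_emergency_flag`, and the output column `IsEmergency` are all assumptions for illustration. The toy rows are invented.

```python
import pandas as pd

def add_emergency_flag(df: pd.DataFrame,
                       emergency_label: str = "Emergency") -> pd.DataFrame:
    """Return a copy of df with a 0/1 IsEmergency column.

    Assumption: emergency admissions are marked with `emergency_label`
    in UrgencyOfAdmission. Check the real category values first.
    """
    out = df.copy()
    urgency = out["UrgencyOfAdmission"].astype(str).str.strip().str.lower()
    out["IsEmergency"] = (urgency == emergency_label.strip().lower()).astype(int)
    return out

# Toy rows for illustration only (not taken from the dataset)
toy = pd.DataFrame({"UrgencyOfAdmission": ["Emergency", "Elective", " emergency "]})
flagged = add_emergency_flag(toy)
print(flagged)
```

Running `input_df['UrgencyOfAdmission'].value_counts()` on the real data would show which labels actually occur before a value for `emergency_label` is chosen.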

#### Total Charges
Sum all charges (AccommodationCharge, CCU_Charges, ICU_Charge, etc.) to get a feature representing the total cost of each patient's stay. This is useful for cost analysis and for identifying high-cost patients.
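The row-wise sum could be sketched as follows. Only the three charge columns named above are listed; the remaining charge columns are not enumerated in this notebook, so the list would need extending after inspecting the data. `add_total_charges` and `TotalCharges` are hypothetical names, and the toy amounts are invented.

```python
import pandas as pd

# Only the columns named in the text; extend with the other charge columns.
CHARGE_COLUMNS = ["AccommodationCharge", "CCU_Charges", "ICU_Charge"]

def add_total_charges(df: pd.DataFrame,
                      charge_columns=CHARGE_COLUMNS) -> pd.DataFrame:
    """Return a copy of df with a TotalCharges column (row-wise sum)."""
    out = df.copy()
    # Coerce to numeric so stray text becomes NaN instead of raising
    charges = out[list(charge_columns)].apply(pd.to_numeric, errors="coerce")
    # Missing charges are skipped, i.e. treated as zero in the total
    out["TotalCharges"] = charges.sum(axis=1, skipna=True)
    return out

# Toy rows for illustration only (not taken from the dataset)
toy = pd.DataFrame({
    "AccommodationCharge": [1000.0, 500.0],
    "CCU_Charges": [0.0, None],
    "ICU_Charge": [250.0, 100.0],
})
totals = add_total_charges(toy)
print(totals)
```

Treating a missing charge as zero is a design choice in this sketch; if a missing value means "unknown" rather than "not charged", the total for that row may be understated.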

#### Diagnosis and Procedure Count
Count the number of diagnoses and procedures (PrincipalDiagnosis, Diagnosis2, etc.). This feature helps in understanding the complexity and severity of each case.
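One way to sketch the count is to tally the non-empty cells across the relevant columns. The column list below is an assumption: `Diagnosis3` is a hypothetical extra column, and the procedure columns are not named in this notebook, so they would be passed in the same way once known. `add_code_count` and `DiagnosisCount` are illustrative names, and the codes in the toy rows are made up.

```python
import pandas as pd

def add_code_count(df: pd.DataFrame, columns, new_column: str) -> pd.DataFrame:
    """Return a copy of df with a count of non-empty cells across `columns`."""
    out = df.copy()
    values = out[list(columns)]
    # A cell counts only if it is not missing and not blank after stripping
    not_blank = values.astype(str).apply(lambda col: col.str.strip() != "")
    out[new_column] = (values.notna() & not_blank).sum(axis=1)
    return out

# Toy rows for illustration only (codes are made up, not from the dataset)
toy = pd.DataFrame({
    "PrincipalDiagnosis": ["I21.9", "J18.9"],
    "Diagnosis2": ["E11.9", None],
    "Diagnosis3": [" ", None],
})
counted = add_code_count(
    toy, ["PrincipalDiagnosis", "Diagnosis2", "Diagnosis3"], "DiagnosisCount"
)
print(counted)
```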

#### Length of Stay
The duration, in whole days, between AdmissionDate and SeparationDate.

In [3]:
import pandas as pd

# Ingest raw data
input_filepath = "../data/Data Insights - Synthetic Dataset.csv"
input_df = pd.read_csv(input_filepath)

In [4]:
# Parse the date columns of input_df (loaded in the previous cell) as day/month/year
input_df['AdmissionDate'] = pd.to_datetime(input_df['AdmissionDate'], format='%d/%m/%Y')
input_df['SeparationDate'] = pd.to_datetime(input_df['SeparationDate'], format='%d/%m/%Y')

# Create LengthOfStay feature
input_df['LengthOfStay'] = (input_df['SeparationDate'] - input_df['AdmissionDate']).dt.days

print(input_df[['AdmissionDate', 'SeparationDate', 'LengthOfStay']].head())

  AdmissionDate SeparationDate  LengthOfStay
0    2024-07-22     2024-07-29             7
1    2023-10-05     2023-11-03            29
2    2024-02-02     2024-02-08             6
3    2022-08-02     2022-08-27            25
4    2022-08-30     2022-09-07             8
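Before relying on the feature, it may be worth checking for rows where the subtraction produces a missing or negative value, for example when a date fails to parse or the two dates are swapped. The sketch below repeats the calculation on invented rows with `errors="coerce"`, so that unparseable dates become `NaT` instead of raising. The helper `length_of_stay_days` is a hypothetical name, and the toy rows are not from the dataset.

```python
import pandas as pd

def length_of_stay_days(df: pd.DataFrame,
                        date_format: str = "%d/%m/%Y") -> pd.Series:
    """Whole days between AdmissionDate and SeparationDate (NaN if unparseable)."""
    admitted = pd.to_datetime(df["AdmissionDate"], format=date_format, errors="coerce")
    separated = pd.to_datetime(df["SeparationDate"], format=date_format, errors="coerce")
    return (separated - admitted).dt.days

# Toy rows for illustration only (not taken from the dataset)
toy = pd.DataFrame({
    "AdmissionDate": ["22/07/2024", "05/10/2023", "10/01/2024", "not a date"],
    "SeparationDate": ["29/07/2024", "05/10/2023", "08/01/2024", "03/02/2024"],
})
toy["LengthOfStay"] = length_of_stay_days(toy)

# Rows that need a closer look: unparseable dates or separation before admission
suspect = toy[toy["LengthOfStay"].isna() | (toy["LengthOfStay"] < 0)]
print(suspect)
```

Note that a same-day admission and separation yields 0 days under this definition; whether such stays should count as 0 or 1 depends on the reporting convention in use.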


## Conclusion
While the data allows for the creation of many useful features, Length of Stay was selected for the purposes of this exercise. It is an essential metric in healthcare analytics because it directly influences resource allocation, patient outcomes, and cost management. Shorter stays might indicate better efficiency or less severe cases, while longer stays might point to more severe conditions or potential complications. These insights can help in allocating resources and identifying areas for improvement.