# 📊 AI Case Study: Predicting Patient Readmission Risk Within 30 Days


## 🔍 Introduction
This notebook applies the **AI Development Workflow** to a real-world healthcare scenario: predicting the likelihood of a patient being readmitted to the hospital within 30 days of discharge. The aim is to help hospitals take proactive measures in patient care and reduce operational and financial burdens.



## 1️⃣ Problem Scope

### 🧠 Problem Definition:
Hospital readmissions within 30 days can result in increased costs and penalties. An AI model that can predict the risk of readmission enables clinicians to take preemptive actions and improve patient outcomes.

### 🎯 Objectives:
- Build a predictive model for patient readmission within 30 days.
- Assist hospital staff in identifying at-risk patients during discharge.
- Support planning for post-discharge care and interventions.

### 👥 Stakeholders:
- **Primary:** Physicians, nurses, hospital administration, patients.
- **Secondary:** Data science team, regulatory bodies, insurance providers.



## 2️⃣ Data Strategy

### 📚 Data Sources:
- **EHRs:** Clinical visits, medications, diagnoses, vitals, and lab tests.
- **Demographics:** Age, gender, income level, living situation.
- **Administrative Data:** Length of stay, admission/discharge codes.
- **Historical Readmission Labels:** Ground truth for training.

### ⚖️ Ethical Considerations:
1. **Patient Privacy:** Adherence to HIPAA; anonymization of identifiers.
2. **Algorithmic Bias:** Avoid reinforcing healthcare disparities across demographics.
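
One way to make the bias concern measurable is to compare recall across demographic groups after the model has made predictions. The sketch below assumes a small, hypothetical audit table (`audit`, with made-up labels and predictions) and uses `gender` as the group attribute. Any sensitive attribute could be substituted.

```python
import pandas as pd
from sklearn.metrics import recall_score

# Hypothetical audit frame: true labels, model predictions, and a demographic attribute.
audit = pd.DataFrame({
    "gender": ["Male", "Male", "Male", "Female", "Female", "Female"],
    "y_true": [1, 1, 0, 1, 1, 0],
    "y_pred": [1, 1, 0, 1, 0, 0],
})

# Recall per group: the share of actual readmissions the model caught in each group.
recall_by_group = {
    group: recall_score(rows["y_true"], rows["y_pred"])
    for group, rows in audit.groupby("gender")
}

# A large gap suggests the model misses readmissions more often for one group.
recall_gap = max(recall_by_group.values()) - min(recall_by_group.values())
print(recall_by_group, f"gap={recall_gap:.2f}")
```

What size of gap is acceptable is a policy decision for the stakeholders, not something the code can settle.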

### 🧼 Preprocessing Pipeline:
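- **Impute Missing Values:** Mean or median for numerical features; mode or an explicit 'Unknown' category for categorical features.
- **Standardize and Normalize:** Scale numeric features (e.g., z-score standardization), fitting the scaler on the training split only.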
- **Feature Engineering:** Create features such as:
  - Count of prior hospital visits.
  - Flag for chronic conditions.
  - NLP-based discharge summary sentiment.
- **Encoding:** One-hot encode categorical variables.
- **Data Split:** Train (70%), validation (15%), test (15%).
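
The pipeline steps above can be sketched with scikit-learn's `ColumnTransformer`. This is a minimal illustration under stated assumptions: the data frame `df` is randomly generated stand-in data, the column names are borrowed from the simulated dataset in this notebook, and feature engineering is left out for brevity.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data (hypothetical values) with some missing entries.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "age": rng.integers(20, 90, n).astype(float),
    "days_in_hospital": rng.integers(1, 15, n).astype(float),
    "gender": rng.choice(["Male", "Female"], n),
    "readmitted_30_days": rng.integers(0, 2, n),
})
df.loc[::10, "age"] = np.nan      # simulate missing numeric values
df.loc[::15, "gender"] = np.nan   # simulate missing categorical values

X_all = df.drop(columns="readmitted_30_days")
y_all = df["readmitted_30_days"]

# 70% train, then split the remaining 30% evenly into validation and test.
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X_all, y_all, test_size=0.30, stratify=y_all, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.50, stratify=y_tmp, random_state=42)

numeric_cols = ["age", "days_in_hospital"]
categorical_cols = ["gender"]

preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="mean")),
        ("scale", StandardScaler()),
    ]), numeric_cols),
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="constant", fill_value="Unknown")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical_cols),
], sparse_threshold=0)  # always return a dense array

# Fit on the training split only, then reuse the fitted transformer on the other splits.
X_train_prepared = preprocess.fit_transform(X_train)
X_val_prepared = preprocess.transform(X_val)
X_test_prepared = preprocess.transform(X_test)
print(X_train_prepared.shape, X_val_prepared.shape, X_test_prepared.shape)
```

Fitting the imputer and scaler on the training split alone keeps validation and test statistics from leaking into training.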


In [None]:

import pandas as pd
import numpy as np

# Simulated dataset for demonstration
data = pd.DataFrame({
    'age': [65, 45, 70, 60],
    'gender': ['Male', 'Female', 'Male', 'Female'],
    'num_prev_visits': [1, 3, 2, 5],
    'chronic_condition': [1, 0, 1, 1],
    'days_in_hospital': [5, 3, 7, 10],
    'readmitted_30_days': [1, 0, 1, 0]
})

# Encode categorical variables
data_encoded = pd.get_dummies(data, columns=['gender'], drop_first=True)

# Feature-target split
X = data_encoded.drop(columns='readmitted_30_days')
y = data_encoded['readmitted_30_days']

X.head()



## 3️⃣ Model Development

### 📌 Model Selection:
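We select **gradient boosting** (for example, XGBoost) because it performs well on structured/tabular data and can be tuned for imbalanced classes, for instance through XGBoost's `scale_pos_weight` parameter. The demonstration below uses scikit-learn's `GradientBoostingClassifier` as a lightweight stand-in.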

### 🔢 Training and Evaluation:
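The cell below trains on the four-row simulated dataset and evaluates on those same rows with a confusion matrix, precision, and recall. Because there is no held-out data, the resulting scores only show the mechanics and overstate real-world performance.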


In [None]:

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import confusion_matrix, precision_score, recall_score

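# Train a simple model on the simulated data
model = GradientBoostingClassifier(random_state=42)
model.fit(X, y)

# Predict on the training rows themselves (demonstration only;
# use a held-out test set for a real evaluation)
y_pred = model.predict(X)

# Confusion matrix, precision, and recall
cm = confusion_matrix(y, y_pred)
precision = precision_score(y, y_pred)
recall = recall_score(y, y_pred)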

print("Confusion Matrix:\n", cm)
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
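
For a more realistic picture, the same metrics can be computed on a held-out split. The sketch below is an assumption-laden illustration: it uses `make_classification` to generate synthetic, imbalanced data (roughly 15% positives) in place of real patient records, and the variable names (`X_syn`, `holdout_model`, and so on) are chosen to avoid overwriting the objects defined above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import confusion_matrix, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data; not real patient records.
X_syn, y_syn = make_classification(
    n_samples=1000, n_features=8, n_informative=5,
    weights=[0.85, 0.15], random_state=42)

# Hold out 15% of the rows, keeping the class ratio the same in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.15, stratify=y_syn, random_state=42)

holdout_model = GradientBoostingClassifier(random_state=42)
holdout_model.fit(X_tr, y_tr)
y_te_pred = holdout_model.predict(X_te)

# Metrics computed only on rows the model never saw during training.
holdout_cm = confusion_matrix(y_te, y_te_pred)
holdout_precision = precision_score(y_te, y_te_pred, zero_division=0)
holdout_recall = recall_score(y_te, y_te_pred, zero_division=0)

print("Confusion Matrix:\n", holdout_cm)
print(f"Precision: {holdout_precision:.2f}")
print(f"Recall: {holdout_recall:.2f}")
```

Stratifying the split matters here because readmissions are the minority class, and an unstratified split could leave too few positives in the test set.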



## 4️⃣ Deployment

### 🏥 Integration into Hospital System:
- **Backend Integration:** Host model via API (Flask/FastAPI).
- **Frontend Dashboards:** Integrate into physician decision tools.
- **Data Pipelines:** Automatic EHR data ingestion using HL7/FHIR.
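
The backend piece could look roughly like the handler below, which a Flask or FastAPI route would wrap. It is a framework-agnostic sketch: `predict_readmission`, `FEATURES`, and `api_model` are hypothetical names, and the model is trained on the four simulated rows only so that the example runs on its own. A real service would load a persisted, validated model instead.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

# Feature order the model expects (matches the simulated dataset in this notebook).
FEATURES = ["age", "num_prev_visits", "chronic_condition",
            "days_in_hospital", "gender_Male"]

# Toy training data so the sketch is self-contained.
toy = pd.DataFrame({
    "age": [65, 45, 70, 60],
    "num_prev_visits": [1, 3, 2, 5],
    "chronic_condition": [1, 0, 1, 1],
    "days_in_hospital": [5, 3, 7, 10],
    "gender_Male": [1, 0, 1, 0],
    "readmitted_30_days": [1, 0, 1, 0],
})
api_model = GradientBoostingClassifier(random_state=42)
api_model.fit(toy[FEATURES], toy["readmitted_30_days"])


def predict_readmission(payload: dict) -> dict:
    """Validate a JSON-like payload and return a readmission risk score."""
    missing = [name for name in FEATURES if name not in payload]
    if missing:
        return {"status": 400, "error": f"missing fields: {missing}"}
    row = pd.DataFrame([[payload[name] for name in FEATURES]], columns=FEATURES)
    # Column 1 of predict_proba is the probability of class 1 (readmitted).
    risk = float(api_model.predict_proba(row)[0, 1])
    return {"status": 200, "readmission_risk": risk}


response = predict_readmission({
    "age": 68, "num_prev_visits": 2, "chronic_condition": 1,
    "days_in_hospital": 6, "gender_Male": 1,
})
print(response)
```

Keeping validation and scoring in a plain function makes the logic testable without starting a web server.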

### 📋 Ensuring Compliance:
- Implement access control, role-based permissions.
- Store logs of predictions and user access.
- Maintain compliance via periodic audits (HIPAA, GDPR).
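
A minimal sketch of the first two points, using only the Python standard library, is shown below. The role names, the `audit_record` layout, and the keyed-hash pseudonymization are illustrative assumptions, not requirements taken from HIPAA or GDPR.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone

# Hypothetical set of roles allowed to view predictions.
ALLOWED_ROLES = {"physician", "nurse"}


def can_view_prediction(role: str) -> bool:
    """Role-based check: only listed roles may see a risk score."""
    return role in ALLOWED_ROLES


def pseudonymize(patient_id: str, secret_key: bytes) -> str:
    """Replace the raw identifier with a keyed SHA-256 digest."""
    return hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


def audit_record(patient_id: str, user_role: str, risk: float, secret_key: bytes) -> str:
    """Build one JSON log line describing who saw which prediction, and when."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "patient_ref": pseudonymize(patient_id, secret_key),
        "user_role": user_role,
        "readmission_risk": round(risk, 4),
    }
    return json.dumps(entry, sort_keys=True)


key = b"demo-key-not-for-production"  # assumption: a real key comes from a secrets store
if can_view_prediction("nurse"):
    log_line = audit_record("MRN-000123", "nurse", 0.8172, key)
    print(log_line)
```

The log line carries a stable reference to the patient without storing the raw identifier, so access can be audited later.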

### ⚙️ Monitoring:
- Monitor for performance drift on incoming data.
- Alert when tracked metrics such as recall and precision degrade.
- Schedule retraining if performance falls below a threshold.
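
The monitoring loop above might be sketched as a rolling-window check on recall. The class name `RecallMonitor`, the window size, and the 0.70 threshold are hypothetical choices for illustration. The sketch also assumes that true outcomes become known once the 30-day window has passed.

```python
from collections import deque


class RecallMonitor:
    """Track recall over the most recent labelled predictions."""

    def __init__(self, window_size: int = 500, min_recall: float = 0.70):
        # deque with maxlen drops the oldest outcome once the window is full.
        self.outcomes = deque(maxlen=window_size)
        self.min_recall = min_recall

    def record(self, y_true: int, y_pred: int) -> None:
        self.outcomes.append((y_true, y_pred))

    def current_recall(self):
        # Recall only considers patients who actually were readmitted.
        positives = [pred for true, pred in self.outcomes if true == 1]
        if not positives:
            return None  # no readmissions observed yet in this window
        return sum(positives) / len(positives)

    def needs_retraining(self) -> bool:
        recall = self.current_recall()
        return recall is not None and recall < self.min_recall


monitor = RecallMonitor(window_size=4, min_recall=0.70)
for y_true, y_pred in [(1, 1), (1, 0), (0, 0), (1, 0)]:
    monitor.record(y_true, y_pred)

if monitor.needs_retraining():
    print("Recall below threshold: schedule retraining")
```

Recall is used here rather than accuracy because a model that misses readmissions can still score high accuracy when most patients are not readmitted.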



## 5️⃣ Optimization

### 🛡️ Overfitting Prevention:
**Method: Early Stopping with Cross-Validation**
- Monitor validation loss.
- Stop training after N rounds without improvement.
- Prevents the model from learning noise.
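
As a rough illustration, scikit-learn's `GradientBoostingClassifier` supports early stopping through its `n_iter_no_change` and `validation_fraction` parameters. Note that it monitors a single internal validation split rather than full cross-validation. The data below is synthetic, and the specific values (500 rounds, patience of 10) are assumptions chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic stand-in data; not real patient records.
X_es, y_es = make_classification(
    n_samples=1000, n_features=8, n_informative=5, random_state=42)

early_stop_model = GradientBoostingClassifier(
    n_estimators=500,          # upper bound on boosting rounds
    validation_fraction=0.15,  # share of the training data held out internally
    n_iter_no_change=10,       # N: rounds without improvement before stopping
    tol=1e-4,                  # minimum improvement that counts as progress
    random_state=42,
)
early_stop_model.fit(X_es, y_es)

# n_estimators_ reports how many rounds were kept after early stopping.
print("Boosting rounds actually used:", early_stop_model.n_estimators_)
```

If the number of rounds used is well below the upper bound, training stopped because the validation score had plateaued.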

Also, consider:
- Reducing model complexity.
- Using regularization (L1/L2).
- Increasing dataset size with data augmentation or synthetic data.
