
sd9829/random

Predicting 30 Days Hospital Readmission

Project Overview

Hospital readmissions within 30 days are a substantial challenge for healthcare systems, affecting both clinical quality and financial cost. This project develops a machine-learning framework to predict early readmissions using the Diabetes 130-US Hospitals dataset from the UCI Machine Learning Repository.

The workflow covers data cleaning, feature engineering, ordinal and one-hot encoding, SMOTE oversampling to address the extreme class imbalance, and training and evaluating five models: Logistic Regression, Random Forest, CatBoost, Artificial Neural Network (ANN), and Naive Bayes. CatBoost is the exception to the oversampling step: it is trained on the raw (non-SMOTE) data.
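The README does not say which SMOTE implementation the notebook uses, so the following is only a simplified, pure-Python sketch of the idea behind SMOTE: each synthetic minority sample is placed at a random position on the line segment between a minority sample and one of its nearest minority neighbours. The function name `smote_sketch`, its parameters, and the toy data are hypothetical and are not taken from the notebook.

```python
import math
import random

def smote_sketch(minority, n_new, k=5, seed=0):
    """Simplified SMOTE: interpolate between a minority sample and one of
    its k nearest minority neighbours to create n_new synthetic samples."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Pick a random position on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic

# Toy minority class with two numeric features (made-up values).
minority_points = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote_sketch(minority_points, n_new=6, k=2)
print(len(new_points), "synthetic samples created")
```

Because every synthetic point is an interpolation between two existing minority points, it always falls inside the range already spanned by the minority class. A library implementation handles details this sketch ignores, such as categorical features and efficient neighbour search.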

Results show that while accuracy remains high for most models, recall—which is crucial for clinical risk prediction—varies widely. CatBoost consistently achieves the strongest recall and AUC, making it the most effective model for detecting true readmissions. The project emphasizes the importance of evaluating models beyond accuracy when working with highly imbalanced healthcare datasets.
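To illustrate why accuracy alone can mislead on imbalanced data, here is a small worked example. The confusion-matrix counts are invented for illustration and are not results from this project; the helper name `accuracy_and_recall` is likewise hypothetical.

```python
def accuracy_and_recall(tp, fp, tn, fn):
    """Return (accuracy, recall) from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # Recall: share of true readmissions that the model actually flags.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, recall

# Hypothetical test set: 1,000 encounters, 100 of them true readmissions.
# Model A predicts "not readmitted" for every patient.
acc_a, rec_a = accuracy_and_recall(tp=0, fp=0, tn=900, fn=100)
print(f"Model A: accuracy={acc_a:.2f}, recall={rec_a:.2f}")  # accuracy=0.90, recall=0.00

# Model B flags 60 of the 100 readmissions at the cost of 150 false alarms.
acc_b, rec_b = accuracy_and_recall(tp=60, fp=150, tn=750, fn=40)
print(f"Model B: accuracy={acc_b:.2f}, recall={rec_b:.2f}")  # accuracy=0.81, recall=0.60
```

In this made-up scenario, Model A scores higher on accuracy while missing every readmission, which is why recall is the more informative metric for this task.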


Dataset Source

This project uses the Diabetes 130-US Hospitals dataset from UCI: https://archive.ics.uci.edu/dataset/296/diabetes+130+us+hospitals+for+years+1999+2008

You must download the dataset (diabetic_data.csv) and upload it to your Google Drive before running the Colab notebook.


Repository Contents

├── Predicting_30_Days_Hospital_Readmission_ds633_projectcode.ipynb # Main Colab notebook for execution
├── README.md # Project documentation
└── diabetic_data.csv # Stored in your Google Drive, not in this repo


How to Run This Project in Google Colab

1. Open the Colab Notebook

Upload and open Predicting_30_Days_Hospital_Readmission_ds633_projectcode.ipynb in Google Colab, or use the link provided in the report.


2. Upload the Dataset to Your Google Drive

Place diabetic_data.csv in any folder in your Drive (recommended: MyDrive/).


3. Update the File Path in the Notebook (If Needed)

Inside the notebook, you will find:

file_path = "/content/drive/MyDrive/diabetic_data.csv"

Change this path if your dataset is stored elsewhere in Drive.
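A quick way to catch a wrong path early is to check that the file exists before loading it. The sketch below is an optional addition, not part of the notebook; the helper name `resolve_dataset` is hypothetical, and the path shown is the default from this README.

```python
from pathlib import Path

def resolve_dataset(path_str):
    """Return the dataset path, or raise a descriptive error if it is missing."""
    path = Path(path_str)
    if not path.is_file():
        raise FileNotFoundError(
            f"Dataset not found at {path}. Mount Google Drive and "
            "update file_path to where diabetic_data.csv is stored."
        )
    return path

file_path = "/content/drive/MyDrive/diabetic_data.csv"
try:
    print("Found dataset:", resolve_dataset(file_path))
except FileNotFoundError as err:
    # Outside Colab, or before Drive is mounted, this branch runs.
    print(err)
```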


4. Run the Notebook

Start from the top of the notebook and execute each cell sequentially.

Mount Google Drive:

from google.colab import drive
drive.mount('/content/drive')

Load the dataset. If the df.head() output appears without errors, the dataset loaded correctly.

Perform data cleaning and feature engineering
Split, encode, and apply SMOTE
Train and evaluate all models
Review performance tables and plots

All results will appear directly in the Colab notebook.
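After loading, it can be useful to confirm the class imbalance directly from the label column. The sketch below uses only the standard library and assumes the label column is named `readmitted` with `<30` marking readmission within 30 days, which matches the published UCI file as far as I know; verify this against your copy. The function names and the four-row sample are hypothetical.

```python
import csv
import io
from collections import Counter

def readmission_counts(lines, label_column="readmitted"):
    """Count label values from an open CSV file (or any iterable of lines)."""
    reader = csv.DictReader(lines)
    return Counter(row[label_column] for row in reader)

def positive_rate(counts, positive_label="<30"):
    """Share of rows carrying the positive (30-day readmission) label."""
    total = sum(counts.values())
    return counts.get(positive_label, 0) / total if total else 0.0

# Tiny made-up sample standing in for diabetic_data.csv.
sample = io.StringIO("encounter_id,readmitted\n1,NO\n2,>30\n3,<30\n4,NO\n")
counts = readmission_counts(sample)
print(dict(counts), f"positive rate = {positive_rate(counts):.2f}")

# With the real file (assumed path from this README):
# with open("/content/drive/MyDrive/diabetic_data.csv", newline="") as handle:
#     print(readmission_counts(handle))
```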


Notes

Ensure your dataset remains accessible in Google Drive while running the notebook.

The notebook will not run unless the file path is correct.

CatBoost is trained on the raw (non-SMOTE) data; all other models are trained on the SMOTE-balanced data.


Author

Soumya Dayal — DS 633: Foundations of Data Science and Analytics
Rochester Institute of Technology
