Hospital Readmissions Prediction with XGBoost-Classifier

A machine learning project that predicts hospital readmissions using XGBoost. The project covers data preprocessing, model training, model evaluation, and deployment.

Overview

In this project, we use the XGBoost Classifier algorithm to build a predictive model for hospital readmissions. We start by loading and preprocessing the dataset, splitting it into training, validation, and test sets, and encoding categorical and ordinal features. We then train an XGBoost model, evaluate its performance, and save the trained model for future use.

Getting Started

Before you can run this project, make sure you have the required libraries installed. You can install them using the following commands:

 pip install pandas
 pip install xgboost
 pip install scikit-learn
 pip install fastapi
 pip install "uvicorn[standard]"

Data

The dataset used in this project is loaded from a CSV file named hospital_readmissions.csv. The target variable is readmitted, which indicates whether a patient was readmitted to the hospital. The data preprocessing steps include converting the target variable to a binary format and encoding categorical and ordinal features.

Data Source: The dataset can be obtained from the UCI Machine Learning Repository: Diabetes 130-US Hospitals for Years 1999-2008.
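The target conversion and the training/validation/test split described above could look roughly like the sketch below. It assumes that readmitted holds the strings "yes" and "no" and that the split is 60/20/20; neither detail is stated in this README, and the helper name binarize_and_split is hypothetical.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def binarize_and_split(df: pd.DataFrame, seed: int = 1):
    """Return (train, val, test) frames with a 0/1 `readmitted` column."""
    df = df.copy()
    # Assumption: the raw target is the string "yes" or "no".
    df["readmitted"] = (df["readmitted"] == "yes").astype(int)
    # Assumption: 60/20/20 split. Carve off the test set first,
    # then take a quarter of the remainder as the validation set.
    full_train, test = train_test_split(df, test_size=0.2, random_state=seed)
    train, val = train_test_split(full_train, test_size=0.25, random_state=seed)
    return train, val, test


# Tiny made-up frame standing in for hospital_readmissions.csv.
demo = pd.DataFrame({"n_inpatient": range(10), "readmitted": ["yes", "no"] * 5})
train, val, test = binarize_and_split(demo)
print(len(train), len(val), len(test))
```

Splitting off the test set before anything else keeps it untouched during model selection.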

Preprocessing

Categorical columns: glucose_test, A1Ctest
Ordinal columns: age, medical_specialty, diag_1, diag_2, diag_3, change, diabetes_med

A column transformer applies the appropriate encoding to each group of features: one-hot encoding for the categorical features and ordinal encoding for the ordinal features.
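A minimal sketch of such a column transformer with scikit-learn is shown below. The column lists come from this README; the handling of unseen categories, the passthrough of the remaining numeric columns, and the demo rows are assumptions made for illustration.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

categorical = ["glucose_test", "A1Ctest"]
ordinal = ["age", "medical_specialty", "diag_1", "diag_2", "diag_3",
           "change", "diabetes_med"]

preprocessor = ColumnTransformer(
    transformers=[
        # Assumption: categories unseen during fit are encoded as all zeros.
        ("onehot", OneHotEncoder(handle_unknown="ignore"), categorical),
        # Assumption: unseen ordinal values map to -1 instead of raising.
        ("ordinal",
         OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1),
         ordinal),
    ],
    # Assumption: numeric columns such as n_medications pass through unchanged.
    remainder="passthrough",
)

# Two made-up rows, only to show the fit/transform calls.
demo = pd.DataFrame({
    "glucose_test": ["no", "high"],
    "A1Ctest": ["no", "normal"],
    "age": ["[70-80)", "[50-60)"],
    "medical_specialty": ["Missing", "Other"],
    "diag_1": ["Circulatory", "Other"],
    "diag_2": ["Respiratory", "Other"],
    "diag_3": ["Other", "Diabetes"],
    "change": ["no", "yes"],
    "diabetes_med": ["yes", "no"],
    "n_medications": [18, 13],
})
X = preprocessor.fit_transform(demo)
print(X.shape)
```

Fitting the transformer on the training split only, and reusing it for validation, test, and serving, keeps the encodings consistent across all stages.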

Training

We train an XGBoost model with the following hyperparameters:

Learning rate (eta): 0.1
Maximum depth of trees: 4
Minimum child weight: 5
Objective: binary:logistic
Random seed: 1
Gamma: 1

Evaluation metric: AUC (Area Under the Receiver Operating Characteristic curve)

The number of boosting rounds is set to 105. We train the model using the training data and evaluate it on the validation data.

K-Fold Cross-Validation

To ensure the model's robustness, we perform K-Fold cross-validation with K=10. This helps us assess the model's performance on different subsets of the data. The mean AUC and standard deviation of AUC across the folds are reported to provide a better understanding of model performance.

Testing

After cross-validation, we train the final model using the entire training dataset and evaluate it on the test set. The AUC score is reported as the final performance metric.

Saving the model

The trained XGBoost model and the preprocessing transformers are saved to a binary file named xgb_eta01.bin using the pickle module. This allows for reusing the model without the need for retraining.
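A minimal sketch of the saving step follows. It assumes the two objects are stored as a single (preprocessor, model) tuple, which matches how the Usage example unpacks them; the helper name save_artifacts and the placeholder objects are hypothetical.

```python
import os
import pickle
import tempfile


def save_artifacts(preprocessor, model, path="xgb_eta01.bin"):
    """Pickle the preprocessor and the model together in one file."""
    # Storing one tuple lets both objects be unpacked together on load.
    with open(path, "wb") as f_out:
        pickle.dump((preprocessor, model), f_out)


# Round trip with placeholder objects standing in for the real artifacts.
path = os.path.join(tempfile.mkdtemp(), "xgb_eta01.bin")
save_artifacts({"step": "preprocessor"}, {"step": "model"}, path)
with open(path, "rb") as f_in:
    preprocessor, model = pickle.load(f_in)
print(preprocessor, model)
```

Pickle files should only be loaded from trusted sources, because unpickling can execute arbitrary code.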

testing.mov

Usage

You can use the saved model for making predictions on new data. Here's an example of how to load the model and make predictions:

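import pickle

import pandas as pd
import xgboost as xgb

# Load the saved preprocessor and model
with open('xgb_eta01.bin', 'rb') as f_in:
    preprocessor, model = pickle.load(f_in)

# New data must have the same columns as the training data.
# 'new_patients.csv' is a placeholder file name.
df_new = pd.read_csv('new_patients.csv')

# Encode the features and reuse the feature names stored in the trained booster
# (this assumes the booster was trained on a DMatrix with named features)
X_new = preprocessor.transform(df_new)
X_new_dmat = xgb.DMatrix(X_new, feature_names=model.feature_names)

# With the binary:logistic objective, predict() returns readmission probabilities
y_pred = model.predict(X_new_dmat)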

FastAPI Application

The FastAPI application includes two endpoints:

1. Individual Prediction (/Individual)

This endpoint allows you to make predictions for individual patient data. The input data is provided as a JSON request body in the following format:

{
    "age": "string",
    "time_in_hospital": int,
    "n_lab_procedures": int,
    "n_procedures": int,
    "n_medications": int,
    "n_outpatient": int,
    "n_inpatient": int,
    "n_emergency": int,
    "medical_specialty": "string",
    "diag_1": "string",
    "diag_2": "string",
    "diag_3": "string",
    "glucose_test": "string",
    "A1Ctest": "string",
    "change": "string",
    "diabetes_med": "string"
}

The application preprocesses the input data and makes a prediction. It returns the readmission probability and a binary readmitted status for the patient.

2. Group Prediction (/Group)

This endpoint accepts an uploaded CSV file containing multiple patient records. The file should have the same structure as the training data. The application preprocesses the data, makes predictions, and returns the readmission probability and binary readmitted status for each patient record. It also reports the proportion of records in the group that are predicted as readmitted.
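The per-record results and the group proportion could be assembled as in the sketch below. The helper name summarize_group, the response keys, and the 0.5 threshold are assumptions for illustration; the probabilities would come from the model in the real application.

```python
def summarize_group(probabilities, threshold=0.5):
    """Build per-record results and the share of positive predictions."""
    predictions = [
        {"readmission_probability": float(p), "readmitted": bool(p >= threshold)}
        for p in probabilities
    ]
    positives = sum(item["readmitted"] for item in predictions)
    # Guard against an empty upload to avoid dividing by zero.
    proportion = positives / len(predictions) if predictions else 0.0
    return {"predictions": predictions, "positive_proportion": proportion}


# Made-up probabilities standing in for model output.
summary = summarize_group([0.2, 0.7, 0.55, 0.1])
print(summary["positive_proportion"])
```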

Files for Python Development and Containerization

Dockerfile: Contains instructions to set up the container environment, install software, and configure the application. Ensures consistent application deployment in isolated containers.

FROM python:3.9-slim

RUN pip install pipenv

WORKDIR /app
COPY ["Pipfile", "Pipfile.lock", "./"]

RUN pipenv install --system --deploy

COPY ["main.py", "xgb_eta01.bin", "./"]

EXPOSE 8000

ENTRYPOINT ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]

Pipfile: Lists project dependencies and their versions. Used with tools like Pipenv for managing Python project environments.

Pipfile.lock: Lock file generated by Pipenv, ensuring that the same package versions are installed when recreating a virtual environment. Guarantees reproducible Python environments.

Conclusion

This project demonstrates how to create a FastAPI application for making hospital readmission predictions using a pre-trained XGBoost model and data preprocessing transformers. The application provides endpoints for both individual and group predictions, making it useful for various scenarios in healthcare analytics.

Demonstration

demo.mov
