Skip to content

A Final Project for the course EE660- Machine Learning. Implementation of classification model to accurately categorize incoming unknown datapoints with accuracy greater than 65 percent . Identified main features contributing to readmission of patients back to hospital using KNN, LR, Decision Trees,Random Forest, AdaBoost, Stochastic Gradient cl…

Notifications You must be signed in to change notification settings

sbsreedh/Identification-of-Risk-of-readmission-of-Diabetic-patients

Repository files navigation

Identification-of-Risk-of-readmission-of-Diabetic-patients

Background

It was reported that in 2011 more than 3.3 million patients were readmitted in the US within 30 days of being discharged, and they were associated with about $41 billion in hospital costs. The need for readmission indicates that inadequate care was provided to the patient at the time of first admission. The readmission rate has become an important metric measuring the overall quality of a hospital.

Diabetes is the 7th leading cause of death and affects about 23.6 million people in the US. 1.4 million Americans are diagnosed with diabetes every year. Hospital readmission being a major concern in diabetes care, over 250 million dollars was spent on treatment of readmitted diabetic patients in 2011. Early identification of patients facing a high risk of readmission can enable healthcare providers to conduct additional investigations and possibly prevent future readmissions.

In this project, I build a machine learning classifier model to predict diabetes patients with high risk of readmission. Note that higher sensitivity (recall) is more desirable for hospitals because it is more crucial to correctly identify "high risk" patients who are likely to be readmitted than identifying "low risk" patients. The machine algorithms like KNN, Logistic Regression, Decision Trees, Random Forest, Stochastic Gradient Descent Classifier, AdaBoost were used to train the model. Later t the optimised model after tuning the hyper parameters of the models were validated on the test to obtaoin the best accuracy .

Important features were identified using Random forest and Logistic Algorithm.

Please open the Jupyter Notebook file to see the details.

Dataset Description

The dataset represents 10 years (1999-2008) of clinical care at 130 US hospitals and integrated delivery networks. It includes 50 features representing 101766 diabetes patient and hospital outcomes. Information was extracted from the database for encounters that satisfied the following criteria:

It is an inpatient encounter (a hospital admission). It is a diabetic encounter, that is, one during which any kind of diabetes was entered to the system as a diagnosis. The length of stay was at least 1 day and at most 14 days. Laboratory tests were performed during the encounter. Medications were administered during the encounter. The data contains such attributes as patient number, race, gender, age, admission type, time in hospital, medical specialty of admitting physician, number of lab test performed, HbA1c test result, diagnosis, number of medication, diabetic medications, number of outpatient, inpatient, and emergency visits in the year before the hospitalization, etc.

Source: UCI Machine Learning Repository, https://archive.ics.uci.edu/ml/datasets/Diabetes+130-US+hospitals+for+years+1999-2008

About

A Final Project for the course EE660- Machine Learning. Implementation of classification model to accurately categorize incoming unknown datapoints with accuracy greater than 65 percent . Identified main features contributing to readmission of patients back to hospital using KNN, LR, Decision Trees,Random Forest, AdaBoost, Stochastic Gradient cl…

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published