Identification-of-Risk-of-readmission-of-Diabetic-patients

Background

In 2011, more than 3.3 million patients in the US were reportedly readmitted within 30 days of discharge, and these readmissions were associated with about $41 billion in hospital costs. A readmission can indicate that the care provided during the first admission was inadequate, which is why the readmission rate has become an important metric of a hospital's overall quality.

Diabetes is the 7th leading cause of death and affects about 23.6 million people in the US. 1.4 million Americans are diagnosed with diabetes every year. Hospital readmission is a major concern in diabetes care: over $250 million was spent on treating readmitted diabetic patients in 2011. Early identification of patients at high risk of readmission can enable healthcare providers to conduct additional investigations and possibly prevent future readmissions.

In this project, I build a machine learning classifier to identify diabetic patients at high risk of readmission. Higher sensitivity (recall) is more desirable for hospitals, because correctly identifying "high risk" patients who are likely to be readmitted is more crucial than identifying "low risk" patients. Several machine learning algorithms were used to train models: KNN, Logistic Regression, Decision Trees, Random Forest, Stochastic Gradient Descent Classifier, and AdaBoost. After hyperparameter tuning, the optimised models were evaluated on the test set to find the one with the best accuracy.
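To illustrate why recall is the metric to watch here, the following is a minimal sketch in plain Python using made-up predictions for 100 hypothetical patients, 10 of whom are readmitted. The helper names (`confusion_counts`, `accuracy`, `recall`) and the toy numbers are assumptions for illustration only; they are not taken from the notebook.

```python
def confusion_counts(y_true, y_pred):
    """Count true positives, false positives, false negatives, true negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of all patients classified correctly."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def recall(y_true, y_pred):
    """Fraction of truly readmitted patients that the model flagged."""
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Toy data (hypothetical): 1 = readmitted, 0 = not readmitted.
y_true = [1] * 10 + [0] * 90

# Model A never flags anyone as high risk.
pred_a = [0] * 100

# Model B flags 8 of the 10 readmitted patients, plus 20 false alarms.
pred_b = [1] * 8 + [0] * 2 + [1] * 20 + [0] * 70

for name, pred in [("A", pred_a), ("B", pred_b)]:
    print(f"model {name}: accuracy={accuracy(y_true, pred):.2f} "
          f"recall={recall(y_true, pred):.2f}")
```

In this toy case, the model that flags nobody reaches 90% accuracy with zero recall, while the second model has lower accuracy (78%) but catches 8 of the 10 readmitted patients. On an imbalanced problem like readmission, accuracy alone can therefore hide a model that is useless to the hospital.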

Important features were identified using Random Forest and Logistic Regression.
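One common way to rank features with these two models is sketched below, assuming scikit-learn and NumPy are available. It runs on synthetic data, so the ranking it produces says nothing about the real dataset; the feature names are placeholders, and the notebook may use a different procedure.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Placeholder feature names (hypothetical); the data itself is random.
feature_names = ["feature_a", "feature_b", "feature_c"]
X = rng.normal(size=(500, 3))

# Synthetic label driven mostly by the first feature, slightly by the second.
noise = rng.normal(scale=0.5, size=500)
y = (2.0 * X[:, 0] + 0.5 * X[:, 1] + noise > 0).astype(int)

# Scale features so logistic regression coefficients are comparable.
X_scaled = StandardScaler().fit_transform(X)

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_scaled, y)

logreg = LogisticRegression(max_iter=1000)
logreg.fit(X_scaled, y)

# Random Forest: impurity-based importances, which sum to 1.
rf_rank = sorted(zip(feature_names, forest.feature_importances_),
                 key=lambda pair: pair[1], reverse=True)

# Logistic Regression: absolute coefficient size on standardised inputs.
lr_rank = sorted(zip(feature_names, np.abs(logreg.coef_[0])),
                 key=lambda pair: pair[1], reverse=True)

print("Random Forest ranking:", [name for name, _ in rf_rank])
print("Logistic Regression ranking:", [name for name, _ in lr_rank])
```

Comparing the two rankings is a useful sanity check: a feature that both a tree ensemble and a linear model rate highly is less likely to be an artefact of one model family.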

Please open the Jupyter Notebook file to see the details.

Dataset Description

The dataset represents 10 years (1999-2008) of clinical care at 130 US hospitals and integrated delivery networks. It includes 50 features describing patient and hospital outcomes across 101,766 diabetic patient encounters. Information was extracted from the database for encounters that satisfied the following criteria:

1. It is an inpatient encounter (a hospital admission).
2. It is a diabetic encounter, that is, one during which any kind of diabetes was entered into the system as a diagnosis.
3. The length of stay was at least 1 day and at most 14 days.
4. Laboratory tests were performed during the encounter.
5. Medications were administered during the encounter.

The data contains attributes such as patient number, race, gender, age, admission type, time in hospital, medical specialty of the admitting physician, number of lab tests performed, HbA1c test result, diagnosis, number of medications, diabetic medications, and the number of outpatient, inpatient, and emergency visits in the year before the hospitalization.
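As a quick sanity check of the length-of-stay criterion, the sketch below reads a tiny inline CSV that stands in for the real data file and keeps only rows with a stay of 1 to 14 days. The column names (`time_in_hospital`, `readmitted`), the label values, and the choice of "readmitted within 30 days" as the positive class are assumptions made for this example; check them against the actual file and the notebook before reusing the code.

```python
import csv
import io

# Tiny inline sample standing in for the real data file.
# Column names and label values are assumed for illustration.
SAMPLE_CSV = """encounter_id,time_in_hospital,num_medications,readmitted
1,3,12,NO
2,14,20,<30
3,1,3,>30
4,15,9,<30
"""


def meets_stay_criterion(row, min_days=1, max_days=14):
    """Return True if the length of stay is within the 1-14 day window."""
    days = int(row["time_in_hospital"])
    return min_days <= days <= max_days


def load_encounters(handle):
    """Read CSV rows as dicts and keep those meeting the stay criterion."""
    return [row for row in csv.DictReader(handle) if meets_stay_criterion(row)]


encounters = load_encounters(io.StringIO(SAMPLE_CSV))

# Assumed binary target: 1 if readmitted within 30 days, else 0.
labels = [1 if row["readmitted"] == "<30" else 0 for row in encounters]

print(len(encounters), "encounters kept; labels:", labels)
```

To run the same check on a real file, pass an open file handle to `load_encounters` in place of the `io.StringIO` wrapper.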

Source: UCI Machine Learning Repository, https://archive.ics.uci.edu/ml/datasets/Diabetes+130-US+hospitals+for+years+1999-2008
