An Ensemble Model for Predicting the Onset of Diabetes using NHANES Data

By John Semerdjian & Spencer Frank

Code

Our models are contained in the NHANES.ipynb notebook. To run the notebook, create a virtual environment and install the required modules. The commands below assume virtualenvwrapper is installed, since it provides mkvirtualenv and workon.

# create a virtual environment, "nhanes"
$ mkvirtualenv --python=/usr/local/bin/python3 nhanes
$ workon nhanes

# install required modules
$ pip install -r requirements.txt

# download/merge data
$ python ./bootstrap.py

# start ipython notebook
$ ipython notebook

Video & Report

You can find our report here.

Abstract

Predicting disease onset from patient survey and lifestyle data is quickly becoming an important tool for identifying a disease before it progresses. In this study, data from the National Health and Nutrition Examination Survey (NHANES) questionnaire are used to predict the onset of diabetes. An ensemble model that combines the outputs of several classification algorithms was developed to predict the onset of diabetes from 16 features. The ensemble model achieved an AUC of 0.834, indicating strong discriminative performance.
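The abstract does not say which classifiers were combined or how their outputs were merged, so the sketch below is only an illustration of the general idea, not the notebook's actual method. It assumes a stacking setup: three scikit-learn classifiers (chosen arbitrarily here) produce out-of-fold probabilities, a logistic regression meta-model is trained on them, and the result is scored with AUC. The data is synthetic with 16 features, standing in for the NHANES variables, so the printed AUC has no relation to the 0.834 reported above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict, train_test_split

# Synthetic stand-in for the NHANES data (assumption): 16 features, binary label.
X, y = make_classification(
    n_samples=2000, n_features=16, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Base classifiers are illustrative choices, not the ones used in the notebook.
base_models = [
    LogisticRegression(max_iter=1000),
    RandomForestClassifier(n_estimators=100, random_state=0),
    GradientBoostingClassifier(random_state=0),
]

# Out-of-fold probabilities keep the meta-model from seeing predictions
# that a base model made on its own training rows.
train_meta = np.column_stack([
    cross_val_predict(m, X_train, y_train, cv=5, method="predict_proba")[:, 1]
    for m in base_models
])

# Refit each base model on the full training set to score the held-out test set.
test_meta = np.column_stack([
    m.fit(X_train, y_train).predict_proba(X_test)[:, 1] for m in base_models
])

# The meta-model combines the base-model outputs into one ensemble score.
meta_model = LogisticRegression().fit(train_meta, y_train)
ensemble_scores = meta_model.predict_proba(test_meta)[:, 1]

auc = roc_auc_score(y_test, ensemble_scores)
print(f"ensemble AUC on synthetic data: {auc:.3f}")
```

AUC is computed from the predicted probabilities rather than hard class labels, because it measures how well the scores rank positive cases above negative ones across all thresholds.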

Features and Descriptions

Additional Variables