Set of utilities to manipulate datasets found in health care databases

juanerolon/health-data-manip

Machine Learning with Healthcare Data

Collection of individual scripts and IPython notebooks illustrating my scratch development work towards the capstone project for Udacity's Machine Learning Nanodegree. The scripts are entirely my own work; they do not reflect the finished product and do not constitute a report of any kind. The completed project and report will be available in a separate repository.

Installation

Fork or clone the GitHub repository and make sure you have a working Python 3 installation. To get the most out of the scripts, install the NumPy, SciPy, pandas and Matplotlib libraries in your Python environment. The easiest way to add them is with the tools provided by the Anaconda Python distribution, https://www.anaconda.com.

$ git clone https://github.com/juanerolon/health-data-manip.git

Manual installation of the core libraries within a conda environment (if they are not already present):

$ conda install numpy

$ conda install scipy

$ conda install matplotlib

$ conda install pandas

Requisites

Libraries:

  • scikit-learn
  • TensorFlow with GPU support
  • Keras
  • XGBoost
  • Imbalanced-learn
  • Feather format
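
Before running the scripts, it can help to confirm that the libraries listed above are importable. The following is a minimal sketch, not part of the repository: the helper name `find_missing` is hypothetical, and the import names in `REQUIRED_MODULES` are assumptions about how each listed library is imported (for example, `imblearn` for Imbalanced-learn and `feather` for the Feather format package).

```python
import importlib.util

# Assumed import names for the libraries listed under Requisites.
# Adjust them if your environment uses different packages.
REQUIRED_MODULES = ["sklearn", "tensorflow", "keras", "xgboost", "imblearn", "feather"]


def find_missing(modules):
    """Return the top-level module names that cannot be located in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing libraries:", ", ".join(missing))
    else:
        print("All listed libraries were found.")
```

The check only locates each module without importing it, so it does not verify versions or GPU support.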

Usage

To run the code inside the IPython notebooks, open the file with Jupyter. From the command line:

$ jupyter notebook notebook.ipynb

To run an individual script, use your favorite IDE or run it from the command line:

$ python scriptname.py

Datasets

The datasets in health-data-manip/datasets/ are in SAS XPT format.

The raw datasets and their descriptions are publicly available and were obtained from the National Health and Nutrition Examination Survey (NHANES) 2013. For more information visit:

https://wwwn.cdc.gov/nchs/nhanes/continuousnhanes/Overview.aspx?BeginYear=2013
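
Since the datasets are stored in SAS XPT (transport) format, one way to load them is with `pandas.read_sas`. The sketch below is an illustration under assumptions, not the code used in the repository's scripts: the helper names `find_xpt_files` and `load_xpt` are hypothetical, and the directory passed in is whatever local path holds your copy of `datasets/`.

```python
from pathlib import Path


def find_xpt_files(dataset_dir):
    """Return the sorted paths of SAS transport (.xpt) files in dataset_dir."""
    return sorted(p for p in Path(dataset_dir).iterdir() if p.suffix.lower() == ".xpt")


def load_xpt(path):
    """Read one SAS XPT file into a pandas DataFrame."""
    # Imported here so that listing files works even without pandas installed.
    import pandas as pd

    # format="xport" selects the SAS transport reader explicitly.
    return pd.read_sas(str(path), format="xport")
```

For example, `frames = {p.stem: load_xpt(p) for p in find_xpt_files("datasets")}` would build one DataFrame per file, keyed by file name. Text columns are returned as raw bytes unless an `encoding` argument is passed to `pd.read_sas`.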

Contributing

Feel free to fork or clone the repository, or to submit pull requests. By participating in this project, you agree to abide by open source standards and the code of conduct.

License

This project is a public domain work. However, it must be credited to @Juan E Rolon and must not be submitted as an exact copy for Udacity's Machine Learning Nanodegree; doing so would constitute plagiarism on your part and hurt your project evaluation. In any other case, feel free to adapt the code, modify it, or use it as you see fit.

About

The scripts and notebooks in this project are part of the project submission materials developed to satisfy the capstone project completion for Udacity's Machine Learning Nanodegree.
