Data analysis of heart risk key indicators using data from the 2020 annual CDC survey of 400k adults related to their health status.

mriosrivas/heart-risk-prediction-machinelearning

Heart Risk Key Indicators EDA and Machine Learning Modeling

This repository analyzes key indicators of heart disease risk using data from the 2020 annual CDC survey of 400k adults on their health status. The data was obtained from the Personal Key Indicators of Heart Disease dataset on Kaggle.

The main purpose of this project is to predict a person's heart risk from information about their physical and mental health. The problem I will be solving is therefore a binary classification one.

Developing the model involved several steps, including exploratory data analysis, model selection, validation and interpretability. Each step in the following list has its own Jupyter Notebook:

  1. Exploratory Data Analysis

  2. Machine Learning Model: Logistic Regression

  3. Machine Learning Model: Decision Trees

  4. Machine Learning Model: XGBoost

  5. Best Model Selection

  6. Model Usage Example

This project lays the foundation for deploying the model on a web server using Flask. The following Jupyter Notebook will describe how to do so.

It is also useful to have a script that can train a single model or all models at once. The script train.py serves this purpose.

To train all models at once, run the following command:

python train.py --model all

To train a single model instead, set the model parameter to logistic, random_forrest or xgboost. When training finishes, you will have a set of bin files containing the machine learning models and a dict_vectorizer, which transforms inputs into the format the models expect.
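The way a `--model` option like this selects what to train can be sketched with `argparse`. This is only an illustration, not the actual contents of train.py: the option values come from the text above, while the function names `parse_args` and `models_to_train` and the default of `all` are assumptions made for the example.

```python
import argparse

# Values documented for the --model option; "random_forrest" is spelled
# exactly as the option is written in this README.
MODEL_CHOICES = ["all", "logistic", "random_forrest", "xgboost"]


def parse_args(argv=None):
    """Parse the command line (hypothetical helper, not taken from train.py)."""
    parser = argparse.ArgumentParser(
        description="Illustrative sketch of a model-selection flag."
    )
    parser.add_argument(
        "--model",
        choices=MODEL_CHOICES,
        default="all",  # assumption: the real script may require the flag
        help="which model to train, or 'all' for every model",
    )
    return parser.parse_args(argv)


def models_to_train(selection):
    """Expand 'all' into the individual model names; otherwise return one name."""
    if selection == "all":
        return [name for name in MODEL_CHOICES if name != "all"]
    return [selection]


if __name__ == "__main__":
    args = parse_args()
    for name in models_to_train(args.model):
        # A real script would fit the model and save it to a bin file here.
        print(f"Would train: {name}")
```

Restricting the flag with `choices` means a misspelled model name is rejected with a usage message before any training starts.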
