This repository contains a data analysis of key heart-disease risk indicators, based on the 2020 annual CDC survey of 400k adults about their health status. The data was obtained from the Personal Key Indicators of Heart Disease dataset on Kaggle.
The main purpose of this project is to detect a person's heart-disease risk from information about their physical and mental health. The problem I am solving is therefore a binary classification problem.
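As a minimal sketch of what "binary classification" means here, each survey row can be split into a dictionary of features and a 0/1 label. The column names used below (HeartDisease, BMI, Smoking, SleepTime), the "Yes"/"No" label encoding and the helper split_target are assumptions for illustration, and the records are made up rather than taken from the dataset.

```python
# Made-up records shaped like rows of the survey data. The column names and
# the "Yes"/"No" label values are assumptions for illustration only.
records = [
    {"HeartDisease": "No", "BMI": 16.6, "Smoking": "Yes", "SleepTime": 5.0},
    {"HeartDisease": "Yes", "BMI": 28.9, "Smoking": "Yes", "SleepTime": 8.0},
    {"HeartDisease": "No", "BMI": 20.3, "Smoking": "No", "SleepTime": 7.0},
]


def split_target(rows, target="HeartDisease"):
    """Hypothetical helper: separate feature dicts from a 0/1 label ("Yes" -> 1)."""
    features, labels = [], []
    for row in rows:
        row = dict(row)  # copy so the caller's records are not mutated
        label = row.pop(target)
        if label not in ("Yes", "No"):
            raise ValueError(f"unexpected label {label!r}")
        labels.append(1 if label == "Yes" else 0)
        features.append(row)
    return features, labels


X, y = split_target(records)
print(y)  # [0, 1, 0]
```

Keeping the features as plain dictionaries is convenient because a dictionary-based vectorizer can later turn them into a numeric matrix.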
Developing the model involved several stages, including exploratory data analysis, model selection, validation, and interpretability. The following list includes a Jupyter Notebook for each step:
This project also lays the foundations for deploying the model on a web server using Flask. The following Jupyter Notebook describes how to do so.
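Independently of that notebook, a minimal Flask prediction service might look like the sketch below. It assumes the vectorizer and a model were pickled into separate .bin files, that the client POSTs one person's data as a JSON object, and that 0.5 is the decision threshold; load_artifacts, create_app, the /predict route and the response keys are hypothetical names, not part of this repository.

```python
import pickle

from flask import Flask, jsonify, request


def load_artifacts(dv_path, model_path):
    """Load a pickled dict_vectorizer and model (assumed to be separate files)."""
    with open(dv_path, "rb") as f_in:
        dv = pickle.load(f_in)
    with open(model_path, "rb") as f_in:
        model = pickle.load(f_in)
    return dv, model


def create_app(dv, model, threshold=0.5):
    """Build a Flask app exposing a hypothetical POST /predict endpoint."""
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        person = request.get_json()  # one person's features as a JSON object
        X = dv.transform([person])  # dict -> feature matrix the model expects
        proba = float(model.predict_proba(X)[0][1])  # probability of class 1
        return jsonify({
            "heart_disease_probability": proba,
            "heart_disease": proba >= threshold,
        })

    return app
```

To serve it, you could call create_app(*load_artifacts(dv_path, model_path)).run(host="0.0.0.0", port=9696); the port is an arbitrary choice. Passing the artifacts into create_app, instead of loading them at import time, keeps the endpoint easy to test with stand-in objects.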
Finally, it is convenient to have a script that can train each model individually or all of them at once. The script train.py serves this purpose.
To train all models at once, run the following command:
python train.py --model all
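One plausible way to implement the --model flag is with argparse, as sketched here. The accepted values (all, logistic, random_forrest, xgboost) are the ones this README documents; parse_args, models_to_train and main are hypothetical names, and the training step itself is only a placeholder print.

```python
import argparse

# Model names accepted by --model, as documented in this README.
MODEL_CHOICES = ("logistic", "random_forrest", "xgboost")


def parse_args(argv=None):
    """Parse command-line options; argv=None means use sys.argv."""
    parser = argparse.ArgumentParser(description="Train heart-risk models.")
    parser.add_argument(
        "--model",
        required=True,
        choices=("all",) + MODEL_CHOICES,
        help="model to train, or 'all' to train every model",
    )
    return parser.parse_args(argv)


def models_to_train(choice):
    """Expand the --model value into the list of model names to train."""
    return list(MODEL_CHOICES) if choice == "all" else [choice]


def main(argv=None):
    names = models_to_train(parse_args(argv).model)
    for name in names:
        # Placeholder: a real script would fit the model here and save it.
        print(f"Training {name}...")
    return names
```

A script built this way would call main() under an if __name__ == "__main__": guard. Using choices makes argparse reject misspelled model names with a usage error instead of failing later during training.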
If you want to train a single model instead, set the --model parameter to logistic, random_forrest, or xgboost. When training finishes, you will have a series of .bin files containing the trained machine learning models and a dict_vectorizer, which transforms the inputs into the format the models expect.