Skip to content

richasethi3/CVD_Prediction

Repository files navigation

Development and Validation of Cardiovascular Disease Risk Prediction Tool: A Machine Learning Approach

The webapp for this work can be accessed at http://cvd-prediction-app.herokuapp.com/

The goal of this project was to develop a cardiovascular disease risk assessment screening tool to predict risk of heart attack and stroke among adults using self-reported information. This would in turn could be used to get a probability of an individual developing cardiovascular disease based on a short questionnaire and evaluate risk score.

The dataset was obtained from the National Health and Nutrition Examination Survey (NHANES) – a population-based program of studies designed to assess the health and nutritional status of adults and children in the United States. 5-year (2011-2016) demographics, examination, laboratory and questionnaire data was collected from NHANES for 60, 936 individuals.

After data cleaning and processing, the target feature CVD_risk showed unequal distribution of classes with 12,878 negative classes and 1,611 positive classes, corresponding to 1:8 ratio between positive and negative class. Therefore, specialized sampling techniques were employed to handle class imbalance.

Image

The performance of four classification algorithms – Logistic Regression, Support Vector Machine Random Forest, and k-Nearest Neighbors were analyzed using the default scikit-learn APIs. Figure below shows the Precision Recall curve and the ROC curve for the validation set with the most optimum model – Logistic Regression with SMOTEENN sampling. It showed 90% recall and 20% precision.

Image

The top 5 features for the most optimum model were age, diabetes, systolic blood pressure, smoking and family history, which were employed in the development of a cardiovascular disease prediction tool, which takes user input and predicts the disease risk based.

This approach can help transform the health insurance industry in two ways: they can provide new sources of insight to simplify and streamline current underwriting, and they can enhance the understanding of risk to enable more refined, granular categorizations of risk.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published