
raka-raprast/heart-disease-prediction

Heart Disease Prediction

The prospect of using machine learning to predict heart disease

Dataset

Heart Disease UCI : https://www.kaggle.com/ronitf/heart-disease-uci

Background

Heart disease is a condition caused by many factors, and it is considered one of the deadliest diseases in the world. According to the WHO, it ranks first as "the world’s biggest killer": ischaemic heart disease is responsible for 16% of the world’s total deaths. Since 2000, this disease has seen the largest increase in deaths, rising by more than 2 million to 8.9 million deaths in 2019. Predicting heart disease could therefore support further research aimed at preventing late treatment for people with symptoms of heart disease.

Goals

*Predict heart disease using machine learning

*Choose the best model to predict heart disease

*Compare the effectiveness of Gaussian Naive Bayes, Random Forest and Decision Tree

*Evaluate the models using ROC AUC

Data Dictionary

*age : age of the patient

*sex : male or female

*cp : chest pain type

*trestbps : resting blood pressure (mm Hg)

*chol : serum cholesterol (mg/dl)

*fbs : fasting blood sugar > 120 mg/dl (1 = true, 0 = false)

*restecg : resting electrocardiographic results

*thalach : maximum heart rate achieved

*exang : exercise induced angina

*oldpeak : ST depression induced by exercise relative to rest

*slope : the slope of the peak exercise ST segment

*ca : number of major vessels colored by fluoroscopy

*thal : normal/fixed defect/reversible defect

*target : sick or not
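As a minimal sketch, assuming pandas and a CSV whose header row uses the column names above (the Kaggle download is assumed here to be named `heart.csv`), the data dictionary can double as a schema check when loading the data. `load_heart_data` is a hypothetical helper, not code from this repository.

```python
import pandas as pd

# Column names as listed in the data dictionary.
FEATURE_COLUMNS = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal",
]
TARGET_COLUMN = "target"


def load_heart_data(path):
    """Load the CSV and split it into features X and labels y.

    Hypothetical helper: assumes a header row that uses the
    data-dictionary names above.
    """
    df = pd.read_csv(path)
    # Fail early if any expected column is absent from the file.
    missing = set(FEATURE_COLUMNS + [TARGET_COLUMN]) - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    X = df[FEATURE_COLUMNS]
    y = df[TARGET_COLUMN]
    return X, y
```

Usage would then be `X, y = load_heart_data("heart.csv")`, with the path adjusted to wherever the dataset was saved.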

Exploratory Data Analysis

Distribution of age and sex by target class

Variation of age for each target class

Correlation Heatmap

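The three plots above can be backed by plain numeric summaries. The sketch below assumes pandas and a DataFrame `df` with the columns from the data dictionary; `eda_summaries` is a hypothetical helper, and the commented seaborn call is one possible way to draw a correlation heatmap, not necessarily how the original figure was produced.

```python
import pandas as pd


def eda_summaries(df):
    """Return numeric counterparts of the three EDA plots (hypothetical helper)."""
    # Number of patients for each (sex, target) combination.
    sex_by_target = pd.crosstab(df["sex"], df["target"])
    # Age statistics (count, mean, std, quartiles) for each target class.
    age_by_target = df.groupby("target")["age"].describe()
    # Pearson correlation of every column with the target, strongest first.
    corr_with_target = (
        df.corr()["target"]
        .drop("target")
        .sort_values(key=lambda s: s.abs(), ascending=False)
    )
    # A heatmap of the full matrix could be drawn with, for example:
    # seaborn.heatmap(df.corr(), annot=True, cmap="coolwarm")
    return sex_by_target, age_by_target, corr_with_target
```

Sorting by absolute value keeps the sign of each correlation while ranking features by strength, which is usually what one reads off a heatmap.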

Modelling

*The data was split into training and test sets, with 30% of the data used for training

*The models used in this analysis are Gaussian Naive Bayes, Decision Tree and Random Forest

*Evaluation uses the area under the Receiver Operating Characteristic curve (ROC AUC)

*Validation uses K-Fold Cross Validation
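A minimal scikit-learn sketch of the modelling steps listed above, assuming scikit-learn is the library used (the README does not say). `compare_models` is a hypothetical helper, and the synthetic `make_classification` data only stands in for the real features. The split keeps `train_size=0.3` to follow the README literally; if 30% was meant to be the test portion, `test_size=0.3` is the usual way to express that.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier


def compare_models(X, y, train_size=0.3, n_splits=5, seed=42):
    """Fit the three models and report hold-out and cross-validated ROC AUC."""
    # train_size=0.3 mirrors the README wording (assumption, see lead-in).
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, train_size=train_size, stratify=y, random_state=seed
    )
    models = {
        "Gaussian Naive Bayes": GaussianNB(),
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
        "Random Forest": RandomForestClassifier(random_state=seed),
    }
    # Stratified K-fold keeps the class balance similar in every fold.
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        # ROC AUC needs a score for the positive class, not hard labels.
        test_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
        # cross_val_score refits a fresh clone of the model on each fold.
        cv_auc = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
        results[name] = {"test_auc": test_auc, "cv_auc_mean": cv_auc.mean()}
    return results


if __name__ == "__main__":
    # Synthetic stand-in data; replace with the X, y loaded from the Kaggle CSV.
    X, y = make_classification(n_samples=300, n_features=13, random_state=0)
    for name, r in compare_models(X, y).items():
        print(f"{name}: test AUC={r['test_auc']:.3f}, CV AUC={r['cv_auc_mean']:.3f}")
```

Here cross-validation runs on the full dataset; running it on the training split only is an equally reasonable choice, and the README does not specify which was used.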

Evaluation

The models are evaluated with the area under the Receiver Operating Characteristic curve (ROC AUC).

Based on this evaluation, Gaussian Naive Bayes achieves a higher ROC AUC than the other two models.
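ROC AUC measures how well a model ranks sick patients above healthy ones across all decision thresholds, which is not the same as accuracy at a single threshold. A dependency-free sketch of that interpretation (the hypothetical `roc_auc` helper below) compares every positive/negative pair: 1.0 means perfect ranking and 0.5 is no better than chance.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive example is scored
    higher than a random negative one (ties count as half a win)."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    # Compare every positive score against every negative score.
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise loop is quadratic, which is fine for a dataset of a few hundred rows; library implementations compute the same value more efficiently from the ROC curve.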

Summary

*Machine learning can be used to predict heart disease

*Gaussian Naive Bayes is the most effective of the three models in this analysis, ahead of Random Forest and Decision Tree

*ROC AUC is used as the evaluation metric

*The Random Forest and Decision Tree models also achieve accuracy above 70%

*Based on the age and sex distribution graph, the dataset contains more female records than male records
