heart_attack

Summary.

This project uses a dataset of attributes that are hypothesized to contribute to heart disease. Its purpose is to generate a predictive model for heart disease from that data. The data was obtained through Kaggle. The following modules were used to analyze and visualize the data and to build a predictive model: pandas for data munging; matplotlib and seaborn for data visualization; and sklearn and eli5 for model building and its associated analyses.

The data have fourteen attributes. They include:

age in years
sex (1 = male; 0 = female)
cp chest pain type
trestbps resting blood pressure (in mm Hg on admission to the hospital)
chol serum cholesterol in mg/dl
fbs (fasting blood sugar > 120 mg/dl) (1 = true; 0 = false)
restecg resting electrocardiographic results
thalach maximum heart rate achieved
exang exercise induced angina (1 = yes; 0 = no)
oldpeak ST depression induced by exercise relative to rest
slope the slope of the peak exercise ST segment
ca number of major vessels (0-3) colored by fluoroscopy
thal 3 = normal; 6 = fixed defect; 7 = reversible defect
target 1 or 0 (presumably 1 = heart disease; 0 = no heart disease)
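As a rough sketch of how the attribute list above might be checked when the data is loaded with pandas: the two rows below are made-up values standing in for the Kaggle file, and `load_heart_data` and `EXPECTED_COLUMNS` are hypothetical names, not taken from the notebook.

```python
import io

import pandas as pd

# Column names as listed above; the order is assumed to match the CSV.
EXPECTED_COLUMNS = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal", "target",
]

# Two made-up rows standing in for the Kaggle file (illustrative values only).
SAMPLE_CSV = io.StringIO(
    "age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target\n"
    "60,1,2,130,240,0,1,155,0,1.0,1,0,3,1\n"
    "52,0,0,125,210,0,0,170,1,0.4,2,1,7,0\n"
)


def load_heart_data(source):
    """Read the CSV and fail early if any expected column is missing."""
    df = pd.read_csv(source)
    missing = set(EXPECTED_COLUMNS) - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return df


df = load_heart_data(SAMPLE_CSV)
print(df.dtypes)
```

With the real file, `source` would be the path to the downloaded CSV instead of the in-memory sample.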

Data Analysis and Visualization.

Of the fourteen attributes, thirteen are predictors that cover demographic information and commonly available clinical measurements. The fourteenth, target, records whether the patient has heart disease.

Three main questions were investigated, namely whether heart disease shows a trend with any of the following:

Age
Gender
Chest pain type

Before those questions were addressed, a heatmap was generated to survey possible correlations.

Data Heatmap
All attributes were set against each other, producing a heatmap that indicates positive and negative correlations. With respect to target, there are a few notable positive and negative relationships.

positive relationships include: cp, thalach, and slope.
negative relationships include: age, sex, exang, oldpeak, ca, and thal.
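The values behind a heatmap like this are pairwise correlation coefficients. A minimal sketch of how they could be computed with pandas follows; the three-column frame holds made-up toy values, so the numbers it prints will not match the figures quoted here.

```python
import pandas as pd

# Made-up toy values; only the mechanics are meant to carry over.
df = pd.DataFrame({
    "cp":     [0, 0, 1, 2, 3, 0, 2, 1],
    "exang":  [1, 1, 0, 0, 0, 1, 0, 0],
    "target": [0, 0, 1, 1, 1, 0, 1, 0],
})

# Pearson correlation of every column against every other column.
corr_matrix = df.corr()

# Keep only the column for `target`, dropping its self-correlation of 1.0.
target_corr = corr_matrix["target"].drop("target").sort_values()
print(target_corr)
```

Passing `corr_matrix` to seaborn's `heatmap` function (for example with `annot=True`) is one way to draw the matrix as a heatmap.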

Age as an indicator for heart disease
This plot considers age and its role as an indicator for heart disease. The legend indicates heart disease (1) v. no heart disease (0). The bar graph indicates that age has little to no correlation with heart disease. The heatmap agrees, showing only a weak negative correlation of -0.23.

Gender as an indicator for heart disease
This plot considers sex and its role as an indicator for heart disease. First, the data is skewed toward males: the ratio of males to females in this study is 2:1. Second, the female population has a higher rate of heart disease than the male population. Because of this discrepancy, the heatmap does not show a positive correlation. In other words, the heatmap indicates that sex is not likely an indicator of heart disease (-0.28).
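To make the relationship between the group rates and the sign of the coefficient concrete, here is a small sketch on made-up data that mimics the 2:1 male-to-female ratio. The rows are invented for illustration and are not drawn from the Kaggle file.

```python
import pandas as pd

# Made-up toy sample with twice as many males (1) as females (0).
df = pd.DataFrame({
    "sex":    [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0],
    "target": [1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0],
})

# Number of rows per sex, to check how balanced the sample is.
counts = df["sex"].value_counts()

# Share of rows with target == 1 within each sex.
rate_by_sex = df.groupby("sex")["target"].mean()

# Pearson correlation between the 0/1 sex code and the 0/1 target.
sex_target_corr = df["sex"].corr(df["target"])

print(counts)
print(rate_by_sex)
print(sex_target_corr)
```

Because males are coded as 1, a lower rate among males than among females yields a negative coefficient in this toy sample.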

Chest pain as an indicator for heart disease
This plot considers chest pain type (cp) as an indicator for heart disease. For data where cp is 1 or higher, the incidence of heart disease is high. A cp value of 0 indicates no chest pain and correlates strongly with the absence of heart disease. According to the heatmap, the correlation for cp is 0.43, a positive correlation. That means cp is likely an indicator of heart disease.
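The cp comparison above can also be read off a normalized cross-tabulation. The following is a sketch on made-up rows, not the notebook's own code.

```python
import pandas as pd

# Made-up toy rows; cp values 0-3 as in the attribute list.
df = pd.DataFrame({
    "cp":     [0, 0, 0, 0, 1, 1, 2, 2, 3, 3],
    "target": [0, 0, 0, 1, 1, 1, 1, 0, 1, 1],
})

# Each row of the table sums to 1: the share of target 0 and target 1
# within one chest pain type.
table = pd.crosstab(df["cp"], df["target"], normalize="index")
print(table)
```

Reading across a row gives the incidence of heart disease for that chest pain type, which is the quantity the bar plot compares.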

Model Building.

Three models were trained and tested: linear regression, logistic regression, and support vector machine (SVM). Of the three, the linear regression model had poor predictive performance.
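A minimal sketch of how two of these classifiers might be trained and scored with sklearn is shown below. It rests on several assumptions: the data is synthetic (from `make_classification`, with 13 features to mirror the predictor count), and the 80/20 split, the `StandardScaler` step, and the linear SVM kernel are illustrative choices rather than the notebook's settings. Linear regression is left out of the sketch.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the 13 predictors and the binary target.
X, y = make_classification(
    n_samples=300, n_features=13, n_informative=6, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Scaling is an assumption; both models tend to behave better with it.
models = {
    "logistic regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="linear")),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = {
        "train": accuracy_score(y_train, model.predict(X_train)),
        "test": accuracy_score(y_test, model.predict(X_test)),
    }
    print(name, scores[name])
```

Comparing train and test accuracy side by side, as in the results that follow, is a quick check for overfitting. `sklearn.metrics.classification_report` can be called on the same test predictions for per-class precision and recall.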

Logistic Regression

Accuracy score: Train = 0.864; Test = 0.885.
Classification report:

SVM

Accuracy score: Train = 0.855; Test = 0.869.
Classification report:

The better of the two models is the logistic regression. Below are the weight per feature and the ROC curve for the logistic model.
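For readers who want to reproduce this kind of output, here is a hedged sketch using sklearn only. The project used eli5 for its weight analysis; reading `coef_` directly is swapped in here as a simpler stand-in. The data is synthetic and the `feature_N` names are placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Placeholder feature names and synthetic data (not the Kaggle columns).
feature_names = [f"feature_{i}" for i in range(5)]
X, y = make_classification(
    n_samples=300, n_features=5, n_informative=3, n_redundant=0,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# For a binary problem coef_ has shape (1, n_features): one weight per feature.
weights = dict(zip(feature_names, model.coef_[0]))

# ROC analysis uses the predicted probability of the positive class.
probabilities = model.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, probabilities)
auc = roc_auc_score(y_test, probabilities)

print(weights)
print(auc)
```

Plotting `tpr` against `fpr` with matplotlib gives the ROC curve; the sign of each weight shows whether the feature pushes the prediction toward class 1 or class 0.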

License.

This project is licensed under the MIT License - see the LICENSE file for details.
