# Heart Attacks are Funny

This is some analysis to save Rich Evans' life.

```python
# Import libraries
# (load_boston was dropped: it was never used here, and it was removed in scikit-learn 1.2)
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.feature_selection import RFE
from sklearn.linear_model import RidgeCV, LassoCV, Ridge, Lasso

%matplotlib inline
```

## Read in the data

```python
heart_df = pd.read_csv('heart.csv')
```

## Show me that sweet data

```python
heart_df.head(5)
```

### Features

1. age - age in years
2. sex - sex (1 = male; 0 = female)
3. cp - chest pain type (1 = typical angina; 2 = atypical angina; 3 = non-anginal pain; 0 = asymptomatic)
4. trestbps - resting blood pressure (in mm Hg on admission to the hospital)
5. chol - serum cholesterol in mg/dl
6. fbs - fasting blood sugar > 120 mg/dl (1 = true; 0 = false)
7. restecg - resting electrocardiographic results (0 = normal; 1 = ST-T wave abnormality; 2 = probable or definite left ventricular hypertrophy)
8. thalachh - maximum heart rate achieved
9. exng - exercise-induced angina (1 = yes; 0 = no)
10. oldpeak - ST depression induced by exercise relative to rest
11. slope - the slope of the peak exercise ST segment (1 = upsloping; 2 = flat; 3 = downsloping)
12. ca - number of major vessels (0-3) colored by fluoroscopy
13. thal - 2 = normal; 1 = fixed defect; 3 = reversible defect
14. output - the predicted attribute: diagnosis of heart disease (angiographic disease status) (0 = < 50% diameter narrowing; 1 = > 50% diameter narrowing)
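If you'd rather read labels than codes, the mappings in the list above can be applied with `Series.map`. This is a minimal sketch: `decode_codes` and `CODE_LABELS` are hypothetical names, only a few columns are mapped, and the codings are copied from the list as written (check the `cp` coding in particular against your copy of `heart.csv`). The demo runs on made-up toy rows, not the real data.

```python
import pandas as pd

# Label maps copied from the feature list above (only a few columns shown).
# ASSUMPTION: the codings match your heart.csv; verify before relying on them.
CODE_LABELS = {
    "sex": {1: "male", 0: "female"},
    "cp": {1: "typical angina", 2: "atypical angina",
           3: "non-anginal pain", 0: "asymptomatic"},
    "fbs": {1: "fbs > 120 mg/dl", 0: "fbs <= 120 mg/dl"},
    "exng": {1: "exercise angina", 0: "no exercise angina"},
}


def decode_codes(df: pd.DataFrame, labels=CODE_LABELS) -> pd.DataFrame:
    """Return a copy of df with coded columns replaced by readable labels."""
    out = df.copy()
    for col, mapping in labels.items():
        if col in out.columns:
            # Series.map turns any code missing from the mapping into NaN,
            # which makes unexpected values easy to spot
            out[col] = out[col].map(mapping)
    return out


# Toy rows for illustration only, not taken from heart.csv
toy = pd.DataFrame({"sex": [1, 0], "cp": [0, 3], "fbs": [0, 1], "exng": [1, 0]})
print(decode_codes(toy))
```

In the notebook this would be `decode_codes(heart_df).head(5)`; the original `heart_df` is left untouched because the helper works on a copy.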

Looks like the previous sample had a lot of people who are gonna die (`output` = 1). Let's see some live Rich Evans (`output` = 0).

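```python
# Keep only the rows where output == 0. Boolean indexing keeps the integer dtypes;
# .where(...).dropna() would have turned every column into floats.
good_hearts = heart_df[heart_df['output'] == 0]
good_hearts.head(5)
```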
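Five rows aren't much to go on, and the first rows of a file aren't necessarily representative of the whole thing. A quick way to see how many patients actually land in each class is `value_counts`. The sketch below wraps it in a hypothetical helper, `class_balance`, and runs it on toy data rather than `heart.csv`.

```python
import pandas as pd


def class_balance(df: pd.DataFrame, target: str = "output") -> pd.DataFrame:
    """Count and share of rows per target class."""
    counts = df[target].value_counts().sort_index()
    shares = df[target].value_counts(normalize=True).sort_index()
    # Both Series share the class values as index, so they line up by class
    return pd.DataFrame({"count": counts, "share": shares})


# Toy target column for illustration only
toy = pd.DataFrame({"output": [1, 1, 1, 0, 0]})
print(class_balance(toy))
```

On the real data this would be `class_balance(heart_df)`, which shows whether the "gonna die" class really dominates or just happened to fill the first few rows.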

## Analysis

Let's do some feature selection, inspired by [this blog](https://towardsdatascience.com/feature-selection-with-pandas-e3690ad8504b).

```python
X = heart_df.drop('output', axis=1)   # Feature matrix
y = heart_df['output']                # Target variable
```

```python
# Pearson correlation between every pair of columns
plt.figure(figsize=(12, 10))
cor = heart_df.corr()
sns.heatmap(cor, annot=True, cmap=plt.cm.Reds)
plt.show()
```
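Each cell of the heatmap is Pearson's correlation coefficient between two columns $x$ and $y$ over the $n$ patients:

$$
r_{xy} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}
$$

Because `output` only takes the values 0 and 1, its row in the heatmap is the point-biserial correlation, which is just Pearson's $r$ computed against a binary variable. Coded categories such as `cp` are treated as plain numbers here, so their correlations depend on how the categories happen to be numbered.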

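```python
# Absolute correlation of each feature with the output variable
# (output is dropped so it doesn't "select" itself with r = 1)
cor_target = cor['output'].drop('output').abs()

# Select highly correlated features
relevant_features = cor_target[cor_target > 0.4]
relevant_features
```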
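Taking the absolute value throws away the direction of each relationship. A sketch that applies the same 0.4 cutoff but keeps the signed correlations, strongest first, could look like this; `correlated_features` is a hypothetical helper and the demo frame is made up.

```python
import pandas as pd


def correlated_features(df: pd.DataFrame, target: str, threshold: float = 0.4) -> pd.Series:
    """Features whose |Pearson r| with target exceeds threshold, sign kept."""
    r = df.corr()[target].drop(target)  # drop the target's r = 1 with itself
    selected = r[r.abs() > threshold]
    # Order by strength, but return the signed values so the direction stays visible
    return selected.reindex(selected.abs().sort_values(ascending=False).index)


# Toy data: "a" rises with "out", "b" falls with it, "c" is mostly noise
toy = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [5, 4, 3, 2, 1],
    "c": [2, 1, 2, 1, 2],
    "out": [0, 0, 1, 1, 1],
})
print(correlated_features(toy, "out"))
```

On the notebook data this would be `correlated_features(heart_df, "output")`: a positive value means the feature tends to be higher when `output` = 1, a negative value means lower.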

## Initial thoughts

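Chest pain (`cp`), max heart rate (`thalachh`), exercise-induced angina (`exng`), and ST depression (`oldpeak`) are the features most strongly tied to whether Rich is gonna die (|r| > 0.4 with `output`). Makes sense. We better hope he don't got none of that. Since we took absolute values, this only tells us how strong each relationship is, not which direction it goes; the signs are in the heatmap above.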

Linear regression also works best when the independent variables are not strongly correlated with each other: with multicollinearity, the coefficient estimates become unstable and hard to interpret. If two of the selected features are highly correlated, we should keep only one of them and drop the other. So let's check the correlation of the selected features with each other. We can read it off the correlation matrix above or use the code snippet below.

```python
# Pairwise correlations among the four selected features, as one matrix
print(heart_df[["cp", "thalachh", "exng", "oldpeak"]].corr())
```

No pair has an absolute correlation above 0.5, so we will keep all four features.
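Pairwise correlations only catch one feature tracking another; a feature can also be close to a combination of several others, which the pairwise check misses. The variance inflation factor (VIF) covers that case: `VIF_j = 1 / (1 - R²_j)`, where `R²_j` comes from regressing feature j on all the other features. statsmodels, imported above, provides `variance_inflation_factor` in `statsmodels.stats.outliers_influence`; the sketch below spells the computation out in NumPy instead. `vif` is a hypothetical helper and the demo data is synthetic.

```python
import numpy as np


def vif(X: np.ndarray) -> np.ndarray:
    """Variance inflation factor for each column of X (n_samples x n_features).

    VIF_j = 1 / (1 - R^2_j), where R^2_j comes from an ordinary least squares
    fit of column j on all the other columns plus an intercept.
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # add an intercept column
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        # A perfect fit means column j is fully redundant
        out[j] = 1.0 / (1.0 - r2) if r2 < 1.0 else np.inf
    return out


# Synthetic data: x3 is nearly a copy of x1, x2 is independent of both
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + 0.1 * rng.normal(size=200)
print(vif(np.column_stack([x1, x2, x3])).round(1))
```

On the notebook data this would be `vif(heart_df[["cp", "thalachh", "exng", "oldpeak"]].to_numpy())`. Values near 1 mean a feature adds little redundancy; cutoffs of 5 or 10 are common rules of thumb, not hard limits.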