# Analysing the factors affecting heart disease

In [None]:
# Import Python libraries
import numpy as np  # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)
from sklearn.model_selection import train_test_split  # train/test splitting
from sklearn import svm  # support vector machine models
from sklearn import metrics  # scikit-learn metrics module, e.g. for accuracy
import seaborn as sns  # statistical plots
import matplotlib.pyplot as plt  # figure and axes handling

**Import the heart disease data**

In [None]:
# Load the heart disease data from CSV
data = pd.read_csv("../input/heart-disease-data/heart.csv")
data.head()

**Display information (info) about 'data': column types and non-null counts**

In [None]:
data.info()
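`info()` reports the non-null count of each column. As a minimal sketch (the helper name `data_quality_summary` is hypothetical, not part of the original notebook), missing values and fully duplicated rows can also be counted directly with pandas:

```python
import pandas as pd


def data_quality_summary(df: pd.DataFrame) -> tuple[pd.Series, int]:
    """Return the missing-value count per column and the number of duplicated rows."""
    # isnull() marks NaN/None cells; summing counts them per column
    missing_per_column = df.isnull().sum()
    # duplicated() flags each repeat of an earlier identical row (the first copy is not flagged)
    duplicate_rows = int(df.duplicated().sum())
    return missing_per_column, duplicate_rows
```

Usage on the notebook's frame would be `missing, dups = data_quality_summary(data)`.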

**Display the size (shape) of the data: number of rows and columns**

In [None]:
data.shape

**Display basic statistics (describe) about the 'data'**

In [None]:
data.describe()

The dataset contains the following features:<br>
**1. age:** age in years<br>
**2. sex:** (1 = male; 0 = female)<br>
**3. cp:** chest pain type (0, 1, 2, 3)<br>
**4. trestbps:** resting blood pressure (in mm Hg on admission to the hospital)<br>
**5. chol:** serum cholesterol in mg/dl<br>
**6. fbs:** fasting blood sugar > 120 mg/dl (1 = true; 0 = false)<br>
**7. restecg:** resting electrocardiographic results<br>
**8. thalach:** maximum heart rate achieved<br>
**9. exang:** exercise-induced angina (1 = yes; 0 = no)<br>
**10. oldpeak:** ST depression induced by exercise relative to rest<br>
**11. slope:** slope of the peak exercise ST segment<br>
**12. ca:** number of major vessels (0-3) colored by fluoroscopy<br>
**13. thal:** 3 = normal; 6 = fixed defect; 7 = reversible defect<br>
**14. target:** 1 or 0 (this notebook reads 1 as heart disease present, 0 as absent)<br>
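The codes listed above can be turned into readable labels before plotting or cross-tabulating. A minimal sketch covering only the binary columns whose codes are spelled out in the list; the helper name `decode_binary_columns` and the label strings are illustrative assumptions:

```python
import pandas as pd

# Code-to-label maps taken from the feature list (binary columns only).
# The label wording is an illustrative choice, not part of the dataset.
LABEL_MAPS = {
    "sex": {1: "male", 0: "female"},
    "fbs": {1: "fbs > 120 mg/dl", 0: "fbs <= 120 mg/dl"},
    "exang": {1: "angina", 0: "no angina"},
}


def decode_binary_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the coded binary columns replaced by text labels."""
    decoded = df.copy()  # leave the original numeric frame untouched for corr()
    for column, mapping in LABEL_MAPS.items():
        if column in decoded.columns:
            decoded[column] = decoded[column].map(mapping)
    return decoded
```

Keeping the decoded frame separate matters because `data.corr()` later in the notebook needs the numeric codes; for example `readable = decode_binary_columns(data)` can be used only for crosstabs and plot labels.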

**Compare the features 'age', 'sex', 'cp' and 'trestbps' against 'target' using a pairplot**

In [None]:
sns.pairplot(data=data[['age','sex','cp','trestbps','target']],hue='target')

<font color='blue'>**OBSERVATION :** Heart disease appears to be correlated with chest pain type (cp) and age</font>

**Compare the features 'chol', 'fbs', 'restecg' and 'thalach' against 'target' using a pairplot**

In [None]:
sns.pairplot(data=data[['chol','fbs','restecg','thalach','target']],hue='target')

<font color='blue'>**OBSERVATION :** Heart disease appears to be correlated with maximum heart rate achieved (thalach)</font>

**Compare the features 'exang', 'oldpeak' and 'slope' against 'target' using a pairplot**

In [None]:
sns.pairplot(data=data[['exang','oldpeak','slope','target']],hue='target')

<font color='blue'>**OBSERVATION :** In these pairplots, heart disease shows no clear correlation with any of the features shown above</font>

**Find the correlation (corr) matrix of the features in 'data'**

In [None]:
data.corr()

**Plot a heatmap of the correlation (corr) matrix of the features in 'data'**

In [None]:
# plt.subplots returns (figure, axes); draw the heatmap on that axes explicitly
fig, ax = plt.subplots(figsize=(12, 5))
sns.heatmap(data.corr(), annot=True, ax=ax);

<font color='blue'>**OBSERVATION :** cp, thalach and slope show a positive correlation with target, while exang, oldpeak and ca show a negative correlation</font>

**Find the correlation (corr) of each feature with 'target', sorted in descending order**

In [None]:
data.corr()['target'].sort_values(ascending=False)

<font color='blue'>**OBSERVATION :** The sorted values agree with the heatmap: cp, thalach and slope correlate positively with target, while exang, oldpeak and ca correlate negatively</font>
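The sorted correlations can also be filtered programmatically. A minimal sketch, assuming a cut-off of |r| >= 0.3 chosen purely for illustration (the helper name `correlated_features` and the threshold are hypothetical). `DataFrame.corr()` computes Pearson correlation by default, which for a 0/1 target is the point-biserial correlation:

```python
import pandas as pd


def correlated_features(df: pd.DataFrame, target: str = "target",
                        threshold: float = 0.3) -> pd.Series:
    """Return features whose absolute Pearson correlation with `target`
    is at least `threshold`, ordered from strongest to weakest (sign kept)."""
    # Correlation of every column with target, excluding target itself
    corr_with_target = df.corr()[target].drop(target)
    strong = corr_with_target[corr_with_target.abs() >= threshold]
    # Order by magnitude while keeping the sign of each correlation
    return strong.reindex(strong.abs().sort_values(ascending=False).index)
```

Calling `correlated_features(data)` returns a Series in which positive and negative associations sit side by side, so features such as exang are not pushed to the bottom just because their correlation is negative.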

**Visualize the 'data' using boxenplot - 'target' vs 'age'**

In [None]:
sns.boxenplot(x='target', y='age', data=data)

<font color='blue'>**OBSERVATION :** Heart disease is more prevalent in the lower age group</font>

**Visualize the 'data' using boxenplot - 'target' vs 'thalach'**

In [None]:
sns.boxenplot(x='target', y='thalach', data=data)

<font color='blue'>**OBSERVATION :** Heart disease is more prevalent among people with a higher thalach (maximum heart rate achieved)</font>

**Visualize the 'data' using boxenplot - 'target' vs 'oldpeak'**

In [None]:
sns.boxenplot(x='target', y='oldpeak', data=data)

<font color='blue'>**OBSERVATION :** Heart disease is more prevalent among people with a lower oldpeak (ST depression induced by exercise)</font>
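The boxen plots compare the distributions visually. To put a number on the age, thalach and oldpeak observations, a minimal sketch (the helper name `median_by_target` is hypothetical) computes the median of each feature separately for each target class:

```python
import pandas as pd


def median_by_target(df: pd.DataFrame,
                     columns=("age", "thalach", "oldpeak"),
                     target: str = "target") -> pd.DataFrame:
    """Median of each listed column, computed separately for each target class."""
    # One row per target value, one column per feature
    return df.groupby(target)[list(columns)].median()
```

`median_by_target(data)` yields a two-row table; the median is used rather than the mean because it corresponds to the centre line of the boxen plots and is less sensitive to the outliers they show.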

**Compare 'target' with 'sex' using crosstab**

In [None]:
pd.crosstab(data.target, data.sex, margins=True)

<font color='blue'>**OBSERVATION :** Males are more prone to heart disease than females</font>
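Raw counts in a crosstab mix group size with risk: a category that is larger in the sample can have more cases without having a higher rate. A minimal sketch (the helper name `target_share_by` is hypothetical) that uses `pd.crosstab(..., normalize="columns")` so that each category's column sums to 1, giving the share of each target value within that category:

```python
import pandas as pd


def target_share_by(df: pd.DataFrame, column: str, target: str = "target") -> pd.DataFrame:
    """Cross-tabulate target against column, normalized so each column sums to 1."""
    # Rows: target values; columns: categories of `column`; cells: within-category shares
    return pd.crosstab(df[target], df[column], normalize="columns")
```

For example, `target_share_by(data, "sex")` shows, for each sex, the fraction of rows with target = 1, which is the quantity a statement like "more prone" refers to.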

**Compare 'target' with 'cp' using crosstab**

In [None]:
#pd.crosstab(data.target, data.cp, margins=True, normalize='index').round(2).style.background_gradient(cmap='autumn_r')
pd.crosstab(data.target, data.cp, margins=True)

<font color='blue'>**OBSERVATION :** People with chest pain type 2 are more prone to heart disease</font>

**Compare 'target' with 'exang' using crosstab**

In [None]:
pd.crosstab(data.target, data.exang, margins=True)

<font color='blue'>**OBSERVATION :** People who do not have exercise-induced angina (exang) are more prone to heart disease</font>

**Compare 'target' with 'ca' using crosstab**

In [None]:
pd.crosstab(data.target, data.ca, margins=True)

<font color='blue'>**OBSERVATION :** People with no major vessels colored by fluoroscopy (ca = 0) are more prone to heart disease</font>
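The same count-versus-rate question applies to cp, exang and ca. Because target is coded 0/1, the mean of target within a category is the fraction of that category with target = 1. A minimal sketch (the helper name `target_rate_by_category` is hypothetical) reports this rate alongside the category size, so small categories with extreme rates are easy to spot:

```python
import pandas as pd


def target_rate_by_category(df: pd.DataFrame, column: str,
                            target: str = "target") -> pd.DataFrame:
    """Number of rows and fraction with target == 1 for each value of `column`."""
    # count = category size; mean of a 0/1 column = proportion of ones
    summary = df.groupby(column)[target].agg(["count", "mean"])
    return summary.rename(columns={"mean": "target_rate"})
```

For example, looping over `["cp", "exang", "ca"]` and printing `target_rate_by_category(data, col)` gives one small table per feature that can be read next to the crosstabs above.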