# Predicting Heart Attack in a patient
## Nikhil Kumar Singh
### This is my first project on Kaggle. Any suggestions for improvement would be warmly welcomed.
### **1. Introduction:**
[**Coronary artery disease (CAD)**](https://en.wikipedia.org/wiki/Coronary_artery_disease), also known as coronary heart disease (CHD) or simply heart disease, usually involves reduced blood flow caused by blockage in the arteries of the heart. Its main types are stable angina, unstable angina, myocardial infarction, and sudden cardiac death. One of the most common symptoms is a shooting chest pain that may travel to the left shoulder or the arm. In many cases heart disease produces no symptoms, but usual symptoms may include shortness of breath and emotional stress. Often the first clear sign of heart disease is a heart attack.<br>
    Thus, a machine learning model that predicts the occurrence of heart disease from various parameters could detect the disease early and may prove essential in averting a heart attack.<br>

#### *1.1 Importing the Data and important libraries*

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import missingno as msno
import warnings
warnings.filterwarnings("ignore")
%matplotlib inline

In [None]:
Data_heart=pd.read_csv(r'../input/heart-disease-uci/heart.csv')
Data_heart_copy=Data_heart.copy()
Data_heart_copy.head()

### Attribute Description
**age**: age in years<br>
**sex**: (1 = male; 0 = female)<br>
**cp**: *chest pain type* (typical angina, atypical angina, non-anginal pain, or asymptomatic)<br>
**trestbps**: *resting blood pressure* (in mm Hg on admission to the hospital)<br>
**chol**: *serum cholesterol* in mg/dl<br>
**fbs**: *Fasting blood sugar* > 120 mg/dl (1 = true; 0 = false)<br>
**restecg**: *resting electrocardiographic results* (normal, ST-T wave abnormality, or left ventricular hypertrophy)<br>
**thalach**: *Max. heart rate achieved during thallium stress test*<br>
**exang**: *Exercise induced angina* (1 = yes; 0 = no)<br>
**oldpeak**: *ST depression induced by exercise relative to rest*<br>
**slope**: *Slope of peak exercise ST segment* (0 = upsloping, 1 = flat, or 2 = downsloping)<br>
**ca**: *number of major vessels (0-3) colored by fluoroscopy*; 4 = NA<br>
**thal**: *Thallium stress test result*; 3 = normal; 6 = fixed defect; 7 = reversible defect; 0 = NA<br>
**target**: *Heart disease status*; 1 = heart disease; 0 = no heart disease<br>

#### *1.2. Investigating missing values using the missingno library*

In [None]:
msno.bar(Data_heart)
plt.show()

#### **Fortunately there are no missing values, so we can proceed with the original data since there is no need to clean the dataset.**
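
The bar chart can also be cross-checked numerically. The following is a minimal sketch, assuming a small made-up frame `toy` in place of `Data_heart` (the column names are borrowed from the dataset, the values are invented, and two gaps are inserted on purpose so the check has something to find):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for Data_heart; values are invented for illustration.
toy = pd.DataFrame({
    "age": [63, 37, np.nan, 56],
    "chol": [233, 250, 204, np.nan],
    "target": [1, 1, 0, 0],
})

# Count missing entries per column; all zeros would mean nothing to clean.
missing_per_column = toy.isnull().sum()
print(missing_per_column)

# Single yes/no answer for the whole frame.
has_missing = bool(toy.isnull().values.any())
print(has_missing)
```

Running the same two calls on `Data_heart` should report zero for every column if the chart above is read correctly.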

### 2. Exploratory Data Analysis 

#### *2.1. Recoding categorical variables with the labels*

In [None]:
visuals={"sex":{1:"Male",0:"Female"},
         "cp":{0:"typical angina",1: "atypical angina" ,2: "non-anginal pain" ,3: "asymptomatic"},
         "fbs":{0:"<=120",1:">120"},
         "exang":{0:"no",1:"yes"},
         "restecg" :{0:"normal" ,1:"ST-T wave abnormality",2:"probable or definite left ventricular hypertrophy"},
         "target" :{ 0:"No Heart Disease",1 : "heart-disease"},
         "slope" :{2 : "upsloping",1 :"flat",0 : "downsloping"},
         "thal" :{ 1 : "fixed defect",0 : "normal",2 : "reversable defect",3:"NA"}
         
}
Data_heart_copy.replace(visuals,inplace=True)
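
Recoding with a nested dictionary silently leaves any code without a label unchanged, so it can be worth checking for such codes first. The sketch below is only an illustration: `toy` and `labels` are hypothetical stand-ins for `Data_heart_copy` and `visuals`, and the stray code `7` is invented to trigger the check.

```python
import pandas as pd

# Hypothetical toy frame and label map; not the notebook's real data.
toy = pd.DataFrame({"sex": [1, 0, 1], "cp": [0, 3, 7]})
labels = {
    "sex": {1: "Male", 0: "Female"},
    "cp": {0: "typical angina", 1: "atypical angina",
           2: "non-anginal pain", 3: "asymptomatic"},
}

# Collect codes that appear in the data but have no label in the map.
unmapped = {}
for col, mapping in labels.items():
    extra = sorted(int(code) for code in set(toy[col].unique()) - set(mapping))
    if extra:
        unmapped[col] = extra
print(unmapped)

# replace() with a nested dict recodes per column and leaves
# unmapped codes untouched.
recoded = toy.replace(labels)
print(recoded)
```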

#### *2.2 Understanding the distribution of numerical variables*

In [None]:
plt.figure(figsize=(15,15))
for i, col in enumerate(['age', 'trestbps', 'chol','thalach','oldpeak', 'ca']):
    plt.subplot(3,2,i+1)
    # distplot is deprecated in recent seaborn releases; kdeplot draws the density curve
    sns.kdeplot(Data_heart_copy[col])
plt.show()

#### *2.3 Understanding the count of categorical variables*

In [None]:
plt.figure(figsize=(25,35))
for i, col in enumerate(['sex', 'cp', 'fbs', 'restecg','exang','slope', 'thal', 'target']):
    plt.subplot(4,2,i+1)
    sns.countplot(x=col,data=Data_heart_copy)
plt.show()

#### *2.4 Age vs Heart Disease*

In [None]:
sns.catplot(x='target',y='age',hue='sex',data=Data_heart_copy,kind='violin')
plt.show()

##### As can be seen from the above graph, people in the age range of [40 to 70](http://www.secondscount.org/treatments/treatments-detail-2/who-is-affected-by-cardiovascular-disease#.XxxuFJ4zbIU) suffer the most from heart disease.
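
A violin plot gives the shape of the age distribution; counting cases per age band is one way to put numbers on it. This is a rough sketch on an invented frame `toy`, assuming the numeric coding where `target == 1` means heart disease; the band edges are an arbitrary choice for illustration.

```python
import pandas as pd

# Hypothetical toy data; ages and targets are invented.
toy = pd.DataFrame({
    "age": [35, 45, 52, 58, 61, 66, 72, 48],
    "target": [0, 1, 1, 0, 1, 1, 0, 1],
})

# Bin ages into decades; right=False makes each bin [lower, upper).
bins = [30, 40, 50, 60, 70, 80]
toy["age_band"] = pd.cut(toy["age"], bins=bins, right=False)

# Number of heart-disease cases (target == 1) per age band.
cases_per_band = toy.groupby("age_band", observed=False)["target"].sum()
print(cases_per_band)
```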

#### *2.5 Sex vs Heart Disease*

In [None]:
sns.catplot(x='target',col='sex',data=Data_heart_copy,kind='count')
plt.show()

##### **As can be seen from the graph, the proportion of people suffering from heart disease is higher among males than among females.**
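
Count plots show absolute counts, so groups of different sizes are hard to compare by eye. A row-normalised cross-tabulation is one way to read off proportions directly. The sketch below uses an invented frame `toy` whose counts are made up and are not the dataset's; it only illustrates the difference between counts and shares.

```python
import pandas as pd

# Hypothetical toy data; the counts are invented and are not the dataset's.
toy = pd.DataFrame({
    "sex": ["Male"] * 6 + ["Female"] * 4,
    "target": (["heart-disease"] * 3 + ["No Heart Disease"] * 3
               + ["heart-disease"] * 3 + ["No Heart Disease"] * 1),
})

# Raw counts, which is what a count plot displays.
counts = pd.crosstab(toy["sex"], toy["target"])
print(counts)

# Row-normalised shares: fraction of each sex in each target class.
shares = pd.crosstab(toy["sex"], toy["target"], normalize="index")
print(shares.round(2))
```

Applying the same two calls to `Data_heart_copy` would show whether the statement about proportions holds once group sizes are taken into account.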

#### *2.6 Chest Pain vs Heart Disease*

In [None]:
sns.catplot(x='target',col='cp',data=Data_heart_copy,kind='count')
plt.show()

##### **Most people suffering from heart disease show [atypical anginal pain](https://www.harringtonhospital.org/typical-and-atypical-angina-what-to-look-for/).<br> However, in most cases there is only a very subtle difference between atypical anginal pain and non-anginal pain.**

#### *2.7 Resting ECG vs Heart Disease*

In [None]:
sns.catplot(x='target',col='restecg',data=Data_heart_copy,kind='count')
plt.show()

#### **People showing an abnormality in the [ST-T wave](https://ecg.utah.edu/lesson/10) generally have a heart condition, and this could play a major role in the early detection of heart disease.**

#### *2.8 Thallium Test Result vs Heart Disease*

In [None]:
sns.catplot(x='target',col='thal',data=Data_heart_copy,kind='count')
plt.show()

##### **As can be seen above, people with defects in their arteries detected during the [Thallium Test](https://www.healthline.com/health/thallium-stress-test) are more prone to heart disease, so the test result could prove to be a good predictor of heart disease.**

#### *2.9 Max. heart rate achieved during thallium stress test vs Heart Disease*

In [None]:
sns.boxplot(x="target",y="thalach",data=Data_heart_copy)
plt.ylabel("Max. heart rate achieved during thallium stress test")
plt.xlabel("Disease Condition")
plt.show()

##### **This is intuitive, since a heart with disease has to work harder than a healthy one.**

#### *2.10 Cholesterol vs Heart Disease*

In [None]:
sns.boxplot(x="target",y="chol",data=Data_heart_copy)
plt.ylabel("Cholesterol")
plt.xlabel("Disease Condition")
plt.show()

#### *2.11 Resting Blood Pressure vs Heart Disease*

In [None]:
sns.boxplot(x="target",y="trestbps",data=Data_heart_copy)
plt.ylabel("Resting Blood Pressure")
plt.xlabel("Disease Condition")
plt.show()
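
The box plots above can be backed by the numbers they draw. The following is a minimal sketch on an invented frame `toy` (the column names follow the dataset, the values are made up) that tabulates quartiles and medians per disease status:

```python
import pandas as pd

# Hypothetical toy rows; values are invented for illustration only.
toy = pd.DataFrame({
    "target": ["heart-disease", "No Heart Disease"] * 3,
    "thalach": [170, 140, 160, 150, 180, 130],
    "chol": [230, 250, 240, 260, 220, 270],
    "trestbps": [130, 135, 120, 140, 125, 145],
})

cols = ["thalach", "chol", "trestbps"]

# Quartiles per disease status: the box edges and centre line of a box plot.
summary = toy.groupby("target")[cols].quantile([0.25, 0.5, 0.75])
print(summary)

# Medians alone, for a quick side-by-side comparison of the two groups.
medians = toy.groupby("target")[cols].median()
print(medians)
```

Running the same `groupby` on `Data_heart_copy` would give the group medians for the real data, which makes differences that look small in the plots easier to judge.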