# **Heart Attack - EDA** 


# **INTRODUCTION**

# What is a heart attack?
 
A heart attack happens when the flow of oxygen-rich blood in one or more of the coronary arteries, which supply the heart muscle, suddenly becomes blocked, and a section of heart muscle can’t get enough oxygen. The blockage is usually caused when a plaque ruptures. If blood flow isn’t restored quickly, either by a medicine that dissolves the blockage or by a catheter placed within the artery that physically opens the blockage, the section of heart muscle begins to die.

# About this dataset

    Age : Age of the patient

    Sex : Sex of the patient

    exang: exercise induced angina (1 = yes; 0 = no)

    ca: number of major vessels (0-3)

    cp : chest pain type
        Value 1: typical angina
        Value 2: atypical angina
        Value 3: non-anginal pain
        Value 4: asymptomatic

    trtbps : resting blood pressure (in mm Hg)

    chol : cholesterol in mg/dl fetched via BMI sensor

    fbs : (fasting blood sugar > 120 mg/dl) (1 = true; 0 = false)

    rest_ecg : resting electrocardiographic results
        Value 0: normal
        Value 1: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV)
        Value 2: showing probable or definite left ventricular hypertrophy by Estes' criteria

    thalach : maximum heart rate achieved

    output : 0 = less chance of heart attack, 1 = more chance of heart attack
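
Coded columns such as `cp` are easier to read in tables and plots once the integer codes are mapped to their descriptions. The sketch below uses a small made-up frame, not the real data, and assumes the codes match the list above (1 to 4). The `CP_LABELS` dictionary and the `cp_label` column are illustrative names; it is worth checking `data["cp"].unique()` first, since some versions of this dataset may use different codes.

```python
import pandas as pd

# Assumed mapping, copied from the column description above.
CP_LABELS = {
    1: "typical angina",
    2: "atypical angina",
    3: "non-anginal pain",
    4: "asymptomatic",
}

# Toy values for illustration only; these are not rows from heart.csv.
toy = pd.DataFrame({"cp": [1, 3, 4, 2, 3]})

# map() replaces each code with its label; unknown codes become NaN.
toy["cp_label"] = toy["cp"].map(CP_LABELS)

# Count codes that had no entry in the mapping (should be 0 here).
unmapped = int(toy["cp_label"].isna().sum())
print(toy)
print("unmapped codes:", unmapped)
```

A non-zero `unmapped` count would indicate that the file uses codes other than the ones listed in the description.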


In [None]:
# Import the required libraries
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns  # visualization tool
import math

In [None]:
# Load the dataset
data = pd.read_csv("../input/heart-attack-analysis-prediction-dataset/heart.csv")
print("Shape of the dataset")
# shape returns the number of rows and columns as a tuple
data.shape

In [None]:
print(list(data.columns))

In [None]:
# head shows first 5 rows
data.head()

In [None]:
# tail shows last 5 rows
data.tail()

In [None]:
# info() reports the object type (DataFrame), the number of rows and columns, each column's dtype, and memory usage
print("Basic information about dataset")
data.info()


* There are no missing values in the entire dataset, so we do not need to fill or drop anything.
* All the columns except oldpeak (float) are of int data type.


In [None]:
data.isnull().sum()

* There are no missing values.

In [None]:
print("Description of data")

data.describe().T.style.bar(subset=['mean'],color='#205ff2').background_gradient(subset=['std','25%','50%','75%'],cmap="coolwarm")

In [None]:
data.corr()

In [None]:
#correlation map
f, ax = plt.subplots(figsize = (10,10))
sns.heatmap(data.corr(), annot=True, linewidths=.5, fmt= '.1f',ax=ax)
plt.show()

As the table and heatmap above show, the heart attack target (`output`) is positively correlated with chest pain type, maximum heart rate, and slope, and negatively correlated with age, exercise-induced angina, and the number of major vessels.
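
Reading signs and magnitudes off a full heatmap is error-prone, so it can help to pull out only the target's column of the correlation matrix and sort it. The sketch below does this on a tiny made-up frame; `rank_correlations` is a hypothetical helper and the toy values are not taken from heart.csv.

```python
import pandas as pd


def rank_correlations(df: pd.DataFrame, target: str) -> pd.Series:
    """Pearson correlation of every other column with `target`, largest first."""
    corr_with_target = df.corr()[target].drop(target)
    return corr_with_target.sort_values(ascending=False)


# Toy values for illustration only.
toy = pd.DataFrame({
    "cp": [0, 1, 2, 3, 0, 2],
    "exang": [1, 0, 0, 0, 1, 1],
    "output": [0, 1, 1, 1, 0, 0],
})

ranked = rank_correlations(toy, "output")
print(ranked)
```

On the real data the equivalent call would be `rank_correlations(data, "output")`, assuming every column is numeric, as `data.info()` suggests.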

In [None]:
# Histogram
# bins = number of bar in figure
data.age.plot(kind = 'hist',bins = 50,figsize = (5,5))
# optional arguments: range=(0, 250), density=True (the old 'normed' argument was removed from matplotlib)
plt.show()

In [None]:
# Find the heart attack percentage in the dataset
# val_counts: frequency count of each class in the output column
val_counts = data["output"].value_counts()
print(val_counts)
no_heart_attack = (val_counts[0] / data.shape[0]) * 100
heart_attack = (val_counts[1] / data.shape[0]) * 100

# Report both shares with the same rounding
print(f"Heart Attack: {heart_attack:.1f}%")
print(f"No Heart Attack: {no_heart_attack:.1f}%")

print()

sns.barplot(x = ["No Heart Attack", "Heart Attack"], y = [no_heart_attack, heart_attack])
plt.show()
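
The class shares can also be obtained in one step with `value_counts(normalize=True)`, which returns fractions instead of counts. This is a minimal sketch on a made-up series, shown as an alternative to the manual division rather than a replacement for it.

```python
import pandas as pd

# Toy labels for illustration only (three positives, two negatives).
toy_output = pd.Series([1, 1, 1, 0, 0], name="output")

# normalize=True gives the fraction of each class; scale to percent.
shares = toy_output.value_counts(normalize=True).mul(100).round(1)
print(shares)
```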

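How to read the box plot below (line colours may differ between matplotlib versions):

* Black line at top is the max (by default the largest value within 1.5 × IQR of the box; points beyond it are drawn as outliers)
* Blue line at top is the 75% quartile
* Green line is the median (50%)
* Blue line at bottom is the 25% quartile
* Black line at bottom is the min (by default the smallest value within 1.5 × IQR of the box)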

In [None]:
data.boxplot(column='chol', by='sex')
plt.show()
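
The values behind the lines of a box plot can be computed directly. The sketch below assumes matplotlib's default whisker rule (whiskers stop at the last point within 1.5 × IQR of the box, so the "max" and "min" lines may exclude outliers). `box_stats` is a hypothetical helper and the cholesterol values are made up.

```python
import pandas as pd


def box_stats(values: pd.Series) -> dict:
    """Quartiles, whisker ends, and outlier count under the 1.5 * IQR rule."""
    q1, median, q3 = values.quantile([0.25, 0.5, 0.75])
    iqr = q3 - q1
    # Points outside [q1 - 1.5*IQR, q3 + 1.5*IQR] are treated as outliers.
    inside = values[(values >= q1 - 1.5 * iqr) & (values <= q3 + 1.5 * iqr)]
    return {
        "lower_whisker": inside.min(),
        "q1": q1,
        "median": median,
        "q3": q3,
        "upper_whisker": inside.max(),
        "n_outliers": len(values) - len(inside),
    }


# Toy values for illustration only; 500 is a deliberate outlier.
toy_chol = pd.Series([200, 210, 220, 230, 240, 250, 260, 500])
stats = box_stats(toy_chol)
print(stats)
```

For the grouped plot above, the same helper could be applied per group, for example with `data.groupby("sex")["chol"].apply(box_stats)`.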

In [None]:
# Plot all three columns on one axis
data1 = data.loc[:, ["chol", "trtbps", "age"]]
data1.plot()
# The lines overlap and are hard to read, so the next cell splits them into subplots
plt.show()

In [None]:
# subplots
data1.plot(subplots = True)
plt.show()

* A scatter plot shows the relationship (correlation) between two variables.

In [None]:
# scatter plot  
data1.plot(kind = "scatter",x="age",y = "chol")
plt.show()

In [None]:
# hist plot ('normed' was removed from matplotlib; use density=True instead)
data1.plot(kind="hist", y="chol", bins=10, range=(110, 350), density=True)
plt.show()