# Heart Attack - Data analysis
![](https://i.imgur.com/SbbmOq2.jpg?1)

This is my first ever EDA on Kaggle :). I took inspiration from a lot of notebooks and put in everything I understood.

The objective of this project is to provide insight into how people are affected by heart attacks and which conditions are associated with them.

The dataset was taken from Kaggle: [Heart attack](https://www.kaggle.com/rashikrahmanpritom/heart-attack-analysis-prediction-dataset?select=heart.csv). The tools used for data analysis in this project are the packages **Numpy and Pandas**; **Matplotlib and Seaborn** are used to visualize and explore the data.
I submitted this as my project for the course [Data Analysis with Python: Zero to Pandas](zerotopandas.com) by https://jovian.ai/ and www.freecodecamp.org.
Thanks to this course, I was able to learn all the required tools properly.

## Features of data
1. age - age in years

2. sex - sex (1 = male; 0 = female)

3. cp - chest pain type (1 = typical angina; 2 = atypical angina; 3 = non-anginal pain; 0 = asymptomatic)

4. trestbps - resting blood pressure (in mm Hg on admission to the hospital); named `trtbps` in `heart.csv`

5. chol - serum cholesterol in mg/dl

6. fbs - fasting blood sugar > 120 mg/dl (1 = true; 0 = false)

7. restecg - resting electrocardiographic results (1 = normal; 2 = having ST-T wave abnormality; 0 = hypertrophy)

8. thalach - maximum heart rate achieved; named `thalachh` in `heart.csv`

9. exang - exercise induced angina (1 = yes; 0 = no)

10. oldpeak - ST depression induced by exercise relative to rest

11. slope - the slope of the peak exercise ST segment (2 = upsloping; 1 = flat; 0 = downsloping)

12. ca - number of major vessels (0-3) colored by fluoroscopy

13. thal - 2 = normal; 1 = fixed defect; 3 = reversible defect

14. num - the predicted attribute - diagnosis of heart disease (angiographic disease status) (Value 0 = < 50% diameter narrowing; Value 1 = > 50% diameter narrowing); named `output` in `heart.csv`
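
As a rough sketch, the coded columns can be turned into readable labels with a simple mapping. The dictionaries below only restate the `sex` and `cp` codes from the list above; `decode_column` is a hypothetical helper name I made up for illustration, not part of the dataset or of any library.

```python
import pandas as pd

# Code-to-label mappings copied from the feature list above.
SEX_LABELS = {1: "male", 0: "female"}
CP_LABELS = {
    0: "asymptomatic",
    1: "typical angina",
    2: "atypical angina",
    3: "non-anginal pain",
}


def decode_column(series: pd.Series, labels: dict) -> pd.Series:
    """Return a new Series with integer codes replaced by text labels.

    Codes that are missing from `labels` become NaN, which makes
    unexpected values easy to spot.
    """
    return series.map(labels)
```

Once the data is loaded, something like `decode_column(df["cp"], CP_LABELS)` should give a labelled version of the chest pain column.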

### How to run the code

This is an executable [*Jupyter notebook*](https://jupyter.org) hosted on [Jovian.ml](https://www.jovian.ml), a platform for sharing data science projects. You can run and experiment with the code in a couple of ways: *using free online resources* (recommended) or *on your own computer*.

#### Option 1: Running using free online resources (1-click, recommended)

The easiest way to start executing this notebook is to click the "Run" button at the top of this page, and select "Run on Binder". This will run the notebook on [mybinder.org](https://mybinder.org), a free online service for running Jupyter notebooks. You can also select "Run on Colab" or "Run on Kaggle".


#### Option 2: Running on your computer locally

1. Install Conda by [following these instructions](https://conda.io/projects/conda/en/latest/user-guide/install/index.html). Add Conda binaries to your system `PATH`, so you can use the `conda` command on your terminal.

2. Create a Conda environment and install the required libraries by running these commands on the terminal:

```
conda create -n zerotopandas -y python=3.8 
conda activate zerotopandas
pip install jovian jupyter numpy pandas matplotlib seaborn opendatasets --upgrade
```

3. Press the "Clone" button above to copy the command for downloading the notebook, and run it on the terminal. This will create a new directory and download the notebook. The command will look something like this:

```
jovian clone notebook-owner/notebook-id
```



4. Enter the newly created directory using `cd directory-name` and start the Jupyter notebook.

```
jupyter notebook
```

You can now access Jupyter's web interface by clicking the link that shows up on the terminal or by visiting http://localhost:8888 on your browser. Click on the notebook file (it has a `.ipynb` extension) to open it.


## Downloading the Dataset

You will need your Kaggle username and API key to download the data.
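
As far as I understand, `opendatasets` looks for a `kaggle.json` file next to the notebook and otherwise prompts for the username and key. Treat that behaviour as an assumption and check the library's documentation. The sketch below only checks whether such a file exists and has the two expected fields; `has_kaggle_credentials` is a hypothetical helper, not an `opendatasets` function.

```python
import json
from pathlib import Path


def has_kaggle_credentials(path="kaggle.json"):
    """Return True if `path` is a JSON file with 'username' and 'key' fields."""
    file = Path(path)
    if not file.is_file():
        return False
    try:
        data = json.loads(file.read_text())
    except json.JSONDecodeError:
        # The file exists but is not valid JSON.
        return False
    return isinstance(data, dict) and {"username", "key"}.issubset(data)
```

If this returns `False`, you can still type the username and key when the download asks for them.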

In [None]:
!pip install jovian opendatasets --upgrade --quiet

Let's begin by downloading the data, and listing the files within the dataset.

In [None]:
# Change this
dataset_url = 'https://www.kaggle.com/rashikrahmanpritom/heart-attack-analysis-prediction-dataset?select=heart.csv'

In [None]:
import opendatasets as od
od.download(dataset_url)

The dataset has been downloaded and extracted.

In [None]:
# Change this
data_dir = './heart-attack-analysis-prediction-dataset'

In [None]:
import os
os.listdir(data_dir)

Let us save and upload our work to Jovian before continuing.

In [None]:
project_name = "heart-attack-data-analysis" # change this (use lowercase letters and hyphens only)

## Data Preparation and Cleaning

- The dataset has 303 rows and 14 columns

In [None]:
import pandas as pd
import numpy as np

In [None]:
df = pd.read_csv(data_dir + '/heart.csv')

In [None]:
df.head()

In [None]:
df.shape

In [None]:
df.describe()

**Checking for null values**

In [None]:
df.isna().sum()

**The data has no null values, so we don't need to make any changes there.**

In [None]:
df.info()

In [None]:
df.duplicated().sum()

**One of the rows is duplicated, so we will remove it.**

In [None]:
df.drop_duplicates(inplace=True)

In [None]:
df.duplicated().sum()

In [None]:
df.shape

**After removing the duplicated row we have 302 rows and 14 columns**
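
The checks in this section can be bundled into two small helpers. This is only a sketch of one way to do it; `cleaning_report` and `drop_duplicate_rows` are names I made up. Unlike `drop_duplicates(inplace=True)`, the second helper returns a new DataFrame and resets the index.

```python
import pandas as pd


def cleaning_report(frame: pd.DataFrame) -> dict:
    """Summarise row count, column count, nulls and duplicates."""
    return {
        "rows": len(frame),
        "columns": frame.shape[1],
        "null_values": int(frame.isna().sum().sum()),
        "duplicate_rows": int(frame.duplicated().sum()),
    }


def drop_duplicate_rows(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy without duplicate rows and with a fresh 0..n-1 index."""
    return frame.drop_duplicates().reset_index(drop=True)
```

Calling `cleaning_report(df)` before and after the cleaning step gives a quick way to confirm that only the duplicated row was removed.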

## Exploratory Analysis and Visualization

Let's begin by importing `matplotlib.pyplot` and `seaborn`.

In [None]:
import seaborn as sns
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline

sns.set_style('darkgrid')
matplotlib.rcParams['font.size'] = 14
matplotlib.rcParams['figure.figsize'] = (9, 5)
matplotlib.rcParams['figure.facecolor'] = '#00000000'

## Male VS Female ratio
Plotting a pie chart to visualize the percentage of males vs females.

In [None]:
# `sex` is coded 1 = male, 0 = female, so the column sum is the number of males
male_count = df.sex.sum()
female_count = df.shape[0] - male_count
plt.figure(figsize=(10, 7))
plt.title('Male vs Female')
plt.pie([male_count, female_count], labels=["Male", "Female"], autopct="%0.2f%%");

**Around 68.21% are Male patients and 31.79% are Female patients.**

`The number of male patients is higher than the number of female patients`
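
The percentages shown on the pie chart can also be computed directly as numbers. Here is a minimal sketch using `value_counts(normalize=True)`; `percentage_share` is a hypothetical helper name.

```python
import pandas as pd


def percentage_share(series: pd.Series, labels: dict) -> pd.Series:
    """Percentage of rows per category, rounded to two decimals."""
    # Map the integer codes to labels, then turn counts into fractions.
    shares = series.map(labels).value_counts(normalize=True) * 100
    return shares.round(2)
```

In this notebook the call would look like `percentage_share(df["sex"], {1: "Male", 0: "Female"})`.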

## Age distribution

**How different age groups are affected can be shown using a histogram**

In [None]:
plt.title('Age Distribution')
# Bins start at 20 so that patients younger than 30 are not left out
plt.hist(df['age'], bins=np.arange(20, 90, 10));

`The largest number of patients belongs to the 50-60 year age group`
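
To read the exact counts behind the histogram, the ages can be binned with `pd.cut`. This is a sketch under my own assumptions: `count_by_age_group` is a made-up helper, and the bins are left-closed (for example 50 up to but not including 60).

```python
import pandas as pd


def count_by_age_group(ages: pd.Series, width: int = 10) -> pd.Series:
    """Count values in left-closed bins such as [50, 60)."""
    # Round the range outwards to multiples of `width` so no value is dropped.
    start = (int(ages.min()) // width) * width
    stop = (int(ages.max()) // width + 1) * width
    edges = list(range(start, stop + 1, width))
    groups = pd.cut(ages, bins=edges, right=False)
    return groups.value_counts().sort_index()
```

`count_by_age_group(df["age"]).idxmax()` should then name the busiest age group.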

## Types of chest pains people suffer from

1 = typical angina

2 = atypical angina

3 = non-anginal pain

0 = asymptomatic

In [None]:
sns.countplot(data=df, x="cp");

`Most of the patients have type 0 (asymptomatic) chest pain`

## Which chest pain type has a higher chance of being a heart attack?

0 = less chance of heart attack; 1 = more chance of heart attack

In [None]:
new_df = df.groupby(['cp','output'], sort=True).size().reset_index(name='Count')

In [None]:
sns.barplot(x='cp', y='Count', hue='output', data=new_df);

- Type 0 chest pain has the highest count of output 0, which means that patients with asymptomatic chest pain most often fall in the group with less chance of heart attack.
- Type 2 chest pain has the highest count of output 1, which indicates that atypical angina is most often associated with a higher chance of heart attack.
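
Raw counts depend on how many patients have each chest pain type, so it can help to look at proportions as well. The sketch below uses `pd.crosstab` with `normalize="index"` so that each chest pain type sums to 1; `outcome_share_by_group` is a hypothetical helper name.

```python
import pandas as pd


def outcome_share_by_group(frame, group_col, outcome_col="output"):
    """Row-normalised cross-tabulation: each row sums to 1."""
    return pd.crosstab(frame[group_col], frame[outcome_col], normalize="index")
```

For example, `outcome_share_by_group(df, "cp")` gives, for every chest pain type, the share of patients with output 0 and output 1.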

## Correlation heatmap

In [None]:
plt.figure(figsize=(14,8))
sns.heatmap(df.corr(), annot=False, cmap='Blues');

## Age Distribution

In [None]:
plt.figure(figsize = (10,6))
# histplot with kde=True replaces the deprecated sns.distplot
ax = sns.histplot(df['age'], kde=True)
ax.set_title('Age distribution', fontsize = 20);

In [None]:
plt.figure(figsize = (15, 5))
#plt.style.use("ggplot")
sns.countplot(x = df["age"]);  # using countplot
plt.title("Age Distribution", fontsize=20)
plt.xlabel("AGE", fontsize = 15)
plt.ylabel("NUMBER OF PATIENTS", fontsize=15)
plt.show()

The patients' ages range from 29 (min) to 77 (max).

Most of the patients are aged 50-60, and the most common age is 58.

In [None]:
plt.figure(figsize = (12, 8))
plt.style.use("ggplot")
sns.histplot(data = df, x = 'age', hue = 'output')
plt.title("AGE EFFECT ON THE HEART-ATTACK")
plt.xlabel("Age")
plt.ylabel("Count")
plt.show()

1 = more chance of heart attack (blue), 0 = less chance of heart attack (pink)

## Age Distribution conclusion
According to the data, increasing age doesn't seem to affect the number of heart attacks.

There is no strong relationship between age and heart attack, so we can't say that the chance of a heart attack rises or falls with increasing age.
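
One way to check this conclusion numerically is to compute the share of patients with output 1 in each age band. This is only a sketch: `positive_rate_by_bin` is a made-up helper and the bin edges are up to you.

```python
import pandas as pd


def positive_rate_by_bin(frame, value_col, edges, outcome_col="output"):
    """Share of rows with outcome 1 in each left-closed bin of `value_col`."""
    bins = pd.cut(frame[value_col], bins=edges, right=False)
    # observed=True leaves out bins that contain no rows at all.
    return frame.groupby(bins, observed=True)[outcome_col].mean()
```

A call such as `positive_rate_by_bin(df, "age", [20, 30, 40, 50, 60, 70, 80])` would show whether the rate moves steadily with age or not. The same helper should work for other numeric columns too.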

## Effect of Blood Pressure

In [None]:
sns.displot(df["trtbps"])
plt.title("DISTRIBUTION OF BLOOD PRESSURE AMONG PATIENTS",fontsize=20)
plt.xlabel("BLOOD PRESSURE",fontsize=20)
plt.ylabel("COUNT",fontsize=20)
plt.show()

`The largest number of patients has a resting blood pressure in the range 120-140 mm Hg`

In [None]:
plt.figure(figsize=(20,10))
sns.lineplot(x='age',y='trtbps',data=df)
plt.xlabel("Age",fontsize = 20)
plt.ylabel("Blood Pressure",fontsize = 20)
plt.show()

**Blood pressure tends to increase with age.**
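
A line plot shows the direction of a trend, and a correlation coefficient puts a number on its strength. Below is a small sketch; `trend_strength` is a hypothetical helper. The Spearman value is computed as the Pearson correlation of the ranks, which is the standard definition and avoids needing any extra library.

```python
import pandas as pd


def trend_strength(frame, x_col, y_col):
    """Pearson and Spearman correlation between two numeric columns."""
    x, y = frame[x_col], frame[y_col]
    return {
        "pearson": x.corr(y),
        # Spearman correlation is Pearson correlation applied to ranks.
        "spearman": x.rank().corr(y.rank()),
    }
```

For this plot the call would be `trend_strength(df, "age", "trtbps")`. Values near 0 mean a weak trend, values near 1 or -1 a strong one.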

In [None]:
plt.figure(figsize = (12, 8))
plt.style.use("ggplot")
sns.histplot(data = df, x = 'trtbps', hue = 'output') # 1 = more chance of heart attack (blue), 0 = less chance (pink)
plt.title("EFFECT OF BLOOD PRESSURE ON THE HEART-ATTACK")
plt.xlabel("Blood Pressure")
plt.ylabel("Count")
plt.show()

In [None]:
plt.figure(figsize=(12,8))
sns.lineplot(x="age",y="trtbps",hue="output",data=df)
plt.title("BLOOD PRESSURE VS AGE, SPLIT BY HEART ATTACK OUTPUT")
plt.show()

**Higher blood pressure appears to carry a higher risk of heart attack.**

## Effect of Cholesterol

In [None]:
sns.displot(df["chol"])
plt.title("DISTRIBUTION OF CHOLESTEROL LEVEL AMONG PATIENTS",fontsize=20)
plt.xlabel("CHOLESTEROL LEVEL",fontsize=20)
plt.ylabel("COUNT",fontsize=20)
plt.show()

**Most of the patients' cholesterol levels lie between 200 and 300.**

In [None]:
plt.figure(figsize=(20,10))
sns.lineplot(y="chol",x="age",data=df)
plt.title("CHOLESTEROL LEVEL WITH AGE",fontsize=20)
plt.xlabel("AGE",fontsize=20)
plt.ylabel("CHOLESTEROL LEVEL",fontsize=20)
plt.show()

**Cholesterol level tends to increase with age.**

In [None]:
plt.figure(figsize = (12, 8))
plt.style.use("ggplot")
sns.histplot(data = df, x = 'chol', hue = 'output') # 1 = more chance of heart attack (blue), 0 = less chance (pink)
plt.title("EFFECT OF CHOLESTEROL ON THE HEART-ATTACK")
plt.xlabel("Cholesterol")
plt.ylabel("Count")
plt.show()

In [None]:
plt.figure(figsize=(10,6))
sns.lineplot(x="age",y="chol",hue="output",data=df)
plt.title("CHOLESTEROL VS AGE, SPLIT BY HEART ATTACK OUTPUT")
plt.show()

**Increasing cholesterol level doesn't show a clear relationship. We can't say whether cholesterol affects heart attacks.**
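
When two histograms overlap this much, a per-group summary table can make the comparison easier to read. Here is a minimal sketch; `summarize_by_outcome` is a made-up helper name.

```python
import pandas as pd


def summarize_by_outcome(frame, value_col, outcome_col="output"):
    """Count, mean and median of `value_col` for each outcome group."""
    return frame.groupby(outcome_col)[value_col].agg(["count", "mean", "median"])
```

If `summarize_by_outcome(df, "chol")` shows similar means and medians for output 0 and output 1, that would support the reading above.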

## Effect of Heart rate

In [None]:
sns.displot(df["thalachh"])
plt.title("DISTRIBUTION OF HEART RATE AMONG PATIENTS",fontsize=20)
plt.xlabel("HEART RATE",fontsize=20)
plt.ylabel("COUNT",fontsize=20)
plt.show()

**Most of the patients' maximum heart rates lie between 140 and 175.**

In [None]:
plt.figure(figsize=(20,10))
sns.lineplot(y="thalachh",x="age",data=df)
plt.title("AGE VS HEART RATE",fontsize=20)
plt.xlabel("AGE",fontsize=20)
plt.ylabel("Heart Rate",fontsize=20)
plt.show()

**Maximum heart rate tends to decrease with age.**

In [None]:
plt.figure(figsize = (12, 8))
plt.style.use("ggplot")
sns.histplot(data = df, x = 'thalachh', hue = 'output') # 1 = more chance of heart attack (blue), 0 = less chance (pink)
plt.title("EFFECT OF MAX HEART RATE ON THE HEART-ATTACK")
plt.xlabel("Heart rate")
plt.ylabel("Count")
plt.show()

In [None]:
plt.figure(figsize=(10,6))
sns.lineplot(x="age",y="thalachh",hue="output",data=df)
plt.title("MAXIMUM HEART RATE VS AGE, SPLIT BY HEART ATTACK OUTPUT")
plt.show()

**People with a high maximum heart rate have a higher risk of heart attack.**
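
A simple way to test a "high vs low" statement is a median split: compare the share of output 1 among patients above the median with the share among the rest. This is just a sketch with a hypothetical helper, `rate_above_below_median`.

```python
import pandas as pd


def rate_above_below_median(frame, value_col, outcome_col="output"):
    """Share of outcome 1 for rows above vs. at or below the median."""
    median = frame[value_col].median()
    side = (frame[value_col] > median).map(
        {True: "above median", False: "at or below median"}
    )
    return frame.groupby(side)[outcome_col].mean()
```

For the heart rate column the call would be `rate_above_below_median(df, "thalachh")`.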

## Effect of Fasting Blood Sugar

fbs : (fasting blood sugar > 120 mg/dl) (1 = true; 0 = false)

In [None]:
plt.title('Fasting Blood Sugar')
sns.countplot(data=df, x="fbs")
plt.show()

**Most of the patients have a fasting blood sugar level below 120 mg/dl.**

In [None]:
plt.figure(figsize = (12, 8))
plt.style.use("ggplot")
sns.countplot(data=df, x="fbs",hue = 'output')
plt.title("EFFECT OF FASTING BLOOD SUGAR ON THE HEART-ATTACK")
plt.xlabel("Fasting blood sugar > 120 mg/dl (1 = true; 0 = false)")
plt.ylabel("Count")
plt.show()

**Patients with fasting blood sugar below 120 mg/dl account for more of the heart attack cases, although this group is also much larger.**
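
Because the two `fbs` groups are very different in size, it may be fairer to show the group size next to the share of output 1. The sketch below does that; `group_size_and_rate` and its column names are my own choices.

```python
import pandas as pd


def group_size_and_rate(frame, group_col, outcome_col="output"):
    """Number of rows and share of outcome 1 for each group."""
    grouped = frame.groupby(group_col)[outcome_col]
    return pd.DataFrame({
        "patients": grouped.size(),
        "output_1_rate": grouped.mean(),
    })
```

`group_size_and_rate(df, "fbs")` then lets you compare rates instead of raw counts.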

## Inferences and Conclusion

1. There is no strong relationship between increasing age and heart attack.
2. Higher blood pressure appears to carry a higher risk of heart attack.
3. Cholesterol level shows no clear relationship with heart attack.
4. A higher maximum heart rate appears to carry a higher risk of heart attack.
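
These inferences come from reading plots, so a quick numeric cross-check may be useful. The sketch below ranks every column by the absolute value of its Pearson correlation with `output`. `rank_correlations` is a hypothetical helper, and correlation only measures linear association, not cause.

```python
import pandas as pd


def rank_correlations(frame, target="output"):
    """Correlation of each column with `target`, strongest first."""
    corr = frame.corr()[target].drop(target)
    # Order by absolute value but keep the sign in the result.
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)
```

Running `rank_correlations(df)` gives one signed number per feature, which can be compared against the four points above.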