# Introduction

Every day, the average human heart beats around 100,000 times, pumping about 2,000 gallons of blood through roughly 60,000 miles of blood vessels.

Cardiovascular diseases (CVDs), or heart diseases, are the number one cause of death globally, claiming millions of lives each year. Hypertension, diabetes, excess weight and unhealthy lifestyles all contribute to CVDs.

![](https://www.heartresearch.com.au/wp-content/uploads/2016/08/shutterstock_208215142-500x250.jpg)

Loading the initial libraries

In [None]:
import pandas as pd
import numpy as np 
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

import warnings
warnings.filterwarnings('ignore')

Let us load the data set

In [None]:
heart_df= pd.read_csv('../input/heart-disease-uci/heart.csv')

In [None]:
heart_df.head()

In [None]:
heart_df.tail()

In [None]:
heart_df.info()

The data contains:

1. age - age in years
2. sex - (1 = male; 0 = female)
3. cp - chest pain type
4. trestbps - resting blood pressure (in mm Hg on admission to the hospital)
5. chol - serum cholesterol in mg/dl
6. fbs - (fasting blood sugar > 120 mg/dl) (1 = true; 0 = false)
7. restecg - resting electrocardiographic results
8. thalach - maximum heart rate achieved
9. exang - exercise induced angina (1 = yes; 0 = no)
10. oldpeak - ST depression induced by exercise relative to rest
11. slope - the slope of the peak exercise ST segment
12. ca - number of major vessels (0-3) colored by fluoroscopy
13. thal - 3 = normal; 6 = fixed defect; 7 = reversible defect
14. target - have disease or not (1=yes, 0=no)

In [None]:
heart_df.shape

From the output above, we can see that the data set has 303 rows and 14 columns, and that it does not contain any null values.
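The shape and null-value claims can also be checked directly instead of being read off `info()`. The sketch below is a minimal illustration on a hypothetical toy frame (`toy_df` is an assumption and not the real heart data); on the real data you would run the same calls on `heart_df`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for heart_df (not the real data);
# one value is deliberately missing so the check has something to find.
toy_df = pd.DataFrame({
    "age": [63, 37, 41, 56],
    "chol": [233.0, np.nan, 204.0, 236.0],
    "target": [1, 1, 0, 0],
})

n_rows, n_cols = toy_df.shape          # (rows, columns)
null_counts = toy_df.isnull().sum()    # missing values per column
total_nulls = int(null_counts.sum())   # missing values in the whole frame

print(n_rows, n_cols)
print(null_counts)
print(total_nulls)
```

A total of zero on `heart_df` would confirm the statement above.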

Count Plot for Target column

In [None]:
plt.figure(figsize=(7,5)) 
sns.countplot(x="target", data=heart_df, palette=('Orange','DarkBlue'))
plt.xlabel("Heart Disease (0 = No, 1= Yes)")
plt.show()
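The class balance shown in the count plot can also be read as numbers. This is a small sketch on a hypothetical toy series (`toy_target` is an assumption, not the real `target` column); the same two calls should work on `heart_df["target"]`.

```python
import pandas as pd

# Hypothetical toy labels standing in for heart_df["target"]
toy_target = pd.Series([1, 1, 1, 0, 0], name="target")

counts = toy_target.value_counts()                 # absolute count per class
shares = toy_target.value_counts(normalize=True)   # fraction per class

print(counts)
print(shares)
```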

In [None]:
plt.figure(figsize=(7,5)) 
sns.countplot(x="sex", data=heart_df, hue="target", palette=('Orange','DarkBlue'))
plt.show()

In [None]:
plt.figure(figsize=(12,7)) 
sns.countplot(x="age", data=heart_df, hue="target", palette=('Orange','DarkBlue'))
plt.show()

In [None]:
plt.figure(figsize=(7,5)) 
sns.countplot(x="slope", data=heart_df, hue="target", palette=('Orange','DarkBlue'))
plt.xlabel('The Slope of The Peak Exercise ST Segment')
plt.ylabel('Frequency of Disease or Not')
plt.show()

In [None]:
plt.figure(figsize=(7,5)) 
sns.countplot(x="fbs", data=heart_df, hue="target", palette=('Orange','DarkBlue'))
plt.xlabel('FBS - (Fasting Blood Sugar > 120 mg/dl) (1 = true; 0 = false)')
plt.ylabel('Frequency of Disease or Not')
plt.show()

In [None]:
plt.figure(figsize=(7,5)) 
sns.countplot(x="cp", data=heart_df, hue="target", palette=('Orange','DarkBlue'))
plt.xlabel('Chest Pain Type')
plt.ylabel('Frequency of Disease or Not')
plt.show()

In [None]:
# Pairwise Pearson correlation between all columns
corr = heart_df.corr()
plt.figure(figsize=(15,15))
sns.heatmap(corr, cmap="RdYlGn", annot=True)
plt.show()
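A full heatmap can be hard to scan when the question is simply which features move with the target. One possible approach, sketched here on synthetic data (the frame `toy_df` and its column names `strong` and `weak` are assumptions for illustration), is to pull out the target column of the correlation matrix and sort it by absolute value.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for heart_df: one feature tied to the target, one pure noise
rng = np.random.default_rng(0)
n = 200
target = rng.integers(0, 2, size=n)
toy_df = pd.DataFrame({
    "strong": target + rng.normal(0, 0.3, size=n),
    "weak": rng.normal(0, 1, size=n),
    "target": target,
})

# Correlation of every feature with the target, strongest first
corr_with_target = (
    toy_df.corr()["target"]
    .drop("target")
    .sort_values(key=lambda s: s.abs(), ascending=False)
)
print(corr_with_target)
```

On the real data, the equivalent would be `heart_df.corr()["target"]` with the same sorting.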

Now, let's start building the model

In [None]:
# Work on a copy so that later changes do not modify heart_df
heart_df1 = heart_df.copy()

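Use get_dummies() to convert categorical columns into dummy (indicator) variables. Note that, by default, get_dummies() encodes only object, string, or category columns, so integer-coded columns such as cp or thal are left as they are unless they are listed in the `columns` argument.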

In [None]:
heart_df1=pd.get_dummies(heart_df1)
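To make the default behaviour of `get_dummies()` concrete, here is a minimal sketch on a hypothetical toy frame (`toy_df` is an assumption, not the real data). It compares the default call with one that names an integer-coded column explicitly through the `columns` argument. Whether the notebook's features should be encoded this way is a modelling choice, and doing so would change the accuracy reported later.

```python
import pandas as pd

# Hypothetical toy frame; cp is an integer-coded category, as in the data description
toy_df = pd.DataFrame({
    "age": [63, 37, 41, 56],
    "cp": [3, 2, 1, 1],
    "target": [1, 1, 0, 0],
})

# Default call: there are no object/category columns, so nothing is encoded
default_encoded = pd.get_dummies(toy_df)

# Explicit call: cp is replaced by one indicator column per distinct value
explicit_encoded = pd.get_dummies(toy_df, columns=["cp"])

print(list(default_encoded.columns))
print(list(explicit_encoded.columns))
```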

In [None]:
heart_df1.head()

In [None]:
heart_df1.describe()

Let's assign the x and y values and then scale the features

In [None]:
x = heart_df1.drop(['target'], axis = 1)
y = heart_df1['target']

In [None]:
from sklearn.preprocessing import StandardScaler

sc=StandardScaler()
x=sc.fit_transform(x)

Let's split the data into train and test data set

In [None]:
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x,y,test_size = 0.3, random_state = 0)
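The cells above fit `StandardScaler` on the full data before splitting, so the test rows influence the scaling statistics. A common alternative, sketched below on synthetic data (`X_toy`, `y_toy` and the generated labels are assumptions for illustration), is to split first and wrap the scaler and the model in a scikit-learn pipeline, so that the scaler is fitted on the training split only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the features and labels (not the heart data)
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 5))
y_toy = (X_toy[:, 0] + 0.5 * X_toy[:, 1] > 0).astype(int)

# Split first, using the same proportions as the notebook
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.3, random_state=0
)

# fit() learns the scaling statistics from the training rows only;
# score() reuses those statistics to transform the test rows
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_tr, y_tr)
test_accuracy = model.score(X_te, y_te)
print(test_accuracy)
```

Applying this to the notebook's data would likely shift the reported accuracy slightly, since the scaler would no longer see the test rows.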

Finding shape of split data

In [None]:
x_train.shape, x_test.shape

In [None]:
y_train.shape, y_test.shape

Now, let's build a logistic regression model on our data set

In [None]:
from sklearn.linear_model import LogisticRegression
logreg = LogisticRegression()
logreg.fit(x_train, y_train)

Predict on the test data

In [None]:
y_pred = logreg.predict(x_test)

Creating confusion matrix

In [None]:
from sklearn.metrics import confusion_matrix, accuracy_score
# scikit-learn expects (y_true, y_pred): rows are actual classes, columns are predictions
confmat = confusion_matrix(y_test, y_pred)
confmat
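For a binary problem, the four cells of the confusion matrix can be unpacked and turned into precision and recall. The sketch below uses hypothetical toy labels (`y_true_toy` and `y_pred_toy` are assumptions, not the notebook's predictions) and passes the true labels first, which is the argument order scikit-learn documents.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical toy labels (not the notebook's results)
y_true_toy = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred_toy = [0, 0, 0, 1, 1, 1, 1, 1, 0, 0]

# Rows are actual classes, columns are predicted classes
cm = confusion_matrix(y_true_toy, y_pred_toy)
tn, fp, fn, tp = (int(v) for v in cm.ravel())

precision = precision_score(y_true_toy, y_pred_toy)  # tp / (tp + fp)
recall = recall_score(y_true_toy, y_pred_toy)        # tp / (tp + fn)

print(cm)
print(tn, fp, fn, tp)
print(precision, recall)
```

In a medical setting, recall on the disease class is often worth reporting alongside accuracy, because a false negative is a missed case.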

And here is the accuracy

In [None]:
# Accuracy is symmetric, but (y_true, y_pred) is the conventional argument order
accuracy_score(y_test, y_pred)

The prediction accuracy of the logistic regression model above on the test data set is **81.31%**.
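A single 70/30 split on about 300 rows gives an accuracy estimate that can vary noticeably with `random_state`. One way to get a steadier figure, sketched here on synthetic data (`X_toy`, `y_toy` and the 5-fold setting are assumptions for illustration), is k-fold cross-validation with `cross_val_score`.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the features and labels (not the heart data)
rng = np.random.default_rng(1)
X_toy = rng.normal(size=(150, 4))
y_toy = (X_toy[:, 0] - X_toy[:, 2] > 0).astype(int)

# The pipeline refits the scaler inside every fold
pipeline = make_pipeline(StandardScaler(), LogisticRegression())
scores = cross_val_score(pipeline, X_toy, y_toy, cv=5, scoring="accuracy")

mean_acc, std_acc = scores.mean(), scores.std()
print(scores)
print(mean_acc, std_acc)
```

Reporting the mean together with the standard deviation across folds gives a sense of how much the score depends on the particular split.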