Heart disease describes a range of conditions that affect your heart. Heart diseases include:

* Blood vessel disease, such as coronary artery disease
* Heart rhythm problems (arrhythmias)
* Heart defects you're born with (congenital heart defects)
* Heart valve disease
* Disease of the heart muscle
* Heart infection


Many forms of heart disease can be prevented or treated with healthy lifestyle choices. Let's start with exploratory data analysis (EDA) and then build the model.

![](https://miro.medium.com/max/6300/1*CEwJ9Ai7LEytGq-tPmMafg.jpeg)

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline


In [None]:
heart=pd.read_csv('../input/heart-disease-uci/heart.csv')

In [None]:
heart.info()

**The data contains:**

age - age in years

sex - (1 = male; 0 = female)

cp - chest pain type

trestbps - resting blood pressure (in mm Hg on admission to the hospital)

chol - serum cholesterol in mg/dl

fbs - (fasting blood sugar > 120 mg/dl) (1 = true; 0 = false)

restecg - resting electrocardiographic results

thalach - maximum heart rate achieved

exang - exercise induced angina (1 = yes; 0 = no)

oldpeak - ST depression induced by exercise relative to rest

slope - the slope of the peak exercise ST segment

ca - number of major vessels (0-3) colored by fluoroscopy

thal - 3 = normal; 6 = fixed defect; 7 = reversible defect

target - have disease or not (1=yes, 0=no)

In [None]:
heart.describe()

In [None]:
heart.head()

In [None]:
heart.isnull().sum()

In [None]:
heart.shape

In [None]:
heart['target'].value_counts()

In [None]:
heart.dtypes

In [None]:
heart['target'].unique()

In [None]:
plt.figure(figsize=(7,5)) 
sns.countplot(x="target", data=heart, palette=('Orange','DarkBlue'))
plt.xlabel("Heart Disease (0 = No, 1= Yes)")
plt.show()

In [None]:
plt.figure(figsize=(10,9)) 
sns.countplot(x="age", hue="target",data=heart, palette=('Orange','DarkBlue'))
plt.show()

In [None]:
plt.figure(figsize=(7,5)) 
sns.countplot(x="slope", data=heart, hue="target", palette=('Orange','DarkBlue'))
plt.xlabel('The Slope of The Peak Exercise ST Segment')
plt.ylabel('Frequency of Disease or Not')
plt.show()

In [None]:
plt.figure(figsize=(7,5)) 
sns.countplot(x="cp", data=heart, hue="target", palette=('Orange','DarkBlue'))
plt.xlabel('Chest Pain Type')
plt.ylabel('Frequency of Disease or Not')
plt.show()

In [None]:
plt.figure(figsize=(7,5)) 
sns.countplot(x="fbs", data=heart, hue="target", palette=('Orange','DarkBlue'))
plt.xlabel('FBS - (Fasting Blood Sugar > 120 mg/dl) (1 = true; 0 = false)')
plt.ylabel('Frequency of Disease or Not')
plt.show()

In [None]:
corr=heart.corr()
plt.figure(figsize=(15,15))
sns.heatmap(corr, cmap="RdYlGn", annot=True)
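
A heatmap of this size can be hard to read at a glance. As a complement, the correlations with the label can be ranked directly. The sketch below uses a small synthetic frame rather than the heart data; the column names `strong` and `noise` are hypothetical and exist only for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data: one feature tied to the target, one pure noise.
rng = np.random.default_rng(0)
n = 200
target = rng.integers(0, 2, size=n)
demo = pd.DataFrame({
    "target": target,
    "strong": target + rng.normal(0, 0.3, size=n),  # correlated with target
    "noise": rng.normal(0, 1, size=n),              # unrelated to target
})

# Take the 'target' column of the correlation matrix, drop the
# self-correlation, and sort by absolute strength.
ranked = demo.corr()["target"].drop("target").abs().sort_values(ascending=False)
print(ranked)
```

On the real data, the same expression applied to `corr` would list the columns most linearly related to `target` first. Correlation only captures linear association, so a low value does not by itself mean a feature is uninformative.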

Now, let's start building the model

In [None]:
# Keep an independent copy of the original, un-encoded data
heart_df1 = heart.copy()

Let's do feature encoding
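
To make the effect of one-hot encoding with `drop_first=True` concrete, here is a minimal sketch on a toy frame (the values are made up and are not taken from the heart data). Each categorical level becomes its own indicator column, and the first level is dropped so that it acts as the implicit baseline.

```python
import pandas as pd

# Toy frame with a three-level categorical column (illustrative values only).
toy = pd.DataFrame({"cp": [0, 1, 2, 1], "age": [63, 37, 41, 56]})

# One-hot encode `cp`; drop_first=True removes the first level (cp_0),
# which then serves as the reference category.
encoded = pd.get_dummies(toy, columns=["cp"], drop_first=True)

print(list(encoded.columns))
print(encoded)
```

Dropping one level avoids perfectly collinear indicator columns, which matters mostly for linear models such as logistic regression.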

In [None]:
heart = pd.get_dummies(heart, columns=['cp', 'slope', 'thal'], drop_first=True)



In [None]:
heart.head()

In [None]:
heart.describe()

In [None]:
x = heart.drop(['target'], axis = 1)
y = heart['target']

In [None]:
from sklearn.preprocessing import StandardScaler

sc=StandardScaler()
x=sc.fit_transform(x)

In [None]:
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x,y,test_size = 0.3, random_state = 0)
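
A note on the order of scaling and splitting: when a scaler is fitted on the full dataset before the split, the test rows contribute to the mean and standard deviation used for training. One common way to avoid this is to split first and wrap the scaler and the model in a scikit-learn `Pipeline`, so the scaler is fitted only on the data passed to `fit`. The sketch below is an assumption-laden illustration on synthetic data from `make_classification`, not the notebook's own workflow; the names `X_demo`, `y_demo`, and `pipe` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the feature matrix and labels.
X_demo, y_demo = make_classification(n_samples=303, n_features=13, random_state=0)

# Split first, so the test rows never influence the scaler.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0, stratify=y_demo
)

# The pipeline fits StandardScaler on X_tr only, then fits the classifier.
pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(X_tr, y_tr)

# At prediction time the already-fitted scaler is applied to X_te.
test_acc = pipe.score(X_te, y_te)
print(f"Test accuracy: {test_acc:.3f}")
```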

In [None]:
x_train.shape, x_test.shape

In [None]:
y_train.shape, y_test.shape

Build Logistic Regression

In [None]:
from sklearn.linear_model import LogisticRegression
logreg = LogisticRegression()
logreg.fit(x_train, y_train)

In [None]:
y_pred = logreg.predict(x_test)

In [None]:
from sklearn.metrics import confusion_matrix, accuracy_score
# scikit-learn expects (y_true, y_pred): rows are true labels, columns are predictions
confmat = confusion_matrix(y_test, y_pred)
confmat

In [None]:
accuracy_score(y_test, y_pred)
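
Accuracy alone hides how the errors are distributed. For a disease classifier, missed cases (false negatives) and false alarms (false positives) usually carry different costs. The following sketch shows how the four cells of a binary confusion matrix map to precision and recall. The label vectors are made up for illustration and are not the notebook's predictions.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical true labels and predictions (1 = disease, 0 = no disease).
y_true_demo = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred_demo = [0, 1, 1, 1, 0, 0, 1, 0]

# For binary labels, ravel() yields the cells in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo).ravel()

precision = tp / (tp + fp)  # of the predicted positives, how many are real
recall = tp / (tp + fn)     # of the real positives, how many were found

print(f"tn={tn}, fp={fp}, fn={fn}, tp={tp}")
print(f"precision={precision:.2f}, recall={recall:.2f}")

# The library helpers give the same values.
assert abs(precision - precision_score(y_true_demo, y_pred_demo)) < 1e-12
assert abs(recall - recall_score(y_true_demo, y_pred_demo)) < 1e-12
```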

Build Random Forest

In [None]:
from sklearn.ensemble import RandomForestClassifier

random_forest = RandomForestClassifier(criterion = "gini", 
                                       min_samples_leaf = 1, 
                                       min_samples_split = 10,   
                                       n_estimators=100, 
                                       max_features='sqrt',  # 'auto' was removed in scikit-learn 1.3; 'sqrt' is its equivalent for classifiers
                                       oob_score=True, 
                                       random_state=1, 
                                       n_jobs=-1)

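random_forest.fit(x_train, y_train)
y_pred = random_forest.predict(x_test)
# Training accuracy is optimistic; the out-of-bag (OOB) score is the fairer estimate
print("Training accuracy:", round(random_forest.score(x_train, y_train) * 100, 2), "%")
print("OOB score:", round(random_forest.oob_score_ * 100, 2), "%")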
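
The out-of-bag score is computed from the training rows that each tree did not see in its bootstrap sample, so it works as a built-in validation estimate. To illustrate how it typically relates to training and held-out accuracy, here is a minimal sketch on synthetic data. The dataset and the names `X_demo`, `rf`, and `scores` are assumptions for illustration only, not the notebook's data or results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data.
X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0
)

# oob_score=True asks the forest to score each training row using only
# the trees whose bootstrap sample did not contain that row.
rf = RandomForestClassifier(
    n_estimators=100, min_samples_split=10, oob_score=True, random_state=1
)
rf.fit(X_tr, y_tr)

scores = {
    "train": rf.score(X_tr, y_tr),  # usually optimistic
    "oob": rf.oob_score_,           # estimate from unseen-by-tree rows
    "test": rf.score(X_te, y_te),   # held-out accuracy
}
for name, value in scores.items():
    print(f"{name:>5}: {value:.3f}")
```

When the OOB and test figures are close to each other while the training figure is noticeably higher, that gap is a rough indication of how much the forest has fitted to the training rows.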