# Heart Disease Prediction

This notebook builds a model that predicts whether a person has heart disease, using the Heart Disease UCI dataset.
The dataset columns are:
* age
* sex
* chest pain type (4 values)
* resting blood pressure
* serum cholesterol in mg/dl
* fasting blood sugar > 120 mg/dl
* resting electrocardiographic results (values 0,1,2)
* maximum heart rate achieved
* exercise induced angina
* oldpeak = ST depression induced by exercise relative to rest
* the slope of the peak exercise ST segment
* number of major vessels (0-3) colored by fluoroscopy
* thal: 3 = normal; 6 = fixed defect; 7 = reversible defect
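
For reference, the descriptions above can be tied to the short column names used in the CSV. This is a sketch that assumes the usual headers of the Kaggle `heart.csv` file; check them against `data.columns` before relying on the mapping.

```python
# Assumed CSV headers (not confirmed by this notebook) mapped to the
# descriptions listed above. The label column is assumed to be "target".
COLUMN_DESCRIPTIONS = {
    "age": "age",
    "sex": "sex",
    "cp": "chest pain type (4 values)",
    "trestbps": "resting blood pressure",
    "chol": "serum cholesterol in mg/dl",
    "fbs": "fasting blood sugar > 120 mg/dl",
    "restecg": "resting electrocardiographic results (values 0,1,2)",
    "thalach": "maximum heart rate achieved",
    "exang": "exercise induced angina",
    "oldpeak": "ST depression induced by exercise relative to rest",
    "slope": "the slope of the peak exercise ST segment",
    "ca": "number of major vessels (0-3) colored by fluoroscopy",
    "thal": "thal",
}

FEATURE_COLUMNS = list(COLUMN_DESCRIPTIONS)  # 13 feature names, in file order
TARGET_COLUMN = "target"

for name, description in COLUMN_DESCRIPTIONS.items():
    print(f"{name:>9}: {description}")
```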


#### Import Relevant Libraries

In [None]:
# Core libraries for numerical work and data handling
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Plotting libraries
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
data = pd.read_csv('../input/heart-disease-uci/heart.csv')

In [None]:
# Look at our dataset
data.head()

A quick look at the dataset shows that every column is numeric (mostly integers, with `oldpeak` stored as a float), so there is **no need** to encode anything before modelling.

Now look at the target values: they are either 0 or 1, so **what kind of problem is this?**
It is a binary classification problem, as simple as that.
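
The "no preprocessing needed" claim can be checked rather than eyeballed. Here is a minimal sketch, assuming a helper named `preprocessing_report` (hypothetical, not part of the notebook) and a tiny made-up frame standing in for the real data:

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def preprocessing_report(df: pd.DataFrame) -> dict:
    """Collect basic facts that decide whether preprocessing is needed."""
    return {
        # Total number of missing cells across the whole frame
        "missing_values": int(df.isna().sum().sum()),
        # Columns that would need encoding before modelling
        "non_numeric_columns": [c for c in df.columns if not is_numeric_dtype(df[c])],
        # Fully duplicated rows
        "duplicate_rows": int(df.duplicated().sum()),
    }


# Made-up rows for illustration only; these are NOT taken from heart.csv.
toy = pd.DataFrame(
    {"age": [63, 37, 41], "oldpeak": [2.3, 3.5, 1.4], "target": [1, 1, 0]}
)
report = preprocessing_report(toy)
print(report)
```

In the notebook itself, calling `preprocessing_report(data)` after loading the CSV would give the same summary for the real dataset.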

In [None]:
data.target.unique()

In [None]:
# info() lists the dtype and non-null count of every column in the dataset.
data.info()

In [None]:
# It is a small dataset, as you can see: just 303 rows
data.shape

In [None]:
data.describe()

##### Split our Dataset into X and y

In [None]:
X = data.iloc[:, 0:13].values
y = data.iloc[:, 13].values

In [None]:
X

In [None]:
y

As we can see, no preprocessing is needed here. (Very good.)

### Feature Engineering

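**Extra Trees Classifier** is an ensemble of randomized decision trees. After fitting, it gives us an importance score for every feature through `feature_importances_`: the more a feature reduces impurity across the splits in all the trees, the higher its score. The scores are normalized so that they sum to 1.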
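
To see what these importances look like in isolation, here is a small sketch on synthetic data generated with `make_classification`. The data and the names `X_demo`, `y_demo`, and `forest` are illustrative assumptions and have nothing to do with the heart dataset.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Synthetic data: 6 features, of which 3 carry signal (illustration only).
X_demo, y_demo = make_classification(
    n_samples=200, n_features=6, n_informative=3, n_redundant=0, random_state=0
)

forest = ExtraTreesClassifier(n_estimators=100, random_state=0)
forest.fit(X_demo, y_demo)

# One impurity-based score per feature, normalized to sum to 1.
importances = forest.feature_importances_

# Print features from most to least important.
for idx in np.argsort(importances)[::-1]:
    print(f"feature {idx}: {importances[idx]:.3f}")
```

Impurity-based importances are computed on the data the model was fitted on, so they are best read as a rough ranking rather than an exact measure.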

In [None]:
from sklearn.ensemble import ExtraTreesClassifier

In [None]:
# n_estimators is the number of trees in the ensemble
clf = ExtraTreesClassifier(n_estimators=100, random_state=0)

In [None]:
clf.fit(X, y)

#### Importances of Features & Graph Plotting

In [None]:
important_features = clf.feature_importances_

In [None]:
important_features.max()

In [None]:
important_features

In [None]:
columns = data.columns

In [None]:
cols = columns[0:13]

In [None]:
cols

In [None]:
plt.figure(figsize = (15, 5))
plt.bar(cols, important_features, color = "green")

plt.xlabel("Name of Columns")
plt.ylabel("Columns Importances")

plt.title("Important Features Graph")
plt.show()

In [None]:
print(important_features)
print(cols)

The code below collects the names of the columns whose feature importance is at or above a chosen threshold.
**Note: in this case the threshold is 0.08.**

In [None]:
# Keep every column whose importance is at least 0.08
important_columns = [col for col, score in zip(cols, important_features) if score >= 0.08]

In [None]:
important_columns

Now we build our new X and y. y is the same as before, but X keeps only a subset of the columns.

In [None]:
# Column positions 2, 7, 8, 9, 11, 12 are cp, thalach, exang, oldpeak, ca, thal
X_new = data.iloc[:, [2, 7, 8, 9, 11, 12]].values

In [None]:
X_new
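
Hard-coded positions are easy to get wrong if the column order ever changes. As a sketch, the positions can be looked up from names instead. The helper `positions_of` is hypothetical, and the header list is the assumed column order of `heart.csv`.

```python
def positions_of(all_columns, selected):
    """Return the integer positions of the `selected` names within `all_columns`."""
    lookup = {name: i for i, name in enumerate(all_columns)}
    return [lookup[name] for name in selected]


# Assumed header order of heart.csv (verify with data.columns).
heart_columns = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal", "target",
]
selected = ["cp", "thalach", "exang", "oldpeak", "ca", "thal"]

positions = positions_of(heart_columns, selected)
print(positions)  # [2, 7, 8, 9, 11, 12]
```

Inside the notebook, `data[important_columns].values` selects by name directly and avoids the positions altogether.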

In [None]:
y

### Splitting the Dataset into Train and Test

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X_new, y, test_size = 0.1)
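
Because the split above is unseeded, every run produces a different test set and therefore different scores. A possible variant, shown here on made-up arrays, fixes `random_state` and uses `stratify` to keep the class ratio equal in both parts. The seed value 42 and the toy arrays are arbitrary assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data for illustration: 100 samples, 2 features, perfectly balanced labels.
X_toy = np.arange(200).reshape(100, 2)
y_toy = np.array([0, 1] * 50)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy,
    y_toy,
    test_size=0.1,     # same 10% hold-out as in the notebook
    random_state=42,   # fixed seed so the split is repeatable
    stratify=y_toy,    # preserve the 0/1 ratio in train and test
)

print(X_tr.shape, X_te.shape)
print("positives in test:", int(y_te.sum()))
```

With only about 30 test rows in a 303-row dataset, a repeatable and stratified split makes the reported scores easier to compare between runs.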

In [None]:
X_train

In [None]:
y_train

In [None]:
X_test

In [None]:
y_test

So far we have finished the feature selection and the train/test split.

### Model Making

In [None]:
from sklearn.tree import DecisionTreeClassifier as dtc

In [None]:
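# All arguments below are the scikit-learn defaults. min_impurity_split is left out
# because it was removed in scikit-learn 1.0 and raises a TypeError there.
classifier = dtc(criterion='gini', splitter='best', max_depth=None, min_samples_split=2, 
                 min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features=None, 
                 random_state=None, max_leaf_nodes=None, min_impurity_decrease=0.0, 
                 class_weight=None)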

In [None]:
classifier.fit(X_train, y_train)

In [None]:
y_pred = classifier.predict(X_test)

In [None]:
y_pred

In [None]:
y_test

#### Cross-Validation Scores

In [None]:
from sklearn.model_selection import cross_val_score
scores = cross_val_score(classifier, X_train, y_train, cv=5)

In [None]:
scores
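
Five separate fold scores are easier to read as a mean plus a spread. A minimal sketch, using a hypothetical helper `summarize_scores` and made-up fold scores (not results from this notebook):

```python
from statistics import mean, stdev


def summarize_scores(fold_scores):
    """Return (mean, sample standard deviation) of per-fold scores."""
    return mean(fold_scores), stdev(fold_scores)


# Made-up numbers for illustration; real values come from cross_val_score.
example_scores = [0.70, 0.75, 0.80, 0.75, 0.75]

avg, spread = summarize_scores(example_scores)
print(f"CV accuracy: {avg:.3f} +/- {spread:.3f}")
```

In the notebook, `summarize_scores(list(scores))` would do the same for the real folds. A large spread relative to the mean suggests the estimate is unstable on a dataset this small.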

In [None]:
classifier.score(X_test, y_test)

#### Confusion Matrix

In [None]:
from sklearn.metrics import precision_score, recall_score, f1_score, accuracy_score
from sklearn.metrics import confusion_matrix

In [None]:
conf_matrix = confusion_matrix(y_true=y_test, y_pred=y_pred)

In [None]:
conf_matrix

#### Precision Score

In [None]:
print('Precision: %.3f' % precision_score(y_test, y_pred))

#### Recall Score

In [None]:
print('Recall: %.3f' % recall_score(y_test, y_pred))

#### Accuracy Score

In [None]:
print('Accuracy: %.3f' % accuracy_score(y_test, y_pred))

#### F1 score

In [None]:
print('F1 Score: %.3f' % f1_score(y_test, y_pred))
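
All four scores above can be derived from the four counts in the confusion matrix. The sketch below spells out the formulas. The function name `metrics_from_counts` and the example counts are assumptions for illustration, not results from this notebook.

```python
def metrics_from_counts(tn, fp, fn, tp):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: of everything predicted positive, how much was truly positive
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of everything truly positive, how much was found
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up counts for illustration only.
example = metrics_from_counts(tn=12, fp=3, fn=2, tp=14)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

For a binary problem, scikit-learn lays the matrix out as `[[tn, fp], [fn, tp]]`, so `tn, fp, fn, tp = conf_matrix.ravel()` yields the counts in the order this function expects.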

**Hope you guys enjoyed this notebook. Don't forget to Upvote this notebook.**

**Thanks for reading.**