## Set up

In [4]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

import os
print(os.listdir())

import warnings
warnings.filterwarnings('ignore')

['.git', '.gitignore', '.ipynb_checkpoints', 'classification.ipynb', 'Heart.csv', 'initial_exploration.ipynb', 'linear_regression.ipynb', 'README.md']


## Features: All the features except target
1. Target: whether or not a person has heart disease.<br>
2. Features: the remaining columns (sex, cp, trestbps, chol, fbs, restecg, thalach, exang, oldpeak, slope, ca, thal) are used to train a model that predicts whether a person has heart disease.

In [3]:
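# Assumption: heart_data comes from the Heart.csv file listed in the directory output above.
heart_data = pd.read_csv('Heart.csv')
X = heart_data.iloc[:, :-1].values  # every column except the last (features)
y = heart_data.iloc[:, -1].values   # last column (target)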

In [5]:
from sklearn import preprocessing

# Rescale every feature to the [0, 1] range.
min_max_scaler = preprocessing.MinMaxScaler(feature_range=(0, 1))
X_preprocessed = min_max_scaler.fit_transform(X)

In [6]:
from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(X_preprocessed, y, test_size=0.20, random_state=0)
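
The scaler above is fit on the full dataset before the split, so the test rows influence the scaling. A minimal sketch of one way to avoid that, assuming synthetic stand-in data (`X_demo` and `y_demo` are hypothetical and are not the heart dataset): split first, then let a scikit-learn pipeline fit the scaler on the training rows only.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Hypothetical stand-in data: 300 rows, 5 numeric features, binary target.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 5))
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 0).astype(int)

# Split first, so the test rows never influence the scaler.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.20, random_state=0
)

# The pipeline fits MinMaxScaler on X_tr only and reuses that scaling for X_te.
model = make_pipeline(
    MinMaxScaler(feature_range=(0, 1)),
    SVC(kernel='linear', random_state=0),
)
model.fit(X_tr, y_tr)
test_accuracy = model.score(X_te, y_te)
print(test_accuracy)
```

With this layout the same `model` object can be passed to cross-validation helpers, and the scaler is refit inside each fold.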

## Decision Tree

In [7]:
from sklearn.tree import DecisionTreeClassifier
classifier = DecisionTreeClassifier(criterion = 'entropy', random_state = 0, splitter = 'best', max_depth = 6)
classifier.fit(X_train, Y_train)

## Evaluation

In [8]:
y_pred = classifier.predict(X_test)
# Column 0: predicted label, column 1: true label.
print(np.concatenate((y_pred.reshape(len(y_pred),1), Y_test.reshape(len(Y_test),1)),1))

[[1 1]
 [1 1]
 [1 0]
 [1 0]
 [1 1]
 [1 1]
 [1 1]
 [1 1]
 [0 1]
 [0 0]
 [0 0]
 [0 0]
 [1 1]
 [0 0]
 [1 1]
 [1 1]
 [1 0]
 [1 1]
 [1 1]
 [1 1]
 [0 0]
 [0 0]
 [1 1]
 [1 0]
 [0 0]
 [0 1]
 [0 1]
 [0 0]
 [0 0]
 [1 0]
 [0 0]
 [0 1]
 [0 1]
 [1 1]
 [1 1]
 [0 0]
 [0 1]
 [0 0]
 [0 0]
 [0 0]
 [0 0]
 [0 0]
 [0 0]
 [1 1]
 [1 1]
 [1 1]
 [1 1]
 [0 0]
 [1 1]
 [1 1]
 [0 0]
 [1 1]
 [1 0]
 [0 1]
 [1 1]
 [0 0]
 [1 0]
 [1 1]
 [0 0]
 [1 1]
 [0 0]]


## Computing metrics

In [10]:
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score, f1_score
cm = confusion_matrix(Y_test, y_pred)
print(cm)
print("Accuracy is ",accuracy_score(Y_test, y_pred))
print("Precision is ",precision_score(Y_test, y_pred))
print("Sensitivity is ", recall_score(Y_test, y_pred))
print("F1 is", f1_score(Y_test, y_pred))

[[22  7]
 [ 7 25]]
Accuracy is  0.7704918032786885
Precision is  0.78125
Sensitivity is  0.78125
F1 is 0.78125
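
The scores above can be checked by hand from the confusion matrix. scikit-learn's `confusion_matrix` puts true labels in rows and predicted labels in columns, so here TN = 22, FP = 7, FN = 7 and TP = 25, out of 61 test samples:

```latex
\begin{aligned}
\text{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN} = \frac{25 + 22}{61} \approx 0.7705 \\
\text{Precision} &= \frac{TP}{TP + FP} = \frac{25}{25 + 7} = 0.78125 \\
\text{Sensitivity} &= \frac{TP}{TP + FN} = \frac{25}{25 + 7} = 0.78125 \\
F_1 &= \frac{2 \cdot \text{Precision} \cdot \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}} = 0.78125
\end{aligned}
```

Precision, sensitivity, and F1 coincide here only because FP and FN happen to be equal.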


## Training the Support Vector Classifier on the training set

In [11]:
from sklearn.svm import SVC
s_classifier = SVC(kernel = 'linear', random_state = 0)
s_classifier.fit(X_train, Y_train)

### Predicting the test set labels with the support vector model

In [12]:
y_pred = s_classifier.predict(X_test)
# Column 0: predicted label, column 1: true label.
print(np.concatenate((y_pred.reshape(len(y_pred),1), Y_test.reshape(len(Y_test),1)),1))

[[1 1]
 [1 1]
 [0 0]
 [1 0]
 [1 1]
 [1 1]
 [1 1]
 [1 1]
 [1 1]
 [0 0]
 [0 0]
 [0 0]
 [1 1]
 [0 0]
 [1 1]
 [1 1]
 [1 0]
 [1 1]
 [1 1]
 [1 1]
 [0 0]
 [0 0]
 [1 1]
 [1 0]
 [0 0]
 [0 1]
 [0 1]
 [1 0]
 [0 0]
 [1 0]
 [0 0]
 [1 1]
 [1 1]
 [1 1]
 [1 1]
 [0 0]
 [1 1]
 [0 0]
 [1 0]
 [0 0]
 [0 0]
 [0 0]
 [0 0]
 [1 1]
 [1 1]
 [1 1]
 [1 1]
 [0 0]
 [1 1]
 [1 1]
 [0 0]
 [1 1]
 [1 0]
 [1 1]
 [1 1]
 [0 0]
 [0 0]
 [1 1]
 [1 0]
 [1 1]
 [0 0]]


### Evaluating

In [15]:
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(Y_test, y_pred)
print(cm)
accuracy_score(Y_test, y_pred)

[[21  8]
 [ 2 30]]


0.8360655737704918

## Results
### 1. Accuracy for SVC is 83.6%, which is higher than the Decision Tree Classifier's accuracy of 77.0%.
### 2. The SVM performed better than the Decision Tree Classifier on this test set.
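
Accuracy alone hides where the two models differ. As a rough sketch, the two confusion matrices printed earlier can be compared side by side. The `summarize` helper and the dictionary keys are hypothetical names introduced here for illustration; the counts are copied from the outputs above.

```python
# Confusion matrices copied from the outputs above, laid out as
# [[TN, FP], [FN, TP]] (rows = true label, columns = predicted label).
confusion = {
    "Decision Tree": [[22, 7], [7, 25]],
    "SVC (linear)": [[21, 8], [2, 30]],
}

def summarize(cm):
    """Return accuracy, precision, sensitivity, and specificity for a 2x2 matrix."""
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

results = {name: summarize(cm) for name, cm in confusion.items()}
for name, scores in results.items():
    print(name, {k: round(v, 3) for k, v in scores.items()})
```

On these counts the SVC's gain comes from sensitivity (30 of 32 positive cases found, versus 25 of 32 for the tree), while its specificity is slightly lower (21 of 29 versus 22 of 29).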


## Final Evaluation

In [17]:
y_prediction=classifier.predict(X_test)
print(y_prediction)

[1 1 1 1 1 1 1 1 0 0 0 0 1 0 1 1 1 1 1 1 0 0 1 1 0 0 0 0 0 1 0 0 0 1 1 0 0
 0 0 0 0 0 0 1 1 1 1 0 1 1 0 1 1 0 1 0 1 1 0 1 0]


In [18]:
y_pred_svc=s_classifier.predict(X_test)
print(y_pred_svc)

[1 1 0 1 1 1 1 1 1 0 0 0 1 0 1 1 1 1 1 1 0 0 1 1 0 0 0 1 0 1 0 1 1 1 1 0 1
 0 1 0 0 0 0 1 1 1 1 0 1 1 0 1 1 1 1 0 0 1 1 1 0]


## Parameters found
The decision tree classifier was configured with `criterion`, `random_state`, `splitter`, and `max_depth`. I used `'best'` for the splitter and a max depth of 6. Increasing the max depth further led to overfitting, while a smaller max depth led to underfitting. For the support vector classifier, I tried different kernels and kept the best-performing one, the linear kernel.
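
One way to make this kind of parameter search repeatable is a cross-validated grid search. The sketch below is an illustration only: it runs on synthetic data from `make_classification` rather than the heart dataset, and the candidate depths and kernels are assumptions, not the values actually tried in this notebook.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in data, not the heart dataset.
X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=0)

# Sweep max_depth for the tree with 5-fold cross-validation.
tree_search = GridSearchCV(
    DecisionTreeClassifier(criterion='entropy', random_state=0),
    param_grid={'max_depth': [2, 3, 4, 5, 6, 8, 10]},
    cv=5,
    scoring='accuracy',
)
tree_search.fit(X_demo, y_demo)

# Compare SVC kernels the same way.
svc_search = GridSearchCV(
    SVC(random_state=0),
    param_grid={'kernel': ['linear', 'rbf', 'poly', 'sigmoid']},
    cv=5,
    scoring='accuracy',
)
svc_search.fit(X_demo, y_demo)

print(tree_search.best_params_, tree_search.best_score_)
print(svc_search.best_params_, svc_search.best_score_)
```

Scoring each candidate on held-out folds, rather than on the final test set, keeps the test set free for a single unbiased evaluation at the end.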
