# PREDICT HEART ATTACK BY USING SVM 

Support Vector Machine (SVM) is a supervised machine learning algorithm capable of performing classification, regression and even outlier detection. The linear SVM classifier works by drawing a straight line between two classes. All the data points that fall on one side of the line will be labeled as one class and all the points that fall on the other side will be labeled as the second. The classifier separates data points using a hyperplane with the largest amount of margin. 

In [4]:
import pandas as pd
import numpy as np

**1. Load the required dataset**

In [5]:
df = pd.read_csv('/Users/prashastisaraf/Downloads/heart.csv')
df.head()

Unnamed: 0,age,sex,cp,trtbps,chol,fbs,restecg,thalachh,exng,oldpeak,slp,caa,thall,output
0,63,1,3,145,233,1,0,150,0,2.3,0,0,1,1
1,37,1,2,130,250,0,1,187,0,3.5,0,0,2,1
2,41,0,1,130,204,0,0,172,0,1.4,2,0,2,1
3,56,1,1,120,236,0,1,178,0,0.8,2,0,2,1
4,57,0,0,120,354,0,1,163,1,0.6,2,0,2,1


**2. Exploring the dataset**

In [6]:
df.shape

(303, 14)

There are 14 variables in the dataset. The first 13 are our features. The last variable 'output'is the target variable.

In [10]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 303 entries, 0 to 302
Data columns (total 14 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   age       303 non-null    int64  
 1   sex       303 non-null    int64  
 2   cp        303 non-null    int64  
 3   trtbps    303 non-null    int64  
 4   chol      303 non-null    int64  
 5   fbs       303 non-null    int64  
 6   restecg   303 non-null    int64  
 7   thalachh  303 non-null    int64  
 8   exng      303 non-null    int64  
 9   oldpeak   303 non-null    float64
 10  slp       303 non-null    int64  
 11  caa       303 non-null    int64  
 12  thall     303 non-null    int64  
 13  output    303 non-null    int64  
dtypes: float64(1), int64(13)
memory usage: 33.3 KB


In [11]:
df.duplicated()

0      False
1      False
2      False
3      False
4      False
       ...  
298    False
299    False
300    False
301    False
302    False
Length: 303, dtype: bool

In [12]:
df.describe()

Unnamed: 0,age,sex,cp,trtbps,chol,fbs,restecg,thalachh,exng,oldpeak,slp,caa,thall,output
count,303.0,303.0,303.0,303.0,303.0,303.0,303.0,303.0,303.0,303.0,303.0,303.0,303.0,303.0
mean,54.366337,0.683168,0.966997,131.623762,246.264026,0.148515,0.528053,149.646865,0.326733,1.039604,1.39934,0.729373,2.313531,0.544554
std,9.082101,0.466011,1.032052,17.538143,51.830751,0.356198,0.52586,22.905161,0.469794,1.161075,0.616226,1.022606,0.612277,0.498835
min,29.0,0.0,0.0,94.0,126.0,0.0,0.0,71.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,47.5,0.0,0.0,120.0,211.0,0.0,0.0,133.5,0.0,0.0,1.0,0.0,2.0,0.0
50%,55.0,1.0,1.0,130.0,240.0,0.0,1.0,153.0,0.0,0.8,1.0,0.0,2.0,1.0
75%,61.0,1.0,2.0,140.0,274.5,0.0,1.0,166.0,1.0,1.6,2.0,1.0,3.0,1.0
max,77.0,1.0,3.0,200.0,564.0,1.0,2.0,202.0,1.0,6.2,2.0,4.0,3.0,1.0


**3. Separate Features and target variables**

In [13]:
feature = df.drop(columns=['output'],axis=1)
feature

Unnamed: 0,age,sex,cp,trtbps,chol,fbs,restecg,thalachh,exng,oldpeak,slp,caa,thall
0,63,1,3,145,233,1,0,150,0,2.3,0,0,1
1,37,1,2,130,250,0,1,187,0,3.5,0,0,2
2,41,0,1,130,204,0,0,172,0,1.4,2,0,2
3,56,1,1,120,236,0,1,178,0,0.8,2,0,2
4,57,0,0,120,354,0,1,163,1,0.6,2,0,2
...,...,...,...,...,...,...,...,...,...,...,...,...,...
298,57,0,0,140,241,0,1,123,1,0.2,1,0,3
299,45,1,3,110,264,0,1,132,0,1.2,1,0,3
300,68,1,0,144,193,1,1,141,0,3.4,1,2,3
301,57,1,0,130,131,0,1,115,1,1.2,1,1,3


In [14]:
label = df["output"]
label

0      1
1      1
2      1
3      1
4      1
      ..
298    0
299    0
300    0
301    0
302    0
Name: output, Length: 303, dtype: int64

**4. Spliting the data into training and testing sets**

In [15]:
# Import train_test_split function

from sklearn.model_selection import train_test_split

In [16]:
# Split dataset into training set and test set

x_train, x_test, y_train, y_test = train_test_split(feature,\
                                label, test_size = 0.3)

# 70% training and 30% test

**5. Scale the features using standardisation**

In [17]:
# Import StandardScaler function

from sklearn.preprocessing import StandardScaler

In [18]:
# Scaling the data

sc_x = StandardScaler()
x_train = sc_x.fit_transform(x_train)
x_test = sc_x.fit_transform(x_test)

**6. Generating Model - SUPPORT VECTOR MACHINE - SVM**

In [19]:
# import SVC classifier
from sklearn.svm import SVC

#Create a svm Classifier
support_vector_classifier = SVC(kernel='rbf') 

# fit classifier to training set
support_vector_classifier.fit(x_train,y_train)

#Predict the response for test dataset
y_pred = support_vector_classifier.predict(x_test)

**7. Evaluating the Model**

In [20]:
#Import confusion_matrix module to create the confusion matrix
from sklearn.metrics import confusion_matrix

cm_support_vector_classifier = confusion_matrix(y_test,y_pred)

print(cm_support_vector_classifier,end='\n\n')

[[33 17]
 [ 0 41]]



In [21]:
#Import scikit-learn metrics module for accuracy calculation
from sklearn import metrics

# Model Accuracy: how often is the classifier correct?
print("Accuracy:",metrics.accuracy_score(y_test, y_pred))

Accuracy: 0.8131868131868132


**The accuracy of the model is 81.3%**

In [22]:
#Import scikit-learn model_selection module for cross validation score
from sklearn.model_selection import cross_val_score

cross_val_svc = cross_val_score(estimator = SVC(kernel = 'rbf'),\
                X = x_train, y = y_train, cv = 10, n_jobs = -1)

print("Cross Validation Accuracy : ",cross_val_svc.mean())

Cross Validation Accuracy :  0.8164502164502163


**The cross validation accuracy of the model is 81.6%**