<b><i>The objective of the Support Vector Machine (SVM) is to find the best boundary separating the classes in the data. In two-dimensional space, you can think of this as the line that best divides your dataset. Because an SVM works in a vector space of any dimension, the separating line generalizes to a separating hyperplane. The best separating hyperplane is defined as the one with the widest margin between the support vectors of the two classes. The hyperplane is also referred to as the decision boundary. Support vectors are the data points that lie closest to the hyperplane.</i></b>
    
The margin is the gap between the two lines that pass through the closest points of each class. It is measured as the perpendicular distance from the separating line to the support vectors, that is, to the closest points. A larger margin between the classes is considered a good margin; a smaller margin is considered a bad one.
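As a small illustration of the perpendicular distance described above, the sketch below measures how far a few points lie from a line `w[0]*x + w[1]*y + b = 0` in two dimensions. The weights `w`, the offset `b`, the sample points, and the helper `distance_to_line` are made-up values for illustration and are not taken from the heart dataset.

```python
import math

def distance_to_line(point, w, b):
    """Perpendicular distance from a 2-D point to the line w[0]*x + w[1]*y + b = 0."""
    numerator = abs(w[0] * point[0] + w[1] * point[1] + b)
    return numerator / math.hypot(w[0], w[1])

# Hypothetical separating line x + y - 4 = 0 and a few labelled points.
w, b = (1.0, 1.0), -4.0
points = {(1.0, 1.0): 0, (0.0, 2.0): 0, (3.0, 3.0): 1, (4.0, 5.0): 1}

distances = {p: distance_to_line(p, w, b) for p in points}

# The closest point of each class plays the role of a support vector;
# the margin is the gap between those two closest points.
closest_class0 = min(d for p, d in distances.items() if points[p] == 0)
closest_class1 = min(d for p, d in distances.items() if points[p] == 1)
margin = closest_class0 + closest_class1
print(margin)
```

Here the point (4, 5) is far from the line and does not affect the margin; only the closest points of each class do.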

In [19]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

In [2]:
heart_data = pd.read_csv(r'C:\Users\ankit.bo.kumar\dataset\heartPatientCleveland\heart.csv')
heart_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 303 entries, 0 to 302
Data columns (total 15 columns):
id          303 non-null int64
age         303 non-null int64
sex         303 non-null int64
cp          303 non-null int64
trestbps    303 non-null int64
chol        303 non-null int64
fbs         303 non-null int64
restecg     303 non-null int64
thalach     303 non-null int64
exang       303 non-null int64
oldpeak     303 non-null float64
slope       303 non-null int64
ca          303 non-null int64
thal        303 non-null int64
target      303 non-null int64
dtypes: float64(1), int64(14)
memory usage: 35.6 KB


In [3]:
heart_data.head()

   id  age  sex  cp  trestbps  chol  fbs  restecg  thalach  exang  oldpeak  slope  ca  thal  target
0   1   63    1   3       145   233    1        0      150      0      2.3      0   0     1       1
1   2   37    1   2       130   250    0        1      187      0      3.5      0   0     2       1
2   3   41    0   1       130   204    0        0      172      0      1.4      2   0     2       1
3   4   56    1   1       120   236    0        1      178      0      0.8      2   0     2       1
4   5   57    0   0       120   354    0        1      163      1      0.6      2   0     2       1


In [5]:
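# Feature matrix. Note: 'id' is a row identifier rather than a clinical measurement, and 'sex' is left out.
X = heart_data[['id','age','cp','trestbps','chol','fbs','restecg','thalach','exang','oldpeak','slope','ca','thal']]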
X.head()

   id  age  cp  trestbps  chol  fbs  restecg  thalach  exang  oldpeak  slope  ca  thal
0   1   63   3       145   233    1        0      150      0      2.3      0   0     1
1   2   37   2       130   250    0        1      187      0      3.5      0   0     2
2   3   41   1       130   204    0        0      172      0      1.4      2   0     2
3   4   56   1       120   236    0        1      178      0      0.8      2   0     2
4   5   57   0       120   354    0        1      163      1      0.6      2   0     2


In [6]:
y = heart_data[['target']]
y.head()

   target
0       1
1       1
2       1
3       1
4       1


In [33]:
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size = 0.25, random_state = 0)
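With a fractional `test_size`, scikit-learn rounds the test share up and leaves the remaining rows for training. A quick check of that arithmetic for the 303 rows reported by `heart_data.info()` (the variable names here are only for illustration):

```python
import math

n_samples = 303      # row count reported by heart_data.info()
test_size = 0.25     # fraction passed to train_test_split

n_test = math.ceil(n_samples * test_size)   # a float test_size is rounded up
n_train = n_samples - n_test                # the remaining rows go to training

print(n_train, n_test)  # 227 76
```

The 76 test rows agree with the number of predictions the model returns for `X_test`.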

<h5>An SVM searches for two things: a hyperplane with the largest minimum margin, and a hyperplane that correctly separates as many instances as possible. You cannot always get both. The C parameter controls the trade-off: a larger C puts more weight on classifying the training instances correctly, at the cost of a narrower margin, while a smaller C favors a wider margin and tolerates more misclassified points.</h5>
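To make the role of C more concrete, here is a rough sketch on synthetic, overlapping data (not heart.csv). The arrays `X_toy` and `y_toy` and the two C values are arbitrary choices for illustration; the margin width of a linear SVM is computed as 2 / ||w||.

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic, overlapping two-class data (not the heart dataset).
rng = np.random.RandomState(0)
X_toy = np.vstack([rng.normal(-1.0, 1.0, size=(50, 2)),
                   rng.normal(1.0, 1.0, size=(50, 2))])
y_toy = np.array([0] * 50 + [1] * 50)

margins = {}
for C in (0.01, 100):
    model = SVC(kernel='linear', C=C).fit(X_toy, y_toy)
    # For a linear SVM the margin width is 2 / ||w||.
    margins[C] = 2.0 / np.linalg.norm(model.coef_)
    print(f"C={C}: margin width={margins[C]:.3f}, "
          f"support vectors={len(model.support_)}")
```

The small C should report the wider margin, since it penalizes points inside the margin less heavily.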

In [34]:
svm = SVC(kernel='linear',C=10,random_state=0)

In [35]:
svm.fit(X_train, y_train.values.ravel())  # flatten the one-column DataFrame to a 1-D label array


SVC(C=10, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='auto_deprecated',
    kernel='linear', max_iter=-1, probability=False, random_state=0,
    shrinking=True, tol=0.001, verbose=False)
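Once a linear `SVC` is fitted, the support vectors and the hyperplane can be read from the model. The sketch below does this on a tiny, linearly separable toy set; `X_toy`, `y_toy`, and `toy_svm` are hypothetical names and values, not part of the heart-data workflow.

```python
import numpy as np
from sklearn.svm import SVC

# Tiny, linearly separable toy data (hypothetical, not the heart dataset).
X_toy = np.array([[0, 0], [1, 0], [0, 1], [3, 3], [4, 3], [3, 4]], dtype=float)
y_toy = np.array([0, 0, 0, 1, 1, 1])

toy_svm = SVC(kernel='linear', C=10).fit(X_toy, y_toy)

w = toy_svm.coef_[0]            # normal vector of the separating hyperplane
b = toy_svm.intercept_[0]       # offset of the hyperplane
margin_width = 2.0 / np.linalg.norm(w)

print("support vectors:\n", toy_svm.support_vectors_)
print("margin width:", margin_width)

# Support vectors of separable data sit on the margin, where |w.x + b| is about 1.
on_margin = np.abs(toy_svm.decision_function(toy_svm.support_vectors_))
print("|decision value| at support vectors:", on_margin)
```

Only the points nearest the boundary are kept as support vectors; the other training points could be removed without moving the hyperplane.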

In [36]:
y_pred = svm.predict(X_test)

In [37]:
accuracy = accuracy_score(y_test, y_pred)  # signature is accuracy_score(y_true, y_pred)

In [38]:
print("The accuracy for heart patient prediction using the Support Vector Machine is", accuracy)

The accuracy for heart patient prediction using the Support Vector Machine is 1.0
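A score of exactly 1.0 on held-out data is unusual enough to be worth a second look. One possible cause, offered here as an assumption rather than a finding about heart.csv, is that a row identifier such as `id` tracks the label when the rows of a file are grouped by class. The sketch below uses synthetic data and a hypothetical helper, `holdout_accuracy`, to show how such a column can dominate the result.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Synthetic data (not heart.csv): rows grouped by class, plus a pure-noise feature.
rng = np.random.RandomState(0)
n = 200
target = np.array([1] * 100 + [0] * 100)   # first half class 1, second half class 0
row_id = np.arange(1, n + 1)               # identifier that simply follows row order
noise = rng.normal(size=n)                 # feature with no relation to the label

X_with_id = np.column_stack([row_id, noise])
X_without_id = noise.reshape(-1, 1)

def holdout_accuracy(X, y):
    """Fit a linear SVC on a 75/25 split and return the test accuracy."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
    model = SVC(kernel='linear', C=10, random_state=0).fit(X_tr, y_tr)
    return accuracy_score(y_te, model.predict(X_te))

acc_with_id = holdout_accuracy(X_with_id, target)
acc_without_id = holdout_accuracy(X_without_id, target)
print("with id:", acc_with_id, "without id:", acc_without_id)
```

Rerunning the notebook's own split without the `id` column would show whether the same effect is at work on the real data.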


In [41]:
y_pred

array([0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0,
       0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0,
       1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0,
       0, 0, 1, 1, 1, 1, 1, 0, 0, 1], dtype=int64)

In [42]:
y_test

     target
225       0
152       1
228       0
201       0
52        1
245       0
175       0
168       0
223       0
217       0
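Accuracy is simply the fraction of predictions that agree with the true labels. As a hand check, the sketch below compares the first ten values of `y_pred` with the ten `y_test` rows displayed above; the list names are only for illustration.

```python
# First ten predictions from the y_pred array and the ten y_test labels shown above.
y_pred_shown = [0, 1, 0, 0, 1, 0, 0, 0, 0, 0]
y_test_shown = [0, 1, 0, 0, 1, 0, 0, 0, 0, 0]

# Accuracy is the fraction of positions where prediction and label agree.
matches = sum(p == t for p, t in zip(y_pred_shown, y_test_shown))
accuracy_shown = matches / len(y_test_shown)
print(accuracy_shown)  # 1.0
```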
