# **Support Vector Machine**
- Supervised Learning Model
- Both Regression and **Classification**
- Hyperplane
- Support Vector
- Works well on small datasets with many features

### Hyperplane:
A hyperplane is a line (in 2D space), a plane (in 3D space), or the higher-dimensional equivalent, that separates the data points into two classes.

Its equation is:
$$
y = wx - b
$$
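As a small illustration of how this equation is used, the sketch below evaluates $wx - b$ for a few points and reads the class off the sign. The weight vector, bias, and points are made-up values chosen for easy arithmetic, not parameters of any trained model.

```python
import numpy as np

# Hypothetical hyperplane parameters, chosen only for illustration
w = np.array([2.0, 1.0])
b = 1.0

# Three made-up points in 2D
points = np.array([
    [1.0, 1.0],    # 2*1 + 1*1 - 1 =  2  -> positive side
    [0.0, 0.0],    # 0 + 0 - 1     = -1  -> negative side
    [0.5, 0.0],    # 1 + 0 - 1     =  0  -> exactly on the hyperplane
])

scores = points @ w - b      # value of w.x - b for each point
sides = np.sign(scores)      # +1, -1, or 0 (on the hyperplane)
print(scores, sides)
```

The sign of the score is what a linear classifier of this form uses as its prediction.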

### Support Vectors:
Support vectors are the data points nearest to the hyperplane. If these data points change, the position of the hyperplane changes. A class can have more than one support vector.\
The margin is the perpendicular distance between the two parallel planes that pass through the support vectors of the two classes. The hyperplane lies at the center of the margin.
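To make the margin concrete: if the hyperplane is scaled so that the support vectors satisfy $wx - b = \pm 1$ (a common convention, assumed here rather than stated above), the margin width is $2 / \|w\|$. The sketch below uses made-up values for `w`, `b`, and the points.

```python
import numpy as np

# Hypothetical parameters, scaled so the closest points give w.x - b = +/-1
w = np.array([1.0, 1.0])
b = 0.0

# Made-up points with labels in {-1, +1}
X = np.array([[0.5, 0.5], [2.0, 2.0], [-0.5, -0.5], [-3.0, -1.0]])
y = np.array([1, 1, -1, -1])

functional_margin = y * (X @ w - b)                 # >= 1 for every point here
support_mask = np.isclose(functional_margin, 1.0)   # points lying on the margin planes

margin_width = 2.0 / np.linalg.norm(w)              # distance between the two margin planes
print(X[support_mask], margin_width)
```

Here the first and third points are the support vectors, and the margin width is $2/\sqrt{2} = \sqrt{2}$.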

### SVM Kernel:
A kernel function implicitly maps the training data into a higher-dimensional space, so that a decision surface that is non-linear in the original space becomes linear in the new one.\
It returns the inner product between two points in that higher-dimensional feature space without computing the mapping explicitly. Its types are:
- Linear:\
    $ K(x_1,x_2) = x_1^T x_2 $
- Polynomial:\
    $ K(x_1,x_2) = (x_1^T x_2 + r)^d $
- Radial Basis Function (rbf):\
    $ K(x_1,x_2) = e^{-\gamma \|x_1 - x_2\|^2} $
- Sigmoid:\
    $ K(x_1,x_2) = \tanh(\gamma \cdot x_1^T x_2 + r) $
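The four formulas above can be sketched directly in NumPy. The function names and the default values for `gamma`, `r`, and `d` below are illustrative assumptions, not the defaults of any particular library.

```python
import numpy as np

def linear_kernel(x1, x2):
    # K(x1, x2) = x1^T x2
    return float(np.dot(x1, x2))

def polynomial_kernel(x1, x2, r=1.0, d=2):
    # K(x1, x2) = (x1^T x2 + r)^d
    return float((np.dot(x1, x2) + r) ** d)

def rbf_kernel(x1, x2, gamma=0.5):
    # K(x1, x2) = exp(-gamma * ||x1 - x2||^2)
    diff = np.asarray(x1) - np.asarray(x2)
    return float(np.exp(-gamma * np.dot(diff, diff)))

def sigmoid_kernel(x1, x2, gamma=0.5, r=0.0):
    # K(x1, x2) = tanh(gamma * x1^T x2 + r)
    return float(np.tanh(gamma * np.dot(x1, x2) + r))

a = np.array([1.0, 2.0])
c = np.array([3.0, 0.0])
print(linear_kernel(a, c))       # 1*3 + 2*0 = 3.0
print(polynomial_kernel(a, c))   # (3 + 1)^2 = 16.0
print(rbf_kernel(a, a))          # identical points give exp(0) = 1.0
print(sigmoid_kernel(a, c))      # tanh(0.5 * 3) = tanh(1.5)
```

Each function takes two points in the original space and returns a single similarity value, which is all a kernelized SVM needs from the data.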

### Loss Function
A loss function measures how far an estimated value is from its true value. A common choice for regression is the mean squared error:
$$
Loss = \frac{1}{n} \displaystyle\sum_{i=1}^n (Y_i - \hat{Y_i})^2
$$

But we can't use this loss function for classification, as it can produce local minima in the classification setting.\
So we use:

**Hinge Loss**:

Hinge loss is a loss function used mainly for maximum-margin classification models.\
It incorporates the margin, i.e. the distance from the classification boundary, into the loss calculation. Even if new observations are classified correctly, they can incur a penalty if their margin from the decision boundary is not large enough. With labels $y_i \in \{-1, +1\}$:
$$
L = \max(0,\ 1 - y_i(w^T x_i - b))
$$
So,
$$
L = 0 \quad \text{if } y_i(w^T x_i - b) \ge 1 \text{ (correct classification, outside the margin)}
$$
And,
$$
L = 1 - y_i(w^T x_i - b) \quad \text{otherwise (wrong classification, or inside the margin)}
$$
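A minimal sketch of the hinge loss, written as $\max(0, 1 - y_i(w^T x_i - b))$ with the $wx - b$ convention of the hyperplane equation and labels in $\{-1, +1\}$. The helper name `hinge_loss` and all numbers are made up for illustration.

```python
import numpy as np

def hinge_loss(w, b, X, y):
    """Per-sample hinge loss max(0, 1 - y * (w.x - b)), with y in {-1, +1}."""
    margins = y * (X @ w - b)
    return np.maximum(0.0, 1.0 - margins)

w = np.array([1.0, 0.0])
b = 0.0
X = np.array([
    [2.0, 0.0],    # correct and outside the margin -> loss 0
    [0.5, 0.0],    # correct but inside the margin  -> loss 0.5
    [-1.0, 0.0],   # misclassified                  -> loss 2
])
y = np.array([1, 1, 1])

losses = hinge_loss(w, b, X, y)
print(losses)
```

The second point shows the behavior described above: it is on the correct side of the boundary, yet it is still penalized because its margin is smaller than 1.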

### Gradient Descent for SVM Classifier
Gradient descent is an optimization algorithm used to minimize the loss function in many machine learning algorithms. It updates the parameters of the model iteratively:\
w = w - l\*${\delta w}$\
b = b - l\*${\delta b}$

where,\
w --> weight\
b --> bias\
l --> learning rate

And,\
$\delta w$ and $\delta b$ are the partial derivatives of the cost function J with respect to w and b respectively.
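The cost function J is not written out here. The gradient cases that follow are consistent with a per-sample cost made of an L2 penalty plus the hinge loss, using the $wx - b$ convention of the hyperplane equation. This form is an assumption inferred from those gradients:

$$
J(w, b) = \lambda \|w\|^2 + \max(0,\ 1 - y_i (w \cdot x_i - b))
$$

When $y_i (w \cdot x_i - b) \ge 1$, the hinge term is zero and only the penalty contributes, so the derivatives are $2 \lambda w$ and $0$. Otherwise the hinge term equals $1 - y_i (w \cdot x_i - b)$, whose derivative is $-y_i x_i$ with respect to w and $+y_i$ with respect to b.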

if ($ y_i (w \cdot x_i - b) \ge 1 $):

$
\frac{dJ}{dw} = 2 \lambda w
$

$
\frac{dJ}{db} = 0
$

else:

$
\frac{dJ}{dw} = 2 \lambda w - y_i.x_i
$

$
\frac{dJ}{db} = y_i
$
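The two cases can be sketched as a single update step on one sample. The function name `sgd_step` and the default values for `lr` and `lam` are assumptions for illustration; the label must already be encoded as -1 or +1, and the score uses the $wx - b$ convention.

```python
import numpy as np

def sgd_step(w, b, x_i, y_i, lr=0.01, lam=0.01):
    """One gradient-descent update on a single sample (y_i must be -1 or +1)."""
    if y_i * (np.dot(w, x_i) - b) >= 1:
        # Margin satisfied: only the regularization term contributes
        dw = 2 * lam * w
        db = 0.0
    else:
        # Margin violated: the hinge term contributes as well
        dw = 2 * lam * w - y_i * x_i
        db = y_i
    return w - lr * dw, b - lr * db

# Margin violated (w starts at zero): w moves toward y_i * x_i, b moves by -lr * y_i
w1, b1 = sgd_step(np.zeros(2), 0.0, x_i=np.array([2.0, 1.0]), y_i=1)
print(w1, b1)

# Margin satisfied: w only shrinks slightly, b is unchanged
w2, b2 = sgd_step(np.array([1.0, 0.0]), 0.0, x_i=np.array([2.0, 0.0]), y_i=1)
print(w2, b2)
```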

In [1]:
# importing numpy
import numpy as np

## **Support Vector Machine Classifier Class**

In [2]:
class SVM_Classifier():

    # initializing the hyperparameters
    def __init__(self, learning_rate, no_of_iterations, lambda_parameter):
        self.learning_rate = learning_rate
        self.no_of_iterations = no_of_iterations
        self.lambda_parameter = lambda_parameter # regularization parameter (weight of the penalty on w)

    # fitting the dataset to SVM classifier
    def fit(self, X, Y):
        self.m, self.n = X.shape

        # initializing weights and bias
        self.w = np.zeros(self.n)
        self.b = 0

        self.X = X
        self.Y = Y

        # implementing gradient descent for model optimization
        for i in range(self.no_of_iterations):
            self.update_weights()

    # function to update the weights and bias
    def update_weights(self):
        # label encoding (convert 0 to -1)
        y_label = np.where(self.Y <= 0, -1, 1)

        # gradients (dw, db)
        for index, x_i in enumerate(self.X):
            condition = y_label[index] * (np.dot(x_i, self.w) - self.b) >= 1

            if condition:
                dw = 2 * self.lambda_parameter * self.w
                db = 0
            else:
                dw = 2 * self.lambda_parameter * self.w - y_label[index] * x_i
                db = y_label[index]
        
            self.w = self.w - self.learning_rate * dw
            self.b = self.b - self.learning_rate * db

    # predict the label for a given input value
    def predict(self, X):
        output = np.dot(X, self.w) - self.b
        predicted_labels = np.sign(output)

        y_hat = np.where(predicted_labels <= -1, 0 , 1)
        return y_hat
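A quick way to check that this update rule separates data as intended is to run it on a tiny, linearly separable set. The sketch below re-implements the same per-sample update as a standalone function so that it runs on its own; `train_linear_svm` is a hypothetical name, and the toy points are made up.

```python
import numpy as np

def train_linear_svm(X, Y, lr=0.01, epochs=200, lam=0.01):
    """Standalone version of the per-sample update used by SVM_Classifier."""
    y = np.where(Y <= 0, -1, 1)           # encode labels as -1 / +1
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            if y_i * (np.dot(x_i, w) - b) >= 1:
                dw, db = 2 * lam * w, 0.0
            else:
                dw, db = 2 * lam * w - y_i * x_i, y_i
            w = w - lr * dw
            b = b - lr * db
    return w, b

# Made-up, linearly separable toy data: class 1 upper right, class 0 lower left
X_toy = np.array([[2.0, 2.0], [3.0, 3.0], [2.0, 3.0],
                  [-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0]])
Y_toy = np.array([1, 1, 1, 0, 0, 0])

w, b = train_linear_svm(X_toy, Y_toy)

# Same decoding as predict(): negative score -> 0, otherwise 1
pred = np.where(np.sign(X_toy @ w - b) <= -1, 0, 1)
print(pred)
```

On data this cleanly separated, the predictions should match `Y_toy`; real datasets such as the one used below are not perfectly separable, so some training error is expected there.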

In [3]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

In [4]:
# loading the dataset
diabetes_dataset = pd.read_csv('diabetes.csv')

In [5]:
diabetes_dataset.head()

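| | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |
| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1 |
| 3 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | 0 |
| 4 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 | 1 |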


In [6]:
diabetes_dataset.shape

(768, 9)

In [7]:
diabetes_dataset.describe()

| | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
|---|---|---|---|---|---|---|---|---|---|
| count | 768.0 | 768.0 | 768.0 | 768.0 | 768.0 | 768.0 | 768.0 | 768.0 | 768.0 |
| mean | 3.845052 | 120.894531 | 69.105469 | 20.536458 | 79.799479 | 31.992578 | 0.471876 | 33.240885 | 0.348958 |
| std | 3.369578 | 31.972618 | 19.355807 | 15.952218 | 115.244002 | 7.88416 | 0.331329 | 11.760232 | 0.476951 |
| min | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.078 | 21.0 | 0.0 |
| 25% | 1.0 | 99.0 | 62.0 | 0.0 | 0.0 | 27.3 | 0.24375 | 24.0 | 0.0 |
| 50% | 3.0 | 117.0 | 72.0 | 23.0 | 30.5 | 32.0 | 0.3725 | 29.0 | 0.0 |
| 75% | 6.0 | 140.25 | 80.0 | 32.0 | 127.25 | 36.6 | 0.62625 | 41.0 | 1.0 |
| max | 17.0 | 199.0 | 122.0 | 99.0 | 846.0 | 67.1 | 2.42 | 81.0 | 1.0 |


In [8]:
diabetes_dataset.Outcome.value_counts()

Outcome
0    500
1    268
Name: count, dtype: int64

0 --> Non-diabetic\
1 --> Diabetic

In [9]:
diabetes_dataset.groupby('Outcome').mean()

| Outcome | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age |
|---|---|---|---|---|---|---|---|---|
| 0 | 3.298 | 109.98 | 68.184 | 19.664 | 68.792 | 30.3042 | 0.429734 | 31.19 |
| 1 | 4.865672 | 141.257463 | 70.824627 | 22.164179 | 100.335821 | 35.142537 | 0.5505 | 37.067164 |


In [10]:
features = diabetes_dataset.drop('Outcome',axis=1)
target = diabetes_dataset['Outcome']

In [11]:
features

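| | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age |
|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 |
| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 |
| 3 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 |
| 4 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 763 | 10 | 101 | 76 | 48 | 180 | 32.9 | 0.171 | 63 |
| 764 | 2 | 122 | 70 | 27 | 0 | 36.8 | 0.340 | 27 |
| 765 | 5 | 121 | 72 | 23 | 112 | 26.2 | 0.245 | 30 |
| 766 | 1 | 126 | 60 | 0 | 0 | 30.1 | 0.349 | 47 |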


In [12]:
target

0      1
1      0
2      1
3      0
4      1
      ..
763    0
764    0
765    0
766    1
767    0
Name: Outcome, Length: 768, dtype: int64

Data Standardization

In [13]:
scaler = StandardScaler()

In [14]:
std_features = scaler.fit_transform(features)

In [15]:
std_features

array([[ 0.63994726,  0.84832379,  0.14964075, ...,  0.20401277,
         0.46849198,  1.4259954 ],
       [-0.84488505, -1.12339636, -0.16054575, ..., -0.68442195,
        -0.36506078, -0.19067191],
       [ 1.23388019,  1.94372388, -0.26394125, ..., -1.10325546,
         0.60439732, -0.10558415],
       ...,
       [ 0.3429808 ,  0.00330087,  0.14964075, ..., -0.73518964,
        -0.68519336, -0.27575966],
       [-0.84488505,  0.1597866 , -0.47073225, ..., -0.24020459,
        -0.37110101,  1.17073215],
       [-0.84488505, -0.8730192 ,  0.04624525, ..., -0.20212881,
        -0.47378505, -0.87137393]], shape=(768, 8))
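With its default settings, `StandardScaler` subtracts each column's mean and divides by its population standard deviation. The sketch below reproduces that arithmetic in plain NumPy on three rows of two columns (Glucose and BMI) copied from the head of the dataset. Because the statistics are computed on three rows only, the resulting values differ from the array shown above.

```python
import numpy as np

# Glucose and BMI for the first three rows of the dataset
sample = np.array([[148.0, 33.6],
                   [ 85.0, 26.6],
                   [183.0, 23.3]])

mean = sample.mean(axis=0)
std = sample.std(axis=0)              # population standard deviation (ddof=0)
standardized = (sample - mean) / std  # z-score per column

print(standardized.mean(axis=0))      # approximately 0 for each column
print(standardized.std(axis=0))       # approximately 1 for each column
```

Putting the features on a common scale matters here because the gradient updates add multiples of the raw feature values to `w`, so a large-valued column such as Insulin would otherwise dominate.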

Train Test Split

In [17]:
X_train, X_test, Y_train, Y_test = train_test_split(std_features,target,test_size=0.2,random_state=2,stratify=target)

In [18]:
print(std_features.shape, X_train.shape, X_test.shape)

(768, 8) (614, 8) (154, 8)


Training the model

In [19]:
classifier = SVM_Classifier(learning_rate=0.01, no_of_iterations=1000, lambda_parameter=0.01)

In [20]:
classifier.fit(X_train,Y_train)

Model Evaluation

In [21]:
X_train_prediction = classifier.predict(X_train)
training_data_accuracy = accuracy_score(Y_train, X_train_prediction)

print('Training Data Accuracy: ',training_data_accuracy)

Training Data Accuracy:  0.7768729641693811


In [22]:
X_test_prediction = classifier.predict(X_test)
testing_data_accuracy = accuracy_score(Y_test, X_test_prediction)

print('Testing Data Accuracy: ',testing_data_accuracy)

Testing Data Accuracy:  0.7662337662337663
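To put these accuracies in context, it helps to compare them with a trivial baseline. The sketch below computes accuracy as the fraction of matching labels and applies it to a hypothetical classifier that always predicts 0, using the class counts reported by `value_counts()` earlier (500 and 268). It is computed on the full dataset rather than on the test split; since the split is stratified, the class proportions in the test set should be close to these.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

# Class counts from the dataset: 500 non-diabetic (0), 268 diabetic (1)
labels = np.array([0] * 500 + [1] * 268)
always_zero = np.zeros_like(labels)   # hypothetical baseline: always predict 0

baseline = accuracy(labels, always_zero)
print(baseline)                       # 500 / 768, roughly 0.651
```

The model's test accuracy of about 0.766 is therefore a clear improvement over always predicting the majority class.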


Making a Predictive System

In [23]:
input_data = (2,197,70,45,543,30.5,0.158,53)

np_input = np.asarray(input_data)

reshaped = np_input.reshape(1,-1)

std_input = scaler.transform(reshaped)

classifier.predict(std_input)



array([1])