# Support Vector Machine (SVM)

For this hands-on, we will learn SVM using PyTorch. I will assume that readers are already familiar with 1) PyTorch, 2) regression models, and 3) the MNIST data.

OBJECTIVE:
    * Implementing SVM using a standard PyTorch ML workflow
    * Applying the model to predict a binary digit (0 or 1).

## SVM: definition

Just as in Logistic Regression, SVM aims to model a function $f_{w,b}$ that predicts a binary label $y \in \{-1,+1\}$ given continuous data $x \in \mathbb{R}^{K}$, where $K$ is the number of features in $x$. However, unlike the regression models we learnt, SVM approaches the problem from a geometric perspective.

As shown in the figure below, let's imagine we have $n$ data points scattered in a 2D space, with labels denoted by different colors: blue for positives (+1) and red for negatives (-1). One way to build a good classifier is to find the line (or hyperplane) $ wx + b = 0 $ that has the largest distance to the nearest positive and negative samples (a.k.a. support vectors), where $w$ is the normal vector and $b$ is a scalar offset.


<p align="center">
    <img width="50%" src="https://upload.wikimedia.org/wikipedia/commons/thumb/7/72/SVM_margin.png/600px-SVM_margin.png"> 
</p>
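
To make "distance to the hyperplane" concrete, here is a minimal sketch that computes the signed distance $(wx + b)/||w||$ for a few points. The values of `w`, `b` and `X` are made up for illustration and are not taken from the figure.

```python
import torch

# Hypothetical hyperplane 3*x1 + 4*x2 - 5 = 0, chosen so that ||w|| = 5.
w = torch.tensor([3.0, 4.0])
b = torch.tensor(-5.0)

# Hypothetical 2D points: one on each side and one on the hyperplane.
X = torch.tensor([[3.0, 4.0], [0.0, 0.0], [1.0, 0.5]])

# Signed distance: positive on the side w points to, negative on the other.
dist = (X @ w + b) / torch.norm(w)
print(dist)
```

The sign of each entry tells which side of the hyperplane a point falls on, and the absolute value is its geometric distance.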

So how do we translate the geometric concept into a set of equations? Let's first define a decision rule for SVM. Given a hyperplane $ wx + b = 0 $, our decision rule is as follows, for each $i$:

\begin{equation}
wx_{i}+b \geq 1, \;\;\; \text{if} \;\;\; y_{i} = +1
\\
wx_{i}+b \leq -1, \;\;\; \text{if} \;\;\; y_{i} = -1
\end{equation}

Here the constant 1 keeps data points from falling inside the margin. The decision rule can be written more compactly as:

\begin{equation}
y_i(wx_{i}+b) \geq 1, \forall i
\end{equation}
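
As a quick check of the compact rule, the sketch below evaluates $y_i(wx_i + b)$ on a toy dataset. The points, labels, `w` and `b` are hypothetical values picked by hand, not learned parameters.

```python
import torch

# Hypothetical toy data: four 2D points with labels in {-1, +1}.
X = torch.tensor([[2.0, 2.0], [3.0, 1.0], [-2.0, -1.0], [-1.0, -3.0]])
y = torch.tensor([1.0, 1.0, -1.0, -1.0])

# Hypothetical hyperplane parameters, chosen by hand for illustration.
w = torch.tensor([0.5, 0.5])
b = torch.tensor(0.0)

# y_i * (w . x_i + b) for every sample.
margins = y * (X @ w + b)

# The compact decision rule holds when every value is at least 1.
satisfied = bool(torch.all(margins >= 1))
print(margins, satisfied)
```

Multiplying by $y_i$ flips the sign for negative samples, which is why a single inequality covers both cases.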

As mentioned earlier, our goal is to find the hyperplane $ wx + b = 0 $ that has the largest margin to the support vectors: $ (x_{+ve} - x_{-ve}) \cdot \frac{w}{||w||}$, where $x_{+ve}$ and $x_{-ve}$ are a positive and a negative data point lying on the margin boundaries on either side of the hyperplane. On those boundaries the decision rule holds with equality, $wx_{+ve} + b = 1$ and $wx_{-ve} + b = -1$, so subtracting the two gives $w(x_{+ve} - x_{-ve}) = 2$ and the margin is simply $\frac{2}{||w||}$. Maximizing the margin $\frac{2}{||w||}$ is equivalent to minimizing $||w||$. Finally, the constraint from the decision rule, $1 - y_{i}(wx_{i}+b) \leq 0$, is turned into a penalty that is zero when the constraint holds and grows linearly when it is violated. The final objective function is:

\begin{equation}
J(w,b) = ||w|| + C\sum_{i} \max[0, 1 - y_{i} ( wx_{i} + b ) ]
\end{equation}

where $C$ is a hyperparameter that controls how strongly margin violations are penalized relative to the size of $||w||$.
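
The objective can be evaluated directly on a small example. The sketch below is only an illustration: `svm_objective` is a hypothetical helper name, the data are made up, and it follows the $wx_{i} + b$ sign convention of the decision rule.

```python
import torch

def svm_objective(w, b, X, y, C=1.0):
    """Hypothetical helper: ||w|| + C * sum_i max(0, 1 - y_i (w . x_i + b))."""
    scores = X @ w + b                           # w . x_i + b for each sample
    hinge = torch.clamp(1 - y * scores, min=0)   # zero when the rule is satisfied
    return torch.norm(w) + C * hinge.sum()

# Made-up data: the second point lies inside the margin (score 0.5 < 1).
X = torch.tensor([[2.0, 0.0], [0.5, 0.0], [-2.0, 0.0]])
y = torch.tensor([1.0, 1.0, -1.0])
w = torch.tensor([1.0, 0.0])
b = torch.tensor(0.0)

J = svm_objective(w, b, X, y, C=1.0)
print(J.item())
```

Only the sample inside the margin contributes to the sum (0.5 here). The other two satisfy the decision rule and add nothing, so the objective is $||w|| = 1$ plus that single penalty.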

### PyTorch Code

In [2]:
# let's first import the packages we will use

import numpy as np

import torch
from torch import nn
from torch.nn import functional as F
from torch.optim import SGD

from torch.utils.data import TensorDataset, DataLoader

from torchvision.datasets import MNIST
from torchvision import transforms

#### Create BatchLoader
Just as in the Logistic Regression exercise, we will use the MNIST dataset. Let's preprocess the dataset first.

In [10]:
# download 
trainD = MNIST(".", download = True, train = True)
testD  = MNIST(".", download = True, train = False)

# to tensor
toTensor = lambda pair: (transforms.ToTensor()(pair[0]), pair[1])
trainD = map(toTensor, trainD)
testD  = map(toTensor, testD)

# subsetting 0 and 1
only01 = lambda pair : pair[1] in [0,1]
trainD = filter(only01, trainD)
testD  = filter(only01, testD)

# 0 to -1 
otom1 = lambda pair : (pair[0], -1 if pair[1] == 0 else 1)
trainD = map(otom1, trainD)
testD  = map(otom1, testD)


# normalisation (with mean=0 and std=1 the values are left unchanged)
normalize = lambda pair: (transforms.Normalize(mean=[0], std=[1])(pair[0]), pair[1] )
trainD = map(normalize, trainD)
testD  = map(normalize, testD)

# flatten
flatten = lambda pair: ( torch.flatten(pair[0]), pair[1] )
trainD = map(flatten, trainD)
testD  = map(flatten, testD)

processedTrainD = list(trainD)
processedTestD  = list(testD)

# batch loader
trainX = torch.stack(list(map(lambda pair : pair[0], processedTrainD)))
trainY = torch.tensor(list(map(lambda pair : pair[1], processedTrainD)))

trainLoader = DataLoader( TensorDataset(trainX, trainY), batch_size = 100)

testX = torch.stack(list(map(lambda pair : pair[0], processedTestD)))
testY = torch.tensor(list(map(lambda pair : pair[1], processedTestD)))

testLoader = DataLoader( TensorDataset(testX, testY), batch_size = 100)
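
Before training, it can help to confirm what a batch looks like. The sketch below uses random stand-in tensors (`fakeX`, `fakeY`, both hypothetical) with the same layout as the flattened images and $\pm 1$ labels, so it runs without downloading MNIST.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Synthetic stand-in: 250 flattened 28x28 "images" and labels in {-1, +1}.
fakeX = torch.rand(250, 28 * 28)
fakeY = torch.randint(0, 2, (250,)) * 2 - 1   # maps {0, 1} to {-1, +1}

loader = DataLoader(TensorDataset(fakeX, fakeY), batch_size=100)

# Collect the (features, labels) shapes of every batch.
shapes = [(tuple(X.shape), tuple(y.shape)) for X, y in loader]
print(shapes)
```

Each batch yields features of shape `(batch, 784)` and labels of shape `(batch,)`. The last batch is smaller when the dataset size is not a multiple of `batch_size`.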

#### Define and train a model

You will notice that the code implementations of SVM and Logistic Regression are nearly identical, with only two differences: 1) SVM does not apply a non-linear function after the linear mapping, and 2) we use the loss introduced above (also known as Hinge Loss) instead of Binary Cross Entropy Loss. Note that the training loop below sums only the hinge term over each batch and leaves out the $||w||$ term.

In [28]:
model = nn.Linear(28*28, 1)

optimizer = SGD(model.parameters(), lr = 0.0001, momentum = 0.0)

for e in range(20):
    losses = []
    for X,y in trainLoader:    

        X = X.type(torch.float32)
        y = y.type(torch.float32)
        
        model.train()
        optimizer.zero_grad()
        
        yHat = model(X)

        loss = torch.sum(torch.clamp(1 - yHat.t()*y, min = 0))
      
        losses.append(loss.item())
        loss.backward()                        
        optimizer.step()
        
    print(f"Loss at epoch:{e} = {np.mean(losses)}")    

Loss at epoch:0 = 7.115142087767444
Loss at epoch:1 = 1.264043013880572
Loss at epoch:2 = 0.984408650341935
Loss at epoch:3 = 0.8430878304121062
Loss at epoch:4 = 0.7480847816767655
Loss at epoch:5 = 0.6820956585914131
Loss at epoch:6 = 0.634114039695169
Loss at epoch:7 = 0.599269598018466
Loss at epoch:8 = 0.5742188199298588
Loss at epoch:9 = 0.5517017996217323
Loss at epoch:10 = 0.533276734858986
Loss at epoch:11 = 0.5177706278215243
Loss at epoch:12 = 0.5033151887533233
Loss at epoch:13 = 0.4909841854741254
Loss at epoch:14 = 0.47998717262988955
Loss at epoch:15 = 0.4694254304480365
Loss at epoch:16 = 0.460208200563596
Loss at epoch:17 = 0.45090110893324603
Loss at epoch:18 = 0.44240734098464485
Loss at epoch:19 = 0.4340640620922479
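
One detail of the loss line deserves a closer look: `model(X)` returns shape `(N, 1)` while `y` has shape `(N,)`, which is why the training code transposes `yHat` before multiplying. The sketch below uses made-up values to show what broadcasting would do without the transpose. The `squeeze(1)` variant is an equivalent alternative, not what the loop above uses.

```python
import torch

# Made-up model outputs of shape (3, 1) and labels of shape (3,).
yHat = torch.tensor([[0.5], [-2.0], [3.0]])
y = torch.tensor([1.0, -1.0, 1.0])

wrong = yHat * y             # broadcasts to (3, 3): every output times every label
right = yHat.t() * y         # (1, 3): each output times its own label
alt = yHat.squeeze(1) * y    # (3,): same values as `right`, without the extra axis

# Hinge loss summed over the batch, as in the training loop.
loss = torch.sum(torch.clamp(1 - right, min=0))
print(wrong.shape, right.shape, alt.shape, loss.item())
```

Dropping the transpose would not raise an error. It would silently sum the hinge term over all `N * N` output/label pairs, so the shapes are worth checking.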


#### Test

Again, it is almost identical to Logistic Regression, except that we now use a sign function instead of rounding to 0 or 1.

In [31]:
# test
yHats = []
ys    = []
for X,y in testLoader:

    X = X.type(torch.float32)
    y = y.type(torch.float32)
    
    model.eval()
    yHat = model(X)
    
    ys.append(y.detach().numpy())
    yHats.append(yHat.detach().numpy())
    
yHats = np.concatenate(yHats)
ys    = np.concatenate(ys)

acc = sum((np.sign(yHats)).squeeze() == ys) / len(ys) * 100 

print(f"test acc : {str(acc)[:4]} %" )

test acc : 99.9 %
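
The same evaluation can be written in PyTorch alone, without converting to NumPy. This is a minimal sketch on a tiny hypothetical model with hand-set weights and made-up data, not the MNIST model trained above. It wraps inference in `torch.no_grad()` so that no gradients are tracked.

```python
import torch
from torch import nn

# Hypothetical 2-feature model with hand-set weights: score = x1.
model = nn.Linear(2, 1)
with torch.no_grad():
    model.weight.copy_(torch.tensor([[1.0, 0.0]]))
    model.bias.zero_()

# Made-up inputs and labels; the last sample is deliberately mislabelled.
X = torch.tensor([[1.0, 0.0], [-2.0, 0.0], [0.5, 0.0],
                  [3.0, 0.0], [-1.0, 0.0], [2.0, 0.0]])
y = torch.tensor([1.0, -1.0, 1.0, 1.0, -1.0, -1.0])

model.eval()
with torch.no_grad():
    preds = torch.sign(model(X)).squeeze(1)   # shape (6,), values in {-1, 0, +1}

acc = (preds == y).float().mean().item() * 100
print(f"acc : {acc:.1f} %")
```

Note that `torch.sign` returns 0 for a score of exactly 0, which matches neither label and is therefore counted as an error.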
