### Binary Support Vector Machine


SVM is where we really step through the door of machine learning. Personally, I do not think Andrew Ng's lectures explain this topic very well, so I strongly advise readers to use alternative materials such as the website SVM Tutorial. The site was created by a developer named Alexandre Kowalczyk, who is dedicated to teaching SVM from the very basics (vector notation and the Lagrangian) up to the hardcore Wolfe dual problem and the SMO algorithm. It is very friendly for beginners who have forgotten everything about high school math, haha. His free e-book is a must-read for more advanced techniques such as L1/L2 regularized soft margin, kernels, SMO and multiclass classification.

Link to this awesome website

https://www.svm-tutorial.com/

For multiclass classification

https://github.com/je-suis-tm/machine-learning/blob/master/multiclass%20support%20vector%20machine.ipynb

In [1]:
import cvxopt.solvers
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.decomposition import PCA
from sklearn.svm import SVC
import os
os.chdir('d:/python/data')

In [2]:
#using official sklearn package with the same parameters
def skl_binary_svm(x_train,x_test,y_train,y_test,**kwargs):
    
    m=SVC(**kwargs).fit(np.array(x_train).reshape(-1, 1), \
                        np.array(y_train).ravel())
    
    train=m.score(np.array(x_train).reshape(-1, 1), \
                  np.array(y_train).ravel())*100
    test=m.score(np.array(x_test).reshape(-1, 1), \
                 np.array(y_test).ravel())*100
    
    print('\ntrain accuracy: %s'%(train)+'%')
    print('\ntest accuracy: %s'%(test)+'%')

In [3]:
#svm for binary classification
def binary_svm(x_train,x_test,y_train,y_test,
               kernel='linear',poly_constant=0.0,poly_power=1,gamma=5):

    #this is the outer product matrix of the labels
    #entry (i,j) is the product y_i*y_j
    #alternatively, we can write it explicitly as
    #np.array([y_train[i]*y_train[j]
    #for i in y_train.index for j in y_train.index]).reshape(
    #len(y_train),len(y_train))
    #or just np.asarray(y_train)[:,None]*np.asarray(y_train)[None,:]
    y_product=np.outer(y_train,y_train)
    
    #use a kernel to map the inner products to a higher dimensional space
    #there are only three kernels here, which are linear, polynomial, gaussian
    #x is one-dimensional here, so every inner product is a scalar product
    #plain ndarrays keep every * below element-wise
    xs=np.asarray(x_train,dtype=float)
    if kernel=='linear':
        x_product=np.outer(xs,xs)
    elif kernel=='polynomial':
        x_product=(np.outer(xs,xs)+poly_constant)**poly_power
    else:
        #gaussian/rbf kernel
        #map to an infinite dimensional space
        #be careful with the value of gamma
        #when gamma is too large, the model tends to overfit
        #when gamma is too small, the model tends to underfit
        #better to use grid search to find an optimal gamma
        diff=xs[:,None]-xs[None,:]
        x_product=np.exp(-gamma*diff**2)
    
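    #please refer to the following link
    #for how to solve the wolfe dual problem in cvxopt
    # http://cvxopt.org/userguide/coneprog.html#quadratic-programming
    #cvxopt minimizes 0.5*x'Px+q'x subject to Gx<=h and Ax=b
    #here G and h encode alpha>=0, A and b encode sum(alpha*y)=0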
    P=cvxopt.matrix(x_product*y_product)
    q=cvxopt.matrix(-1*np.ones(len(x_train)))
    G=cvxopt.matrix(np.diag(-1 * np.ones(len(x_train))))
    h=cvxopt.matrix(np.zeros(len(x_train)))
    A=cvxopt.matrix(y_train,(1,len(x_train)))
    b=cvxopt.matrix(0.0)

    solution=cvxopt.solvers.qp(P, q, G, h, A, b)
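    alpha=pd.Series(solution['x'])

    #recover the weight from the lagrange multipliers
    #note that w and b are computed in the input space
    #so the predictions below match the dual solution only for the linear kernel
    w=np.sum(alpha*y_train*x_train)

    #here i am using prof andrew ng's method of calculating b
    #alternatively, we can average y-w*x over the support vectors
    #which are the points where alpha is clearly larger than zero, e.g.
    #sv=alpha>1e-5
    #b=np.mean(y_train[sv]-w*x_train[sv])
    b=-(min(x_train[y_train==1.0]*w)+max(x_train[y_train==-1.0]*w))/2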

    print('\ntrain accuracy: %s'%(len(
        y_train[np.sign(
            np.multiply(w,x_train)+b)==y_train])/len(y_train)*100)+'%')
    print('\ntest accuracy: %s'%(len(
        y_test[np.sign(np.multiply(w,x_test)+b)==y_test])/len(y_test)*100)+'%')
    print('\nparameters w: %s'%(w))
    print('\nparameters b: %s'%(b))
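
For reference, the quadratic program handed to `cvxopt.solvers.qp` above can be read as the hard margin Wolfe dual. What follows is a sketch of the standard textbook form rather than anything taken from the lectures; the symbols $n$, $\alpha$ and $K$ are notation introduced here, with $K$ standing for whichever kernel is selected.

$$
\max_{\alpha}\ \sum_{i=1}^{n}\alpha_i-\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j y_i y_j K(x_i,x_j)
\quad\text{s.t.}\quad \alpha_i\ge 0,\quad \sum_{i=1}^{n}\alpha_i y_i=0
$$

Since cvxopt minimizes $\frac{1}{2}\alpha^{T}P\alpha+q^{T}\alpha$ subject to $G\alpha\preceq h$ and $A\alpha=b$, flipping the sign of the objective gives the mapping used in the code:

$$
P_{ij}=y_i y_j K(x_i,x_j),\quad q=-\mathbf{1},\quad G=-I,\quad h=\mathbf{0},\quad A=y^{T},\quad b=0
$$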

### Run

In [4]:
df=pd.read_csv('iris.csv')

In [5]:
#the class labels have to be float instead of int
#this is required by cvxopt
#for a binary classification
#the value should be either -1.0 or 1.0
df['y']=np.select([df['type']=='Iris-setosa', \
                   df['type']=='Iris-versicolor', \
                   df['type']=='Iris-virginica'],[-1.0,0.0,1.0])

In [6]:
#for simplicity, let us make it a binary classification
df=df[df['y']!=0.0]

In [7]:
#for simplicity, let us reduce the dimension of x to 1
#reference to pca
# https://github.com/je-suis-tm/machine-learning/blob/master/principal%20component%20analysis.ipynb
high_dims=pd.concat([df[i] for i in df.columns if 'length' in i or 'width' in i],axis=1)
x=PCA(n_components=1).fit_transform(high_dims)

In [8]:
#flatten the (n,1) pca output into a one-dimensional series
x=pd.Series(x.ravel())

In [9]:
y=df['y']

In [10]:
#train test split
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.3)

In [11]:
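#crucial!!!!
#train_test_split keeps the original shuffled index labels
#while alpha inside binary_svm is indexed from 0 to n-1
#pandas multiplies series by index label rather than by position
#so without the reset we would get errors in the next step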
x_test.reset_index(inplace=True,drop=True)
y_test.reset_index(inplace=True,drop=True)
x_train.reset_index(inplace=True,drop=True)
y_train.reset_index(inplace=True,drop=True)
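
To see why the reset matters, here is a minimal sketch of how pandas aligns two series on their index labels. The values and index labels below are made up for illustration; they are not taken from the iris data.

```python
import pandas as pd

# hypothetical multipliers, indexed 0..2 like pd.Series(solution['x'])
alpha = pd.Series([0.5, 0.25, 0.25])

# hypothetical labels that kept shuffled index labels,
# which is what train_test_split returns for a pandas input
y = pd.Series([1.0, -1.0, 1.0], index=[7, 2, 5])

# pandas aligns on index labels, not on positions
# only label 2 exists in both series, every other label becomes NaN
misaligned = alpha * y

# after the reset both series share labels 0..2
# and the product is taken position by position
aligned = alpha * y.reset_index(drop=True)

print(misaligned)
print(aligned)
```

Summing the misaligned product would silently skip the NaN entries, which is why the weight `w` would come out wrong rather than fail loudly.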

In [12]:
binary_svm(x_train,x_test,y_train,y_test)

     pcost       dcost       gap    pres   dres
 0: -3.4761e+00 -6.0406e+00  2e+02  1e+01  2e+00
 1: -1.3648e+00 -8.9590e-01  2e+01  1e+00  2e-01
 2:  1.5920e-02 -6.2513e-01  6e-01  1e-15  2e-15
 3: -1.7897e-01 -2.7387e-01  9e-02  2e-16  7e-16
 4: -2.0894e-01 -2.8781e-01  8e-02  2e-16  5e-16
 5: -2.6661e-01 -2.6956e-01  3e-03  2e-16  5e-16
 6: -2.6887e-01 -2.6900e-01  1e-04  3e-16  4e-16
 7: -2.6899e-01 -2.6899e-01  1e-06  1e-16  4e-16
 8: -2.6899e-01 -2.6899e-01  1e-08  3e-16  3e-16
Optimal solution found.

train accuracy: 100.0%

test accuracy: 100.0%

parameters w: 0.7334706947134487

parameters b: 0.41906322704309396


In [13]:
skl_binary_svm(x_train,x_test,y_train,y_test,kernel='linear')


train accuracy: 100.0%

test accuracy: 100.0%
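
The comments in the gaussian kernel suggest a grid search for `gamma`. Here is a minimal sketch with sklearn's `GridSearchCV`, assuming synthetic one-dimensional data in place of the PCA-reduced iris feature. The candidate values in `param_grid` and the fold count are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# synthetic 1-D data standing in for the pca-reduced feature
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 0.5, 40), rng.normal(2.0, 0.5, 40)])
y = np.concatenate([-np.ones(40), np.ones(40)])

# candidate gamma values, chosen arbitrarily for illustration
param_grid = {'gamma': [0.01, 0.1, 1.0, 5.0, 10.0]}

# 5-fold cross validation over the grid with an rbf kernel
search = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=5)
search.fit(x.reshape(-1, 1), y)

print('best gamma: %s' % search.best_params_['gamma'])
print('cross validated accuracy: %s' % search.best_score_)
```

On data this cleanly separated several values of `gamma` may tie, in which case the search simply reports the first of the best candidates.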
