# Support Vector Machines

Here we give an example of applying a support vector machine (SVM) to 4-dimensional data.  
We first make up some linearly separable data, then use sklearn to find a separating hyperplane (possibly not the one used to generate the labels, but one that still separates the two classes).

We begin with the standard imports:

In [1]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# use seaborn plotting defaults
import seaborn as sns; sns.set()

## Making up data to play with

In [2]:
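# make_blobs lives in sklearn.datasets; the old
# sklearn.datasets.samples_generator module was removed in newer scikit-learn releases
from sklearn.datasets import make_blobs
# Note: both calls use the same random_state, so X2 is an identical copy of X1
X1, y1 = make_blobs(n_samples=20, centers=2,
                  random_state=0, cluster_std=2)
X2, y2 = make_blobs(n_samples=20, centers=2,
                  random_state=0, cluster_std=2)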

In [3]:
# 4 - dimensional data
X = np.concatenate((X1, X2), axis=1)
X

array([[ 2.70514248,  2.81945729,  2.70514248,  2.81945729],
       [-0.04183841, -1.94237221, -0.04183841, -1.94237221],
       [ 1.60240548,  2.59559585,  1.60240548,  2.59559585],
       [ 2.87644691,  4.00107291,  2.87644691,  4.00107291],
       [ 4.71138606,  2.34923157,  4.71138606,  2.34923157],
       [ 1.86399654,  4.97113598,  1.86399654,  4.97113598],
       [ 0.76983237,  5.12498433,  0.76983237,  5.12498433],
       [ 2.36516237,  1.6539887 ,  2.36516237,  1.6539887 ],
       [ 6.59477677, -2.01106769,  6.59477677, -2.01106769],
       [-4.12970955,  5.61102452, -4.12970955,  5.61102452],
       [ 2.14678456,  0.52329596,  2.14678456,  0.52329596],
       [ 0.27969603, -3.06392928,  0.27969603, -3.06392928],
       [ 5.12082595,  3.8363812 ,  5.12082595,  3.8363812 ],
       [ 1.26435722,  7.21233434,  1.26435722,  7.21233434],
       [ 1.28061389,  0.29305816,  1.28061389,  0.29305816],
       [ 4.51584888,  3.30242336,  4.51584888,  3.30242336],
       [ 1.35944322,  1.

In [4]:
## Cheating here!  We choose the labels with a known linear classifier,
## which guarantees that the data is linearly separable for the SVM later.
w = [1, 2, 3, 4]
b = -20

def f(x, w, b):
    """Evaluate the linear function w . x + b."""
    return sum(w[i]*x[i] for i in range(len(w))) + b


In [5]:
# Label is +1 where f(X[i]) > 0 and -1 otherwise
y = np.array([ 2*(f(X[i],w,b)>0  )-1 for i in range(len(X)) ])
y

array([ 1, -1,  1,  1,  1,  1,  1, -1, -1, -1, -1, -1,  1,  1, -1,  1, -1,
        1,  1,  1])
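
The label construction above can also be written without an explicit Python loop. The following is a minimal sketch on stand-in data, not the notebook's `X`; the names `X_demo`, `w_demo`, and `b_demo` are hypothetical and used only for illustration.

```python
import numpy as np

# Hypothetical stand-in data; in the notebook, X is the 20x4 array built above.
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=2.0, scale=2.0, size=(20, 4))

w_demo = np.array([1, 2, 3, 4])
b_demo = -20

# Loop version, mirroring the cell above.
y_loop = np.array([
    2 * (sum(w_demo[j] * x[j] for j in range(len(w_demo))) + b_demo > 0) - 1
    for x in X_demo
])

# Vectorized version: one matrix-vector product, then map the sign to {-1, +1}.
y_vec = np.where(X_demo @ w_demo + b_demo > 0, 1, -1)

print(np.array_equal(y_loop, y_vec))
```

Both versions assign +1 to points strictly on the positive side of the hyperplane and -1 to everything else.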

## SVM using sklearn

### Fitting a support vector machine

Plug the data in and find a fit.

In [6]:
from sklearn.svm import SVC # "Support vector classifier"
model = SVC(kernel='linear', C=100)
model.fit(X, y)

SVC(C=100, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto_deprecated',
  kernel='linear', max_iter=-1, probability=False, random_state=None,
  shrinking=True, tol=0.001, verbose=False)

### Extract coefficients from model

In [7]:
b_learn = model.intercept_[0]
b_learn

-13.974430698363326

In [8]:
w_learn = (model.coef_)[0]
w_learn

array([1.27712201, 2.09608988, 1.27712201, 2.09608988])
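
For a linear kernel, the extracted coefficients and intercept should reproduce what `SVC.decision_function` computes internally. The sketch below checks this on hypothetical toy data (`X_demo`, `y_demo`, and `model_demo` are illustrative names, not the notebook's variables), assuming a binary problem with labels -1 and +1.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical, linearly separable toy data (not the notebook's X and y).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(40, 4))
y_demo = np.where(X_demo @ np.array([1.0, 2.0, 3.0, 4.0]) + 0.5 > 0, 1, -1)

model_demo = SVC(kernel="linear", C=100).fit(X_demo, y_demo)

# coef_ has shape (1, n_features) and intercept_ has shape (1,) for two classes.
w_demo = model_demo.coef_[0]
b_demo = model_demo.intercept_[0]

# Linear function evaluated by hand versus sklearn's decision function.
manual = X_demo @ w_demo + b_demo
builtin = model_demo.decision_function(X_demo)

print(np.allclose(manual, builtin))
```

Note that `coef_` is only available when `kernel='linear'`; other kernels do not expose an explicit weight vector.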

In [9]:
## Example check:
print("y[0] = ", y[0])
print("learned function of X[0] = ", f(X[0], w_learn, b_learn) )

y[0] =  1
learned function of X[0] =  4.754835066369134


Since the learned function is positive at X[0] and y[0] = 1, the model classifies X[0] correctly.

### Check correctness on all data


In [10]:
print("T/F     y[i]    f(X[i])")
I = range(len(y))
for i in I:
    # A point is classified correctly when f(X[i]) has the same sign as y[i]
    correct = f(X[i],w_learn,b_learn)*y[i] > 0
    print(correct,"   ", y[i],"    ", f(X[i],w_learn,b_learn))
print("Number true:", sum([f(X[i],w_learn,b_learn)*y[i] > 0 for i in I]))
# Use <= 0 so that a point lying exactly on the hyperplane counts as a miss
print("Number false:", sum([f(X[i],w_learn,b_learn)*y[i] <= 0 for i in I]))

T/F     y[i]    f(X[i])
True     1      4.754835066369134
True     -1      -22.224069697911467
True     1      0.9997083092712256
True     1      10.145933510603333
True     1      7.907999984120389
True     1      11.626566993462314
True     1      9.47676465016272
True     -1      -0.9994109077394828
True     -1      -5.560518883884486
True     -1      -1.000293155030862
True     -1      -6.297268361745775
True     -1      -26.10456111391583
True     1      15.188207971552524
True     1      19.490448253583388
True     -1      -9.474877860480586
True     1      11.40450165684714
True     -1      -5.42802758304137
True     1      2.677932933481978
True     1      12.473815870324332
True     1      11.469370648051163
Number true: 20
Number false: 0


Alternatively, you can have sklearn return the predictions directly:

In [16]:
y_predict = model.predict(X)
y_predict

array([ 1, -1,  1,  1,  1,  1,  1, -1, -1, -1, -1, -1,  1,  1, -1,  1, -1,
        1,  1,  1])

In [18]:
## Check if it matches original y
y == y_predict

array([ True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True])
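
The element-wise comparison can be condensed into a single number. This is a minimal sketch on hypothetical toy data (`X_demo`, `y_demo`, `model_demo` are illustrative names), using `accuracy_score` from `sklearn.metrics` and the classifier's own `score` method, which for classifiers reports mean accuracy.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC

# Hypothetical toy data standing in for the notebook's X and y.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(40, 4))
y_demo = np.where(X_demo @ np.array([1.0, 2.0, 3.0, 4.0]) + 0.5 > 0, 1, -1)

model_demo = SVC(kernel="linear", C=100).fit(X_demo, y_demo)
y_pred = model_demo.predict(X_demo)

# Count mismatches directly, then express the same thing as an accuracy.
n_wrong = int(np.sum(y_pred != y_demo))
acc = accuracy_score(y_demo, y_pred)

print("misclassified:", n_wrong)
print("accuracy:", acc)
print("model.score:", model_demo.score(X_demo, y_demo))
```

Keep in mind that this is accuracy on the training data, so it says nothing about how well the separator generalizes.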

## Predict New Data Labels
Given new data values, we now want to use our predictor to determine their labels.

#### Predict by hand

In [11]:
## Suppose we want to predict the label of x_new
x_new = np.array([3,7,3,1])
f(x_new,w_learn, b_learn)

10.457020406821696

Since f(x_new, w_learn, b_learn) > 0, we classify x_new as 1.  This can be computed as:

In [12]:
y_new = 2*(f(x_new,w_learn, b_learn) > 0)-1
y_new

1

#### Using sklearn

In [20]:
y_new = model.predict([x_new])
y_new[0]

1
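
`model.predict` expects a 2-D array of shape `(n_samples, n_features)`, which is why the single point is wrapped in a list above. An equivalent option is `reshape(1, -1)`. The sketch below, on a hypothetical toy model (`model_demo` and `x_new_demo` are illustrative names), compares the hand-computed label with the one sklearn returns.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy model; the notebook's fitted `model` would be used instead.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(40, 4))
y_demo = np.where(X_demo @ np.array([1.0, 2.0, 3.0, 4.0]) + 0.5 > 0, 1, -1)
model_demo = SVC(kernel="linear", C=100).fit(X_demo, y_demo)

x_new_demo = np.array([3.0, 7.0, 3.0, 1.0])

# By hand: sign of the learned linear function w . x + b.
score = x_new_demo @ model_demo.coef_[0] + model_demo.intercept_[0]
label_by_hand = 1 if score > 0 else -1

# Via sklearn: turn the 1-D point into a single-row 2-D array first.
label_sklearn = model_demo.predict(x_new_demo.reshape(1, -1))[0]

print(label_by_hand, label_sklearn)
```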

### Predict a larger set of data


In [46]:
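# 15 random points; each coordinate is uniform in [1.6, 2.6)
# (no seed is set, so rerunning this cell gives different points)
X_new = np.random.rand(15,4)+1.6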
X_new

array([[2.46350072, 2.00623543, 2.16086734, 2.36909536],
       [2.58375044, 1.9128083 , 1.98400395, 2.06437   ],
       [2.52616443, 1.91509917, 2.23977682, 2.44520358],
       [2.25935698, 2.46328596, 1.8597641 , 1.85793108],
       [2.2263227 , 2.33065546, 2.31530074, 1.9423563 ],
       [2.58491204, 2.41687113, 1.90612101, 2.01562784],
       [2.26223867, 2.06679729, 1.9224054 , 2.44385233],
       [2.11573174, 1.70244131, 1.94709056, 2.0442258 ],
       [1.98604432, 2.29161726, 2.41170362, 2.59976687],
       [2.53689256, 2.21333169, 1.8803709 , 1.75964277],
       [2.02405636, 2.5269408 , 1.76362802, 2.37038545],
       [1.82278597, 1.83035092, 2.02691278, 1.69268726],
       [1.64842048, 1.65327344, 2.01809494, 1.84089292],
       [1.636963  , 2.06136033, 1.67249043, 2.30562335],
       [1.64742066, 1.73874116, 2.50029792, 2.59136078]])

In [47]:
y_new = model.predict(X_new)
y_new

array([ 1,  1,  1,  1,  1,  1,  1, -1,  1, -1,  1, -1, -1, -1,  1])
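
Because the cell above draws unseeded random numbers, its output changes from run to run. The following sketch shows one way to make such an experiment repeatable with a seeded NumPy generator, and confirms that batch prediction agrees with taking the sign of the learned linear function. The seed value, the toy training data, and all `*_demo` names are assumptions made for illustration.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(42)  # fixed seed (arbitrary choice) for repeatability

# Hypothetical toy training data; the notebook's fitted `model` would be used instead.
X_train = rng.normal(size=(40, 4))
y_train = np.where(X_train @ np.array([1.0, 2.0, 3.0, 4.0]) + 0.5 > 0, 1, -1)
model_demo = SVC(kernel="linear", C=100).fit(X_train, y_train)

# 15 new points, each coordinate uniform in [1.6, 2.6), as in the cell above.
X_new_demo = rng.random((15, 4)) + 1.6

# Batch prediction with sklearn.
y_new_demo = model_demo.predict(X_new_demo)

# The same labels by hand: sign of w . x + b for every row at once.
scores = X_new_demo @ model_demo.coef_[0] + model_demo.intercept_[0]
y_by_hand = np.where(scores > 0, 1, -1)

print(np.array_equal(y_new_demo, y_by_hand))
```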