# SVM

Topics:
1.  Linear SVM
    -  Model fitting
    -  Separating plane visualization
2.  SVM with kernel
    -  Model fitting
    -  Decision function visualization

In [None]:
import plotly.graph_objects as go
import plotly.express as px
import numpy as np
import pandas as pd

## Linear SVM

A linear SVM separates two classes of data points with a hyperplane, which is a line when the data is two-dimensional, as it is here. This model works well when the two classes are linearly separable.

First let us load and split the data.

In [None]:
from sklearn.model_selection import train_test_split

# Load data
# If you are using Colab,
# don't forget to upload the .csv files to
# sample_data directory.
data = pd.read_csv('sample_data/linearly_separable.csv',header=0, names=['x0','x1','label'])
data['label'] = data['label'].astype('str')

x_all = data[['x0','x1']].to_numpy()
label_all = data[['label']].to_numpy().flatten()

# Split data and dataframe
# (the row indices are split too, so the dataframe rows stay aligned with the arrays)
idx = np.arange(len(label_all))
x, x_t, label, label_t, idx, idx_t = train_test_split(x_all, label_all, idx, random_state=5)
data_train, data_test = data.iloc[idx], data.iloc[idx_t]

In [None]:
# Data visualization
fig = px.scatter(data_train, x='x0', y='x1', color='label', title='Training Data')
fig.update_yaxes(
    scaleanchor='x',
    scaleratio=1
)
fig.update_layout(
    width=800,
    height=600
)
fig

We use the sklearn library to build our linear SVM model.

In [None]:
from sklearn import svm

linear_SVM = svm.SVC(kernel='linear')
linear_SVM.fit(x,label)

In [None]:
# Test fitted model on the testing dataset
pred = linear_SVM.predict(x_t)
acc = (pred==label_t).sum()/len(label_t)
print(acc)

We can see that the fitted linear SVM has an accuracy of $100\%$ on the testing dataset. This is because the two classes in this dataset are linearly separable.
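
The accuracy above is simply the fraction of test labels that were predicted correctly. As a minimal sketch on small made-up label arrays (`label_true` and `label_pred` are hypothetical, not taken from the dataset), the manual computation agrees with `sklearn.metrics.accuracy_score`:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical labels, for illustration only.
label_true = np.array(['1', '-1', '1', '1'])
label_pred = np.array(['1', '-1', '-1', '1'])

# Fraction of matching entries, as in the notebook cell above.
manual_acc = (label_pred == label_true).sum() / len(label_true)

# The same quantity computed by sklearn.
library_acc = accuracy_score(label_true, label_pred)

print(manual_acc, library_acc)
```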

We can easily get the separating line from our fitted model. Recall that the line has the form
$$
w_0 x_0 + w_1 x_1 + b = 0,
$$
where $w=(w_0,w_1)$ is the coefficient vector and $b$ is the intercept of our model.

The **decision function** of our linear SVM is given by
$$
f((x_0,x_1)) = w_0 x_0 + w_1 x_1 + b.
$$
A data point $(x_0,x_1)$ is assigned to the positive class if $f((x_0,x_1))>0$, and to the negative class if $f((x_0,x_1))<0$.
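
The following sketch checks this formula numerically. It assumes synthetic, well-separated data from `make_blobs` in place of the CSV file, and the names `X_demo`, `y_demo`, and `demo_SVM` are illustrative only. It evaluates $w_0 x_0 + w_1 x_1 + b$ by hand and compares the result with the model's own `decision_function`:

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import make_blobs

# Synthetic two-class data (an assumption; it stands in for the CSV file).
X_demo, y_demo = make_blobs(
    n_samples=100, centers=[[-3, -3], [3, 3]], cluster_std=0.8, random_state=0
)

demo_SVM = svm.SVC(kernel='linear')
demo_SVM.fit(X_demo, y_demo)

w_demo = demo_SVM.coef_.flatten()
b_demo = demo_SVM.intercept_.item()

# Evaluate f(x) = w . x + b for every row of X_demo.
manual_values = X_demo @ w_demo + b_demo
library_values = demo_SVM.decision_function(X_demo)
print(np.allclose(manual_values, library_values))

# For a binary SVC, positive values map to classes_[1], negative to classes_[0].
manual_pred = np.where(manual_values > 0, demo_SVM.classes_[1], demo_SVM.classes_[0])
print((manual_pred == demo_SVM.predict(X_demo)).all())
```

In sklearn, the "positive" class is the second entry of `classes_`, which is why the sketch maps positive decision values to `classes_[1]`.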

In [None]:
w = linear_SVM.coef_.flatten()
b = linear_SVM.intercept_.item()

Now let us visualize the separating line.
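
To draw the line we only need two points on it. Assuming $w_1 \neq 0$ (the line is not vertical), solving the line equation for $x_1$ gives
$$
x_1 = -\frac{w_0 x_0 + b}{w_1},
$$
so evaluating this expression at the smallest and largest $x_0$ values in the data yields the two endpoints of the plotted segment.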

In [None]:
x0_min, x0_max = x_all[:,0].min(), x_all[:,0].max()
# Solve w0*x0 + w1*x1 + b = 0 for x1 (assumes w[1] != 0)
x1_min, x1_max = -(w[0]*x0_min+b)/w[1], -(w[0]*x0_max+b)/w[1]

fig = px.scatter(data_train, x='x0', y='x1', color='label')
fig.update_yaxes(
    scaleanchor='x',
    scaleratio=1
)
fig.add_traces(go.Scatter(x=[x0_min, x0_max],y=[x1_min, x1_max], mode='lines', name='separating line'))
fig.update_layout(
    width=800,
    height=600
)
fig

## SVM with Kernel

It is rare to have linearly separable classes in everyday life.

Because a linear SVM uses only **one** separating line to separate the two classes, it will fail for data that is highly non-linear.

We will demonstrate this using a simulated dataset.

In [None]:
# Load data
data = pd.read_csv('sample_data/circular.csv',header=0, names=['x0','x1','label'])
data['label'] = data['label'].astype('str')

x_all = data[['x0','x1']].to_numpy()
label_all = data[['label']].to_numpy().flatten()

# Split data and dataframe
# (the row indices are split too, so the dataframe rows stay aligned with the arrays)
idx = np.arange(len(label_all))
x, x_t, label, label_t, idx, idx_t = train_test_split(x_all, label_all, idx, random_state=5)
data_train, data_test = data.iloc[idx], data.iloc[idx_t]

In [None]:
fig = px.scatter(
    data_train,
    x='x0',
    y='x1',
    color='label',
    title='Training Data'
)
fig.update_yaxes(
    scaleanchor='x',
    scaleratio=1
)
fig.update_layout(
    width=800,
    height=600
)
fig

It is clear that the two classes in this dataset are NOT linearly separable.

### The Problem of the Linear SVM

What happens if we fit a linear SVM to this dataset?

In [None]:
linear_SVM = svm.SVC(kernel='linear')
linear_SVM.fit(x,label)

# Test fitted model on the testing dataset
pred = linear_SVM.predict(x_t)
acc = (pred==label_t).sum()/len(label_t)
print('Accuracy of linear SVM on circular test datapoints is : {:.4f}'.format(acc))

In [None]:
w = linear_SVM.coef_.flatten()
b = linear_SVM.intercept_.item()

x0_min, x0_max = x_all[:,0].min(), x_all[:,0].max()
# Solve w0*x0 + w1*x1 + b = 0 for x1 (assumes w[1] != 0)
x1_min, x1_max = -(w[0]*x0_min+b)/w[1], -(w[0]*x0_max+b)/w[1]

fig = px.scatter(data_train, x='x0', y='x1', color='label')
fig.update_yaxes(
    scaleanchor='x',
    scaleratio=1
)
fig.add_traces(go.Scatter(x=[x0_min, x0_max],y=[x1_min, x1_max], mode='lines', name='separating line'))
fig.update_layout(
    width=800,
    height=600
)
fig

### SVM with Non-linear Kernel

We can use a non-linear kernel to separate the two classes in the circular dataset. Recall from the lecture the Gaussian similarity kernel
$$
K(x^{(i)},x^{(j)}) = \exp\left(-\gamma\|x^{(i)}-x^{(j)}\|^2\right).
$$
This kernel is also widely known as the **radial basis function** (rbf) kernel.
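
As a small illustration of the formula, the sketch below evaluates the kernel by hand and compares it with `sklearn.metrics.pairwise.rbf_kernel`. The points `p`, `q` and the value of `gamma` are hypothetical, chosen only for illustration, and `gaussian_kernel` is a helper defined here rather than a library function:

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def gaussian_kernel(a, b, gamma):
    """K(a, b) = exp(-gamma * ||a - b||^2) for two 1-D vectors."""
    return np.exp(-gamma * np.sum((a - b) ** 2))

# Hypothetical points and gamma, for illustration only.
p = np.array([0.0, 0.0])
q = np.array([1.0, 2.0])
gamma = 0.5

k_same = gaussian_kernel(p, p, gamma)  # identical points: similarity is 1
k_far = gaussian_kernel(p, q, gamma)   # ||p - q||^2 = 5, so this is exp(-2.5)

# sklearn expects 2-D arrays of shape (n_samples, n_features).
k_lib = rbf_kernel(p.reshape(1, -1), q.reshape(1, -1), gamma=gamma)[0, 0]

print(k_same, k_far, k_lib)
```

The kernel equals 1 for identical points and decays towards 0 as the points move apart; a larger $\gamma$ makes the decay faster.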

In [None]:
rbf_SVM = svm.SVC(kernel='rbf')
rbf_SVM.fit(x,label)

# Test fitted model on the testing dataset
pred = rbf_SVM.predict(x_t)
acc = (pred==label_t).sum()/len(label_t)
print('Accuracy of rbf SVM on circular test datapoints is : {:.4f}'.format(acc))

With the help of the rbf kernel, our SVM model makes perfect predictions on the test dataset!

The rbf SVM no longer separates the classes with a line. Its decision function $f$ is a weighted sum of kernel evaluations against the support vectors rather than a simple linear expression.

Our fitted `rbf_SVM` model can calculate the decision function for us. The classification rule is the same as before:
1.  $(x_0,x_1)$ is the negative class if $f((x_0,x_1))<0$.
2.  $(x_0,x_1)$ is the positive class if $f((x_0,x_1))>0$.
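
For a kernel SVM the decision function has the form $f(x) = \sum_i \alpha_i y_i K(x^{(i)}, x) + b$, where the sum runs over the support vectors. The sketch below rebuilds it from a fitted model's attributes and compares the result with `decision_function`. It assumes synthetic data from `make_circles` in place of the CSV file and an explicitly chosen `gamma_demo = 1.0` (the notebook's model uses sklearn's default `gamma`); all `*_demo` names are illustrative:

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import make_circles
from sklearn.metrics.pairwise import rbf_kernel

# Synthetic circular data (an assumption; it stands in for the CSV file).
X_demo, y_demo = make_circles(n_samples=200, factor=0.4, noise=0.05, random_state=0)

# gamma is fixed explicitly so the same value can be reused in the manual computation.
gamma_demo = 1.0
demo_rbf = svm.SVC(kernel='rbf', gamma=gamma_demo).fit(X_demo, y_demo)

# Kernel between every data point and every support vector: shape (n_samples, n_SV).
K = rbf_kernel(X_demo, demo_rbf.support_vectors_, gamma=gamma_demo)

# dual_coef_ holds the products alpha_i * y_i; intercept_ holds b.
manual_values = K @ demo_rbf.dual_coef_.ravel() + demo_rbf.intercept_[0]

print(np.allclose(manual_values, demo_rbf.decision_function(X_demo)))
```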

We visualize the decision function in the following way.
1.  We uniformly sample points on the plane around our data points.
2.  We evaluate the decision function on each sampled point.
3.  We make a 3D plot to visualize the decision function values.

In [None]:
x_0_min, x_0_max = data['x0'].min(), data['x0'].max()
x_1_min, x_1_max = data['x1'].min(), data['x1'].max()

# We sample 100 data points on each axis.
# Then we use the meshgrid method to generate
# coordinates of the points.
x0_sample, x1_sample = np.meshgrid(
    np.linspace(x_0_min, x_0_max, 100),
    np.linspace(x_1_min, x_1_max, 100)
)

x0_sample_flat, x1_sample_flat = x0_sample.flatten(), x1_sample.flatten()
plane_samples = np.vstack([x0_sample_flat, x1_sample_flat]).T


In [None]:
plane_samples[:5,:]

In [None]:
sampled_decision_values = rbf_SVM.decision_function(plane_samples).reshape(x0_sample.shape)
data_decision_values = rbf_SVM.decision_function(x)

In [None]:
fig = go.Figure(go.Scatter3d(x=x[:,0],y=x[:,1],z=data_decision_values, mode='markers', marker=dict(size=5,color=label.astype(np.int32))))
fig.add_traces(
    go.Surface(x=x0_sample,y=x1_sample,z=sampled_decision_values,opacity=0.5,showscale=False)
)
fig.update_layout(width=800, height=600)
fig.update_layout(scene_aspectmode='manual',
                  scene_aspectratio=dict(x=1, y=1, z=1),
                  title='Decision Function Visualization')
fig.show()

We can see that the decision function (represented by the surface) is non-linear. Points of the $+1$ class have positive decision values, while points of the $-1$ class have negative decision values.