# Application: Support Vector Machines Part 1

#### References

1. **Support-vector machine**: https://en.wikipedia.org/wiki/Support-vector_machine
2. **Support Vector Machines**: https://scikit-learn.org/stable/modules/svm.html

## Support Vector Machines

Support vector machines (SVMs) are a set of supervised learning methods used for classification, regression and outlier detection. As a first approximation, an SVM finds a separating line, or more generally a hyperplane, between the data of two classes.

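Among all lines that separate the two classes, the SVM chooses the one that maximizes the distance to the nearest points of either class; this distance is called the margin. A large margin makes the classifier less sensitive to small perturbations of the points near the boundary, and hence more robust. Note, however, that SVMs put correct classification first: the margin is maximized subject to classifying the training points correctly (in the soft-margin formulation, the parameter C controls how strictly this is enforced).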
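As a minimal sketch of what "maximizing the margin" means, the snippet below fits a linear ```SVC``` and reads the hyperplane $w \cdot x + b = 0$ off the fitted ```coef_``` and ```intercept_``` attributes. Assuming a (nearly) hard-margin fit, which we approximate with a large ```C```, the support vectors satisfy $|w \cdot x + b| = 1$, so each lies at distance $1/\lVert w \rVert$ from the hyperplane and the margin width is $2/\lVert w \rVert$. The variable names (```X_sep```, ```clf_margin```, etc.) are illustrative and chosen so they do not overwrite the ```X```, ```y``` and ```clf``` used in the cells below.

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import make_blobs

# Two separable clusters (same data as the plotting example further below).
X_sep, y_sep = make_blobs(n_samples=40, centers=2, random_state=6)

# A very large C approximates a hard-margin SVM on separable data.
clf_margin = svm.SVC(kernel='linear', C=1000).fit(X_sep, y_sep)

# The separating hyperplane is w . x + b = 0.
w = clf_margin.coef_[0]
b = clf_margin.intercept_[0]
w_norm = np.linalg.norm(w)

# Distance of each support vector to the hyperplane: |w . x + b| / ||w||.
sv_dist = np.abs(clf_margin.support_vectors_ @ w + b) / w_norm

print("margin width 2/||w||:", 2 / w_norm)
print("support-vector distances:", sv_dist)  # each should be close to 1/||w||
```

All support vectors sit (up to solver tolerance) at the same distance from the hyperplane, which is exactly the "nearest point" distance the SVM maximizes.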

Some of the advantages of SVMs are [2]:

- Effective in high-dimensional spaces.
- Still effective when the number of dimensions is greater than the number of samples.
- Memory efficient, since the decision function uses only a subset of the training points (the support vectors).
- Versatile: different kernel functions can be specified for the decision function.
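To illustrate the last point, here is a small sketch that fits ```SVC``` with each of the built-in kernels selected by name, plus a custom kernel passed as a Python callable that returns the Gram matrix. The toy dataset (```make_moons```) and the helper ```my_linear_kernel``` are illustrative assumptions, not part of the original notebook.

```python
from sklearn import svm
from sklearn.datasets import make_moons

# A small two-class dataset that is not linearly separable (illustrative).
X_moons, y_moons = make_moons(n_samples=100, noise=0.1, random_state=0)

# Built-in kernels are selected by name through the `kernel` argument.
scores = {}
for kernel in ['linear', 'poly', 'rbf', 'sigmoid']:
    clf_k = svm.SVC(kernel=kernel).fit(X_moons, y_moons)
    scores[kernel] = clf_k.score(X_moons, y_moons)  # training accuracy

# Hypothetical custom kernel: a callable returning the Gram matrix between
# two sets of samples. A plain dot product reproduces the linear kernel.
def my_linear_kernel(U, V):
    return U @ V.T

clf_custom = svm.SVC(kernel=my_linear_kernel).fit(X_moons, y_moons)
scores['custom'] = clf_custom.score(X_moons, y_moons)

for name, acc in scores.items():
    print(f"{name:>8}: training accuracy {acc:.2f}")
```

On curved data such as this, the RBF kernel typically fits much better than the linear one, while the custom dot-product kernel behaves like the built-in linear kernel.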

However, SVMs also have disadvantages, including [2]:

- If the number of features is much greater than the number of samples, choosing the kernel function and the regularization term carefully is crucial to avoid overfitting.
- SVMs do not directly provide probability estimates; these are calculated using an expensive five-fold cross-validation.
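In ```sklearn```, probability estimates are switched on with ```probability=True```, which makes ```fit``` run the extra internal cross-validation mentioned above (the scikit-learn guide describes the binary case as Platt scaling). A minimal sketch, assuming a synthetic ```make_blobs``` dataset chosen only for illustration:

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import make_blobs

X_p, y_p = make_blobs(n_samples=100, centers=2, random_state=0)

# probability=True triggers the internal cross-validation during fit(),
# so training is slower; random_state makes the estimates reproducible.
clf_proba = svm.SVC(kernel='linear', probability=True, random_state=0)
clf_proba.fit(X_p, y_p)

proba = clf_proba.predict_proba(X_p[:3])
print(proba)               # one row per sample, one column per class
print(clf_proba.classes_)  # the column order used by predict_proba
```

Because the probabilities come from a separate calibration step, they are not guaranteed to agree with ```predict``` for points close to the decision boundary.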

## Using SVMs in ```sklearn```

```python
from sklearn import svm
from sklearn.metrics import accuracy_score
```

```python
# Two training samples with two features each, and their class labels
X = [[0, 0], [1, 1]]
y = [0, 1]
```

```python
# Create the classifier. The kernel type is set with the `kernel` argument;
# here we use a linear kernel.
# C controls the trade-off between a smooth decision boundary and classifying
# the training points correctly: with a large C, more training points are
# classified correctly, but (for non-linear kernels) the decision boundary
# becomes more wiggly.
# gamma defines how far the influence of a single training example reaches.
clf = svm.SVC(kernel='linear')
```
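The effect of ```C``` is easiest to see with a non-linear kernel. The sketch below, assuming noisy ```make_moons``` data and the RBF kernel (both illustrative choices), fits three models with increasing ```C``` and reports training accuracy and the number of support vectors. It uses its own variable names so the ```clf``` defined above is left untouched.

```python
from sklearn import svm
from sklearn.datasets import make_moons

# Noisy, overlapping two-class data (illustrative, not from the text).
X_noisy, y_noisy = make_moons(n_samples=200, noise=0.3, random_state=0)

train_acc = {}
n_sv = {}
for C in [0.01, 1.0, 1000.0]:
    clf_c = svm.SVC(kernel='rbf', C=C).fit(X_noisy, y_noisy)
    train_acc[C] = clf_c.score(X_noisy, y_noisy)  # fraction classified correctly
    n_sv[C] = int(clf_c.n_support_.sum())         # total number of support vectors
    print(f"C={C:>7}: training accuracy {train_acc[C]:.2f}, "
          f"support vectors {n_sv[C]}")
```

A small ```C``` tolerates many margin violations (smooth boundary, many support vectors); a large ```C``` pushes towards classifying every training point correctly, which tends to give a more wiggly boundary and can overfit.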

---

**Remark**

The $\gamma$ parameter has no effect when a linear kernel is used.

---
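A quick way to check the remark above: fit two linear SVCs that differ only in ```gamma``` and compare their hyperplanes, then do the same with the RBF kernel, where ```gamma``` does matter. The dataset and the two ```gamma``` values are arbitrary choices for this sketch.

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import make_blobs

X_g, y_g = make_blobs(n_samples=40, centers=2, random_state=6)

# Linear kernel: gamma is ignored, so both fits give the same hyperplane.
clf_lin_a = svm.SVC(kernel='linear', gamma=0.001).fit(X_g, y_g)
clf_lin_b = svm.SVC(kernel='linear', gamma=100.0).fit(X_g, y_g)
print(np.array_equal(clf_lin_a.coef_, clf_lin_b.coef_))

# RBF kernel: gamma changes the kernel, so the decision functions differ.
clf_rbf_a = svm.SVC(kernel='rbf', gamma=0.001).fit(X_g, y_g)
clf_rbf_b = svm.SVC(kernel='rbf', gamma=100.0).fit(X_g, y_g)
print(np.allclose(clf_rbf_a.decision_function(X_g),
                  clf_rbf_b.decision_function(X_g)))
```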

The ```fit``` method of ```SVC``` takes two arrays: an array of size ```[n_samples, n_features]``` holding the training examples, and an array of size ```[n_samples]``` holding the class labels, which can be either strings or integers.
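For example, a sketch of the same two-point problem with NumPy arrays and string labels (the label names ```'negative'``` and ```'positive'``` are made up for illustration):

```python
import numpy as np
from sklearn import svm

# X has shape [n_samples, n_features]; y has shape [n_samples].
X_str = np.array([[0.0, 0.0], [1.0, 1.0]])
y_str = np.array(['negative', 'positive'])  # string labels are accepted

clf_str = svm.SVC(kernel='linear').fit(X_str, y_str)

print(X_str.shape, y_str.shape)        # (2, 2) (2,)
print(clf_str.classes_)                # ['negative' 'positive']
print(clf_str.predict([[2.0, 2.0]]))   # ['positive']
```

Predictions are returned with the same label type used for training.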

```python
clf.fit(X, y)
```

```
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto_deprecated',
  kernel='linear', max_iter=-1, probability=False, random_state=None,
  shrinking=True, tol=0.001, verbose=False)
```

```python
# A single test point and its expected label
test_data = [[2., 2.]]
test_labels = [1]
pred = clf.predict(test_data)
```

---

**Remark**

An SVM's decision function depends only on a subset of the training data, called the support vectors. Properties of the support vectors are stored in the following attributes of the fitted classifier:

- ```support_vectors_```, which holds the support vectors themselves
- ```support_```, which holds the indices of the support vectors in the training set
- ```n_support_```, which holds the number of support vectors for each class


---
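To see that the decision function really depends only on the support vectors, the sketch below rebuilds it by hand from the fitted attributes. According to the scikit-learn guide, ```dual_coef_``` holds the products $y_i \alpha_i$ for the support vectors and ```intercept_``` holds $b$, so $f(x) = \sum_{i \in SV} y_i \alpha_i K(x_i, x) + b$; with the linear kernel, $K(u, v) = u \cdot v$. The dataset is an illustrative choice, and the names are kept distinct from the ```clf``` used in the cells below.

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import make_blobs

X_sv, y_sv = make_blobs(n_samples=40, centers=2, random_state=6)
clf_dual = svm.SVC(kernel='linear', C=1000).fit(X_sv, y_sv)

# Linear kernel between every support vector and every sample: (n_SV, n_samples).
K = clf_dual.support_vectors_ @ X_sv.T

# dual_coef_ has shape (1, n_SV) for a binary problem, so this is (1, n_samples).
manual = clf_dual.dual_coef_ @ K + clf_dual.intercept_

print(np.allclose(manual.ravel(), clf_dual.decision_function(X_sv)))
```

Only the rows of ```support_vectors_``` enter the computation; every other training point could be discarded without changing $f(x)$.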

```python
clf.support_vectors_
```

```
array([[0., 0.],
       [1., 1.]])
```

```python
clf.support_
```

```
array([0, 1])
```

```python
clf.n_support_
```

```
array([1, 1])
```

```python
# accuracy_score expects the true labels first, then the predictions
accuracy = accuracy_score(test_labels, pred)
print("Accuracy is ", accuracy)
```

```
Accuracy is  1.0
```


## SVM: Maximum margin separating hyperplane

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm
from sklearn.datasets import make_blobs


# we create 40 separable points
X, y = make_blobs(n_samples=40, centers=2, random_state=6)

# fit the model, don't regularize for illustration purposes
clf = svm.SVC(kernel='linear', C=1000)
clf.fit(X, y)

plt.scatter(X[:, 0], X[:, 1], c=y, s=30, cmap=plt.cm.Paired)

# plot the decision function
ax = plt.gca()
xlim = ax.get_xlim()
ylim = ax.get_ylim()

# create grid to evaluate model
xx = np.linspace(xlim[0], xlim[1], 30)
yy = np.linspace(ylim[0], ylim[1], 30)
YY, XX = np.meshgrid(yy, xx)
xy = np.vstack([XX.ravel(), YY.ravel()]).T
Z = clf.decision_function(xy).reshape(XX.shape)

# plot decision boundary (level 0) and margins (levels -1 and 1)
ax.contour(XX, YY, Z, colors='k', levels=[-1, 0, 1], alpha=0.5,
           linestyles=['--', '-', '--'])
# plot support vectors
ax.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1], s=100,
           linewidth=1, facecolors='none', edgecolors='k')
plt.show()
```

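Since only the circled support vectors determine the hyperplane, refitting on those points alone should give (up to solver tolerance) the same model. This is a sketch under the same setup as the plot above; the names ```clf_full``` and ```clf_sv``` are illustrative.

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import make_blobs

X_full, y_full = make_blobs(n_samples=40, centers=2, random_state=6)
clf_full = svm.SVC(kernel='linear', C=1000).fit(X_full, y_full)

# Refit using only the support vectors found by the first fit.
sv_idx = clf_full.support_
clf_sv = svm.SVC(kernel='linear', C=1000).fit(X_full[sv_idx], y_full[sv_idx])

print("support vectors used:", len(sv_idx), "of", len(X_full))
print("w (all points):     ", clf_full.coef_)
print("w (support vectors):", clf_sv.coef_)
print("same predictions:",
      np.array_equal(clf_full.predict(X_full), clf_sv.predict(X_full)))
```

This is the memory-efficiency advantage listed earlier: the points that are not support vectors have zero weight in the decision function.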