#Support Vector Machines
**Support Vector Machines**, or **SVMs**, find a decision boundary between two classes.  An SVM chooses this boundary by maximizing the distance between the boundary and the nearest points of each class. This distance is known as the **margin**.  Although SVMs are designed to give the largest margin between the two classes, they can tolerate *outliers*, that is, points that keep any boundary from cleanly splitting the two classes.
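
For data that can be split cleanly, the margin idea can be written as a small optimization problem. The notation here is an assumption and does not come from the text above: $w$ and $b$ are the normal vector and offset of the boundary, $x_i$ are the training points, and the labels are recoded as $y_i \in \{-1, +1\}$.

$$
\min_{w,\,b} \ \tfrac{1}{2}\lVert w \rVert^2 \quad \text{subject to} \quad y_i\,(w \cdot x_i + b) \ge 1 \ \text{ for all } i
$$

With this scaling, the nearest point of each class sits at distance $1/\lVert w \rVert$ from the boundary, so the gap between the two classes is $2/\lVert w \rVert$. Making $\lVert w \rVert$ small is therefore the same as making the margin large.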

##SVM in sklearn

Now let's see how SVMs can be coded in Python.  Below is the example given in the scikit-learn documentation:

In [3]:
from sklearn import svm

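# Two training points, one per class
features_train = [[0, 0], [1, 1]]
labels_train = [0, 1]
# A single unseen point to classify
features_test = [[2., 2.]]
clf = svm.SVC()  # the default kernel is "rbf"
clf.fit(features_train, labels_train)
pred = clf.predict(features_test)
pred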

array([1])
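
To connect this back to the margin, here is a minimal sketch that inspects a fitted classifier. It assumes a linear kernel and a large `C` (both are choices made for this illustration, not part of the documentation example) so that the fit approximates the largest-margin boundary for these two points.

```python
import numpy as np
from sklearn import svm

# Same toy data as above: one training point per class
features_train = [[0, 0], [1, 1]]
labels_train = [0, 1]

# Assumption: linear kernel with a large C, so essentially no
# misclassification is tolerated on this tiny dataset
clf = svm.SVC(kernel="linear", C=1000)
clf.fit(features_train, labels_train)

w = clf.coef_[0]                    # normal vector of the decision boundary
gap = 2 / np.linalg.norm(w)         # distance between the two margin lines

print(clf.support_vectors_)         # training points that pin down the boundary
print(gap)
```

With only two points, both end up as support vectors, and the gap should come out close to the distance between them.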

##Nonlinear SVMs

Traditionally, SVMs generate linear decision boundaries, but they can produce intricate nonlinear decision boundaries as well.  This is done by adding features that map the points into a different (typically higher-dimensional) space in which a linear decision boundary can be drawn.  Once the boundary has been defined in the transformed space, mapping it back to the original space gives us a nonlinear decision boundary.  In scikit-learn, these transformations are handled by the *kernel*.  In essence, this is what happens:

>$x, y \to x_{1}, x_{2}, x_{3}, x_{4}$ (Not Separable -> Separable)

>Nonlinear Separation <- Linearly Separable Solution
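
The mapping above can be sketched by hand, without a kernel. This example assumes a toy dataset from `make_circles` (an inner circle surrounded by an outer one) and one hand-picked extra feature, $x^2 + y^2$. Both are choices made for this illustration.

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import make_circles

# Hypothetical toy data: an inner circle surrounded by an outer circle
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# Linear SVM on the original two features: no straight line splits the circles
acc_raw = svm.SVC(kernel="linear").fit(X, y).score(X, y)

# Add a third feature, x^2 + y^2 (squared distance from the origin)
X_aug = np.column_stack([X, (X ** 2).sum(axis=1)])

# Linear SVM in the transformed space: a flat plane can now split the classes
acc_aug = svm.SVC(kernel="linear").fit(X_aug, y).score(X_aug, y)

print(acc_raw, acc_aug)  # training accuracy before and after the transformation
```

The plane found in the three-feature space corresponds to a circle when viewed in the original two features, which is the nonlinear boundary.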

The scikit-learn module provides various kernels, which you select by passing a value to the *kernel* parameter of SVC().
Here is an example:

In [4]:
clf = svm.SVC(kernel="linear")
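
As a rough comparison, the sketch below fits `SVC` with each of the built-in kernel names on the same data. The `make_circles` dataset is an assumption chosen for this illustration, and the scores are training accuracy only, so they show which kernels can fit this shape rather than how well they generalize.

```python
from sklearn import svm
from sklearn.datasets import make_circles

# Hypothetical toy data that no straight line can separate
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

scores = {}
for kernel in ["linear", "poly", "rbf", "sigmoid"]:
    clf = svm.SVC(kernel=kernel)
    clf.fit(X, y)
    scores[kernel] = clf.score(X, y)  # accuracy on the training data

for kernel, acc in scores.items():
    print(kernel, acc)
```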

Two other parameters that affect how the decision boundary is generated are **gamma** and **C**.  The *gamma* parameter, which applies to the nonlinear kernels such as `rbf`, defines the influence of a single training example (a low value means far reach, and a high value means close reach).  The **C** parameter controls the tradeoff between misclassifying training examples and keeping the decision surface smooth (a low value favors a smooth decision surface, while a high value tries to classify all training examples correctly).  It is worth noting that, depending on your kernel, gamma, and C, you run the risk of **overfitting**, which you want to avoid in machine learning because an overfit model predicts unseen data points poorly.
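
The effect of these parameters can be seen by comparing accuracy on the training data with accuracy on held-out data. This is a minimal sketch: the `make_moons` dataset, the helper `fit_and_score`, and the specific gamma and C values are all assumptions picked for this illustration.

```python
from sklearn import svm
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split

# Hypothetical noisy two-class data, split into training and held-out sets
X, y = make_moons(n_samples=300, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

def fit_and_score(gamma, C):
    """Return (training accuracy, test accuracy) for an RBF-kernel SVC."""
    clf = svm.SVC(kernel="rbf", gamma=gamma, C=C)
    clf.fit(X_train, y_train)
    return clf.score(X_train, y_train), clf.score(X_test, y_test)

smooth = fit_and_score(gamma=0.5, C=1)        # far reach, smoother surface
overfit = fit_and_score(gamma=1000, C=1000)   # close reach, chases every point

print("smooth : train=%.2f test=%.2f" % smooth)
print("overfit: train=%.2f test=%.2f" % overfit)
```

A large gap between training and test accuracy in the second setting is the usual sign of overfitting.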

##Pros and Cons

SVMs work really well when there is a clear margin of separation between the two classes.  They don't work as well on very large datasets, since training time can grow as fast as $O(n^3)$ in the number of training examples, or when there is a lot of noise (overlapping classes).