Documentation  
=============
http://scikit-learn.org/stable/modules/svm.html

Support vector machines (SVMs) are a set of supervised learning methods used for <font color = 'green'>classification, regression, and outlier detection</font>.

<font color = 'red' size = '4pt'> Advantages: </font>

1> Uses a subset of the training points (called support vectors) in the decision function, so it is memory efficient.

2> Versatile: different kernel functions can be specified for the decision function.
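To make the memory-efficiency point concrete, here is a minimal sketch that fits a linear SVC and counts how many training points end up as support vectors. The toy data (two Gaussian clusters) is an assumption chosen for illustration; the exact counts will differ on other data.

```python
import numpy as np
from sklearn.svm import SVC

# Toy data (an assumption for illustration): two well-separated clusters.
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(50, 2) - 3, rng.randn(50, 2) + 3])
y = np.array([0] * 50 + [1] * 50)

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# Only the support vectors are kept for the decision function.
print("training points:          ", X.shape[0])
print("support vectors:          ", clf.support_vectors_.shape[0])
print("support vectors per class:", clf.n_support_)
```

On well-separated data like this, only a handful of the 100 points should be retained as support vectors.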


<font color = 'red' size = '4pt'> Disadvantages: </font>

1> If the number of features is much greater than the number of samples, choosing the kernel function and the regularization term carefully is crucial to avoid overfitting.

2> SVMs do not directly provide probability estimates.
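As a small illustration of the second point, the sketch below shows what a fitted SVC does provide: a signed score from `decision_function`, whose sign gives the predicted class and whose magnitude is not bounded to the range 0 to 1. The one-dimensional toy data is an assumption made for the example.

```python
import numpy as np
from sklearn.svm import SVC

# Toy 1-D data (an assumption for illustration): class 0 on the left, class 1 on the right.
X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y = np.array([0, 0, 1, 1])

clf = SVC(kernel="linear").fit(X, y)

X_new = np.array([[-1.5], [0.1], [3.0]])
scores = clf.decision_function(X_new)  # signed scores relative to the hyper-plane
labels = clf.predict(X_new)            # class label, determined by the sign of the score

for point, score, label in zip(X_new.ravel(), scores, labels):
    print(f"x = {point:+.1f}  score = {score:+.2f}  predicted label = {label}")
```

The scores are raw outputs of the decision function, so they should not be read as probabilities.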


How It Works
============

<font color = 'red'> Scenario1: </font> Select the hyper-plane that best separates the two classes. In this scenario, hyper-plane “B” does this job well.
![title](resources/SVM_1.webp)



<font color = 'red'> Scenario2: </font> Maximizing the distance between the hyper-plane and the nearest data point (of either class) helps us choose the right hyper-plane. This distance is called the margin. Consider the picture below:
![title](resources/SVM_2.webp)

Above, you can see that the margin for hyper-plane C is larger than for both A and B. Hence, we choose C as the right hyper-plane. Another compelling reason for selecting the hyper-plane with the larger margin is robustness: if we select a hyper-plane with a small margin, there is a high chance of misclassification.
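The margin described above can be measured directly for a linear SVM. The following is a rough sketch, assuming a tiny hand-made, linearly separable data set: it divides the absolute decision values by the norm of the weight vector `coef_` to get each point's distance from the hyper-plane, and takes the smallest one as the margin.

```python
import numpy as np
from sklearn.svm import SVC

# Hand-made, linearly separable toy data (an assumption for illustration).
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0],
              [3.0, 3.0], [4.0, 3.0], [3.0, 4.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# A large C keeps the fit close to a hard-margin SVM on separable data.
clf = SVC(kernel="linear", C=1000.0).fit(X, y)

w = clf.coef_[0]  # normal vector of the separating hyper-plane
distances = np.abs(clf.decision_function(X)) / np.linalg.norm(w)
margin = distances.min()  # distance from the hyper-plane to the nearest point

print("distance of each point to the hyper-plane:", np.round(distances, 3))
print("margin:", round(float(margin), 3))
```

Here the nearest points of the two classes are (1, 1) and (3, 3), so the margin should come out close to half the distance between them.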


<font color = 'red'> Scenario3: </font> Correct classification takes priority over margin. SVM selects the hyper-plane that classifies the classes accurately before maximizing the margin. In the picture below, even though B has a larger margin than A, hyper-plane B has a classification error while A classifies all points correctly. Therefore, the right hyper-plane is A.
![title](resources/SVM_3.webp)



<font color = 'red'> Scenario4: </font> Ignore outliers. SVM can ignore outliers and find the hyper-plane with the maximum margin. Hence, we can say that SVM is robust to outliers.




<font color = 'red'> Scenario5: </font> Non-linear hyper-plane. When no straight line separates the classes, SVM introduces an additional feature.

![title](resources/SVM_5.webp)


SVM can solve this problem easily by introducing an additional feature. Here, we add a new feature z = x^2 + y^2. Now, let’s plot the data points on the x and z axes:


![title](resources/SVM_6.webp)
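The effect of the extra feature can be checked in code. The sketch below is an illustration under assumptions, not the data behind the pictures: it uses `make_circles` as a synthetic stand-in for the circular layout, then compares a linear SVM on the original (x, y) features with a linear SVM on (x, z), where z = x^2 + y^2.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Synthetic stand-in for the picture: one class inside a ring of the other.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# Linear SVM on the original features (x, y): no straight line separates the classes.
acc_raw = SVC(kernel="linear").fit(X, y).score(X, y)

# Add the new feature z = x^2 + y^2 and train a linear SVM on (x, z).
z = X[:, 0] ** 2 + X[:, 1] ** 2
X_lifted = np.column_stack([X[:, 0], z])
acc_lifted = SVC(kernel="linear").fit(X_lifted, y).score(X_lifted, y)

print(f"training accuracy, linear SVM on (x, y): {acc_raw:.2f}")
print(f"training accuracy, linear SVM on (x, z): {acc_lifted:.2f}")
```

In the (x, z) plane the inner class sits at small z and the outer class at large z, so a horizontal line is enough to separate them.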

NB vs SVM
=========
Naive Bayes is great for text: it is faster and generally gives better performance than an SVM on that particular problem. Of course, there are plenty of other problems where an SVM might work better.
In addition to picking your algorithm, there are parameters to tune, depending on which algorithm you try, and the possibility of overfitting to watch for (especially if you don’t have lots of training data).

Our general suggestion is to try a few different algorithms for each problem. GridSearchCV, a great sklearn tool, can find a good parameter setting almost automatically.
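As a rough sketch of how such a search might look, the example below runs `GridSearchCV` over a few SVC settings. The iris data set and the particular grid values are assumptions chosen for illustration; a real problem would need its own grid.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Example data set (an assumption for illustration).
X, y = load_iris(return_X_y=True)

# Candidate values to try; every combination is scored with 5-fold cross-validation.
param_grid = {
    "kernel": ["linear", "rbf"],
    "C": [0.1, 1, 10],
    "gamma": [0.01, 0.1, 1],
}

search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)

print("best parameters:", search.best_params_)
print(f"best cross-validated accuracy: {search.best_score_:.3f}")
```

After fitting, `search.best_estimator_` holds an SVC refit on all the data with the best combination found.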

In [11]:
from sklearn import svm
X = [[0, 0], [1, 1]] # Train data
y = [0, 1] # Label data
clf = svm.SVC() # We are using support vector classifier (SVC)
clf.fit(X, y)  

SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto', kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)

In [4]:
clf.predict([[2., 2.]]) # This point belongs to label 1


array([1])

In [10]:
clf.predict([[0., 0.5]]) # This point belongs to label 0

array([0])

Udacity SVM
===========

In [None]:
import sys
from class_vis import prettyPicture
from prep_terrain_data import makeTerrainData

import matplotlib.pyplot as plt
import copy
import numpy as np
import pylab as pl


features_train, labels_train, features_test, labels_test = makeTerrainData()


########################## SVM #################################
### we handle the import statement and SVC creation for you here
from sklearn.svm import SVC
clf = SVC(kernel="linear")  # equivalently: `from sklearn import svm`, then `svm.SVC(kernel="linear")`


#### now your job is to fit the classifier
#### using the training features/labels, and to
#### make a set of predictions on the test data

clf.fit(features_train, labels_train)

#### store your predictions in a list named pred
pred = clf.predict(features_test)




from sklearn.metrics import accuracy_score
acc = accuracy_score(labels_test, pred)  # signature is accuracy_score(y_true, y_pred)

def submitAccuracy():
    return acc

Versatile Kernel
================
Different Kernel functions can be specified for the decision function. Common kernels are provided, but it is also possible to specify custom kernels.
http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html

As we can see, the constructor accepts a number of parameters:


<font color='green'>class sklearn.svm.SVC</font> (<font color='red'>C=1.0</font>, <font color='red'>kernel=’rbf’</font>, degree=3, <font color='red'> gamma=’auto’</font>, coef0=0.0, shrinking=True, probability=False, tol=0.001, cache_size=200, class_weight=None, verbose=False, max_iter=-1, decision_function_shape=’ovr’, random_state=None)

Of all these parameters, kernel, C, and gamma are the most important.

We can specify "kernel" as follows:

kernel : string, optional (<font color='red'>default=’rbf’</font>)

Specifies the kernel type to be used in the algorithm. It must be one of <font color='green'>‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’, ‘precomputed’ or a callable</font> . If none is given, ‘rbf’ will be used.
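A short sketch of the kernel options follows. It fits one SVC per built-in kernel name and one with a callable. The iris data set and the helper `my_linear_kernel` are assumptions introduced for this example; the callable simply returns the dot-product matrix, so it should behave like the built-in 'linear' kernel.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC

# Example data set (an assumption for illustration).
X, y = load_iris(return_X_y=True)

def my_linear_kernel(A, B):
    """Hypothetical custom kernel: plain dot products, i.e. a linear kernel."""
    return A @ B.T  # shape (n_samples_A, n_samples_B)

scores = {}
for kernel in ["linear", "poly", "rbf", "sigmoid", my_linear_kernel]:
    clf = SVC(kernel=kernel).fit(X, y)
    name = kernel if isinstance(kernel, str) else kernel.__name__
    scores[name] = clf.score(X, y)  # accuracy on the training data

for name, acc in scores.items():
    print(f"{name:>16}: training accuracy {acc:.2f}")
```

Training accuracy is used only to keep the example short; comparing kernels properly would call for held-out data or cross-validation.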



RBF SVM parameters (see example svm_rbf_parameter)
==================================================
http://scikit-learn.org/stable/auto_examples/svm/plot_rbf_parameters.html

This example illustrates the effect of the parameters <font color='green'>gamma and C </font> of the Radial Basis Function (RBF) kernel SVM.

Intuitively, the gamma parameter defines how far the influence of a single training example reaches, with low values meaning ‘far’ and high values meaning ‘close’. The gamma parameter can be seen as the inverse of the radius of influence of samples selected by the model as support vectors.

The C parameter trades off misclassification of training examples against simplicity of the decision surface. A low C makes the decision surface smooth, while a high C aims at classifying all training examples correctly by giving the model freedom to select more samples as support vectors.

Low value of 'gamma' = even training points far from the decision boundary influence it

High value of 'gamma' = only training points close to the decision boundary influence it

High value of 'C' = more training points are classified correctly

Low value of 'C' = smoother decision boundary
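The summary above can be explored with a small experiment. This is a sketch under assumptions: the `make_moons` data, the train/test split, and the specific C and gamma values are all chosen for illustration, and `fit_and_score` is a helper defined only for this example.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Noisy two-class toy data (an assumption for illustration).
X, y = make_moons(n_samples=300, noise=0.25, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

def fit_and_score(C, gamma):
    """Fit an RBF SVC and return (training accuracy, test accuracy)."""
    clf = SVC(kernel="rbf", C=C, gamma=gamma).fit(X_train, y_train)
    return clf.score(X_train, y_train), clf.score(X_test, y_test)

# Vary gamma with C fixed, then vary C with gamma fixed.
results = {}
for C, gamma in [(1, 0.01), (1, 100), (0.01, 1), (100, 1)]:
    results[(C, gamma)] = fit_and_score(C, gamma)
    train_acc, test_acc = results[(C, gamma)]
    print(f"C={C:<6} gamma={gamma:<6} train={train_acc:.2f} test={test_acc:.2f}")
```

Comparing the training and test columns gives a feel for when a high gamma or a high C starts fitting the training set more closely than it generalizes.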