## Support Vector Machines

Support Vector Machines compute the line (or hyperplane) that separates the classes with the widest possible margin, where the margin is measured to the data points of each class that lie closest to the **decision boundary**. These closest points are called _support vectors_. The hyperplane has one dimension fewer than the number of input features: a line for two features, a plane for three, and so on. [Here are some more details](https://towardsdatascience.com/support-vector-machine-introduction-to-machine-learning-algorithms-934a444fca47)
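To make the hyperplane and the support vectors concrete, here is a minimal sketch on a hypothetical six-point, two-feature toy dataset (not part of this notebook). It assumes a linear kernel with a large `C` so the fit behaves like a hard-margin SVM; `coef_`, `intercept_`, `support_vectors_` and `n_support_` are the attributes scikit-learn's `SVC` exposes after fitting.

```python
import numpy as np
from sklearn import svm

# Hypothetical toy data: two well-separated clusters described by 2 features
X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],
              [5.0, 5.0], [5.5, 6.0], [6.0, 5.5]])
y = np.array([0, 0, 0, 1, 1, 1])

# A linear kernel with a large C approximates a hard-margin SVM
clf = svm.SVC(kernel="linear", C=1e3)
clf.fit(X, y)

# The hyperplane is w . x + b = 0; w has one entry per feature,
# so with 2 features the separating "hyperplane" is a 1-D line
w = clf.coef_[0]
b = clf.intercept_[0]
print("w =", w, "b =", b)

# Only the points nearest the boundary are kept as support vectors
print("support vectors:\n", clf.support_vectors_)
print("support vectors per class:", clf.n_support_)
```

The far-away points of each cluster do not appear among the support vectors: moving them would not change the fitted line.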

### Key SVM Terms:
1. Kernel: The function used to map data from a lower-dimensional space into a higher-dimensional one, where the classes may become separable. In practice the kernel computes inner products in that space without building it explicitly.
2. Hyperplane: In SVM this is the separation line between the data classes. In SVR, we instead define it as the line that helps us predict the continuous target value.
3. Boundary lines: In SVM there are two lines, one on each side of the hyperplane, that create the margin. The support vectors lie on the boundary lines or, when some errors are allowed (a soft margin), inside the margin. Together the boundary lines separate the two classes. In SVR the concept is the same: the boundary lines form a tube around the hyperplane.
4. Support vectors: These are the data points closest to the hyperplane, i.e. the points whose distance to it is smallest.
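The kernel idea in item 1 can be checked numerically. This is a minimal sketch assuming a degree-2 polynomial kernel with no constant term (scikit-learn's `polynomial_kernel` with `gamma=1, coef0=0`, i.e. K(x, z) = (x · z)²) and a hypothetical explicit feature map `phi` that sends 2-D points into 3-D. The kernel returns the same inner products without ever computing `phi`.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel

def phi(x):
    # Hypothetical explicit map from 2-D input to 3-D feature space;
    # phi(a) . phi(b) expands to (a1*b1 + a2*b2)**2
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

X = np.array([[1.0, 2.0], [3.0, -1.0], [0.5, 0.5]])

# Inner products computed explicitly in the 3-D feature space
explicit = np.array([[phi(a) @ phi(b) for b in X] for a in X])

# The same values from the kernel, computed directly on the 2-D inputs
kernel = polynomial_kernel(X, X, degree=2, gamma=1, coef0=0)

print(np.allclose(explicit, kernel))  # True
```

For kernels such as RBF the implicit feature space is infinite-dimensional, which is why SVMs work with kernel values rather than explicit maps.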

<img src="images/svm2.png" />

The decision boundary is the hyperplane that maximizes the distance (the margin) to the closest points of each class (the support vectors).
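For a linear SVM the boundary lines are w · x + b = +1 and w · x + b = -1, so the margin between them is 2 / ‖w‖. A minimal sketch, reusing a hypothetical six-point toy dataset and assuming a large `C` (near hard margin), computes that width and checks that the support vectors sit on the boundary lines (decision values close to ±1).

```python
import numpy as np
from sklearn import svm

# Hypothetical toy data: two separable clusters
X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],
              [5.0, 5.0], [5.5, 6.0], [6.0, 5.5]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = svm.SVC(kernel="linear", C=1e3).fit(X, y)

w = clf.coef_[0]
# Distance between the boundary lines w.x + b = +1 and w.x + b = -1
margin_width = 2 / np.linalg.norm(w)
print(f"margin width: {margin_width:.3f}")

# Support vectors lie on the boundary lines, so these values are close to -1 or +1
print(clf.decision_function(clf.support_vectors_))
```

For this data the closest points of the two clusters are about 4.6 units apart along the diagonal, and the maximized margin spans exactly that gap.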

```python
import pandas as pd
import numpy as np
import seaborn as sns
from matplotlib import pyplot as plt
sns.set()

from sklearn import datasets
from sklearn import svm
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import GridSearchCV
```

# SVM and Multi-Category Models

We are often given situations where the required analysis involves a multiclass response, and many different families of models can handle this type of analysis. In this notebook, we will review how to implement several varieties of Support Vector Machines to expand our toolbox of skills. Let's start with a simple example of a Support Vector Machine for classification.
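SVMs are binary classifiers at heart, so a multiclass response needs a strategy for combining them. Per scikit-learn's documentation, `SVC` trains one classifier per pair of classes (one-vs-one) and, by default (`decision_function_shape="ovr"`), reshapes the pairwise scores into one score per class. A minimal sketch, assuming a hypothetical 4-class toy problem generated with `make_blobs`:

```python
from sklearn import datasets, svm

# Hypothetical 4-class toy problem
X, y = datasets.make_blobs(n_samples=200, centers=4, random_state=0)

# 'ovo' exposes every pairwise classifier: 4 * 3 / 2 = 6 columns
ovo = svm.SVC(decision_function_shape="ovo").fit(X, y)
print(ovo.decision_function(X[:5]).shape)  # (5, 6)

# 'ovr' (the default) reports one score per class
ovr = svm.SVC(decision_function_shape="ovr").fit(X, y)
print(ovr.decision_function(X[:5]).shape)  # (5, 4)
```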

```python
# get the iris dataset from the UCI repository
iris_data = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/iris/bezdekIris.data",
                        names=["sepal_length","sepal_width","petal_length","petal_width","iris_type"])
```

```python
# convert the target variable, iris type, into a categorical
iris_data['iris_type'] = pd.Categorical(iris_data.iris_type)
```

```python
iris_data.head()
```

|   | sepal_length | sepal_width | petal_length | petal_width | iris_type |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |


```python
# features are every column except the last; the target is the last column
Xs = iris_data.iloc[:,:-1]
y = iris_data.iloc[:,-1]
```

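```python
# hold out 30% of the rows for testing; without random_state the split
# (and therefore the score below) changes from run to run
X_train, X_test, y_train, y_test = train_test_split(Xs, y, test_size=0.3)
```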

```python
# Simple train-test
svc = svm.SVC()
svc.fit(X_train,y_train)
print(svc.score(X_test,y_test))
```

```
1.0
```




```python
# Cross-validated
svc = svm.SVC(gamma='auto')
print(cross_val_score(svc,Xs,y,cv=10).mean().round(4))
```

```
0.98
```
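What `cross_val_score` does with `cv=10` can be spelled out by hand. According to scikit-learn's documentation, an integer `cv` with a classifier means `StratifiedKFold` without shuffling, so every fold keeps the class proportions. A minimal sketch, assuming scikit-learn's bundled copy of the iris data (which may differ slightly from the UCI file loaded above):

```python
import numpy as np
from sklearn import datasets, svm
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = datasets.load_iris(return_X_y=True)
svc = svm.SVC(gamma="auto")

# Library version: 10 stratified folds, accuracy on each held-out fold
auto = cross_val_score(svc, X, y, cv=10)

# Manual version: fit on 9 folds, score on the remaining one, 10 times
manual = []
for train_idx, test_idx in StratifiedKFold(n_splits=10).split(X, y):
    svc.fit(X[train_idx], y[train_idx])
    manual.append(svc.score(X[test_idx], y[test_idx]))

print(np.allclose(auto, manual))  # True
```

Averaging the 10 fold scores is what the `.mean()` call in the cell above does.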


```python
# Hyperparameter-tuned, cross-validated
svc = svm.SVC()
grid = {'C':[100,.001,.01,1000,10000],
        'kernel':['rbf','linear','poly','sigmoid'],
        'gamma':['auto','scale']}
gscv = GridSearchCV(svc,grid,cv=4,verbose=True,return_train_score=False)
gscv.fit(Xs,y)
```

```
Fitting 4 folds for each of 40 candidates, totalling 160 fits
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done 160 out of 160 | elapsed:    1.8s finished

GridSearchCV(cv=4, error_score='raise-deprecating',
       estimator=SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto_deprecated',
  kernel='rbf', max_iter=-1, probability=False, random_state=None,
  shrinking=True, tol=0.001, verbose=False),
       fit_params=None, iid='warn', n_jobs=None,
       param_grid={'C': [100, 0.001, 0.01, 1000, 10000], 'kernel': ['rbf', 'linear', 'poly', 'sigmoid'], 'gamma': ['auto', 'scale']},
       pre_dispatch='2*n_jobs', refit=True, return_train_score=False,
       scoring=None, verbose=True)
```

```python
# keep the parameter columns and the mean test score, best score first
scores = (pd.DataFrame(gscv.cv_results_)
          .filter(regex='param_+|mean_test_score')
          .sort_values('mean_test_score', ascending=False)
          .reset_index(drop=True))
scores.head(15)
```

|   | mean_test_score | param_C | param_gamma | param_kernel |
|---|---|---|---|---|
| 0 | 0.98 | 100.0 | scale | poly |
| 1 | 0.973333 | 10000.0 | auto | linear |
| 2 | 0.973333 | 0.01 | auto | poly |
| 3 | 0.973333 | 1000.0 | scale | linear |
| 4 | 0.973333 | 0.001 | auto | poly |
| 5 | 0.973333 | 10000.0 | scale | linear |
| 6 | 0.973333 | 1000.0 | auto | linear |
| 7 | 0.966667 | 100.0 | scale | linear |
| 8 | 0.966667 | 1000.0 | scale | poly |
| 9 | 0.966667 | 100.0 | scale | rbf |


And that's it for the simplest model! With the lessons of the last few weeks, we can now implement an SVM model quickly and thoroughly, using all the steps we've covered.

Which do you think is the best?
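One way to answer is to ask the fitted search directly. `GridSearchCV` stores the best mean cross-validated score in `best_score_`, the settings that produced it in `best_params_`, and (with the default `refit=True`) a model refit on all the data in `best_estimator_`. A minimal sketch, assuming scikit-learn's bundled iris data and a hypothetical, smaller grid:

```python
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV

X, y = datasets.load_iris(return_X_y=True)

# Hypothetical reduced grid, just to keep the example fast
grid = {"C": [0.01, 1, 100], "kernel": ["rbf", "linear"]}
gscv = GridSearchCV(svm.SVC(gamma="scale"), grid, cv=4).fit(X, y)

# Highest mean cross-validated accuracy and the settings that produced it
print(gscv.best_score_, gscv.best_params_)

# The winning configuration, refit on the full dataset
print(gscv.best_estimator_)
```

When several candidates tie, as in the table above, `best_params_` reports just one of them, so the ranked table remains useful for seeing the alternatives.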

## Regression
When the grid search below runs, take note of how long the computations take, since we have included more tuning parameters. SVMs are not optimized for large amounts of data, and their fit time grows at least quadratically with the number of samples. In general, a kernel SVM compares every point with every other point, which on its own already costs on the order of **(num_samples^2 * num_features)** operations.
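The tuning parameters multiply: every combination in the grid is a candidate, and every candidate is fit once per fold. A minimal sketch using scikit-learn's `ParameterGrid` on the two grids from this notebook (copied verbatim) shows where the candidate and fit counts in the logs come from:

```python
from sklearn.model_selection import ParameterGrid

# Grids copied from the classification and regression cells
classifier_grid = {'C': [100, .001, .01, 1000, 10000],
                   'kernel': ['rbf', 'linear', 'poly', 'sigmoid'],
                   'gamma': ['auto', 'scale']}
regressor_grid = {'C': [0.1, 0.2, 0.3],
                  'kernel': ['rbf', 'linear', 'poly', 'sigmoid'],
                  'gamma': ['auto', 'scale'],
                  'tol': [1e-5, 1e-4, 1e-3, 1e-2],
                  'epsilon': [0, 0.5, 1, 2]}
cv = 4

counts = {}
for name, grid in [("SVC", classifier_grid), ("SVR", regressor_grid)]:
    n = len(ParameterGrid(grid))  # product of the list lengths
    counts[name] = n
    print(f"{name}: {n} candidates, {n * cv} fits")
# SVC: 40 candidates, 160 fits
# SVR: 384 candidates, 1536 fits
```

On top of the roughly 10x more fits, the regression data has 1000 samples instead of 150, so each pairwise kernel computation covers about (1000/150)² ≈ 44 times as many point pairs.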

We can also model continuous data, i.e. a regression problem. Here the objective of SVR is to fit a hyperplane so that the points fall within the boundary lines around it (a tube of half-width `epsilon`); points outside the tube are penalized according to how far outside they lie. Roughly, our best-fit line is the hyperplane whose tube contains the most points. [More details here](https://medium.com/coinmonks/support-vector-regression-or-svr-8eb3acf6d0ff)
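The role of the tube can be seen by varying `epsilon`: points strictly inside the tube do not affect the fit, so only points on or outside it become support vectors. A minimal sketch, assuming hypothetical 1-D data made of a straight line plus a bounded wiggle of ±0.5, and a linear kernel with `C=100`:

```python
import numpy as np
from sklearn import svm

# Hypothetical 1-D data: y = 2x plus a deterministic wiggle of at most 0.5
x = np.linspace(0, 1, 50).reshape(-1, 1)
y = 2 * x.ravel() + 0.5 * np.sin(20 * x.ravel())

n_support = {}
for eps in [0.05, 1.0]:
    svr = svm.SVR(kernel="linear", C=100, epsilon=eps).fit(x, y)
    residuals = np.abs(y - svr.predict(x))
    n_support[eps] = len(svr.support_)
    print(f"epsilon={eps}: {n_support[eps]} support vectors, "
          f"{np.mean(residuals <= eps):.0%} of points inside the tube")
```

A narrow tube leaves most points outside it, so most of them become support vectors; a tube wider than the wiggle can contain almost every point, leaving only the few that touch its edges.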

```python
# synthetic regression data: 1000 samples, 5 features, a little Gaussian noise
Xs,y = datasets.make_regression(n_features=5,n_samples=1000,noise=0.3)
```

```python
svr = svm.SVR()
grid = {'C':[0.1,0.2,0.3],
        'kernel':['rbf','linear','poly','sigmoid'],
        'gamma':['auto','scale'],
        'tol':[1e-5,1e-4,1e-3,1e-2],
        'epsilon':[0,0.5,1,2]}
gscv = GridSearchCV(svr,grid,cv=4,verbose=True,return_train_score=False)
gscv.fit(Xs,y)
```

```
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
Fitting 4 folds for each of 384 candidates, totalling 1536 fits
[Parallel(n_jobs=1)]: Done 1536 out of 1536 | elapsed:   31.0s finished

GridSearchCV(cv=4, error_score='raise-deprecating',
       estimator=SVR(C=1.0, cache_size=200, coef0=0.0, degree=3, epsilon=0.1,
  gamma='auto_deprecated', kernel='rbf', max_iter=-1, shrinking=True,
  tol=0.001, verbose=False),
       fit_params=None, iid='warn', n_jobs=None,
       param_grid={'C': [0.1, 0.2, 0.3], 'kernel': ['rbf', 'linear', 'poly', 'sigmoid'], 'gamma': ['auto', 'scale'], 'tol': [1e-05, 0.0001, 0.001, 0.01], 'epsilon': [0, 0.5, 1, 2]},
       pre_dispatch='2*n_jobs', refit=True, return_train_score=False,
       scoring=None, verbose=True)
```

```python
# SVR's default score is R^2 (higher is better); ascending=True lists the lowest scores first
scores = (pd.DataFrame(gscv.cv_results_)
          .filter(regex='param_+|mean_test_score')
          .sort_values('mean_test_score', ascending=True)
          .reset_index(drop=True))
scores.head(15)
```

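|   | mean_test_score | param_C | param_epsilon | param_gamma | param_kernel | param_tol |
|---|---|---|---|---|---|---|
| 0 | 0.06995 | 0.1 | 2.0 | scale | rbf | 0.01 |
| 1 | 0.06995 | 0.1 | 2.0 | scale | rbf | 0.001 |
| 2 | 0.06995 | 0.1 | 2.0 | scale | rbf | 1e-05 |
| 3 | 0.06995 | 0.1 | 2.0 | scale | rbf | 0.0001 |
| 4 | 0.070066 | 0.1 | 1.0 | scale | rbf | 1e-05 |
| 5 | 0.070066 | 0.1 | 1.0 | scale | rbf | 0.0001 |
| 6 | 0.070066 | 0.1 | 1.0 | scale | rbf | 0.001 |
| 7 | 0.070066 | 0.1 | 1.0 | scale | rbf | 0.01 |
| 8 | 0.07026 | 0.1 | 0.5 | scale | rbf | 0.0001 |
| 9 | 0.07026 | 0.1 | 0.5 | scale | rbf | 1e-05 |

Because the sort is ascending, these rows are the lowest-scoring combinations: all use the RBF kernel with `C=0.1` and `gamma='scale'`, and `tol` has no visible effect on the score.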
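For regressors, `mean_test_score` is the coefficient of determination R², because `GridSearchCV` falls back to the estimator's own `score` method when `scoring` is not set. A minimal sketch on a hypothetical, smaller `make_regression` dataset confirms that `SVR.score` matches `sklearn.metrics.r2_score`:

```python
import numpy as np
from sklearn import datasets, svm
from sklearn.metrics import r2_score

# Hypothetical smaller dataset with a fixed seed
X, y = datasets.make_regression(n_features=5, n_samples=200, noise=0.3, random_state=0)
svr = svm.SVR().fit(X, y)

# SVR.score is R^2: 1.0 is a perfect fit, 0.0 is no better than predicting the mean
print(np.isclose(svr.score(X, y), r2_score(y, svr.predict(X))))  # True
```

Since higher R² is better, sorting with `ascending=False` puts the best combinations first; alternatively, `scoring="neg_mean_squared_error"` reports (negated) squared error instead.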
