# SVM 

Support vector machines (SVMs) are popular approaches for classification and regression. They are particularly popular for the ease with which they incorporate non-linearity through kernels, in particular the RBF kernel. Python's scikit-learn provides an interface to the popular LibSVM and LibLinear C++ implementations (`svm.SVR` wraps LibSVM, `svm.LinearSVR` wraps LibLinear). e1071 in R also provides an interface to LibSVM, making it easy to replicate results between the two. Always standardize the data before fitting an SVM, because both the kernel and the regularization penalty depend on the scale of the features. One downside of kernel SVMs is that they scale poorly with sample size: if a proper grid search over parameters is to be performed, they are generally practical for samples in the thousands rather than the tens of thousands.
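The standardization advice can be sketched with a small pipeline. This is a minimal illustration on synthetic data, not the benchmark data used in this notebook: `X_demo`, `y_demo`, and the feature scales are assumptions chosen to make the effect visible. The scaler sits inside the pipeline so that it is fit on the training split only.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Hypothetical data: two informative features on very different scales.
rng = np.random.RandomState(0)
X_demo = np.column_stack([rng.normal(0, 1, 400), rng.normal(0, 1000, 400)])
y_demo = X_demo[:, 0] + X_demo[:, 1] / 1000 + rng.normal(0, 0.1, 400)
X_train, X_test, y_train, y_test = train_test_split(X_demo, y_demo, random_state=0)

# Unscaled: RBF distances are dominated by the large-scale feature.
raw_model = SVR(kernel='rbf', C=1.0, epsilon=0.1, gamma=0.1)
raw_model.fit(X_train, y_train)

# Scaled: StandardScaler is fit on the training data, then applied before SVR.
scaled_model = make_pipeline(
    StandardScaler(),
    SVR(kernel='rbf', C=1.0, epsilon=0.1, gamma=0.1),
)
scaled_model.fit(X_train, y_train)

# SVR.score returns R^2 on the held-out split.
raw_r2 = raw_model.score(X_test, y_test)
scaled_r2 = scaled_model.score(X_test, y_test)
print('Held-out R^2 without scaling: %.3f' % raw_r2)
print('Held-out R^2 with scaling:    %.3f' % scaled_r2)
```

On data like this the scaled pipeline should generalize far better, because the unscaled kernel treats almost every pair of points as unrelated.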

## Pulling Data and Imports 

In [7]:
import pandas as pd
from sklearn import svm
from sklearn import model_selection
from sklearn import preprocessing
from sklearn import metrics
%pylab inline

Populating the interactive namespace from numpy and matplotlib


In [8]:
x_data = pd.read_csv('C:\\Users\\smcdo\\OneDrive\\Documents\\Model_Framework\\Benchmarks\\x_data.csv', index_col=0)
# read_csv's squeeze=True was removed in pandas 2.0; squeezing the one-column frame gives the same Series
y_data = pd.read_csv('C:\\Users\\smcdo\\OneDrive\\Documents\\Model_Framework\\Benchmarks\\y_data.csv', index_col=0).squeeze('columns')

## SVM Regression

### Linear Kernel (LibSVM)

In [13]:
svm_mod = svm.SVR(kernel='linear', tol=0.0001, C=1.0, epsilon=0.1, shrinking=True, cache_size=200)
%timeit  svm_mod.fit(x_data, y_data)
fitted_vals = svm_mod.predict(x_data)
print('Fitted Values: %s' % fitted_vals)

1 loop, best of 3: 6.68 s per loop
Fitted Values: [-2.36468398  3.85682403  1.81944099 ..., -1.52230135 -0.37722462
  1.39538166]


### Linear Kernel (LibLinear)

In [14]:
# Optional scaling step: x_stand = preprocessing.StandardScaler().fit_transform(x_data)
svm_mod = svm.LinearSVR(tol=0.0001, C=1.0, epsilon=0.1)
%timeit  svm_mod.fit(x_data, y_data)
fitted_vals = svm_mod.predict(x_data)
print('Fitted Values: %s' % fitted_vals)

1 loop, best of 3: 326 ms per loop
Fitted Values: [-2.36294961  3.85065666  1.81657867 ..., -1.52076929 -0.37782716
  1.39027483]


### RBF Kernel (LibSVM)

In [15]:
svm_mod = svm.SVR(kernel='rbf', tol=0.0001, C=1.0, epsilon=0.1, gamma=0.1, shrinking=False, cache_size=200)
%timeit svm_mod.fit(x_data, y_data)
fitted_vals = svm_mod.predict(x_data)
print('Fitted Values: %s' % fitted_vals)

1 loop, best of 3: 7.16 s per loop
Fitted Values: [-2.34312897  3.87952142  1.79411853 ..., -1.51426592 -0.34624601
  1.38995285]
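
The `gamma` argument in the cell above sets the width of the RBF kernel, which scikit-learn defines as K(x, x') = exp(-gamma * ||x - x'||^2). The sketch below computes that formula by hand and compares it with `sklearn.metrics.pairwise.rbf_kernel`. The helper `rbf_kernel_manual` and the random matrix `A` are hypothetical names used only for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def rbf_kernel_manual(A, B, gamma):
    """Return K with K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    # Broadcast to all pairwise differences, then sum squares over features.
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq_dists)

# Small random matrix standing in for real features (an assumption).
rng = np.random.RandomState(0)
A = rng.normal(size=(5, 3))

K_manual = rbf_kernel_manual(A, A, gamma=0.1)
K_sklearn = rbf_kernel(A, A, gamma=0.1)

kernels_match = np.allclose(K_manual, K_sklearn)
print('Manual and scikit-learn kernels match: %s' % kernels_match)
```

Larger `gamma` values make the kernel decay faster with distance, so each training point influences a smaller neighborhood.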


### RBF Kernel (CV)

In [16]:
param_grid = [{'C': [4, 8, 16], 'gamma': [.5, 1, 2], 'kernel': ['rbf'], 'epsilon': [.1]}]
base_model = svm.SVR()
model = model_selection.GridSearchCV(base_model, param_grid, cv=5)
model.fit(x_data,y_data)
pred = model.predict(x_data)
score = model.score(x_data,y_data)  # for SVR, score() returns R^2, not MSE
print('Fitted Values: %s' % pred)
print('R^2: %s' % score)
print('CV Params: %s' % model.best_params_)
# grid_scores_ was removed in scikit-learn 0.20; newer versions expose model.cv_results_ instead
print('CV Grid Score: %s' % model.grid_scores_)

Fitted Values: [-2.34387759  3.93464834  1.84249673 ..., -1.55036    -0.38265523
  1.38974087]
R^2: 0.998273125082
CV Params: {'epsilon': 0.1, 'C': 16, 'gamma': 0.5, 'kernel': 'rbf'}
CV Grid Score: [mean: 0.98364, std: 0.00433, params: {'epsilon': 0.1, 'C': 4, 'gamma': 0.5, 'kernel': 'rbf'}, mean: 0.95481, std: 0.00979, params: {'epsilon': 0.1, 'C': 4, 'gamma': 1, 'kernel': 'rbf'}, mean: 0.88201, std: 0.01413, params: {'epsilon': 0.1, 'C': 4, 'gamma': 2, 'kernel': 'rbf'}, mean: 0.98477, std: 0.00368, params: {'epsilon': 0.1, 'C': 8, 'gamma': 0.5, 'kernel': 'rbf'}, mean: 0.95604, std: 0.00926, params: {'epsilon': 0.1, 'C': 8, 'gamma': 1, 'kernel': 'rbf'}, mean: 0.88417, std: 0.01408, params: {'epsilon': 0.1, 'C': 8, 'gamma': 2, 'kernel': 'rbf'}, mean: 0.98490, std: 0.00362, params: {'epsilon': 0.1, 'C': 16, 'gamma': 0.5, 'kernel': 'rbf'}, mean: 0.95594, std: 0.00925, params: {'epsilon': 0.1, 'C': 16, 'gamma': 1, 'kernel': 'rbf'}, mean: 0.88413, std: 0.01407, params: {'epsilon': 0.1, 'C'
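
On scikit-learn 0.20 and later, the grid results live in `cv_results_`, and the search can be scored on mean squared error directly with `scoring='neg_mean_squared_error'`. The following is a minimal sketch on synthetic data, reusing the `C` and `gamma` grid from the cell above. `X_demo`, `y_demo`, `pipe`, and the step names `scale` and `svr` are assumptions for illustration. Scaling is placed inside the pipeline so that each CV fold is standardized using its own training portion.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Hypothetical regression data standing in for x_data / y_data.
rng = np.random.RandomState(0)
X_demo = rng.normal(size=(200, 3))
y_demo = np.sin(X_demo[:, 0]) + 0.5 * X_demo[:, 1] + rng.normal(0, 0.1, 200)

pipe = Pipeline([
    ('scale', StandardScaler()),
    ('svr', SVR(kernel='rbf', epsilon=0.1)),
])
# Pipeline parameters are addressed as <step name>__<parameter>.
demo_grid = {'svr__C': [4, 8, 16], 'svr__gamma': [0.5, 1, 2]}

search = GridSearchCV(pipe, demo_grid, cv=5, scoring='neg_mean_squared_error')
search.fit(X_demo, y_demo)

# cv_results_ is a dict of arrays; a DataFrame makes it easier to read.
results = pd.DataFrame(search.cv_results_)[
    ['param_svr__C', 'param_svr__gamma', 'mean_test_score', 'std_test_score']
].copy()
results['mean_cv_mse'] = -results['mean_test_score']  # undo the sign flip

best_cv_mse = -search.best_score_
print(results.sort_values('mean_cv_mse'))
print('Best params: %s' % search.best_params_)
print('Best CV MSE: %s' % best_cv_mse)
```

scikit-learn negates error metrics so that larger scores are always better, which is why the sign is flipped back before reporting the MSE.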



### Machine Learning Interface

In [None]:
import Machine_Learning_Interface.svm_regression as sm
model = sm.SVR(scale=True, kernel='rbf', parameters=[{'C' : np.logspace(-3, 3, 7), 'epsilon' : np.logspace(-3, 3, 7)}], cv_folds=3)
model.fit(x_data,y_data)
pred = model.predict(x_data)
score = metrics.mean_squared_error(y_data, pred)  # signature is (y_true, y_pred)
model.diagnostics()
print('Fitted Values: %s' % pred)
print('MSE: %s' % score)