## Support Vector Machines

Previously, we used linear models to predict an automobile's fuel economy (mpg) from several variables. Linear models cannot capture nonlinear relationships, however, and our data may not be linearly related. In this notebook, we therefore fit a more flexible family of models to the data: support vector machines.

In [1]:
import numpy as np
from sklearn.svm import LinearSVR, SVR
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_score, GridSearchCV

from mtcars_practice.config import data_dir

In [2]:
X_train = np.load(data_dir + '/train_test/X_train.npy')
X_test = np.load(data_dir + '/train_test/X_test.npy')

y_train = np.load(data_dir + '/train_test/y_train.npy')
y_test = np.load(data_dir + '/train_test/y_test.npy')

The LinearSVR and SVR classes implement support vector machines (SVMs) for regression in scikit-learn. A linear SVM uses a linear kernel, as opposed to nonlinear kernels such as the radial basis function (RBF). Both classes can fit a linear model: LinearSVR always does, while SVR does so when kernel='linear' is passed. LinearSVR is typically faster, particularly on larger datasets, so we will use LinearSVR for our analysis.
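As a rough illustration of the two routes to a linear model, the sketch below fits both classes on synthetic data. The data from make_regression and the names X_demo, y_demo, linear_svr and kernel_svr are assumptions made for this example, not part of the notebook's dataset. The two estimators use different solvers and treat the intercept differently, so their fits should be similar but need not be identical.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.svm import LinearSVR, SVR

# Synthetic stand-in data (assumption: NOT the mtcars features used in this notebook)
X_demo, y_demo = make_regression(n_samples=200, n_features=5, noise=5.0, random_state=0)

# Two ways to fit a linear support vector regressor
linear_svr = LinearSVR(epsilon=1.5, max_iter=10000, random_state=0)
kernel_svr = SVR(kernel="linear", epsilon=1.5)

linear_svr.fit(X_demo, y_demo)
kernel_svr.fit(X_demo, y_demo)

# Training RMSE for each model
rmse_linear = np.sqrt(mean_squared_error(y_demo, linear_svr.predict(X_demo)))
rmse_kernel = np.sqrt(mean_squared_error(y_demo, kernel_svr.predict(X_demo)))

print(rmse_linear, rmse_kernel)
```

Swapping kernel="linear" for kernel="rbf" in SVR is the usual way to move to a nonlinear fit; LinearSVR has no kernel argument.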

In [3]:
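# Fit a linear support vector regressor; epsilon sets the width of the insensitive margin
lin_svm = LinearSVR(epsilon=1.5)
lin_svm.fit(X_train, y_train)

# RMSE on the training set (an optimistic estimate of out-of-sample error)
y_pred = lin_svm.predict(X_train)
svm_rmse = np.sqrt(mean_squared_error(y_train, y_pred))

print(svm_rmse)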

4.379942346177799
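The epsilon=1.5 argument in the cell above controls the epsilon-insensitive loss that support vector regression minimizes: residuals smaller than epsilon cost nothing, and larger ones are penalized only by the amount they exceed it. A minimal sketch of that loss follows. The helper epsilon_insensitive_loss and the demo values are hypothetical and for illustration only, and the averaging is a simplification, since the full training objective also includes a regularization term.

```python
import numpy as np


def epsilon_insensitive_loss(y_true, y_pred, epsilon=1.5):
    """Mean epsilon-insensitive loss: residuals within epsilon cost nothing."""
    residual = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    # Only the part of each residual that exceeds epsilon is penalized
    return float(np.mean(np.maximum(0.0, residual - epsilon)))


# Hypothetical mpg targets and predictions (not taken from the notebook's data)
y_true_demo = np.array([21.0, 22.8, 18.7, 30.4])
y_pred_demo = np.array([20.0, 25.0, 18.0, 27.0])

demo_loss = epsilon_insensitive_loss(y_true_demo, y_pred_demo)
print(demo_loss)
```

Here the absolute residuals are 1.0, 2.2, 0.7 and 3.4, so only the second and fourth predictions fall outside the margin and contribute to the loss. A larger epsilon tolerates more error before penalizing it.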


In [4]:
# Inspect the feature vector of the first training example
print(X_train[0, :])

[0.72868217 0.74860335 0.58151403 0.17857143 0.         0.
 0.         0.         1.         1.         0.         0.        ]
