# Support Vector Regressor Pipeline

The complete explanation of this notebook is available at https://youranalystbuddy.com/support-vector-machine-pipeline/

For regression, we use the auto-mpg data. The target is `mpg`, the fuel efficiency of each car in miles per gallon.

### Load and split data

In [1]:
import pandas as pd
from matplotlib import pyplot as plt
import numpy as np

In [2]:
data = pd.read_csv('auto-mpg.csv')
data.head(2)

| Unnamed: 0 | mpg | cylinders | displacement | horsepower | weight | acceleration | year | origin |
|---|---|---|---|---|---|---|---|---|
| 0 | 18.0 | 8 | 307.0 | 130.0 | 3504 | 12.0 | 70 | 1 |
| 1 | 15.0 | 8 | 350.0 | 165.0 | 3693 | 11.5 | 70 | 1 |


In [3]:
from sklearn.model_selection import train_test_split

train, test = train_test_split(data, test_size=0.25)
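The split above is random, so the scores reported later in this notebook will change from run to run. As a minimal sketch, assuming a fixed seed is acceptable for your use case, passing `random_state` makes the split repeatable. The `toy` frame and the seed value `42` below are made up for illustration and are not part of the original notebook.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical 8-row frame standing in for the auto-mpg data
toy = pd.DataFrame({
    'mpg':    [18.0, 15.0, 24.0, 31.0, 27.0, 22.0, 19.0, 36.0],
    'weight': [3504, 3693, 2372, 1985, 2130, 2833, 3264, 1800],
})

# The same seed gives the same shuffle, hence the same train/test rows
train_a, test_a = train_test_split(toy, test_size=0.25, random_state=42)
train_b, test_b = train_test_split(toy, test_size=0.25, random_state=42)

print(train_a.index.equals(train_b.index))  # True
print(len(train_a), len(test_a))            # 6 2
```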

### Processing pipeline

In [4]:
num_cols = ['cylinders', 'displacement', 'horsepower', 'weight', 'acceleration', 'year']
cat_cols = ['origin']
target = 'mpg'

from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import FunctionTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer

# pipeline for numeric features
# horsepower needs imputing, so fill with the median before standardizing
num_pipeline = Pipeline([
    ('impute', SimpleImputer(strategy='median')),
    ('standardize', StandardScaler())
])

# pipeline for categorical (class) features
cat_pipeline = Pipeline([
    ('encoder', OneHotEncoder())
])

# full pipeline - combine the numeric and categorical pipelines
process_pipeline = ColumnTransformer([
    ('numeric', num_pipeline, num_cols),
    ('class', cat_pipeline, cat_cols)
])
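To see what the processing pipeline produces, here is a small sketch that rebuilds the same steps and applies them to a tiny made-up frame. The `toy` data and the name `toy_pipeline` are illustrative assumptions; only the column names and the pipeline steps come from the cell above.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up rows using the auto-mpg column names (values are illustrative only)
toy = pd.DataFrame({
    'cylinders':    [8, 4, 6, 4],
    'displacement': [307.0, 97.0, 200.0, 120.0],
    'horsepower':   [130.0, np.nan, 95.0, 88.0],  # one missing value to impute
    'weight':       [3504, 2130, 3000, 2500],
    'acceleration': [12.0, 14.5, 15.0, 16.0],
    'year':         [70, 71, 75, 80],
    'origin':       [1, 3, 1, 2],
})

num_cols = ['cylinders', 'displacement', 'horsepower', 'weight', 'acceleration', 'year']
cat_cols = ['origin']

toy_pipeline = ColumnTransformer([
    ('numeric', Pipeline([('impute', SimpleImputer(strategy='median')),
                          ('standardize', StandardScaler())]), num_cols),
    ('class', Pipeline([('encoder', OneHotEncoder())]), cat_cols),
])

processed = toy_pipeline.fit_transform(toy)
# The result can be sparse depending on density; convert to a dense array if so
if hasattr(processed, 'toarray'):
    processed = processed.toarray()
processed = np.asarray(processed)

print(processed.shape)  # (4, 9): 6 scaled numeric columns + 3 one-hot origin columns
```

The missing `horsepower` value is replaced by the column median before scaling, so the output contains no NaN values.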

### Modeling pipeline

In [5]:
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

svr = Pipeline([
    ('processing', process_pipeline),
    ('svr',SVR())
])

param_grid = [
    {'svr__kernel':['linear'], 
     'svr__C' : [0.1, 1, 10]},
    {'svr__kernel':['poly'], 
     'svr__degree' : [2, 3, 4], 
     'svr__coef0' : [0, 1, 10], 
     'svr__C' : [0.1, 1, 10]},
    {'svr__kernel':['rbf'], 
     'svr__gamma' : [0.001, 0.01, 0.1, 1, 10, 100], 
     'svr__C' : [0.1, 1, 10]}
]

grid_search = GridSearchCV(svr, param_grid, cv=5, scoring='r2', return_train_score=True)

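# train still contains the mpg column, but the ColumnTransformer only selects
# num_cols and cat_cols (remaining columns are dropped by default), so the target does not leak
grid_search.fit(train, train[target])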
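To get a feel for the cost of this search, the sketch below counts the candidate settings in the grid using scikit-learn's `ParameterGrid`. The grid is copied from the cell above; the variable names `n_candidates` and `n_fits` are introduced here for illustration.

```python
from sklearn.model_selection import ParameterGrid

param_grid = [
    {'svr__kernel': ['linear'],
     'svr__C': [0.1, 1, 10]},
    {'svr__kernel': ['poly'],
     'svr__degree': [2, 3, 4],
     'svr__coef0': [0, 1, 10],
     'svr__C': [0.1, 1, 10]},
    {'svr__kernel': ['rbf'],
     'svr__gamma': [0.001, 0.01, 0.1, 1, 10, 100],
     'svr__C': [0.1, 1, 10]},
]

# Each dict is expanded separately, so the totals add rather than multiply
per_kernel = [len(ParameterGrid(g)) for g in param_grid]
n_candidates = len(ParameterGrid(param_grid))
n_fits = n_candidates * 5  # cv=5 folds per candidate

print(per_kernel)            # [3, 27, 18]
print(n_candidates, n_fits)  # 48 240
```

Splitting the grid into one dict per kernel avoids fitting combinations that would have no effect, such as varying `degree` for the linear or RBF kernels. With the default `refit=True`, `GridSearchCV` also fits the best candidate once more on the whole training set.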

The fine-tuned model (note that the score is now R² since we are doing regression):

### Train and test

In [6]:
print(grid_search.best_params_)
print(grid_search.best_score_)

{'svr__C': 1, 'svr__coef0': 1, 'svr__degree': 4, 'svr__kernel': 'poly'}
0.8665966760874261
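Beyond the single best setting, `cv_results_` holds the scores of every candidate, including the training scores requested with `return_train_score=True`. The sketch below shows one way to rank candidates. It runs on synthetic data from `make_regression` with a reduced grid so that it is self-contained; `demo_search`, `small_grid`, and `summary` are hypothetical names, and the numbers it prints are unrelated to the auto-mpg results.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

# Synthetic regression data, used only to keep the example runnable on its own
X, y = make_regression(n_samples=120, n_features=5, noise=10.0, random_state=0)

small_grid = [
    {'kernel': ['linear'], 'C': [0.1, 1, 10]},
    {'kernel': ['rbf'], 'gamma': [0.01, 0.1], 'C': [1, 10]},
]
demo_search = GridSearchCV(SVR(), small_grid, cv=5, scoring='r2',
                           return_train_score=True)
demo_search.fit(X, y)

# One row per candidate; sort so the best cross-validated R2 comes first
results = pd.DataFrame(demo_search.cv_results_)
summary = (results[['param_kernel', 'param_C', 'mean_train_score',
                    'mean_test_score', 'rank_test_score']]
           .sort_values('rank_test_score')
           .reset_index(drop=True))
print(summary.head())
```

A large gap between `mean_train_score` and `mean_test_score` for a candidate is a common sign of overfitting. In this notebook the same table would come from `grid_search.cv_results_`, with parameter columns prefixed by `param_svr__`.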


And the testing performance:

In [7]:
grid_search.score(test, test[target])

0.909048880828731
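R² is unitless, so it can help to also report errors in miles per gallon. The sketch below computes MAE and RMSE alongside R² on a few made-up values. The arrays `y_true` and `y_pred` are illustrative assumptions; in this notebook they would presumably be `test[target]` and `grid_search.predict(test)`.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Made-up mpg values for illustration, not results from the auto-mpg model
y_true = np.array([18.0, 15.0, 30.0, 25.0])
y_pred = np.array([20.0, 14.0, 28.0, 26.0])

mae = mean_absolute_error(y_true, y_pred)           # average absolute error in mpg
rmse = np.sqrt(mean_squared_error(y_true, y_pred))  # penalizes large errors more
r2 = r2_score(y_true, y_pred)                       # same metric as scoring='r2'

print(f"MAE:  {mae:.3f} mpg")   # MAE:  1.500 mpg
print(f"RMSE: {rmse:.3f} mpg")  # RMSE: 1.581 mpg
print(f"R2:   {r2:.3f}")        # R2:   0.928
```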