# Support Vector Regression
- *The Nature of Statistical Learning Theory* by Vladimir Vapnik
- In simple linear regression we use the ordinary least squares (OLS) method to determine the regression line
- In SVR, instead of a line, we fit a tube of width epsilon (ε) on either side of the central line
- This is called the ε-insensitive tube
- Points inside the tube are ignored when computing the error
- The vectors (data points) outside the tube are called support vectors, as they determine the position and shape of the tube (its width is fixed by ε)
- The tube acts as a margin of error that we allow our model to have
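- The error for a point outside the tube is its vertical distance to the tube boundary, usually written ξ (xi)
- An asterisk, ξ*, denotes the error for points below the tube; plain ξ is used for points above it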
- Potential support vectors: points lying exactly on the boundary of the tube
- SVR can be linear or non-linear (for example, with an RBF kernel)
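
The tube idea corresponds to the ε-insensitive loss, max(0, |y - f(x)| - ε). The sketch below splits it into the two slack terms from the notes, ξ above the tube and ξ* below it. The function name `epsilon_insensitive_loss` and the sample values are hypothetical, chosen only for illustration.

```python
import numpy as np

def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Hypothetical helper: per-point slacks for an epsilon-tube around y_pred."""
    residual = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    # xi: how far a point lies ABOVE the tube (zero if inside it)
    xi = np.maximum(0.0, residual - epsilon)
    # xi_star: how far a point lies BELOW the tube (zero if inside it)
    xi_star = np.maximum(0.0, -residual - epsilon)
    return xi, xi_star

# Made-up values: two points inside the tube, one above it, one below it
y_true = np.array([1.00, 1.05, 1.50, 0.40])
y_pred = np.array([1.00, 1.00, 1.00, 1.00])
xi, xi_star = epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1)
print(xi)       # only the third point has a nonzero xi (0.4)
print(xi_star)  # only the fourth point has a nonzero xi_star (0.5)
```

Points within ε of the prediction contribute nothing, which is exactly why they are "disregarded for error" in the notes.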

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
```

```python
# Columns: Position, Level, Salary
dataset = pd.read_csv('Position_Salaries.csv')
```

```python
dataset.sample(n=3)  # 3 random rows, so the output differs between runs
```

|   | Position | Level | Salary |
|---|----------|-------|--------|
| 5 | Region Manager | 6 | 150000 |
| 9 | CEO | 10 | 1000000 |
| 6 | Partner | 7 | 200000 |


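```python
# Level column; slicing 1:2 (instead of 1) keeps X as a 2-D array of shape (n, 1)
X = dataset.iloc[:, 1:2].values
# Salary column; slicing -1: keeps y 2-D, which StandardScaler requires below
y = dataset.iloc[:, -1:].values
```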
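
The difference between `iloc[:, 1]` and `iloc[:, 1:2]` is only the shape of the result. A small sketch using a stand-in DataFrame built from the three sampled rows shown earlier (not the full CSV):

```python
import pandas as pd

# Stand-in for Position_Salaries.csv, using the three sampled rows only
df = pd.DataFrame({
    "Position": ["Region Manager", "CEO", "Partner"],
    "Level": [6, 10, 7],
    "Salary": [150000, 1000000, 200000],
})

print(df.iloc[:, 1].values.shape)    # (3,)   1-D vector
print(df.iloc[:, 1:2].values.shape)  # (3, 1) 2-D column, as scikit-learn expects for X
print(df.iloc[:, -1:].values.shape)  # (3, 1) 2-D column, accepted by StandardScaler
```

Passing a 1-D array to `StandardScaler.fit` raises a `ValueError` asking for a 2-D array, which is why `y` is also sliced with `-1:` here.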

```python
X
```

```
array([[ 1],
       [ 2],
       [ 3],
       [ 4],
       [ 5],
       [ 6],
       [ 7],
       [ 8],
       [ 9],
       [10]])
```

```python
y
```

```
array([[  45000],
       [  50000],
       [  60000],
       [  80000],
       [ 110000],
       [ 150000],
       [ 200000],
       [ 300000],
       [ 500000],
       [1000000]])
```

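- In SVR, feature scaling is required: scikit-learn's `SVR` does not scale the data itself, and the kernel is computed from distances (or dot products) between samples, so a feature with a larger numeric range dominates. The model is expressed through the kernel rather than through a separate, unpenalized coefficient per feature, so nothing compensates for differences in units
- The target is scaled too, because ε (the error the tube tolerates) is measured in the units of y
- In linear regression we could do without scaling, since the coefficients compensate for differences in units, but it is still recommended because it makes features with different units easier to compare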
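
A sketch of why the two models differ, using synthetic data invented for illustration (not from `Position_Salaries.csv`): multiplying one feature by 1000, as a change of units would, leaves linear-regression predictions unchanged but changes the predictions of an RBF-kernel `SVR` with default settings.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR

rng = np.random.default_rng(0)
# Two synthetic features on very different scales
X_demo = np.column_stack([rng.uniform(0, 1, 50), rng.uniform(0, 1000, 50)])
y_demo = 3 * X_demo[:, 0] + 0.002 * X_demo[:, 1] + rng.normal(0, 0.1, 50)

# Same data, first feature expressed in units 1000x smaller
X_rescaled = X_demo * np.array([1000.0, 1.0])

lin_a = LinearRegression().fit(X_demo, y_demo).predict(X_demo)
lin_b = LinearRegression().fit(X_rescaled, y_demo).predict(X_rescaled)
svr_a = SVR(kernel="rbf").fit(X_demo, y_demo).predict(X_demo)
svr_b = SVR(kernel="rbf").fit(X_rescaled, y_demo).predict(X_rescaled)

print(np.allclose(lin_a, lin_b))  # True: the coefficient absorbs the unit change
print(np.allclose(svr_a, svr_b))  # False: RBF distances depend on the units
```

Standardizing every feature removes this dependence on the arbitrary choice of units before the kernel is evaluated.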

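```python
from sklearn.preprocessing import StandardScaler

# Separate scalers for X and y, so each can be inverted independently later
sc_X = StandardScaler()
X = sc_X.fit_transform(X)  # standardize Level to zero mean, unit variance

sc_y = StandardScaler()
y = sc_y.fit_transform(y)  # y must be 2-D here, hence iloc[:, -1:] above
```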
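
The scaled values of `X` can be reproduced by hand. `StandardScaler` computes z = (x - mean) / std using the population standard deviation (ddof=0); for the levels 1 to 10 the mean is 5.5 and the std is √8.25 ≈ 2.8723. A minimal check, rebuilding the Level column with `np.arange` rather than reading the CSV:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# The Level column (values 1..10) as a 2-D column vector
levels = np.arange(1, 11, dtype=float).reshape(-1, 1)

sc = StandardScaler()
z = sc.fit_transform(levels)

# Manual standardization with the population std (ddof=0)
manual = (levels - levels.mean()) / levels.std(ddof=0)
print(np.allclose(z, manual))  # True
print(sc.mean_, sc.scale_)     # mean 5.5, std ≈ 2.8723
print(z[0, 0])                 # ≈ -1.5666989, the first scaled level

# inverse_transform maps scaled values back to the original units
print(np.allclose(sc.inverse_transform(z), levels))  # True
```

Keeping `sc_y` matters for the same reason: once an SVR is fitted on the scaled target, its predictions are in scaled units and need `sc_y.inverse_transform` to be read as salaries.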

```python
X
```

```
array([[-1.5666989 ],
       [-1.21854359],
       [-0.87038828],
       [-0.52223297],
       [-0.17407766],
       [ 0.17407766],
       [ 0.52223297],
       [ 0.87038828],
       [ 1.21854359],
       [ 1.5666989 ]])
```

```python
y
```

```
array([[-0.72004253],
       [-0.70243757],
       [-0.66722767],
       [-0.59680786],
       [-0.49117815],
       [-0.35033854],
       [-0.17428902],
       [ 0.17781001],
       [ 0.88200808],
       [ 2.64250325]])
```