# SUPPORT VECTOR REGRESSOR

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
df = sns.load_dataset('tips')

In [26]:
df['sex'].value_counts()

sex
Male      157
Female     87
Name: count, dtype: int64

In [28]:
df['time'].value_counts()

time
Dinner    176
Lunch      68
Name: count, dtype: int64

In [29]:
df['day'].value_counts()

day
Sat     87
Sun     76
Thur    62
Fri     19
Name: count, dtype: int64

Feature encoding (label encoding and one-hot encoding)


Where a categorical feature has only two values, such as smoker, sex, and time, we apply label encoding, which replaces the two values with 0 and 1.

Where a feature has more than two values, such as day, we apply one-hot encoding.
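
As a rough illustration of the two encodings described above, here is a minimal sketch on a made-up four-row frame. The `toy` DataFrame and its values are assumptions for illustration only, not rows taken from the tips dataset.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Hypothetical toy data: one binary column and one four-valued column
toy = pd.DataFrame({
    "smoker": ["No", "Yes", "No", "Yes"],
    "day": ["Thur", "Fri", "Sat", "Sun"],
})

# Binary column: LabelEncoder sorts the classes and maps them to 0 and 1
le = LabelEncoder()
smoker_codes = le.fit_transform(toy["smoker"])
print(list(le.classes_), smoker_codes.tolist())

# Multi-valued column: one indicator column per category
ohe = OneHotEncoder()
day_onehot = ohe.fit_transform(toy[["day"]]).toarray()
print(ohe.categories_[0].tolist())
print(day_onehot)
```

Each row of `day_onehot` contains exactly one 1, in the column of that row's category.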

In [3]:
X = df[['tip', 'sex', 'smoker', 'day', 'time', 'size']]

In [4]:
Y = df['total_bill']

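Do the train/test split before feature encoding to prevent data leakage: the model should not have any information about the test data in any way. The encoders are therefore fitted on the training split only and then reused on the test split.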

In [5]:
# Train/test split
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.25, random_state=10)

In [6]:
# First pick the binary features and apply label encoding
from sklearn.preprocessing import LabelEncoder

In [7]:
# One encoder per binary column, so each keeps its own learned classes
le1 = LabelEncoder()  # sex
le2 = LabelEncoder()  # smoker
le3 = LabelEncoder()  # time

In [8]:
X_train['sex'] = le1.fit_transform(X_train['sex'])
X_train['smoker'] = le2.fit_transform(X_train['smoker'])
X_train['time'] = le3.fit_transform(X_train['time'])

In [9]:
X_test['sex'] = le1.transform(X_test['sex'])
X_test['smoker'] = le2.transform(X_test['smoker'])
X_test['time'] = le3.transform(X_test['time'])

In [10]:
# Now one-hot encoding with ColumnTransformer
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder



We use ColumnTransformer to do the one-hot encoding. The transformers are given as a list of tuples: 'onehot' is the name of the transformer and can be any string, OneHotEncoder is the encoder class, and drop='first' means that after one-hot encoding the column for the first category is removed. The last element of the tuple says which columns to encode; here the 'day' column is at index 3 (counting from 0).

remainder is usually either 'drop' or 'passthrough'. 'drop' means all columns other than the ones being one-hot encoded are dropped; 'passthrough' means all other columns are kept unchanged. Here we want the other columns, so we use 'passthrough'.

In [11]:
ct = ColumnTransformer(
    transformers=[('onehot', OneHotEncoder(drop='first'), [3])],
    remainder='passthrough',
)

# Give the column index as a list ([3], not 3); with a bare 3 the encoder receives a 1-D column and fit_transform raises an error

In [12]:
# With drop='first', the first 'day' category is dropped and the remaining three are one-hot encoded.
# A row with 0 in all three columns stands for the dropped category, so no information is lost
# and the redundant fourth column is avoided.


X_train = ct.fit_transform(X_train)

In [13]:
# Reuse the transformer fitted on the training data; do not refit it on the test set
X_test = ct.transform(X_test)

In [14]:
# SVR -- SUPPORT VECTOR REGRESSOR
from sklearn.svm import SVR

In [15]:
svr = SVR()

In [16]:
svr.fit(X_train,Y_train)

In [17]:
y_pred = svr.predict(X_test)

In [18]:
from sklearn.metrics import r2_score,mean_absolute_error

In [19]:
score = r2_score(Y_test,y_pred)
mae = mean_absolute_error(Y_test,y_pred)
print("score: ",score)
print('mae: ',mae)

score:  0.46028114561159283
mae:  4.1486423210190235


In regression tasks like SVR, kernels are often used to handle non-linear relationships between the features and the target variable. SVR fits a regression function that is linear (a hyperplane) in some feature space; a kernel such as RBF implicitly maps the data into a higher-dimensional space, so the fitted function can be non-linear in the original features. SVR() uses the RBF kernel by default.
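
A minimal sketch of this effect, assuming a synthetic noisy sine curve rather than the tips data: the same SVR is fitted once with a linear kernel and once with an RBF kernel, and the two R² values are compared. For brevity the scores are computed on the data used for fitting, so they only show how flexible each kernel is, not how well it generalises.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.metrics import r2_score

# Synthetic non-linear data (an assumption for illustration): y = sin(x) + noise
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 2 * np.pi, 200)).reshape(-1, 1)
y = np.sin(x).ravel() + rng.normal(scale=0.1, size=200)

# Same model, two kernels, default hyperparameters otherwise
linear_r2 = r2_score(y, SVR(kernel="linear").fit(x, y).predict(x))
rbf_r2 = r2_score(y, SVR(kernel="rbf").fit(x, y).predict(x))

print("linear kernel R2:", linear_r2)
print("rbf kernel R2:   ", rbf_r2)
```

On data like this the linear kernel can only fit a straight line through the curve, while the RBF kernel can follow its shape.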

In [20]:
# Hyperparameter tuning
from sklearn.model_selection import GridSearchCV

# Define the parameter grid to search
param_grid = {'C': [0.1, 1, 10, 100, 1000],
              'gamma': [1, 0.1, 0.01, 0.001, 0.0001],
              'kernel': ['rbf']}
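
The number of fits reported in the grid search log follows directly from this grid. A small sketch of the arithmetic, using `ParameterGrid` from scikit-learn to count the combinations and assuming the default of 5 cross-validation folds that `GridSearchCV` uses when `cv` is not given:

```python
from sklearn.model_selection import ParameterGrid

param_grid = {'C': [0.1, 1, 10, 100, 1000],
              'gamma': [1, 0.1, 0.01, 0.001, 0.0001],
              'kernel': ['rbf']}

# 5 values of C x 5 values of gamma x 1 kernel
n_candidates = len(ParameterGrid(param_grid))
n_folds = 5  # GridSearchCV default when cv is not specified

print(n_candidates, n_candidates * n_folds)  # 25 125
```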

In [21]:
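# refit=True retrains the best parameter combination on the full training set;
# verbose=3 prints the score of every fold
grid = GridSearchCV(SVR(), param_grid, refit=True, verbose=3)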

In [22]:
grid.fit(X_train,Y_train)

Fitting 5 folds for each of 25 candidates, totalling 125 fits
[CV 1/5] END .......C=0.1, gamma=1, kernel=rbf;, score=-0.067 total time=   0.0s
[CV 2/5] END .......C=0.1, gamma=1, kernel=rbf;, score=-0.058 total time=   0.0s
[CV 3/5] END .......C=0.1, gamma=1, kernel=rbf;, score=-0.145 total time=   0.0s
[CV 4/5] END ........C=0.1, gamma=1, kernel=rbf;, score=0.025 total time=   0.1s
[CV 5/5] END .......C=0.1, gamma=1, kernel=rbf;, score=-0.089 total time=   0.1s
[CV 1/5] END ......C=0.1, gamma=0.1, kernel=rbf;, score=0.013 total time=   0.0s
[CV 2/5] END ......C=0.1, gamma=0.1, kernel=rbf;, score=0.021 total time=   0.0s
[CV 3/5] END .....C=0.1, gamma=0.1, kernel=rbf;, score=-0.010 total time=   0.0s
[CV 4/5] END ......C=0.1, gamma=0.1, kernel=rbf;, score=0.124 total time=   0.0s
[CV 5/5] END ......C=0.1, gamma=0.1, kernel=rbf;, score=0.050 total time=   0.0s
[CV 1/5] END ....C=0.1, gamma=0.01, kernel=rbf;, score=-0.053 total time=   0.0s
[CV 2/5] END ....C=0.1, gamma=0.01, kernel=rbf;

In [23]:
y_pred1 = grid.predict(X_test)

In [24]:
score = r2_score(Y_test,y_pred1)
mae = mean_absolute_error(Y_test,y_pred1)
print("score: ",score)
print('mae: ',mae)

score:  0.5081618245078652
mae:  3.8685147526100887


In [25]:
# Check the best parameters found by the grid search
grid.best_params_

{'C': 1000, 'gamma': 0.0001, 'kernel': 'rbf'}