## 7 Apr Support Vector Machines - 2

Q1. What is the relationship between polynomial functions and kernel functions in machine learning 
algorithms?

Ans:
    
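    Polynomial functions and kernel functions play different roles in machine learning. A polynomial function models a non-linear relationship between the features and the output by using powers and products of the features. A kernel function computes the inner product of two samples in a (usually higher-dimensional) feature space without constructing that space explicitly. The two meet in the polynomial kernel, K(x, z) = (x . z + c)^d, which returns the same value as the inner product of explicit polynomial feature expansions of x and z. Using it in an SVM gives a linear separator in the expanded space, which is a non-linear decision boundary in the original space.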
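
The link between the two can be checked numerically. The sketch below assumes 2-D inputs, degree 2 and c = 1 (illustrative choices, not taken from the assignment); the helper names `poly_kernel` and `phi` are hypothetical. It compares the kernel value with the dot product of explicit degree-2 polynomial features.

```python
import math

def poly_kernel(x, z, degree=2, coef0=1.0):
    """Polynomial kernel K(x, z) = (x . z + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, z))
    return (dot + coef0) ** degree

def phi(x):
    """Explicit feature map matching degree=2, coef0=1 for 2-D inputs."""
    x1, x2 = x
    r2 = math.sqrt(2.0)
    return [1.0, r2 * x1, r2 * x2, x1 * x1, x2 * x2, r2 * x1 * x2]

x, z = (1.0, 2.0), (3.0, -1.0)

# Kernel trick: one dot product in the original 2-D space.
kernel_value = poly_kernel(x, z)

# Same quantity computed the long way, in the 6-D expanded space.
explicit_value = sum(a * b for a, b in zip(phi(x), phi(z)))

print(kernel_value, explicit_value)
```

Both numbers agree up to floating-point rounding, which is why an SVM with a polynomial kernel never needs to build the expanded features.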

Q2. How can we implement an SVM with a polynomial kernel in Python using Scikit-learn?

Ans: 

In [1]:
from sklearn.svm import SVC
classifier = SVC(kernel='poly')  # kernel names are lowercase in scikit-learn
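
Scikit-learn expects the kernel name in lowercase (`'poly'`). A slightly fuller sketch follows, assuming a synthetic two-moons dataset from `make_moons` as stand-in data; the `degree`, `coef0` and `C` values are illustrative, not tuned.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic, non-linearly separable data (assumption: stand-in for a real dataset).
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# degree sets the polynomial order; coef0 is the constant term inside the kernel.
poly_svc = SVC(kernel="poly", degree=3, coef0=1.0, C=1.0)
poly_svc.fit(X_train, y_train)

test_accuracy = poly_svc.score(X_test, y_test)
print(f"test accuracy: {test_accuracy:.3f}")
```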

Q3. How does increasing the value of epsilon affect the number of support vectors in SVR?

Ans: 
    
    In Support Vector Regression (SVR), epsilon is a hyperparameter that controls the width of the epsilon-insensitive zone around the predicted value. The epsilon-insensitive zone is the range within which errors are considered acceptable and do not contribute to the loss function.

    Increasing epsilon changes the number of support vectors. In SVR, the support vectors are the training points that lie on the boundary of the epsilon-insensitive tube or outside it. Points strictly inside the tube have zero loss and do not influence the fitted function.

    When epsilon is increased, the tube becomes wider and more training points fall inside it, so fewer points remain as support vectors. The model becomes sparser and the fitted function smoother, at the cost of ignoring deviations smaller than epsilon.

    If epsilon is made too large, almost all points fall inside the tube, very few support vectors remain, and the model underfits. Conversely, a very small epsilon leaves most points on or outside the tube, so nearly every training point becomes a support vector.

    The exact counts depend on the dataset and its noise level, but the general trend is clear: smaller values of epsilon give more support vectors and a closer fit to the training data, while larger values of epsilon give fewer support vectors and a coarser fit.
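
The effect can be observed by fitting `SVR` with several epsilon values and counting the support vectors. This is a minimal sketch on synthetic noisy sine data (an assumption, not the assignment's data); the epsilon values are illustrative.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic 1-D regression problem: a sine wave plus Gaussian noise.
rng = np.random.default_rng(0)
X = np.linspace(0, 5, 80).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=80)

support_counts = {}
for epsilon in [0.01, 0.1, 0.5]:
    svr = SVR(kernel="rbf", C=1.0, epsilon=epsilon).fit(X, y)
    # support_ holds the indices of the training points kept as support vectors.
    support_counts[epsilon] = len(svr.support_)

print(support_counts)
```

On data like this, the count for the widest tube should be clearly lower than for the narrowest one.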

Q4. How does the choice of kernel function, C parameter, epsilon parameter, and gamma parameter 
affect the performance of Support Vector Regression (SVR)? Can you explain how each parameter works 
and provide examples of when you might want to increase or decrease its value?

Ans: 

    1. Kernel function: The kernel implicitly maps the inputs into a higher-dimensional feature space in which SVR fits a linear function; in the original space that function can be non-linear. The choice of kernel strongly affects performance. A radial basis function (RBF) kernel is a common choice when the relationship between features and target is non-linear, while a linear kernel is appropriate when the target is roughly linear in the features. A more flexible kernel (for example a high-degree polynomial) can overfit, so the kernel should match the complexity of the problem.

    2. C parameter: C controls the trade-off between keeping the fitted function smooth (strong regularization) and penalizing training points that fall outside the epsilon-insensitive tube. A small C tolerates more deviations, which gives a smoother function and often better generalization on noisy data, but it may underfit. A large C penalizes deviations heavily, so the model follows the training data closely; this lowers bias but can overfit. Increase C when the model underfits, and decrease it when the model overfits or the data contains outliers.

    3. Epsilon parameter: Epsilon sets the half-width of the epsilon-insensitive tube around the prediction; errors smaller than epsilon are not penalized. Increasing epsilon gives a sparser model with fewer support vectors and a smoother fit, which helps when the targets are noisy and small errors do not matter. If epsilon is too large, most points fall inside the tube, very few support vectors remain, and the model underfits. Decrease epsilon when predictions need to be more precise.

    4. Gamma parameter: Gamma is the kernel coefficient used by the RBF kernel (and by the polynomial and sigmoid kernels); it has no effect with a linear kernel. With RBF, gamma determines how far the influence of a single training point reaches. A small gamma gives each point a wide influence and produces a smooth function, which may underfit. A large gamma gives each point a narrow influence and produces a highly flexible function, which may overfit. Increase gamma when the model is too smooth to capture the pattern, and decrease it when the model fits noise.
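
The four parameters are all passed to `SVR` directly. The sketch below, on synthetic noisy sine data (an assumption), contrasts a "smooth" and a "flexible" setting; the specific values and the labels `smooth` and `flexible` are hypothetical choices for illustration only.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic data: noisy sine wave.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(120, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=120)

settings = {
    # Small C, wide tube, small gamma: smooth function, few support vectors.
    "smooth": dict(C=1.0, epsilon=0.2, gamma=0.1),
    # Large C, narrow tube, large gamma: follows the training data closely.
    "flexible": dict(C=100.0, epsilon=0.01, gamma=10.0),
}

results = {}
for name, params in settings.items():
    model = make_pipeline(StandardScaler(), SVR(kernel="rbf", **params))
    model.fit(X, y)
    results[name] = {
        "train_r2": model.score(X, y),
        "n_support_vectors": len(model.named_steps["svr"].support_),
    }
    print(name, results[name])
```

Training R² alone cannot reveal overfitting; a held-out set or cross-validation is needed to choose between such settings.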

Q5. Assignment:

Example: on the tips dataset, predict the time (Lunch or Dinner).

#### 1. Import the necessary libraries and load the dataset


In [2]:

import seaborn as sns
df = sns.load_dataset("tips")

In [3]:
df.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size
0,16.99,1.01,Female,No,Sun,Dinner,2
1,10.34,1.66,Male,No,Sun,Dinner,3
2,21.01,3.5,Male,No,Sun,Dinner,3
3,23.68,3.31,Male,No,Sun,Dinner,2
4,24.59,3.61,Female,No,Sun,Dinner,4


In [4]:
# Check Missing values 

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 244 entries, 0 to 243
Data columns (total 7 columns):
 #   Column      Non-Null Count  Dtype   
---  ------      --------------  -----   
 0   total_bill  244 non-null    float64 
 1   tip         244 non-null    float64 
 2   sex         244 non-null    category
 3   smoker      244 non-null    category
 4   day         244 non-null    category
 5   time        244 non-null    category
 6   size        244 non-null    int64   
dtypes: category(4), float64(2), int64(1)
memory usage: 7.4 KB


In [5]:
df.columns

Index(['total_bill', 'tip', 'sex', 'smoker', 'day', 'time', 'size'], dtype='object')

#### 2. Preprocess the data using any technique of your choice (e.g. scaling, normalization)

In [6]:
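import pandas as pd
from sklearn.preprocessing import StandardScaler

# Standardize the two continuous columns (zero mean, unit variance).
# Note: the scaler is fit on the full dataset, before the train/test split,
# so test-set statistics influence the scaling.
scaler = StandardScaler()
scaler.fit(df[['total_bill', 'tip']])
data = scaler.transform(df[['total_bill', 'tip']])
df1 = pd.DataFrame(data, columns=['total_bill', 'tip'])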

In [7]:
df['total_bill'] = df1['total_bill']
df['tip'] = df1['tip']

In [8]:
df.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size
0,-0.314711,-1.439947,Female,No,Sun,Dinner,2
1,-1.063235,-0.969205,Male,No,Sun,Dinner,3
2,0.13778,0.363356,Male,No,Sun,Dinner,3
3,0.438315,0.225754,Male,No,Sun,Dinner,2
4,0.540745,0.44302,Female,No,Sun,Dinner,4


#### Encoding

In [9]:
from sklearn.preprocessing import LabelEncoder

# Encode each categorical column as integers.
# Categories are numbered in sorted order, e.g. Dinner -> 0, Lunch -> 1.
encoder = LabelEncoder()
df['sex'] = encoder.fit_transform(df['sex'])
df['smoker'] = encoder.fit_transform(df['smoker'])
df['day'] = encoder.fit_transform(df['day'])
df['time'] = encoder.fit_transform(df['time'])

In [10]:
df.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size
0,-0.314711,-1.439947,0,0,2,0,2
1,-1.063235,-0.969205,1,0,2,0,3
2,0.13778,0.363356,1,0,2,0,3
3,0.438315,0.225754,1,0,2,0,2
4,0.540745,0.44302,0,0,2,0,4
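
An alternative preprocessing sketch, offered as an assumption rather than the approach used in this notebook: a `ColumnTransformer` that is fit on the training rows only and then applied to unseen rows, with one-hot encoding instead of integer codes so that `day` is not treated as ordered. The five rows are the ones shown by `df.head()` above; the 4/1 split is purely illustrative.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# The first five rows of the tips dataset, as printed by df.head().
rows = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
    "sex": ["Female", "Male", "Male", "Male", "Female"],
    "smoker": ["No", "No", "No", "No", "No"],
    "day": ["Sun", "Sun", "Sun", "Sun", "Sun"],
    "size": [2, 3, 3, 2, 4],
})
train_rows, test_rows = rows.iloc[:4], rows.iloc[4:]

preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["total_bill", "tip", "size"]),
    # handle_unknown="ignore" avoids errors on categories unseen during fit.
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["sex", "smoker", "day"]),
])

# Statistics and categories are learned from the training rows only.
train_features = preprocess.fit_transform(train_rows)
test_features = preprocess.transform(test_rows)
print(train_features.shape, test_features.shape)
```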



In [12]:
# Dependent(y) and Independent(X) variable

y = df['time']
X = df[['total_bill','tip','sex','smoker','day','size']]

In [13]:
X.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,size
0,-0.314711,-1.439947,0,0,2,2
1,-1.063235,-0.969205,1,0,2,3
2,0.13778,0.363356,1,0,2,3
3,0.438315,0.225754,1,0,2,2
4,0.540745,0.44302,0,0,2,4


In [14]:
y.head()

0    0
1    0
2    0
3    0
4    0
Name: time, dtype: int64

#### 3. Split the dataset into training and testing set

In [15]:
# Train/test split: an integer test_size holds out that many rows (40 of 244)

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42, test_size=40)

In [16]:
X_train.count()

total_bill    204
tip           204
sex           204
smoker        204
day           204
size          204
dtype: int64

In [17]:
X_test.count()

total_bill    40
tip           40
sex           40
smoker        40
day           40
size          40
dtype: int64

In [18]:
y_train.count()

204

In [19]:
y_test.count()

40

#### 4. Create an instance of the SVC classifier and train it on the training data

In [20]:
from sklearn.svm import SVC
svc = SVC()
svc.fit(X_train, y_train)

#### 5. Use the trained classifier to predict the labels of the testing data


In [22]:
y_pred = svc.predict(X_test)

#### 6. Evaluate the performance of the classifier using any metric of your choice (e.g. accuracy, precision, recall, F1-score)

In [23]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

In [24]:
accuracy_score(y_test, y_pred)  # signature is (y_true, y_pred)

0.975

In [25]:
precision_score(y_test, y_pred)  # signature is (y_true, y_pred)

0.9230769230769231

In [26]:
recall_score(y_test, y_pred)

1.0

In [27]:
f1_score(y_test, y_pred)

0.9600000000000001
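
Scikit-learn's metric functions take the true labels first, `metric(y_true, y_pred)`. Accuracy and F1 do not change when the two arguments are exchanged, but precision and recall trade places. The sketch below shows this with plain Python; the helper `precision_recall` and the toy labels are hypothetical and are not taken from the tips data.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy labels: one negative sample is wrongly predicted as positive.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 0]

correct = precision_recall(y_true, y_pred)   # true labels first
swapped = precision_recall(y_pred, y_true)   # arguments exchanged

print("correct order:", correct)
print("swapped order:", swapped)
```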

#### 7. Tune the hyperparameters of the SVC classifier using GridSearchCV or RandomizedSearchCV to improve its performance

In [32]:
from sklearn.model_selection import GridSearchCV
parameter = {
    'C': [0.1, 10, 100, 1000],
    'gamma': [1, 0.1, 0.001, 0.0001],
    'kernel': ['poly', 'linear', 'rbf'],
}

In [33]:
grid = GridSearchCV(SVC(),
                    param_grid=parameter,
                    refit=True,
                    cv = 5,
                    verbose=3)

#### 8. Train the tuned classifier on the entire dataset

In [34]:
grid.fit(X_train, y_train)

Fitting 5 folds for each of 48 candidates, totalling 240 fits
[CV 1/5] END .......C=0.1, gamma=1, kernel=poly;, score=0.976 total time=   0.0s
[CV 2/5] END .......C=0.1, gamma=1, kernel=poly;, score=0.927 total time=   0.0s
[CV 3/5] END .......C=0.1, gamma=1, kernel=poly;, score=0.976 total time=   0.0s
[CV 4/5] END .......C=0.1, gamma=1, kernel=poly;, score=0.878 total time=   0.0s
[CV 5/5] END .......C=0.1, gamma=1, kernel=poly;, score=0.950 total time=   0.0s
[CV 1/5] END .....C=0.1, gamma=1, kernel=linear;, score=0.976 total time=   0.0s
[CV 2/5] END .....C=0.1, gamma=1, kernel=linear;, score=0.976 total time=   0.0s
[CV 3/5] END .....C=0.1, gamma=1, kernel=linear;, score=0.976 total time=   0.0s
[CV 4/5] END .....C=0.1, gamma=1, kernel=linear;, score=0.927 total time=   0.0s
[CV 5/5] END .....C=0.1, gamma=1, kernel=linear;, score=0.950 total time=   0.0s
[CV 1/5] END ........C=0.1, gamma=1, kernel=rbf;, score=0.780 total time=   0.0s
[CV 2/5] END ........C=0.1, gamma=1, kernel=rbf

In [35]:
grid.best_params_

{'C': 100, 'gamma': 0.1, 'kernel': 'rbf'}
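
Two points are worth keeping in mind when reading a grid like this: `gamma` has no effect with the linear kernel, so those candidates repeat the same model, and any scaling done outside the search is not refit per fold. A hedged sketch of one way to handle the second point is to put the scaler and the classifier in a `Pipeline` and search over that. It uses synthetic data from `make_classification` (an assumption) and a deliberately small grid.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic binary classification data (stand-in, not the tips dataset).
X_demo, y_demo = make_classification(n_samples=200, n_features=6, random_state=42)

pipeline = Pipeline([("scaler", StandardScaler()), ("svc", SVC())])

# Step parameters are addressed as <step name>__<parameter name>.
param_grid = {"svc__C": [0.1, 10, 100], "svc__gamma": [0.1, 0.01]}

search = GridSearchCV(pipeline, param_grid=param_grid, cv=5)
search.fit(X_demo, y_demo)

print(search.best_params_)
print(round(search.best_score_, 3))
```

`best_score_` is the mean cross-validated accuracy of the best candidate, which is a fairer basis for comparison than a single 40-row test split.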

In [36]:
y_pred2 = grid.predict(X_test)

In [37]:
#Evaluate the score 

accuracy_score(y_test, y_pred2)

0.95

In [38]:
precision_score(y_test, y_pred2)  # signature is (y_true, y_pred)

0.8571428571428571

In [39]:
recall_score(y_test, y_pred2)

1.0

In [40]:
f1_score(y_test, y_pred2)

0.923076923076923

#### 9. Save the trained classifier to a file for future use

In [None]:
import pickle
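
The cell above stops after the import. A minimal sketch of the save and reload round trip follows, assuming a tiny toy classifier and a temporary directory; the file name `svc_model.pkl` and the toy data are hypothetical. In the notebook, the fitted `grid` or `svc` object would be dumped the same way.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.svm import SVC

# Toy training data so the example runs on its own.
toy_X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
toy_y = [0, 0, 1, 1]
model = SVC(kernel="linear").fit(toy_X, toy_y)

# Save the fitted model to disk.
model_path = Path(tempfile.mkdtemp()) / "svc_model.pkl"
with open(model_path, "wb") as f:
    pickle.dump(model, f)

# Load it back and confirm it predicts the same labels.
with open(model_path, "rb") as f:
    restored = pickle.load(f)

same_predictions = list(restored.predict(toy_X)) == list(model.predict(toy_X))
print(same_predictions)
```

Pickled models should only be loaded from trusted sources and, ideally, with the same scikit-learn version that created them.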
