Q1. What is the relationship between polynomial functions and kernel functions in machine learning
algorithms?

Ans:- In machine learning, a kernel function computes the inner product of two inputs in a (usually higher-dimensional) feature space without constructing that space explicitly. This lets a linear model such as an SVM learn a non-linear decision boundary in the original input space. The polynomial kernel is one such kernel function: its feature space consists of monomials of the input features up to a chosen degree.

A polynomial kernel function is defined as:

K(x, y) = (x^T y + c)^d

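where x and y are input vectors, c >= 0 is a constant that balances the influence of higher-order against lower-order terms, and d is the degree of the polynomial. Evaluating K(x, y) gives the dot product of x and y after they are mapped into the higher-dimensional polynomial feature space, without ever computing that mapping.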
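The claim that the kernel equals a dot product in a higher-dimensional space can be checked numerically. The sketch below does this for d = 2 and two-dimensional inputs; the helper names `poly_kernel` and `phi_degree2` are made up for this illustration and are not part of any library.

```python
import numpy as np

def poly_kernel(x, y, c=1.0, d=2):
    """Polynomial kernel K(x, y) = (x^T y + c)^d."""
    return (np.dot(x, y) + c) ** d

def phi_degree2(x, c=1.0):
    """Explicit 6-dimensional feature map for 2-D inputs and d = 2 (illustrative helper)."""
    x1, x2 = x
    return np.array([
        x1 ** 2,
        x2 ** 2,
        np.sqrt(2.0) * x1 * x2,
        np.sqrt(2.0 * c) * x1,
        np.sqrt(2.0 * c) * x2,
        c,
    ])

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

# Kernel evaluated directly in the 2-D input space
kernel_value = poly_kernel(x, y, c=1.0, d=2)
# Ordinary dot product after mapping both vectors into the 6-D feature space
explicit_value = np.dot(phi_degree2(x), phi_degree2(y))

print(kernel_value, explicit_value)
```

Both numbers agree up to floating-point rounding, although the kernel never builds the six-dimensional vectors.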


Q2. How can we implement an SVM with a polynomial kernel in Python using Scikit-learn?



In [1]:
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Generate a sample dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, n_redundant=0, random_state=42)

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create an instance of the SVM classifier with a polynomial kernel
poly_svm = svm.SVC(kernel='poly', degree=3, C=1.0, gamma='auto')

# Train the SVM classifier on the training set
poly_svm.fit(X_train, y_train)

# Predict the labels of the testing set
y_pred = poly_svm.predict(X_test)

# Evaluate the performance of the SVM classifier using classification metrics
print('Accuracy:', accuracy_score(y_test, y_pred))
print('Precision:', precision_score(y_test, y_pred))
print('Recall:', recall_score(y_test, y_pred))
print('F1 score:', f1_score(y_test, y_pred))


Accuracy: 0.875
Precision: 0.8446601941747572
Recall: 0.90625
F1 score: 0.8743718592964823
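For reference, scikit-learn parameterizes the polynomial kernel as (gamma * x^T y + coef0)^degree, so the constant c in the formula from Q1 corresponds to `coef0` (which defaults to 0.0 in `SVC`), and `gamma='auto'` resolves to 1 / n_features. The following sketch, on a separate synthetic dataset with assumed names such as `X_demo`, compares the built-in kernel with a Gram matrix computed by hand from that formula.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics.pairwise import polynomial_kernel
from sklearn.svm import SVC

# Small synthetic dataset used only for this illustration
X_demo, y_demo = make_classification(n_samples=200, n_features=10, n_informative=5,
                                     n_redundant=0, random_state=0)

degree, coef0 = 3, 1.0
gamma = 1.0 / X_demo.shape[1]  # the value that gamma='auto' resolves to

# Gram matrix computed by hand from (gamma * x^T y + coef0)^degree
gram_manual = (gamma * X_demo @ X_demo.T + coef0) ** degree
gram_sklearn = polynomial_kernel(X_demo, X_demo, degree=degree, gamma=gamma, coef0=coef0)
kernels_match = np.allclose(gram_manual, gram_sklearn)

# An SVC with the built-in kernel and one fed the hand-made Gram matrix
builtin = SVC(kernel='poly', degree=degree, gamma='auto', coef0=coef0, C=1.0).fit(X_demo, y_demo)
precomputed = SVC(kernel='precomputed', C=1.0).fit(gram_manual, y_demo)

# Fraction of training points on which the two classifiers predict the same label
agreement = np.mean(builtin.predict(X_demo) == precomputed.predict(gram_manual))
print(kernels_match, agreement)
```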


Q3. How does increasing the value of epsilon affect the number of support vectors in SVR?

Ans:- In Support Vector Regression (SVR), epsilon is a hyperparameter that controls the width of the epsilon-insensitive tube around the regression line. The epsilon-insensitive tube is the region within which no penalty is given for errors made by the SVR model. The larger the value of epsilon, the wider the epsilon-insensitive tube.

Increasing the value of epsilon generally decreases the number of support vectors in SVR. Only data points that lie on or outside the epsilon-insensitive tube become support vectors; points strictly inside the tube incur no loss and do not contribute to the solution. A wider tube contains more of the training points, so fewer support vectors remain and the model becomes sparser and smoother. Conversely, decreasing epsilon narrows the tube, more points fall on or outside it, and the number of support vectors grows.
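The effect of epsilon on the support vector count can be observed directly. The sketch below fits `SVR` on a noisy sine curve (synthetic data, chosen only for illustration) for three epsilon values and counts the support vectors of each fitted model.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic 1-D regression problem: a sine curve with Gaussian noise
rng = np.random.default_rng(0)
X_demo = np.sort(rng.uniform(0, 6, size=200)).reshape(-1, 1)
y_demo = np.sin(X_demo).ravel() + rng.normal(scale=0.1, size=200)

sv_counts = {}
for eps in [0.01, 0.1, 0.5]:
    svr = SVR(kernel='rbf', C=1.0, epsilon=eps).fit(X_demo, y_demo)
    # support_ holds the indices of the training points that are support vectors
    sv_counts[eps] = len(svr.support_)
    print(f"epsilon={eps}: {sv_counts[eps]} support vectors")
```

The exact counts depend on the data and the other hyperparameters, but the widest tube should leave far fewer support vectors than the narrowest one.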

Q4. How does the choice of kernel function, C parameter, epsilon parameter, and gamma parameter
affect the performance of Support Vector Regression (SVR)? Can you explain how each parameter works
and provide examples of when you might want to increase or decrease its value?

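Ans:-

Kernel function: The kernel determines the implicit feature space in which SVR fits a linear function, and therefore the shape of the regression function in the original input space. Common choices are the linear, polynomial, RBF, and sigmoid kernels. A linear kernel suits roughly linear relationships or very high-dimensional data; an RBF or polynomial kernel suits non-linear relationships.

C parameter: C regulates the regularization term; the strength of regularization is inversely proportional to C. Increasing C penalizes points outside the epsilon-tube more heavily, so the model follows the training data more closely and may overfit. Decreasing C gives a smoother, more strongly regularized model that may underfit. Increase C when the model underfits, and decrease it when the data are noisy or the model overfits.

Epsilon parameter: Epsilon specifies the epsilon-tube within which no penalty is associated in the training loss function with points predicted within a distance epsilon from the actual value. It must be non-negative. Increase epsilon when the targets are noisy and small errors are acceptable, which also yields fewer support vectors; decrease it when predictions need to be precise.

Gamma parameter: Gamma is the kernel coefficient for 'rbf', 'poly' and 'sigmoid'. For the RBF kernel, a larger gamma makes the influence of each training point more local, giving a more flexible function that may overfit; a smaller gamma gives a smoother function that may underfit. Increase gamma when the model is too smooth to capture the pattern, and decrease it when the model overfits.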
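In practice these parameters interact, so they are usually tuned together with cross-validation rather than one at a time. The sketch below searches a deliberately tiny grid over C, epsilon and gamma for an RBF `SVR` on synthetic data; the grid values and the dataset are assumptions made for this illustration, not recommendations.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

# Synthetic 1-D regression problem: a sine curve with Gaussian noise
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 6, size=(150, 1))
y_demo = np.sin(X_demo).ravel() + rng.normal(scale=0.1, size=150)

# Two illustrative values per hyperparameter (8 combinations in total)
param_grid = {
    'C': [0.1, 10.0],
    'epsilon': [0.05, 0.5],
    'gamma': [0.1, 1.0],
}

# 3-fold cross-validation, scored by negative mean squared error
search = GridSearchCV(SVR(kernel='rbf'), param_grid, cv=3,
                      scoring='neg_mean_squared_error')
search.fit(X_demo, y_demo)

print(search.best_params_)
```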



Q5. Assignment:
- Import the necessary libraries and load the dataset
- Split the dataset into training and testing sets
- Preprocess the data using any technique of your choice (e.g. scaling, normalization)
- Create an instance of the SVC classifier and train it on the training data
- Use the trained classifier to predict the labels of the testing data
- Evaluate the performance of the classifier using any metric of your choice (e.g. accuracy, precision, recall, F1-score)
- Tune the hyperparameters of the SVC classifier using GridSearchCV or RandomizedSearchCV to improve its performance
- Train the tuned classifier on the entire dataset
- Save the trained classifier to a file for future use.

In [2]:
import pandas as pd
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
# Load the data from the file into a DataFrame (the file has no header row)
df = pd.read_csv('spambase.data', header=None)

# Columns 0-56 are the features; column 57 is the class label
X = df.drop(57, axis=1)
y = df[57]
X

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,47,48,49,50,51,52,53,54,55,56
0,0.00,0.64,0.64,0.0,0.32,0.00,0.00,0.00,0.00,0.00,...,0.0,0.000,0.000,0.0,0.778,0.000,0.000,3.756,61,278
1,0.21,0.28,0.50,0.0,0.14,0.28,0.21,0.07,0.00,0.94,...,0.0,0.000,0.132,0.0,0.372,0.180,0.048,5.114,101,1028
2,0.06,0.00,0.71,0.0,1.23,0.19,0.19,0.12,0.64,0.25,...,0.0,0.010,0.143,0.0,0.276,0.184,0.010,9.821,485,2259
3,0.00,0.00,0.00,0.0,0.63,0.00,0.31,0.63,0.31,0.63,...,0.0,0.000,0.137,0.0,0.137,0.000,0.000,3.537,40,191
4,0.00,0.00,0.00,0.0,0.63,0.00,0.31,0.63,0.31,0.63,...,0.0,0.000,0.135,0.0,0.135,0.000,0.000,3.537,40,191
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4596,0.31,0.00,0.62,0.0,0.00,0.31,0.00,0.00,0.00,0.00,...,0.0,0.000,0.232,0.0,0.000,0.000,0.000,1.142,3,88
4597,0.00,0.00,0.00,0.0,0.00,0.00,0.00,0.00,0.00,0.00,...,0.0,0.000,0.000,0.0,0.353,0.000,0.000,1.555,4,14
4598,0.30,0.00,0.30,0.0,0.00,0.00,0.00,0.00,0.00,0.00,...,0.0,0.102,0.718,0.0,0.000,0.000,0.000,1.404,6,118
4599,0.96,0.00,0.00,0.0,0.32,0.00,0.00,0.00,0.00,0.00,...,0.0,0.000,0.057,0.0,0.000,0.000,0.000,1.147,5,78


In [3]:
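# Standardize every feature to zero mean and unit variance.
# Note: the scaler is fitted on the full dataset here, before the train/test split.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
X = pd.DataFrame(X_scaled, columns=X.columns)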
X

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,47,48,49,50,51,52,53,54,55,56
0,-0.342434,0.330885,0.712859,-0.0469,0.011565,-0.350266,-0.291794,-0.262562,-0.323302,-0.371364,...,-0.111546,-0.158453,-0.514307,-0.155198,0.624007,-0.308355,-0.103048,-0.045247,0.045298,-0.008724
1,0.345359,0.051909,0.435130,-0.0469,-0.256117,0.672399,0.244743,-0.088010,-0.323302,1.086711,...,-0.111546,-0.158453,-0.026007,-0.155198,0.126203,0.423783,0.008763,-0.002443,0.250563,1.228324
2,-0.145921,-0.165072,0.851723,-0.0469,1.364846,0.343685,0.193644,0.036670,1.974017,0.016422,...,-0.111546,-0.117376,0.014684,-0.155198,0.008496,0.440053,-0.079754,0.145921,2.221106,3.258733
3,-0.342434,-0.165072,-0.556761,-0.0469,0.472573,-0.350266,0.500237,1.308402,0.789462,0.605857,...,-0.111546,-0.158453,-0.007511,-0.155198,-0.161934,-0.308355,-0.103048,-0.052150,-0.062466,-0.152222
4,-0.342434,-0.165072,-0.556761,-0.0469,0.472573,-0.350266,0.500237,1.308402,0.789462,0.605857,...,-0.111546,-0.158453,-0.014910,-0.155198,-0.164387,-0.308355,-0.103048,-0.052150,-0.062466,-0.152222
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4596,0.672880,-0.165072,0.673183,-0.0469,-0.464314,0.781971,-0.291794,-0.262562,-0.323302,-0.371364,...,-0.111546,-0.158453,0.343917,-0.155198,-0.329912,-0.308355,-0.103048,-0.127640,-0.252336,-0.322110
4597,-0.342434,-0.165072,-0.556761,-0.0469,-0.464314,-0.350266,-0.291794,-0.262562,-0.323302,-0.371364,...,-0.111546,-0.158453,-0.514307,-0.155198,0.102907,-0.308355,-0.103048,-0.114623,-0.247205,-0.444165
4598,0.640128,-0.165072,0.038373,-0.0469,-0.464314,-0.350266,-0.291794,-0.262562,-0.323302,-0.371364,...,-0.111546,0.260533,2.141746,-0.155198,-0.329912,-0.308355,-0.103048,-0.119382,-0.236941,-0.272628
4599,2.801763,-0.165072,-0.556761,-0.0469,0.011565,-0.350266,-0.291794,-0.262562,-0.323302,-0.371364,...,-0.111546,-0.158453,-0.303450,-0.155198,-0.329912,-0.308355,-0.103048,-0.127483,-0.242073,-0.338604
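Because the scaler above is fitted on the full dataset, statistics from the eventual test rows influence the preprocessing. One common way to avoid this is to put the scaler and the classifier in a pipeline and fit it on the training split only. The sketch below shows the idea on synthetic data (`X_demo`, `y_demo` and the other names are assumptions for this illustration), so it does not reproduce the numbers reported in this notebook.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the real dataset
X_demo, y_demo = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

# The pipeline fits the scaler on the training rows only,
# then reuses those statistics to transform the test rows
pipe = make_pipeline(StandardScaler(), SVC())
pipe.fit(X_tr, y_tr)

test_accuracy = pipe.score(X_te, y_te)
print(test_accuracy)
```

The same pipeline object can be passed to GridSearchCV or RandomizedSearchCV, in which case the scaler is refitted inside every cross-validation fold.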


In [4]:
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import classification_report

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train an SVC with default hyperparameters as a baseline and evaluate it on the test set
model = SVC()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.92      0.97      0.94       531
           1       0.95      0.89      0.92       390

    accuracy                           0.93       921
   macro avg       0.94      0.93      0.93       921
weighted avg       0.94      0.93      0.93       921



In [5]:
import numpy as np
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC
from scipy.stats import uniform, randint

# define the parameter space to search over
param_distributions = {
    'C': uniform(loc=0, scale=10),
    'kernel': ['linear', 'poly', 'rbf', 'sigmoid'],
    'degree': randint(low=1, high=10),
    'gamma': ['scale', 'auto'] + list(np.logspace(-3, 3, num=10)),
}

# create an SVC model
svc = SVC()

# create a RandomizedSearchCV object with 10-fold cross-validation
rs = RandomizedSearchCV(estimator=svc, 
                        param_distributions=param_distributions, 
                        n_iter=5, 
                        cv=10, 
                        random_state=42)

# fit the model to the data
rs.fit(X_train, y_train)

# print the best parameters and the mean cross-validated accuracy they achieved
print("Best Parameters: ", rs.best_params_)
print("Best Accuracy Score: ", rs.best_score_)



Best Parameters:  {'C': 3.745401188473625, 'degree': 8, 'gamma': 0.021544346900318832, 'kernel': 'rbf'}
Best Accuracy Score:  0.9298913043478262
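In the search above, `degree` is sampled for every candidate even though only the 'poly' kernel uses it, which is why the best parameters pair degree 8 with an 'rbf' kernel. RandomizedSearchCV also accepts a list of dictionaries, so each kernel can be given only the parameters it actually uses. The following is a minimal sketch on synthetic data; the ranges and names are assumptions for illustration, not tuned values.

```python
from scipy.stats import loguniform, randint
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for the real dataset
X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=0)

# One dictionary per kernel family; a dictionary is picked first,
# then the parameters are sampled from that dictionary only
param_distributions = [
    {'kernel': ['linear'], 'C': loguniform(1e-2, 1e1)},
    {'kernel': ['rbf', 'sigmoid'], 'C': loguniform(1e-2, 1e1),
     'gamma': loguniform(1e-4, 1e-1)},
    {'kernel': ['poly'], 'C': loguniform(1e-2, 1e1),
     'gamma': loguniform(1e-4, 1e-1), 'degree': randint(2, 4)},
]

search = RandomizedSearchCV(SVC(), param_distributions, n_iter=8, cv=3, random_state=42)
search.fit(X_demo, y_demo)

print(search.best_params_)
```

With this layout, `degree` appears in the best parameters only when the winning kernel is 'poly'.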


In [6]:
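# Rebuild the classifier with the best parameters found above
# (degree is only used by the 'poly' kernel and is ignored for 'rbf')
model = SVC(C=3.745401188473625, degree=8, gamma=0.021544346900318832, kernel='rbf')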

In [7]:
model.fit(X_train, y_train)
y_pred= model.predict(X_test)
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.93      0.97      0.95       531
           1       0.95      0.90      0.93       390

    accuracy                           0.94       921
   macro avg       0.94      0.94      0.94       921
weighted avg       0.94      0.94      0.94       921



In [8]:
# Retrain the tuned classifier on the entire dataset
model = SVC(C=3.745401188473625, degree=8, gamma=0.021544346900318832, kernel='rbf')
model.fit(X, y)

In [9]:
# Save the trained model for further use
from joblib import dump, load
dump(model, 'svr_model.joblib')

['svr_model.joblib']
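The saved file above contains only the classifier, so anyone who loads it later must apply the same scaling to new data before calling predict. One option is to save the scaler and the classifier together as a single pipeline. The sketch below illustrates the round trip on synthetic data; the file name `svc_pipeline.joblib` and the temporary directory are hypothetical choices for this example.

```python
import os
import tempfile

import numpy as np
from joblib import dump, load
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the real dataset
X_demo, y_demo = make_classification(n_samples=200, n_features=10, random_state=0)

# Scaler and classifier travel together in one object
pipe = make_pipeline(StandardScaler(), SVC()).fit(X_demo, y_demo)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'svc_pipeline.joblib')  # hypothetical file name
    dump(pipe, path)        # write the fitted pipeline to disk
    restored = load(path)   # read it back as a new object

# The restored pipeline should reproduce the original predictions on raw, unscaled input
same_predictions = np.array_equal(pipe.predict(X_demo), restored.predict(X_demo))
print(same_predictions)
```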