# Q1. What is the relationship between polynomial functions and kernel functions in machine learning algorithms?
Answer:

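In machine learning, especially in Support Vector Machines (SVMs), a kernel function computes the inner product of two inputs in a higher-dimensional feature space without constructing that space explicitly (the kernel trick). In that space the data is often easier to separate with a hyperplane. The polynomial kernel is one such kernel function: its feature space is made up of monomials of the input features up to degree d, so the SVM's decision boundaries are polynomial functions of the input features.

$$K(x, y) = (x \cdot y + c)^d$$

Here x · y is the dot product of the two input vectors, c ≥ 0 is a constant that balances the higher-order terms against the lower-order ones, and d is the degree of the polynomial.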
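As a small check of the kernel idea, the sketch below compares the polynomial kernel for d = 2 on 2-D inputs with the dot product of an explicit degree-2 feature map. The helper names `poly_kernel` and `phi_degree2` and the sample vectors are illustrative assumptions, not part of any library.

```python
import numpy as np

def poly_kernel(x, y, c=1.0, d=2):
    """Polynomial kernel K(x, y) = (x . y + c)^d."""
    return (np.dot(x, y) + c) ** d

def phi_degree2(x, c=1.0):
    """Explicit feature map matching the degree-2 kernel on 2-D inputs (hypothetical helper)."""
    x1, x2 = x
    return np.array([
        x1 ** 2,
        x2 ** 2,
        np.sqrt(2) * x1 * x2,
        np.sqrt(2 * c) * x1,
        np.sqrt(2 * c) * x2,
        c,
    ])

# Arbitrary sample vectors chosen for illustration
x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

k_implicit = poly_kernel(x, y)                        # kernel evaluated in the 2-D input space
k_explicit = np.dot(phi_degree2(x), phi_degree2(y))   # dot product in the 6-D feature space
print(k_implicit, k_explicit)
```

Both values agree up to floating-point rounding, which is why the SVM never needs to build the 6-D features. Note that scikit-learn's `SVC(kernel='poly')` uses the slightly more general form (gamma · (x · y) + coef0)^degree.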

# Q2. How can we implement an SVM with a polynomial kernel in Python using Scikit-learn?

In [1]:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load the iris dataset
iris = datasets.load_iris()
X = iris.data[:, :2]  # using only the first two features for visualization
y = iris.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train an SVM with a polynomial kernel
clf = SVC(kernel='poly', degree=3, C=1.0)
clf.fit(X_train, y_train)

# Predict the labels for the testing set
y_pred = clf.predict(X_test)

# Compute the accuracy of the model
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')


Accuracy: 0.73


# Q3. How does increasing the value of epsilon affect the number of support vectors in SVR?
Answer:

In Support Vector Regression (SVR), the epsilon parameter (ϵ) defines a tube of half-width ϵ around the regression function inside which errors incur no penalty. Increasing ϵ widens the tube, so more training points fall inside it. Points strictly inside the ϵ-tube have zero loss and zero dual coefficients, so they are not support vectors and do not influence the fitted model. A larger ϵ therefore typically yields fewer support vectors and a sparser, coarser model, while a smaller ϵ yields more support vectors and a closer fit to the training data.
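The effect can be observed by counting support vectors for a few values of ϵ. The following is a minimal sketch on synthetic data (a noisy sine curve chosen purely for illustration); the exact counts depend on the data and on the other hyperparameters.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic 1-D regression problem (illustrative data, not from the text above)
rng = np.random.default_rng(0)
X = np.linspace(0, 5, 100).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=100)

epsilons = [0.01, 0.1, 0.5]
n_support = []
for eps in epsilons:
    model = SVR(kernel='rbf', C=1.0, epsilon=eps).fit(X, y)
    # support_ holds the indices of the training points that are support vectors
    n_support.append(len(model.support_))
    print(f'epsilon={eps}: {n_support[-1]} support vectors out of {len(X)}')
```

With the smallest ϵ most noisy points lie on or outside the tube and become support vectors; with the largest ϵ only a small fraction do.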

# Q4. How does the choice of kernel function, C parameter, epsilon parameter, and gamma parameter affect the performance of Support Vector Regression (SVR)? Can you explain how each parameter works and provide examples of when you might want to increase or decrease its value?

1. Kernel Function
   - Effect: Determines the shape and flexibility of the fitted regression function.
   - Usage: Choose based on the relationship between the features and the target:
     - Linear: For approximately linear relationships.
     - Polynomial: For polynomial relationships.
     - RBF: For complex, non-linear relationships.
2. C Parameter
   - Effect: Controls the trade-off between achieving a low error on the training data and keeping the model simple. C is the penalty applied to points that fall outside the ϵ-tube.
   - Usage:
     - Increase C: To reduce bias by penalizing deviations outside the tube more heavily (closer fit, but risk of overfitting), for example when the model underfits clean data.
     - Decrease C: To regularize more strongly (tolerates more and larger deviations, but usually generalizes better), for example when the data is noisy or contains outliers.
3. Epsilon Parameter
   - Effect: Defines a margin of tolerance (the ϵ-tube) where no penalty is given to errors.
   - Usage:
     - Increase epsilon: To allow more deviation from the true values within the tube (reduces the number of support vectors), for example when small errors do not matter or the targets are noisy.
     - Decrease epsilon: To make the model more sensitive to deviations (increases the number of support vectors), for example when high precision is required.
4. Gamma Parameter
   - Effect: For the RBF kernel, defines how far the influence of a single training example reaches (a large gamma means a short reach). Gamma is the kernel coefficient for the RBF, polynomial, and sigmoid kernels and has no effect on the linear kernel.
   - Usage:
     - Increase gamma: To make the model more sensitive to specific training examples (captures fine details, risk of overfitting).
     - Decrease gamma: To make the model less sensitive to specific training examples (smoother regression function, usually better generalization).
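To make the gamma trade-off concrete, the sketch below fits an RBF SVR with three gamma values and compares R² on the training and test splits. The synthetic data, the chosen gamma values, and the fixed `C` and `epsilon` are assumptions made for illustration only.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Synthetic noisy sine data (illustrative, not from the text above)
rng = np.random.default_rng(0)
X = rng.uniform(0, 5, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=200)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

scores = {}
for gamma in [0.001, 1.0, 100.0]:
    model = SVR(kernel='rbf', C=1.0, epsilon=0.1, gamma=gamma).fit(X_train, y_train)
    # score() returns the coefficient of determination R^2
    scores[gamma] = (model.score(X_train, y_train), model.score(X_test, y_test))
    print(f'gamma={gamma}: train R2={scores[gamma][0]:.2f}, test R2={scores[gamma][1]:.2f}')
```

A very small gamma typically underfits (low R² on both splits), while a very large gamma follows the training points closely and tends to widen the gap between training and test scores. The same loop can be repeated over `C` or `epsilon` to study those parameters.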


# Q5. Assignment:
- Import the necessary libraries and load the dataset.
- Split the dataset into training and testing sets.
- Preprocess the data using any technique of your choice (e.g. scaling, normalization).
- Create an instance of the SVC classifier and train it on the training data.
- Use the trained classifier to predict the labels of the testing data.
- Evaluate the performance of the classifier using any metric of your choice (e.g. accuracy, precision, recall, F1-score).
- Tune the hyperparameters of the SVC classifier using GridSearchCV or RandomizedSearchCV to improve its performance.
- Train the tuned classifier on the entire dataset.
- Save the trained classifier to a file for future use.

In [2]:
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
import joblib

# Load the dataset
data = load_iris()
X = data.data
y = data.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Preprocess the data (scaling)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)


In [3]:
# Create an instance of the SVC classifier
svc = SVC(kernel='linear', C=1.0, random_state=42)

# Train the classifier on the training data
svc.fit(X_train, y_train)


In [4]:
# Predict the labels of the testing data
y_pred = svc.predict(X_test)


In [5]:
# Evaluate the performance of the classifier
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average='weighted')
recall = recall_score(y_test, y_pred, average='weighted')
f1 = f1_score(y_test, y_pred, average='weighted')

print(f'Accuracy: {accuracy:.2f}')
print(f'Precision: {precision:.2f}')
print(f'Recall: {recall:.2f}')
print(f'F1 Score: {f1:.2f}')


Accuracy: 0.98
Precision: 0.98
Recall: 0.98
F1 Score: 0.98


# Tune the hyperparameters of the SVC classifier using GridSearchCV to improve its performance

In [6]:
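# Define the parameter grid
# Note: 'degree' is only used by the 'poly' kernel and 'gamma' is ignored by
# the 'linear' kernel, so some combinations in this grid fit identical models.
param_grid = {
    'C': [0.1, 1, 10, 100],
    'kernel': ['linear', 'poly', 'rbf', 'sigmoid'],
    'degree': [2, 3, 4],
    'gamma': ['scale', 'auto']
}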

# Create a GridSearchCV instance
grid_search = GridSearchCV(SVC(), param_grid, cv=5, scoring='accuracy', n_jobs=-1)

# Fit the GridSearchCV instance on the training data
grid_search.fit(X_train, y_train)

# Get the best parameters
best_params = grid_search.best_params_
print(f'Best Parameters: {best_params}')


Best Parameters: {'C': 10, 'degree': 2, 'gamma': 'scale', 'kernel': 'linear'}
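The best result above pairs `kernel='linear'` with `degree` and `gamma` values that the linear kernel ignores. One way to avoid such redundant combinations, sketched here as an alternative rather than as part of the original solution, is to give GridSearchCV a list of per-kernel grids. The variable names below are illustrative, and `ParameterGrid` is used only to count the candidates.

```python
from sklearn.model_selection import ParameterGrid

C_values = [0.1, 1, 10, 100]
gammas = ['scale', 'auto']

# The single flat grid: every kernel is combined with every degree and gamma
flat_grid = {
    'C': C_values,
    'kernel': ['linear', 'poly', 'rbf', 'sigmoid'],
    'degree': [2, 3, 4],
    'gamma': gammas,
}

# Per-kernel grids: each kernel only gets the parameters it actually uses
per_kernel_grid = [
    {'kernel': ['linear'], 'C': C_values},
    {'kernel': ['poly'], 'C': C_values, 'degree': [2, 3, 4], 'gamma': gammas},
    {'kernel': ['rbf', 'sigmoid'], 'C': C_values, 'gamma': gammas},
]

n_flat = len(ParameterGrid(flat_grid))
n_per_kernel = len(ParameterGrid(per_kernel_grid))
print(n_flat, n_per_kernel)  # 96 44
```

`per_kernel_grid` can be passed to `GridSearchCV` in place of the flat dictionary. It covers the same distinct models with fewer than half the fits, and the reported best parameters then contain only settings that matter for the chosen kernel.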


In [7]:
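# Create an instance of the SVC classifier with the best parameters.
# The hyperparameters were tuned on standardized features, so the scaler is
# bundled with the classifier in a pipeline and applied to the raw features.
from sklearn.pipeline import make_pipeline

svc_tuned = make_pipeline(StandardScaler(), SVC(**best_params, random_state=42))

# Train the classifier on the entire dataset
svc_tuned.fit(X, y)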


In [8]:
# Save the trained classifier to a file
joblib.dump(svc_tuned, 'svc_tuned_model.joblib')

# Load the model from the file (for future use)
# loaded_model = joblib.load('svc_tuned_model.joblib')


['svc_tuned_model.joblib']
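To illustrate the "future use" step, the following self-contained sketch saves a model, loads it back, and checks that the loaded copy reproduces the original predictions. It writes to a temporary directory under a hypothetical file name rather than reusing the file above, and its fixed hyperparameters are placeholders, not the tuned values. Wrapping the scaler and classifier in one pipeline is an assumption of this sketch, made so that the saved object accepts raw features.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import load_iris
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Placeholder hyperparameters for illustration only
model = make_pipeline(StandardScaler(), SVC(kernel='linear', C=1.0, random_state=42))
model.fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'example_model.joblib')  # hypothetical file name
    joblib.dump(model, path)
    loaded_model = joblib.load(path)

# The loaded model should behave exactly like the one that was saved
same_predictions = np.array_equal(model.predict(X), loaded_model.predict(X))
print(f'Loaded model reproduces predictions: {same_predictions}')
```

Files written by joblib are pickle-based, so they should only be loaded from trusted sources and, ideally, with the same scikit-learn version that created them.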