
Q1. Polynomial functions and kernel functions are closely related in machine learning, particularly in support vector machines (SVMs). A polynomial function is built from terms raised to powers, such as $x^2$, $x^3$, etc. A polynomial kernel is a kernel function that implicitly maps input data into a higher-dimensional feature space whose coordinates are such polynomial terms of the original features, where linear separation may be possible. In SVMs, the polynomial kernel takes the dot product between two data points in the original feature space, scales it by an optional coefficient $\gamma$, adds an optional constant term $r$, and raises the result to a specified power $d$: $K(x, y) = (\gamma \langle x, y \rangle + r)^d$. This allows the SVM to learn non-linear decision boundaries in the original feature space without explicitly transforming the data into higher dimensions.
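
As a small illustration of the "implicit mapping" idea, the sketch below checks that a degree-2 polynomial kernel on 2-D points equals an ordinary dot product after an explicit 6-D feature map. The helper `phi` and the two sample points are assumptions made up for this example; `polynomial_kernel` is Scikit-learn's pairwise helper.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel


def phi(v):
    """Explicit degree-2 feature map for a 2-D point (illustrative helper)."""
    x1, x2 = v
    s = np.sqrt(2.0)
    return np.array([1.0, s * x1, s * x2, x1 ** 2, x2 ** 2, s * x1 * x2])


# Two arbitrary sample points (assumed values, not from the assignment)
a = np.array([1.0, 2.0])
b = np.array([3.0, -1.0])

# Kernel trick: a single dot product in the original 2-D space
k_implicit = (a @ b + 1.0) ** 2

# Same quantity via the explicit 6-D mapping followed by a dot product
k_explicit = phi(a) @ phi(b)

# Scikit-learn's helper computes (gamma * <x, y> + coef0) ** degree
k_sklearn = polynomial_kernel(a.reshape(1, -1), b.reshape(1, -1),
                              degree=2, gamma=1.0, coef0=1.0)[0, 0]

print(k_implicit, k_explicit, k_sklearn)
```

All three values agree up to floating-point rounding, which is why the SVM never needs to build the higher-dimensional vectors itself.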

Q2. The following code implements an SVM with a polynomial kernel in Python using Scikit-learn:

In [1]:
from sklearn.svm import SVC
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

# Load dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Preprocess data (e.g., scaling)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Create an instance of SVC with polynomial kernel
svc = SVC(kernel='poly')

# Train the classifier on the training data
svc.fit(X_train_scaled, y_train)

# Use the trained classifier to predict labels of the testing data
y_pred = svc.predict(X_test_scaled)

# Evaluate the performance of the classifier (e.g., accuracy)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

Accuracy: 0.9666666666666667
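
`SVC(kernel='poly')` leaves the kernel's hyperparameters at their defaults. A quick way to see which values are in effect is to inspect the estimator's parameters, as in this minimal sketch. In recent Scikit-learn releases the defaults are expected to be `C=1.0`, `degree=3`, `gamma='scale'`, and `coef0=0.0`; the exact `gamma` default may differ in older versions.

```python
from sklearn.svm import SVC

# Same construction as in the cell above; nothing is fitted here
svc = SVC(kernel='poly')

# get_params() returns the hyperparameters the estimator will use
params = svc.get_params()
for name in ('C', 'degree', 'gamma', 'coef0'):
    print(name, '=', params[name])
```

With `coef0=0.0` the kernel has no constant term, so passing `coef0` or `degree` explicitly is one way to change the shape of the decision boundary.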


Q3. Increasing the value of epsilon in Support Vector Regression (SVR) typically leads to fewer support vectors. Epsilon sets the half-width of the epsilon-insensitive tube around the regression function: training points whose prediction error is smaller than epsilon contribute nothing to the loss. When epsilon is increased, the tube becomes wider, so more training points fall inside it without affecting the loss function. Only points on or outside the tube become support vectors, so fewer of them are needed to define the model, leading to a sparser solution at the cost of a less precise fit.
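
The effect can be checked empirically. The sketch below fits an RBF-kernel SVR on synthetic noisy sine data (the data, the seed, and the epsilon values are assumptions chosen for illustration) and counts the support vectors for each epsilon.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic 1-D regression problem: noisy sine curve (assumed toy data)
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

sv_counts = {}
for eps in (0.01, 0.1, 0.5):
    svr = SVR(kernel='rbf', C=1.0, epsilon=eps).fit(X, y)
    # support_ holds the indices of the training points used as support vectors
    sv_counts[eps] = len(svr.support_)
    print(f"epsilon={eps}: {sv_counts[eps]} support vectors out of {len(X)}")
```

On data like this, the smallest epsilon keeps most training points as support vectors, while larger values keep noticeably fewer.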

Q4. In Support Vector Regression (SVR), the choice of kernel function, C parameter, epsilon parameter, and gamma parameter can significantly affect the performance of the model:

Kernel Function: The choice of kernel function determines the form of the fitted regression function and the model's ability to capture non-linear relationships in the data. For example, a polynomial kernel may be suitable for data with polynomial relationships, while an RBF (Radial Basis Function) kernel can capture more complex, non-linear patterns.

C Parameter: The C parameter controls the trade-off between the model's complexity (how flat the fitted function is) and the degree to which deviations larger than epsilon are tolerated. A higher value of C penalizes such deviations more heavily, so the model fits the training data more closely and may overfit, while a lower value of C tolerates more deviations and yields a flatter, more robust model.

Epsilon Parameter: The epsilon parameter determines the margin of tolerance around the regression line. Increasing epsilon allows for larger deviations from the regression line without being penalized, resulting in a wider margin and potentially fewer support vectors.

Gamma Parameter: The gamma parameter (used by the RBF, polynomial, and sigmoid kernels) controls how far the influence of a single training sample reaches. A low value of gamma means each sample's influence reaches far, giving a smooth fitted function, while a high value of gamma confines each sample's influence to its immediate neighborhood, giving a more complex function that may be prone to overfitting.
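
Because these four settings interact, they are usually tuned together rather than one at a time. The following is a minimal sketch of such a search for SVR, assuming synthetic noisy sine data and a small illustrative grid; none of the values are recommendations.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

# Synthetic 1-D regression problem (assumed toy data)
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(120, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=120)

# Small illustrative grid over the four settings discussed above
param_grid = {
    'kernel': ['linear', 'rbf'],
    'C': [0.1, 1, 10],
    'epsilon': [0.05, 0.2],
    'gamma': [0.1, 1],  # ignored by the linear kernel
}

# 5-fold cross-validation, scored by R^2
search = GridSearchCV(SVR(), param_grid, cv=5, scoring='r2')
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```

Comparing `search.cv_results_` across rows of the grid shows how each parameter shifts the cross-validated score for a given dataset.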

Below is the assignment completed using Python's Scikit-learn library:

In [2]:
from sklearn.svm import SVC
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score
import joblib

# Load dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Preprocess data (e.g., scaling)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Create an instance of SVC with polynomial kernel
svc = SVC(kernel='poly')

# Define hyperparameters for tuning
param_grid = {'C': [0.1, 1, 10], 'gamma': [0.1, 1, 10], 'degree': [2, 3, 4]}

# Perform grid search with cross-validation
grid_search = GridSearchCV(svc, param_grid, cv=5)
grid_search.fit(X_train_scaled, y_train)

# Get the best parameters
best_params = grid_search.best_params_

# Train the tuned classifier on the training data
tuned_svc = SVC(kernel='poly', **best_params)
tuned_svc.fit(X_train_scaled, y_train)

# Save the trained classifier to a file
joblib.dump(tuned_svc, 'tuned_svc_model.pkl')

# Use the tuned classifier to predict labels of the testing data
y_pred = tuned_svc.predict(X_test_scaled)

# Evaluate the performance of the classifier (e.g., accuracy)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

Accuracy: 0.9666666666666667
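
The saved file contains only the classifier, so anyone loading it would also need the fitted scaler. One possible way around this, sketched below, is to bundle both steps in a Scikit-learn pipeline and persist that instead. The hyperparameter values and the file name are placeholders for illustration, not the tuned results from the grid search.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Bundle scaling and the classifier so both are saved together.
# Hyperparameters here are placeholders, not the grid-search results.
model = make_pipeline(StandardScaler(), SVC(kernel='poly', degree=3, C=1.0))
model.fit(X_train, y_train)

# Hypothetical file name inside a temporary directory
path = os.path.join(tempfile.mkdtemp(), 'svc_pipeline.pkl')
joblib.dump(model, path)

# Reload and confirm the restored pipeline gives identical predictions
reloaded = joblib.load(path)
same = np.array_equal(model.predict(X_test), reloaded.predict(X_test))
print("Reloaded model reproduces predictions:", same)
```

The reloaded pipeline accepts raw, unscaled features, since the scaling step travels with the model.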
