In [None]:
#q1

In [None]:
Polynomial functions and kernel functions are both types of mathematical functions that can be used in machine learning algorithms, particularly in support vector machines (SVMs) and kernel-based methods.

A polynomial feature map explicitly transforms the input features into a higher-dimensional space by adding powers and products of the original features (for two features and degree 2: x1^2, x1*x2 and x2^2 alongside x1 and x2).
A linear model trained on these expanded features produces a decision boundary that is nonlinear in the original space, so it can capture nonlinear relationships between the input features and the target variable.
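The sketch below illustrates the explicit route. It is not part of the original assignment. It uses scikit-learn's `PolynomialFeatures` on the synthetic `make_circles` dataset, whose two classes cannot be separated by a straight line. A linear SVM is trained once on the raw two features and once on their degree-2 expansion. The dataset and its settings are illustrative choices.

```python
from sklearn.datasets import make_circles
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.svm import SVC

# Two concentric circles: not separable by a straight line in the original 2-D space
X, y = make_circles(n_samples=400, noise=0.05, factor=0.5, random_state=0)

# Linear SVM on the raw features
linear_acc = SVC(kernel='linear').fit(X, y).score(X, y)

# Explicit degree-2 expansion (1, x0, x1, x0^2, x0*x1, x1^2), then a linear SVM
poly_model = make_pipeline(PolynomialFeatures(degree=2), SVC(kernel='linear'))
poly_acc = poly_model.fit(X, y).score(X, y)

print(f"raw features:      training accuracy = {linear_acc:.3f}")
print(f"degree-2 features: training accuracy = {poly_acc:.3f}")
```

The outer circle has x0^2 + x1^2 close to 1 and the inner circle close to 0.25. The squared terms alone therefore make the classes linearly separable in the expanded space.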

Kernel functions, on the other hand, measure the similarity between pairs of data points.
A kernel is evaluated directly on the original features, but its value equals an inner product of the two points in some (possibly much higher-dimensional) feature space.
An algorithm that only needs inner products, such as an SVM, can therefore learn complex, nonlinear relationships between the input features and the target variable without ever computing the high-dimensional features explicitly (the "kernel trick").

Some kernels are built directly from polynomials. The polynomial kernel K(x, z) = (gamma * <x, z> + r)^d equals an inner product of (suitably weighted) monomials of the features of degree up to d, so it implicitly works in the same kind of space as an explicit polynomial expansion.
In this way, kernel functions and polynomial functions are related: both lead to a higher-dimensional feature space, but kernels reach it implicitly, which is usually far cheaper computationally when the degree or the number of features is large.
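This equivalence can be checked numerically. The sketch assumes the homogeneous degree-2 case (gamma = 1, r = 0), where the explicit feature map is simply all pairwise products x_i * x_j. scikit-learn's `polynomial_kernel` computes (gamma * <x, z> + coef0)^degree. The random data is made up for the check.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))  # 5 points with 3 features

def phi(x):
    # Explicit degree-2 feature map: every product x_i * x_j (3 * 3 = 9 features)
    return np.outer(x, x).ravel()

# Explicit route: build the 9-D features, then take ordinary inner products
Phi = np.array([phi(x) for x in X])
K_explicit = Phi @ Phi.T

# Implicit route: (gamma * <x, z> + coef0) ** degree with gamma=1, coef0=0
K_implicit = polynomial_kernel(X, degree=2, gamma=1.0, coef0=0.0)

print(np.allclose(K_explicit, K_implicit))  # True
```

The explicit map needs d^2 features for d inputs, and far more at higher degrees. The kernel needs only one d-dimensional dot product per pair, which is where the computational saving comes from.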

In [None]:
#q2

In [2]:
from sklearn.svm import SVC
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Generate some random data to work with
X, y = make_classification(n_samples=2000, n_features=10, n_informative=5, random_state=42)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create an instance of the SVM classifier with a polynomial kernel
svm_poly = SVC(kernel='poly', degree=3, gamma='scale', C=1.0)

# Train the SVM classifier on the training data
svm_poly.fit(X_train, y_train)

# Test the SVM classifier on the testing data and evaluate its performance
score = svm_poly.score(X_test, y_test)
print("Accuracy:", score)


Accuracy: 0.855
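A single train/test split gives only one accuracy number. As an extension not in the original solution, you could scale the features and compare several polynomial degrees with 5-fold cross-validation on the same synthetic data. Setting `coef0=1.0` is an assumption that lets the kernel include lower-degree terms; the original uses the default of 0. For that reason, the resulting scores are not directly comparable with the 0.855 above.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Same synthetic dataset as above
X, y = make_classification(n_samples=2000, n_features=10, n_informative=5, random_state=42)

cv_means = {}
for degree in [2, 3, 4]:
    # Scale features first; polynomial kernels are sensitive to feature magnitudes
    model = make_pipeline(
        StandardScaler(),
        SVC(kernel='poly', degree=degree, gamma='scale', coef0=1.0, C=1.0),
    )
    scores = cross_val_score(model, X, y, cv=5)
    cv_means[degree] = scores.mean()
    print(f"degree={degree}: mean CV accuracy = {cv_means[degree]:.3f}")
```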


In [None]:
#q3

In [None]:
In Support Vector Regression (SVR), the epsilon parameter sets the width of the epsilon-insensitive tube around the regression function. Training points whose residual |y - f(x)| is at most epsilon incur no loss.
An epsilon value of 0 means there is no tube, so every deviation between a target and its prediction is penalized.
As epsilon increases, the tube widens and more training points fall inside it without any penalty.
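The epsilon-insensitive loss that defines this tube is max(0, |y - f(x)| - epsilon). The minimal NumPy sketch below uses made-up targets and predictions. It shows that residuals inside the tube cost nothing, and that epsilon = 0 reduces the loss to the plain absolute error:

```python
import numpy as np

def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    # Zero inside the tube |y - f(x)| <= epsilon, grows linearly outside it
    return np.maximum(0.0, np.abs(y_true - y_pred) - epsilon)

y_true = np.array([1.0, 1.0, 1.0, 1.0])
y_pred = np.array([1.0, 1.05, 1.2, 0.5])  # residuals 0, 0.05, 0.2, 0.5

loss_wide = epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1)
loss_zero = epsilon_insensitive_loss(y_true, y_pred, epsilon=0.0)  # plain absolute error
print(loss_wide)
print(loss_zero)
```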

When the value of epsilon is increased, the number of support vectors generally decreases.
In SVR, the support vectors are the training points that lie on or outside the boundary of the tube; points strictly inside the tube have zero dual coefficients and do not affect the fitted function. A wider tube leaves fewer points on or outside it, so the model becomes sparser and usually smoother.
The exact relationship between epsilon and the number of support vectors still depends on the specific dataset and on the other hyperparameters of the SVR (C, the kernel and its parameters).
Because the fitted function itself changes with epsilon, the count is not guaranteed to fall at every step, but the overall trend is downward.
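You can measure this effect directly, because a fitted scikit-learn `SVR` stores the indices of its support vectors in `support_`. This sketch assumes a synthetic noisy sine dataset and an RBF kernel with `C=1.0`; the epsilon values are arbitrary choices.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic 1-D regression data: a noisy sine curve (illustrative assumption)
rng = np.random.default_rng(0)
X = rng.uniform(0, 5, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

counts = {}
for eps in [0.0, 0.05, 0.1, 0.2, 0.5]:
    svr = SVR(kernel='rbf', C=1.0, epsilon=eps).fit(X, y)
    counts[eps] = len(svr.support_)  # number of support vectors
    print(f"epsilon={eps}: {counts[eps]} support vectors")
```

On data like this, the count is typically close to the full training set at epsilon = 0. It is much smaller at epsilon = 0.5, because most residuals then fit inside the tube. The exact numbers depend on the data and on C and gamma.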

In [None]:
#q4

In [None]:
The performance of Support Vector Regression (SVR) is highly dependent on the choice of kernel function, C parameter, epsilon parameter, and gamma parameter. Here is how each parameter works and how changing its value can affect the performance of SVR:

1. Kernel Function: The kernel function implicitly maps the original data to a higher-dimensional feature space. A linear regression function in that space can capture nonlinear relationships in the original space. The choice of kernel function can have a significant impact on the performance of SVR. Popular kernel functions include linear, polynomial, and radial basis function (RBF). A linear kernel suits data where the target depends roughly linearly on the features, while polynomial and RBF kernels are used for nonlinear relationships.

2. C Parameter: The C parameter controls the trade-off between the flatness (simplicity) of the regression function and how much the training errors larger than epsilon are penalized. A low value of C penalizes those errors lightly, giving a flatter, simpler and more generalized model that may underfit. A high value of C penalizes them heavily, so the model tries to fit the training data closely, becomes more complex, and may overfit. Therefore, the choice of C parameter is a balance between underfitting and overfitting.

3. Epsilon Parameter: The epsilon parameter controls the width of the epsilon-insensitive zone around the regression line in SVR. The epsilon-insensitive zone allows some errors in the prediction, as long as they are within the zone. A larger value of epsilon creates a wider zone, allowing more errors in the prediction but resulting in a simpler and more generalized model. Conversely, a smaller value of epsilon creates a narrower zone, allowing fewer errors in the prediction, resulting in a more complex model that may overfit the training data.

4. Gamma Parameter: The gamma parameter sets how far the influence of a single training example reaches in the RBF kernel (in scikit-learn it is also used by the polynomial and sigmoid kernels). A low value of gamma gives each point a wide influence and produces a smoother regression function, which may underfit if gamma is too low. A high value of gamma makes each point's influence very local and produces a more wiggly function that may overfit the training data. Therefore, the choice of gamma parameter is a balance between underfitting and overfitting.
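To see the underfitting/overfitting trade-off for one parameter, scikit-learn's `validation_curve` can report training and cross-validated R^2 over a range of gamma values. The noisy sine data and the fixed C and epsilon below are illustrative assumptions, not values from the original text.

```python
import numpy as np
from sklearn.model_selection import validation_curve
from sklearn.svm import SVR

# Synthetic 1-D regression data: a noisy sine curve (illustrative assumption)
rng = np.random.default_rng(0)
X = rng.uniform(0, 5, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

gammas = np.logspace(-3, 2, 6)  # 0.001, 0.01, ..., 100
train_scores, test_scores = validation_curve(
    SVR(kernel='rbf', C=1.0, epsilon=0.1), X, y,
    param_name='gamma', param_range=gammas, cv=5, scoring='r2',
)

# Average over the 5 folds for each gamma value
for g, tr, te in zip(gammas, train_scores.mean(axis=1), test_scores.mean(axis=1)):
    print(f"gamma={g:g}: train R^2={tr:.3f}, validation R^2={te:.3f}")
```

Very small gamma values typically score poorly on both training and validation folds, which signals underfitting. A widening gap between training and validation scores at large gamma is the usual sign of overfitting.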

Here are some examples of when you might want to increase or decrease each parameter:

1. Kernel Function: If the relationship between the features and the target is approximately linear, a linear kernel is a good starting point. For nonlinear relationships, a polynomial or RBF kernel can be used depending on the complexity of the dataset.

2. C Parameter: If the model is underfitting, the value of C should be increased so that errors outside the epsilon tube are penalized more heavily and the model fits the training data more closely. If the model is overfitting, the value of C should be decreased to allow a flatter, more regularized regression function.

3. Epsilon Parameter: If the model is underfitting, the value of epsilon should be decreased to create a narrower epsilon-insensitive zone and tolerate fewer errors in the prediction. If the model is overfitting, the value of epsilon should be increased to create a wider epsilon-insensitive zone and allow more errors in the prediction.

4. Gamma Parameter: If the model is underfitting, the value of gamma should be increased to allow a more flexible regression function. If the model is overfitting, the value of gamma should be decreased to obtain a smoother function. Because too high a gamma leads to overfitting and too low a gamma to underfitting, gamma is usually tuned together with C, for example with a grid search.
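In practice these parameters are usually tuned together rather than one at a time. The sketch below uses the same kind of illustrative synthetic data and searches kernel, C, epsilon and gamma with `GridSearchCV`. Scaling happens inside a `Pipeline`, so each fold is scaled using only its own training part. The grid values are assumptions. A list of dictionaries is used so that gamma is searched only for the RBF kernel.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic 1-D regression data: a noisy sine curve (illustrative assumption)
rng = np.random.default_rng(0)
X = rng.uniform(0, 5, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

pipe = Pipeline([('scaler', StandardScaler()), ('svr', SVR())])

# gamma only matters for the RBF kernel, so it is not searched for 'linear'
param_grid = [
    {'svr__kernel': ['linear'], 'svr__C': [0.1, 1, 10], 'svr__epsilon': [0.01, 0.1, 0.5]},
    {'svr__kernel': ['rbf'], 'svr__C': [0.1, 1, 10], 'svr__epsilon': [0.01, 0.1, 0.5],
     'svr__gamma': [0.1, 1, 10]},
]

grid = GridSearchCV(pipe, param_grid, cv=5, scoring='r2')
grid.fit(X, y)
print("Best parameters:", grid.best_params_)
print("Best CV R^2:", round(grid.best_score_, 3))
```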

In [3]:
#q5

In [7]:
# Import necessary libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV
import pickle
import seaborn as sns
# Load the dataset
df = sns.load_dataset('iris')

# Split dataset into features and labels
X = df.iloc[:, :-1].values
y = df.iloc[:, -1].values

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=45)

# Preprocess the data using StandardScaler
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Create an instance of the SVC classifier and train it on the training data
clf = SVC()
clf.fit(X_train, y_train)

# Predict the labels of the testing data
y_pred = clf.predict(X_test)

# Evaluate the performance of the classifier using accuracy score
acc_score = accuracy_score(y_test, y_pred)
print("Accuracy Score:", acc_score)

# Tune the hyperparameters of the SVC classifier using GridSearchCV
param_grid = {'C': [0.1, 1, 10], 'kernel': ['linear', 'poly', 'rbf'], 'degree': [2, 3, 4]}
grid = GridSearchCV(SVC(), param_grid, refit=True, verbose=3)
grid.fit(X_train, y_train)

# Print the best parameters and the best score
print("Best Parameters:", grid.best_params_)
print("Best Score:", grid.best_score_)

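# Refit the scaler and the tuned classifier on the entire dataset,
# so that the saved model expects features scaled the same way it was tuned on
final_scaler = StandardScaler()
X_all_scaled = final_scaler.fit_transform(X)
clf_tuned = grid.best_estimator_
clf_tuned.fit(X_all_scaled, y)

# Save the scaler together with the classifier; new data must be
# transformed with the saved scaler before calling predict
filename = 'svm_classifier.pkl'
with open(filename, 'wb') as f:
    pickle.dump({'scaler': final_scaler, 'classifier': clf_tuned}, f)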


Accuracy Score: 0.9733333333333334
Fitting 5 folds for each of 27 candidates, totalling 135 fits
[CV 1/5] END ....C=0.1, degree=2, kernel=linear;, score=1.000 total time=   0.0s
[CV 2/5] END ....C=0.1, degree=2, kernel=linear;, score=0.933 total time=   0.0s
[CV 3/5] END ....C=0.1, degree=2, kernel=linear;, score=0.933 total time=   0.0s
[CV 4/5] END ....C=0.1, degree=2, kernel=linear;, score=0.867 total time=   0.0s
[CV 5/5] END ....C=0.1, degree=2, kernel=linear;, score=1.000 total time=   0.0s
[CV 1/5] END ......C=0.1, degree=2, kernel=poly;, score=0.800 total time=   0.0s
[CV 2/5] END ......C=0.1, degree=2, kernel=poly;, score=0.800 total time=   0.0s
[CV 3/5] END ......C=0.1, degree=2, kernel=poly;, score=0.467 total time=   0.0s
[CV 4/5] END ......C=0.1, degree=2, kernel=poly;, score=0.600 total time=   0.0s
[CV 5/5] END ......C=0.1, degree=2, kernel=poly;, score=0.733 total time=   0.0s
[CV 1/5] END .......C=0.1, degree=2, kernel=rbf;, score=0.867 total time=   0.0s
[CV 2/5] END