### 1.

Polynomial functions and kernel functions are both mathematical tools commonly used in machine learning algorithms, but they serve different purposes.

Polynomial functions are a type of mathematical function that involves powers of a variable raised to non-negative integers. In machine learning, polynomial functions are often used as basis functions to model nonlinear relationships between input features and the output variable. They can be employed in various algorithms, such as polynomial regression or polynomial expansion in support vector machines (SVMs).

Kernel functions, on the other hand, are an integral part of many machine learning algorithms, particularly those involving kernel methods. Kernel methods aim to find a nonlinear decision boundary in a high-dimensional feature space by implicitly mapping the input data into that space using a kernel function. The kernel function measures the similarity or inner product between pairs of data points without explicitly computing the transformation. This avoids the computational burden associated with high-dimensional feature spaces.

While polynomial functions can be used as basis functions in kernel methods, the term "kernel function" typically refers to a specific type of function used in kernel methods, such as the radial basis function (RBF) kernel or the polynomial kernel. These kernel functions define the similarity measure between data points and enable the mapping into a higher-dimensional feature space.

### 2.

To implement an SVM (Support Vector Machine) with a polynomial kernel in Python using Scikit-learn, you can follow these steps:

Step 1: Import the necessary libraries

Step 2: Prepare your data

Step 3: Split the data into training and testing sets

Step 4: Create an SVM model with polynomial kernel

Step 5: Train the SVM model

Step 6: Make predictions

Step 7: Evaluate the model

### 3.

In Support Vector Regression (SVR), the value of epsilon determines the width of the margin around the predicted regression function. It represents the threshold within which errors are considered acceptable and do not contribute to the model's loss function.

When you increase the value of epsilon, you allow for a wider margin around the regression function, meaning that more data points can fall within the margin without penalty. This wider margin allows for more flexibility in fitting the data and can result in a larger number of support vectors.

Support vectors are the data points that lie on the margin or violate the margin (i.e., fall within the margin or have a prediction error larger than epsilon). These points play a crucial role in defining the regression function and determining the model's complexity.

With a larger epsilon, the margin expands, accommodating more data points. Consequently, more points can become support vectors as they either lie within the wider margin or cross the threshold for acceptable error. Therefore, increasing the value of epsilon generally leads to an increase in the number of support vectors in SVR.

### 4.

Support Vector Regression (SVR) is a machine learning algorithm that utilizes support vector machines (SVMs) for regression tasks. SVR involves several important parameters: kernel function, C parameter, epsilon parameter, and gamma parameter. Each parameter has a specific role in determining the performance and behavior of the SVR model.

Kernel Function:
The kernel function determines the type of decision boundary used by the SVR model. It maps the input data into a higher-dimensional feature space, where it becomes easier to find a linear separation between the data points. The choice of kernel function affects the complexity and flexibility of the model. Commonly used kernel functions are:
a. Linear Kernel: This kernel results in a linear decision boundary. It is computationally efficient and suitable when the relationship between features and target is expected to be linear.
b. Polynomial Kernel: This kernel allows for non-linear decision boundaries using polynomial functions. The degree of the polynomial can be specified.
c. Radial Basis Function (RBF) Kernel: This kernel is commonly used and can capture complex non-linear relationships. It is defined by a gamma parameter (explained below).

C Parameter:
The C parameter controls the trade-off between the model's complexity (flexibility) and the extent to which errors are tolerated. It determines the penalty for misclassifying or misfitting training examples. A higher C value enforces a stricter margin and reduces the number of training errors, potentially leading to overfitting. On the other hand, a lower C value allows more errors and provides a larger margin, potentially leading to underfitting. The C parameter should be selected based on the desired balance between model complexity and generalization. If overfitting is observed, reducing C can be beneficial, and vice versa.

Epsilon Parameter:
The epsilon parameter (ε) determines the width of the ε-insensitive tube around the regression line. It represents the allowable deviation from the actual target value for a data point to be considered accurately predicted. Data points within this tube do not contribute to the error term. Larger epsilon values result in a wider tube, allowing more training examples to be considered accurate. Smaller epsilon values lead to a narrower tube, making the model more sensitive to deviations. The choice of epsilon depends on the acceptable level of error tolerance in the regression predictions. If the task requires more precise predictions, a smaller epsilon value should be used.

Gamma Parameter:
The gamma parameter affects the influence of each training example on the decision boundary. It defines the inverse of the standard deviation of the RBF kernel. A small gamma value implies a wider bell-shaped curve for each training example, resulting in a smoother decision boundary. In contrast, a larger gamma value makes each example's influence more localized, leading to a more complex and possibly overfitted decision boundary. Generally, a smaller gamma value is preferred when the dataset has a larger spread or when there are many training examples, while a larger gamma value is suitable for datasets with smaller spread or fewer training examples.

Examples of when you might want to increase or decrease the parameter values:

Kernel Function:

Increase flexibility: Choose a non-linear kernel (e.g., polynomial or RBF) when the relationship between features and target is expected to be non-linear.
Decrease complexity: Use a linear kernel when a linear relationship is anticipated.
C Parameter:

Increase complexity: Use a higher C value when you want to minimize training errors and capture complex patterns in the data.
Decrease complexity: Reduce C if the model is overfitting or if you have limited training data.
Epsilon Parameter:

Increase tolerance: Use a larger epsilon value if the task allows for more deviation in the predicted outputs.
Decrease tolerance: Select a smaller epsilon value for more precise predictions or when strict adherence to the actual target values is required.
Gamma Parameter:

Increase localization: Use a larger gamma value when the decision boundary should be influenced by nearby data points only.
Decrease localization: Choose a smaller gamma value to consider a wider range of training examples for determining the decision boundary.

### 5.

In [2]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [3]:
iris = load_iris()

In [4]:
X = iris.data
y = iris.target

In [5]:
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.2, random_state=42)

In [6]:
from sklearn.svm import SVC

In [10]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

In [13]:
from sklearn.preprocessing import StandardScaler

In [14]:
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

In [15]:
svc= SVC(kernel='linear')

In [29]:
svc.fit(X_train_scaled,y_train)

SVC(kernel='linear')

In [30]:
y_pred = svc.predict(X_test)

In [31]:
accuracy = accuracy_score(y_test,y_pred)
precision = precision_score(y_test,y_pred, average='micro')
recall = recall_score(y_test,y_pred, average='micro')
f1 = f1_score(y_test,y_pred, average='micro')

In [32]:
print('Accuracy', accuracy)
print('Precision Score', precision)
print('Recall Score', recall)
print('F1 Score', f1)

Accuracy 0.36666666666666664
Precision Score 0.36666666666666664
Recall Score 0.36666666666666664
F1 Score 0.36666666666666664


In [33]:
from sklearn.model_selection import GridSearchCV

In [34]:
# defining parameter range

param_grid = {'C': [0.1, 1, 10, 100, 1000],
              'gamma': [1, 0.1, 0.01, 0.001, 0.0001],
              'kernel':['linear']
              }

In [35]:
grid=GridSearchCV(SVC(),param_grid=param_grid,refit=True,cv=5,verbose=3)

In [36]:
grid.fit(X_train,y_train)

Fitting 5 folds for each of 25 candidates, totalling 125 fits
[CV 1/5] END .....C=0.1, gamma=1, kernel=linear;, score=1.000 total time=   0.0s
[CV 2/5] END .....C=0.1, gamma=1, kernel=linear;, score=0.917 total time=   0.0s
[CV 3/5] END .....C=0.1, gamma=1, kernel=linear;, score=0.833 total time=   0.0s
[CV 4/5] END .....C=0.1, gamma=1, kernel=linear;, score=1.000 total time=   0.0s
[CV 5/5] END .....C=0.1, gamma=1, kernel=linear;, score=0.958 total time=   0.0s
[CV 1/5] END ...C=0.1, gamma=0.1, kernel=linear;, score=1.000 total time=   0.0s
[CV 2/5] END ...C=0.1, gamma=0.1, kernel=linear;, score=0.917 total time=   0.0s
[CV 3/5] END ...C=0.1, gamma=0.1, kernel=linear;, score=0.833 total time=   0.0s
[CV 4/5] END ...C=0.1, gamma=0.1, kernel=linear;, score=1.000 total time=   0.0s
[CV 5/5] END ...C=0.1, gamma=0.1, kernel=linear;, score=0.958 total time=   0.0s
[CV 1/5] END ..C=0.1, gamma=0.01, kernel=linear;, score=1.000 total time=   0.0s
[CV 2/5] END ..C=0.1, gamma=0.01, kernel=linear

GridSearchCV(cv=5, estimator=SVC(),
             param_grid={'C': [0.1, 1, 10, 100, 1000],
                         'gamma': [1, 0.1, 0.01, 0.001, 0.0001],
                         'kernel': ['linear']},
             verbose=3)

In [37]:
grid.best_params_

{'C': 1, 'gamma': 1, 'kernel': 'linear'}

In [38]:
svc_tune = SVC(C= 1, gamma= 1, kernel= 'linear')

In [40]:
svc_tune.fit(X_train_scaled,y_train)

SVC(C=1, gamma=1, kernel='linear')

In [41]:
svc_tune.score(X_test,y_test)

0.36666666666666664

In [42]:
import joblib

joblib.dump(svc_tune, 'trained_classifier.pkl')

['trained_classifier.pkl']