## Question 1

Polynomial functions and kernel functions both appear in machine learning algorithms for classification and regression. They play different roles, but they are linked through the polynomial kernel, which is a kernel function built from a polynomial of the dot product of two inputs.

A polynomial function is a sum of terms of the form ax^n, where a is a constant, x is the variable, and n is a non-negative integer. Polynomial functions can model nonlinear relationships in data, but expanding the inputs into every polynomial term explicitly becomes expensive as the degree and the number of features grow.

Kernel functions, on the other hand, compute the similarity (an inner product) between two data points as if they had been mapped into a higher-dimensional feature space, without constructing that space explicitly. In that space a linear model can fit patterns that are nonlinear in the original features. Kernel functions can be used with a variety of machine learning algorithms, including support vector machines (SVMs) and kernelized regression models.

One specific type of kernel function is the polynomial kernel, K(x, y) = (gamma * x·y + c)^d, where d is the degree and c is a constant offset (`degree` and `coef0` in scikit-learn). For c > 0, its value equals the dot product of x and y after both have been mapped into a feature space that contains all monomials of the original features up to degree d, so those features never have to be computed explicitly. In this way, the polynomial kernel lets SVMs and other kernel-based algorithms capture nonlinear patterns in the data using polynomial functions.
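The equivalence between the polynomial kernel and a dot product in an expanded feature space can be checked numerically. The sketch below assumes 2-D inputs, degree 2, and an offset of 1; the helper names `poly_kernel` and `explicit_features` are illustrative and not part of any library.

```python
import numpy as np

def poly_kernel(x, y, degree=2, coef0=1.0):
    """Polynomial kernel (x . y + coef0) ** degree, computed in the input space."""
    return (np.dot(x, y) + coef0) ** degree

def explicit_features(x):
    """Degree-2 feature map for a 2-D input.

    The sqrt(2) factors make the dot product of two mapped vectors
    equal to (x . y + 1) ** 2.
    """
    x1, x2 = x
    s = np.sqrt(2.0)
    return np.array([1.0, s * x1, s * x2, x1 ** 2, x2 ** 2, s * x1 * x2])

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

# Kernel evaluated directly on the 2-D inputs
kernel_value = poly_kernel(x, y)
# Same quantity computed as a dot product in the 6-D feature space
feature_space_value = np.dot(explicit_features(x), explicit_features(y))

print(kernel_value, feature_space_value)
```

Both values equal 4 up to floating-point rounding. The kernel gets there with a single 2-D dot product, while the explicit route needs six features per point, and that gap widens quickly with the degree and the number of input features.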

## Question 2

In [1]:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Load iris and keep only the first two features (sepal length and sepal width)
iris = datasets.load_iris()
X = iris.data[:, :2]
y = iris.target

# Stratified 70/30 split so each class keeps its proportion in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y)

# Scale the features, then fit an SVM with a degree-3 polynomial kernel
poly_kernel_svm = make_pipeline(StandardScaler(),
                                SVC(kernel='poly', degree=3, coef0=1, C=5))
poly_kernel_svm.fit(X_train, y_train)

print("Accuracy on training set: {:.3f}".format(poly_kernel_svm.score(X_train, y_train)))
print("Accuracy on testing set: {:.3f}".format(poly_kernel_svm.score(X_test, y_test)))

Accuracy on training set: 0.886
Accuracy on testing set: 0.689


## Question 3

In Support Vector Regression (SVR), epsilon is a hyperparameter that sets the half-width of the tube around the regression function inside which errors are not penalized: the loss for a point is max(0, |y - f(x)| - epsilon). Training points that lie strictly inside the tube contribute no loss and are not support vectors; only points on or outside the tube boundary become support vectors.

In general, increasing the value of epsilon therefore decreases the number of support vectors, because a wider tube covers more data points. Decreasing epsilon has the opposite effect: more points fall on or outside the tube and become support vectors.

However, the size of the effect depends on the noise level and the distribution of the data points. If most residuals are already much smaller than epsilon, increasing it further changes little, while for epsilon values close to the typical noise level a small change can add or remove many support vectors.

It is important to note that a large number of support vectors increases the computational cost of the SVR model, because the kernel function has to be evaluated against every support vector when making predictions. A small epsilon therefore tends to give a denser model with slower prediction, while a large epsilon gives a sparser, faster model at the risk of underfitting.
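The relationship between epsilon and the number of support vectors can be observed directly with scikit-learn's `SVR`. The following is a minimal sketch on synthetic data (a noisy sine curve chosen purely for illustration); the specific epsilon values and noise level are assumptions, not values from the question.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic 1-D regression problem: sine curve plus Gaussian noise (illustrative only)
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, size=(200, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

sv_counts = {}
for eps in [0.01, 0.1, 0.5]:
    model = SVR(kernel="rbf", C=1.0, epsilon=eps).fit(X, y)
    # support_ holds the indices of the training points that are support vectors
    sv_counts[eps] = len(model.support_)
    print(f"epsilon={eps}: {sv_counts[eps]} support vectors out of {len(X)}")
```

On data like this, the count should drop as epsilon grows past the noise level, since more of the residuals fit inside the tube.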

## Question 4

The performance of Support Vector Regression (SVR) is heavily dependent on the choice of hyperparameters such as the kernel function, C parameter, epsilon parameter, and gamma parameter. In this answer, we will explain each of these parameters and how they affect the performance of SVR, as well as provide examples of when you might want to increase or decrease their value.

1. Kernel function: The kernel function implicitly maps the input data into a higher-dimensional feature space, where a linear function can fit relationships that are nonlinear in the original features. The choice of kernel determines the shape of the fitted regression function, which in turn affects the accuracy and complexity of the SVR model. Some commonly used kernel functions in SVR include:

  - Linear kernel: This kernel function is simply the dot product of the input vectors, which corresponds to a linear regression function. It is computationally efficient but may not work well if the relationship between the features and the target is not linear.

  - Polynomial kernel: This kernel function raises the dot product of the input vectors (plus an optional constant) to a certain power, which allows the model to fit curved relationships between the features and the target. The degree of the polynomial can be specified as a hyperparameter.

  - Radial Basis Function (RBF) kernel: This kernel function uses a Gaussian of the Euclidean distance between two input vectors to measure their similarity, so nearby points have a kernel value close to 1 and distant points a value close to 0. It allows for a very flexible regression function that can handle non-linear relationships between the features. The gamma parameter controls the width of the Gaussian function, which affects the smoothness of the fitted function.

    The choice of kernel function depends on the complexity and non-linearity of the data. If the target is approximately a linear function of the features, a linear kernel may be sufficient, while if the data has non-linear relationships between the features and the target, a polynomial or RBF kernel may be more appropriate.


2. C parameter: The C parameter controls the trade-off between keeping the model simple (flat) and penalizing training points that fall outside the epsilon-tube. A small value of C penalizes those deviations only weakly, so the model is regularized more strongly and tolerates more training error, which may generalize better to the testing data. On the other hand, a large value of C penalizes deviations heavily, so the model follows the training data closely and may overfit and perform poorly on the testing data.

    The choice of C parameter depends on the complexity and noise level of the data. If the data has a high degree of noise, a small value of C may be appropriate to allow for more training errors and reduce overfitting. However, if the data has a clear signal and a low degree of noise, a larger value of C may be appropriate to achieve a better fit to the training data.


3. Epsilon parameter: The epsilon parameter defines the margin of tolerance around the predicted output for each data point. It determines the width of the tube around the regression function: points inside the tube incur no penalty, and only points outside it count as errors. A larger value of epsilon ignores larger deviations, which typically results in fewer support vectors and a simpler model. On the other hand, a smaller value of epsilon requires a tighter fit to the training data, which typically results in more support vectors and a more complex model.

    The choice of epsilon parameter depends on the trade-off between simplicity and accuracy. If the data is noisy or contains outliers, a larger value of epsilon may be appropriate to allow for more errors in the training set and reduce the impact of the outliers. However, if the data is clean and has a clear signal, a smaller value of epsilon may be appropriate to achieve a better fit to the training data.


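4. Gamma parameter: The gamma parameter controls the shape of the fitted function when the RBF kernel, K(x, x') = exp(-gamma * ||x - x'||^2), is used. It determines the width of the Gaussian function used to measure the similarity between input vectors, and a larger value of gamma gives a narrower Gaussian, so each training point influences only its close neighbourhood and the model can overfit. A smaller value of gamma gives a wider Gaussian and a smoother function that may underfit.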
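In practice, C, epsilon, and gamma interact, so they are usually tuned together rather than one at a time. The sketch below searches a small grid with cross-validation; the synthetic data and the grid values are assumptions chosen for illustration, not recommendations.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic 1-D regression problem (illustrative only)
rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(150, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=150)

# Scaling matters for RBF kernels because gamma acts on squared distances
pipeline = make_pipeline(StandardScaler(), SVR(kernel="rbf"))

# make_pipeline names the SVR step "svr", hence the "svr__" prefix
param_grid = {
    "svr__C": [0.1, 1, 10],
    "svr__epsilon": [0.01, 0.1, 0.5],
    "svr__gamma": [0.1, 1, 10],
}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring="r2")
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best cross-validated R^2: {:.3f}".format(search.best_score_))
```

Putting the scaler inside the pipeline means it is refit on each training fold, so the cross-validated score is not influenced by statistics from the held-out fold.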


## Question 5

In [6]:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix,accuracy_score,classification_report

In [7]:
df = pd.read_csv("candy-data.csv")
df.head()

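| | competitorname | chocolate | fruity | caramel | peanutyalmondy | nougat | crispedricewafer | hard | bar | pluribus | sugarpercent | pricepercent | winpercent |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 100 Grand | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0.732 | 0.86 | 66.971725 |
| 1 | 3 Musketeers | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0.604 | 0.511 | 67.602936 |
| 2 | One dime | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.011 | 0.116 | 32.261086 |
| 3 | One quarter | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.011 | 0.511 | 46.116505 |
| 4 | Air Heads | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.906 | 0.511 | 52.341465 |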


In [11]:
df.isnull().sum()

competitorname      0
chocolate           0
fruity              0
caramel             0
peanutyalmondy      0
nougat              0
crispedricewafer    0
hard                0
bar                 0
pluribus            0
sugarpercent        0
pricepercent        0
winpercent          0
dtype: int64

In [8]:
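# Features: every column after 'chocolate' (drops the candy name and the target itself)
x=df.iloc[:,2:]
# Target: the binary 'chocolate' column
y=df["chocolate"]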

In [10]:
xtr,xte,ytr,yte=train_test_split(x,y,test_size=0.2,random_state=42)

In [12]:
from sklearn.preprocessing import StandardScaler

In [14]:
scaler=StandardScaler()

In [15]:
xtr=scaler.fit_transform(xtr)
xte=scaler.transform(xte)

In [16]:
clf=SVC()

In [17]:
clf.fit(xtr,ytr)

In [18]:
ypred=clf.predict(xte)

In [19]:
print(confusion_matrix(yte,ypred))
print("-------------------------------------")
print(accuracy_score(yte,ypred))
print("-------------------------------------")
print(classification_report(yte,ypred))

[[9 0]
 [2 6]]
-------------------------------------
0.8823529411764706
-------------------------------------
              precision    recall  f1-score   support

           0       0.82      1.00      0.90         9
           1       1.00      0.75      0.86         8

    accuracy                           0.88        17
   macro avg       0.91      0.88      0.88        17
weighted avg       0.90      0.88      0.88        17



In [20]:
from sklearn.model_selection import GridSearchCV

In [25]:
parameters={"C":[0.001,0.01,0.1,1,10,100],
            "kernel":['linear', 'poly', 'rbf', 'sigmoid'],
            "gamma":['scale', 'auto'],
            "degree":[1,2,3,4,5,6,7,8,9]}

In [32]:
clf1=GridSearchCV(SVC(),param_grid=parameters,refit=True)

In [33]:
clf1.fit(xtr,ytr)

In [34]:
y_pred=clf1.predict(xte)

In [35]:
clf1.best_params_

{'C': 0.01, 'degree': 1, 'gamma': 'scale', 'kernel': 'linear'}
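A single dictionary grid crosses every parameter with every kernel, even though `degree` only affects the `poly` kernel and `gamma` is ignored by the `linear` kernel. This is why the best parameters above report a `degree` alongside a linear kernel. `GridSearchCV` also accepts a list of dictionaries, which keeps each kernel paired only with the parameters it uses. The sketch below uses synthetic data from `make_classification` as a stand-in for the candy data, and the grid values are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Stand-in data so the example is self-contained (not the candy dataset)
X, y = make_classification(n_samples=120, n_features=6, random_state=0)

c_values = [0.01, 0.1, 1, 10]
param_grid = [
    # Linear kernel: only C matters
    {"kernel": ["linear"], "C": c_values},
    # RBF and sigmoid kernels: C and gamma
    {"kernel": ["rbf", "sigmoid"], "C": c_values, "gamma": ["scale", "auto"]},
    # Polynomial kernel: C, gamma, and degree
    {"kernel": ["poly"], "C": c_values, "gamma": ["scale", "auto"], "degree": [2, 3, 4]},
]

search = GridSearchCV(SVC(), param_grid=param_grid, cv=5)
search.fit(X, y)

n_candidates = len(search.cv_results_["params"])
print("Candidates evaluated:", n_candidates)
print("Best parameters:", search.best_params_)
print("Best cross-validated accuracy: {:.3f}".format(search.best_score_))
```

This grid evaluates 44 candidates, whereas the full cross product in the original grid has 6 × 4 × 2 × 9 = 432, many of which are duplicates in effect.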

In [36]:
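# Evaluate the tuned model's predictions (y_pred from clf1), not the baseline's ypred.
# The output recorded below was produced with ypred, so it repeats the baseline report.
print(confusion_matrix(yte,y_pred))
print("-------------------------------------")
print(accuracy_score(yte,y_pred))
print("-------------------------------------")
print(classification_report(yte,y_pred))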

[[9 0]
 [2 6]]
-------------------------------------
0.8823529411764706
-------------------------------------
              precision    recall  f1-score   support

           0       0.82      1.00      0.90         9
           1       1.00      0.75      0.86         8

    accuracy                           0.88        17
   macro avg       0.91      0.88      0.88        17
weighted avg       0.90      0.88      0.88        17

