# 1] What is the relationship between polynomial functions and kernel functions in machine learning algorithms?


### Relationship between Polynomial Functions and Kernel Functions:
### => A kernel function measures the similarity between two data points as an inner product in some feature space. The polynomial kernel is the kernel whose feature space is built from polynomial combinations of the original features: K(x, z) = (x · z + c)^d, where d is the degree and c is a constant offset. It implicitly maps the data into this higher-dimensional space, where a linear decision boundary corresponds to a nonlinear boundary in the original space. This lets an SVM capture polynomial relationships in the data without explicitly computing the polynomial features.

### => In other words, a polynomial function is the mathematical tool that introduces non-linearity, and the polynomial kernel is the way an SVM applies it: the kernel evaluates a polynomial of the inner product of two points, which equals the inner product of their (never materialized) polynomial feature vectors. The SVM can therefore model nonlinear decision boundaries at the cost of a kernel evaluation rather than an explicit feature expansion.
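
### => The equivalence described above can be checked numerically. The following is a minimal sketch, assuming a degree-2 kernel with offset c = 1 on 2-D inputs; the helper names `poly_kernel` and `explicit_features` are illustrative and not part of any library.

```python
import numpy as np


def poly_kernel(x, z, c=1.0, degree=2):
    """Polynomial kernel (x . z + c)^degree, computed in the input space."""
    return (np.dot(x, z) + c) ** degree


def explicit_features(x):
    """Explicit degree-2 feature map for a 2-D input, matching c = 1."""
    x1, x2 = x
    s = np.sqrt(2.0)
    return np.array([1.0, s * x1, s * x2, x1 ** 2, x2 ** 2, s * x1 * x2])


x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

# Kernel evaluated directly on the 2-D inputs
k_implicit = poly_kernel(x, z)
# Same quantity via an inner product of the 6-D explicit feature vectors
k_explicit = np.dot(explicit_features(x), explicit_features(z))

print(k_implicit, k_explicit)
```

### => Both values agree up to floating-point rounding, although the kernel never builds the six-dimensional feature vectors.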

# 2] How can we implement an SVM with a polynomial kernel in Python using Scikit-learn?


In [1]:
from sklearn.svm import SVC
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Generate a synthetic dataset
X, y = make_classification(n_samples=1000, n_features=5, n_redundant=2, n_repeated=1, random_state=100)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=100)

# Create an SVM classifier with a polynomial kernel
classifier = SVC(kernel='poly', degree=3, random_state=100)
# Here, 'poly' specifies the polynomial kernel, and 'degree' sets the degree of the polynomial

# Train the SVM classifier
classifier.fit(X_train, y_train)

# Make predictions on the test set
y_pred = classifier.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)


Accuracy: 0.895
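
### => In scikit-learn the polynomial kernel has the form (gamma * <x, z> + coef0)^degree, and `SVC` uses `coef0=0.0` unless told otherwise. The sketch below compares that formula with `sklearn.metrics.pairwise.polynomial_kernel` on random data; the values chosen for `gamma`, `coef0`, and `degree` are arbitrary assumptions for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 5))  # 4 samples, 5 features
B = rng.normal(size=(3, 5))  # 3 samples, 5 features

gamma, coef0, degree = 0.5, 1.0, 3

# Kernel matrix computed by hand from the formula
K_manual = (gamma * (A @ B.T) + coef0) ** degree
# Kernel matrix computed by scikit-learn
K_sklearn = polynomial_kernel(A, B, degree=degree, gamma=gamma, coef0=coef0)

print(np.allclose(K_manual, K_sklearn))
```

### => Besides `degree`, the `gamma` and `coef0` arguments of `SVC` are therefore also worth tuning when `kernel='poly'` is used.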


# 3] How does increasing the value of epsilon affect the number of support vectors in SVR?


### Here's how increasing epsilon affects the number of support vectors in SVR:

## 1) Larger Epsilon:
### => When you set a larger value of epsilon, the epsilon-insensitive tube becomes wider. Points that lie strictly inside the tube incur no loss and do not become support vectors; only points on the tube boundary or outside it do. A wider tube contains more of the training points, so the number of support vectors typically decreases.

## 2) Sparser, Simpler Model:
### => Because fewer points contribute to the solution, increasing epsilon yields a sparser and usually smoother regression function that tolerates larger deviations from the targets. Conversely, decreasing epsilon narrows the tube, pushes more points onto or outside its boundary, and increases the number of support vectors.
### => The exact count still depends on the dataset, its noise level, the regularization parameter C, and the chosen kernel function. In general, though, a larger epsilon tends to reduce the number of support vectors, and if epsilon exceeds the spread of the targets the model may end up with almost none.
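
### => The trend can be observed on a small synthetic example. This is a rough sketch, assuming a noisy sine curve as the dataset and an RBF kernel with `C=1.0`; the counts it prints depend on those choices.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic 1-D regression problem: noisy sine curve (an assumed toy dataset)
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, size=(200, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

epsilons = [0.01, 0.1, 0.5, 1.0]
sv_counts = []
for eps in epsilons:
    model = SVR(kernel="rbf", C=1.0, epsilon=eps).fit(X, y)
    # support_ holds the indices of the training points used as support vectors
    sv_counts.append(len(model.support_))
    print(f"epsilon={eps}: {sv_counts[-1]} support vectors out of {len(X)}")
```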

# 4] How does the choice of kernel function, C parameter, epsilon parameter, and gamma parameter affect the performance of Support Vector Regression (SVR)? Can you explain how each parameter works and provide examples of when you might want to increase or decrease its value?


## 1) Kernel Function:
### => Different kernel functions capture different types of relationships in the data. For example, a linear kernel assumes a linear relationship, while polynomial and radial basis function (RBF) kernels capture non-linear relationships.
### => The choice of the kernel function should align with the underlying data distribution and the complexity of the relationship between the features and the target variable.
### => Selecting an appropriate kernel function is essential to ensure the model can accurately represent the patterns and dynamics in the data.
### Examples: Linear, Polynomial, RBF (Radial Basis Function), Sigmoid, etc.
### Increase: When the data exhibits complex, non-linear relationships, move to a more expressive kernel such as Polynomial or RBF so that the model can capture those patterns.
### Decrease: When the relationship is relatively simple and close to linear, a linear kernel is usually sufficient. A more complex kernel adds computational overhead and can overfit.

## 2) C Parameter:
### => The C parameter in SVR controls the trade-off between keeping the regression function flat (simple) and penalizing training points that fall outside the epsilon-insensitive tube.
### => A smaller C value tolerates more and larger violations of the tube, which yields a flatter, more strongly regularized model. This can be useful in the presence of outliers or noisy data.
### => A larger C value penalizes violations more heavily, so the model bends to fit the training points more closely and becomes more complex. This can be beneficial when the data is clean and the goal is to minimize training error, but it increases the risk of overfitting.
### Examples: Small values (e.g., 0.01, 0.1) for stronger regularization, large values (e.g., 10, 100) for a closer fit to the training data.
### Increase: When the data is clean, with little noise and few outliers, or when the model is underfitting, increasing C lets the model fit the training data more closely.
### Decrease: When the data is noisy or contains outliers, or when the model is overfitting, reducing C makes the model less sensitive to individual points and usually improves generalization.
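
### => As a rough illustration of the effect of C, the sketch below fits an RBF SVR to a noisy sine curve (an assumed toy dataset) with three values of C and reports the training error for each. The values of C and the fixed `epsilon=0.1` are arbitrary choices.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.svm import SVR

# Synthetic 1-D regression problem: noisy sine curve
rng = np.random.default_rng(1)
X = np.sort(rng.uniform(0, 5, size=(200, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

train_mae = {}
for C in [0.001, 1.0, 100.0]:
    model = SVR(kernel="rbf", C=C, epsilon=0.1).fit(X, y)
    # Error on the training data itself: shows how closely the model fits it
    train_mae[C] = mean_absolute_error(y, model.predict(X))
    print(f"C={C}: training MAE = {train_mae[C]:.3f}")
```

### => A low training error at large C does not by itself mean a better model; the comparison that matters is on held-out data.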

## 3) Epsilon Parameter:
### => The epsilon parameter sets the half-width of the epsilon-insensitive tube around the regression function: errors smaller than epsilon incur no penalty.
### => A larger epsilon lets more points fall inside the tube, where the loss ignores them. The model becomes simpler and sparser (fewer support vectors) but less precise, because deviations up to epsilon are treated as acceptable.
### => A smaller epsilon narrows the tube and tolerates less error. The model tracks the training data more closely and uses more support vectors, which also makes it more sensitive to noise.
### Examples: Larger values (e.g., 0.1, 0.5) for more error tolerance, smaller values (e.g., 0.01, 0.05) for stricter error tolerance. Epsilon is expressed in the units of the target variable, so sensible values depend on the scale of the target.
### Increase: When the targets are noisy and small deviations do not matter, increasing epsilon widens the tube, ignores more of the noise, and produces a sparser, smoother model.
### Decrease: When small errors matter and the targets are reliable, reducing epsilon narrows the tube so that the model is penalized for smaller deviations and follows the data more closely.

## 4) Gamma Parameter:
### => The gamma parameter applies to the RBF, polynomial, and sigmoid kernels. For the RBF kernel it determines how far the influence of a single training example reaches.
### => A larger gamma value gives each training example a more localized influence, resulting in a more wiggly, complex regression function. This can lead to overfitting if not carefully tuned.
### => A smaller gamma value gives each training example a broader influence, resulting in a smoother regression function that generalizes better but may underfit if gamma is too small.
### Examples: Larger values (e.g., 0.1, 1.0) for more localized influence, smaller values (e.g., 0.01, 0.001) for broader influence. Suitable values depend on the scale of the features, so scaling the features before tuning gamma is advisable.
### Increase: When the target has intricate patterns or sharp local changes that the model is smoothing over, a higher gamma lets the function vary more quickly.
### Decrease: When the model is overfitting or the fitted function is too erratic, a lower gamma gives each training example a broader influence and produces a smoother fit.
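
### => In practice these parameters interact, so they are usually tuned together rather than one at a time. The following is a minimal sketch, assuming a noisy sine curve as the dataset and a small, arbitrary grid; `make_pipeline` names the SVR step `svr`, which is why the grid keys carry the `svr__` prefix.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic 1-D regression problem: noisy sine curve (an assumed toy dataset)
rng = np.random.default_rng(2)
X = rng.uniform(0, 5, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

# Scale the features first so that gamma is searched on a comparable scale
pipeline = make_pipeline(StandardScaler(), SVR(kernel="rbf"))

param_grid = {
    "svr__C": [0.1, 1, 10],
    "svr__epsilon": [0.01, 0.1, 0.5],
    "svr__gamma": [0.01, 0.1, 1.0],
}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring="r2")
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best cross-validated R^2:", round(search.best_score_, 3))
```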

# 5] Assignment:
## Import the necessary libraries and load the dataset
## Split the dataset into training and testing sets
## Preprocess the data using any technique of your choice (e.g. scaling, normalization)
## Create an instance of the SVC classifier and train it on the training data
## Use the trained classifier to predict the labels of the testing data
## Evaluate the performance of the classifier using any metric of your choice (e.g. accuracy, precision, recall, F1-score)
## Tune the hyperparameters of the SVC classifier using GridSearchCV or RandomizedSearchCV to improve its performance
## Train the tuned classifier on the entire dataset
## Save the trained classifier to a file for future use.
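
### => One possible way to handle the preprocessing step from the list above is to put a scaler and the classifier into a single pipeline, so that the scaler is fitted on the training split only. This is a sketch under assumptions: `StandardScaler`, the 80/20 split, and the stratified sampling are choices made here for illustration.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=29, stratify=y
)

# The pipeline learns the scaling from X_train and reuses it for X_test
model = make_pipeline(StandardScaler(), SVC())
model.fit(X_train, y_train)

test_accuracy = model.score(X_test, y_test)
print("Test accuracy:", round(test_accuracy, 3))
```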

In [2]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

In [16]:
from sklearn.datasets import load_digits

In [17]:
df=load_digits()

In [18]:
X=df.data
y=df.target

In [19]:
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.19,random_state=29)

In [20]:
from sklearn.svm import SVC

In [21]:
svc=SVC()

In [22]:
svc.fit(X_train,y_train)

In [23]:
y_pred=svc.predict(X_test)

In [24]:
from sklearn.metrics import classification_report,confusion_matrix

In [25]:
# scikit-learn expects (y_true, y_pred): rows are true labels, columns are predictions
print(confusion_matrix(y_test,y_pred))

[[41  0  0  0  1  0  0  0  0  0]
 [ 0 41  0  0  0  0  0  0  0  0]
 [ 0  0 35  0  0  0  0  0  0  0]
 [ 0  0  0 37  0  1  0  0  0  0]
 [ 0  0  0  0 32  0  0  0  1  0]
 [ 0  0  0  0  0 37  0  0  0  0]
 [ 0  0  0  0  0  0 26  0  0  0]
 [ 0  0  0  0  0  0  0 28  0  0]
 [ 0  1  0  0  0  0  0  0 24  0]
 [ 0  0  0  0  0  0  0  0  1 36]]


In [26]:
# scikit-learn expects (y_true, y_pred); swapping them exchanges precision and recall
print(classification_report(y_test,y_pred))

              precision    recall  f1-score   support

           0       1.00      0.98      0.99        42
           1       0.98      1.00      0.99        41
           2       1.00      1.00      1.00        35
           3       1.00      0.97      0.99        38
           4       0.97      0.97      0.97        33
           5       0.97      1.00      0.99        37
           6       1.00      1.00      1.00        26
           7       1.00      1.00      1.00        28
           8       0.92      0.96      0.94        25
           9       1.00      0.97      0.99        37

    accuracy                           0.99       342
   macro avg       0.98      0.99      0.98       342
weighted avg       0.99      0.99      0.99       342



In [36]:
parameter={
    "C":[0.001,0.01,0.1,1,10],
    "kernel":["poly","rbf","sigmoid","linear"],
}

In [37]:
from sklearn.model_selection import GridSearchCV
classifier=SVC()


In [38]:
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.21,random_state=29)


In [39]:
clf=GridSearchCV(classifier,param_grid=parameter,cv=5,scoring="accuracy")

In [40]:
clf.fit(X_train,y_train)

In [41]:
clf.best_params_

{'C': 10, 'kernel': 'rbf'}

In [42]:
y_pred=clf.predict(X_test)

In [46]:
import pickle as pkl

In [48]:
with open('svc_model.pkl', 'wb') as f:
    pkl.dump(clf, f)
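
### => The cells above save the grid-search object fitted on the training split. One way to cover the "train the tuned classifier on the entire dataset" step is sketched below: it refits an `SVC` on all the digits data with the best parameters reported by the grid search, then saves and reloads it. The file name `svc_final.pkl` and the temporary directory are assumptions made so the example cleans up after itself.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.datasets import load_digits
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

# Best parameters as reported by the grid search above
best_params = {"C": 10, "kernel": "rbf"}

# Refit on the entire dataset with the tuned hyperparameters
final_model = SVC(**best_params).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "svc_final.pkl")  # hypothetical file name
    with open(path, "wb") as f:
        pickle.dump(final_model, f)
    with open(path, "rb") as f:
        restored = pickle.load(f)

# The reloaded model should reproduce the original predictions exactly
same_predictions = np.array_equal(final_model.predict(X), restored.predict(X))
print("Reloaded model matches:", same_predictions)
```

### => A model fitted on all the data has no held-out set left to evaluate it on, so the cross-validated score from the grid search remains the performance estimate.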