Q1. What is the relationship between polynomial functions and kernel functions in machine learning algorithms?

Q2. How can we implement an SVM with a polynomial kernel in Python using Scikit-learn?

Q3. How does increasing the value of epsilon affect the number of support vectors in SVR?

Q4. How does the choice of kernel function, C parameter, epsilon parameter, and gamma parameter affect the performance of Support Vector Regression (SVR)? Can you explain how each parameter works and provide examples of when you might want to increase or decrease its value?

Q5. Assignment:

- Import the necessary libraries and load the dataset
- Split the dataset into training and testing sets. 
- Preprocess the data using any technique of your choice (e.g. scaling, normalization)
- Create an instance of the SVC classifier and train it on the training data
- Use the trained classifier to predict the labels of the testing data
- Evaluate the performance of the classifier using any metric of your choice (e.g. accuracy, precision, recall, F1-score)
- Tune the hyperparameters of the SVC classifier using GridSearchCV or RandomizedSearchCV to improve its performance
- Train the tuned classifier on the entire dataset
- Save the trained classifier to a file for future use.

### Q1. What is the relationship between polynomial functions and kernel functions in machine learning algorithms?


Polynomial functions and kernel functions are closely related in machine learning algorithms, particularly in kernel methods such as Support Vector Machines (SVMs). The relationship lies in the concept of the kernel trick.

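In SVMs, the kernel trick is a mathematical technique that lets the algorithm work in a higher-dimensional feature space without explicitly computing the transformed features. The SVM only needs dot products between pairs of data points, and a kernel function returns the dot product in the feature space directly from the original inputs. This avoids the explicit feature mapping, which can be very expensive, or even impossible when the feature space is infinite-dimensional.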

The polynomial kernel is one of the most commonly used kernel functions in SVMs. It measures the similarity between two data points by applying a polynomial to their dot product in the original input space, and has the form:

K(x, y) = (gamma * <x, y> + coef0)^degree

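where gamma, coef0, and degree are hyperparameters that control the shape and flexibility of the decision boundary, and the dot product <x, y> is computed in the original input space. Expanding the power shows that the kernel equals a dot product between feature vectors built from products of the original features (monomials) up to the given degree when coef0 > 0, or of exactly that degree when coef0 = 0.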

By using the polynomial kernel, the SVM can implicitly perform computations in a higher-dimensional feature space without explicitly mapping the data to that space. This allows SVMs to learn complex nonlinear decision boundaries.
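As a small sanity check of the kernel trick, the sketch below compares scikit-learn's `polynomial_kernel` with an explicit degree-2 feature map. It assumes 2-D inputs, `gamma=1`, and `coef0=0`, for which the hand-written map `phi` (a hypothetical helper, not a library function) is (x1², √2·x1·x2, x2²). Both routes should give the same number, but only the explicit one ever builds the transformed features.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel


def phi(v):
    """Hypothetical explicit degree-2 feature map for 2-D input (valid for gamma=1, coef0=0)."""
    x1, x2 = v
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])


x = np.array([1.0, 2.0])
y = np.array([3.0, 1.0])

# Kernel trick: evaluate (gamma * <x, y> + coef0) ** degree directly in the input space
k_implicit = polynomial_kernel(x.reshape(1, -1), y.reshape(1, -1),
                               degree=2, gamma=1, coef0=0)[0, 0]

# Explicit route: map both points into the 3-D feature space, then take the dot product
k_explicit = phi(x) @ phi(y)

print(k_implicit, k_explicit)  # both equal <x, y>**2 = 5**2 = 25 (up to float rounding)
```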

In short, the polynomial kernel is one of several kernel functions (alongside the linear, RBF, and sigmoid kernels) used in SVMs and other kernel-based algorithms. It lets the model learn polynomial decision boundaries efficiently, because the transformed features are never computed explicitly.

### Q2. How can we implement an SVM with a polynomial kernel in Python using Scikit-learn?


In [1]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()

# Separate the features (X) from the target labels (y)
X = iris.data
y = iris.target

# Split the data into a training set and a testing set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create an SVM classifier with a polynomial kernel; degree is the polynomial degree,
# while gamma ('scale' by default) and coef0 (0.0 by default) are the other kernel hyperparameters
svm = SVC(kernel='poly', degree=3)

# Train the SVM classifier
svm.fit(X_train, y_train)

# Predict labels for the testing set
y_pred = svm.predict(X_test)

# Compute accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)


Accuracy: 1.0
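A perfect score on one small test split says little on its own. A slightly more robust variant of the cell above, sketched here as an option rather than the original approach, puts a `StandardScaler` in front of the SVC (the polynomial kernel raises dot products to a power, so unscaled features can dominate) and reports 5-fold cross-validated accuracy. `coef0=1.0` is an illustrative choice that keeps lower-degree terms in the kernel; it is not the scikit-learn default (0.0).

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Scale features first, then fit a degree-3 polynomial-kernel SVM
model = make_pipeline(StandardScaler(), SVC(kernel="poly", degree=3, coef0=1.0))

# 5-fold cross-validation gives a steadier estimate than a single train/test split
scores = cross_val_score(model, X, y, cv=5)
print(f"mean accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")
```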


### Q3. How does increasing the value of epsilon affect the number of support vectors in SVR?


In Support Vector Regression (SVR), the epsilon parameter, often denoted as ε, controls the width of the epsilon-insensitive zone around the predicted values. It determines the tolerance for errors that are considered acceptable.

Increasing the value of epsilon in SVR typically decreases the number of support vectors. Only training points that lie on or outside the boundary of the epsilon-insensitive tube receive non-zero dual coefficients and become support vectors; points strictly inside the tube do not affect the fitted function. A wider tube (larger epsilon) encloses more training points, so fewer of them end up as support vectors.

When epsilon is small, the tube is narrow, so most training points lie on or outside it. Almost every point then contributes to the loss, the model tracks the training data more closely, and the number of support vectors is large (as epsilon approaches 0, nearly every training point can become a support vector).

On the other hand, when epsilon is larger, deviations up to epsilon are ignored, so more points fall inside the tube and drop out of the solution. The result is a sparser model with fewer support vectors and a flatter fit; if epsilon is too large relative to the noise in the target, the model can underfit.
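A quick way to see this effect is to fit `SVR` with several epsilon values and count the support vectors through the fitted model's `support_` attribute. The noisy sine data below is a synthetic assumption chosen only for illustration; exact counts will differ on other data, but the count should shrink as epsilon grows.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic 1-D regression data: a sine curve with Gaussian noise (std 0.1)
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, 200)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.1, 200)

counts = {}
for eps in [0.01, 0.1, 0.3, 0.5]:
    model = SVR(kernel="rbf", C=1.0, epsilon=eps).fit(X, y)
    counts[eps] = len(model.support_)  # support_ holds the indices of the support vectors
    print(f"epsilon={eps:<4}  support vectors={counts[eps]}")
```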



### Q4. How does the choice of kernel function, C parameter, epsilon parameter, and gamma parameter affect the performance of Support Vector Regression (SVR)? Can you explain how each parameter works and provide examples of when you might want to increase or decrease its value?


1. Kernel Function:

The kernel function determines the type of mapping used to transform the input data into a higher-dimensional feature space. Different kernel functions capture different types of relationships in the data. Some commonly used kernel functions in SVR are linear, polynomial, Gaussian (RBF), and sigmoid.

- Linear Kernel: Suitable for linear relationships in the data.
- Polynomial Kernel: Captures polynomial relationships between features.
- Gaussian (RBF) Kernel: Useful for capturing complex, non-linear relationships.
- Sigmoid Kernel: Can model non-linear relationships with a sigmoidal shape.

The choice of the kernel function depends on the specific characteristics of the data and the underlying relationship you want to capture. It may require experimentation and cross-validation to determine the most suitable kernel function.
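One simple way to run that experimentation is to score each kernel with cross-validation on the same data. The sketch below does this for a synthetic non-linear target (`sin(x)` plus noise, chosen purely for illustration), scaling inputs inside a pipeline and using default hyperparameters for every kernel. On a target like this the RBF kernel should beat the linear one, but the ranking on real data can differ.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic non-linear target: y = sin(x) + noise
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.1, 200)

scores = {}
for kernel in ["linear", "poly", "rbf", "sigmoid"]:
    model = make_pipeline(StandardScaler(), SVR(kernel=kernel))
    # Mean R^2 over 5 folds for this kernel
    scores[kernel] = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    print(f"{kernel:>8}: mean CV R^2 = {scores[kernel]:.3f}")
```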

2. C Parameter:

The C parameter controls the trade-off between keeping the regression function flat (simple) and tolerating deviations larger than epsilon. It acts as an inverse regularization parameter: the strength of the regularization is inversely proportional to C. A larger C penalizes errors outside the epsilon tube more heavily, producing a more complex function that follows the training data closely and can overfit. A smaller C applies stronger regularization, giving a flatter, simpler function that may underfit.

- Increase C: when the model underfits (high error on both training and validation data) and you want a closer fit to the training data, at the risk of overfitting.
- Decrease C: when the model overfits (low training error but high validation error) or the target is noisy, at the risk of missing genuine structure in the data.
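To see the trade-off, the sketch below fits an RBF `SVR` with three C values on synthetic noisy sine data (an assumption for illustration) and compares R² on the training and test splits. The training score should rise with C; whether the test score follows depends on the noise level, which is the overfitting risk described above.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Synthetic data with fairly heavy noise (std 0.3) around sin(x)
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.3, 200)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

train_r2 = {}
for C in [0.01, 1, 100]:
    model = SVR(kernel="rbf", C=C).fit(X_train, y_train)
    train_r2[C] = model.score(X_train, y_train)  # score() returns R^2 for regressors
    print(f"C={C:<6} train R^2={train_r2[C]:.3f}  test R^2={model.score(X_test, y_test):.3f}")
```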

3. Epsilon Parameter:

The epsilon parameter, often denoted as ε, defines the width of the epsilon-insensitive zone around the predicted values. It determines the tolerance for errors that are considered acceptable. Instances falling within this zone are not considered errors and do not contribute to the loss function.

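- Increase Epsilon: Widens the epsilon-insensitive zone, so more training instances fall inside it and are ignored by the loss. This usually reduces the number of support vectors and yields a sparser, flatter model; it is useful when the target is noisy or only coarse accuracy is needed, but too large a value underfits.
- Decrease Epsilon: Narrows the epsilon-insensitive zone, so more instances lie on or outside it. This usually increases the number of support vectors and makes the model fit the training data more precisely; it is useful when small errors matter, but it can lead the model to fit noise.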

4. Gamma Parameter:

The gamma parameter applies to the RBF, polynomial, and sigmoid kernels (it has no effect with the linear kernel). For the RBF kernel, K(x, y) = exp(-gamma * ||x - y||^2), so gamma sets how far the influence of a single training example reaches: with a high gamma each example only affects points very close to it, while with a low gamma its influence extends much further.

- Increase Gamma: Makes the regression function more localized and wiggly, which helps capture fine-grained structure but may result in overfitting if the value is too high.
- Decrease Gamma: Expands the influence of each training example, producing a smoother function that is less prone to overfitting but can underfit if the value is too low.
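A similar sketch for gamma, again on made-up noisy sine data with an RBF kernel and a fixed `C=10` (both illustrative assumptions): a very small gamma gives a smooth, nearly global fit, while a very large gamma lets the model chase individual noisy points, which shows up as a gap between training and test R².

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Synthetic data with noise (std 0.3) around sin(x)
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.3, 200)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

results = {}
for gamma in [0.01, 1, 100]:
    model = SVR(kernel="rbf", C=10, gamma=gamma).fit(X_train, y_train)
    # Store (train R^2, test R^2) for each gamma
    results[gamma] = (model.score(X_train, y_train), model.score(X_test, y_test))
    print(f"gamma={gamma:<5} train R^2={results[gamma][0]:.3f}  test R^2={results[gamma][1]:.3f}")
```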


### Q5. Assignment:

- Import the necessary libraries and load the dataset
- Split the dataset into training and testing sets. 
- Preprocess the data using any technique of your choice (e.g. scaling, normalization)
- Create an instance of the SVC classifier and train it on the training data
- Use the trained classifier to predict the labels of the testing data
- Evaluate the performance of the classifier using any metric of your choice (e.g. accuracy, precision, recall, F1-score)
- Tune the hyperparameters of the SVC classifier using GridSearchCV or RandomizedSearchCV to improve its performance
- Train the tuned classifier on the entire dataset
- Save the trained classifier to a file for future use.

In [2]:
import pandas as pd
import seaborn as sns 
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')

In [3]:
df = pd.read_csv('/Users/aakanksha/My_Codes/data-science-master-course/data/winequality-red.csv')
df

| | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7.4 | 0.700 | 0.00 | 1.9 | 0.076 | 11.0 | 34.0 | 0.99780 | 3.51 | 0.56 | 9.4 | 5 |
| 1 | 7.8 | 0.880 | 0.00 | 2.6 | 0.098 | 25.0 | 67.0 | 0.99680 | 3.20 | 0.68 | 9.8 | 5 |
| 2 | 7.8 | 0.760 | 0.04 | 2.3 | 0.092 | 15.0 | 54.0 | 0.99700 | 3.26 | 0.65 | 9.8 | 5 |
| 3 | 11.2 | 0.280 | 0.56 | 1.9 | 0.075 | 17.0 | 60.0 | 0.99800 | 3.16 | 0.58 | 9.8 | 6 |
| 4 | 7.4 | 0.700 | 0.00 | 1.9 | 0.076 | 11.0 | 34.0 | 0.99780 | 3.51 | 0.56 | 9.4 | 5 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1594 | 6.2 | 0.600 | 0.08 | 2.0 | 0.090 | 32.0 | 44.0 | 0.99490 | 3.45 | 0.58 | 10.5 | 5 |
| 1595 | 5.9 | 0.550 | 0.10 | 2.2 | 0.062 | 39.0 | 51.0 | 0.99512 | 3.52 | 0.76 | 11.2 | 6 |
| 1596 | 6.3 | 0.510 | 0.13 | 2.3 | 0.076 | 29.0 | 40.0 | 0.99574 | 3.42 | 0.75 | 11.0 | 6 |
| 1597 | 5.9 | 0.645 | 0.12 | 2.0 | 0.075 | 32.0 | 44.0 | 0.99547 | 3.57 | 0.71 | 10.2 | 5 |


In [4]:
df.isnull().sum()

fixed acidity           0
volatile acidity        0
citric acid             0
residual sugar          0
chlorides               0
free sulfur dioxide     0
total sulfur dioxide    0
density                 0
pH                      0
sulphates               0
alcohol                 0
quality                 0
dtype: int64

In [5]:
X = df.drop(columns='quality')
y = df['quality']

In [6]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

In [7]:
X_train.shape, X_test.shape, y_train.shape, y_test.shape

((1199, 11), (400, 11), (1199,), (400,))
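The wine quality classes are highly imbalanced (the classification report further down shows only 1 test sample of quality 3 and 5 of quality 8), so a plain random split can leave rare classes almost absent from one side. Passing `stratify=y` to `train_test_split` keeps the class proportions close to equal in both splits. The sketch below uses made-up labels to show the difference; it is an optional improvement, not what this notebook does.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up imbalanced labels standing in for the wine "quality" column
y = np.array([5] * 60 + [6] * 30 + [7] * 10)
X = np.arange(len(y)).reshape(-1, 1)

results = {}
for name, strat in [("random", None), ("stratified", y)]:
    _, _, _, y_te = train_test_split(X, y, test_size=0.25, random_state=42, stratify=strat)
    labels, counts = np.unique(y_te, return_counts=True)
    # Class counts in the 25-sample test split
    results[name] = dict(zip(labels.tolist(), counts.tolist()))
    print(f"{name:>10}: {results[name]}")
```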

In [8]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()

# Fit the scaler on the training data only and transform it
X_train_scaled = scaler.fit_transform(X_train)

# Transform the testing data with the statistics learned from the training data
X_test_scaled = scaler.transform(X_test)

In [9]:
from sklearn.svm import SVC

svc = SVC(kernel='linear')
svc.fit(X_train_scaled, y_train)

In [10]:
# Predict the labels of the test set
y_pred = svc.predict(X_test_scaled)


In [11]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, classification_report

# Note: for single-label multiclass data, micro-averaged precision, recall and F1
# all equal accuracy; 'macro' or 'weighted' averaging is more informative here
acc_score = accuracy_score(y_test, y_pred)
pre_score = precision_score(y_test, y_pred, average='micro')
rec_score = recall_score(y_test, y_pred, average='micro')
F1_score = f1_score(y_test, y_pred, average='micro')
report = classification_report(y_test, y_pred)
print('accuracy score: ',acc_score)
print('precision score: ',pre_score)
print('recall score: ',rec_score)
print('f1 score: ',F1_score)
print('classification report: ',report)


accuracy score:  0.565
precision score:  0.565
recall score:  0.565
f1 score:  0.565
classification report:                precision    recall  f1-score   support

           3       0.00      0.00      0.00         1
           4       0.00      0.00      0.00        13
           5       0.61      0.76      0.68       164
           6       0.52      0.60      0.56       169
           7       0.00      0.00      0.00        48
           8       0.00      0.00      0.00         5

    accuracy                           0.56       400
   macro avg       0.19      0.23      0.21       400
weighted avg       0.47      0.56      0.51       400
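With `average='micro'`, precision, recall and F1 pool every prediction, so for a single-label multiclass problem they collapse to accuracy, which is why all four scores above are 0.565. The toy labels below (made up for illustration) show this, and show how macro averaging exposes classes the model never predicts. `zero_division=0` sets the undefined precision of those never-predicted classes to 0 instead of emitting a warning.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Made-up quality labels: the model never predicts classes 4 or 7
y_true = [5, 5, 6, 6, 6, 7, 4, 5]
y_hat = [5, 6, 6, 6, 5, 6, 5, 5]

acc = accuracy_score(y_true, y_hat)
micro = {
    "precision": precision_score(y_true, y_hat, average="micro"),
    "recall": recall_score(y_true, y_hat, average="micro"),
    "f1": f1_score(y_true, y_hat, average="micro"),
}
# Macro averaging gives every class equal weight, including the missed ones
macro_f1 = f1_score(y_true, y_hat, average="macro", zero_division=0)

print("accuracy:", acc)           # 4 of 8 correct -> 0.5
print("micro-averaged:", micro)   # every entry equals the accuracy
print("macro F1:", round(macro_f1, 3))
```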



In [12]:
from sklearn.model_selection import GridSearchCV

# Tune the hyperparameters using GridSearchCV
# Define the parameter grid. Note: gamma has no effect when kernel='linear',
# so the 35 candidates below reduce to only 7 distinct models (one per C value)
param_grid = {'C': [0.1, 0.5, 1, 5, 10, 50, 100],
              'gamma': [1, 0.1, 0.01, 0.001, 0.0001],
              'kernel': ['linear']}

grid_search = GridSearchCV(svc, param_grid, cv=5, verbose=3, refit=True)
grid_search.fit(X_train_scaled, y_train)

# With refit=True, best_estimator_ has already been retrained on the full training split
best_classifier = grid_search.best_estimator_
print("best parameters: ", grid_search.best_params_)



Fitting 5 folds for each of 35 candidates, totalling 175 fits
[CV 1/5] END .....C=0.1, gamma=1, kernel=linear;, score=0.542 total time=   0.0s
[CV 2/5] END .....C=0.1, gamma=1, kernel=linear;, score=0.537 total time=   0.0s
[CV 3/5] END .....C=0.1, gamma=1, kernel=linear;, score=0.588 total time=   0.0s
[CV 4/5] END .....C=0.1, gamma=1, kernel=linear;, score=0.592 total time=   0.0s
[CV 5/5] END .....C=0.1, gamma=1, kernel=linear;, score=0.657 total time=   0.0s
[CV 1/5] END ...C=0.1, gamma=0.1, kernel=linear;, score=0.542 total time=   0.0s
[CV 2/5] END ...C=0.1, gamma=0.1, kernel=linear;, score=0.537 total time=   0.0s
[CV 3/5] END ...C=0.1, gamma=0.1, kernel=linear;, score=0.588 total time=   0.0s
[CV 4/5] END ...C=0.1, gamma=0.1, kernel=linear;, score=0.592 total time=   0.0s
[CV 5/5] END ...C=0.1, gamma=0.1, kernel=linear;, score=0.657 total time=   0.0s
[CV 1/5] END ..C=0.1, gamma=0.01, kernel=linear;, score=0.542 total time=   0.0s
[CV 2/5] END ..C=0.1, gamma=0.01, kernel=linear
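Because `gamma` only matters for non-linear kernels, a grid that actually explores it needs something like `kernel='rbf'`. The sketch below also puts the `StandardScaler` inside a `Pipeline`, so the scaler is refit on each cross-validation training fold instead of seeing the whole training split. It runs on synthetic data from `make_classification` as a stand-in for the wine features; the grid values are illustrative assumptions, and `RandomizedSearchCV` could be used with the same pipeline.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the 11 wine features (3 classes, for illustration only)
X, y = make_classification(n_samples=400, n_features=11, n_informative=6,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y)

# The scaler lives inside the pipeline, so it is refit on each CV training fold
pipe = Pipeline([("scaler", StandardScaler()), ("svc", SVC(kernel="rbf"))])

# Step-name prefixes ("svc__") route each parameter to the right pipeline step
param_grid = {
    "svc__C": [0.1, 1, 10, 100],
    "svc__gamma": ["scale", 0.01, 0.1, 1],
}

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)

test_acc = search.score(X_test, y_test)
print("best parameters:", search.best_params_)
print("test accuracy:", round(test_acc, 3))
```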

In [13]:
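# Re-create an SVC with the best hyperparameters from the grid search and fit it
# on the scaled training set (this reproduces grid_search.best_estimator_)
svc = SVC(**grid_search.best_params_)
svc.fit(X_train_scaled, y_train)

# Predict the test labels and report accuracy
y_pred4 = svc.predict(X_test_scaled)
print(accuracy_score(y_test, y_pred4))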

0.5625
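The assignment asks for the tuned classifier to be trained on the entire dataset, while the cells above fit on the training split only. One way to do it, sketched under the assumption that you reuse the notebook's `X`, `y` and `grid_search.best_params_`, is to refit a scaler + SVC pipeline on every row, so the scaling statistics also come from all the data. The helper `fit_on_full_data` is hypothetical, and the iris data and hard-coded parameters below are only stand-ins so the snippet runs on its own. Once every row is used for training, no held-out data remains, so the test accuracy above stays the last unbiased performance estimate.

```python
from sklearn.datasets import load_iris
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def fit_on_full_data(X, y, best_params):
    """Hypothetical helper: refit scaling + SVC on all rows with tuned hyperparameters."""
    model = make_pipeline(StandardScaler(), SVC(**best_params))
    return model.fit(X, y)


# Stand-ins; in the notebook these would be X, y and grid_search.best_params_
X_all, y_all = load_iris(return_X_y=True)
params = {"kernel": "linear", "C": 1.0}

final_model = fit_on_full_data(X_all, y_all, params)
print("training accuracy on all rows:", final_model.score(X_all, y_all))
```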


In [14]:
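import joblib
# The classifier expects standardized inputs, so persist the fitted scaler as well
joblib.dump(scaler, 'scaler.pkl')
# Save the trained classifier to a file
joblib.dump(best_classifier, 'svm_classifier.pkl')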


['svm_classifier.pkl']
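A saved model is only useful if it can be loaded back. The sketch below, using iris as stand-in data and a temporary directory instead of the notebook's working directory, saves a scaler + SVC pipeline with `joblib.dump`, reloads it with `joblib.load`, and checks that predictions match. Bundling the scaler in a pipeline means new raw feature rows can be passed straight to `predict`. As with any pickle-based format, only load files from trusted sources, ideally with the same scikit-learn version that saved them.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import load_iris
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
model = make_pipeline(StandardScaler(), SVC(kernel="linear")).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "svm_pipeline.pkl")
    joblib.dump(model, path)        # persist scaler + classifier together
    restored = joblib.load(path)    # reload, e.g. in a later session

# The reloaded pipeline should behave exactly like the original
same_predictions = bool(np.array_equal(restored.predict(X), model.predict(X)))
print("predictions match:", same_predictions)
```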