# Support Vector Machines Assignment - 2

Q1. What is the relationship between polynomial functions and kernel functions in machine learning
algorithms?

Polynomial functions and kernel functions in machine learning, particularly in Support Vector Machines (SVMs), are related through the kernel trick. A kernel function returns the inner product of two vectors in a (usually higher-dimensional) feature space without explicitly computing the coordinates of the points in that space, which can be computationally expensive. The polynomial kernel, K(x, z) = (γ · (x · z) + r)^d, is one such kernel: its implicit feature space is made up of the monomials of the input features up to degree d, so a linear separator in that space corresponds to a polynomial decision boundary in the original space. In other words, the polynomial kernel is a specific kernel function whose feature map is built from polynomial terms, and it lets SVMs classify datasets that are not linearly separable in the original feature space.
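To make the kernel trick concrete, the following is a minimal sketch for 2-D inputs and a degree-2 polynomial kernel with γ = 1 and r = 1. The feature map `phi` and the sample vectors `x` and `z` are illustrative assumptions, not part of the assignment. The sketch compares the kernel value computed in the original space with the inner product computed in the explicit 6-D feature space.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel


def phi(v):
    """Explicit degree-2 feature map for a 2-D vector (illustrative helper)."""
    v1, v2 = v
    s = np.sqrt(2.0)
    return np.array([1.0, s * v1, s * v2, v1 ** 2, v2 ** 2, s * v1 * v2])


# Two arbitrary sample vectors (assumed values for illustration)
x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

# Kernel value computed directly in the original 2-D space
k_direct = (x @ z + 1.0) ** 2

# The same quantity as an inner product in the explicit 6-D feature space
k_explicit = phi(x) @ phi(z)

# scikit-learn's polynomial kernel: (gamma * <x, z> + coef0) ** degree
k_sklearn = polynomial_kernel(
    x.reshape(1, -1), z.reshape(1, -1), degree=2, gamma=1.0, coef0=1.0
)[0, 0]

print(k_direct, k_explicit, k_sklearn)
```

All three values agree, which is why an SVM with this kernel never needs to build the 6-D vectors explicitly.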

Q2. How can we implement an SVM with a polynomial kernel in Python using Scikit-learn?

In [1]:
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score,confusion_matrix,classification_report

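# Generate a synthetic two-feature binary classification dataset
X, y = make_classification(n_samples=1000, n_features=2, n_informative=2, n_redundant=0, random_state=42)
# Hold out 25% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
# Fit an SVM with a degree-3 polynomial kernel; gamma='auto' uses 1 / n_features
svm_classifier = SVC(kernel='poly', degree=3, gamma='auto')
svm_classifier.fit(X_train, y_train)
# Evaluate on the held-out test set
y_pred = svm_classifier.predict(X_test)
print(accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))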

0.864
[[ 96  25]
 [  9 120]]
              precision    recall  f1-score   support

           0       0.91      0.79      0.85       121
           1       0.83      0.93      0.88       129

    accuracy                           0.86       250
   macro avg       0.87      0.86      0.86       250
weighted avg       0.87      0.86      0.86       250



Q3. How does increasing the value of epsilon affect the number of support vectors in SVR?

In Support Vector Regression (SVR), epsilon (ε) is a hyperparameter that controls the margin of tolerance within which no penalty is given to errors. This margin is called the epsilon-insensitive tube. When you increase the value of epsilon:

1. **Wider Epsilon-Insensitive Tube**: A larger epsilon leads to a wider epsilon-insensitive tube around the regression line. This means that SVR allows more training points to be within this margin without any penalty.

2. **Fewer Support Vectors**: In SVR, the support vectors are the training points that lie on or outside the epsilon-insensitive tube. Points strictly inside the tube incur no loss and do not influence the fitted function. As the tube widens, more training points fall strictly inside it, so increasing epsilon generally decreases the number of support vectors and gives a sparser model.

3. **Smoother Regression Function**: With a wider epsilon, SVR becomes more tolerant of errors, which can result in a smoother regression function. However, this increases the risk of underfitting if epsilon is set too high: once the tube is wide enough to contain most of the targets, the model can ignore real structure in the data and approach a nearly constant function.

In summary, increasing the value of epsilon in SVR generally results in a wider margin of tolerance for errors, fewer support vectors, and potentially a smoother regression function. It's crucial to strike a balance: an epsilon that is too small makes almost every point a support vector and lets the model chase noise, while an epsilon that is too large causes underfitting.
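The effect of epsilon on the support-vector count can be checked empirically. The sketch below is only an illustration on synthetic data: the noisy sine dataset, the variable names (`X_demo`, `y_demo`, `sv_counts`), and the chosen epsilon values are assumptions, not part of the assignment. It fits an RBF-kernel `SVR` for several epsilon values and prints how many training points end up as support vectors, that is, points lying on or outside the tube.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic 1-D regression problem: a sine curve with Gaussian noise (assumed data)
rng = np.random.default_rng(0)
X_demo = np.sort(rng.uniform(0, 5, size=(200, 1)), axis=0)
y_demo = np.sin(X_demo).ravel() + rng.normal(scale=0.1, size=200)

epsilons = [0.01, 0.1, 0.5]
sv_counts = []
for eps in epsilons:
    model = SVR(kernel='rbf', C=1.0, epsilon=eps).fit(X_demo, y_demo)
    # model.support_ holds the indices of the support vectors
    sv_counts.append(len(model.support_))
    print(f"epsilon={eps}: {sv_counts[-1]} support vectors out of {len(X_demo)}")
```

Comparing the printed counts across epsilon values shows how the width of the tube changes the sparsity of the fitted model.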

Q4. How does the choice of kernel function, C parameter, epsilon parameter, and gamma parameter
affect the performance of Support Vector Regression (SVR)? Can you explain how each parameter works
and provide examples of when you might want to increase or decrease its value?

The performance of Support Vector Regression (SVR) is heavily influenced by several key parameters: the choice of kernel function, the C parameter, the epsilon parameter (ε), and the gamma parameter (γ). Let's discuss each parameter and how it affects SVR performance, along with examples of when you might want to adjust its value:

1. **Choice of Kernel Function**:
   - **Linear Kernel (kernel='linear')**: Suitable when the target is approximately a linear function of the features, and generally faster to compute than the non-linear kernels.
   - **RBF Kernel (kernel='rbf')**: Radial Basis Function (RBF) kernel is versatile and can capture non-linear relationships effectively. It's suitable when the relationship between features and target is complex and non-linear.
   - **Polynomial Kernel (kernel='poly')**: Useful for capturing polynomial relationships between features and target. The 'degree' parameter controls the degree of the polynomial.

   Example:
   - If your data has a non-linear relationship, choosing an RBF or polynomial kernel might improve performance compared to a linear kernel.

2. **C Parameter**:
   - The C parameter controls the trade-off between the model's simplicity (a flatter, smoother regression function) and its ability to fit the training data. It sets the penalty on training points that fall outside the epsilon-insensitive tube. A smaller C value gives more tolerance to such errors, leading to a smoother regression function, while a larger C value emphasizes fitting the training data more closely.
   
   Example:
   - Use a smaller C value if you suspect your data contains noise or outliers that you want the model to ignore.
   - Use a larger C value when you have confidence in the training data and want the model to fit it closely.

3. **Epsilon Parameter (ε)**:
   - Epsilon (ε) is the margin of tolerance within which no penalty is given to errors. It defines the size of the epsilon-insensitive tube around the regression line. A larger epsilon allows more training points to be within this margin without penalty.
   
   Example:
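   - Increase epsilon if you want the model to be more tolerant of errors, for example when the targets are noisy and deviations below the noise level should be ignored. This leads to a smoother regression function with fewer support vectors.
   - Decrease epsilon if the predictions need to track the targets closely and small deviations matter.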

4. **Gamma Parameter (γ)**:
   - Gamma (γ) is the kernel coefficient for the RBF, polynomial, and sigmoid kernels; the linear kernel ignores it. For the RBF kernel, gamma defines how far the influence of a single training example reaches, with low values meaning 'far' and high values meaning 'close'. For the polynomial and sigmoid kernels, it scales the inner product inside the kernel. A small gamma value results in a smoother regression function, while a large gamma value leads to a more complex, more local fit and can cause overfitting.
   
   Example:
   - Use a smaller gamma value if the model overfits (low training error but high validation error) or you want a smoother regression function.
   - Use a larger gamma value if the model underfits and you suspect the relationship between features and target is complex and localized.

In summary, the choice of kernel function, along with tuning the C parameter, epsilon parameter, and gamma parameter, plays a crucial role in determining the performance of Support Vector Regression. Understanding how each parameter works and adjusting its value based on the characteristics of your data and the desired model complexity can lead to better regression results. Cross-validation and hyperparameter tuning techniques can help find the optimal values for these parameters.
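As one possible way to tune these parameters together, the sketch below runs a cross-validated grid search over the kernel, C, epsilon, and gamma of an `SVR` wrapped in a scaling pipeline. The synthetic sine data, the names `X_reg`, `y_reg`, `svr_pipeline`, and `svr_search`, and the grid values are all assumptions chosen for illustration; a real problem would need a grid suited to its own data.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic non-linear regression data (assumed for illustration)
rng = np.random.default_rng(42)
X_reg = rng.uniform(-3, 3, size=(150, 1))
y_reg = np.sin(X_reg).ravel() + rng.normal(scale=0.1, size=150)

# Scaling lives inside the pipeline so each CV fold is scaled on its own training part
svr_pipeline = Pipeline([('scaler', StandardScaler()), ('svr', SVR())])

# Small illustrative grid; gamma is ignored by the linear kernel
svr_param_grid = {
    'svr__kernel': ['linear', 'rbf'],
    'svr__C': [0.1, 1, 10],
    'svr__epsilon': [0.01, 0.1, 0.5],
    'svr__gamma': ['scale', 0.1, 1],
}

svr_search = GridSearchCV(
    svr_pipeline, svr_param_grid, cv=5, scoring='neg_mean_squared_error'
)
svr_search.fit(X_reg, y_reg)

print(svr_search.best_params_)
print("Best CV mean squared error:", -svr_search.best_score_)
```

Scoring with negative mean squared error follows scikit-learn's convention that higher scores are better, which is why the sign is flipped when printing.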

#### Q5. Assignment:
 Import the necessary libraries and load the dataset
 
 Split the dataset into training and testing sets
 
 Preprocess the data using any technique of your choice (e.g. scaling, normalization)
 
 Create an instance of the SVC classifier and train it on the training data
 
 Use the trained classifier to predict the labels of the testing data
 
 Evaluate the performance of the classifier using any metric of your choice (e.g. accuracy, precision, recall, F1-score)
 
 Tune the hyperparameters of the SVC classifier using GridSearchCV or RandomizedSearchCV to improve its performance
 
 Train the tuned classifier on the entire dataset
 
 Save the trained classifier to a file for future use.

In [2]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')

In [3]:
from sklearn.datasets import load_iris

In [4]:
data=load_iris()

In [7]:
X=pd.DataFrame(data.data,columns=data.feature_names)

In [8]:
X.head()

|   | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) |
|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 |


In [9]:
y=data.target

In [10]:
y

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

In [11]:
from sklearn.model_selection import train_test_split

In [12]:
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.20,random_state=42)

In [13]:
from sklearn.preprocessing import StandardScaler

In [14]:
scaler=StandardScaler()

In [15]:
X_train_scaled=scaler.fit_transform(X_train)
X_test_scaled=scaler.transform(X_test)

In [16]:
from sklearn.svm import SVC

In [17]:
svc=SVC(kernel='linear')

In [18]:
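# Note: the model is fit on the unscaled X_train, and the outputs below reflect that;
# pass X_train_scaled here (and X_test_scaled when predicting) to use the scaled features
svc.fit(X_train,y_train)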

In [19]:
svc.coef_

array([[-0.04631136,  0.52105578, -1.0030165 , -0.46411816],
       [-0.00641373,  0.17867392, -0.5389119 , -0.29158729],
       [ 0.56766907,  1.21519237, -2.03626115, -1.70330734]])

In [20]:
y_pred=svc.predict(X_test)

In [21]:
from sklearn.metrics import accuracy_score,confusion_matrix,classification_report

In [22]:
print(accuracy_score(y_test,y_pred))
print(confusion_matrix(y_test,y_pred))
print(classification_report(y_test,y_pred))

1.0
[[10  0  0]
 [ 0  9  0]
 [ 0  0 11]]
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30



In [23]:
from sklearn.model_selection import GridSearchCV

In [24]:
param_grid={
    'C': [0.1, 1, 10, 100],
    'gamma': [0.1, 0.01, 0.001], 
    'kernel': ['linear', 'rbf', 'poly']
}

In [25]:
svc_clf=GridSearchCV(svc,param_grid=param_grid,cv=5)

In [26]:
svc_clf.fit(X_train,y_train)

In [27]:
svc_clf.best_params_

{'C': 0.1, 'gamma': 0.1, 'kernel': 'poly'}

In [28]:
y_pred=svc_clf.predict(X_test)

In [29]:
print(accuracy_score(y_test,y_pred))
print(confusion_matrix(y_test,y_pred))
print(classification_report(y_test,y_pred))

1.0
[[10  0  0]
 [ 0  9  0]
 [ 0  0 11]]
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30



In [30]:
import pickle

In [31]:
with open('SVC_classifier.pkl','wb') as file:
    pickle.dump(svc_clf,file)
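The assignment also asks for the tuned classifier to be trained on the entire dataset, while the cells above fit only on the training split. One possible way to cover that step is sketched below, under stated assumptions: the scaler and `SVC` are combined in a `Pipeline` so the scaling is actually applied, the best configuration found on the training split is cloned and refit on all 150 samples, and the result is pickled. The names `full_pipeline`, `full_search`, `final_model`, and the file name `svc_full_pipeline.pkl` are hypothetical and not taken from the notebook.

```python
import os
import pickle
import tempfile

from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X_full, y_full = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_full, y_full, test_size=0.20, random_state=42
)

# Scaling and classification in one estimator, tuned with the same grid values
full_pipeline = Pipeline([('scaler', StandardScaler()), ('svc', SVC())])
full_param_grid = {
    'svc__C': [0.1, 1, 10, 100],
    'svc__gamma': [0.1, 0.01, 0.001],
    'svc__kernel': ['linear', 'rbf', 'poly'],
}
full_search = GridSearchCV(full_pipeline, full_param_grid, cv=5)
full_search.fit(X_tr, y_tr)
print("Held-out accuracy:", full_search.score(X_te, y_te))

# Copy the best configuration (unfitted) and train it on the entire dataset
final_model = clone(full_search.best_estimator_).fit(X_full, y_full)

# Save to a temporary location (hypothetical file name) and load it back
model_path = os.path.join(tempfile.gettempdir(), 'svc_full_pipeline.pkl')
with open(model_path, 'wb') as file:
    pickle.dump(final_model, file)
with open(model_path, 'rb') as file:
    restored_model = pickle.load(file)

print(restored_model.predict(X_full[:5]))
```

Because the final model has seen every sample, the held-out accuracy printed before the refit is the only unbiased estimate in this sketch. A pickled model should be loaded with the same scikit-learn version that saved it.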