Q1. What is the mathematical formula for a linear SVM?

A linear Support Vector Machine (SVM) is defined by its decision function:

$$f(x) = w \cdot x + b$$

where:
- f(x) is the decision function; the predicted class is +1 if f(x) ≥ 0 and -1 otherwise
- w is the weight vector
- x is the input vector
- b is the bias term
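
As a minimal sketch of how the decision function is evaluated, the snippet below computes f(x) for two inputs and turns the score into a class label. The values of `w` and `b` are hypothetical and chosen only for illustration; in practice they are learned from training data.

```python
import numpy as np

def decision_function(w, x, b):
    """Return f(x) = w . x + b for a single input vector x."""
    return np.dot(w, x) + b

def predict(w, x, b):
    """Return +1 when f(x) >= 0 and -1 otherwise."""
    return 1 if decision_function(w, x, b) >= 0 else -1

# Hypothetical parameters, not learned from any dataset
w = np.array([2.0, -1.0])
b = -1.0

x_a = np.array([3.0, 1.0])
x_b = np.array([0.0, 2.0])

score_a = decision_function(w, x_a, b)  # 2*3 + (-1)*1 - 1
score_b = decision_function(w, x_b, b)  # 2*0 + (-1)*2 - 1
label_a = predict(w, x_a, b)
label_b = predict(w, x_b, b)

print(score_a, label_a)
print(score_b, label_b)
```

The sign of the score gives the class, and its magnitude grows with the distance of the point from the hyperplane.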

Q2. What is the objective function of a linear SVM?

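A soft-margin linear Support Vector Machine (SVM) minimizes a regularization term on the weights plus a penalty on margin violations:

$$\min_{w,\,b,\,\zeta} \; \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{n} \zeta_i \quad \text{subject to} \quad y_i (w \cdot x_i + b) \ge 1 - \zeta_i, \;\; \zeta_i \ge 0$$

where:
- w is the weight vector
- b is the bias term
- C is the regularization parameter; larger values penalize margin violations more heavily
- $x_i$ is the ith training example vector
- $y_i$ is the true class label (+1 or -1) for the ith training example
- $\zeta_i$ is the slack variable for the ith training example, measuring how far it violates the margin
- n is the number of training examples

At the optimum, $\zeta_i = \max(0,\, 1 - y_i (w \cdot x_i + b))$, which is the hinge loss, so this problem is equivalent to minimizing the regularized hinge loss.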
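
The objective can be evaluated directly for a given w and b. The sketch below assumes the standard soft-margin form, (1/2)||w||² + C Σ ζ_i, with each slack computed as the hinge loss max(0, 1 - y_i (w · x_i + b)). The function name `soft_margin_objective` and the toy data are hypothetical.

```python
import numpy as np

def soft_margin_objective(w, b, X, y, C):
    """Return (objective value, slack vector) for a linear soft-margin SVM."""
    margins = y * (X @ w + b)               # y_i * (w . x_i + b) for every sample
    slack = np.maximum(0.0, 1.0 - margins)  # hinge loss, zero when the margin is >= 1
    objective = 0.5 * np.dot(w, w) + C * slack.sum()
    return objective, slack

# Toy data: the third point sits inside the margin
X = np.array([[2.0, 0.0], [0.0, 0.0], [0.5, 0.0]])
y = np.array([1.0, -1.0, -1.0])
w = np.array([1.0, 0.0])
b = -1.0

objective, slack = soft_margin_objective(w, b, X, y, C=1.0)
print("slack:", slack)
print("objective:", objective)
```

Only the third point contributes slack here, so increasing C raises the objective only through that point.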

Q3. What is the kernel trick in SVM?

The kernel trick in Support Vector Machines (SVM) is a technique that allows SVMs to handle non-linearly separable data by implicitly mapping the input vectors into a higher-dimensional feature space using a kernel function. This higher-dimensional space enables the SVM to find a linear decision boundary that effectively separates the data points. The kernel trick avoids the need to explicitly compute and store the transformed feature vectors, making it computationally efficient for handling complex datasets.
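
To make the idea concrete, the sketch below checks numerically that a degree-2 polynomial kernel, K(x, z) = (x · z)², gives the same value as an explicit inner product in a 3-dimensional feature space. The feature map `phi` is one standard choice for 2-dimensional inputs; the function names are hypothetical.

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map for a 2-D input (x1, x2)."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2.0) * x1 * x2, x2 ** 2])

def poly2_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel, computed in the input space."""
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

explicit = np.dot(phi(x), phi(z))  # inner product after mapping to 3-D
implicit = poly2_kernel(x, z)      # same quantity without ever building phi

print(explicit, implicit)
```

The kernel needs only one dot product in the original space, which is why the transformed vectors never have to be computed or stored.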

Q4. What is the role of support vectors in SVM? Explain with an example.

Support vectors are the training points closest to the decision boundary (hyperplane) of a Support Vector Machine (SVM). They lie on the edge of the margin (or inside it, for a soft margin), and they alone determine the position and orientation of the hyperplane.

Let's explain this with an example:

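Suppose you have a dataset with two classes, labeled as positive (+1) and negative (-1), and the data points are represented in a two-dimensional space. The goal is to find a linear decision boundary (hyperplane) that effectively separates these two classes. Among all lines that separate the classes, the SVM chooses the one with the widest margin. The few points of each class that sit on the edges of that margin are the support vectors; points farther from the line could be moved slightly or removed without changing the boundary.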
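
The example can be sketched with scikit-learn. The six 2-D points below are made up for illustration, and a large `C` is assumed so that the fit behaves like a hard margin. The sketch reads the fitted `support_vectors_`, then refits on the support vectors alone to compare the resulting hyperplane.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: three negative and three positive points
X = np.array([[1, 1], [2, 1], [1, 2], [4, 4], [5, 4], [4, 5]], dtype=float)
y = np.array([-1, -1, -1, 1, 1, 1])

# Large C approximates a hard margin on this separable data
clf = SVC(kernel="linear", C=1000.0).fit(X, y)
print("Support vectors:\n", clf.support_vectors_)
print("w =", clf.coef_[0], "b =", clf.intercept_[0])

# Refit using only the support vectors (indices are in clf.support_)
sv_idx = clf.support_
clf_sv = SVC(kernel="linear", C=1000.0).fit(X[sv_idx], y[sv_idx])
print("w (support vectors only) =", clf_sv.coef_[0], "b =", clf_sv.intercept_[0])
```

If the two fits agree up to solver tolerance, the points that were dropped played no part in placing the boundary.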

Q5. Illustrate with examples and graphs the hyperplane, marginal plane, soft margin, and hard margin in SVM.

The hyperplane, marginal planes, soft margin, and hard margin in Support Vector Machines (SVM) are described below, each with an example and a description of its graph:

1. **Hyperplane**:
   - Example: Consider a binary classification problem with two classes, labeled as positive (+1) and negative (-1). The hyperplane is the decision boundary that separates these classes.
   - Graph: In a two-dimensional feature space, the hyperplane is a straight line. In three dimensions it is a plane, and in d dimensions it is a (d-1)-dimensional flat surface where w · x + b = 0.

2. **Marginal plane**:
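   - Example: The marginal planes are the two planes parallel to the hyperplane that pass through the support vectors, which are the data points of each class closest to the hyperplane.
   - Graph: They appear as two lines on either side of the hyperplane, where w · x + b = +1 and w · x + b = -1. The distance between them, 2/||w||, is the margin.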

3. **Soft Margin**:
   - Example: Soft margin SVM allows for some misclassification to handle noisy or overlapping data.
   - Graph: The soft margin allows for data points to fall within the margin or even on the wrong side of the hyperplane if necessary, represented by circles (misclassified points).

4. **Hard Margin**:
   - Example: Hard margin SVM enforces strict classification without allowing any misclassification, so it has a solution only when the data are linearly separable.
   - Graph: The hard margin requires every data point to be correctly classified and to lie on or outside the margin, represented by crosses (correctly classified points).

In the graphs for these cases, the green line represents the hyperplane, the dashed lines represent the marginal planes, and the circles (soft margin) or crosses (hard margin) represent the data points. SVM seeks the hyperplane that maximizes the margin; with a soft margin it also keeps classification errors small, and with a hard margin it permits none.
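
The difference between hard and soft margins can also be checked numerically. The sketch below assumes that a very large `C` in scikit-learn's `SVC` approximates a hard margin and a small `C` gives a soft margin, and compares the margin width 2/||w|| of the two fits. The toy points are hypothetical, with one negative point placed close to the positive class.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical data: the negative point (3, 3) sits close to the positive class
X = np.array([[1, 1], [2, 1], [1, 2], [3, 3], [4, 4], [5, 4], [4, 5]], dtype=float)
y = np.array([-1, -1, -1, -1, 1, 1, 1])

def margin_width(model):
    """Distance between the two marginal planes, 2 / ||w||."""
    return 2.0 / np.linalg.norm(model.coef_[0])

hard = SVC(kernel="linear", C=1000.0).fit(X, y)  # approximates a hard margin
soft = SVC(kernel="linear", C=0.01).fit(X, y)    # tolerates margin violations

width_hard = margin_width(hard)
width_soft = margin_width(soft)
print("hard margin width:", width_hard)
print("soft margin width:", width_soft)
```

With a small `C` the model accepts a violation by the point near the other class in exchange for a wider margin, while the large-`C` fit keeps every point on or outside a narrower margin.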

Q6. SVM implementation using the Iris dataset.

In [3]:
from sklearn.datasets import load_iris

In [4]:
data=load_iris()

In [6]:
data

{'data': array([[5.1, 3.5, 1.4, 0.2],
        [4.9, 3. , 1.4, 0.2],
        [4.7, 3.2, 1.3, 0.2],
        [4.6, 3.1, 1.5, 0.2],
        [5. , 3.6, 1.4, 0.2],
        [5.4, 3.9, 1.7, 0.4],
        [4.6, 3.4, 1.4, 0.3],
        [5. , 3.4, 1.5, 0.2],
        [4.4, 2.9, 1.4, 0.2],
        [4.9, 3.1, 1.5, 0.1],
        [5.4, 3.7, 1.5, 0.2],
        [4.8, 3.4, 1.6, 0.2],
        [4.8, 3. , 1.4, 0.1],
        [4.3, 3. , 1.1, 0.1],
        [5.8, 4. , 1.2, 0.2],
        [5.7, 4.4, 1.5, 0.4],
        [5.4, 3.9, 1.3, 0.4],
        [5.1, 3.5, 1.4, 0.3],
        [5.7, 3.8, 1.7, 0.3],
        [5.1, 3.8, 1.5, 0.3],
        [5.4, 3.4, 1.7, 0.2],
        [5.1, 3.7, 1.5, 0.4],
        [4.6, 3.6, 1. , 0.2],
        [5.1, 3.3, 1.7, 0.5],
        [4.8, 3.4, 1.9, 0.2],
        [5. , 3. , 1.6, 0.2],
        [5. , 3.4, 1.6, 0.4],
        [5.2, 3.5, 1.5, 0.2],
        [5.2, 3.4, 1.4, 0.2],
        [4.7, 3.2, 1.6, 0.2],
        [4.8, 3.1, 1.6, 0.2],
        [5.4, 3.4, 1.5, 0.4],
        [5.2, 4.1, 1.5, 0.1],
  

In [7]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

In [15]:
X=pd.DataFrame(data.data,columns=data.feature_names)

In [16]:
y=data.target

In [17]:
y

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

In [18]:
from sklearn.model_selection import train_test_split

In [21]:
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.25,random_state=42)

In [22]:
X_train.shape,X_test.shape

((112, 4), (38, 4))

In [23]:
from sklearn.svm import SVC

In [25]:
svc=SVC(kernel='linear')

In [26]:
svc.fit(X_train,y_train)

In [27]:
svc.coef_

array([[-0.04631136,  0.52105578, -1.0030165 , -0.46411816],
       [-0.00641373,  0.17867392, -0.5389119 , -0.29158729],
       [ 0.57613513,  1.19215085, -2.03465638, -1.67923323]])

In [28]:
y_pred=svc.predict(X_test)

In [30]:
from sklearn.metrics import accuracy_score,confusion_matrix,classification_report

In [31]:
print(classification_report(y_test,y_pred))
print(confusion_matrix(y_test,y_pred))
print(accuracy_score(y_test,y_pred))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        15
           1       1.00      1.00      1.00        11
           2       1.00      1.00      1.00        12

    accuracy                           1.00        38
   macro avg       1.00      1.00      1.00        38
weighted avg       1.00      1.00      1.00        38

[[15  0  0]
 [ 0 11  0]
 [ 0  0 12]]
1.0


In [48]:
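# Note: svc was created with kernel='linear', which ignores gamma, so only C
# changes the scores below; include 'kernel': ['rbf'] in the grid to tune gamma.
param_grid={
    'C':[0.1,1,10,100,1000],
    'gamma':[1,0.1,0.01,0.001,0.0001]
}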

In [47]:
from sklearn.model_selection import GridSearchCV

In [49]:
svclf=GridSearchCV(svc,param_grid=param_grid,refit=True,cv=5,verbose=3)

In [51]:
svclf.fit(X_train,y_train)

Fitting 5 folds for each of 25 candidates, totalling 125 fits
[CV 1/5] END ....................C=0.1, gamma=1;, score=1.000 total time=   0.0s
[CV 2/5] END ....................C=0.1, gamma=1;, score=0.957 total time=   0.0s
[CV 3/5] END ....................C=0.1, gamma=1;, score=0.818 total time=   0.0s
[CV 4/5] END ....................C=0.1, gamma=1;, score=1.000 total time=   0.0s
[CV 5/5] END ....................C=0.1, gamma=1;, score=0.955 total time=   0.0s
[CV 1/5] END ..................C=0.1, gamma=0.1;, score=1.000 total time=   0.0s
[CV 2/5] END ..................C=0.1, gamma=0.1;, score=0.957 total time=   0.0s
[CV 3/5] END ..................C=0.1, gamma=0.1;, score=0.818 total time=   0.0s
[CV 4/5] END ..................C=0.1, gamma=0.1;, score=1.000 total time=   0.0s
[CV 5/5] END ..................C=0.1, gamma=0.1;, score=0.955 total time=   0.0s
[CV 1/5] END .................C=0.1, gamma=0.01;, score=1.000 total time=   0.0s
[CV 2/5] END .................C=0.1, gamma=0.01

In [52]:
svclf.best_params_

{'C': 1, 'gamma': 1}

In [53]:
y_pred=svclf.predict(X_test)

In [54]:
print(classification_report(y_test,y_pred))
print(confusion_matrix(y_test,y_pred))
print(accuracy_score(y_test,y_pred))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        15
           1       1.00      1.00      1.00        11
           2       1.00      1.00      1.00        12

    accuracy                           1.00        38
   macro avg       1.00      1.00      1.00        38
weighted avg       1.00      1.00      1.00        38

[[15  0  0]
 [ 0 11  0]
 [ 0  0 12]]
1.0
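
The linear kernel used above does not depend on `gamma`, which is why the cross-validation log shows identical scores for different `gamma` values at the same `C`. As a hedged sketch, a grid that also fixes the kernel to RBF makes `gamma` take effect. The variable names and the reduced grid below are assumptions made for illustration, and no particular result is implied.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Same split settings as the notebook, under separate names to avoid overwriting it
X_iris, y_iris = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_iris, y_iris, test_size=0.25, random_state=42
)

# Hypothetical reduced grid; gamma matters here because the kernel is RBF
param_grid_rbf = {
    "kernel": ["rbf"],
    "C": [0.1, 1, 10],
    "gamma": [1, 0.1, 0.01],
}

search = GridSearchCV(SVC(), param_grid=param_grid_rbf, cv=5)
search.fit(X_tr, y_tr)

test_accuracy = search.score(X_te, y_te)
print(search.best_params_)
print(test_accuracy)
```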


In [55]:
import numpy as np

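class LinearSVM:
    """Linear SVM trained with per-sample subgradient descent on the hinge loss.

    Note: the decision function here is w . x - b (the bias is subtracted), and
    C acts as the L2 regularization strength on the weights, so a larger C means
    stronger regularization (the opposite of C in sklearn's SVC).
    """

    def __init__(self, learning_rate=0.01, epochs=1000, C=1):
        self.learning_rate = learning_rate
        self.epochs = epochs
        self.C = C
        self.weights = None
        self.bias = None

    def fit(self, X, y):
        """Fit on X of shape (n_samples, n_features) with labels y in {-1, +1}."""
        n_samples, n_features = X.shape
        self.weights = np.zeros(n_features)
        self.bias = 0

        # Gradient descent training, one sample at a time
        for _ in range(self.epochs):
            for idx, x_i in enumerate(X):
                # True when the sample is correctly classified with margin >= 1
                condition = y[idx] * (np.dot(x_i, self.weights) - self.bias) >= 1
                if condition:
                    # Only the regularization term contributes to the gradient
                    self.weights -= self.learning_rate * (2 * self.C * self.weights)
                else:
                    # Margin violated: include the hinge-loss subgradient as well
                    self.weights -= self.learning_rate * (2 * self.C * self.weights - y[idx] * x_i)
                    self.bias -= self.learning_rate * y[idx]

    def predict(self, X):
        """Return the sign of w . x - b for each row of X."""
        return np.sign(np.dot(X, self.weights) - self.bias)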

# Example usage
X_train = np.array([[1, 2], [2, 3], [3, 3], [2, 1], [3, 2]])
y_train = np.array([-1, -1, 1, 1, 1])

# Create and train the linear SVM classifier from scratch
svm_scratch = LinearSVM()
svm_scratch.fit(X_train, y_train)

# Example predictions
X_test = np.array([[4, 5], [1, 1]])
y_pred = svm_scratch.predict(X_test)
print("Predictions:", y_pred)


Predictions: [1. 1.]
