Q1. What is the mathematical formula for a linear SVM?


The mathematical formula for a linear Support Vector Machine (SVM) can be expressed as follows:

Given a dataset with feature vectors x_i in a D-dimensional space and corresponding binary class labels y_i (+1 or -1 for two-class classification):

The decision function for a linear SVM is defined as:

f(x) = w · x + b

Here,

f(x) is the decision function; the predicted class label for an input vector x is sign(f(x)).
w is the weight vector, which is normal to the decision boundary and so determines its orientation.
x is the input feature vector.
b is the bias term, which shifts the decision boundary away from the origin.
The goal of training a linear SVM is to find the optimal values for the weight vector w and the bias term b that maximize the margin between the two classes while minimizing classification errors.

The optimization problem can be expressed as:

Minimize: 1/2 ||w||^2

Subject to: y_i(w · x_i + b) ≥ 1 for all data points (i)

Here, ||w|| represents the Euclidean norm of the weight vector w. The constraint ensures that every data point is correctly classified and has a functional margin of at least 1, which means it lies at a distance of at least 1/||w|| from the decision boundary.

This is a convex quadratic optimization problem, so it can be solved with a standard technique such as a quadratic programming solver to find the optimal values for w and b.
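
As a small illustration of the formulas above, the following sketch evaluates the decision function f(x) = w · x + b and checks the constraint y_i(w · x_i + b) ≥ 1 on a toy dataset. The data points and the values of w and b are hypothetical and were picked by hand for illustration; they are not the output of a solver.

```python
import numpy as np

# Hypothetical, linearly separable toy data (values chosen for illustration)
X = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],
              [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

# Hand-picked candidate hyperplane (an assumption, not a fitted model)
w = np.array([0.4, 0.4])
b = -2.2


def decision_function(X, w, b):
    """Return f(x) = w . x + b for every row of X."""
    return X @ w + b


f = decision_function(X, w, b)
predictions = np.sign(f)                 # predicted labels are sign(f(x))
functional_margins = y * f               # y_i (w . x_i + b)

# A small tolerance absorbs floating-point rounding at the margin itself
constraints_ok = bool(np.all(functional_margins >= 1 - 1e-9))
objective = 0.5 * np.dot(w, w)           # 1/2 ||w||^2

print("f(x):", np.round(f, 3))
print("predictions:", predictions)
print("all constraints satisfied:", constraints_ok)
print("objective 1/2 ||w||^2:", round(objective, 4))
```

Points whose functional margin equals 1 sit exactly on the margin; every other point lies further from the boundary.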






Q2. What is the objective function of a linear SVM?

The objective function of a linear Support Vector Machine (SVM) is a mathematical expression that defines the optimization problem that the SVM aims to solve. The primary goal of this objective function is to find the optimal values for the weight vector (w) and the bias term (b) in the linear SVM model in a way that maximizes the margin between the two classes while minimizing classification errors. Here is the formal representation of the objective function for a linear SVM:

Minimize: 1/2 ||w||^2

Subject to: y_i(w · x_i + b) ≥ 1 for all data points (i)

In this objective function:

||w|| represents the Euclidean norm (L2 norm) of the weight vector w. It is a measure of the magnitude or length of the weight vector.

The goal is to minimize 1/2 ||w||^2. This term is often referred to as the "regularization term" or the "L2 regularization term." It encourages a solution in which the norm of the weight vector w is as small as possible. Because the width of the margin equals 2/||w||, a smaller ||w|| means a wider margin between the classes, while a larger ||w|| means a narrower one.

The "subject to" constraint specifies that for each data point (x_i), the product of the class label (y_i) and the decision function (w · x_i + b) must be greater than or equal to 1. This constraint ensures that all data points are correctly classified and lie on or outside the margin, at a distance of at least 1/||w|| from the decision boundary.
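
The link between the objective and the margin can be checked numerically. The sketch below compares three hand-picked hyperplanes that all satisfy the constraints on a hypothetical toy dataset, and reports the objective 1/2 ||w||^2 and the margin width 2/||w|| for each. The candidate names and values are assumptions made for illustration.

```python
import numpy as np

# Hypothetical, linearly separable toy data (values chosen for illustration)
X = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],
              [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]])
y = np.array([-1, -1, -1, 1, 1, 1])


def is_feasible(w, b, tol=1e-9):
    """True if y_i (w . x_i + b) >= 1 holds for every data point."""
    return bool(np.all(y * (X @ w + b) >= 1 - tol))


def objective(w):
    """The quantity being minimised: 1/2 ||w||^2."""
    return 0.5 * np.dot(w, w)


def margin_width(w):
    """Distance between the planes w . x + b = +1 and w . x + b = -1."""
    return 2.0 / np.linalg.norm(w)


# Hand-picked candidates (assumptions, not solver output)
candidates = {
    "diagonal": (np.array([0.4, 0.4]), -2.2),
    "diagonal rescaled by 2": (np.array([0.8, 0.8]), -4.4),
    "vertical": (np.array([1.0, 0.0]), -3.0),
}

for name, (w, b) in candidates.items():
    print(f"{name}: feasible={is_feasible(w, b)}, "
          f"objective={objective(w):.3f}, margin width={margin_width(w):.3f}")
```

All three candidates classify the data correctly, but the one with the smallest objective also has the widest margin, which is why minimizing 1/2 ||w||^2 under the constraints is equivalent to maximizing the margin.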




Q3. What is the kernel trick in SVM?

The kernel trick is a fundamental concept in Support Vector Machines (SVMs) that allows SVMs to handle non-linearly separable data by implicitly mapping the input data into a higher-dimensional feature space. It enables linear classifiers, such as linear SVMs, to work effectively in scenarios where a linear decision boundary cannot separate the classes in the original input space.

Here's how the kernel trick works:

Original Input Space: In the original input space, the data may not be linearly separable, meaning a simple straight line or hyperplane cannot effectively separate the different classes.

Mapping to a Higher-Dimensional Space: The kernel trick corresponds to mapping the data points from the original input space to a higher-dimensional feature space through a feature map φ. This mapping is often non-linear and can transform the data into a space where it becomes linearly separable. The mapping itself is never computed explicitly; a "kernel" function supplies the inner products that the SVM needs.

The kernel function K(x, x') calculates the inner product (dot product) of two data points x and x' in the higher-dimensional space. Different types of kernel functions can be used, including:

Linear Kernel: K(x, x') = x · x' (no mapping, remains in the original space).
Polynomial Kernel: K(x, x') = (γ * (x · x') + r)^d, where γ, r, and d are hyperparameters.
Radial Basis Function (RBF) Kernel (Gaussian Kernel): K(x, x') = exp(-γ * ||x - x'||^2), where γ is a hyperparameter.
Linear Separation in the Feature Space: In the higher-dimensional feature space, the transformed data may now be linearly separable. This means that a hyperplane can effectively separate the classes, even though it corresponds to a non-linear decision boundary in the original input space.

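Solving the Linear SVM: The SVM training problem, in its dual form, depends on the data only through inner products between pairs of points. Replacing each inner product with K(x, x') therefore finds the optimal separating hyperplane in the feature space without ever transforming the data explicitly.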

Predictions in the Original Space: When making predictions for new data points in the original input space, the kernel trick allows you to apply the learned linear decision boundary from the feature space without explicitly mapping the new data into the feature space. This is achieved through the use of kernel functions, which compute the dot product between the new data point and the support vectors in the feature space.
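
The steps above can be sketched in two parts. The first part checks, for a degree-2 polynomial kernel K(x, x') = (x · x')^2 (that is, γ = 1, r = 0, d = 2), that the kernel value equals the inner product under an explicit feature map φ. The second part fits a scikit-learn `SVC` with an RBF kernel and rebuilds its decision function from the support vectors and kernel values alone. The toy data, the helper names `phi`, `poly2_kernel` and `rbf_kernel_matrix`, and the values of γ and C are assumptions made for illustration.

```python
import numpy as np
from sklearn.svm import SVC

# Part 1: kernel value == inner product in an explicit feature space


def phi(x):
    """Explicit feature map for K(x, z) = (x . z)^2 in two dimensions."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])


def poly2_kernel(x, z):
    """Degree-2 polynomial kernel with gamma = 1 and r = 0."""
    return np.dot(x, z) ** 2


x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
explicit = np.dot(phi(x), phi(z))   # inner product after mapping
implicit = poly2_kernel(x, z)       # same value, no mapping needed
print("explicit:", explicit, "implicit:", implicit)

# Part 2: predictions from support vectors and kernel values only

# XOR-like data that no straight line can separate in the input space
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
y = np.array([1, 1, -1, -1])

gamma = 1.0  # illustrative value
clf = SVC(kernel="rbf", gamma=gamma, C=10.0).fit(X, y)


def rbf_kernel_matrix(A, B, gamma):
    """K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)


X_new = np.array([[0.2, 0.1], [0.9, 0.2]])

# f(x) = sum_i (alpha_i * y_i) * K(sv_i, x) + b; dual_coef_ stores alpha_i * y_i
K = rbf_kernel_matrix(clf.support_vectors_, X_new, gamma)
manual_scores = clf.dual_coef_[0] @ K + clf.intercept_[0]

print("manual decision values: ", manual_scores)
print("sklearn decision values:", clf.decision_function(X_new))
```

In neither part is a new point mapped into the feature space; only kernel evaluations against the support vectors are required.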





Q4. What is the role of support vectors in SVM? Explain with an example.


Support vectors play a crucial role in Support Vector Machines (SVMs). They are the data points from the training dataset that are closest to the decision boundary (the hyperplane) and have a non-zero contribution to defining the decision boundary. Support vectors are instrumental in defining the margin, which is the distance between the decision boundary and the nearest support vectors. Their role can be explained using an example:

Example: Binary Classification with Linear SVM

Suppose you have a binary classification problem with two classes, A and B, and your goal is to find a linear decision boundary that separates these classes.

Here's a simplified dataset in a 2D feature space:

                      
Class A (positive):   O    O    O
Class B (negative):   X    X    X

In this example, Class A data points are marked with an 'O' and Class B data points with an 'X'. The goal is to find a straight line (the decision boundary) that separates the two classes.

Linear SVM Training: When you train a linear SVM on this dataset, it identifies the optimal hyperplane (decision boundary) that maximizes the margin between the classes. The margin is the distance between the decision boundary and the nearest data points.

Support Vectors: The support vectors are the data points that are closest to the decision boundary and directly influence its position. In this case, they are the points that lie on the margin boundaries.

The support vectors from Class A are the 'O' points nearest to Class B.
The support vectors from Class B are the 'X' points nearest to Class A.

These points are crucial because moving any one of them changes the position of the decision boundary, whereas moving or removing any other point (as long as it stays outside the margin) leaves the boundary unchanged. They essentially "support" the position of the decision boundary.

Margin: The margin of the SVM is defined as the distance between the decision boundary and the nearest support vector. In this case, it's the distance between the decision boundary and one of the circles (a support vector) from Class A.

Margin = Distance between decision boundary and support vector
Classification: During the testing or prediction phase, when you encounter a new data point, the SVM evaluates the decision function and determines which side of the decision boundary the point falls on. The sign of f(x) gives the predicted class; the distance to the support vectors of each class is not what decides the label.

The key role of support vectors can be summarized as follows:

Support vectors are the critical data points that define the position and orientation of the decision boundary.
They are the closest data points to the decision boundary: they lie exactly on the margin and are the only points with a non-zero weight in the solution.
The margin is maximized by selecting the support vectors as reference points.
The SVM uses support vectors to make predictions for new data points by determining their position relative to the decision boundary.   
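
The role of the support vectors can be checked with a short experiment using scikit-learn's `SVC`. The sketch below fits a linear SVM on a hypothetical six-point dataset, lists the support vectors, and then refits after removing one point that is not a support vector to see whether the boundary changes. The coordinates are made up for illustration, and the very large C is an assumption used to approximate a hard margin.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical 2D data: three points per class (values chosen for illustration)
X = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],    # Class A
              [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]])   # Class B
y = np.array([1, 1, 1, -1, -1, -1])                  # A positive, B negative

# A very large C approximates a hard-margin SVM on separable data
clf = SVC(kernel="linear", C=1e6).fit(X, y)

support_vectors = clf.support_vectors_
sv_scores = clf.decision_function(support_vectors)   # close to +1 or -1
margin = 1.0 / np.linalg.norm(clf.coef_[0])          # boundary to nearest point

print("support vectors:\n", support_vectors)
print("decision values at support vectors:", np.round(sv_scores, 3))
print("margin:", round(margin, 3))

# Drop one training point that is NOT a support vector and refit
non_sv_indices = [i for i in range(len(X)) if i not in clf.support_]
keep = np.ones(len(X), dtype=bool)
keep[non_sv_indices[0]] = False
clf_reduced = SVC(kernel="linear", C=1e6).fit(X[keep], y[keep])

print("w before:", np.round(clf.coef_[0], 3), "b before:", np.round(clf.intercept_, 3))
print("w after: ", np.round(clf_reduced.coef_[0], 3), "b after: ", np.round(clf_reduced.intercept_, 3))
```

If the removed point really plays no part in the solution, the weight vector and bias printed before and after the refit should agree up to solver tolerance.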
                      
                      

Q5. Illustrate with examples and graphs of Hyperplane, Marginal plane, Soft margin and Hard margin in SVM?

                      
To illustrate the concepts of Hyperplane, Marginal Plane, Soft Margin, and Hard Margin in Support Vector Machines (SVM), let's consider a simple two-dimensional binary classification problem with two classes: Class A (represented by circles) and Class B (represented by crosses). We'll use example graphs to visualize these concepts.

1. Hyperplane:

The hyperplane is the decision boundary that separates the two classes in the feature space. In a two-dimensional space, it's a line. The hyperplane is defined by the SVM to maximize the margin between the classes.

Example Graph:

   O    O        |        X    X
      O     O    |    X      X
   O    O        |        X    X
                 |
 Class A    (Hyperplane)    Class B

                      
In this graph, the hyperplane is the straight line that separates Class A from Class B.

2. Marginal Plane:

The marginal planes are the two planes parallel to the hyperplane, one on each side, that pass through the nearest data points of each class. The distance between them is the margin, and the data points lying on them are called support vectors.

Example Graph:

   O    O   :    |    :   X    X
      O     O    |    X      X
   O    O   :    |    :   X    X
            :    |    :
 Class A    (Hyperplane)    Class B

In this graph, the dotted lines (':') represent the marginal planes parallel to the hyperplane ('|'). The data points lying on these planes (one 'O' and one 'X') are support vectors.

3. Soft Margin:

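Soft margin allows for some misclassification in the training data to find a more flexible decision boundary. It introduces a margin of error, allowing some data points to cross the margin boundary or even the hyperplane. The regularisation parameter C controls the trade-off: a small C tolerates more margin violations, while a large C penalises them heavily and approaches a hard margin. This is especially useful when dealing with noisy or overlapping data.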

Example Graph:

   O    O   :    |    :   X    X
      O     : O  |    X      X
   O    O   O    |    :   X    X
      O     :   O|    :      X
 Class A  (Hyperplane with Soft Margin)  Class B

In this graph, two data points from Class A lie inside the margin (between the dotted marginal plane and the hyperplane) but are still correctly classified.

4. Hard Margin:

Hard margin SVM does not allow any misclassification in the training data. It requires a clear separation between the classes, and the margin is maximized with no data points inside or crossing it. Hard margin SVMs are sensitive to outliers and noise.

Example Graph: 
                      
   O    O   :    |    :   X    X
      O     O    |    X      X
   O    O   :    |    :   X    X
            :    |    :
 Class A  (Hyperplane with Hard Margin)  Class B

In this graph, there are no data points inside or crossing the margin boundary, representing a hard margin SVM.
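
The four concepts can also be drawn from a fitted model. The following sketch uses scikit-learn and matplotlib to plot the hyperplane (solid line), the marginal planes (dashed lines) and the support vectors (circled) for a small C (soft margin) and a very large C (approximately hard margin). The toy data, including the single Class A point placed close to Class B, the two C values and the output file name are assumptions made for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVC

# Hypothetical data: Class A (-1), Class B (+1); (3.5, 3.5) is a Class A point near Class B
X = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0], [3.5, 3.5],
              [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]])
y = np.array([-1, -1, -1, -1, 1, 1, 1])


def fit_and_plot(ax, C, title):
    """Fit a linear SVM with the given C, draw it on ax, return the margin width."""
    clf = SVC(kernel="linear", C=C).fit(X, y)

    # Evaluate f(x) on a grid; the levels -1, 0, +1 are the two marginal
    # planes and the hyperplane
    xx, yy = np.meshgrid(np.linspace(-1, 7, 200), np.linspace(-1, 7, 200))
    Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    ax.contour(xx, yy, Z, levels=[-1, 0, 1], colors="k",
               linestyles=["--", "-", "--"])

    ax.scatter(X[y == -1, 0], X[y == -1, 1], marker="o", label="Class A")
    ax.scatter(X[y == 1, 0], X[y == 1, 1], marker="x", label="Class B")
    ax.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1],
               s=150, facecolors="none", edgecolors="k", label="Support vectors")

    width = 2.0 / np.linalg.norm(clf.coef_[0])  # distance between marginal planes
    ax.set_title(f"{title}: C = {C}, margin width = {width:.2f}")
    ax.legend(loc="lower right")
    return width


fig, axes = plt.subplots(1, 2, figsize=(12, 5))
width_soft = fit_and_plot(axes[0], 0.05, "Soft margin")
width_hard = fit_and_plot(axes[1], 1e6, "Approximately hard margin")
fig.savefig("svm_margins.png")  # in an interactive session, plt.show() works as well
```

With the large C, the two marginal planes are squeezed between the nearby Class A point and Class B. With the small C, the model accepts a margin violation in exchange for a wider margin.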

                      
Q6. SVM Implementation through Iris dataset.

Bonus task: Implement a linear SVM classifier from scratch using Python and compare its performance with the scikit-learn implementation.
- Load the iris dataset from the scikit-learn library and split it into a training set and a testing set.
- Train a linear SVM classifier on the training set and predict the labels for the testing set.
- Compute the accuracy of the model on the testing set.
- Plot the decision boundaries of the trained model using two of the features.
- Try different values of the regularisation parameter C and see how it affects the performance of the model.




import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data[:, :2]  # Using only the first two features for simplicity
y = iris.target

# Split the dataset into a training set and a testing set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a linear SVM classifier
svm_classifier = SVC(kernel='linear', C=1.0)

# Train the classifier on the training data
svm_classifier.fit(X_train, y_train)

# Predict the labels for the testing data
y_pred = svm_classifier.predict(X_test)

# Compute the accuracy of the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy of scikit-learn SVM: {accuracy * 100:.2f}%")

# Plot the decision boundaries
def plot_decision_boundary(X, y, classifier, ax):
    h = 0.02  # step size in the mesh
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
    Z = classifier.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    ax.contourf(xx, yy, Z, cmap=plt.cm.coolwarm, alpha=0.8)
    ax.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.coolwarm, edgecolor='k')
    ax.set_xlim(xx.min(), xx.max())
    ax.set_ylim(yy.min(), yy.max())
    ax.set_xticks(())
    ax.set_yticks(())

# Create subplots
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Plot decision boundaries for the SVM classifier
for i, C in enumerate([1, 100]):
    svc = SVC(kernel='linear', C=C)
    svc.fit(X_train, y_train)
    plot_decision_boundary(X_train, y_train, svc, axes[i])
    axes[i].set_title(f"SVM (C = {C})")

plt.show()
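
To see how C affects performance numerically rather than only visually, one option is to sweep a few values and record the training accuracy, the test accuracy and the number of support vectors. The sketch below repeats the data loading so that it runs on its own; the grid of C values is an arbitrary choice for illustration and has not been tuned.

```python
from sklearn import datasets
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Same data and split as above: first two Iris features, 80/20 split
iris = datasets.load_iris()
X, y = iris.data[:, :2], iris.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

C_values = [0.01, 0.1, 1, 10, 100]  # illustrative grid (an assumption)
results = {}

for C in C_values:
    clf = SVC(kernel="linear", C=C).fit(X_train, y_train)
    results[C] = {
        "train_acc": accuracy_score(y_train, clf.predict(X_train)),
        "test_acc": accuracy_score(y_test, clf.predict(X_test)),
        "n_support": int(clf.n_support_.sum()),  # support vectors over all classes
    }
    print(f"C = {C:>6}: train accuracy = {results[C]['train_acc']:.3f}, "
          f"test accuracy = {results[C]['test_acc']:.3f}, "
          f"support vectors = {results[C]['n_support']}")
```

Comparing training and test accuracy across the grid indicates whether a given C underfits (both low) or starts to overfit (training accuracy rising while test accuracy stalls).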


Now, let's implement a simple linear SVM from scratch and compare its performance:

                      
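import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data[:, :2]  # Using only the first two features for simplicity
y = iris.target       # Three classes, labelled 0, 1 and 2

# Split the dataset into a training set and a testing set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Implement a simple linear SVM from scratch.
# Each binary model minimises 1/2 ||w||^2 + C * sum_i max(0, 1 - y_i (w . x_i + b))
# by stochastic subgradient descent, using f(x) = w . x + b as in Q1.
# Iris has three classes, so one binary model is trained per class
# (one-vs-rest) and the class with the highest score is predicted.
class LinearSVM:
    def __init__(self, learning_rate=0.01, num_epochs=1000, C=1.0):
        self.learning_rate = learning_rate
        self.num_epochs = num_epochs
        self.C = C

    def _fit_binary(self, X, y):
        # y must contain only +1 and -1
        n_samples, n_features = X.shape
        w = np.zeros(n_features)
        b = 0.0

        for epoch in range(self.num_epochs):
            for i in range(n_samples):
                # Gradient of the regulariser, spread evenly over the samples
                grad_w = w / n_samples
                if y[i] * (np.dot(X[i], w) + b) < 1:
                    # Margin violated: add the hinge-loss subgradient
                    grad_w = grad_w - self.C * y[i] * X[i]
                    b += self.learning_rate * self.C * y[i]
                w -= self.learning_rate * grad_w
        return w, b

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        # Relabel as +1 (this class) versus -1 (all other classes)
        models = [self._fit_binary(X, np.where(y == c, 1.0, -1.0)) for c in self.classes_]
        self.w = np.array([w for w, _ in models])  # shape (n_classes, n_features)
        self.b = np.array([b for _, b in models])  # shape (n_classes,)
        return self

    def decision_function(self, X):
        # One score f_c(x) = w_c . x + b_c per class
        return np.dot(X, self.w.T) + self.b

    def predict(self, X):
        return self.classes_[np.argmax(self.decision_function(X), axis=1)]

# Create and train the LinearSVM classifier from scratch
svm_scratch = LinearSVM(learning_rate=0.01, num_epochs=1000, C=1.0)
svm_scratch.fit(X_train, y_train)

# Predict the labels for the testing data
y_pred_scratch = svm_scratch.predict(X_test)

# Compute the accuracy of the model
accuracy_scratch = accuracy_score(y_test, y_pred_scratch)
print(f"Accuracy of SVM from scratch: {accuracy_scratch * 100:.2f}%")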

                      

