# Q1. What is the mathematical formula for a linear SVM?

## Ans. :

A linear Support Vector Machine (SVM) is a type of binary classification algorithm that tries to find a linear decision boundary that separates two classes in a dataset.

The mathematical formula for a linear SVM can be expressed as:

__y(x) = w^T x + b__

where:

* y(x) is the decision score for an input vector x; the predicted class label is the sign of y(x) (+1 if y(x) >= 0, otherwise -1)
* w is a weight vector that determines the orientation of the decision boundary
* b is a bias term that determines the location of the decision boundary
* T is the transpose operator

The goal of the linear SVM is to find the values of w and b that maximize the margin, which is the distance between the decision boundary and the closest data points from each class. This is typically done by solving a constrained optimization problem: minimize the norm of w subject to the constraint that every training point lies on the correct side of the decision boundary with a functional margin of at least 1 (see Q2). This problem can be solved with quadratic programming or, in the closely related hinge-loss form, with (sub)gradient descent.
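
As a small illustration of the formula above, the sketch below computes the score w^T x + b and turns it into a class label by taking its sign. The values of `w`, `b` and `X_new` are made up for the example and are not learned from any data.

```python
import numpy as np

# Hypothetical weights and bias, chosen only for illustration
w = np.array([2.0, -1.0])
b = -0.5

def decision_function(X, w, b):
    """Signed score w^T x + b for each row of X."""
    return X @ w + b

def predict(X, w, b):
    """Class label in {-1, +1}: the sign of the score (a score of 0 maps to +1)."""
    return np.where(decision_function(X, w, b) >= 0, 1, -1)

X_new = np.array([[1.0, 0.0],   # score = 2*1 - 1*0 - 0.5 = 1.5
                  [0.0, 1.0]])  # score = 2*0 - 1*1 - 0.5 = -1.5

scores = decision_function(X_new, w, b)
labels = predict(X_new, w, b)
print(scores, labels)
```

The magnitude of the score says how far a point is from the boundary (in units of 1/||w||), while only its sign decides the class.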

# Q2. What is the objective function of a linear SVM?

## Ans. :

The objective of a linear Support Vector Machine (SVM) is to find the hyperplane that maximizes the margin between the two classes in a binary classification problem. The margin is the distance between the hyperplane and the closest points from each class; those closest points are the support vectors.

Mathematically, the objective function of a linear SVM is expressed as:

minimize: __1/2 * ||w||^2__

subject to:

__yi(w^T xi + b) >= 1, for i = 1,2,...,n__

where:

* w is the weight vector that determines the orientation of the hyperplane
* b is the bias term that determines the location of the hyperplane
* xi is the i-th training sample
* yi is its corresponding label (+1 or -1)
* n is the number of training samples
* ||w|| is the L2-norm of the weight vector

Minimizing 1/2 * ||w||^2 is what maximizes the margin: under the constraint above, the distance from the hyperplane to the closest points is 1/||w||, so a smaller norm means a wider margin. The same term also acts as a regularizer that penalizes large weights. The inequality is not a second term of the objective but a constraint: it requires every training sample to be classified correctly and to lie on or outside the margin.

This constrained problem is typically solved with quadratic programming, which finds the values of w and b that minimize the objective while satisfying the margin constraints. Gradient-based methods are usually applied to the closely related hinge-loss (soft-margin) form instead.
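
To make the objective and the constraint concrete, here is a minimal sketch that evaluates both for three hand-picked candidates on a toy dataset. The data and the candidate values of w and b are assumptions chosen for illustration; nothing is optimized here.

```python
import numpy as np

# Toy, linearly separable data (made up for illustration)
X = np.array([[0.0, 0.0], [-1.0, 1.0], [2.0, 0.0], [3.0, 1.0]])
y = np.array([-1, -1, 1, 1])

def objective(w):
    """Hard-margin objective 1/2 * ||w||^2."""
    return 0.5 * np.dot(w, w)

def is_feasible(w, b, X, y):
    """True if y_i (w^T x_i + b) >= 1 holds for every sample."""
    return bool(np.all(y * (X @ w + b) >= 1))

def margin_width(w):
    """Distance between the two marginal planes, 2 / ||w||."""
    return 2.0 / np.linalg.norm(w)

candidates = {
    "a": (np.array([1.0, 0.0]), -1.0),  # boundary x1 = 1
    "b": (np.array([2.0, 0.0]), -2.0),  # same boundary, larger ||w||
    "c": (np.array([0.5, 0.0]), -0.5),  # same boundary, smaller ||w||
}

for name, (w, b) in candidates.items():
    print(name, objective(w), is_feasible(w, b, X, y), margin_width(w))
```

Candidates "a" and "b" both satisfy the constraints, but "a" has the smaller objective and the wider margin. Candidate "c" has an even smaller norm but violates the constraint, which is why the minimization cannot shrink w indefinitely.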

# Q3. What is the kernel trick in SVM?

## Ans. :

The kernel trick is a technique used in Support Vector Machines (SVMs) to handle data that is not linearly separable in the input space: the SVM operates as if the data had been mapped into a higher-dimensional feature space, where it may become linearly separable.

In SVMs, the decision boundary is a hyperplane in the feature space. The kernel trick maps the input data into that higher-dimensional space only implicitly, without ever computing the transformation. The SVM only needs dot products between feature vectors, and the kernel function returns the dot product of two transformed vectors directly from the original input vectors.

The kernel function is defined as:

K(xi, xj) = φ(xi) . φ(xj)

where K is the kernel function, xi and xj are input vectors, and φ is a non-linear mapping function that maps the input vectors into the higher-dimensional feature space.

There are several types of kernel functions that can be used in SVMs, including linear, polynomial, Gaussian (also known as radial basis function or RBF), and sigmoid kernels. These kernels have different properties and are suitable for different types of data.

The kernel trick is computationally efficient because the mapping φ is never evaluated: each dot product in the feature space costs a single kernel evaluation on the original inputs. This keeps non-linear classification tractable even when the feature space is very high-dimensional (for the RBF kernel it is infinite-dimensional), where computing the transformation explicitly would be too expensive or impossible.
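
The identity K(xi, xj) = φ(xi) . φ(xj) can be checked numerically for a simple case. The sketch below uses the homogeneous degree-2 polynomial kernel K(x, z) = (x . z)^2 on 2-D inputs, whose explicit feature map is φ(x) = (x1^2, sqrt(2) x1 x2, x2^2). The function names and input values are chosen for illustration only.

```python
import numpy as np

def phi(x):
    """Explicit feature map for the degree-2 polynomial kernel on 2-D inputs."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

def poly2_kernel(x, z):
    """Kernel value computed in the original 2-D space: (x . z)^2."""
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

explicit = np.dot(phi(x), phi(z))  # dot product in the 3-D feature space
implicit = poly2_kernel(x, z)      # (1*3 + 2*4)^2 = 121, no mapping needed

print(explicit, implicit)
```

Both routes give the same number (up to floating-point rounding), but the kernel version never builds the 3-D vectors. For higher degrees or the RBF kernel, the explicit route becomes impractical while the kernel evaluation stays cheap.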

# Q4. What is the role of support vectors in SVM? Explain with an example.

## Ans. :

Support vectors play a crucial role in the support vector machine (SVM) algorithm, which is a popular supervised learning algorithm used for classification and regression tasks.

In SVM, the objective is to find the hyperplane that best separates the data points into different classes. The hyperplane is defined as the decision boundary that maximizes the margin between the two classes. The margin is the distance between the hyperplane and the closest data points of each class, and the points that lie on the margin are called support vectors.

Support vectors are important because they determine the position and orientation of the hyperplane. Only the support vectors define the hyperplane; data points farther from it have no influence on the solution, so removing them and retraining gives the same boundary. Because the model depends on only a few boundary points and the margin is maximized, SVMs often generalize well to new, unseen data.

Let's take a simple example to illustrate the role of support vectors in SVM. Suppose we have a two-dimensional dataset with two classes, red and blue. Our goal is to find the decision boundary that separates the two classes. The SVM algorithm searches for the hyperplane that maximizes the margin between the two classes, as shown in the figure below:

![SVM_data.png](attachment:SVM_data.png)

The red and blue circles represent the support vectors, which lie on the margin. These points are crucial for defining the hyperplane because they determine its position and orientation. The other data points that are farther away from the hyperplane are not used.
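
The same idea can be reproduced in code. The sketch below fits scikit-learn's `SVC` with a linear kernel on a small made-up dataset (not the data in the figure), lists the support vectors it found, and then refits on those points alone. The large value of `C` is an assumption used to approximate a hard margin.

```python
import numpy as np
from sklearn.svm import SVC

# Toy data: the closest pair across the classes is (0, 0) and (2, 0)
X = np.array([[0.0, 0.0], [-1.0, 1.0], [-1.0, -1.0],
              [2.0, 0.0], [3.0, 1.0], [3.0, -1.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

# Large C approximates a hard-margin SVM on separable data
clf = SVC(kernel="linear", C=1000.0)
clf.fit(X, y)

print("Support vectors:\n", clf.support_vectors_)
print("w =", clf.coef_[0], "b =", clf.intercept_[0])

# Refit using only the support vectors; the other points are dropped
sv_idx = clf.support_
clf_sv = SVC(kernel="linear", C=1000.0)
clf_sv.fit(X[sv_idx], y[sv_idx])

print("w =", clf_sv.coef_[0], "b =", clf_sv.intercept_[0])
```

On this dataset the two fits should agree (w close to (1, 0) and b close to -1, i.e. the boundary x1 = 1), which shows that the points away from the margin did not contribute to the solution.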

Once the hyperplane is found, we can use it to classify new, unseen data points based on which side of the hyperplane they lie on. For example, a new data point that lies on the right side of the

# Q5. Illustrate with examples and graphs of Hyperplane, Marginal plane, Soft margin and Hard margin in SVM?

## Ans. :

### 1. Hyperplane:
In SVM, the hyperplane is the decision boundary that separates the data points into different classes. In a binary classification problem it divides the feature space into two regions, one for each class: a line when there are two features, a plane when there are three, and an (n-1)-dimensional hyperplane in general. The goal of SVM is to find the hyperplane that maximizes the margin between the two classes.

### 2. Marginal plane:
In SVM, the marginal planes are the two planes parallel to the hyperplane, one on each side, that pass through the support vectors. The distance between the hyperplane and a marginal plane is called the margin, and the goal of SVM is to maximize it. Points that lie on or beyond the marginal plane of their own class incur no penalty; only points inside the margin or on the wrong side of the hyperplane are penalized.

### 3. Soft margin:
In some cases, the data points may not be linearly separable, and it may not be possible to find a hyperplane that perfectly separates the two classes. In such cases, we can use a soft margin SVM, which tolerates some errors by allowing data points to fall inside the margin or on the wrong side of the hyperplane. The trade-off is controlled by the C parameter: a large C penalizes margin violations heavily and approaches a hard margin, while a small C tolerates more violations in exchange for a wider margin.

### 4. Hard margin:
In contrast to soft margin SVM, a hard margin SVM requires that all data points be correctly classified and that there is a clear separation between the two classes. A hard margin SVM is only appropriate when the data is linearly separable, and there are no misclassification errors.

In summary, the hyperplane is the decision boundary that separates the data points into different classes, the marginal plane is a plane that is parallel to the hyperplane and touches the support vectors on both sides, the soft margin SVM allows for some misclassification errors, and the hard margin SVM requires that all data points be correctly classified and that there is a clear separation between the two classes.
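
The difference between the two margins can also be shown numerically. The sketch below assumes the standard textbook soft-margin objective, 1/2 * ||w||^2 + C * sum(ξ_i), where the slack ξ_i = max(0, 1 - y_i (w^T x_i + b)) measures how far sample i violates the margin. That formula is not stated in the text above, and the data, w and b are made up for illustration.

```python
import numpy as np

# Fixed hyperplane x1 = 1 (hypothetical values, not fitted)
w = np.array([1.0, 0.0])
b = -1.0

X = np.array([[0.0, 0.0],   # on the marginal plane of class -1
              [3.0, 1.0],   # well beyond the marginal plane of class +1
              [1.5, 0.0],   # class +1, inside the margin
              [0.5, 0.0]])  # class +1, on the wrong side of the hyperplane
y = np.array([-1, 1, 1, 1])

def slack(w, b, X, y):
    """Slack per sample: 0 outside the margin, positive for violations."""
    return np.maximum(0.0, 1.0 - y * (X @ w + b))

def soft_margin_objective(w, b, X, y, C):
    """1/2 * ||w||^2 + C * total slack."""
    return 0.5 * np.dot(w, w) + C * np.sum(slack(w, b, X, y))

xi = slack(w, b, X, y)
hard_margin_ok = bool(np.all(xi == 0))  # hard margin needs zero slack everywhere

print("slack:", xi)
print("hard margin feasible:", hard_margin_ok)
for C in [0.1, 10]:
    print(f"C={C}, objective={soft_margin_objective(w, b, X, y, C)}")
```

A hard margin SVM would reject this hyperplane outright because two samples have non-zero slack. The soft margin SVM accepts it at a cost that grows with C, which is why a large C pushes the solution towards fewer violations and a small C towards a wider margin.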

# Q6. SVM Implementation through Iris dataset.

* __Load the iris dataset from the scikit-learn library and split it into a training set and a testing set__
* __Train a linear SVM classifier on the training set and predict the labels for the testing set__
* __Compute the accuracy of the model on the testing set__
* __Plot the decision boundaries of the trained model using two of the features__
* __Try different values of the regularisation parameter C and see how it affects the performance of the model.__


__Bonus task:__ Implement a linear SVM classifier from scratch using Python and compare its
performance with the scikit-learn implementation.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
from warnings import filterwarnings
filterwarnings('ignore')

In [2]:
# Load the iris dataset from the scikit-learn library and split it into a training set and a testing set

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# load iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split the dataset into a training set and a testing set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

In [3]:
# Train a linear SVM classifier on the training set and predict the labels for the testing set

from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score

# Train a linear SVM classifier on the training set
svm = LinearSVC(random_state=42)
svm.fit(X_train, y_train)

# Predict the labels for the testing set
y_pred = svm.predict(X_test)

print(y_pred)

[1 0 2 1 1 0 1 2 1 1 2 0 0 0 0 1 2 1 1 2 0 2 0 2 2 2 2 2 0 0 0 0 1 0 0 2 1
 0 0 0 2 1 1 0 0]


In [4]:
# Compute the accuracy of the model on the testing set
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

Accuracy: 1.0


In [None]:
# Plot the decision boundaries of the trained model using two of the features.

# The classifier above was trained on all four features, so it cannot predict
# on a two-column grid. Fit a separate classifier on petal length and petal width.
svm_2d = LinearSVC(random_state=42)
svm_2d.fit(X_train[:, 2:4], y_train)

# Create a meshgrid of feature values
x_min, x_max = X[:, 2].min() - 1, X[:, 2].max() + 1
y_min, y_max = X[:, 3].min() - 1, X[:, 3].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.01), np.arange(y_min, y_max, 0.01))

# Make predictions on the meshgrid with the two-feature model
Z = svm_2d.predict(np.c_[xx.ravel(), yy.ravel()])

# Reshape the predictions and plot the decision boundaries
Z = Z.reshape(xx.shape)
plt.contourf(xx, yy, Z, cmap=plt.cm.Paired, alpha=0.8)
plt.scatter(X[:, 2], X[:, 3], c=y, cmap=plt.cm.Paired)
plt.xlabel('Petal length')
plt.ylabel('Petal width')
plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())
plt.xticks(())
plt.yticks(())
plt.show()

To see how the regularisation parameter C affects the performance of the model, train and test the SVM in a loop over several values of C:

In [7]:
from sklearn.svm import SVC

for c in [0.1, 1, 10, 100]:
    svm = SVC(kernel='linear', C=c)
    svm.fit(X_train, y_train)
    y_pred = svm.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    print(f"C={c}, accuracy={accuracy}")

C=0.1, accuracy=1.0
C=1, accuracy=1.0
C=10, accuracy=0.9777777777777777
C=100, accuracy=1.0
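
For the bonus task, one possible starting point is sketched below: a binary linear SVM trained by full-batch sub-gradient descent on the L2-regularised hinge loss. This is a minimal sketch, not the method scikit-learn uses. The class name `LinearSVMScratch`, the hyperparameter values and the toy data are all assumptions made for illustration, and the class expects labels in {-1, +1}.

```python
import numpy as np

class LinearSVMScratch:
    """Binary linear SVM: minimises 0.5*lam*||w||^2 + mean hinge loss."""

    def __init__(self, lam=0.01, lr=0.1, n_epochs=200):
        self.lam = lam            # regularisation strength (assumed value)
        self.lr = lr              # learning rate (assumed value)
        self.n_epochs = n_epochs  # number of full passes over the data

    def fit(self, X, y):
        n_samples, n_features = X.shape
        self.w = np.zeros(n_features)
        self.b = 0.0
        for _ in range(self.n_epochs):
            margins = y * (X @ self.w + self.b)
            active = margins < 1  # samples that violate the margin
            # Sub-gradient: only margin violators contribute to the hinge part
            grad_w = self.lam * self.w - (y[active] @ X[active]) / n_samples
            grad_b = -np.sum(y[active]) / n_samples
            self.w -= self.lr * grad_w
            self.b -= self.lr * grad_b
        return self

    def decision_function(self, X):
        return X @ self.w + self.b

    def predict(self, X):
        return np.where(self.decision_function(X) >= 0, 1, -1)

# Toy, linearly separable data (made up for illustration)
X_toy = np.array([[-2.0, 0.0], [-3.0, 1.0], [-3.0, -1.0],
                  [2.0, 0.0], [3.0, 1.0], [3.0, -1.0]])
y_toy = np.array([-1, -1, -1, 1, 1, 1])

model = LinearSVMScratch().fit(X_toy, y_toy)
train_accuracy = np.mean(model.predict(X_toy) == y_toy)
print("w =", model.w, "b =", model.b, "training accuracy =", train_accuracy)
```

To compare this with the scikit-learn models on Iris, the class would need to be wrapped in a one-vs-rest scheme (one binary model per class, predicting the class with the highest score), and the features would probably benefit from standardisation. No accuracy is claimed for that setup here, since it depends on the learning rate, the number of epochs and the regularisation strength.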
