
# Q1. What is the mathematical formula for a linear SVM?


A linear Support Vector Machine (SVM) is used for binary classification tasks and aims to find the optimal hyperplane that separates two classes in the feature space. The mathematical formulation for a linear SVM involves the following components:

1. **Linear Decision Function**: The decision function for a linear SVM can be expressed as:
$$
f(x) = w^T x + b
$$
where:

 * $x$ is the input feature vector.
 * $w$ is the weight vector (the coefficients that define the orientation of the hyperplane).
 * $b$ is the bias term (which allows the hyperplane to be shifted away from the origin).
2. **Classification Rule**: The classification of a sample $x$ is determined by the sign of the decision function:
$$
y = \begin{cases}
+1 & \text{if } f(x) \geq 0 \\
-1 & \text{if } f(x) < 0
\end{cases}
$$
where $y$ is the predicted class label.

3. **Optimization Objective**: The goal of the SVM is to find the optimal $w$ and $b$ that maximize the margin between the two classes while also ensuring that the samples from each class are correctly classified. The optimization problem can be formulated as:
$$
\min_{w, b} \quad \frac{1}{2} \| w \|^2 \quad \text{subject to} \quad y_i (w^T x_i + b) \geq 1, \quad \forall i
$$
for each sample $(x_i, y_i)$ in the training set, where $y_i \in \{+1, -1\}$.

This formulation maximizes the margin between the two classes while keeping the classification constraints satisfied. The optimization can be solved using methods such as Lagrange multipliers or quadratic programming.
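
To make the three components concrete, here is a minimal NumPy sketch. The toy data and the values of `w` and `b` are hypothetical and chosen by hand, so they satisfy the constraints but are not claimed to be the optimal solution; the helper names `decision_function` and `predict` are illustrative as well.

```python
import numpy as np

# Hypothetical toy data: two points per class in 2-D
X = np.array([[1.0, 1.0], [2.0, 0.0], [4.0, 4.0], [5.0, 3.0]])
y = np.array([-1, -1, 1, 1])

# Hypothetical hyperplane parameters, picked by hand (feasible, not necessarily optimal)
w = np.array([0.5, 0.5])
b = -2.5

def decision_function(X, w, b):
    """f(x) = w^T x + b for every row of X."""
    return X @ w + b

def predict(X, w, b):
    """Classification rule: +1 if f(x) >= 0, otherwise -1."""
    return np.where(decision_function(X, w, b) >= 0, 1, -1)

f = decision_function(X, w, b)
constraints_ok = y * f >= 1           # hard-margin constraints y_i (w^T x_i + b) >= 1
margin_width = 2 / np.linalg.norm(w)  # distance between the two margin boundaries

print("f(x):", f)
print("predictions:", predict(X, w, b))
print("constraints satisfied:", constraints_ok)
print("margin width:", margin_width)
```

Any `(w, b)` that satisfies every constraint is feasible; the optimizer then picks the feasible pair with the smallest norm of `w`, which gives the widest margin.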

# Q2. What is the objective function of a linear SVM?

The objective function of a linear Support Vector Machine (SVM) is defined to find a hyperplane that best separates the data points of two classes while maximizing the margin between them. The SVM aims to minimize a particular cost function while adhering to certain constraints.

# Objective Function
For a linear SVM, the objective function can be expressed as:

$$
\min_{w, b} \quad \frac{1}{2} \| w \|^2
$$

**Components**:

* $\| w \|^2$: This term is the squared Euclidean norm of the weight vector $w$. The margin width is $2 / \| w \|$, so minimizing $\| w \|^2$ is equivalent to maximizing the margin between the two classes.
* Constraints: In conjunction with the objective function, the SVM must satisfy the following constraints for all training samples $(x_i, y_i)$:

$$
y_i (w^T x_i + b) \geq 1, \quad \forall i
$$

where:

  * $x_i$ is the feature vector of the $i$-th training sample.
  * $y_i \in \{+1, -1\}$ is the corresponding class label.

**Soft Margin**

When the data are not linearly separable, a soft-margin formulation is used. The soft-margin SVM introduces slack variables $\xi_i$ that allow some samples to violate the margin or be misclassified, leading to the following objective function:

$$
\min_{w, b, \xi} \quad \frac{1}{2} \| w \|^2 + C \sum_{i=1}^N \xi_i \quad \text{subject to} \quad y_i (w^T x_i + b) \geq 1 - \xi_i, \quad \xi_i \geq 0, \quad \forall i
$$

# Explanation of Components:
* $C$: A positive regularization parameter that controls the trade-off between maximizing the margin and minimizing the classification error. A larger $C$ puts more emphasis on classifying every training example correctly (narrower margin), while a smaller $C$ allows a wider margin at the risk of some misclassifications.
* $\xi_i$: Slack variable that measures how far training sample $i$ violates the margin. It is $0$ for samples on or outside the margin, between $0$ and $1$ for samples inside the margin but on the correct side, and greater than $1$ for misclassified samples.
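
As a rough illustration of how the two terms interact, the sketch below evaluates the soft-margin objective for a fixed, hand-picked hyperplane. It relies on the fact that, for a given `w` and `b`, the smallest feasible slack is `max(0, 1 - y_i * f(x_i))`. The data, the parameters and the function name `soft_margin_objective` are hypothetical.

```python
import numpy as np

def soft_margin_objective(w, b, X, y, C):
    """Return (objective value, slack per sample) for fixed w and b."""
    margins = y * (X @ w + b)
    slack = np.maximum(0.0, 1.0 - margins)  # smallest slack that satisfies the constraints
    objective = 0.5 * np.dot(w, w) + C * np.sum(slack)
    return objective, slack

# Hypothetical data: the last point is labelled -1 but lies on the +1 side
X = np.array([[1.0, 1.0], [2.0, 0.0], [4.0, 4.0], [3.0, 3.0]])
y = np.array([-1, -1, 1, -1])
w = np.array([0.5, 0.5])
b = -2.5

for C in (1.0, 10.0):
    value, slack = soft_margin_objective(w, b, X, y, C)
    print(f"C={C}: slack={slack}, objective={value}")
```

Only the last sample has non-zero slack, and its penalty is multiplied by `C`, so raising `C` makes the same margin violation ten times more expensive.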

# Q3. What is the kernel trick in SVM?

The kernel trick is a powerful technique used in Support Vector Machines (SVM) and other machine learning algorithms to enable the learning of nonlinear decision boundaries without explicitly transforming the input data into a higher-dimensional space. Here's a detailed explanation of what the kernel trick is and how it works:

**Concept**

1. **Higher-Dimensional Space**: Many datasets are not linearly separable in their original feature space. By projecting the data into a higher-dimensional space, it becomes possible to find a linear separator (hyperplane) that can effectively distinguish between classes.

2. **Feature Transformation**: Traditionally, transforming the data from the original feature space $\mathcal{X}$ to a higher-dimensional feature space $\mathcal{Z}$ is done using a mapping function $\phi: \mathcal{X} \rightarrow \mathcal{Z}$. However, explicitly calculating this transformation can be computationally expensive or infeasible, especially in very high-dimensional spaces.

# The Kernel Trick
Instead of explicitly calculating the coordinates of the data in the higher-dimensional space, the kernel trick uses a kernel function $K(x_i, x_j)$ to compute the inner product in that transformed space efficiently. This allows SVM to operate in a high-dimensional feature space without the need to explicitly define or calculate the transformation $\phi$.

Mathematically, if $K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j)$ represents the inner product in the higher-dimensional space, then the kernel trick allows the SVM optimization problem to be expressed entirely in terms of $K$.

**Common Kernel Functions**

1. **Linear Kernel**:
$$
K(x_i, x_j) = x_i^T x_j
$$
(Equivalent to no transformation.)

2. **Polynomial Kernel**:
$$
K(x_i, x_j) = (x_i^T x_j + c)^d
$$
where $c$ is a constant and $d$ is the degree of the polynomial.

3. **Radial Basis Function (RBF) or Gaussian Kernel**:
$$
K(x_i, x_j) = \exp\left(-\frac{\| x_i - x_j \|^2}{2\sigma^2}\right)
$$
where $\sigma$ is a parameter that controls the width of the kernel.

4. **Sigmoid Kernel**:
$$
K(x_i, x_j) = \tanh(\alpha x_i^T x_j + c)
$$
where $\alpha$ and $c$ are kernel parameters.
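
The identity behind the kernel trick can be checked numerically. The sketch below implements three of the kernels above in NumPy and compares the degree-2 polynomial kernel (with `c = 1`) against an explicit inner product in a six-dimensional feature space. The function names and the explicit map `phi_poly2` are illustrative choices for two-dimensional inputs, not part of any library.

```python
import numpy as np

def linear_kernel(x, z):
    return x @ z

def polynomial_kernel(x, z, c=1.0, d=2):
    return (x @ z + c) ** d

def rbf_kernel(x, z, sigma=1.0):
    return np.exp(-np.sum((x - z) ** 2) / (2 * sigma ** 2))

def phi_poly2(x):
    """Explicit feature map whose inner product equals (x^T z + 1)^2 for 2-D inputs."""
    x1, x2 = x
    s = np.sqrt(2.0)
    return np.array([1.0, s * x1, s * x2, x1 ** 2, s * x1 * x2, x2 ** 2])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

k_implicit = polynomial_kernel(x, z)       # computed in the original 2-D space
k_explicit = phi_poly2(x) @ phi_poly2(z)   # computed in the 6-D feature space

print("kernel value:        ", k_implicit)
print("explicit dot product:", k_explicit)
print("RBF kernel value:    ", rbf_kernel(x, z))
```

Both computations give the same number, but the kernel version never builds the six-dimensional vectors. For the RBF kernel the corresponding feature space is infinite-dimensional, so the kernel evaluation is the only practical route.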

# Q4. What is the role of support vectors in SVM? Explain with an example.

Support vectors play a crucial role in the functionality of Support Vector Machines (SVM). They are the data points that are closest to the decision boundary (hyperplane) and are critical in defining the position and orientation of that hyperplane. Here's an explanation of their role, along with an illustrative example.

# Role of Support Vectors
1. **Defining the Decision Boundary**: Support vectors are the elements of the training dataset that are used to define the optimal hyperplane. Removing any other data points that are not support vectors does not affect the position of the hyperplane, but removing support vectors would change it.

2. **Margin Calculation**: The SVM algorithm aims to maximize the margin between the support vectors of opposite classes. The distance between the support vectors of both classes and the hyperplane defines the margin. A larger margin typically indicates better performance on unseen data.

3. **Resilience to Noise**: Because the decision boundary is determined only by the support vectors, points that lie far from the boundary, including outliers deep inside their own class region, have no effect on it. Outliers close to the boundary can still become support vectors, which is why the soft-margin formulation is used to limit their influence.

4. **Impact on Prediction**: Only support vectors are used in the prediction or decision-making process. Non-support vectors do not influence the outcome of the SVM classifier.
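
The first point in the list above can be checked empirically. The following sketch, which assumes scikit-learn is available and uses a synthetic dataset from `make_blobs` (the dataset, `C=1000` and `tol=1e-6` are illustrative choices), fits a linear SVM on all points, refits it on the support vectors alone, and compares the two hyperplanes.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Synthetic two-class data (illustrative choice, not from the assignment)
X, y = make_blobs(n_samples=40, centers=2, random_state=6)

# Fit on the full training set; a tight tolerance makes the comparison cleaner
full = SVC(kernel="linear", C=1000, tol=1e-6).fit(X, y)
sv_idx = full.support_  # indices of the support vectors in X

# Refit using only the support vectors
reduced = SVC(kernel="linear", C=1000, tol=1e-6).fit(X[sv_idx], y[sv_idx])

print("number of training points: ", len(X))
print("number of support vectors: ", len(sv_idx))
print("w (all points):            ", full.coef_[0])
print("w (support vectors only):  ", reduced.coef_[0])
print("b (all points):            ", full.intercept_[0])
print("b (support vectors only):  ", reduced.intercept_[0])
```

The two weight vectors should agree up to solver tolerance, even though the second model saw only a handful of the original points.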

**Example**

Let's consider a simple example with a two-dimensional dataset, where we want to classify points into two classes: Class A and Class B.

Dataset:
Imagine we have the following points:

* Class A: (1, 2), (2, 3)
* Class B: (3, 4), (4, 5), (5, 6)

The points can be plotted on a 2D plane. Upon analyzing the plot, we observe that there is a clear linear boundary that separates Class A from Class B.

Visual Representation
The decision boundary (hyperplane) in a 2D space can look like this:

In [None]:
         |
    B    |   B
         |
   ------------------
         |
         |   A
     A   |

# Identification of Support Vectors
1. **Support Vectors**: In this example, the points closest to the decision boundary are:
   * Class A point: (2, 3)
   * Class B point: (3, 4)

   These two points are the support vectors because they are the nearest to the decision boundary and lie exactly on the margin. They "support" the construction of the hyperplane.

2. **Non-Support Vectors**: The other points, (1, 2) from Class A and (4, 5), (5, 6) from Class B, are not as close to the decision boundary. They do not determine the margin boundaries and thus do not affect the position of the hyperplane.
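
This example is small enough to verify with scikit-learn. The sketch below assumes Class A is encoded as `-1` and Class B as `+1`, and uses a very large `C` to approximate a hard margin; both are choices made for illustration. Under those assumptions the fit should report (2, 3) and (3, 4) as the support vectors, with `w` close to (1, 1) and `b` close to -6, which corresponds to the boundary x1 + x2 = 6.

```python
import numpy as np
from sklearn.svm import SVC

# The five points from the example; Class A -> -1, Class B -> +1 (encoding is an assumption)
X = np.array([[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]], dtype=float)
y = np.array([-1, -1, 1, 1, 1])

# A very large C approximates a hard-margin SVM on separable data
clf = SVC(kernel="linear", C=1e6).fit(X, y)

support_vectors = clf.support_vectors_
w = clf.coef_[0]
b = clf.intercept_[0]
margin_width = 2 / np.linalg.norm(w)

print("support vectors:\n", support_vectors)
print("w:", w)
print("b:", b)
print("margin width:", margin_width)
```

The margin width of about 1.41 equals the distance between (2, 3) and (3, 4), since the segment joining them is perpendicular to the boundary.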


# Q5. Illustrate with examples and graphs of Hyperplane, Marginal plane, Soft margin and Hard margin in SVM?

Support Vector Machines (SVMs) rely on the concepts of hyperplanes, margins, soft margins, and hard margins to classify data. Here, we'll illustrate these concepts through examples and associated graphs.

# 1. Hyperplane
A hyperplane in SVM is a flat affine subspace, one dimension lower than the feature space, that separates different classes. It is the set of points where the decision function is zero. In two dimensions, a hyperplane is simply a line.

**Example**:

In [None]:
          |         B
   Class  |           B
          |           B
     A    |___________________
          |         A
          |

In this graph, the line (hyperplane) separates Class A from Class B. The goal of SVM is to position this hyperplane such that it maximizes the margin between the two classes.

# 2. Margin
The margin is the distance between the hyperplane and the nearest data points of each class (support vectors). SVM aims to maximize this margin.

**Graph with Margins:**



In [None]:
          |         B
   Class  |           B
          |           B
     A    |-----|-----|-----|----- (Hyperplane)
          |    - |     |  -
          |         A

In this graph, the dashed lines represent the margin boundaries. The distance from each of these lines to the hyperplane is the margin. The points that lie on the margin boundaries are the support vectors.

# 3. Hard Margin SVM
Hard margin SVM is a type of SVM that seeks to find a hyperplane that perfectly separates the classes with no misclassifications. This is only possible when the data is linearly separable.

**Examples**:

In [None]:
          |         B
   Class  |           B
          |           B
     A    |--------------| (Hyperplane)
          |    A        |
          |   A         |

In this case, there are no points from Class A that are on the side of Class B and vice versa. The hyperplane perfectly separates Class A and Class B with maximal margin between the two classes.

# 4. Soft Margin SVM
Soft margin SVM allows for some misclassifications, which is useful for datasets that are not perfectly linearly separable. This approach introduces a margin of tolerance for misclassified points, thus allowing some data points to be on the wrong side of the margin.

**Example**:

In [None]:
          |         B
   Class  |           B
          |         B   B
     A    |----|-------|----- (Hyperplane)
          |  A A      |   A
          |  A        |
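
The difference between the two regimes can also be seen numerically. The sketch below assumes scikit-learn and an overlapping synthetic dataset from `make_blobs` (an illustrative choice). It fits a linear SVM with a small `C` (very soft margin) and a large `C` (close to hard-margin behaviour), then reports the margin width `2 / ||w||` and the number of support vectors for each.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Overlapping two-class data, so a perfect linear separation is unlikely
X, y = make_blobs(n_samples=100, centers=2, cluster_std=2.0, random_state=0)

results = {}
for C in (0.01, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    results[C] = {
        "margin_width": 2 / np.linalg.norm(clf.coef_[0]),
        "n_support": int(clf.n_support_.sum()),
        "train_accuracy": clf.score(X, y),
    }

for C, info in results.items():
    print(f"C={C}: {info}")
```

The small-`C` model should show the wider margin, because it tolerates more margin violations in exchange for a smaller norm of `w`.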

# Q6. SVM Implementation through Iris dataset.

Bonus task: Implement a linear SVM classifier from scratch using Python and compare its performance with the scikit-learn implementation.

* Load the iris dataset from the scikit-learn library and split it into a training set and a testing set.
* Train a linear SVM classifier on the training set and predict the labels for the testing set.
* Compute the accuracy of the model on the testing set.
* Plot the decision boundaries of the trained model using two of the features.
* Try different values of the regularisation parameter C and see how it affects the performance of the model.


# Implementing Linear SVM Classifier from Scratch
For this task, we will implement a linear SVM classifier from scratch using Python. Then we will compare its performance with the implementation from the scikit-learn library on the Iris dataset.

**Steps to Complete the Task**

1. **Load the Iris Dataset**: Use the scikit-learn library.
2. **Split the Dataset**: Divide it into training and testing sets.
3. **Implement Linear SVM**: Create a linear SVM classifier from scratch.
4. **Train the Classifier**: Fit the SVM on the training dataset.
5. **Predict and Evaluate**: Make predictions on the test set and calculate accuracy.
6. **Plot Decision Boundaries**: Visualize the decision boundary.
7. **Experiment with the Regularization Parameter (C)**: Vary C and assess its effect on performance.

**Implementation**

In [1]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data[:, :2]  # We'll use only the first two features for visualization
y = iris.target

# Convert target variable to binary for binary SVM implementation (Setosa vs. Not-Setosa)
y_binary = (y == 0).astype(int)  # Class 0 vs Class 1 & 2

# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y_binary, test_size=0.3, random_state=42)

# Scale the features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Implementing Linear SVM from scratch (sub-gradient descent on the hinge loss)
class LinearSVM:
    def __init__(self, learning_rate=1e-3, reg_strength=1e-3, num_iter=1000):
        self.lr = learning_rate
        self.reg_strength = reg_strength
        self.num_iter = num_iter
        self.W = None
        self.b = 0.0

    def fit(self, X, y):
        num_samples, num_features = X.shape
        # The hinge loss expects labels in {-1, +1}; map the 0/1 labels accordingly
        y_signed = np.where(y > 0, 1, -1)
        # Initialize weights and bias at zero so that runs are reproducible
        self.W = np.zeros(num_features)
        self.b = 0.0

        # Training process
        for _ in range(self.num_iter):
            margins = y_signed * (X @ self.W + self.b)
            # Samples with margin < 1 have a positive hinge loss
            active = margins < 1

            # Sub-gradient of: mean hinge loss + reg_strength * ||W||^2
            dW = -(X.T @ (active * y_signed)) / num_samples + 2 * self.reg_strength * self.W
            db = -np.sum(active * y_signed) / num_samples
            self.W -= self.lr * dW  # Update weights
            self.b -= self.lr * db  # Update bias

    def predict(self, X):
        # Return labels in the same 0/1 encoding used for training
        return np.where(X @ self.W + self.b >= 0, 1, 0)

# Train the Linear SVM Classifier
svm = LinearSVM(learning_rate=1e-3, reg_strength=0.1, num_iter=1000)
svm.fit(X_train, y_train)

# Make predictions
y_pred = svm.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy of custom Linear SVM: {accuracy:.4f}")

# Scikit-learn SVM model for comparison
from sklearn.svm import SVC

# Train scikit-learn SVM
# Note: reg_strength above and C here are scaled differently, so equal values
# do not imply equal amounts of regularization
sklearn_svm = SVC(kernel='linear', C=0.1)
sklearn_svm.fit(X_train, y_train)
y_pred_sklearn = sklearn_svm.predict(X_test)

# Compare accuracy
accuracy_sklearn = accuracy_score(y_test, y_pred_sklearn)
print(f"Accuracy of scikit-learn Linear SVM: {accuracy_sklearn:.4f}")

# Decision boundary plotting function
def plot_decision_boundary(X, y, model, title):
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.02), np.arange(y_min, y_max, 0.02))
    # Predict a label for every grid point and reshape back to the grid
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    plt.contourf(xx, yy, Z, alpha=0.8, cmap=plt.cm.coolwarm)
    plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k', marker='o')
    plt.xlabel("Sepal Length (standardized)")
    plt.ylabel("Sepal Width (standardized)")
    plt.title(title)
    plt.show()

# Plot decision boundary for the custom SVM
plot_decision_boundary(X_test, y_test, svm, "SVM Decision Boundary (Custom Implementation)")

# Plot decision boundary for scikit-learn SVM
plot_decision_boundary(X_test, y_test, sklearn_svm, "SVM Decision Boundary (scikit-learn)")

# Experimenting with Regularization Parameter (C)
To experiment with different values of the regularization parameter (C) in the scikit-learn SVM, you can modify the C parameter in the instantiation of SVC and rerun the training process and evaluation. Then observe how the accuracy and decision boundary change.

You can adjust the C value in the code, like so:

In [None]:
sklearn_svm = SVC(kernel='linear', C=your_value_here)
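
One way to run this experiment is a simple loop over candidate values. The sketch below is self-contained and repeats the data preparation used earlier (first two features, Setosa vs. not-Setosa, 70/30 split with `random_state=42`); the list `c_values` is an arbitrary illustrative grid rather than a recommended one.

```python
from sklearn import datasets
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Same preparation as above: two features, Setosa (1) vs. the rest (0)
iris = datasets.load_iris()
X = iris.data[:, :2]
y = (iris.target == 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Illustrative grid of C values (an assumption, adjust as needed)
c_values = [0.01, 0.1, 1, 10, 100]
accuracies = {}
for C in c_values:
    model = SVC(kernel="linear", C=C).fit(X_train, y_train)
    accuracies[C] = accuracy_score(y_test, model.predict(X_test))
    print(f"C={C}: test accuracy={accuracies[C]:.4f}, support vectors={model.n_support_.sum()}")
```

Printing the number of support vectors alongside the accuracy helps show the effect of `C` even when accuracy barely changes, since a smaller `C` typically widens the margin and pulls more points into it.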