# Q1

In [None]:
"""
What is the mathematical formula for a linear SVM?
"""

In [None]:
"""
f(x) = sign(w^T x + b)

where:
- f(x) is the predicted class label for input x,
- w is the weight vector,
- b is the bias term,
- x is the input vector.

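The sign function returns +1 for positive arguments and -1 for negative ones, so f(x) assigns x to one of the two classes. Points with w^T x + b = 0 lie exactly on the decision boundary and are assigned to one of the classes by convention.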

"""

# Q2

In [None]:
"""
What is the objective function of a linear SVM?
"""

In [None]:
"""
minimize: (1/2) ||w||^2
subject to: y_i(w^T x_i + b) >= 1, for all training samples (x_i, y_i)

where:
- w is the weight vector,
- b is the bias term,
- x_i is the i-th training sample,
- y_i is the corresponding class label (+1 or -1).

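This is the hard-margin formulation. The constraints require every training sample to be classified correctly with a functional margin y_i(w^T x_i + b) of at least 1, so the two marginal planes w^T x + b = +1 and w^T x + b = -1 are 2/||w|| apart. Minimizing (1/2)||w||^2 is therefore equivalent to maximizing this geometric margin; squaring the norm and adding the factor 1/2 do not change the minimizer but turn the problem into a convex quadratic program.

When the data are not linearly separable, the soft-margin version adds one slack variable xi_i >= 0 per sample:

minimize: (1/2) ||w||^2 + C * sum_i xi_i
subject to: y_i(w^T x_i + b) >= 1 - xi_i and xi_i >= 0, for all training samples (x_i, y_i)

where C > 0 controls the trade-off between a wide margin and margin violations.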

"""

# Q3

In [None]:
"""
What is the kernel trick in SVM?
"""

In [None]:
"""
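The kernel trick lets an SVM learn non-linear decision boundaries without explicitly mapping the data into a higher-dimensional feature space. In the dual formulation, both training and prediction depend on the data only through inner products x_i^T x_j, and the decision function is f(x) = sign(sum_i alpha_i y_i x_i^T x + b). The kernel trick replaces each inner product with a kernel function K(x, z) = phi(x)^T phi(z), which equals the inner product of the mapped points phi(x) and phi(z) in the feature space but is computed directly from the original inputs. Common choices are the polynomial kernel (x^T z + c)^d and the RBF kernel exp(-gamma ||x - z||^2); the RBF kernel corresponds to an infinite-dimensional feature space that could never be computed explicitly.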
"""

# Q4

In [None]:
"""
What is the role of support vectors in SVM? Explain with an example.
"""

In [None]:
"""
The support vectors in SVM are the training points that lie on the marginal planes w^T x + b = +1 and w^T x + b = -1 or, in a soft-margin SVM, inside the margin or on the wrong side of the boundary. In the dual formulation they are exactly the samples with non-zero Lagrange multipliers alpha_i, so the weight vector w = sum_i alpha_i y_i x_i is a combination of support vectors only. They play a crucial role in SVM for two main reasons:

1. Support vectors define the decision boundary: the position and orientation of the SVM decision boundary are determined by the support vectors alone. Training samples that are not support vectors do not affect it; removing them and retraining yields the same boundary.

2. Support vectors matter for generalization: because they fix the margin, their positions relative to the decision boundary largely determine how well the model generalizes to unseen data. Moving or removing a support vector can change the boundary, whereas moving any other sample (without crossing the margin) cannot.

Example: In a binary problem with two well-separated clusters, typically only the few points of each cluster that face the other cluster become support vectors. They fix the margin and therefore the decision boundary, while the points deeper inside each cluster play no part in it. If the classes overlap and a soft margin is used, every point inside the margin or misclassified also becomes a support vector.
"""

# Q5

In [None]:
"""
Illustrate the hyperplane, marginal planes, soft margin, and hard margin in SVM with examples and graphs.
"""

In [None]:
"""
- Hyperplane: In SVM, the hyperplane is the decision boundary that separates the classes. In a d-dimensional feature space it is a flat (d-1)-dimensional surface: a line in 2-D, a plane in 3-D. A linear SVM defines it as the set of points {x : w^T x + b = 0}, where the weight vector w is normal to the hyperplane and the bias term b shifts it away from the origin.

- Marginal planes: The marginal planes are the two planes parallel to the hyperplane, w^T x + b = +1 and w^T x + b = -1, which pass through the support vectors of each class. Each lies at distance 1/||w|| from the hyperplane, so the margin (the gap between them) is 2/||w||. The optimal hyperplane is the one that maximizes this margin.

- Soft margin: A soft margin tolerates margin violations, including some misclassified training samples. It introduces a slack variable xi_i >= 0 for each sample, measuring how far the sample falls on the wrong side of its marginal plane, and the penalty parameter C sets the trade-off: a large C penalizes violations heavily, while a small C favors a wider margin. Soft-margin SVM is used when the data are not linearly separable or contain noise and outliers.

- Hard margin: A hard margin allows no margin violations or misclassifications. It assumes that the data are linearly separable and finds the maximum-margin hyperplane that separates the classes perfectly. Because a single outlier can shift the boundary or make the problem infeasible, a hard margin suits only clean, linearly separable data.
"""

# Q6

In [None]:
"""
SVM implementation on the Iris dataset.
"""

In [1]:

# Import necessary libraries
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create an SVM classifier
svm = SVC(kernel='linear')

# Fit the classifier to the training data
svm.fit(X_train, y_train)

# Make predictions on the test data
y_pred = svm.predict(X_test)

# Evaluate the accuracy of the classifier
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)



Accuracy: 1.0
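The accuracy of 1.0 above comes from a single 30-sample test split, so it is a noisy estimate. A sketch of a 5-fold cross-validated check, which also compares the linear kernel with an RBF kernel; the RBF comparison and the added `StandardScaler` step are assumptions beyond the original cell.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

results = {}
for kernel in ["linear", "rbf"]:
    # The scaler is refit inside each training fold, so test folds never leak into it
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    results[kernel] = cross_val_score(model, X, y, cv=5)  # one accuracy per fold
    print(f"{kernel}: mean accuracy {results[kernel].mean():.3f} "
          f"(std {results[kernel].std():.3f})")
```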


# Bonus Task

In [None]:
"""
Implement a linear SVM classifier from scratch using Python and compare its performance with the scikit-learn implementation.
"""

In [3]:
import numpy as np

class LinearSVM:
    """Binary linear SVM trained by full-batch subgradient descent on the
    soft-margin primal objective
        (1/2)||w||^2 + C * sum_i max(0, 1 - y_i (w^T x_i + b)).
    Labels passed to fit() are assumed to be -1 or +1."""

    def __init__(self, learning_rate=0.001, num_epochs=1000, C=1.0):
        self.learning_rate = learning_rate
        self.num_epochs = num_epochs
        self.C = C
        self.weights = None
        self.bias = None

    def fit(self, X, y):
        num_features = X.shape[1]

        # Initialize weights and bias
        self.weights = np.zeros(num_features)
        self.bias = 0.0

        for _ in range(self.num_epochs):
            # Decision scores w^T x + b for every training sample
            scores = np.dot(X, self.weights) + self.bias

            # Hinge loss per sample; only margin violators (loss > 0) contribute to the gradient
            hinge_loss = np.maximum(0, 1 - y * scores)
            violating = hinge_loss > 0

            # C * sum of y_i * x_i over the margin violators
            dW = self.C * np.dot(y * violating, X)

            # Subgradient step: d/dw = w - dW, d/db = -C * sum of y_i over the violators
            self.weights -= self.learning_rate * (self.weights - dW)
            self.bias -= self.learning_rate * np.sum(-self.C * y * violating)
        return self

    def predict(self, X):
        # Class +1 if w^T x + b >= 0, otherwise -1
        scores = np.dot(X, self.weights) + self.bias
        return np.where(scores >= 0, 1, -1)


In [4]:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.svm import LinearSVC

# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Instantiate and train the scratch implementation
svm_scratch = LinearSVM()
svm_scratch.fit(X_train, y_train)
y_pred_scratch = svm_scratch.predict(X_test)

# Instantiate and train the scikit-learn implementation
svm_sklearn = LinearSVC()
svm_sklearn.fit(X_train, y_train)
y_pred_sklearn = svm_sklearn.predict(X_test)

# Compare the accuracy of both implementations
accuracy_scratch = accuracy_score(y_test, y_pred_scratch)
accuracy_sklearn = accuracy_score(y_test, y_pred_sklearn)

print("Accuracy (Scratch Implementation):", accuracy_scratch)
print("Accuracy (Scikit-learn Implementation):", accuracy_sklearn)



Accuracy (Scratch Implementation): 0.3
Accuracy (Scikit-learn Implementation): 1.0
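The scratch result of 0.3 is not a fair comparison. `LinearSVM` is a binary classifier that assumes labels -1 and +1, while Iris has three classes labelled 0, 1 and 2: its `predict` can only return -1 or +1, so only test samples of class 1 can ever count as correct. `LinearSVC`, by contrast, handles the three classes with a one-vs-rest scheme by default. A minimal sketch of the same gradient-descent training wrapped in one-vs-rest follows; `train_binary_svm` and `OneVsRestLinearSVM` are hypothetical names, and standardizing the features is an added assumption that usually helps plain gradient descent.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC


def train_binary_svm(X, y_signed, learning_rate=0.001, num_epochs=1000, C=1.0):
    """Hypothetical helper: subgradient descent on
    (1/2)||w||^2 + C * sum(max(0, 1 - y (w^T x + b))), with y_signed in {-1, +1}."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(num_epochs):
        violating = y_signed * (X @ w + b) < 1  # samples with positive hinge loss
        w -= learning_rate * (w - C * (y_signed[violating] @ X[violating]))
        b -= learning_rate * (-C * np.sum(y_signed[violating]))
    return w, b


class OneVsRestLinearSVM:
    """Hypothetical multiclass wrapper: one binary SVM per class (class k -> +1, rest -> -1)."""

    def __init__(self, **svm_params):
        self.svm_params = svm_params

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        params = [train_binary_svm(X, np.where(y == c, 1, -1), **self.svm_params)
                  for c in self.classes_]
        self.W = np.array([w for w, _ in params])  # shape (n_classes, n_features)
        self.b = np.array([b for _, b in params])  # shape (n_classes,)
        return self

    def predict(self, X):
        # Choose the class whose binary classifier gives the largest score
        return self.classes_[np.argmax(X @ self.W.T + self.b, axis=1)]


X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize using statistics from the training split only
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

ovr = OneVsRestLinearSVM().fit(X_train_s, y_train)
y_pred_ovr = ovr.predict(X_test_s)
acc_ovr = accuracy_score(y_test, y_pred_ovr)

# scikit-learn on the same scaled data, for a like-for-like comparison
sk = LinearSVC().fit(X_train_s, y_train)
acc_sk = accuracy_score(y_test, sk.predict(X_test_s))

print("Accuracy (scratch, one-vs-rest):", acc_ovr)
print("Accuracy (LinearSVC, same scaled data):", acc_sk)
```

Predicting with argmax over the per-class scores mirrors how a one-vs-rest linear model usually picks a class; ties and poorly calibrated scores between the binary models are a known weakness of this scheme.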


