Q1. What is the mathematical formula for a linear SVM?
Q2. What is the objective function of a linear SVM?
Q3. What is the kernel trick in SVM?
Q4. What is the role of support vectors in SVM? Explain with an example.
Q5. Illustrate with examples and graphs of Hyperplane, Marginal plane, Soft margin, and Hard margin in SVM.
Q6. SVM Implementation through Iris dataset.
- Load the iris dataset from the scikit-learn library and split it into a training set and a testing set.
- Train a linear SVM classifier on the training set and predict the labels for the testing set.
- Compute the accuracy of the model on the testing set.
- Plot the decision boundaries of the trained model using two of the features.
- Try different values of the regularisation parameter C and see how it affects the performance of the model.

Bonus task: Implement a linear SVM classifier from scratch using Python and compare its
performance with the scikit-learn implementation.

### Q1. What is the mathematical formula for a linear SVM?
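A linear Support Vector Machine (SVM) classifier scores an input with a linear decision function and predicts the class from its sign, \( \hat{y} = \operatorname{sign}(f(\mathbf{x})) \). The decision function is: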

\[ f(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x} + b \]

where:
- \(\mathbf{w}\) is the weight vector,
- \(\mathbf{x}\) is the input feature vector,
- \(b\) is the bias term.
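
As a minimal sketch of the decision function, the snippet below evaluates \( f(\mathbf{x}) \) and the resulting label for a few points. The values of `w`, `b`, and `X_demo` are hypothetical and chosen only so the arithmetic is easy to check by hand. Assigning a score of exactly 0 to the positive class is a convention assumed here.

```python
import numpy as np

def decision_function(X, w, b):
    """Return f(x) = w . x + b for every row of X."""
    return X @ w + b

def classify(X, w, b):
    """Map scores to labels in {-1, +1}; a score of exactly 0 goes to +1 (assumed convention)."""
    return np.where(decision_function(X, w, b) >= 0, 1, -1)

# Hypothetical parameters and points, for illustration only
w = np.array([2.0, -1.0])
b = -1.0
X_demo = np.array([[2.0, 1.0],   # f = 4 - 1 - 1 =  2
                   [0.0, 1.0],   # f = 0 - 1 - 1 = -2
                   [1.0, 1.0]])  # f = 2 - 1 - 1 =  0 (on the hyperplane)

scores = decision_function(X_demo, w, b)
labels = classify(X_demo, w, b)
print(scores)
print(labels)
```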

### Q2. What is the objective function of a linear SVM?
A linear SVM seeks the hyperplane that maximizes the margin between the two classes. In its soft-margin form, this is the optimization problem:

\[ \min_{\mathbf{w}, b, \boldsymbol{\xi}} \; \frac{1}{2} \|\mathbf{w}\|^2 + C \sum_{i=1}^{n} \xi_i \]

subject to the constraints, for \( i = 1, \dots, n \):

\[ y_i (\mathbf{w} \cdot \mathbf{x}_i + b) \geq 1 - \xi_i \]
\[ \xi_i \geq 0 \]

where:
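- \(\|\mathbf{w}\|^2\) is the regularization term; because the margin width is \( 2 / \|\mathbf{w}\| \), minimizing it maximizes the margin,
- \(C\) is the regularization parameter that weights the total slack against the margin term,
- \(\xi_i\) are the slack variables, which allow some points to violate the margin or be misclassified when the data is not linearly separable,
- \(y_i\) are the class labels (either +1 or -1).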
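
To make the objective concrete, here is a small sketch that evaluates it for a given \( (\mathbf{w}, b) \). It relies on the fact that, for fixed \( \mathbf{w} \) and \( b \), the smallest slack satisfying both constraints is \( \xi_i = \max(0, 1 - y_i(\mathbf{w} \cdot \mathbf{x}_i + b)) \). The function name `svm_objective` and the toy data are assumptions made for illustration.

```python
import numpy as np

def svm_objective(w, b, X, y, C=1.0):
    """Soft-margin primal objective, with each slack set to its smallest feasible value."""
    margins = y * (X @ w + b)
    slack = np.maximum(0.0, 1.0 - margins)   # xi_i = max(0, 1 - y_i f(x_i))
    return 0.5 * np.dot(w, w) + C * slack.sum(), slack

# Hypothetical 1-D toy data with labels in {-1, +1}
X_toy = np.array([[-2.0], [0.5], [2.0]])
y_toy = np.array([-1, 1, 1])

value, slack = svm_objective(np.array([1.0]), 0.0, X_toy, y_toy, C=1.0)
print(slack)   # only the middle point (x = 0.5) sits inside the margin
print(value)   # 0.5 * 1^2 + 1.0 * 0.5 = 1.0
```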

### Q3. What is the kernel trick in SVM?
The kernel trick allows SVMs to classify efficiently in a higher-dimensional feature space without explicitly mapping the data to that space. The SVM only ever needs dot products between pairs of points, so the explicit feature map \( \phi \) can be replaced by a kernel function \( K(\mathbf{x}_i, \mathbf{x}_j) \) that returns the dot product in the feature space directly:

\[ K(\mathbf{x}_i, \mathbf{x}_j) = \phi(\mathbf{x}_i) \cdot \phi(\mathbf{x}_j) \]

Common kernel functions include:
- Linear kernel: \( K(\mathbf{x}_i, \mathbf{x}_j) = \mathbf{x}_i \cdot \mathbf{x}_j \)
- Polynomial kernel: \( K(\mathbf{x}_i, \mathbf{x}_j) = (\mathbf{x}_i \cdot \mathbf{x}_j + c)^d \), with offset \( c \geq 0 \) and degree \( d \)
- Radial basis function (RBF) kernel: \( K(\mathbf{x}_i, \mathbf{x}_j) = \exp(-\gamma \|\mathbf{x}_i - \mathbf{x}_j\|^2) \), with width parameter \( \gamma > 0 \)
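
The identity \( K = \phi \cdot \phi \) can be checked numerically. The sketch below does so for the polynomial kernel with \( c = 1 \), \( d = 2 \) on 2-D inputs, where one valid feature map is \( \phi(\mathbf{x}) = (1, \sqrt{2}x_1, \sqrt{2}x_2, x_1^2, x_2^2, \sqrt{2}x_1x_2) \). The helper names `poly_kernel` and `phi` and the two sample points are illustrative assumptions.

```python
import numpy as np

def poly_kernel(x, z, c=1.0, d=2):
    """Polynomial kernel computed in the original 2-D input space."""
    return (np.dot(x, z) + c) ** d

def phi(x):
    """Explicit 6-D feature map matching the degree-2 polynomial kernel with c = 1."""
    x1, x2 = x
    s = np.sqrt(2.0)
    return np.array([1.0, s * x1, s * x2, x1 ** 2, x2 ** 2, s * x1 * x2])

# Hypothetical sample points
x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

k_implicit = poly_kernel(x, z)         # (3 - 2 + 1)^2 = 4, no mapping needed
k_explicit = np.dot(phi(x), phi(z))    # same value via the explicit 6-D map
print(k_implicit, k_explicit)
```

The kernel needs one 2-D dot product, while the explicit route builds two 6-D vectors first; the gap grows quickly with the input dimension and degree.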

### Q4. What is the role of support vectors in SVM? Explain with an example.
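Support vectors are the training points that lie closest to the decision boundary (the hyperplane). They alone determine its position and orientation: in the dual solution only these points have non-zero weights, so removing any other training point leaves the hyperplane unchanged, whereas moving or removing a support vector can change it. In the hard-margin case they lie exactly on the marginal planes; with a soft margin, points inside the margin or on the wrong side are support vectors as well.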

For example, consider a binary classification problem with two classes of points (red and blue). The support vectors are the points from each class that are closest to the separating hyperplane. The optimal hyperplane is positioned such that it maximizes the margin between the support vectors of the two classes.
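
The red/blue example can be made numeric with a four-point toy set. This sketch does not train anything: it assumes the hyperplane \( x_1 = 2 \) (that is, \( \mathbf{w} = (1, 0) \), \( b = -2 \)), which is the maximum-margin separator for these particular points because (1, 0) and (3, 0) are the closest pair across the two classes. The support vectors are then the points with \( y_i f(\mathbf{x}_i) = 1 \).

```python
import numpy as np

# Hypothetical toy set: two "red" points (label -1) and two "blue" points (label +1)
X_toy = np.array([[1.0, 0.0], [0.0, 1.0],    # red
                  [3.0, 0.0], [4.0, 1.0]])   # blue
y_toy = np.array([-1, -1, 1, 1])

# Assumed max-margin hyperplane for this set: x1 = 2
w = np.array([1.0, 0.0])
b = -2.0

functional_margin = y_toy * (X_toy @ w + b)        # y_i * f(x_i)
is_support = np.isclose(functional_margin, 1.0)    # points lying exactly on a marginal plane
margin_width = 2.0 / np.linalg.norm(w)

print(functional_margin)      # [1. 2. 1. 2.]
print(X_toy[is_support])      # (1, 0) and (3, 0) are the support vectors
print(margin_width)           # 2.0
```

Moving (0, 1) or (4, 1) further away from the boundary would leave the hyperplane where it is, since neither point touches the margin.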

### Q5. Illustrate with examples and graphs of Hyperplane, Marginal plane, Soft margin, and Hard margin in SVM.
- **Hyperplane**: The decision boundary that separates different classes. In a 2D space, it is a line; in a 3D space, it is a plane; and in higher dimensions, it is a hyperplane.

- **Marginal Plane**: The two planes parallel to the hyperplane, \( \mathbf{w} \cdot \mathbf{x} + b = \pm 1 \), that pass through the closest points of each class (the support vectors). The distance between them, \( 2 / \|\mathbf{w}\| \), is the margin.

- **Hard Margin**: Used when the data is linearly separable. No training point may lie inside the margin or on the wrong side, so the SVM finds the widest-margin hyperplane that separates the two classes without any misclassification. A single overlapping point makes this problem infeasible.

- **Soft Margin**: Used when the data is not perfectly separable. The SVM tolerates some margin violations and misclassifications by introducing slack variables \( \xi_i \), and the parameter \( C \) sets the balance between a wide margin and a small total slack.
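
The four terms can be tied together numerically. The sketch below takes a hypothetical hyperplane and four hypothetical points, then reports the margin width, each point's signed distance to the hyperplane, and where the point falls relative to the marginal planes. A hard margin accepts this hyperplane only if every slack is zero; a soft margin accepts it at a cost of \( C \sum_i \xi_i \).

```python
import numpy as np

# Hypothetical hyperplane x1 = 2, written as w . x + b = 0
w = np.array([2.0, 0.0])
b = -4.0
margin_width = 2.0 / np.linalg.norm(w)   # marginal planes sit at x1 = 1.5 and x1 = 2.5

# Hypothetical points: one of class -1, three of class +1
X_demo = np.array([[0.0, 0.0], [3.0, 0.0], [2.25, 1.0], [1.75, 1.0]])
y_demo = np.array([-1, 1, 1, 1])

f = X_demo @ w + b
signed_distance = f / np.linalg.norm(w)        # geometric distance to the hyperplane
slack = np.maximum(0.0, 1.0 - y_demo * f)      # xi_i for each point

def status(xi):
    """Describe a point by its slack value."""
    if xi == 0.0:
        return "on or outside margin"
    if xi < 1.0:
        return "inside margin"
    return "wrong side"

statuses = [status(v) for v in slack]
hard_margin_ok = bool(np.all(slack == 0.0))    # False: two points violate the margin

for dist, xi, s in zip(signed_distance, slack, statuses):
    print(f"distance={dist:+.2f}  slack={xi:.1f}  {s}")
print("margin width:", margin_width, "| hard margin satisfied:", hard_margin_ok)
```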

### Q6. SVM Implementation through Iris dataset.
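Let's implement a linear SVM classifier using the Iris dataset from the scikit-learn library. To keep the problem binary and easy to plot, the code uses only the first two features (sepal length and sepal width) and the first two classes (setosa and versicolor). The decision regions are drawn with `plot_decision_regions` from the `mlxtend` package.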

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
from mlxtend.plotting import plot_decision_regions

# Load the iris dataset
iris = datasets.load_iris()
X = iris.data[:, :2]  # Use only the first two features for easy visualization
y = iris.target

# Only consider the first two classes for binary classification
X = X[y != 2]
y = y[y != 2]

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train a linear SVM classifier
svc = SVC(kernel='linear', C=1.0)
svc.fit(X_train, y_train)

# Predict the labels for the testing set
y_pred = svc.predict(X_test)

# Compute the accuracy of the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")

# Plot the decision boundaries
plot_decision_regions(X_test, y_test, clf=svc, legend=2)
plt.xlabel(iris.feature_names[0])
plt.ylabel(iris.feature_names[1])
plt.title('SVM Decision Boundary')
plt.show()
```

### Effect of the Regularization Parameter \( C \)
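The regularization parameter \( C \) controls the trade-off between a wide margin and a low training error. A small \( C \) penalizes slack only lightly, so the optimizer accepts margin violations (and possibly some misclassified training points) in exchange for a wider margin. A large \( C \) penalizes every violation heavily, which typically gives a narrower margin that fits the training set more closely and can overfit noisy data. The loop below retrains the classifier for several values of \( C \) and reports the test accuracy for each.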

```python
for C in [0.01, 0.1, 1, 10, 100]:
    svc = SVC(kernel='linear', C=C)
    svc.fit(X_train, y_train)
    y_pred = svc.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    print(f"Accuracy with C={C}: {accuracy:.2f}")
    
    plot_decision_regions(X_test, y_test, clf=svc, legend=2)
    plt.xlabel(iris.feature_names[0])
    plt.ylabel(iris.feature_names[1])
    plt.title(f'SVM Decision Boundary with C={C}')
    plt.show()
```

### Bonus Task: Implement a Linear SVM Classifier from Scratch
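Implementing a linear SVM from scratch comes down to solving a quadratic programming (QP) problem. For simplicity, the code below hands the dual form of the soft-margin problem to the QP solver in `cvxopt` and then recovers \( \mathbf{w} \) and \( b \) from the resulting Lagrange multipliers. The dual formulation assumes class labels in \(\{-1, +1\}\).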
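
For reference, the matrices built in the code correspond to the standard dual of the soft-margin problem from Q2, with one Lagrange multiplier \( \alpha_i \) per training point:

\[ \min_{\boldsymbol{\alpha}} \; \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j (\mathbf{x}_i \cdot \mathbf{x}_j) - \sum_{i=1}^{n} \alpha_i \]

subject to:

\[ 0 \leq \alpha_i \leq C \]
\[ \sum_{i=1}^{n} \alpha_i y_i = 0 \]

`cvxopt.solvers.qp` expects the form \( \min \frac{1}{2} \mathbf{x}^\top P \mathbf{x} + \mathbf{q}^\top \mathbf{x} \) subject to \( G\mathbf{x} \leq \mathbf{h} \) and \( A\mathbf{x} = \mathbf{b} \), so \( P_{ij} = y_i y_j (\mathbf{x}_i \cdot \mathbf{x}_j) \), \( \mathbf{q} = -\mathbf{1} \), the pair \( (G, \mathbf{h}) \) encodes the box constraints, and \( (A, \mathbf{b}) \) encodes the equality constraint. The weight vector is then recovered as \( \mathbf{w} = \sum_i \alpha_i y_i \mathbf{x}_i \).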

```python
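import numpy as np
from cvxopt import matrix, solvers
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Define a function to implement SVM from scratch (soft-margin dual, linear kernel)
def linear_svm(X, y, C=1.0):
    """Return (w, b). y must hold labels in {-1, +1}."""
    m = X.shape[0]
    y = y.astype(float)
    K = np.dot(X, X.T)                        # Gram matrix of the linear kernel
    P = matrix(np.outer(y, y) * K)
    q = matrix(-np.ones((m, 1)))
    # Box constraints 0 <= alpha_i <= C, stacked as G @ alpha <= h
    G = matrix(np.vstack((-np.eye(m), np.eye(m))))
    h = matrix(np.hstack((np.zeros(m), np.ones(m) * C)))
    # Equality constraint sum_i alpha_i * y_i = 0
    A = matrix(y, (1, m), 'd')
    b_eq = matrix(0.0)

    solvers.options['show_progress'] = False  # silence the solver's iteration log
    sol = solvers.qp(P, q, G, h, A, b_eq)
    alphas = np.array(sol['x']).flatten()

    # Support vectors have non-zero Lagrange multipliers
    sv = alphas > 1e-5
    alphas, sv_X, sv_y = alphas[sv], X[sv], y[sv]

    # Weight vector: w = sum_i alpha_i * y_i * x_i
    w = (alphas * sv_y) @ sv_X

    # Bias: average over support vectors that sit on the margin (alpha_i < C);
    # fall back to all support vectors if none qualify
    on_margin = alphas < C - 1e-5
    if not on_margin.any():
        on_margin[:] = True
    b = np.mean(sv_y[on_margin] - sv_X[on_margin] @ w)

    return w, b

# The dual expects labels in {-1, +1}; the Iris subset uses {0, 1}
y_train_pm = np.where(y_train == 0, -1, 1)
y_test_pm = np.where(y_test == 0, -1, 1)

# Train the SVM classifier from scratch
w, b = linear_svm(X_train, y_train_pm, C=1.0)

# Define a predict function (returns -1 or +1)
def predict(X, w, b):
    return np.sign(np.dot(X, w) + b)

# Predict the labels for the testing set
y_pred_scratch = predict(X_test, w, b)

# Compute the accuracy of the scratch implementation
accuracy_scratch = np.mean(y_pred_scratch == y_test_pm)
print(f"Accuracy (from scratch): {accuracy_scratch:.2f}")

# Compare with scikit-learn at the same C (svc from the loop above was trained with C=100)
svc_ref = SVC(kernel='linear', C=1.0).fit(X_train, y_train)
accuracy_sklearn = accuracy_score(y_test, svc_ref.predict(X_test))
print(f"Accuracy (scikit-learn): {accuracy_sklearn:.2f}")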
```