# SVM Kernel Comparison Project

This project evaluates and compares the performance of Support Vector Machine (SVM) classifiers using different kernel functions on a synthetic dataset. The goal is to understand how various kernels influence classification boundaries and accuracy.

## Dataset Overview

A synthetic dataset with two input features (`X1` and `X2`) and a binary class label was used. With only two features, each kernel's decision boundary can be plotted directly in the `X1`-`X2` plane.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.datasets import make_classification

## Data Generation and Preprocessing

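The dataset of 300 samples was generated using `make_classification`. Setting `flip_y=0.1` assigns a random class to roughly 10% of the samples, so some label noise is built in. The data was then split 70/30 into training and test sets.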

In [None]:
X, y = make_classification(n_samples=300, n_features=2, n_informative=2, n_redundant=0,
                           n_clusters_per_class=1, flip_y=0.1, class_sep=1.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
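
Before training, it can help to confirm the split sizes and class balance. The sketch below regenerates the same data and repeats the split with `stratify=y`, which is an assumption added here and not part of the original split; the `_s` variable names are likewise illustrative only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Same generation settings as the cell above.
X, y = make_classification(n_samples=300, n_features=2, n_informative=2, n_redundant=0,
                           n_clusters_per_class=1, flip_y=0.1, class_sep=1.0, random_state=42)

# stratify=y is an addition (not used in the original split): it keeps the
# class proportions nearly identical in the training and test sets.
X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

print("train shape:", X_train_s.shape, "test shape:", X_test_s.shape)
print("class counts (all):  ", np.bincount(y))
print("class counts (train):", np.bincount(y_train_s))
print("class counts (test): ", np.bincount(y_test_s))
```

With only 90 test samples, a lopsided split could shift the reported accuracies by a few points, so stratifying is a low-cost safeguard.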

In [None]:
plt.figure(figsize=(8, 6))
plt.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap=ListedColormap(['red', 'blue']), edgecolor='k')
plt.xlabel('X1')
plt.ylabel('X2')
plt.title('Training Data')
plt.grid(True)
plt.show()

## SVM Model Training and Comparison

Four SVM kernels were compared: linear, polynomial, RBF, and sigmoid. Each model was trained on the training set with `gamma='auto'` and otherwise default `SVC` settings, then scored by accuracy on the test set.
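
To make the four kernels concrete, the sketch below evaluates each kernel function by hand on two made-up points and checks the result against the helpers in `sklearn.metrics.pairwise`. It assumes the settings used in this project: `gamma='auto'` (which scikit-learn resolves to `1 / n_features`) and the `SVC` defaults `degree=3` and `coef0=0.0`. The points `a` and `b` are illustrative only.

```python
import numpy as np
from sklearn.metrics.pairwise import (linear_kernel, polynomial_kernel,
                                      rbf_kernel, sigmoid_kernel)

# Two small illustrative points (made up for this example).
a = np.array([[1.0, 2.0]])
b = np.array([[0.5, -1.0]])

gamma = 1.0 / a.shape[1]  # what gamma='auto' resolves to: 1 / n_features
coef0 = 0.0               # SVC default
degree = 3                # SVC default for the polynomial kernel

dot = a[0] @ b[0]                       # inner product <a, b>
sq_dist = np.sum((a[0] - b[0]) ** 2)    # squared Euclidean distance

# Kernel values written out from their definitions.
manual = {
    "linear": dot,
    "poly": (gamma * dot + coef0) ** degree,
    "rbf": np.exp(-gamma * sq_dist),
    "sigmoid": np.tanh(gamma * dot + coef0),
}

# The same values computed by scikit-learn's pairwise helpers.
library = {
    "linear": linear_kernel(a, b)[0, 0],
    "poly": polynomial_kernel(a, b, degree=degree, gamma=gamma, coef0=coef0)[0, 0],
    "rbf": rbf_kernel(a, b, gamma=gamma)[0, 0],
    "sigmoid": sigmoid_kernel(a, b, gamma=gamma, coef0=coef0)[0, 0],
}

for name in manual:
    print(f"{name:8s} manual={manual[name]: .6f}  sklearn={library[name]: .6f}")
```

Note that the pairwise helpers default to `coef0=1`, unlike `SVC`, which is why `coef0` is passed explicitly here.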

In [None]:
kernels = ['linear', 'poly', 'rbf', 'sigmoid']
models = {}  # kernel name -> (fitted model, test accuracy)

for kernel in kernels:
    # gamma='auto' sets gamma = 1 / n_features; the linear kernel ignores it.
    model = SVC(kernel=kernel, gamma='auto')
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    acc = accuracy_score(y_test, y_pred)
    models[kernel] = (model, acc)
    print(f"{kernel.capitalize()} Kernel Accuracy: {acc:.2f}")
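
Accuracy on a single 90-sample test set can vary noticeably with the split. As a complementary check, the sketch below scores the same four kernels with 5-fold cross-validation on the full dataset. This is an addition, not part of the original evaluation; the fold count and the `cv_scores` name are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Same generation settings as the data cell above.
X, y = make_classification(n_samples=300, n_features=2, n_informative=2, n_redundant=0,
                           n_clusters_per_class=1, flip_y=0.1, class_sep=1.0, random_state=42)

cv_scores = {}  # kernel name -> (mean accuracy, standard deviation)
for kernel in ['linear', 'poly', 'rbf', 'sigmoid']:
    # cv=5 uses stratified 5-fold splitting for classifiers.
    scores = cross_val_score(SVC(kernel=kernel, gamma='auto'), X, y, cv=5)
    cv_scores[kernel] = (scores.mean(), scores.std())
    print(f"{kernel.capitalize()} Kernel CV Accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")
```

If two kernels differ by less than the fold-to-fold spread, the single-split ranking between them is probably not meaningful.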

## Visualization of Decision Boundaries

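The following plots show the decision boundary learned by each kernel, drawn over the training data. Each model predicts a class for every point on a fine grid, and the predictions are rendered as shaded regions.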

In [None]:
def plot_decision_boundary(model, X, y, title):
    h = .02  # grid step size
    # Extend the plotting range one unit beyond the data on every side.
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    # Predict a class for every grid point, then restore the grid shape.
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    plt.figure(figsize=(8, 6))
    plt.contourf(xx, yy, Z, cmap=ListedColormap(['#FFAAAA', '#AAAAFF']), alpha=0.8)
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap=ListedColormap(['red', 'blue']), edgecolor='k')
    plt.title(title)
    plt.xlabel('X1')
    plt.ylabel('X2')
    plt.grid(True)
    plt.show()
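
The grid construction inside `plot_decision_boundary` is easier to follow on a tiny example. The sketch below builds a deliberately coarse 3-by-2 grid and uses a made-up threshold rule in place of `model.predict`, so the names `grid_points` and `fake_labels` are illustrative only.

```python
import numpy as np

# A deliberately coarse grid so the arrays are small enough to print.
xx, yy = np.meshgrid(np.arange(0.0, 3.0, 1.0),   # x values: 0, 1, 2
                     np.arange(0.0, 2.0, 1.0))   # y values: 0, 1

print(xx.shape)  # (2, 3): one row per y value, one column per x value

# Flatten both coordinate arrays and pair them up: one (x, y) row per grid point.
grid_points = np.c_[xx.ravel(), yy.ravel()]
print(grid_points.shape)  # (6, 2)

# Stand-in for model.predict: label a point 1 when its x coordinate is >= 1.
fake_labels = (grid_points[:, 0] >= 1).astype(int)

# Reshape back to the grid so each label lines up with its (xx, yy) position,
# which is the layout plt.contourf expects.
Z = fake_labels.reshape(xx.shape)
print(Z)
```

The same flatten, predict, reshape pattern scales to the fine grid (`h = .02`) used in the plotting function.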

In [None]:
for kernel in kernels:
    model, _ = models[kernel]
    plot_decision_boundary(model, X_train, y_train, f"SVM with {kernel.capitalize()} Kernel")

## Conclusion

This project shows how kernel choice shapes an SVM's decision boundary. The linear kernel can only produce a straight-line boundary, while nonlinear kernels such as RBF and polynomial can produce curved boundaries that adapt to more complex class structure. Read together, the boundary plots and test accuracies indicate which kernels suit this dataset, keeping in mind that the comparison rests on a single train/test split with `gamma='auto'` and otherwise default hyperparameters.