# Support Vector Machines (SVM)

Support Vector Machines are often described as one of the best "out-of-the-box" classifiers. They rely on elegant geometric concepts to find the optimal boundary between classes.

## 1. The Hyperplane Concept
*   In **2 Dimensions**, a hyperplane is a flat **Line**.
*   In **3 Dimensions**, a hyperplane is a flat **Plane**.
*   In **p Dimensions**, it is a flat $(p-1)$-dimensional affine subspace.

We want to find a hyperplane that separates our two classes of data (e.g., Blue dots vs Red dots) perfectly.
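
As a minimal sketch of what "separating" means, a hyperplane can be written as $\beta_0 + \beta^T x = 0$, and the sign of the left-hand side tells us which side a point falls on. The coefficients and points below are made-up illustrative values, not taken from any dataset in this notebook.

```python
import numpy as np

# Hypothetical hyperplane in 2D: 1 + 2*x1 - 1*x2 = 0 (a line)
beta0 = 1.0
beta = np.array([2.0, -1.0])

def side_of_hyperplane(x):
    """Return +1 or -1 for the side x falls on, or 0 if x lies on the hyperplane."""
    return int(np.sign(beta0 + beta @ x))

points = np.array([[0.0, 0.0], [0.0, 3.0], [1.0, 3.0]])
sides = [side_of_hyperplane(p) for p in points]
print(sides)  # [1, -1, 0]
```

A classifier built on this hyperplane would assign one class to the `+1` side and the other class to the `-1` side.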

## 2. Maximal Margin Classifier
Usually, there are infinitely many lines that can separate two perfectly separable clusters. Which one is best?

**Intuition**: We want the "widest street" possible.
*   We find the separator whose distance to the nearest training data points is as large as possible.
*   This distance is called the **Margin**.
*   The classifier is the "center line" of this street.
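
The "widest street" can be measured directly. The sketch below assumes two well-separated synthetic blobs (generated with `make_blobs`, which is not used elsewhere in this notebook) and a large `C` to approximate a maximal margin classifier. For a linear SVM with weights `w`, the distance from the center line to the edge of the street is `1 / ||w||`, which should match the distance to the nearest training point.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two clearly separated clusters (illustrative data, assumed for this sketch)
X_sep, y_sep = make_blobs(n_samples=100, centers=[[-3, -3], [3, 3]],
                          cluster_std=0.8, random_state=0)

# A large C leaves almost no room for margin violations
clf = SVC(kernel='linear', C=1000).fit(X_sep, y_sep)
w = clf.coef_[0]
b = clf.intercept_[0]

# Half the street width, from the center line to the margin edge
half_width = 1.0 / np.linalg.norm(w)

# Perpendicular distance of every training point to the center line
distances = np.abs(X_sep @ w + b) / np.linalg.norm(w)

print(f"Half street width:         {half_width:.3f}")
print(f"Distance to nearest point: {distances.min():.3f}")
```

The two printed numbers should agree up to the solver's numerical tolerance: the nearest points sit exactly on the edge of the street.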

### Support Vectors
Interestingly, the position of the decision boundary depends **only** on the few observations that are closest to it.
*   These closest points are called **Support Vectors**.
*   Points that lie beyond the margin do not affect the model at all: you can move them (as long as they stay outside the margin) and the boundary stays the same. This makes SVM distinct from Logistic Regression (where all points contribute).
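
One way to check this claim is to refit the model on the support vectors alone and compare the result with the original fit. This is a minimal sketch on synthetic, well-separated blobs (an assumption made for illustration); `support_` is the scikit-learn attribute holding the indices of the support vectors.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Illustrative, well-separated data (assumed for this sketch)
X_sep, y_sep = make_blobs(n_samples=100, centers=[[-3, -3], [3, 3]],
                          cluster_std=0.8, random_state=0)

full = SVC(kernel='linear', C=1000).fit(X_sep, y_sep)
sv_idx = full.support_  # indices of the support vectors
print(f"{len(sv_idx)} support vectors out of {len(X_sep)} training points")

# Refit using ONLY the support vectors and discard everything else
reduced = SVC(kernel='linear', C=1000).fit(X_sep[sv_idx], y_sep[sv_idx])

# Both models should label every original point the same way
same_predictions = np.array_equal(full.predict(X_sep), reduced.predict(X_sep))
print("Same predictions:", same_predictions)
print("Full coef:   ", full.coef_[0], full.intercept_[0])
print("Reduced coef:", reduced.coef_[0], reduced.intercept_[0])
```

The coefficients of the two fits should be close to identical, even though the second model saw only a handful of points.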

## 3. The Kernel Trick (Non-Linearity)
What if the data isn't separable by a straight line (e.g., one class forms a circle inside a ring of the other class)?

**Solution**: Project the data into a higher dimension.
*   Imagine 2D data on a sheet of paper. You can't draw a straight line to separate inner and outer rings.
*   "Lift" the inner ring up into 3D space.
*   Now you can slide a flat sheet (hyperplane) between them!
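*   This mathematical projection is handled efficiently using **Kernels**, which compute how similar two points are in the higher-dimensional space without ever building the new coordinates explicitly.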
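
The "lifting" idea can be tried by hand. The sketch below assumes concentric-circle data from `make_circles` and adds a third coordinate $z = x_1^2 + x_2^2$ (the squared distance from the origin). With this particular choice the outer ring ends up higher than the inner one, but the effect is the same: a flat sheet can now slide between them. This is an explicit projection for illustration, not what a kernel does internally.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# One circle inside another (illustrative data, assumed for this sketch)
X_c, y_c = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)

# A straight line in the original 2D space
acc_flat = SVC(kernel='linear').fit(X_c, y_c).score(X_c, y_c)

# Lift to 3D by adding the squared radius as a new feature
z = (X_c ** 2).sum(axis=1, keepdims=True)
X_lifted = np.hstack([X_c, z])

# A flat plane in the lifted 3D space
acc_lifted = SVC(kernel='linear').fit(X_lifted, y_c).score(X_lifted, y_c)

print(f"Linear SVM in 2D:        {acc_flat:.2f}")
print(f"Linear SVM in lifted 3D: {acc_lifted:.2f}")
```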

Common Kernels:
*   **Linear**: Standard straight line.
*   **Polynomial**: Curved lines.
*   **Radial Basis Function (RBF)**: Can create complex, island-like shapes.
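
To see why kernels are efficient, consider a simplified degree-2 polynomial kernel, $K(a, b) = (a \cdot b)^2$. For 2D inputs it returns the same number as a dot product in a 3D feature space, without computing the 3D coordinates. The helper names `phi` and `poly2_kernel` are hypothetical and written only for this sketch (scikit-learn's own polynomial kernel has extra `gamma` and `coef0` terms).

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map for a 2D point (illustrative helper)."""
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

def poly2_kernel(a, b):
    """Homogeneous degree-2 polynomial kernel: works in the original 2D space."""
    return (a @ b) ** 2

a = np.array([1.0, 2.0])
b = np.array([3.0, -1.0])

explicit = phi(a) @ phi(b)       # project first, then take the dot product
via_kernel = poly2_kernel(a, b)  # never leaves 2D

print(explicit, via_kernel)
```

Both values are equal (up to floating-point rounding). The SVM only ever needs such similarity values, so it can use the cheap kernel form.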

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVC
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# 1. Generate Non-Linear Data (Moons)
X, y = make_moons(n_samples=400, noise=0.2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Visualize Data
plt.figure(figsize=(8,6))
plt.scatter(X[:, 0], X[:, 1], c=y, cmap='coolwarm', edgecolors='k')
plt.title('Data that is NOT linearly separable')
plt.show()

# Helper function to visualize boundaries
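def plot_decision_boundary(model, X, y, title="Boundary"):
    """Shade the model's predicted class over a grid, then overlay the data points."""
    h = .02  # grid step size
    # Pad the plotting range so no point sits on the edge of the figure
    x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5
    y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
    # Predict a class for every grid point, then reshape back to the grid
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)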
    plt.figure(figsize=(8,6))
    plt.contourf(xx, yy, Z, alpha=0.3, cmap='coolwarm')
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap='coolwarm', edgecolors='k')
    plt.title(title)
    plt.show()

### Comparing Kernels
We will try a Linear Kernel (which can only draw a straight boundary, so it should misclassify points where the moons interlock) and an RBF Kernel (which should follow the curved shape).

In [None]:
# 2. Linear Kernel
svm_linear = SVC(kernel='linear', C=1.0)
svm_linear.fit(X_train, y_train)
acc_linear = accuracy_score(y_test, svm_linear.predict(X_test))
plot_decision_boundary(svm_linear, X, y, f"Linear Kernel (Acc: {acc_linear:.2f})")

# 3. RBF Kernel (Radial Basis Function)
svm_rbf = SVC(kernel='rbf', gamma=2, C=1.0)
svm_rbf.fit(X_train, y_train)
acc_rbf = accuracy_score(y_test, svm_rbf.predict(X_test))
plot_decision_boundary(svm_rbf, X, y, f"RBF Kernel (Acc: {acc_rbf:.2f})")

## 4. Hyperparameters
Two key parameters control SVM behavior:

1.  **C (Cost)**: Controls how strict we are about the margin.
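    *   **Low C**: "Soft margin". Tolerates more errors/violations in exchange for a wider street. Usually generalizes better (Low Variance), although a C that is too low can underfit (High Bias).
    *   **High C**: Approaches a "Hard margin". Strict. Penalizes every violation heavily and tries to classify all training points correctly. Can overfit (High Variance).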

2.  **Gamma** (for RBF): Defines how far the influence of a single training example reaches.
    *   **Low Gamma**: Far reach. Smoother decision boundary.
    *   **High Gamma**: Close reach. Boundary hugs data points tightly (can lead to "islands" around points).
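
The effect of both parameters can be seen with a small sweep. This sketch regenerates the moons data under new variable names so it runs on its own; the specific values tried for `gamma` and `C` are arbitrary choices for illustration. `n_support_` is the scikit-learn attribute holding the number of support vectors per class.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X_m, y_m = make_moons(n_samples=400, noise=0.2, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X_m, y_m, test_size=0.3, random_state=42)

# Sweep gamma with C fixed: compare train accuracy against test accuracy
gamma_results = {}
for gamma in [0.01, 1, 100]:
    model = SVC(kernel='rbf', gamma=gamma, C=1.0).fit(X_tr, y_tr)
    gamma_results[gamma] = (model.score(X_tr, y_tr), model.score(X_te, y_te))
    train_acc, test_acc = gamma_results[gamma]
    print(f"gamma={gamma:<6} train={train_acc:.2f} test={test_acc:.2f}")

# Sweep C with gamma fixed: a softer margin leans on more support vectors
sv_counts = {}
for C in [0.01, 1, 100]:
    model = SVC(kernel='rbf', gamma=2, C=C).fit(X_tr, y_tr)
    sv_counts[C] = int(model.n_support_.sum())
    print(f"C={C:<6} support vectors={sv_counts[C]}")
```

A large gap between train and test accuracy at high gamma is a sign of overfitting. In practice these values are usually tuned with cross-validation rather than picked by hand.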

## 5. Quiz

**Q1. Which Kernel is best suited for concentric circle data (one circle inside another)?**
A) Linear Kernel
B) RBF or Polynomial Kernel
C) No SVM can handle this.

**Q2. What are "Support Vectors"?**
A) All data points in the training set.
B) The data points closest to the decision boundary (margin).
C) The misclassified points only.

**Q3. If your SVM is overfitting (memorizing the noise), what should you try?**
A) Increase C (make it stricter).
B) Decrease C (allow wider margin/smoother boundary).
C) Use a more complex Kernel.

---
### Sample Answers
**Q1:** B). RBF and Polynomial kernels implicitly map the data to a higher-dimensional space where the two circles can be separated by a flat hyperplane.
**Q2:** B). They literally "support" or define the boundary.
**Q3:** B). Decreasing C increases regularization (wider margin), reducing overfitting.