# 🤖 Support Vector Machines (SVM) - Interactive Notebook
### Python for Machine Learning | Middle School Edition

---

**In this notebook we will:**
- 🔵 Classify data into **2 groups** (Binary Classification)
- 🌈 Classify data into **3+ groups** (Multi-Class Classification)
- 🎛️ See how changing the **C parameter** affects accuracy
- 🔭 See how different **Kernels** change the decision boundary
- 📊 Build heatmaps to find the **best settings** for SVM

> **Tip:** Run each cell from top to bottom using **Shift + Enter**. Watch the charts appear!

---


## ⚙️ Section 0 - Setup: Import Libraries

First, let's import everything we need. Think of this as getting all your art supplies ready before you start a painting!

In [None]:
# ── Core libraries ─────────────────────────────────────────────
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
import warnings
warnings.filterwarnings('ignore')

# ── Scikit-learn tools ─────────────────────────────────────────
from sklearn.svm import SVC
from sklearn.multiclass import OneVsRestClassifier
from sklearn.datasets import (make_moons, make_circles, make_blobs,
                               make_classification, load_iris, load_wine)
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.metrics import (accuracy_score, classification_report,
                              confusion_matrix, ConfusionMatrixDisplay)
from sklearn.pipeline import Pipeline

# ── Plot style ─────────────────────────────────────────────────
plt.rcParams.update({
    'figure.facecolor': '#f8f9fa',
    'axes.facecolor':   '#ffffff',
    'axes.grid':        True,
    'grid.alpha':       0.3,
    'font.size':        11,
})
COLORS  = ['#e74c3c', '#3498db', '#27ae60', '#f39c12', '#9b59b6']
CMAP_BG = plt.cm.RdYlGn

print("✅ Libraries loaded, ready to go!")
print(f"   NumPy {np.__version__} | Matplotlib {plt.matplotlib.__version__}")


### 🛠️ Helper Function - Plot Decision Boundary

This function draws the colored regions showing what SVM predicts in each zone. We will reuse it many times!
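
Before reading the helper, it may help to see its main trick on a tiny grid: cover the plot with points, ask for a prediction at each point, then fold the answers back into the grid shape. This is only a sketch. The rule `toy_predict` is a made-up stand-in for `clf.predict`, so no trained model is needed.

```python
import numpy as np

# Hypothetical stand-in for clf.predict: class 1 if x + y > 1, else class 0
def toy_predict(points):
    return (points[:, 0] + points[:, 1] > 1).astype(int)

# 1) Lay a grid of points over the plotting area
xs = np.linspace(0, 1.5, 4)   # 4 x-positions
ys = np.linspace(0, 1.0, 3)   # 3 y-positions
xx, yy = np.meshgrid(xs, ys)  # both arrays have shape (3, 4)

# 2) Flatten the grid into one long list of (x, y) points
grid_points = np.c_[xx.ravel(), yy.ravel()]   # shape (12, 2)

# 3) Predict at every grid point, then fold the answers back into grid shape
Z = toy_predict(grid_points).reshape(xx.shape)

print(grid_points.shape)
print(Z)
```

Each number in `Z` is the predicted class for one grid point. A function like `contourf` then paints one colour per class, which is what produces the shaded regions.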

In [None]:
def plot_decision_boundary(ax, clf, X, y, title='', show_sv=True,
                           cmap=plt.cm.RdYlGn, alpha=0.25):
    """Plot the decision boundary and data points for a trained SVM."""
    h = 0.04
    x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5
    y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    ax.contourf(xx, yy, Z, alpha=alpha, cmap=cmap)
    ax.contour(xx, yy, Z, colors='#333333', linewidths=1.5, levels=np.unique(Z)[:-1] + 0.5)

    n_classes = len(np.unique(y))
    for i, cls in enumerate(np.unique(y)):
        mask = y == cls
        ax.scatter(X[mask, 0], X[mask, 1],
                   s=70, color=COLORS[i % len(COLORS)],
                   edgecolors='white', linewidths=0.5,
                   zorder=5, label=f'Class {cls}')

    # Highlight support vectors
    if show_sv and hasattr(clf, 'support_vectors_'):
        sv = clf.support_vectors_
        ax.scatter(sv[:, 0], sv[:, 1], s=200, facecolors='none',
                   edgecolors='gold', linewidths=2, zorder=6, label='Support Vectors')

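    # Accuracy on whichever points were passed in (train and test together in this notebook)
    acc = accuracy_score(y, clf.predict(X))
    ax.set_title(f'{title}\nAccuracy (all plotted points): {acc*100:.1f}%', fontsize=11, fontweight='bold')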
    ax.legend(fontsize=8, loc='upper left')
    return acc

print("✅ Helper function defined!")


---
## 🔵 Section 1 - Binary Classification (2 Classes)

Binary means **two categories**, like yes/no, pass/fail, or spam/not-spam.

We will test SVM on **three toy datasets** that get progressively trickier:
1. 🍦 **Linearly Separable**: a straight line works perfectly
2. 🌙 **Moons**: two crescent-moon shapes interleaved
3. ⭕ **Circles**: one group surrounds the other in a ring


### 1a. Generate Three Toy Datasets

In [None]:
np.random.seed(42)

# Dataset 1: Linearly separable blobs
X_lin, y_lin = make_blobs(n_samples=200, centers=2,
                           cluster_std=0.8, random_state=42)

# Dataset 2: Two moons (curved, not linearly separable)
X_moon, y_moon = make_moons(n_samples=200, noise=0.18, random_state=42)

# Dataset 3: Concentric circles (ring inside ring)
X_circ, y_circ = make_circles(n_samples=200, noise=0.12,
                                factor=0.45, random_state=42)

datasets = [
    (X_lin,  y_lin,  'Toy 1: Blobs (easy)\nUse: Linear Kernel'),
    (X_moon, y_moon, 'Toy 2: Moons (medium)\nUse: RBF Kernel'),
    (X_circ, y_circ, 'Toy 3: Circles (hard)\nUse: RBF Kernel'),
]

# Quick peek at sizes
for idx, (X, y, title) in enumerate(datasets):
    print(f"Dataset {idx+1}: {X.shape[0]} samples, "
          f"Class 0: {sum(y==0)}, Class 1: {sum(y==1)}")


### 1b. Visualise the Raw Data

Let's look at the data **before** training any model. Can you guess which dataset is easiest to separate?

In [None]:
fig, axes = plt.subplots(1, 3, figsize=(15, 4.5))
fig.suptitle('Binary Classification - Three Toy Datasets (Raw Data)',
             fontsize=14, fontweight='bold', y=1.02)

for ax, (X, y, title) in zip(axes, datasets):
    for cls in [0, 1]:
        ax.scatter(X[y==cls, 0], X[y==cls, 1],
                   s=60, color=COLORS[cls], alpha=0.8,
                   edgecolors='white', linewidths=0.4,
                   label=f'Class {cls}')
    ax.set_title(title, fontsize=11, fontweight='bold')
    ax.legend(fontsize=9)

plt.tight_layout()
plt.show()
print("Notice: Dataset 1 can be split by a straight line.")
print("Datasets 2 & 3 need curved boundaries: a straight line would fail!")


### 1c. Scale, Train, and Plot Decision Boundaries

We pick the best kernel for each dataset and see the results. Remember: always **scale** your data before SVM!
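
Why scale at all? SVM measures distances between points, so a feature with big numbers can drown out a feature with small numbers. Here is a minimal sketch using made-up data (hours of sleep and steps walked are invented for illustration). The scaler learns its numbers from the training data only and reuses them for new data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up training data: [hours of sleep, steps walked]
X_train = np.array([[6.0,  2000.0],
                    [7.0,  8000.0],
                    [8.0,  5000.0],
                    [9.0, 11000.0]])
X_new = np.array([[7.5, 6500.0]])   # a new, unseen point

scaler = StandardScaler()
X_train_sc = scaler.fit_transform(X_train)  # learn mean and spread from training data only
X_new_sc = scaler.transform(X_new)          # reuse those same numbers for new data

print("Column means after scaling:  ", X_train_sc.mean(axis=0).round(6))
print("Column spreads after scaling:", X_train_sc.std(axis=0).round(6))
print("New point, scaled:", X_new_sc)
```

After scaling, both columns have mean 0 and spread 1, so sleep and steps count equally. The new point happens to sit exactly at the training averages, so it lands at the centre of the scaled plot.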

In [None]:
kernels = ['linear', 'rbf', 'rbf']
fig, axes = plt.subplots(1, 3, figsize=(15, 4.5))
fig.suptitle('Binary SVM - Decision Boundaries on Each Dataset',
             fontsize=14, fontweight='bold', y=1.02)

for ax, (X, y, title), kernel in zip(axes, datasets, kernels):
    # Split & scale
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=42)
    scaler = StandardScaler()
    X_tr_sc = scaler.fit_transform(X_tr)
    X_te_sc = scaler.transform(X_te)
    X_sc    = scaler.transform(X)

    # Train
    clf = SVC(kernel=kernel, C=1.0, random_state=42)
    clf.fit(X_tr_sc, y_tr)

    # Plot
    test_acc = accuracy_score(y_te, clf.predict(X_te_sc))
    sv_count = len(clf.support_vectors_)
    plot_decision_boundary(ax, clf, X_sc, y,
                           title=f'{title}\nKernel={kernel} | Test Acc={test_acc*100:.1f}%')
    ax.set_xlabel(f'Support Vectors: {sv_count}', fontsize=9)

plt.tight_layout()
plt.show()

print("Gold circles = Support Vectors (the 'VIP' points on or inside the margin that decide where the boundary goes)")
print("The shaded regions show which class SVM predicts in each zone.")


### 1d. Real-ish Example - Will the Student Pass? 🎓

Let's use a dataset that feels more realistic: predicting if a student **passes or needs extra practice** based on their homework and quiz scores.


In [None]:
np.random.seed(7)
n = 60

# Students who passed (high scores)
pass_hw   = np.random.normal(76, 9, n)
pass_quiz = np.random.normal(73, 9, n)

# Students who need practice (lower scores)
fail_hw   = np.random.normal(46, 9, n)
fail_quiz = np.random.normal(43, 9, n)

X_stu = np.vstack([np.column_stack([pass_hw, pass_quiz]),
                   np.column_stack([fail_hw,  fail_quiz])])
y_stu = np.array([1]*n + [0]*n)   # 1=Pass, 0=Needs Practice

# Split & scale
X_tr, X_te, y_tr, y_te = train_test_split(X_stu, y_stu,
                                            test_size=0.2, random_state=42)
scaler_stu = StandardScaler()
X_tr_sc = scaler_stu.fit_transform(X_tr)
X_te_sc = scaler_stu.transform(X_te)
X_sc    = scaler_stu.transform(X_stu)

# Train
clf_stu = SVC(kernel='rbf', C=1.0, random_state=42)
clf_stu.fit(X_tr_sc, y_tr)

# Plot
fig, ax = plt.subplots(figsize=(8, 6))
plot_decision_boundary(ax, clf_stu, X_sc, y_stu,
                       title='Will the Student Pass? (RBF Kernel, C=1.0)')
ax.set_xlabel('Homework Score (scaled)', fontsize=11)
ax.set_ylabel('Quiz Score (scaled)', fontsize=11)
plt.tight_layout()
plt.show()

# Predict new students
new_students = np.array([[80, 75],   # strong student
                          [50, 55],   # struggling student
                          [63, 60]])  # borderline student
new_sc = scaler_stu.transform(new_students)
preds  = clf_stu.predict(new_sc)
labels = ['Needs Practice', 'Pass']   # index matches the class: 0=Needs Practice, 1=Pass

print("\n--- Predicting New Students ---")
for (hw, qz), p in zip(new_students, preds):
    print(f"  HW={hw}, Quiz={qz}  →  {labels[p]}")
print(f"\nTest accuracy: {accuracy_score(y_te, clf_stu.predict(X_te_sc))*100:.1f}%")


---
## 🌈 Section 2 - Multi-Class Classification (3+ Classes)

SVM was designed for **two** classes, but we can extend it to handle many classes using two strategies:

| Strategy | How it works | sklearn |
|----------|-------------|---------|
| **One-vs-Rest (OvR)** | Train 1 classifier per class vs all others | `OneVsRestClassifier(SVC())` |
| **One-vs-One (OvO)** | Train 1 classifier per *pair* of classes | `SVC()` ← **default!** |

We'll test both on three datasets.
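
How many classifiers does each strategy need? With K classes, OvR trains K of them and OvO trains one per pair, which is K × (K − 1) / 2. The sketch below counts them with a small helper (`count_classifiers` is a name made up for this example) and then checks the count against scikit-learn on a 4-class toy dataset.

```python
from sklearn.datasets import make_blobs
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

def count_classifiers(n_classes):
    """Return (OvR count, OvO count) for a given number of classes."""
    ovr = n_classes
    ovo = n_classes * (n_classes - 1) // 2
    return ovr, ovo

for k in [3, 4, 10]:
    print(k, "classes ->", count_classifiers(k))

# Check against scikit-learn on a 4-class toy dataset
X, y = make_blobs(n_samples=200, centers=4, random_state=0)
ovr_model = OneVsRestClassifier(SVC()).fit(X, y)
ovo_model = SVC(decision_function_shape='ovo').fit(X, y)

n_ovr = len(ovr_model.estimators_)                # one fitted SVC per class
n_ovo = ovo_model.decision_function(X).shape[1]   # one score column per pair of classes
print("sklearn OvR:", n_ovr, "| sklearn OvO:", n_ovo)
```

For 3 classes both strategies need 3 classifiers, which is one reason they often look alike on small problems. The difference grows quickly as the number of classes goes up.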


### 2a. Three Multi-Class Toy Datasets

In [None]:
np.random.seed(42)

# Dataset A: 3 clearly separated blobs
X_3blob, y_3blob = make_blobs(n_samples=300, centers=3,
                               cluster_std=0.9, random_state=42)

# Dataset B: 4 blobs, slightly overlapping
X_4blob, y_4blob = make_blobs(n_samples=400, centers=4,
                               cluster_std=1.3, random_state=10)

# Dataset C: Iris flowers (real botanical dataset!)
iris   = load_iris()
X_iris = iris.data[:, [2, 3]]     # petal length & petal width (best 2 features)
y_iris = iris.target
iris_names = iris.target_names

multi_datasets = [
    (X_3blob, y_3blob, '3 Blobs (3 classes)',    None),
    (X_4blob, y_4blob, '4 Blobs (4 classes)',    None),
    (X_iris,  y_iris,  'Iris Flowers (3 species)', iris_names),
]

fig, axes = plt.subplots(1, 3, figsize=(15, 4.5))
fig.suptitle('Multi-Class Datasets - Raw Data', fontsize=14, fontweight='bold', y=1.02)

for ax, (X, y, title, names) in zip(axes, multi_datasets):
    for i, cls in enumerate(np.unique(y)):
        lbl = names[cls] if names is not None else f'Class {cls}'
        ax.scatter(X[y==cls, 0], X[y==cls, 1],
                   s=60, color=COLORS[i], alpha=0.8,
                   edgecolors='white', linewidths=0.4, label=lbl)
    ax.set_title(title, fontsize=11, fontweight='bold')
    ax.legend(fontsize=9)

plt.tight_layout()
plt.show()


### 2b. OvR vs OvO - Side-by-Side on Each Dataset

We train **both strategies** on each dataset and compare. Do they give the same result?

In [None]:
fig, axes = plt.subplots(2, 3, figsize=(15, 9))
fig.suptitle('Multi-Class SVM: OvR (top row)  vs  OvO (bottom row)',
             fontsize=14, fontweight='bold', y=1.01)

strategy_labels = ['One-vs-Rest (OvR)', 'One-vs-One (OvO, SVC default)']

for col, (X, y, ds_title, names) in enumerate(multi_datasets):
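    # Split first, then fit the scaler on the training part only
    X_tr_raw, X_te_raw, y_tr, y_te = train_test_split(X, y,
                                               test_size=0.25, random_state=42)
    scaler_mc = StandardScaler()
    X_tr = scaler_mc.fit_transform(X_tr_raw)
    X_te = scaler_mc.transform(X_te_raw)
    X_sc = scaler_mc.transform(X)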

    clfs = [
        OneVsRestClassifier(SVC(kernel='rbf', C=2.0, random_state=42)),
        SVC(kernel='rbf', C=2.0, decision_function_shape='ovo', random_state=42),
    ]

    for row, (clf, strat) in enumerate(zip(clfs, strategy_labels)):
        clf.fit(X_tr, y_tr)
        ax = axes[row][col]
        cmap = plt.cm.Pastel1 if len(np.unique(y)) > 2 else plt.cm.RdYlGn

        # Plot regions
        h = 0.04
        x0min, x0max = X_sc[:,0].min()-0.5, X_sc[:,0].max()+0.5
        x1min, x1max = X_sc[:,1].min()-0.5, X_sc[:,1].max()+0.5
        xx, yy = np.meshgrid(np.arange(x0min, x0max, h),
                              np.arange(x1min, x1max, h))
        Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
        ax.contourf(xx, yy, Z, alpha=0.22, cmap=cmap)
        ax.contour(xx, yy, Z, colors='#555', linewidths=1.2,
                   levels=np.unique(Z)[:-1] + 0.5)   # one line between neighbouring class labels

        # Plot points
        for i, cls in enumerate(np.unique(y)):
            lbl = names[cls] if names is not None else f'Class {cls}'
            ax.scatter(X_sc[y==cls, 0], X_sc[y==cls, 1],
                       s=55, color=COLORS[i], alpha=0.85,
                       edgecolors='white', linewidths=0.4, label=lbl)

        test_acc = accuracy_score(y_te, clf.predict(X_te))
        ax.set_title(f'{ds_title}\n{strat} | Acc={test_acc*100:.1f}%',
                     fontsize=10, fontweight='bold')
        ax.legend(fontsize=7, loc='upper left')

plt.tight_layout()
plt.show()
print("Notice how OvR and OvO often produce very similar boundaries!")
print("The key difference shows up when classes are close together.")


### 2c. Peek Inside OvR - One Binary Classifier per Class

Let's visualise **each individual binary classifier** that OvR trains. For 3 classes, there are 3 classifiers!
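
Once the three binary classifiers exist, how do they agree on one final answer? A common approach is to let each classifier give a confidence score and pick the class whose classifier is most confident. The sketch below builds OvR by hand in that way. It is an illustration of the idea, not a copy of scikit-learn's internal code, and it uses the training points only (no train/test split) to keep things short.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_blobs(n_samples=300, centers=3, cluster_std=0.9, random_state=42)
X = StandardScaler().fit_transform(X)

# Train one "this class vs the rest" classifier per class
scores = []
for cls in np.unique(y):
    y_bin = (y == cls).astype(int)            # 1 = this class, 0 = everything else
    clf = SVC(kernel='rbf', C=1.0).fit(X, y_bin)
    scores.append(clf.decision_function(X))   # bigger = more confident it IS this class

scores = np.column_stack(scores)   # shape (300, 3): one column per class
y_pred = scores.argmax(axis=1)     # the most confident classifier wins

print("Score table shape:", scores.shape)
print("Training accuracy of hand-made OvR:", (y_pred == y).mean())
```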


In [None]:
scaler3 = StandardScaler()
X3_sc   = scaler3.fit_transform(X_3blob)

fig, axes = plt.subplots(1, 3, figsize=(15, 4.5))
fig.suptitle('Inside OvR - Each Classifier Asks "Is it THIS class, or the rest?"',
             fontsize=13, fontweight='bold', y=1.02)

class_labels = ['Class 0 (Red)', 'Class 1 (Blue)', 'Class 2 (Green)']

for i, ax in enumerate(axes):
    # Binary labels: class i = 1, everything else = 0
    y_bin = (y_3blob == i).astype(int)

    clf_bin = SVC(kernel='rbf', C=1.0, random_state=42)
    clf_bin.fit(X3_sc, y_bin)

    # Regions
    h = 0.05
    xx, yy = np.meshgrid(np.arange(X3_sc[:,0].min()-0.5, X3_sc[:,0].max()+0.5, h),
                         np.arange(X3_sc[:,1].min()-0.5, X3_sc[:,1].max()+0.5, h))
    Z = clf_bin.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
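    # Explicit levels so the two colours line up with the two labels (0 = rest, 1 = class i)
    ax.contourf(xx, yy, Z, alpha=0.20, levels=[-0.5, 0.5, 1.5],
                colors=['#eeeeee', COLORS[i]])
    ax.contour(xx, yy, Z, levels=[0.5], colors=[COLORS[i]], linewidths=2.5)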

    # Points: highlight the target class, grey out the rest
    for j in range(3):
        c     = COLORS[j] if j==i else '#cccccc'
        alpha = 0.9 if j==i else 0.4
        ax.scatter(X3_sc[y_3blob==j, 0], X3_sc[y_3blob==j, 1],
                   s=60, color=c, alpha=alpha, edgecolors='white', linewidths=0.4)

    acc = accuracy_score(y_bin, clf_bin.predict(X3_sc))
    ax.set_title(f'Classifier {i+1}: {class_labels[i]} vs Rest\nAcc={acc*100:.1f}%',
                 fontsize=11, fontweight='bold', color=COLORS[i])

plt.tight_layout()
plt.show()


### 2d. Confusion Matrix - Iris Multi-Class

A confusion matrix tells us **exactly** which flowers the model got right, and which ones it mixed up.
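
To see how the table is filled in, here is a tiny hand-made example with six made-up flowers (the labels are invented for illustration). Each flower adds 1 to the cell at row = true species, column = predicted species, which is the same layout scikit-learn uses.

```python
import numpy as np

# Made-up answers for 6 flowers: 0 = setosa, 1 = versicolor, 2 = virginica
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 0, 1, 2, 2, 2])

n_classes = 3
cm = np.zeros((n_classes, n_classes), dtype=int)
for t, p in zip(y_true, y_pred):
    cm[t, p] += 1          # row = true class, column = predicted class

print(cm)
print("Correct predictions:", np.trace(cm), "out of", cm.sum())
```

The diagonal holds the correct answers (5 of 6 here). The single 1 off the diagonal, in row 1 and column 2, is the versicolor that was mistaken for a virginica.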

In [None]:
# Full Iris dataset (all 4 features for best accuracy)
X_iris_full = iris.data
y_iris_full = iris.target

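# Split first, then fit the scaler on the training flowers only
X_tr_raw, X_te_raw, y_tr, y_te = train_test_split(X_iris_full, y_iris_full,
                                                  test_size=0.25, random_state=42)
scaler_iris = StandardScaler()
X_tr = scaler_iris.fit_transform(X_tr_raw)
X_te = scaler_iris.transform(X_te_raw)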

# Train OvO (SVC default)
clf_iris = SVC(kernel='rbf', C=2.0, random_state=42)
clf_iris.fit(X_tr, y_tr)
y_pred = clf_iris.predict(X_te)

# Plot confusion matrix
fig, axes = plt.subplots(1, 2, figsize=(12, 5))
fig.suptitle('Iris Flower Classification - Results', fontsize=13, fontweight='bold')

cm   = confusion_matrix(y_te, y_pred)
disp = ConfusionMatrixDisplay(cm, display_labels=iris.target_names)
disp.plot(ax=axes[0], colorbar=False, cmap='Blues')
axes[0].set_title('Confusion Matrix\n(diagonal = correct predictions)', fontweight='bold')

# Per-class bar chart
report = classification_report(y_te, y_pred, target_names=iris.target_names, output_dict=True)
species  = iris.target_names
f1_scores = [report[s]['f1-score'] for s in species]
bars = axes[1].bar(species, f1_scores, color=COLORS[:3], edgecolor='white', width=0.5)
axes[1].set_ylim(0, 1.15)
axes[1].set_ylabel('F1 Score', fontsize=11)
axes[1].set_title('F1 Score per Flower Species\n(1.0 = perfect)', fontweight='bold')
for bar, score in zip(bars, f1_scores):
    axes[1].text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.02,
                 f'{score:.2f}', ha='center', fontweight='bold', fontsize=12)

plt.tight_layout()
plt.show()

print(f"\nOverall Test Accuracy: {accuracy_score(y_te, y_pred)*100:.1f}%")
print("\nFull Report:")
print(classification_report(y_te, y_pred, target_names=iris.target_names))


### 2e. Bonus Dataset - Wine Classification 🍷

The Wine dataset has **13 features** and **3 classes** (three wine varieties). More features = harder to visualise but often more accurate!


In [None]:
wine   = load_wine()
X_wine = wine.data
y_wine = wine.target

print("Wine Dataset Info:")
print(f"  Samples: {X_wine.shape[0]}, Features: {X_wine.shape[1]}")
print(f"  Classes: {list(wine.target_names)}")
print(f"  Per class: {[sum(y_wine==i) for i in range(3)]}")

# Split first; the scaler lives inside each pipeline, so it only learns from training data
X_tr, X_te, y_tr, y_te = train_test_split(X_wine, y_wine,
                                            test_size=0.25, random_state=42)

# Compare strategies
for strat_name, svm in [
    ('OvO (SVC default)',  SVC(kernel='rbf', C=5.0, random_state=42)),
    ('OvR',                OneVsRestClassifier(SVC(kernel='rbf', C=5.0, random_state=42))),
]:
    clf = Pipeline([('scale', StandardScaler()), ('svm', svm)])
    clf.fit(X_tr, y_tr)
    acc = accuracy_score(y_te, clf.predict(X_te))
    # Cross-validation refits the whole pipeline (scaler + SVM) on each fold
    cv  = cross_val_score(clf, X_wine, y_wine, cv=5).mean()
    print(f"  {strat_name:<22} Test Acc={acc*100:.1f}%   5-fold CV={cv*100:.1f}%")


---
## 🎛️ Section 3 - The C Parameter: Strict vs Flexible

The **C parameter** controls how much SVM cares about misclassifying training points:

| C value | Behaviour | Risk |
|---------|-----------|------|
| **Very small** (0.001) | Very flexible, big margin, allows mistakes | Might underfit |
| **Medium** (1.0) | Balanced, usually the best starting point | - |
| **Very large** (1000) | Very strict, heavy penalty for every mistake, small margin | Might overfit |

> **Analogy:** C is like a teacher's strictness on a test. Too strict = memorises answers, fails on new questions. Too flexible = doesn't learn enough.
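
Under the hood, SVM tries to make one number as small as possible: a "margin" part (smaller when the margin is wider) plus C times the total size of the mistakes. The sketch below scores two hand-picked lines on made-up 1-D data using that standard soft-margin cost. The data, the two lines, and the helper name `svm_cost` are all invented for illustration, and no SVM is actually trained here.

```python
import numpy as np

# Made-up 1-D data: three "-1" points and three "+1" points
x = np.array([-3.0, -2.0, 0.4, 1.0, 2.0, 3.0])
y = np.array([-1,   -1,   -1,  +1,  +1,  +1])

def svm_cost(w, b, C):
    """Soft-margin cost for the line f(x) = w*x + b."""
    margin_term = 0.5 * w**2                        # small w means a wide margin
    mistakes = np.maximum(0, 1 - y * (w * x + b))   # 0 for points safely outside the margin
    return margin_term + C * mistakes.sum()

wide   = dict(w=2/3,  b=1/3)    # wide margin, but the point at 0.4 ends up on the wrong side
narrow = dict(w=10/3, b=-7/3)   # squeezes between 0.4 and 1.0, so no mistakes

for C in [0.1, 100]:
    cost_wide, cost_narrow = svm_cost(C=C, **wide), svm_cost(C=C, **narrow)
    winner = "wide" if cost_wide < cost_narrow else "narrow"
    print(f"C={C}: wide={cost_wide:.2f}, narrow={cost_narrow:.2f} -> {winner} wins")
```

With a small C the wide line has the lower cost even though it gets one point wrong. With a large C that one mistake becomes very expensive, so the narrow line that bends to fit every point wins instead.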


### 3a. Visualise C on the Moons Dataset

Watch how the boundary changes as C goes from tiny to huge:

In [None]:
X_mc, y_mc = make_moons(n_samples=250, noise=0.22, random_state=42)
X_tr_c, X_te_c, y_tr_c, y_te_c = train_test_split(X_mc, y_mc, test_size=0.25, random_state=42)

scaler_c = StandardScaler()
X_tr_cs  = scaler_c.fit_transform(X_tr_c)
X_te_cs  = scaler_c.transform(X_te_c)
X_cs     = scaler_c.transform(X_mc)

C_values = [0.001, 0.1, 1.0, 10.0, 100.0, 1000.0]

fig, axes = plt.subplots(2, 3, figsize=(15, 9))
fig.suptitle('Effect of C Parameter on Decision Boundary (RBF Kernel, Moons Dataset)',
             fontsize=14, fontweight='bold', y=1.01)

train_accs, test_accs = [], []

for ax, C in zip(axes.flatten(), C_values):
    clf_c = SVC(kernel='rbf', C=C, gamma='scale', random_state=42)
    clf_c.fit(X_tr_cs, y_tr_c)

    tr_acc = accuracy_score(y_tr_c, clf_c.predict(X_tr_cs))
    te_acc = accuracy_score(y_te_c, clf_c.predict(X_te_cs))
    train_accs.append(tr_acc)
    test_accs.append(te_acc)

    h = 0.04
    xx, yy = np.meshgrid(np.arange(X_cs[:,0].min()-0.4, X_cs[:,0].max()+0.4, h),
                         np.arange(X_cs[:,1].min()-0.4, X_cs[:,1].max()+0.4, h))
    Z = clf_c.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    ax.contourf(xx, yy, Z, alpha=0.25, cmap=CMAP_BG)
    ax.contour(xx, yy, Z, colors='#333', linewidths=1.5)

    for cls in [0, 1]:
        ax.scatter(X_cs[y_mc==cls, 0], X_cs[y_mc==cls, 1],
                   s=55, color=COLORS[cls], alpha=0.8,
                   edgecolors='white', linewidths=0.4)

    sv_n = len(clf_c.support_vectors_)
    color = '#c0392b' if (tr_acc - te_acc > 0.06) else '#27ae60'
    ax.set_title(f'C = {C}\nTrain: {tr_acc*100:.1f}%  Test: {te_acc*100:.1f}%  SVs: {sv_n}',
                 fontsize=10.5, fontweight='bold', color=color)

plt.tight_layout()
plt.show()
print("Red title = overfitting warning (train >> test)")
print("Green title = healthy gap between train and test accuracy")


### 3b. Train vs Test Accuracy Curve

This plot shows the classic **overfitting curve**: as C grows, training accuracy keeps rising, while test accuracy usually levels off and can start to fall.
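
One caution when hunting for the best C: if we choose it by looking at test accuracy, the test set is no longer a fair surprise exam. A common alternative is to choose C with cross-validation on the training data and touch the test set only once at the end. Here is a minimal sketch of that idea. The short list `candidate_Cs` is an assumption made for this example.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_moons(n_samples=250, noise=0.22, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=42)

candidate_Cs = [0.01, 0.1, 1, 10, 100]
cv_means = []
for C in candidate_Cs:
    model = make_pipeline(StandardScaler(), SVC(kernel='rbf', C=C))
    # 5-fold cross-validation uses the training data only
    cv_means.append(cross_val_score(model, X_tr, y_tr, cv=5).mean())

best_C = candidate_Cs[int(np.argmax(cv_means))]

# The test set is used exactly once, after C has been chosen
final_model = make_pipeline(StandardScaler(), SVC(kernel='rbf', C=best_C)).fit(X_tr, y_tr)
test_acc = final_model.score(X_te, y_te)
print(f"Chosen C = {best_C}, test accuracy = {test_acc:.3f}")
```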

In [None]:
C_range   = np.logspace(-3, 4, 40)
tr_scores = []
te_scores = []

for C in C_range:
    clf_tmp = SVC(kernel='rbf', C=C, gamma='scale', random_state=42)
    clf_tmp.fit(X_tr_cs, y_tr_c)
    tr_scores.append(accuracy_score(y_tr_c, clf_tmp.predict(X_tr_cs)))
    te_scores.append(accuracy_score(y_te_c, clf_tmp.predict(X_te_cs)))

fig, ax = plt.subplots(figsize=(10, 5))
ax.semilogx(C_range, tr_scores, 'o-', color='#e74c3c', lw=2, ms=4, label='Training Accuracy')
ax.semilogx(C_range, te_scores, 's-', color='#3498db', lw=2, ms=4, label='Test Accuracy')

# Mark the best C and shade the gap between train and test accuracy
# (picking C by test accuracy is fine for a demo; Section 5 uses cross-validation instead)
best_c_idx = np.argmax(te_scores)
best_c     = C_range[best_c_idx]
ax.axvline(best_c, color='#27ae60', linestyle='--', lw=2, label=f'Best C ≈ {best_c:.2f}')
ax.fill_between(C_range, tr_scores, te_scores, alpha=0.10, color='red',
                label='Overfitting gap')

ax.set_xlabel('C value (log scale)', fontsize=12)
ax.set_ylabel('Accuracy', fontsize=12)
ax.set_title('Train vs Test Accuracy as C Changes\n'
             '(left = too flexible / right = too strict)', fontsize=13, fontweight='bold')
ax.legend(fontsize=11)
ax.set_ylim(0.5, 1.05)
ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda v, _: f'{v*100:.0f}%'))

plt.tight_layout()
plt.show()
print(f"Best C for this dataset: {best_c:.3f}")
print(f"Best test accuracy:      {max(te_scores)*100:.1f}%")


### 3c. Number of Support Vectors vs C

As C increases, SVM gets stricter and usually ends up with **fewer** support vectors. Let's see this relationship!
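
A quick way to check this claim with numbers is the `n_support_` attribute of a fitted `SVC`, which stores how many support vectors each class has. The sketch below compares a very small and a very large C on a fresh moons dataset. The dataset settings are chosen just for this example.

```python
from sklearn.datasets import make_moons
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)
X = StandardScaler().fit_transform(X)

sv_total = {}
for C in [0.01, 100]:
    clf = SVC(kernel='rbf', C=C, gamma='scale').fit(X, y)
    sv_total[C] = int(clf.n_support_.sum())   # n_support_ holds one count per class
    print(f"C={C}: support vectors per class = {clf.n_support_}, total = {sv_total[C]}")
```

With a tiny C almost every point sits inside the wide margin, so almost every point counts as a support vector. With a large C the margin shrinks and only the points near the boundary remain.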

In [None]:
C_test = [0.01, 0.05, 0.1, 0.5, 1, 5, 10, 50, 100, 500, 1000]
sv_counts = []

for C in C_test:
    clf_sv = SVC(kernel='rbf', C=C, gamma='scale', random_state=42)
    clf_sv.fit(X_tr_cs, y_tr_c)
    sv_counts.append(len(clf_sv.support_vectors_))

fig, axes = plt.subplots(1, 2, figsize=(13, 5))

axes[0].semilogx(C_test, sv_counts, 'D-', color='#9b59b6', lw=2, ms=8)
axes[0].fill_between(C_test, sv_counts, alpha=0.15, color='#9b59b6')
axes[0].set_xlabel('C value (log scale)', fontsize=12)
axes[0].set_ylabel('Number of Support Vectors', fontsize=12)
axes[0].set_title('As C increases → Fewer Support Vectors\n(SVM becomes more selective)', fontsize=12, fontweight='bold')

# Bar chart version
bars = axes[1].bar([str(c) for c in C_test], sv_counts,
                    color='#9b59b6', alpha=0.75, edgecolor='white')
axes[1].set_xlabel('C value', fontsize=12)
axes[1].set_ylabel('Support Vectors', fontsize=12)
axes[1].set_title('Support Vector Count per C Value', fontsize=12, fontweight='bold')
axes[1].tick_params(axis='x', rotation=45)
for bar, cnt in zip(bars, sv_counts):
    axes[1].text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.5,
                 str(cnt), ha='center', fontsize=9, fontweight='bold')

plt.tight_layout()
plt.show()
print("Small C  → many support vectors → loose boundary")
print("Large C  → few support vectors  → tight boundary (risk of overfitting)")


---
## 🔭 Section 4 - Kernel Comparison

The **kernel** is the "lens" SVM uses to look at the data. Different kernels draw differently shaped boundaries:

| Kernel | Shape of boundary | Best for |
|--------|------------------|----------|
| `linear` | Straight line | Linearly separable data |
| `poly` | Curved, polynomial shape | Moderate complexity |
| `rbf` | Circular / blob shapes | Most real-world data |
| `sigmoid` | S-shaped curve | Rarely used |
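
Each kernel is really just a formula that scores how similar two points are. Below is a minimal sketch of the four formulas written by hand in NumPy, following the forms given in the scikit-learn documentation. The function names and the values picked for `gamma`, `coef0` and `degree` are choices made for this example, not the `SVC` defaults.

```python
import numpy as np

def k_linear(x, z):
    return np.dot(x, z)

def k_poly(x, z, gamma=1.0, coef0=1.0, degree=3):
    return (gamma * np.dot(x, z) + coef0) ** degree

def k_rbf(x, z, gamma=0.5):
    return np.exp(-gamma * np.sum((x - z) ** 2))   # depends only on the distance between x and z

def k_sigmoid(x, z, gamma=1.0, coef0=0.0):
    return np.tanh(gamma * np.dot(x, z) + coef0)

a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])   # at a right angle to a
c = np.array([1.0, 0.0])   # identical to a

for name, k in [('linear', k_linear), ('poly', k_poly),
                ('rbf', k_rbf), ('sigmoid', k_sigmoid)]:
    print(f"{name:<8} similarity(a, b) = {k(a, b):.3f}   similarity(a, a-copy) = {k(a, c):.3f}")
```

In every row the identical pair scores higher than the right-angle pair. The RBF score depends only on distance, which is why its boundaries tend to look like rings and blobs.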


### 4a. All Four Kernels on Three Datasets

In [None]:
kernels_all = ['linear', 'poly', 'rbf', 'sigmoid']
test_dsets  = [
    (X_lin,  y_lin,  'Blobs (easy)'),
    (X_moon, y_moon, 'Moons (medium)'),
    (X_circ, y_circ, 'Circles (hard)'),
]

fig, axes = plt.subplots(len(test_dsets), len(kernels_all),
                         figsize=(17, 12))
fig.suptitle('Kernel Comparison - All Four Kernels on Three Datasets',
             fontsize=15, fontweight='bold', y=1.01)

for row, (X, y, ds_name) in enumerate(test_dsets):
    scaler_k = StandardScaler()
    X_sc_k   = scaler_k.fit_transform(X)
    X_tr_k, X_te_k, y_tr_k, y_te_k = train_test_split(
        X_sc_k, y, test_size=0.25, random_state=42)

    for col, kernel in enumerate(kernels_all):
        ax = axes[row][col]
        clf_k = SVC(kernel=kernel, C=1.0, degree=3, random_state=42)
        clf_k.fit(X_tr_k, y_tr_k)
        te_acc = accuracy_score(y_te_k, clf_k.predict(X_te_k))

        h = 0.05
        xx, yy = np.meshgrid(
            np.arange(X_sc_k[:,0].min()-0.4, X_sc_k[:,0].max()+0.4, h),
            np.arange(X_sc_k[:,1].min()-0.4, X_sc_k[:,1].max()+0.4, h))
        Z = clf_k.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
        ax.contourf(xx, yy, Z, alpha=0.25, cmap=CMAP_BG)
        ax.contour(xx, yy, Z, colors='#444', linewidths=1.2)
        for cls in np.unique(y):
            ax.scatter(X_sc_k[y==cls, 0], X_sc_k[y==cls, 1],
                       s=40, color=COLORS[cls], alpha=0.75,
                       edgecolors='white', linewidths=0.3)

        emoji = '✅' if te_acc > 0.88 else ('⚠️' if te_acc > 0.75 else '❌')
        ax.set_title(f'{emoji} {kernel} | {te_acc*100:.1f}%',
                     fontsize=10.5, fontweight='bold')

        if col == 0:
            ax.set_ylabel(ds_name, fontsize=11, fontweight='bold')

plt.tight_layout()
plt.show()
print("✅ = Great (>88%)   ⚠️ = OK (75-88%)   ❌ = Poor (<75%)")


### 4b. Kernel Accuracy Bar Chart - Head-to-Head

In [None]:
results = {}  # {dataset_name: {kernel: test_acc}}

for X, y, ds_name in test_dsets:
    scaler_bar = StandardScaler()
    X_sc_bar   = scaler_bar.fit_transform(X)
    X_tr_b, X_te_b, y_tr_b, y_te_b = train_test_split(
        X_sc_bar, y, test_size=0.25, random_state=42)
    results[ds_name] = {}
    for kernel in kernels_all:
        clf_b = SVC(kernel=kernel, C=1.0, degree=3, random_state=42)
        clf_b.fit(X_tr_b, y_tr_b)
        results[ds_name][kernel] = accuracy_score(y_te_b, clf_b.predict(X_te_b))

# Plot grouped bar chart
x     = np.arange(len(kernels_all))
width = 0.25
ds_colors = ['#3498db', '#e74c3c', '#27ae60']

fig, ax = plt.subplots(figsize=(11, 6))
for i, (ds_name, kern_acc) in enumerate(results.items()):
    vals = [kern_acc[k] for k in kernels_all]
    bars = ax.bar(x + i*width, [v*100 for v in vals],
                  width, label=ds_name, color=ds_colors[i],
                  alpha=0.82, edgecolor='white')
    for bar, v in zip(bars, vals):
        ax.text(bar.get_x() + bar.get_width()/2,
                bar.get_height() + 0.5,
                f'{v*100:.0f}%', ha='center', fontsize=8.5, fontweight='bold')

ax.set_xticks(x + width)
ax.set_xticklabels([f'{k}' for k in kernels_all], fontsize=12)
ax.set_ylabel('Test Accuracy (%)', fontsize=12)
ax.set_ylim(40, 110)
ax.set_title('Kernel Accuracy Comparison - Three Binary Datasets (C=1.0)',
             fontsize=13, fontweight='bold')
# Draw the reference line before building the legend so its label shows up
ax.axhline(90, color='grey', linestyle=':', alpha=0.5, label='90% reference')
ax.legend(fontsize=10)

plt.tight_layout()
plt.show()


---
## 📊 Section 5 - C × Kernel Heatmaps: Finding the Sweet Spot

Now let's combine everything: we'll try **every combination of C and kernel** and record the accuracy.
This is called a **grid search**: we build a table (heatmap) of results to find the best settings!
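
Here is the grid-search idea in miniature: a 2 × 3 table of scores, one cell per kernel and C pair. This is a sketch under a couple of assumptions. The small `kernels` and `Cs` lists are picked just for the example, and the scaler is placed inside a pipeline so that each cross-validation fold is scaled using only its own training part.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.18, random_state=42)

kernels = ['linear', 'rbf']
Cs = [0.1, 1, 10]
table = np.zeros((len(kernels), len(Cs)))   # rows = kernels, columns = C values

for ki, kernel in enumerate(kernels):
    for ci, C in enumerate(Cs):
        # Scaler + SVM are refitted together on every cross-validation fold
        model = make_pipeline(StandardScaler(), SVC(kernel=kernel, C=C))
        table[ki, ci] = cross_val_score(model, X, y, cv=5).mean()

best_ki, best_ci = np.unravel_index(table.argmax(), table.shape)
print(np.round(table, 3))
print(f"Best cell: kernel='{kernels[best_ki]}', C={Cs[best_ci]}")
```

A heatmap is this same table with a colour painted into each cell, so the best settings stand out at a glance.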


### 5a. Heatmap - Moons Dataset

In [None]:
from sklearn.model_selection import cross_val_score

C_grid       = [0.001, 0.01, 0.1, 0.5, 1, 5, 10, 50, 100, 500]
kernel_grid  = ['linear', 'poly', 'rbf', 'sigmoid']

scaler_hm = StandardScaler()
X_hm = scaler_hm.fit_transform(X_moon)

# Build accuracy matrix using 5-fold cross-validation
hm_scores = np.zeros((len(kernel_grid), len(C_grid)))

for ki, kernel in enumerate(kernel_grid):
    for ci, C in enumerate(C_grid):
        clf_hm = SVC(kernel=kernel, C=C, gamma='scale', degree=3, random_state=42)
        scores = cross_val_score(clf_hm, X_hm, y_moon, cv=5, scoring='accuracy')
        hm_scores[ki, ci] = scores.mean()
    print(f"  Done kernel='{kernel}'")

# Plot heatmap
fig, ax = plt.subplots(figsize=(13, 5))
im = ax.imshow(hm_scores, cmap='RdYlGn', aspect='auto',
               vmin=hm_scores.min(), vmax=1.0)
plt.colorbar(im, ax=ax, label='5-Fold CV Accuracy')

ax.set_xticks(range(len(C_grid)))
ax.set_xticklabels([str(c) for c in C_grid], fontsize=10)
ax.set_yticks(range(len(kernel_grid)))
ax.set_yticklabels(kernel_grid, fontsize=11, fontweight='bold')
ax.set_xlabel('C value', fontsize=12)
ax.set_title('Accuracy Heatmap: C × Kernel (Moons Dataset)\n'
             'Green = high accuracy  |  Red = low accuracy',
             fontsize=13, fontweight='bold')

# Annotate cells
for ki in range(len(kernel_grid)):
    for ci in range(len(C_grid)):
        v = hm_scores[ki, ci]
        txt_color = 'black' if v > 0.75 else 'white'
        ax.text(ci, ki, f'{v*100:.0f}%', ha='center', va='center',
                fontsize=9, fontweight='bold', color=txt_color)

# Star the best cell
best_ki, best_ci = np.unravel_index(hm_scores.argmax(), hm_scores.shape)
ax.add_patch(plt.Rectangle((best_ci-0.5, best_ki-0.5), 1, 1,
                             fill=False, edgecolor='gold', lw=3))
ax.text(best_ci, best_ki - 0.62, '★ BEST', ha='center', fontsize=9,
        color='gold', fontweight='bold')

plt.tight_layout()
plt.show()

print(f"\nBest combination: kernel='{kernel_grid[best_ki]}', C={C_grid[best_ci]}")
print(f"Best CV accuracy: {hm_scores[best_ki, best_ci]*100:.1f}%")


### 5b. Heatmap - Iris Multi-Class Dataset

In [None]:
scaler_iris_hm = StandardScaler()
X_iris_hm = scaler_iris_hm.fit_transform(iris.data)

hm_iris = np.zeros((len(kernel_grid), len(C_grid)))

for ki, kernel in enumerate(kernel_grid):
    for ci, C in enumerate(C_grid):
        clf_tmp = SVC(kernel=kernel, C=C, gamma='scale', degree=3, random_state=42)
        scores  = cross_val_score(clf_tmp, X_iris_hm, iris.target, cv=5)
        hm_iris[ki, ci] = scores.mean()
    print(f"  Done kernel='{kernel}'")

fig, ax = plt.subplots(figsize=(13, 5))
im = ax.imshow(hm_iris, cmap='RdYlGn', aspect='auto',
               vmin=hm_iris.min(), vmax=1.0)
plt.colorbar(im, ax=ax, label='5-Fold CV Accuracy')

ax.set_xticks(range(len(C_grid)))
ax.set_xticklabels([str(c) for c in C_grid], fontsize=10)
ax.set_yticks(range(len(kernel_grid)))
ax.set_yticklabels(kernel_grid, fontsize=11, fontweight='bold')
ax.set_xlabel('C value', fontsize=12)
ax.set_title('Accuracy Heatmap: C × Kernel (Iris Multi-Class Dataset)\n'
             'Green = high accuracy  |  Red = low accuracy',
             fontsize=13, fontweight='bold')

for ki in range(len(kernel_grid)):
    for ci in range(len(C_grid)):
        v = hm_iris[ki, ci]
        txt_color = 'black' if v > 0.80 else 'white'
        ax.text(ci, ki, f'{v*100:.0f}%', ha='center', va='center',
                fontsize=9, fontweight='bold', color=txt_color)

best_ki_i, best_ci_i = np.unravel_index(hm_iris.argmax(), hm_iris.shape)
ax.add_patch(plt.Rectangle((best_ci_i-0.5, best_ki_i-0.5), 1, 1,
                             fill=False, edgecolor='gold', lw=3))
ax.text(best_ci_i, best_ki_i - 0.62, '★ BEST', ha='center', fontsize=9,
        color='gold', fontweight='bold')

plt.tight_layout()
plt.show()

print(f"\nBest combination: kernel='{kernel_grid[best_ki_i]}', C={C_grid[best_ci_i]}")
print(f"Best CV accuracy: {hm_iris[best_ki_i, best_ci_i]*100:.1f}%")
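
The "best cell" lookup in the heatmap cells relies on `argmax` plus `np.unravel_index`. Here is a minimal sketch of that step on a tiny made-up score grid (the `demo_*` names and numbers are invented for illustration): `argmax` gives a position in the flattened grid, and `unravel_index` turns it back into a (row, column) pair.

```python
import numpy as np

# Made-up scores: rows = kernels, columns = C values (illustrative only)
demo_kernels = ['linear', 'rbf']
demo_C = [0.1, 1, 10]
demo_scores = np.array([[0.80, 0.85, 0.84],
                        [0.82, 0.91, 0.88]])

# argmax counts cells left-to-right, top-to-bottom, starting at 0
flat_pos = demo_scores.argmax()

# unravel_index converts that single number into (row, column)
row, col = np.unravel_index(flat_pos, demo_scores.shape)

print(flat_pos)                         # 4
print(demo_kernels[row], demo_C[col])   # rbf 1
```

The row picks the kernel and the column picks the C value, which is how the gold box lands on the right square.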


### 5c. Side-by-Side Heatmap Comparison: All Three Binary Datasets

In [None]:
C_short = [0.01, 0.1, 1, 10, 100]
kern_short = ['linear', 'poly', 'rbf', 'sigmoid']

fig, axes = plt.subplots(1, 3, figsize=(18, 4))
fig.suptitle('C × Kernel Accuracy Heatmap: Three Datasets Compared',
             fontsize=14, fontweight='bold', y=1.02)

for ax, (X, y, ds_name) in zip(axes, test_dsets):
    sc = StandardScaler()
    Xs = sc.fit_transform(X)
    mat = np.zeros((len(kern_short), len(C_short)))
    for ki, k in enumerate(kern_short):
        for ci, C in enumerate(C_short):
            clf_s = SVC(kernel=k, C=C, gamma='scale', degree=3, random_state=42)
            mat[ki, ci] = cross_val_score(clf_s, Xs, y, cv=5).mean()

    im = ax.imshow(mat, cmap='RdYlGn', aspect='auto',
                   vmin=mat.min(), vmax=1.0)
    ax.set_xticks(range(len(C_short)))
    ax.set_xticklabels(C_short, fontsize=10)
    ax.set_yticks(range(len(kern_short)))
    ax.set_yticklabels(kern_short, fontsize=10)
    ax.set_xlabel('C')
    ax.set_title(ds_name, fontsize=12, fontweight='bold')
    for ki in range(len(kern_short)):
        for ci in range(len(C_short)):
            v = mat[ki, ci]
            ax.text(ci, ki, f'{v*100:.0f}', ha='center', va='center',
                    fontsize=10, fontweight='bold',
                    color='black' if v > 0.75 else 'white')

plt.tight_layout()
plt.show()
print("Each number = cross-validated accuracy % for that C + kernel combo.")
print("Compare columns to see which C is best per dataset.")
print("Compare rows to see which kernel is best per dataset.")
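
The heatmap cells above scale the whole dataset once and then cross-validate. A slightly stricter variant, sketched below, puts the scaler inside a pipeline so that each fold is scaled using only its own training part. The `make_circles` settings and the `demo` names are assumptions for illustration, not the notebook's datasets; on small, clean data like this the two numbers are usually very close.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Illustrative data (assumed settings)
X_demo, y_demo = make_circles(n_samples=200, noise=0.1, factor=0.5,
                              random_state=42)

# Option 1: scale everything first, then cross-validate
X_scaled = StandardScaler().fit_transform(X_demo)
acc_prescaled = cross_val_score(SVC(kernel='rbf', C=1),
                                X_scaled, y_demo, cv=5).mean()

# Option 2: scaler inside the pipeline, refit on each fold's training part
pipe = make_pipeline(StandardScaler(), SVC(kernel='rbf', C=1))
acc_pipeline = cross_val_score(pipe, X_demo, y_demo, cv=5).mean()

print(f"Scaled before CV : {acc_prescaled*100:.1f}%")
print(f"Scaled inside CV : {acc_pipeline*100:.1f}%")
```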


---
## 🤖 Section 6: Let sklearn Find the Best Settings Automatically

Instead of trying every combination by hand, you can use sklearn's `GridSearchCV`, which runs the whole search for you automatically!
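
To make "runs the search for you" concrete, here is a minimal sketch of the same kind of search written by hand with `ParameterGrid` and `cross_val_score`. The small grid, the `make_moons` settings, and the `demo` names are assumptions for illustration, not the notebook's exact data. `GridSearchCV` runs a loop much like this one and then also refits the winning model.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import ParameterGrid, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Illustrative data (assumed settings, not the notebook's X_moon / y_moon)
X_demo, y_demo = make_moons(n_samples=200, noise=0.2, random_state=42)

# A deliberately small grid so the loop runs quickly
demo_grid = {'svc__kernel': ['linear', 'rbf'], 'svc__C': [0.1, 1, 10]}

best_score, best_params = -1.0, None
for params in ParameterGrid(demo_grid):
    # Fresh scale + SVM pipeline for every combination
    model = Pipeline([('scaler', StandardScaler()), ('svc', SVC())])
    model.set_params(**params)
    score = cross_val_score(model, X_demo, y_demo, cv=5).mean()
    if score > best_score:
        best_score, best_params = score, params

print(f"Tried {len(ParameterGrid(demo_grid))} combinations")
print(f"Best by hand: {best_params} ({best_score*100:.1f}%)")
```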


In [None]:
from sklearn.model_selection import GridSearchCV

# Define the search space
param_grid = {
    'svc__kernel': ['linear', 'rbf', 'poly'],
    'svc__C':      [0.1, 1, 10, 100],
    'svc__gamma':  ['scale', 'auto'],
}

# Build a pipeline (scale + SVM)
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('svc',    SVC(random_state=42))
])

# Grid search with 5-fold cross-validation
grid_search = GridSearchCV(pipeline, param_grid, cv=5,
                            scoring='accuracy', n_jobs=-1, verbose=0)

print("Searching best params on Moons dataset...")
grid_search.fit(X_moon, y_moon)

print(f"\nBest parameters found: {grid_search.best_params_}")
print(f"Best cross-validated accuracy: {grid_search.best_score_*100:.1f}%")

# Show top 5 results
import pandas as pd
results_df = pd.DataFrame(grid_search.cv_results_)
top5 = results_df.sort_values('mean_test_score', ascending=False)[
    ['param_svc__kernel', 'param_svc__C', 'param_svc__gamma', 'mean_test_score']
].head(5)
top5.columns = ['Kernel', 'C', 'Gamma', 'CV Accuracy']
top5['CV Accuracy'] = (top5['CV Accuracy'] * 100).round(1).astype(str) + '%'
print("\nTop 5 Combinations:")
print(top5.to_string(index=False))


In [None]:
# Same grid search on Iris
print("Searching best params on Iris dataset (all 4 features)...")
grid_iris = GridSearchCV(pipeline, param_grid, cv=5,
                          scoring='accuracy', n_jobs=-1)
grid_iris.fit(iris.data, iris.target)

print(f"\nBest parameters for Iris: {grid_iris.best_params_}")
print(f"Best cross-validated accuracy: {grid_iris.best_score_*100:.1f}%")

# Plot how accuracy changes with kernel and C
res_iris = pd.DataFrame(grid_iris.cv_results_)

fig, axes = plt.subplots(1, 2, figsize=(13, 5))
fig.suptitle('GridSearchCV Results: Iris Dataset', fontsize=13, fontweight='bold')

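# By kernel (keep the best score at each C across the gamma settings,
# so every kernel is drawn as one clean line with one point per C)
for kernel in ['linear', 'rbf', 'poly']:
    subset = res_iris[res_iris['param_svc__kernel'] == kernel]
    best_per_C = (subset.assign(C=subset['param_svc__C'].astype(float))
                        .groupby('C')['mean_test_score'].max())
    axes[0].plot(best_per_C.index, best_per_C.values * 100,
                 'o-', label=kernel, linewidth=2, markersize=6)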
axes[0].set_xscale('log')
axes[0].set_xlabel('C value', fontsize=11)
axes[0].set_ylabel('CV Accuracy (%)', fontsize=11)
axes[0].set_title('Accuracy by Kernel & C Value', fontsize=11, fontweight='bold')
axes[0].legend(fontsize=10)

# Bar: best accuracy per kernel
best_by_kernel = res_iris.groupby('param_svc__kernel')['mean_test_score'].max() * 100
axes[1].bar(best_by_kernel.index, best_by_kernel.values,
            color=['#e74c3c','#3498db','#27ae60'], alpha=0.8, edgecolor='white')
axes[1].set_ylabel('Best CV Accuracy (%)', fontsize=11)
axes[1].set_title('Best Accuracy per Kernel (Iris)', fontsize=11, fontweight='bold')
axes[1].set_ylim(90, 102)
for i, (kern, val) in enumerate(best_by_kernel.items()):
    axes[1].text(i, val + 0.2, f'{val:.1f}%', ha='center', fontweight='bold', fontsize=12)

plt.tight_layout()
plt.show()
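
The accuracies printed above are cross-validation averages on the same data the search used to pick its winner. One common extra check, sketched here with a smaller grid and hypothetical `demo` names, is to hold back a test set, search on the training part only, and score the winner on the held-back part.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris_demo = load_iris()

# Keep 25% of the flowers hidden from the search
X_tr, X_te, y_tr, y_te = train_test_split(
    iris_demo.data, iris_demo.target,
    test_size=0.25, stratify=iris_demo.target, random_state=42)

demo_pipe = Pipeline([('scaler', StandardScaler()), ('svc', SVC())])
demo_grid = {'svc__kernel': ['linear', 'rbf'], 'svc__C': [0.1, 1, 10]}

# Search on the training part only
search = GridSearchCV(demo_pipe, demo_grid, cv=5, scoring='accuracy')
search.fit(X_tr, y_tr)

# score() uses the refitted best model on data it has never seen
test_acc = search.score(X_te, y_te)

print(f"Best params      : {search.best_params_}")
print(f"CV accuracy      : {search.best_score_*100:.1f}%")
print(f"Held-out accuracy: {test_acc*100:.1f}%")
```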


---
## 🧠 Section 7: Summary & Key Takeaways

Congratulations, you've completed the SVM notebook! Let's review what we learned:


In [None]:
summary = {
    'Topic': [
        'Binary SVM',
        'Multi-Class (OvR)',
        'Multi-Class (OvO)',
        'Kernel: linear',
        'Kernel: rbf',
        'Kernel: poly',
        'Small C',
        'Large C',
        'Best C finder',
        'Scaling',
    ],
    'Key Point': [
        'SVM draws the widest-margin boundary between 2 classes',
        'N classifiers trained (1 per class vs rest); pick highest confidence',
        'N*(N-1)/2 classifiers; majority vote decides; SVC() default',
        'Straight line: fast, good for linearly separable data',
        'Circular/blob boundary: a strong default when a straight line cannot split the data',
        'Polynomial curve: good for medium-complexity patterns',
        'Flexible, wide margin, allows some errors, less overfitting',
        'Strict, narrow margin, fits training data tightly, risk of overfitting',
        'Use GridSearchCV to automatically search the best C & kernel',
        'ALWAYS use StandardScaler before fitting SVM!',
    ],
    'Code Snippet': [
        "SVC(kernel='rbf', C=1.0)",
        "OneVsRestClassifier(SVC())",
        "SVC()  # OvO is the default",
        "SVC(kernel='linear')",
        "SVC(kernel='rbf')  # recommended default",
        "SVC(kernel='poly', degree=3)",
        "SVC(C=0.1)",
        "SVC(C=100)",
        "GridSearchCV(Pipeline([...]), param_grid, cv=5)",
        "StandardScaler().fit_transform(X)",
    ]
}

import pandas as pd
df_summary = pd.DataFrame(summary)
print(df_summary.to_string(index=False))
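
The classifier counts in the OvR and OvO rows can be checked directly. This is a minimal sketch on a made-up 5-class `make_blobs` dataset (the settings and `demo` names are assumptions for illustration): OvR should train N = 5 classifiers, and OvO should compare N*(N-1)/2 = 10 pairs of classes.

```python
from sklearn.datasets import make_blobs
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

# Illustrative 5-class data (assumed settings)
X_demo, y_demo = make_blobs(n_samples=250, centers=5, random_state=42)

# OvR: one "this class vs everything else" classifier per class
ovr = OneVsRestClassifier(SVC(kernel='rbf')).fit(X_demo, y_demo)
n_ovr = len(ovr.estimators_)

# OvO: asking SVC for the 'ovo' decision shape gives one column per class pair
ovo = SVC(kernel='rbf', decision_function_shape='ovo').fit(X_demo, y_demo)
n_ovo = ovo.decision_function(X_demo[:1]).shape[1]

print(n_ovr)  # 5
print(n_ovo)  # 10
```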


### 🎯 Practice Challenges

Try these challenges to test your skills:

1. **Beginner:** Create a new dataset using `make_blobs(centers=5)` and train an OvO SVM on it. What accuracy do you get?

2. **Intermediate:** On the Circles dataset, find the single best (kernel, C) pair manually by trying different values. Beat 97% accuracy!

3. **Advanced:** Load the Wine dataset (`load_wine()`), run GridSearchCV with all three kernels and C values from 0.01 to 1000, and visualise the results in a heatmap.

4. **Explorer:** What happens when you change the `gamma` parameter in the RBF kernel? Try `gamma` values of 0.001, 0.1, 1, 10 on the Circles dataset and plot the boundaries.

---
*Great work! Every line of code you write brings you closer to being a machine learning pro!* 🚀
