
Q1 - Part A: Load Dataset and Train-Test Split

In [1]:
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix

# Load Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Train-test split (80:20)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

print("Dataset loaded successfully")
print(f"Training samples: {X_train.shape[0]}")
print(f"Test samples: {X_test.shape[0]}")
print(f"Features: {X_train.shape[1]}")
print(f"Classes: {iris.target_names}")

Dataset loaded successfully
Training samples: 120
Test samples: 30
Features: 4
Classes: ['setosa' 'versicolor' 'virginica']
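
The `stratify=y` argument in the split above is what keeps the three classes equally represented in both partitions. A minimal sketch to check this, assuming the same split parameters as the cell above (the names `train_counts` and `test_counts` are illustrative and not part of the original notebook):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X, y = iris.data, iris.target

# Same split parameters as the notebook cell above
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Count how many samples of each class landed in each partition
train_counts = np.bincount(y_train).tolist()
test_counts = np.bincount(y_test).tolist()

for name, n_train, n_test in zip(iris.target_names, train_counts, test_counts):
    print(f"{name:<12} train={n_train:<4} test={n_test}")
```

With 50 samples per class and an 80:20 split, each class should contribute 40 training and 10 test samples, which agrees with the row sums of the confusion matrices reported later.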



Q1 - Part B, C, D: Train SVM with Different Kernels and Evaluate

In [2]:
# Define kernels
kernels = ['linear', 'poly', 'rbf']
results = {}

for kernel in kernels:
    print(f"\n{'='*70}")
    print(f"Training SVM with {kernel.upper()} kernel")
    print('='*70)

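    # Train SVM. `degree` only affects the 'poly' kernel and is ignored by the
    # others, so no branching is needed. `random_state` is only used when
    # probability=True, so it has no effect here.
    svm = SVC(kernel=kernel, degree=3, random_state=42)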

    svm.fit(X_train, y_train)

    # Predictions
    y_pred = svm.predict(X_test)

    # Evaluate
    accuracy = accuracy_score(y_test, y_pred)
    precision = precision_score(y_test, y_pred, average='weighted')
    recall = recall_score(y_test, y_pred, average='weighted')
    f1 = f1_score(y_test, y_pred, average='weighted')
    cm = confusion_matrix(y_test, y_pred)

    # Store results
    results[kernel] = {
        'accuracy': accuracy,
        'precision': precision,
        'recall': recall,
        'f1_score': f1,
        'confusion_matrix': cm
    }

    # Print results
    print(f"\nAccuracy: {accuracy:.4f}")
    print(f"Precision: {precision:.4f}")
    print(f"Recall: {recall:.4f}")
    print(f"F1-Score: {f1:.4f}")
    print("\nConfusion Matrix:")
    print(cm)

# Find best kernel by test accuracy (on a tie, the first kernel in `kernels` wins)
best_kernel = max(results, key=lambda k: results[k]['accuracy'])
print(f"\n{'='*70}")
print(f"BEST KERNEL: {best_kernel.upper()}")
print(f"Accuracy: {results[best_kernel]['accuracy']:.4f}")
print('='*70)


Training SVM with LINEAR kernel

Accuracy: 1.0000
Precision: 1.0000
Recall: 1.0000
F1-Score: 1.0000

Confusion Matrix:
[[10  0  0]
 [ 0 10  0]
 [ 0  0 10]]

Training SVM with POLY kernel

Accuracy: 0.9667
Precision: 0.9697
Recall: 0.9667
F1-Score: 0.9666

Confusion Matrix:
[[10  0  0]
 [ 0  9  1]
 [ 0  0 10]]

Training SVM with RBF kernel

Accuracy: 0.9667
Precision: 0.9697
Recall: 0.9667
F1-Score: 0.9666

Confusion Matrix:
[[10  0  0]
 [ 0  9  1]
 [ 0  0 10]]

BEST KERNEL: LINEAR
Accuracy: 1.0000
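
The weighted precision, recall, and F1 values can be traced back to the confusion matrix by hand. The sketch below recomputes them with NumPy from the matrix reported above for the POLY and RBF kernels; it assumes the scikit-learn convention that rows are true classes and columns are predicted classes, and the variable names are illustrative.

```python
import numpy as np

# Confusion matrix reported above for the poly and rbf kernels
# (rows = true class, columns = predicted class)
cm = np.array([[10, 0, 0],
               [0, 9, 1],
               [0, 0, 10]])

true_pos = np.diag(cm)        # correctly classified samples per class
support = cm.sum(axis=1)      # number of true samples per class
predicted = cm.sum(axis=0)    # number of predictions per class

precision_per_class = true_pos / predicted
recall_per_class = true_pos / support
f1_per_class = (2 * precision_per_class * recall_per_class
                / (precision_per_class + recall_per_class))

# average='weighted' weights each class by its share of true samples
weights = support / support.sum()
precision_w = float(np.sum(weights * precision_per_class))
recall_w = float(np.sum(weights * recall_per_class))
f1_w = float(np.sum(weights * f1_per_class))

print(f"Precision: {precision_w:.4f}")
print(f"Recall: {recall_w:.4f}")
print(f"F1-Score: {f1_w:.4f}")
```

The single versicolor sample predicted as virginica lowers recall for versicolor (9/10) and precision for virginica (10/11), which is why weighted precision (0.9697) and weighted recall (0.9667) differ slightly.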


Q1 - Part E: Identify Best Kernel and Explain

In [3]:
# Summary comparison
print("\n" + "="*70)
print("SUMMARY COMPARISON")
print("="*70)
print(f"{'Kernel':<15} {'Accuracy':<12} {'Precision':<12} {'Recall':<12} {'F1-Score':<12}")
print("-"*70)
for kernel in kernels:
    print(f"{kernel.upper():<15} {results[kernel]['accuracy']:<12.4f} {results[kernel]['precision']:<12.4f} {results[kernel]['recall']:<12.4f} {results[kernel]['f1_score']:<12.4f}")

print(f"\n{'='*70}")
print("EXPLANATION:")
print("="*70)
print(f"Best kernel: {best_kernel.upper()}")
print("\nWhy it performs best:")
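# Caveat: the messages below are generic talking points, not something this run
# proves. The kernels differ by a single test sample out of 30, and only setosa
# is strictly linearly separable; versicolor and virginica overlap slightly.
if best_kernel == 'linear':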
    print("- Linear kernel works well because Iris data is linearly separable")
    print("- Simple decision boundaries are sufficient")
    print("- Less prone to overfitting with small datasets")
elif best_kernel == 'rbf':
    print("- RBF kernel can capture non-linear relationships")
    print("- Flexible decision boundaries adapt to data structure")
    print("- Good generalization on complex patterns")
elif best_kernel == 'poly':
    print("- Polynomial kernel captures polynomial relationships")
    print("- Degree 3 provides moderate complexity")
    print("- Can model curved decision boundaries")


SUMMARY COMPARISON
Kernel          Accuracy     Precision    Recall       F1-Score    
----------------------------------------------------------------------
LINEAR          1.0000       1.0000       1.0000       1.0000      
POLY            0.9667       0.9697       0.9667       0.9666      
RBF             0.9667       0.9697       0.9667       0.9666      

EXPLANATION:
Best kernel: LINEAR

Why it performs best:
- Linear kernel works well because Iris data is linearly separable
- Simple decision boundaries are sufficient
- Less prone to overfitting with small datasets
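
Because the three kernels differ by only one test sample out of 30, the ranking above may not be stable across splits. One way to get a less split-dependent comparison is stratified k-fold cross-validation on the full dataset. The sketch below is an addition to the original exercise; the choice of 5 folds and the name `cv_results` are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# 5 stratified folds; shuffling with a fixed seed keeps the folds reproducible
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

cv_results = {}
for kernel in ['linear', 'poly', 'rbf']:
    scores = cross_val_score(SVC(kernel=kernel), X, y, cv=cv, scoring='accuracy')
    cv_results[kernel] = (scores.mean(), scores.std())
    print(f"{kernel.upper():<8} mean accuracy={scores.mean():.4f}  std={scores.std():.4f}")
```

If the mean accuracies fall within roughly one standard deviation of each other, the single-split result is better read as "all three kernels perform similarly on Iris" than as clear evidence for one kernel.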


Q2 - Part A, B: Breast Cancer with and without Scaling

In [4]:
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler

# Load Breast Cancer dataset
cancer = load_breast_cancer()
X = cancer.data
y = cancer.target

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

print("="*70)
print("SVM WITHOUT FEATURE SCALING")
print("="*70)

# Train SVM WITHOUT scaling
svm_no_scale = SVC(kernel='rbf', random_state=42)
svm_no_scale.fit(X_train, y_train)

# Evaluate
train_acc_no_scale = accuracy_score(y_train, svm_no_scale.predict(X_train))
test_acc_no_scale = accuracy_score(y_test, svm_no_scale.predict(X_test))

print(f"Training Accuracy: {train_acc_no_scale:.4f}")
print(f"Testing Accuracy: {test_acc_no_scale:.4f}")

print("\n" + "="*70)
print("SVM WITH FEATURE SCALING")
print("="*70)

# Apply StandardScaler
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train SVM WITH scaling
svm_with_scale = SVC(kernel='rbf', random_state=42)
svm_with_scale.fit(X_train_scaled, y_train)

# Evaluate
train_acc_with_scale = accuracy_score(y_train, svm_with_scale.predict(X_train_scaled))
test_acc_with_scale = accuracy_score(y_test, svm_with_scale.predict(X_test_scaled))

print(f"Training Accuracy: {train_acc_with_scale:.4f}")
print(f"Testing Accuracy: {test_acc_with_scale:.4f}")

# Comparison
print("\n" + "="*70)
print("COMPARISON")
print("="*70)
print(f"{'Metric':<25} {'Without Scaling':<20} {'With Scaling':<20}")
print("-"*70)
print(f"{'Training Accuracy':<25} {train_acc_no_scale:<20.4f} {train_acc_with_scale:<20.4f}")
print(f"{'Testing Accuracy':<25} {test_acc_no_scale:<20.4f} {test_acc_with_scale:<20.4f}")
print(f"{'Improvement':<25} {'-':<20} {test_acc_with_scale - test_acc_no_scale:+.4f}")

SVM WITHOUT FEATURE SCALING
Training Accuracy: 0.9187
Testing Accuracy: 0.9298

SVM WITH FEATURE SCALING
Training Accuracy: 0.9824
Testing Accuracy: 0.9825

COMPARISON
Metric                    Without Scaling      With Scaling        
----------------------------------------------------------------------
Training Accuracy         0.9187               0.9824              
Testing Accuracy          0.9298               0.9825              
Improvement               -                    +0.0526
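
The scaled run above fits `StandardScaler` on the training split only and reuses it for the test split, which avoids leaking test statistics into training. The same steps can be bundled into a scikit-learn `Pipeline` so that the scaler and classifier are always fitted together. A minimal sketch, assuming the same split parameters as above (the names `model` and `pipeline_test_acc` are illustrative):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# fit() learns the scaling statistics from X_train only, then trains the SVM
# on the scaled data; score() applies the same scaling to X_test.
model = make_pipeline(StandardScaler(), SVC(kernel='rbf'))
model.fit(X_train, y_train)

pipeline_test_acc = model.score(X_test, y_test)
print(f"Pipeline test accuracy: {pipeline_test_acc:.4f}")
```

A pipeline is also convenient when cross-validating, because the scaler is then refitted inside each fold rather than once on all the data.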


Q2 - Part C: Discussion on Feature Scaling Effect

In [5]:
print("\n" + "="*70)
print("DISCUSSION: EFFECT OF FEATURE SCALING ON SVM PERFORMANCE")
print("="*70)

print("\n1. Why Feature Scaling Matters for SVM:")
print("   - SVM uses distance-based calculations (kernel functions)")
print("   - Features with larger ranges dominate the distance metric")
print("   - Without scaling, the hyperplane is biased toward high-magnitude features")

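# Background for the points below: with the default gamma='scale', scikit-learn
# sets gamma = 1 / (n_features * X.var()). On unscaled data X.var() is dominated
# by the large-range features, so gamma becomes tiny and the small-range features
# barely influence the kernel value.
print("\n2. Impact on RBF Kernel:")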
print("   - RBF kernel: exp(-gamma * ||x - x'||^2)")
print("   - Unscaled features cause numerical instability")
print("   - Small gamma values may not capture patterns effectively")

print("\n3. Observed Results:")
print(f"   - Without scaling: {test_acc_no_scale:.4f} test accuracy")
print(f"   - With scaling: {test_acc_with_scale:.4f} test accuracy")
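# Note: the value printed here is an absolute gain in percentage points,
# not a relative percent change.
print(f"   - Improvement: {(test_acc_with_scale - test_acc_no_scale)*100:.2f}%")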

print("\n4. Conclusion:")
print("   - Feature scaling is ESSENTIAL for SVM, especially with RBF/polynomial kernels")
print("   - StandardScaler ensures all features contribute equally")
print("   - Always scale features before training SVM models")


DISCUSSION: EFFECT OF FEATURE SCALING ON SVM PERFORMANCE

1. Why Feature Scaling Matters for SVM:
   - SVM uses distance-based calculations (kernel functions)
   - Features with larger ranges dominate the distance metric
   - Without scaling, the hyperplane is biased toward high-magnitude features

2. Impact on RBF Kernel:
   - RBF kernel: exp(-gamma * ||x - x'||^2)
   - Unscaled features cause numerical instability
   - Small gamma values may not capture patterns effectively

3. Observed Results:
   - Without scaling: 0.9298 test accuracy
   - With scaling: 0.9825 test accuracy
   - Improvement: 5.26%

4. Conclusion:
   - Feature scaling is ESSENTIAL for SVM, especially with RBF/polynomial kernels
   - StandardScaler ensures all features contribute equally
   - Always scale features before training SVM models
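
The claim that large-range features dominate the distance can be quantified. For two samples drawn independently, the expected squared difference in feature j is twice its variance, so each feature's share of the expected squared Euclidean distance is its variance divided by the total variance. The sketch below computes that share before and after standardization; the helper `distance_share` is hypothetical, and the scaler is fitted on the full dataset here purely for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler

cancer = load_breast_cancer()
X = cancer.data

def distance_share(X):
    """Share of the expected squared Euclidean distance contributed by each feature.

    E[(x_j - x'_j)^2] = 2 * Var(x_j) for independent samples, so the share of
    feature j is Var(x_j) / sum_k Var(x_k).
    """
    variances = X.var(axis=0)
    return variances / variances.sum()

raw_share = distance_share(X)
scaled_share = distance_share(StandardScaler().fit_transform(X))

# Show the three features that dominate the unscaled distance
top = np.argsort(raw_share)[::-1][:3]
for i in top:
    print(f"{cancer.feature_names[i]:<20} raw share={raw_share[i]:.3f}  "
          f"scaled share={scaled_share[i]:.3f}")
```

On the unscaled data, the two largest-variance features account for nearly all of the expected squared distance, while after standardization each of the 30 features contributes an equal share of 1/30.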
