<a href="https://colab.research.google.com/github/kankkw/229352-StatisticalLearning/blob/main/Lab05.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Statistical Learning for Data Science 2 (229352)
#### Instructor: Donlapark Ponnoprat

#### [Course website](https://donlapark.pages.dev/229352/)

## Lab #6

## Support Vector Machines (SVM)

[SVM module documentation](https://scikit-learn.org/stable/modules/svm.html#svm)

[LinearSVC documentation](https://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC.html#sklearn.svm.LinearSVC)

[SVC documentation](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html#sklearn.svm.SVC)

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC  # fast but only linear
from sklearn.svm import SVC  # slower but can do kernels

In [None]:
# Load the iris data
iris = datasets.load_iris()
X = iris.data[:, 2:]
y = iris.target

In [None]:
# Plot the data
plt.figure(figsize=(7,6))
plt.scatter(X[:, 0], X[:, 1], c=y)
plt.xlabel(iris.feature_names[2])
plt.ylabel(iris.feature_names[3])
plt.show()

#### In this problem, you'll use support vector machines to classify the Iris data

#### The following function helps you plot the decision boundary.

In [None]:
# Plot the decision boundaries
def plot_decision_boundary(clf, X, y):
    h = 0.005  # Boundary lines' resolution
    x_min, x_max = X[:,0].min() - 10*h, X[:,0].max() + 10*h
    y_min, y_max = X[:,1].min() - 10*h, X[:,1].max() + 10*h
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    plt.figure(figsize=(7,6))
    plt.contourf(xx, yy, Z, alpha=0.25)  # Background
    plt.contour(xx, yy, Z, colors='k', linewidths=0.2)  # Boundary lines
    plt.scatter(X[:,0], X[:,1], c=y);  # Data points
    plt.xlabel(iris.feature_names[2])
    plt.ylabel(iris.feature_names[3])

#### Exercise 1. Split the data into a training set and a test set.

In [None]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

#### Exercise 2. Learn a linear SVM classifier using sklearn.svm.LinearSVC. You will need to set `loss='hinge'`.

#### Try different values of the tradeoff parameter: `C = 0.01, 0.1, 1.0, 10.0, 100.0` and use `plot_decision_boundary` to plot the decision boundary.

#### If you encounter a `ConvergenceWarning`, consider setting `max_iter=100000`

#### What is the effect of `C` on the decision boundary?

In [None]:
C_values = [0.01, 0.1, 1.0, 10.0, 100.0]

for C in C_values:
    clf = LinearSVC(C=C, loss='hinge', max_iter=100000)
    clf.fit(X_train, y_train)
    plot_decision_boundary(clf, X_train, y_train)
    plt.title(f"Linear SVM (C = {C})")
    plt.show()

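Effect of C:
Smaller C means stronger regularization: the margin is wider and more training points are allowed inside it or on the wrong side, so the boundary depends less on individual points.
Larger C penalizes margin violations more heavily: the margin is narrower and the boundary shifts to classify more training points correctly, at a higher risk of overfitting.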
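
The margin claim above can be checked numerically. The following is a minimal sketch, not part of the assigned solution: it assumes a binary subproblem (versicolor vs. virginica) and swaps in `SVC(kernel="linear")` so that the margin width can be read off as 2 / ||w||. The names `X_bin`, `y_bin`, and `margin_widths` are introduced here for illustration only.

```python
import numpy as np
from sklearn import datasets
from sklearn.svm import SVC

# Petal length and width, restricted to versicolor (1) vs. virginica (2).
# These two classes overlap, so the choice of C matters.
iris = datasets.load_iris()
mask = iris.target != 0
X_bin = iris.data[mask, 2:]
y_bin = iris.target[mask]

margin_widths = {}
for C in [0.01, 0.1, 1.0, 10.0, 100.0]:
    clf = SVC(kernel="linear", C=C).fit(X_bin, y_bin)
    w = clf.coef_.ravel()                        # weight vector of the separating line
    margin_widths[C] = 2.0 / np.linalg.norm(w)   # distance between the two margin lines
    print(f"C = {C:>6}: margin width = {margin_widths[C]:.3f}, "
          f"support vectors = {clf.n_support_.sum()}")
```

The printed widths should shrink as `C` grows, which is the numerical counterpart of the boundaries drawn by `plot_decision_boundary`.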

#### Exercise 3. Pick a value of `C` that you like. Then report the test error.
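
One possible way to pick `C` without looking at the test set is cross-validation on the training set. The sketch below is an assumption about how one might do this, not a required part of the exercise; it uses `GridSearchCV` with 5 folds over the same `C` values as Exercise 2 and repeats the split from Exercise 1 so that it runs on its own.

```python
from sklearn import datasets
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import LinearSVC

iris = datasets.load_iris()
X, y = iris.data[:, 2:], iris.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# 5-fold cross-validation over C, using the training set only.
search = GridSearchCV(
    LinearSVC(loss="hinge", max_iter=100000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0, 100.0]},
    cv=5,
)
search.fit(X_train, y_train)

best_C = search.best_params_["C"]
cv_error = 1 - search.best_score_              # mean validation error of the best C
test_error = 1 - search.score(X_test, y_test)  # error of the refit model on the held-out test set
print(f"best C = {best_C}, CV error = {cv_error:.3f}, test error = {test_error:.3f}")
```

Selecting `C` this way keeps the test set untouched until the final report, so the reported test error is not biased by the choice of `C`.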

In [None]:
C = 1.0
clf = LinearSVC(C=C, loss='hinge', max_iter=100000)
clf.fit(X_train, y_train)

test_accuracy = clf.score(X_test, y_test)
test_error = 1 - test_accuracy

test_error

#### Exercise 4. Now try kernel SVM with a radial basis function (RBF) kernel. You can do this with sklearn.svm.SVC, setting `kernel='rbf'` and `C = 1.0`.

#### Try different values of the kernel coefficient: `gamma = 0.01, 0.1, 1.0, 10.0, 100.0` and use `plot_decision_boundary` to plot the decision boundary.

#### `SVC` has no iteration limit by default (`max_iter=-1`). If you set a limit and encounter a `ConvergenceWarning`, consider raising it, e.g. `max_iter=100000`

#### What is the effect of `gamma` on the decision boundary?

In [None]:
gamma_values = [0.01, 0.1, 1.0, 10.0, 100.0]

for gamma in gamma_values:
    clf = SVC(kernel='rbf', C=1.0, gamma=gamma, max_iter=100000)
    clf.fit(X_train, y_train)
    plot_decision_boundary(clf, X_train, y_train)
    plt.title(f"RBF SVM (gamma = {gamma})")
    plt.show()

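Effect of gamma:
Small gamma gives each training point a wide region of influence, so the decision boundary is smooth and nearly linear (risk of underfitting).
Large gamma gives each training point a very local influence, so the boundary wraps tightly around individual points and can form isolated islands (risk of overfitting).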
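
The reason lies in the kernel itself: scikit-learn's RBF kernel is K(x, x') = exp(-gamma * ||x - x'||^2). As a small illustration, the sketch below evaluates the kernel for two made-up points `a` and `b` (hypothetical values, exactly one unit apart) and shows how quickly their similarity vanishes as `gamma` grows.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

# Two illustrative points that are exactly 1 unit apart.
a = np.array([[1.0, 0.5]])
b = np.array([[2.0, 0.5]])

similarities = {}
for gamma in [0.01, 0.1, 1.0, 10.0, 100.0]:
    # K(a, b) = exp(-gamma * ||a - b||^2) = exp(-gamma) for these points
    similarities[gamma] = rbf_kernel(a, b, gamma=gamma)[0, 0]
    print(f"gamma = {gamma:>6}: K(a, b) = {similarities[gamma]:.3e}")
```

With a small `gamma` the two points still look almost identical to the model, while with a large `gamma` a point one unit away has essentially no influence.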

#### Exercise 5. Pick a value of `gamma` that you like. Then report the test error and the number of support vectors.
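
A fitted `SVC` exposes its support vectors through several attributes. The following sketch, which repeats the split from Exercise 1 and assumes `gamma = 0.1` purely as an example value, shows how `n_support_`, `support_`, and `support_vectors_` relate to each other.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = datasets.load_iris()
X, y = iris.data[:, 2:], iris.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

clf = SVC(kernel="rbf", C=1.0, gamma=0.1).fit(X_train, y_train)

# n_support_  : number of support vectors per class
# support_    : row indices of the support vectors in X_train
# support_vectors_ : the support vectors themselves
per_class = {iris.target_names[c]: int(n)
             for c, n in zip(clf.classes_, clf.n_support_)}
total = clf.support_vectors_.shape[0]
print("support vectors per class:", per_class)
print("total:", total, "out of", len(X_train), "training points")
```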

In [None]:
gamma = 0.1
clf = SVC(kernel='rbf', C=1.0, gamma=gamma)
clf.fit(X_train, y_train)

test_accuracy = clf.score(X_test, y_test)
test_error = 1 - test_accuracy
num_support_vectors = clf.n_support_.sum()

test_error, num_support_vectors

#### Exercise 6. Between Linear SVM and Kernel SVM, which model would you prefer to use for classification of Iris data?
1. Explain using test accuracy
2. Explain using decision boundary plot

Preferred model: Kernel SVM (RBF)

1. Explanation using test accuracy:
Kernel SVM achieves higher test accuracy than Linear SVM.

2. Explanation using decision boundary:
Kernel SVM provides a nonlinear decision boundary that better separates the Iris classes,
while Linear SVM is limited to linear separation.
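
Because the test set here has only 45 points, a difference of one or two misclassifications can flip the comparison. As a rough sanity check, the sketch below compares both models with 10-fold stratified cross-validation on the full data; the fold count, the `random_state`, and the hyperparameters `C = 1.0` and `gamma = 0.1` are assumptions chosen to match the earlier cells, not tuned values.

```python
from sklearn import datasets
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC, LinearSVC

iris = datasets.load_iris()
X, y = iris.data[:, 2:], iris.target

models = {
    "Linear SVM (C=1.0)": LinearSVC(C=1.0, loss="hinge", max_iter=100000),
    "RBF SVM (C=1.0, gamma=0.1)": SVC(kernel="rbf", C=1.0, gamma=0.1),
}

# The same folds are used for both models so the comparison is paired.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
mean_accuracy = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv)
    mean_accuracy[name] = scores.mean()
    print(f"{name}: accuracy = {scores.mean():.3f} +/- {scores.std():.3f}")
```

If the two mean accuracies lie within one standard deviation of each other, the accuracy argument alone does not separate the models, and the simpler linear boundary may be just as defensible.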