# Support Vector Machines: Finding the Best Boundary

Welcome to the eighth notebook in our **Machine Learning Basics for Beginners** series! After exploring K-Nearest Neighbors, let's dive into **Support Vector Machines (SVMs)**, a powerful supervised learning algorithm primarily used for classification, though it can also handle regression. SVMs are great at finding the best boundary to separate different classes.

**What You'll Learn in This Notebook:**
- What Support Vector Machines are and when to use them.
- How SVMs work in simple terms.
- A hands-on example of classifying data points into two classes using SVM.
- An interactive exercise to adjust data and see how the decision boundary changes.
- Visualizations to understand the concept of margins and support vectors.

Let's get started!

## 1. What are Support Vector Machines?

**Support Vector Machines (SVMs)** are a supervised learning algorithm designed to classify data by finding the best possible boundary (or hyperplane) that separates different classes with the widest possible margin. While SVMs are mainly used for classification, they can also be adapted for regression tasks.

- **Goal**: Find a decision boundary that maximizes the margin (distance) between the closest points of different classes, ensuring the best separation.
- **When to Use It**: Use SVMs for binary classification tasks (e.g., spam vs. not spam) when you want a model that can handle both linearly separable data and, with some tricks (like kernels), non-linearly separable data. They work well with small to medium-sized datasets.
- **Examples**:
  - Classifying emails as spam or not spam based on features like word frequency.
  - Identifying whether a tumor is benign or malignant based on medical measurements.
  - Separating images of cats and dogs based on pixel features.

**Analogy**: Imagine you’re trying to separate two groups of toys on a table with a straight line drawn by a ruler. You want the line to be positioned so that the toys from each group are as far away from it as possible, ensuring no toy is too close to the wrong side. SVMs do this by finding the optimal dividing line (or boundary) with the largest gap between groups.
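
For readers who like to see the idea in symbols, the "largest gap" goal is commonly written as the optimization problem below. This is the standard hard-margin formulation rather than anything specific to this notebook. It assumes the two classes are relabeled as $y_i \in \{-1, +1\}$ (the examples in this notebook use 0 and 1) and that a straight line can separate them perfectly.

```latex
\min_{w,\,b} \ \tfrac{1}{2}\,\lVert w \rVert^{2}
\quad \text{subject to} \quad
y_i \,(w \cdot x_i + b) \ \ge\ 1 \quad \text{for every training point } i
```

Here $w$ and $b$ describe the boundary $w \cdot x + b = 0$. The gap between the two groups is $2 / \lVert w \rVert$, so making $\lVert w \rVert$ small is the same as making the gap wide.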

## 2. How Do Support Vector Machines Work?

SVMs might sound complex, but the core idea is simple: find the best boundary to separate classes. Let’s break it down step by step:

1. **Find the Hyperplane**: In a 2D space, a hyperplane is just a line that separates two classes. In 3D it is a flat plane, and in higher dimensions it is the same kind of flat boundary, with one dimension fewer than the space around it. SVM looks for the hyperplane that best divides the data points of different classes.
2. **Maximize the Margin**: Among all possible hyperplanes that separate the classes, SVM chooses the one with the largest margin—the distance between the hyperplane and the closest data points (called support vectors) from each class. A larger margin means better separation and often better generalization to new data.
3. **Support Vectors**: These are the data points closest to the hyperplane. They are critical because they define the position and orientation of the hyperplane. If these points move, the hyperplane might change.
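4. **Handle Non-Linear Data (with Kernels)**: If the data isn’t linearly separable (can’t be split by a straight line), SVMs use a "kernel trick": a kernel function lets the model behave as if the data had been mapped into a higher-dimensional space, where a linear boundary can work, without ever computing that mapping explicitly. Back in the original space, the resulting boundary looks curved. Common kernels include polynomial and radial basis function (RBF).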
5. **Prediction**: For a new data point, SVM determines which side of the hyperplane it falls on to classify it into one of the classes.
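
The steps above can be sketched in a few lines of code. This is a minimal illustration, assuming scikit-learn's `SVC` with a linear kernel and the same five toy points used in Section 3. The variable names (`scores`, `manual_pred`, `margin_width`, `sv_scores`) are made up for this sketch.

```python
import numpy as np
from sklearn.svm import SVC

# Five labelled 2D points (the same toy values used in the Section 3 example)
X = np.array([[1, 1], [2, 1.5], [5, 4], [6, 5], [1.5, 2]])
y = np.array([0, 0, 1, 1, 0])

model = SVC(kernel="linear", C=1.0).fit(X, y)

w = model.coef_[0]        # vector perpendicular to the hyperplane
b = model.intercept_[0]   # offset of the hyperplane

# Steps 1 and 5: the score w.x + b is zero on the hyperplane,
# and its sign tells us which side a point falls on.
scores = X @ w + b
manual_pred = (scores > 0).astype(int)   # positive side -> class 1

# Step 2: distance between the two margin lines
margin_width = 2 / np.linalg.norm(w)

# Step 3: support vectors sit on the margin lines, where |score| is about 1
sv_scores = model.support_vectors_ @ w + b

print("Scores:", np.round(scores, 3))
print("Manual predictions:", manual_pred)
print("Margin width:", round(margin_width, 3))
print("Support vector scores:", np.round(sv_scores, 3))
```

The manual predictions should match `model.predict(X)`, which suggests that prediction with a linear SVM is nothing more than checking the sign of `w.x + b`.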

**Analogy**: Think of SVM as a referee drawing a line between two teams on a field. The referee wants the line to be as far as possible from the nearest players on both sides (maximizing the margin) so there’s no confusion about which side a player is on. The nearest players (support vectors) are the ones the referee watches closely to set the line.

**Key Advantage**: SVMs are effective at finding clear boundaries even in complex data (with kernels) and are less prone to overfitting when the margin is maximized.
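
To make the kernel idea concrete, here is a small sketch comparing a linear kernel with an RBF kernel. The dataset is an assumption chosen for illustration: scikit-learn's `make_circles` generates one class as a ring around the other, so no straight line can split them. The accuracies are measured on the training data only, so treat them as a rough indication rather than a proper evaluation.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Synthetic data: an inner cluster surrounded by an outer ring
X, y = make_circles(n_samples=200, noise=0.05, factor=0.4, random_state=0)

# Same data, two different kernels
linear_acc = SVC(kernel="linear").fit(X, y).score(X, y)
rbf_acc = SVC(kernel="rbf").fit(X, y).score(X, y)

print(f"Linear kernel training accuracy: {linear_acc:.2f}")
print(f"RBF kernel training accuracy:    {rbf_acc:.2f}")
```

On data shaped like this, the RBF kernel should score clearly higher, because it can draw a closed, curved boundary around the inner cluster.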

## 3. Example: Classifying Points with SVM

Let’s see SVM in action with a small synthetic dataset of points in 2D space, representing two classes (e.g., two types of fruits based on size and color intensity).

**Dataset** (simplified):
- Feature 1 (e.g., Size): 1, 2, 5, 6, 1.5
- Feature 2 (e.g., Color Intensity): 1, 1.5, 4, 5, 2
- Class (Label): 0, 0, 1, 1, 0

We’ll use Python’s `scikit-learn` library to create an SVM model, train it on this data, and predict the class of a new point. Focus on the steps and output, not the code details.

**Instructions**: Run the code below to see how SVM classifies points and visualizes the decision boundary and support vectors.

In [None]:
# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVC

# Our small dataset
X = np.array([[1, 1], [2, 1.5], [5, 4], [6, 5], [1.5, 2]])  # Features: size, color intensity
y = np.array([0, 0, 1, 1, 0])  # Labels: class 0 or 1

# Create and train the SVM model with a linear kernel
model = SVC(kernel='linear', C=1.0)
model.fit(X, y)

# Predict for a new point with size=3, color intensity=2.5
new_point = np.array([[3, 2.5]])
prediction = model.predict(new_point)[0]
print(f"New Point (size=3, color intensity=2.5): Predicted as Class {prediction}")

# Get support vectors
support_vectors = model.support_vectors_
print(f"Support Vectors (points defining the boundary):\n{support_vectors}")

# Visualize the data, decision boundary, and support vectors
# Create a mesh grid for decision boundary
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.1), np.arange(y_min, y_max, 0.1))
Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

# Plot decision boundary and margins
plt.contourf(xx, yy, Z, alpha=0.3, cmap=plt.cm.RdYlBu)
plt.scatter(X[y == 0][:, 0], X[y == 0][:, 1], color='blue', label='Class 0', alpha=0.8)
plt.scatter(X[y == 1][:, 0], X[y == 1][:, 1], color='red', label='Class 1', alpha=0.8)
plt.scatter(new_point[0][0], new_point[0][1], color='green', marker='x', s=200, label='New Point')
# Highlight support vectors
plt.scatter(support_vectors[:, 0], support_vectors[:, 1], color='yellow', edgecolor='black', s=150, alpha=0.5, label='Support Vectors')
# Plot the decision boundary line
w = model.coef_[0]
a = -w[0] / w[1]
xx_line = np.linspace(x_min, x_max)
yy_line = a * xx_line - (model.intercept_[0]) / w[1]
plt.plot(xx_line, yy_line, 'k-', label='Decision Boundary')
# Plot the margin lines: each sits 1/||w|| away from the boundary, measured perpendicular to it
margin = 1 / np.sqrt(np.sum(model.coef_ ** 2))
yy_down = yy_line - np.sqrt(1 + a ** 2) * margin
yy_up = yy_line + np.sqrt(1 + a ** 2) * margin
plt.plot(xx_line, yy_down, 'k--')
plt.plot(xx_line, yy_up, 'k--')
plt.xlabel('Feature 1 (e.g., Size)')
plt.ylabel('Feature 2 (e.g., Color Intensity)')
plt.title('Support Vector Machine: Decision Boundary and Margins')
plt.legend()
plt.grid(True)
plt.show()

print("Look at the plot above:")
print("- Blue dots are Class 0 points.")
print("- Red dots are Class 1 points.")
print("- The colored background shows the decision regions.")
print("- The black solid line is the decision boundary (hyperplane).")
print("- The black dashed lines show the margins (maximum separation).")
print("- Yellow highlighted points are the support vectors defining the boundary.")
print("- The green 'X' is the new point being classified.")

## 4. Interactive Exercise: Adjust Data and See the Boundary

Now it’s your turn to experiment with SVM! In this exercise, you can add a new data point to the dataset by specifying its features and class, then see how the decision boundary and margins change. You’ll also choose a new point to classify.

**Instructions**:
- Run the code below.
- Enter values for Feature 1 (e.g., Size), Feature 2 (e.g., Color Intensity), and Class (0 or 1) to add to the dataset.
- Specify a new point to predict by entering its features.
- Observe how the boundary, margins, and support vectors update with the new data.

In [None]:
# Interactive exercise for SVM
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVC

print("Welcome to the 'Adjust Data and See the Boundary' Exercise!")
print("You’ll add a new point to the dataset and see how the SVM decision boundary changes.")

# Original dataset
X = np.array([[1, 1], [2, 1.5], [5, 4], [6, 5], [1.5, 2]])
y = np.array([0, 0, 1, 1, 0])

# Ask user to add a new data point
try:
    new_f1 = float(input("Enter Feature 1 for new point (e.g., Size, like 3.5): "))
    new_f2 = float(input("Enter Feature 2 for new point (e.g., Color Intensity, like 3.0): "))
    new_class = int(input("Enter Class for new point (0 or 1): "))
    if new_class not in [0, 1]:
        raise ValueError("Class must be 0 or 1.")
    X = np.vstack([X, [new_f1, new_f2]])
    y = np.append(y, new_class)
    print(f"Added point: Feature 1={new_f1}, Feature 2={new_f2}, Class={new_class}.")
except ValueError as e:
    print(f"Invalid input: {e}. Using original data without changes.")

# Train the model with updated data
model = SVC(kernel='linear', C=1.0)
model.fit(X, y)

# Ask user for a new point to predict
try:
    predict_f1 = float(input("Enter Feature 1 to predict class (e.g., Size, like 3): "))
    predict_f2 = float(input("Enter Feature 2 to predict class (e.g., Color Intensity, like 2.5): "))
    new_point = np.array([[predict_f1, predict_f2]])
    prediction = model.predict(new_point)[0]
    print(f"Predicted class for point (Feature 1={predict_f1}, Feature 2={predict_f2}): Class {prediction}")
except ValueError:
    new_point = np.array([[3, 2.5]])
    prediction = model.predict(new_point)[0]
    print(f"Invalid input. Defaulting to Feature 1=3, Feature 2=2.5. Predicted class: {prediction}")

# Get support vectors
support_vectors = model.support_vectors_
print(f"Updated Support Vectors (points defining the boundary):\n{support_vectors}")

# Visualize the updated data, decision boundary, and support vectors
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.1), np.arange(y_min, y_max, 0.1))
Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

plt.contourf(xx, yy, Z, alpha=0.3, cmap=plt.cm.RdYlBu)
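# The original dataset has 5 points; a 6th row exists only if the user's input was valid.
# Slicing by n_orig avoids mislabeling the last original point as "added" after invalid input.
n_orig = 5
X_orig, y_orig = X[:n_orig], y[:n_orig]
plt.scatter(X_orig[y_orig == 0][:, 0], X_orig[y_orig == 0][:, 1], color='blue', label='Original Class 0', alpha=0.8)
plt.scatter(X_orig[y_orig == 1][:, 0], X_orig[y_orig == 1][:, 1], color='red', label='Original Class 1', alpha=0.8)
if len(X) > n_orig:
    plt.scatter(X[-1, 0], X[-1, 1], color='orange', label=f"Your Added Data (Class {y[-1]})", alpha=0.8)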
plt.scatter(new_point[0][0], new_point[0][1], color='green', marker='x', s=200, label='Prediction')
plt.scatter(support_vectors[:, 0], support_vectors[:, 1], color='yellow', edgecolor='black', s=150, alpha=0.5, label='Support Vectors')
w = model.coef_[0]
a = -w[0] / w[1]
xx_line = np.linspace(x_min, x_max)
yy_line = a * xx_line - (model.intercept_[0]) / w[1]
plt.plot(xx_line, yy_line, 'k-', label='Decision Boundary')
margin = 1 / np.sqrt(np.sum(model.coef_ ** 2))
yy_down = yy_line - np.sqrt(1 + a ** 2) * margin
yy_up = yy_line + np.sqrt(1 + a ** 2) * margin
plt.plot(xx_line, yy_down, 'k--')
plt.plot(xx_line, yy_up, 'k--')
plt.xlabel('Feature 1 (e.g., Size)')
plt.ylabel('Feature 2 (e.g., Color Intensity)')
plt.title('Support Vector Machine: Updated Decision Boundary')
plt.legend()
plt.grid(True)
plt.show()

print("Look at the plot above:")
print("- Blue dots are original Class 0 points.")
print("- Red dots are original Class 1 points.")
print("- Orange dot is the point you added.")
print("- The colored background shows the updated decision regions.")
print("- The black solid line is the updated decision boundary.")
print("- The black dashed lines show the updated margins.")
print("- Yellow highlighted points are the updated support vectors.")
print("- The green 'X' is the new point being classified.")

## 5. Key Considerations for Support Vector Machines

SVMs are powerful for classification, especially when data is well-structured, but they come with some considerations to keep in mind:

- **Choosing the Kernel**: For non-linearly separable data, selecting the right kernel (e.g., RBF, polynomial) and tuning its parameters is crucial. The wrong kernel can lead to poor performance.
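- **Computationally Intensive**: SVMs can be slow to train on large datasets. For scikit-learn's `SVC`, training time grows at least quadratically with the number of samples, so it can become impractical beyond tens of thousands of points, especially with non-linear kernels.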
- **Sensitive to Feature Scaling**: Since SVMs rely on distances between points, features on different scales (e.g., one in meters, another in kilometers) can distort the margin. Features should be normalized or standardized.
- **Parameter Tuning (C)**: The parameter `C` controls the trade-off between maximizing the margin and allowing misclassifications. A high `C` tries to classify all points correctly (risking overfitting), while a low `C` prioritizes a wider margin (risking underfitting).
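
The feature scaling point can be checked with a short experiment. This sketch assumes scikit-learn's built-in breast cancer dataset (chosen here because it echoes the tumor example from Section 1 and has features on very different scales) and uses `StandardScaler` inside a pipeline so that scaling is learned only from the training folds. The names `unscaled_acc` and `scaled_acc` are made up for this sketch.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# 30 numeric features whose ranges differ by orders of magnitude
X, y = load_breast_cancer(return_X_y=True)

# Same model, with and without standardizing the features first
unscaled_model = SVC(kernel="rbf")
scaled_model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# Mean accuracy over 5 cross-validation folds
unscaled_acc = cross_val_score(unscaled_model, X, y, cv=5).mean()
scaled_acc = cross_val_score(scaled_model, X, y, cv=5).mean()

print(f"Without scaling: {unscaled_acc:.3f}")
print(f"With scaling:    {scaled_acc:.3f}")
```

Putting the scaler in a pipeline is a design choice worth copying: it keeps information from the validation fold from leaking into the scaling step.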

**Analogy**: SVM is like drawing a fence between two yards to keep dogs and cats separate. If the yards are oddly shaped (non-linear data), you need a special fence design (kernel). If you obsess over every pet staying on their side (high C), the fence might be too tight and impractical. If you’re too lax (low C), the fence might not separate them well.
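
The effect of `C` on the "fence" can be seen by training the same model twice. This is a rough sketch on assumed synthetic data (`make_blobs` with overlapping clusters); the two `C` values are arbitrary picks for contrast, not recommended settings.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two overlapping clusters, so a perfectly clean separation is not possible
X, y = make_blobs(n_samples=100, centers=2, cluster_std=2.0, random_state=0)

results = {}
for C in (0.01, 100):
    model = SVC(kernel="linear", C=C).fit(X, y)
    margin_width = 2 / np.linalg.norm(model.coef_[0])  # gap between the margin lines
    n_support = len(model.support_vectors_)            # points on or inside the margin
    results[C] = (margin_width, n_support)
    print(f"C={C}: margin width = {margin_width:.2f}, support vectors = {n_support}")
```

The low `C` run should report the wider margin: the model tolerates more points inside the gap in exchange for a looser fence.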

Despite these challenges, SVMs are highly effective for many classification tasks, especially when combined with the right kernel and tuning, and they provide a clear geometric interpretation of classification.
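
One common way to handle kernel selection and tuning together is a small grid search with cross-validation. The sketch below is one possible setup, not a recipe: the dataset (`make_moons`), the candidate kernels, and the `C` values are all assumptions made for illustration.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Two interleaving half-moons: a curved boundary is likely to fit best
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

# Scale first, then classify; make_pipeline names the SVC step "svc"
pipeline = make_pipeline(StandardScaler(), SVC())

# Try every combination of kernel and C with 5-fold cross-validation
param_grid = {
    "svc__kernel": ["linear", "rbf"],
    "svc__C": [0.1, 1, 10],
}
search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X, y)

print("Best settings:", search.best_params_)
print("Best cross-validated accuracy:", round(search.best_score_, 3))
```

After fitting, `search.best_estimator_` holds the pipeline retrained on all the data with the winning settings, ready to call `predict` on new points.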

## 6. Key Takeaways

- **Support Vector Machines (SVMs)** are a supervised learning algorithm for classification (and regression), finding the best hyperplane to separate classes with the widest margin.
- They work by maximizing the margin between classes, relying on support vectors (closest points) to define the boundary, and can handle non-linear data with kernels.
- Use them for binary classification tasks like spam detection or medical diagnosis when you want a strong boundary and have small to medium-sized datasets.
- Be aware of limitations: they require careful kernel selection and parameter tuning, are slow on large data, and need scaled features for accurate results.

You’ve now learned a robust boundary-based algorithm! SVMs introduce the concept of margins and optimal separation, which is a powerful approach for classification problems.

**What's Next?**
Move on to **Notebook 9: K-Means Clustering** to learn about an unsupervised learning algorithm for grouping data without labels. See you there!