# Support Vector Machines (SVM)

## Problem Type
**Support Vector Machines (SVM)** are a **supervised** learning method, primarily used for:
- **Classification** (binary and multiclass)
- **Regression** (Support Vector Regression, SVR)

### How SVM Works
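- **Finds the optimal hyperplane**: the decision boundary that separates the classes with the largest possible margin, where the margin is the distance from the boundary to the closest training points of each class (the support vectors).
- **Maximizes the margin** because a wider margin generally gives better generalization to unseen data.
- **Uses different kernel functions** (linear, polynomial, RBF, etc.) to handle non-linearly separable data by implicitly mapping it into a higher-dimensional space (the kernel trick), where a linear separator may exist.
- **Supports both hard margin and soft margin** classification:
  - **Hard margin**: Requires every training point to be correctly classified and to lie outside the margin; this is only feasible when the data is linearly separable.
  - **Soft margin**: Allows some points to fall inside the margin or be misclassified, which tolerates noise and usually yields a more generalizable model.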
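
The margin idea above can be checked numerically. The following is a minimal sketch on made-up, linearly separable toy points (the data and the very large `C` used to approximate a hard margin are assumptions for illustration). For a linear kernel the margin width is `2 / ||w||`, and the support vectors should sit where the decision function is close to `-1` or `+1`.

```python
import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable data (made up for illustration).
X_toy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
                  [3.0, 3.0], [4.0, 3.0], [3.0, 4.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

# A very large C approximates a hard margin: violations are heavily penalized.
clf = SVC(kernel="linear", C=1e6).fit(X_toy, y_toy)

w = clf.coef_[0]                        # normal vector of the hyperplane
b = clf.intercept_[0]                   # offset of the hyperplane
margin_width = 2.0 / np.linalg.norm(w)  # distance between the two margin lines

# Support vectors lie on the margin, where |w.x + b| is approximately 1.
sv_scores = clf.decision_function(clf.support_vectors_)

print("support vectors:\n", clf.support_vectors_)
print("w:", w, "b:", b)
print("margin width:", margin_width)
print("decision values at support vectors:", sv_scores)
```

Geometrically, the closest points here lie on the lines `x + y = 1` and `x + y = 6`, so the margin width should come out near `5 / sqrt(2)`, up to solver tolerance.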

### Key Tuning Parameters
- **`C`:**
  - **Description:** Regularization parameter that controls the trade-off between maximizing the margin and minimizing classification errors.
  - **Impact:** Smaller `C` values increase the margin but allow more misclassifications (higher bias, lower variance); larger `C` values make the margin smaller with fewer misclassifications (lower bias, higher variance).
  - **Default:** `C = 1.0`.
- **`kernel`:**
  - **Description:** Specifies the kernel type to be used in the algorithm (`linear`, `poly`, `rbf`, `sigmoid`).
  - **Impact:** Determines the decision boundary shape. The `linear` kernel is best for linearly separable data; `rbf` and `poly` can handle non-linear relationships.
  - **Default:** `rbf` (Radial Basis Function).
- **`gamma`:**
  - **Description:** Kernel coefficient for `rbf`, `poly`, and `sigmoid` kernels.
  - **Impact:** Controls the influence of individual training examples. Lower `gamma` values mean a larger influence radius (smoother decision boundary); higher `gamma` values lead to a more complex boundary.
  - **Default:** `scale`, which is `1 / (n_features * X.var())`.
- **`degree`:**
  - **Description:** Degree of the polynomial kernel function (only used with `poly` kernel).
  - **Impact:** Determines the flexibility of the decision boundary when using polynomial kernels; higher degrees fit more complex boundaries but raise the risk of overfitting.
  - **Default:** `degree = 3`.
- **`coef0`:**
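  - **Description:** Independent term `r` in the kernel function, e.g. `(gamma * <x, x'> + r)^degree` for `poly` (only used by `poly` and `sigmoid`).
  - **Impact:** For `poly`, controls how strongly the model is influenced by higher-degree terms relative to lower-degree terms in the decision function.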
  - **Default:** `coef0 = 0.0`.
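
To make the effect of `gamma` and `C` more concrete, here is a small sketch on the standardized breast cancer dataset. It checks that `gamma="scale"` gives the same decision values as passing `1 / (n_features * X.var())` explicitly, then counts support vectors for a few `C` values. The particular `C` values and variable names are arbitrary choices for illustration; smaller `C` typically widens the margin and so tends to increase the number of support vectors.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X_demo, y_demo = load_breast_cancer(return_X_y=True)
X_demo = StandardScaler().fit_transform(X_demo)

# gamma="scale" should match the explicit formula 1 / (n_features * X.var()).
manual_gamma = 1.0 / (X_demo.shape[1] * X_demo.var())
svc_scale = SVC(kernel="rbf", gamma="scale").fit(X_demo, y_demo)
svc_manual = SVC(kernel="rbf", gamma=manual_gamma).fit(X_demo, y_demo)
same_boundary = np.allclose(
    svc_scale.decision_function(X_demo), svc_manual.decision_function(X_demo)
)
print(f"manual gamma = {manual_gamma:.4f}, same decision values: {same_boundary}")

# Count support vectors for a few C values (RBF kernel, default gamma).
sv_counts = {}
for C in (0.01, 1.0, 100.0):
    model_c = SVC(kernel="rbf", C=C).fit(X_demo, y_demo)
    sv_counts[C] = int(model_c.n_support_.sum())
    print(f"C={C:>6}: {sv_counts[C]} support vectors, "
          f"training accuracy {model_c.score(X_demo, y_demo):.3f}")
```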

### Pros vs Cons

| Pros                                                  | Cons                                                   |
|-------------------------------------------------------|--------------------------------------------------------|
| Effective for high-dimensional spaces                 | Computationally intensive, especially with large datasets |
| Works well with clear margin of separation            | Choosing the right kernel and hyperparameters can be complex |
| Handles both linear and non-linear classification     | Sensitive to noisy data and outliers                   |
| Flexible through different kernel choices             | SVM models can be difficult to interpret               |
| Supports soft margin for better generalization        | Memory-intensive during training                       |
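
Since choosing a kernel and its hyperparameters is listed as a difficulty, one common way to approach it is a cross-validated grid search. The sketch below is only one possible setup: the grid values, the `f1` scoring choice, and the step names `scale` and `svc` are assumptions for illustration. Putting the scaler inside a `Pipeline` keeps each validation fold from leaking into the scaling statistics.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X_gs, y_gs = load_breast_cancer(return_X_y=True)
X_gs_train, X_gs_test, y_gs_train, y_gs_test = train_test_split(
    X_gs, y_gs, test_size=0.2, random_state=42, stratify=y_gs
)

# Scaling happens inside the pipeline, so it is refit on every CV training fold.
pipe = Pipeline([("scale", StandardScaler()), ("svc", SVC())])

# Illustrative grid: gamma is only searched for the RBF kernel.
param_grid = [
    {"svc__kernel": ["linear"], "svc__C": [0.1, 1, 10]},
    {"svc__kernel": ["rbf"], "svc__C": [0.1, 1, 10],
     "svc__gamma": ["scale", 0.01, 0.1]},
]

search = GridSearchCV(pipe, param_grid, cv=5, scoring="f1")
search.fit(X_gs_train, y_gs_train)

print("best params:", search.best_params_)
print(f"best CV F1: {search.best_score_:.3f}")
print(f"held-out F1: {search.score(X_gs_test, y_gs_test):.3f}")
```

The grid is kept small on purpose: SVM training cost grows quickly with dataset size, so every added grid point is multiplied by the number of folds.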

### Evaluation Metrics
- **Accuracy:**
  - **Description:** Ratio of correct predictions to total predictions.
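  - **Good Value:** Higher is better; above 0.8 is often considered good, but the bar depends on the task and on class balance.
  - **Bad Value:** Below 0.5 on a balanced binary problem is worse than random guessing. On imbalanced data, compare against the majority-class baseline, since always predicting the majority class can already score high.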
- **Precision:**
  - **Description:** Proportion of positive identifications that were actually correct (True Positives / (True Positives + False Positives)).
  - **Good Value:** Higher is better, especially when False Positives are costly.
  - **Bad Value:** Low values indicate many False Positives.
- **Recall (Sensitivity):**
  - **Description:** Proportion of actual positives that were correctly identified (True Positives / (True Positives + False Negatives)).
  - **Good Value:** Higher is better, especially when False Negatives are costly.
  - **Bad Value:** Low values indicate many False Negatives.
- **F1 Score:**
  - **Description:** Harmonic mean of Precision and Recall; balances the trade-off between the two.
  - **Good Value:** Higher is better; values above 0.7-0.8 indicate strong performance.
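  - **Bad Value:** Low values mean that Precision, Recall, or both are low; because the harmonic mean is dominated by the smaller of the two, a high score on one cannot compensate for a poor score on the other.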
- **ROC-AUC:**
  - **Description:** Area under the Receiver Operating Characteristic curve, showing the trade-off between True Positive Rate (TPR) and False Positive Rate (FPR).
  - **Good Value:** Closer to 1 is better; above 0.8 indicates good discrimination between classes.
  - **Bad Value:** Close to 0.5 suggests the model is no better than random guessing.
- **Support Vectors:**
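  - **Description:** Number of training points the fitted model retains as support vectors (available as `n_support_` in scikit-learn); a higher number generally means a more complex decision function and slower prediction.
  - **Good Value:** A small share of the training set can indicate a simpler, more generalizable model.
  - **Bad Value:** When most training points become support vectors, the classes likely overlap heavily, the model is fitting noise, or `C` is very small.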
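
The formulas above can be verified by hand on a tiny example. The labels below are made up for illustration; the sketch derives each metric from the confusion matrix counts and compares it with the corresponding scikit-learn function.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Made-up labels: 4 actual positives, 6 actual negatives.
y_true_demo = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred_demo = np.array([1, 1, 1, 0, 1, 1, 0, 0, 0, 0])

# For binary labels {0, 1}, ravel() yields counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")

# The hand-computed values should agree with scikit-learn's implementations.
print(np.isclose(accuracy, accuracy_score(y_true_demo, y_pred_demo)))
print(np.isclose(precision, precision_score(y_true_demo, y_pred_demo)))
print(np.isclose(recall, recall_score(y_true_demo, y_pred_demo)))
print(np.isclose(f1, f1_score(y_true_demo, y_pred_demo)))
```

Here there are 3 true positives, 2 false positives, 1 false negative, and 4 true negatives, so precision is 3/5, recall is 3/4, and F1 is 2/3.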



In [None]:
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    roc_auc_score,
    roc_curve,
)
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

In [None]:
# Load the breast cancer dataset
data = datasets.load_breast_cancer()
X = data.data
y = data.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

In [None]:
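# Initialize the Support Vector Classifier (SVC).
# probability=True enables predict_proba; scikit-learn fits an extra
# cross-validated calibration step for this, which slows down training.
svc = SVC(kernel='linear', probability=True, random_state=42)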

# Train the model
svc.fit(X_train, y_train)

# Predicted probability of the positive class (label 1, "benign")
y_pred_proba = svc.predict_proba(X_test)[:, 1]

# Predict class labels
y_pred = svc.predict(X_test)

In [None]:
# Evaluate the model using ROC-AUC
roc_auc = roc_auc_score(y_test, y_pred_proba)
print(f'ROC-AUC Score: {roc_auc:.2f}')

# Plot the ROC curve
fpr, tpr, _ = roc_curve(y_test, y_pred_proba)
plt.plot(fpr, tpr, label=f'ROC curve (area = {roc_auc:.2f})')
plt.plot([0, 1], [0, 1], 'k--')  # Diagonal line
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic')
plt.legend(loc='lower right')
plt.show()
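
ROC-AUC only needs a ranking score, not calibrated probabilities. As an alternative sketch (not the approach used in the cell above), the raw output of `decision_function` can be passed to `roc_auc_score`, which avoids the extra calibration cost of `probability=True`. The variable names below are chosen so they do not overwrite the notebook's own variables.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X_full, y_full = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_full, y_full, test_size=0.2, random_state=42
)

# Fit the scaler on the training split only, then apply it to both splits.
scaler_alt = StandardScaler().fit(X_tr)
X_tr, X_te = scaler_alt.transform(X_tr), scaler_alt.transform(X_te)

# probability is not enabled here, so no calibration step is run.
clf_alt = SVC(kernel="linear").fit(X_tr, y_tr)

# Signed scores: larger values mean stronger evidence for class 1.
scores = clf_alt.decision_function(X_te)
auc_from_scores = roc_auc_score(y_te, scores)
print(f"ROC-AUC from decision_function: {auc_from_scores:.3f}")
```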

In [None]:
# Print classification report and accuracy
print(f'Accuracy: {accuracy_score(y_test, y_pred):.2f}')
print('Classification Report:')
print(classification_report(y_test, y_pred))

In [None]:
X = data.data[:, :2]  # Use only the first two features for visualization
y = data.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Standardize the features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Initialize the Support Vector Classifier (SVC)
svc = SVC(
    C=1,
    kernel="linear",
    gamma="scale",  # ignored by the linear kernel; only matters for rbf/poly/sigmoid
    probability=True,
    random_state=42,
)

# Train the model
svc.fit(X_train, y_train)

# Predicted probability of the positive class (label 1, "benign")
y_pred_proba = svc.predict_proba(X_test)[:, 1]

# Predict class labels
y_pred = svc.predict(X_test)

In [None]:
# Plot the decision boundary and support vectors
def plot_svc_decision_boundary(model, X, y):
    plt.figure(figsize=(10, 6))
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.bwr, s=30)

    # Plot decision boundary
    ax = plt.gca()
    xlim = ax.get_xlim()
    ylim = ax.get_ylim()

    # Create grid to evaluate model
    xx, yy = np.meshgrid(
        np.linspace(xlim[0], xlim[1], 100), np.linspace(ylim[0], ylim[1], 100)
    )

    Z = model.decision_function(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    # Plot decision boundary and margins
    plt.contour(
        xx,
        yy,
        Z,
        colors="k",
        levels=[-1, 0, 1],
        alpha=0.5,
        linestyles=["--", "-", "--"],
    )

    # Plot support vectors
    plt.scatter(
        model.support_vectors_[:, 0],
        model.support_vectors_[:, 1],
        s=100,
        linewidth=1,
        facecolors="none",
        edgecolors="k",
    )
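    # Labels come from the global `data` object; plotted values are standardized
    plt.xlabel(f"{data.feature_names[0]} (standardized)")
    plt.ylabel(f"{data.feature_names[1]} (standardized)")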
    plt.title("SVM Decision Boundary with Support Vectors")
    plt.show()


plot_svc_decision_boundary(svc, X_train, y_train)
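
The circled points in the plot can also be inspected numerically. The sketch below rebuilds the same two-feature setup under separate, hypothetical variable names and checks that support vectors are the training points lying on or inside the margin, i.e. with functional margin `y * f(x)` at most about 1 (up to solver tolerance).

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Same setup as the plot: first two features, standardized, linear kernel.
X2d, labels = load_breast_cancer(return_X_y=True)
X2d = X2d[:, :2]
X2d_train, _, labels_train, _ = train_test_split(
    X2d, labels, test_size=0.2, random_state=42
)
X2d_train = StandardScaler().fit_transform(X2d_train)
model_2d = SVC(C=1, kernel="linear").fit(X2d_train, labels_train)

# Functional margin y * f(x), with labels mapped from {0, 1} to {-1, +1}.
signed = 2 * labels_train - 1
functional_margin = signed * model_2d.decision_function(X2d_train)

# Boolean mask marking which training points are support vectors.
is_sv = np.zeros(len(X2d_train), dtype=bool)
is_sv[model_2d.support_] = True

print("support vectors per class:", model_2d.n_support_)
print(f"share of training points: {is_sv.mean():.2%}")
print("largest functional margin among support vectors: "
      f"{functional_margin[is_sv].max():.3f}")
```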