# Support Vector Machine
**Lecture Timeline**
- Introduction
- Dataset Analysis
- Model Building
- Training, Testing, and Evaluation


The goal of the SVM algorithm is to find the decision boundary that best separates an n-dimensional feature space into classes, so that new data points can be assigned to the correct category. This best decision boundary is called a hyperplane.

![image.png](attachment:image.png)


An SVM classifier attempts to draw a straight line separating the two sets of data, and thereby create a model for classification. For two-dimensional data like that shown here, this is a task we could do by hand. But immediately we see a problem: there is more than one possible dividing line that can perfectly discriminate between the two classes!


![image-2.png](attachment:image-2.png)
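To make the ambiguity concrete, here is a minimal sketch on a hypothetical six-point dataset (not the data in the figures). The three candidate lines and the helper names `separates` and `min_distance` are assumptions chosen for illustration. All three lines classify every point correctly, but they differ in how far they sit from the nearest point.

```python
import numpy as np

# Hypothetical toy data: two well-separated clusters in 2D.
X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],   # class -1
              [4.0, 4.0], [4.5, 5.0], [5.0, 4.5]])  # class +1
y = np.array([-1, -1, -1, 1, 1, 1])

def separates(w, b):
    """True if the line w.x + b = 0 puts every point on its correct side."""
    return bool(np.all(y * (X @ w + b) > 0))

def min_distance(w, b):
    """Distance from the line to the closest training point."""
    return float(np.min(np.abs(X @ w + b)) / np.linalg.norm(w))

# Three hand-picked candidate lines, each given as (w, b).
candidates = {
    "diagonal": (np.array([1.0, 1.0]), -6.0),
    "vertical": (np.array([1.0, 0.0]), -3.0),
    "horizontal": (np.array([0.0, 1.0]), -3.0),
}

for name, (w, b) in candidates.items():
    print(name, separates(w, b), round(min_distance(w, b), 3))
```

Every candidate separates the classes, so training accuracy alone cannot choose between them. The distance to the nearest point is the extra criterion SVM uses.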


# Support Vector Machines (SVM)

Support Vector Machine (SVM) is a supervised learning algorithm used for classification and regression tasks. It is particularly well-suited for classification problems where the goal is to find the best boundary that separates different classes.

## Basic Concepts

- **Support Vectors:** Data points closest to the decision boundary (or hyperplane) that are crucial in defining the position and orientation of the hyperplane.
- **Hyperplane:** A decision boundary that separates data points of different classes. In two dimensions, it's a line; in three dimensions, a plane; and in higher dimensions, a hyperplane.
- **Margin:** The distance between the hyperplane and the nearest data points from either class. SVM aims to maximize this margin for better class separation.
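
These three concepts can be inspected on a fitted model. The sketch below assumes scikit-learn's `SVC` and a hypothetical six-point dataset. A very large `C` is used here only to approximate a hard margin on separable data. For a linear SVM with weight vector `w`, the distance from the hyperplane to the nearest points is `1 / ||w||`.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical, linearly separable toy data.
X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],
              [4.0, 4.0], [4.5, 5.0], [5.0, 4.5]])
y = np.array([-1, -1, -1, 1, 1, 1])

# A very large C approximates a hard-margin SVM on separable data.
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

w = clf.coef_[0]        # normal vector of the hyperplane
b = clf.intercept_[0]   # offset of the hyperplane
margin = 1.0 / np.linalg.norm(w)  # hyperplane-to-nearest-point distance

print("Hyperplane: w =", w, "b =", b)
print("Support vectors:\n", clf.support_vectors_)
print("Margin:", margin)
```

Only the support vectors determine `w` and `b`; moving any other point (without crossing the margin) leaves the fitted hyperplane unchanged.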

## Key Components

- **Linear SVM:** Used when data is linearly separable. The goal is to find the hyperplane that best divides the data into two classes.
- **Non-Linear SVM:** Applied when data is not linearly separable. SVM uses kernel functions to implicitly map the data into a higher-dimensional space where a linear separator may exist.
  - **Common Kernels:**
    - **Polynomial Kernel:** `K(x, x') = (x · x' + c)^d`
    - **Radial Basis Function (RBF) Kernel:** `K(x, x') = exp(-γ ||x - x'||^2)`
    - **Sigmoid Kernel:** `K(x, x') = tanh(α (x · x') + c)`
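
As a minimal sketch, the three kernel formulas can be written directly in NumPy. The function names and the default values for `c`, `d`, `gamma`, and `alpha` are assumptions for illustration and are not the defaults of any particular library.

```python
import numpy as np

def polynomial_kernel(x, z, c=1.0, d=3):
    """K(x, z) = (x . z + c)^d"""
    return (np.dot(x, z) + c) ** d

def rbf_kernel(x, z, gamma=0.5):
    """K(x, z) = exp(-gamma * ||x - z||^2)"""
    return np.exp(-gamma * np.sum((x - z) ** 2))

def sigmoid_kernel(x, z, alpha=0.1, c=0.0):
    """K(x, z) = tanh(alpha * (x . z) + c)"""
    return np.tanh(alpha * np.dot(x, z) + c)

x = np.array([1.0, 2.0])
z = np.array([2.0, 0.0])

print("polynomial:", polynomial_kernel(x, z))
print("rbf:", rbf_kernel(x, z))
print("sigmoid:", sigmoid_kernel(x, z))
```

Each function returns a single similarity score for a pair of points. The RBF kernel equals 1 when the two points coincide and decays toward 0 as they move apart, at a rate set by `gamma`.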

## Optimization

- **Objective Function:** SVM training is formulated as an optimization problem that maximizes the margin between classes while penalizing classification errors on the training data.
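
One common way to write this down is the soft-margin primal problem, given here as a sketch. The notation is an assumption not used elsewhere in these notes: labels $y_i \in \{-1, +1\}$, weight vector $w$, offset $b$, slack variables $\xi_i$ that measure how far point $i$ violates the margin, and a penalty weight $C > 0$.

$$
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad
y_i\,(w \cdot x_i + b) \ge 1 - \xi_i,\quad \xi_i \ge 0,\quad i = 1,\dots,n
$$

Minimizing $\lVert w \rVert^2$ is equivalent to maximizing the margin, because the distance from the hyperplane to the nearest points is $1 / \lVert w \rVert$; the sum of slacks is the classification-error term.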

## Regularization

- **C Parameter:** Controls the trade-off between achieving a low training error and minimizing model complexity. A higher C value penalizes misclassification more heavily and pushes the model to classify all training examples correctly, potentially leading to overfitting, while a lower C value allows for some misclassification in favor of a simpler decision boundary with a wider margin.
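
The effect of C can be observed by fitting the same model with several values. This is a minimal sketch assuming scikit-learn and a synthetic, overlapping two-class dataset from `make_blobs`; the cluster centers, spread, and the three C values are arbitrary choices for illustration.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Synthetic two-class data with overlapping clusters (assumed for illustration).
X, y = make_blobs(n_samples=200, centers=[[0, 0], [3, 3]],
                  cluster_std=1.5, random_state=0)

results = {}
for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    n_support = int(clf.n_support_.sum())  # total support vectors over both classes
    train_acc = clf.score(X, y)            # accuracy on the training data
    results[C] = (n_support, train_acc)
    print(f"C={C:>6}: support vectors={n_support}, train accuracy={train_acc:.3f}")
```

A small C tolerates more margin violations, so the margin is wide and many points end up as support vectors. A large C narrows the margin and relies on fewer points.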

## Soft Margin

- **Soft Margin SVM:** Allows for some misclassification to handle cases where the data is not perfectly separable. It introduces a penalty for misclassifications, controlled by the C parameter.
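
The penalty can be computed by hand for a fixed hyperplane. In the sketch below, the four points, the weights `w`, and the offset `b` are hypothetical values, not a fitted model. Each point's slack is `max(0, 1 - y * (w . x + b))`: zero when the point is on the correct side and outside the margin, between 0 and 1 when it is inside the margin, and above 1 when it is misclassified.

```python
import numpy as np

# Hypothetical points and labels (+1 / -1).
X = np.array([[2.0, 2.0], [0.5, 0.5], [-1.0, -1.0], [1.0, 0.0]])
y = np.array([1, 1, -1, -1])

# Hypothetical hyperplane parameters, chosen by hand for illustration.
w = np.array([1.0, 1.0])
b = -0.5
C = 1.0

scores = X @ w + b                          # signed decision values
slack = np.maximum(0.0, 1.0 - y * scores)   # margin violation per point
objective = 0.5 * w @ w + C * slack.sum()   # margin term + C * total penalty

for point, s in zip(X, slack):
    print(point, "slack =", s)
print("objective =", objective)
```

Here the second point lies inside the margin (slack 0.5) and the fourth point is misclassified (slack 1.5), so both contribute to the penalty term scaled by `C`.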

## Applications

- **Classification:** Used in various fields such as text classification, image recognition, and bioinformatics.
- **Regression:** Adapted for regression tasks as Support Vector Regression (SVR), where the goal is to predict continuous values.
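
A minimal SVR sketch, assuming scikit-learn and a synthetic noisy sine curve as the target. The hyperparameter values (`C=10.0`, `epsilon=0.1`) are illustrative choices, not tuned recommendations.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic regression data: a sine curve with Gaussian noise (assumed).
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, size=(80, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(0, 0.1, size=80)

# epsilon sets the width of the tube within which errors are not penalized.
model = SVR(kernel="rbf", C=10.0, epsilon=0.1)
model.fit(X, y)

r2 = model.score(X, y)  # coefficient of determination on the training data
print("Training R^2:", round(r2, 3))
print("Prediction at x=1.5:", model.predict([[1.5]])[0])
```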

## Example Workflow

1. **Data Preparation:** Clean and preprocess your data.
2. **Model Selection:** Choose the appropriate kernel (linear or non-linear) based on the data.
3. **Training:** Train the SVM model on the training dataset.
4. **Tuning:** Adjust hyperparameters like C and kernel parameters to optimize model performance.
5. **Evaluation:** Test the model on the test dataset to evaluate its performance.
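
The five steps above can be sketched end to end with scikit-learn. The dataset (`load_breast_cancer`), the 75/25 split, and the parameter grid are assumptions chosen for illustration; substitute the dataset and ranges that fit your problem.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# 1. Data preparation: load the data and hold out a test set.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# 2. Model selection: scale features, then fit an SVM classifier.
pipeline = make_pipeline(StandardScaler(), SVC())

# 3 and 4. Training and tuning: cross-validated search over kernel, C, and gamma.
param_grid = {
    "svc__kernel": ["linear", "rbf"],
    "svc__C": [0.1, 1, 10],
    "svc__gamma": ["scale", 0.01],
}
search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X_train, y_train)

# 5. Evaluation: score the best model on the held-out test set.
test_accuracy = accuracy_score(y_test, search.predict(X_test))
print("Best parameters:", search.best_params_)
print("Test accuracy:", round(test_accuracy, 3))
```

Scaling is placed inside the pipeline so that each cross-validation fold fits the scaler on its own training portion only, which avoids leaking information from the validation fold.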

## Advantages and Disadvantages

- **Advantages:**
  - Effective in high-dimensional spaces.
  - Works well where the number of dimensions exceeds the number of samples.
  - Relatively robust to overfitting, especially in high-dimensional space, because margin maximization acts as a form of regularization.

- **Disadvantages:**
  - Computationally intensive, especially with large datasets.
  - Choice of kernel and hyperparameters can significantly impact performance.
  - Less interpretable compared to some other models.

## Resources for Further Learning

- **Books:**
  - "Pattern Recognition and Machine Learning" by Christopher Bishop
  - "Machine Learning: A Probabilistic Perspective" by Kevin Murphy

- **Online Courses:**
  - Coursera: [Machine Learning by Andrew Ng](https://www.coursera.org/learn/machine-learning)
  - Udacity: [Intro to Machine Learning with PyTorch and TensorFlow](https://www.udacity.com/course/intro-to-machine-learning-with-pytorch--ud188)

