### Support Vector Machine (SVM)

#### Introduction  
Support Vector Machine (SVM) is a supervised learning algorithm used for classification and regression tasks. It is particularly effective in high-dimensional feature spaces and can still perform well when the number of features exceeds the number of samples.

#### How SVM Works  
SVM finds the optimal **hyperplane** that best separates data points into different classes. The hyperplane is chosen to maximize the **margin**, which is the distance between the hyperplane and the nearest points of each class (the support vectors).

#### Key Concepts  
1. **Hyperplane**: A decision boundary that separates different classes.  
2. **Support Vectors**: Data points closest to the hyperplane that influence its position.  
3. **Margin**: The distance between the hyperplane and the nearest support vectors. SVM aims to maximize this margin for better generalization.  
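
The three concepts above can be made concrete with a small sketch. It assumes the hyperplane is written as \( w \cdot x + b = 0 \) and uses a hypothetical 2-D hyperplane and toy points chosen only for illustration; `signed_distance` is a helper name introduced here, not part of any library.

```python
import math

def signed_distance(w, b, x):
    """Signed distance from point x to the hyperplane w . x + b = 0."""
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm_w

# Hypothetical hyperplane and toy points (assumptions for illustration only).
w, b = [1.0, 1.0], -3.0
points = [[0.0, 1.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]

# Distance of every point to the decision boundary.
distances = [abs(signed_distance(w, b, p)) for p in points]

# The margin is the distance to the nearest point(s); those points
# play the role of support vectors for this fixed hyperplane.
margin = min(distances)
support_vectors = [p for p, d in zip(points, distances) if math.isclose(d, margin)]

print(margin)
print(support_vectors)
```

Here the two points nearest the boundary, one on each side, set the margin; moving any of the farther points would not change it.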

### Types of SVM  
#### 1. **Linear SVM**  
   - Used when data is **linearly separable**, or nearly so (the C parameter lets a few points violate the margin).
   - Finds the straight line (or hyperplane in higher dimensions) that separates the classes.  
   
   **Equation of the hyperplane:**  
   \[
   w \cdot x + b = 0
   \]
   where:
   - \( w \) is the weight vector,
   - \( x \) is the feature vector,
   - \( b \) is the bias term.
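
As a minimal sketch of how the hyperplane equation is used for classification: a point is assigned to one class or the other depending on the sign of \( w \cdot x + b \). The weights, bias, and labels `+1` / `-1` below are hypothetical values chosen for illustration, not a trained model.

```python
def decision_function(w, b, x):
    """Value of w . x + b; zero means x lies exactly on the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict(w, b, x):
    """Assign class +1 or -1 according to the side of the hyperplane."""
    return 1 if decision_function(w, b, x) >= 0 else -1

# Hypothetical weight vector and bias (assumed, not learned from data).
w, b = [2.0, -1.0], 0.5

print(predict(w, b, [1.0, 1.0]))  # 2 - 1 + 0.5 = 1.5, so class 1
print(predict(w, b, [0.0, 2.0]))  # 0 - 2 + 0.5 = -1.5, so class -1
```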

#### 2. **Non-Linear SVM (Using Kernels)**  
   - Used when data is **not linearly separable**.
   - Maps data into a higher-dimensional space where it becomes linearly separable.
   - Uses kernel functions such as:
     - **Polynomial Kernel**: \( (x \cdot x')^d \)
     - **Radial Basis Function (RBF) Kernel**: \( e^{-\gamma ||x - x'||^2} \)
     - **Sigmoid Kernel**: \( \tanh(\alpha x \cdot x' + c) \)
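
The kernels listed above can be sketched in plain Python as follows. The default parameter values are arbitrary assumptions, and `quadratic_feature_map` is a helper introduced here to illustrate the idea of an implicit higher-dimensional mapping: for 2-D inputs, the degree-2 polynomial kernel gives the same number as an ordinary dot product in a 3-D feature space.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def polynomial_kernel(x, y, degree=2):
    """(x . y)^d, the form given in the list above."""
    return dot(x, y) ** degree

def rbf_kernel(x, y, gamma=0.5):
    """exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

def sigmoid_kernel(x, y, alpha=0.1, c=0.0):
    """tanh(alpha * x . y + c)."""
    return math.tanh(alpha * dot(x, y) + c)

def quadratic_feature_map(x):
    """Explicit 3-D feature map matching the degree-2 polynomial kernel in 2-D."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2]

x, x_prime = [1.0, 2.0], [3.0, 4.0]

kernel_value = polynomial_kernel(x, x_prime)  # (1*3 + 2*4)^2 = 121
explicit_value = dot(quadratic_feature_map(x), quadratic_feature_map(x_prime))

print(kernel_value, explicit_value)
print(rbf_kernel(x, x_prime), sigmoid_kernel(x, x_prime))
```

The two printed polynomial values agree up to floating-point rounding, which is why the kernel can stand in for the mapping without ever constructing the higher-dimensional vectors.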

### Hyperparameters of SVM  
1. **C (Regularization Parameter)**  
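   - Controls the trade-off between achieving a low training error and maximizing the margin.
   - A **high C** penalizes misclassified points heavily, giving a smaller margin with fewer training misclassifications (higher risk of overfitting).
   - A **low C** tolerates more misclassified points, giving a larger margin (higher risk of underfitting).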

2. **Gamma (for RBF Kernel)**  
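   - Defines how far the influence of a single training example reaches.
   - **High gamma** → Each example influences only its immediate neighborhood, so the decision boundary bends tightly around individual points; risk of overfitting.
   - **Low gamma** → Each example influences a wide region, giving a smoother decision boundary; risk of underfitting.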
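
A rough numerical sketch of both hyperparameters follows. It assumes the standard soft-margin objective \( \tfrac{1}{2}\|w\|^2 + C \sum_i \max(0,\, 1 - y_i (w \cdot x_i + b)) \), which the text above does not state explicitly, and uses hypothetical toy data and helper names (`soft_margin_objective`, `rbf_similarity`) introduced only for illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def soft_margin_objective(w, b, X, y, C):
    """0.5 * ||w||^2 + C * (sum of hinge losses over the training points)."""
    margin_term = 0.5 * dot(w, w)
    hinge_total = sum(max(0.0, 1.0 - yi * (dot(w, xi) + b)) for xi, yi in zip(X, y))
    return margin_term + C * hinge_total

def rbf_similarity(distance, gamma):
    """RBF kernel value for two points separated by the given distance."""
    return math.exp(-gamma * distance ** 2)

# Hypothetical toy data: the third point lies on the wrong side of the boundary.
w, b = [1.0, 0.0], 0.0
X = [[2.0, 0.0], [-2.0, 0.0], [0.5, 0.0]]
y = [1, -1, -1]

# The same single violation costs far more when C is large.
low_c = soft_margin_objective(w, b, X, y, C=0.1)
high_c = soft_margin_objective(w, b, X, y, C=10.0)
print(low_c, high_c)

# Influence of a training example on a point 2 units away.
wide_reach = rbf_similarity(2.0, gamma=0.1)     # about 0.67
narrow_reach = rbf_similarity(2.0, gamma=10.0)  # effectively zero
print(wide_reach, narrow_reach)
```

With a large C the optimizer is pushed to fix the violation even at the expense of a narrower margin, and with a large gamma a training example has almost no say over points a short distance away.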

### Advantages of SVM  
- Effective in high-dimensional spaces.  
- Works well for both linear and non-linear data (using kernels).  
- Margin maximization acts as regularization, which helps limit overfitting even in high-dimensional datasets.  
- Memory efficient, as the decision function depends only on a subset of the training points (the support vectors).  
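
To illustrate the memory-efficiency point, here is a sketch of a kernel SVM decision function that needs only the support vectors, one coefficient per support vector, and the bias. The coefficient values below are hypothetical (each stands for the product of a learned weight and the class label) and were not obtained by training.

```python
import math

def rbf_kernel(x, y, gamma):
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

def decision_function(x, support_vectors, coefficients, b, gamma):
    """Sum over support vectors only; other training points are not needed."""
    return sum(
        coef * rbf_kernel(sv, x, gamma)
        for sv, coef in zip(support_vectors, coefficients)
    ) + b

# Hypothetical model: one support vector per class (assumed values).
support_vectors = [[0.0, 0.0], [2.0, 2.0]]
coefficients = [1.0, -1.0]
b, gamma = 0.0, 0.5

print(decision_function([0.0, 0.0], support_vectors, coefficients, b, gamma))  # positive side
print(decision_function([2.0, 2.0], support_vectors, coefficients, b, gamma))  # negative side
```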

### Disadvantages of SVM  
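- Training can be slow on large datasets, because the cost of fitting a kernel SVM grows faster than linearly with the number of samples.  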
- Sensitive to the choice of kernel and hyperparameters.  
- Less effective when the classes overlap heavily or the data is very noisy.

