## Support Vector Machine (SVM)
A Support Vector Machine (SVM) is a supervised machine learning algorithm widely used for both classification and regression tasks. For classification, an SVM finds the hyperplane that best separates data points of different classes in feature space. The optimal hyperplane is the one that maximizes the margin, the distance between the decision boundary and the closest points of each class.

## Hard Margin vs. Soft Margin:
Hard Margin SVM: Used when data is linearly separable, meaning the classes can be perfectly divided by a hyperplane. The algorithm requires every data point to lie on the correct side of the margin, with no tolerance for misclassification. This makes the approach sensitive to outliers: a single point that lands among the opposite class can shrink the margin drastically or make the problem infeasible, so that no separating hyperplane exists.

Soft Margin SVM: Used when data is not perfectly separable, allowing some points to fall inside the margin or on the wrong side of the hyperplane to achieve a better overall solution. This approach introduces a penalty term for points that violate the margin, which balances maximizing the margin against minimizing classification errors and makes the model more robust to noise and outliers.

Overall, SVMs handle both linear and non-linear tasks. For non-linear problems they rely on kernel functions, which implicitly map the data into a higher-dimensional space where it becomes more easily separable.
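To illustrate the idea of mapping data into a higher-dimensional space, here is a minimal sketch in plain Python. The 1-D dataset, the explicit feature map `phi`, the matching `kernel` function, and the hand-picked hyperplane are all assumptions chosen for illustration. A kernel SVM would work only with inner products such as `kernel(x, z)` rather than building the mapped vectors explicitly.

```python
def phi(x):
    """Hypothetical feature map from 1-D to 2-D: x -> (x, x^2)."""
    return (x, x * x)


def kernel(x, z):
    """Inner product of phi(x) and phi(z), computed without building phi."""
    return x * z + (x * x) * (z * z)


# Toy 1-D data: the positive class sits on both sides of the negative class,
# so no single threshold on x can separate the two classes.
X = [-2.0, -1.0, 1.0, 2.0]
y = [1, -1, -1, 1]

# In the mapped space the hyperplane x^2 - 2.5 = 0 separates the classes.
# These values are picked by hand, not learned.
w, b = (0.0, 1.0), -2.5

for x_i, y_i in zip(X, y):
    u = phi(x_i)
    f = w[0] * u[0] + w[1] * u[1] + b
    print(f"x={x_i:+.1f}  label={y_i:+d}  f(phi(x))={f:+.1f}")
```

Every point ends up on the side of the hyperplane that matches its label, even though the original 1-D data is not linearly separable.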

### 1. Mathematical Formulation for Classification
In SVM classification, given a dataset of $n$ points $(x_i, y_i)$ where $x_i$ is the feature vector and $y_i$ is the label ($y_i = +1$ or $y_i = -1$), we aim to find a hyperplane defined by:
$$
f(x) = w \cdot x + b = 0
$$
where $w$ is the weight vector and $b$ is the bias. A point $x$ is assigned to class $+1$ if $f(x) \geq 0$ and to class $-1$ otherwise. The goal is to maximize the margin, the distance between the hyperplane and the closest data points from each class.
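As a small sketch of how the hyperplane is used once $w$ and $b$ are known, the following plain-Python snippet evaluates $f(x)$ and classifies a point by its sign. The function names and the example values of `w` and `b` are assumptions made for illustration, not learned parameters.

```python
def decision_function(w, b, x):
    """Return f(x) = w . x + b for a single feature vector x."""
    return sum(w_j * x_j for w_j, x_j in zip(w, x)) + b


def predict(w, b, x):
    """Classify x as +1 or -1 by the side of the hyperplane it falls on."""
    return 1 if decision_function(w, b, x) >= 0 else -1


# Hypothetical 2-D hyperplane: x1 + x2 - 3 = 0
w = [1.0, 1.0]
b = -3.0

print(predict(w, b, [3.0, 2.0]))  # f = 2.0, so the point is on the positive side
print(predict(w, b, [0.5, 1.0]))  # f = -1.5, so the point is on the negative side
```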


### 2. Objective: Maximizing the Margin
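If $w$ and $b$ are scaled so that the closest points of each class satisfy $y_i (w \cdot x_i + b) = 1$, the margin between the two classes has width $\frac{2}{\|w\|}$. Maximizing it is therefore equivalent to minimizing $\frac{1}{2}\|w\|^2$, subject to the constraint that all points are correctly classified and lie outside the margin: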
$$
y_i (w \cdot x_i + b) \geq 1 \quad \forall i
$$
This is the **hard margin** SVM formulation.
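The sketch below checks the hard margin constraint for a candidate hyperplane and computes the resulting margin width $\frac{2}{\|w\|}$. The toy dataset and the values of `w` and `b` are assumptions chosen by hand; the code only verifies feasibility and does not solve the optimization problem.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(u_j * v_j for u_j, v_j in zip(u, v))


def margin_width(w):
    """Width of the margin, 2 / ||w||."""
    return 2.0 / math.sqrt(dot(w, w))


def satisfies_hard_margin(w, b, X, y):
    """True if y_i (w . x_i + b) >= 1 holds for every training point."""
    return all(y_i * (dot(w, x_i) + b) >= 1 for x_i, y_i in zip(X, y))


# Hypothetical linearly separable toy data
X = [[2.0, 2.0], [3.0, 3.0], [1.0, 1.0], [0.0, 1.0]]
y = [1, 1, -1, -1]

w, b = [1.0, 1.0], -3.0  # candidate hyperplane, chosen by hand

print(satisfies_hard_margin(w, b, X, y))  # all four constraints hold
print(margin_width(w))                    # 2 / sqrt(2)

# Halving w and b leaves the hyperplane unchanged but breaks the constraints,
# which is why the margin must be measured under the ">= 1" scaling.
print(satisfies_hard_margin([0.5, 0.5], -1.5, X, y))
```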

### 3. Soft Margin SVM
In cases where data is not perfectly separable, we introduce slack variables $\xi_i$ to allow for some misclassification. The objective then becomes:
$$
\min \frac{1}{2} \|w\|^2 + C \sum_{i=1}^n \xi_i
$$
subject to $y_i (w \cdot x_i + b) \geq 1 - \xi_i$ and $\xi_i \geq 0$ for all $i$, where $C > 0$ is a regularization parameter that controls the trade-off between maximizing the margin and minimizing margin violations. Larger values of $C$ penalize violations more heavily and tend to yield a narrower margin; smaller values tolerate more violations in exchange for a wider margin.
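To make the role of $C$ concrete, the following sketch evaluates the soft margin objective for two hand-picked candidate hyperplanes on a toy dataset with one outlier. The data, the candidates, and the helper names are assumptions for illustration. For a fixed $(w, b)$, the smallest slack that satisfies both constraints is $\xi_i = \max(0, 1 - y_i (w \cdot x_i + b))$, which is what `slacks` computes.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(u_j * v_j for u_j, v_j in zip(u, v))


def slacks(w, b, X, y):
    """Smallest feasible slack for each point given a fixed (w, b)."""
    return [max(0.0, 1.0 - y_i * (dot(w, x_i) + b)) for x_i, y_i in zip(X, y)]


def soft_margin_objective(w, b, X, y, C):
    """Value of 0.5 * ||w||^2 + C * sum of slacks."""
    return 0.5 * dot(w, w) + C * sum(slacks(w, b, X, y))


# Hypothetical toy data; the last point is a negative-class outlier
X = [[2.0, 2.0], [3.0, 3.0], [1.0, 1.0], [0.0, 1.0], [2.5, 1.0]]
y = [1, 1, -1, -1, -1]

wide = ([1.0, 1.0], -3.0)    # wider margin, but the outlier violates it
narrow = ([0.0, 2.0], -3.0)  # narrower margin, no violations

for C in (0.1, 10.0):
    obj_wide = soft_margin_objective(*wide, X, y, C)
    obj_narrow = soft_margin_objective(*narrow, X, y, C)
    better = "wide" if obj_wide < obj_narrow else "narrow"
    print(f"C={C}: wide={obj_wide:.2f}, narrow={obj_narrow:.2f}, lower objective: {better}")
```

With the small $C$ the wide-margin candidate has the lower objective despite its violation; with the large $C$ the violation-free candidate wins.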

### 4. Loss Function for Classification (Hinge Loss)
The loss function commonly used in SVM classification is the **hinge loss**, given by:
$$
\text{Loss} = \sum_{i=1}^n \max(0, 1 - y_i (w \cdot x_i + b))
$$
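The hinge loss is zero for points that are correctly classified with $y_i (w \cdot x_i + b) \geq 1$, and it grows linearly for points that are inside the margin or on the wrong side of the hyperplane. As a result, only points near or beyond the decision boundary influence the solution. At the optimum of the soft margin problem, each slack variable equals the hinge loss of its point, $\xi_i = \max(0, 1 - y_i (w \cdot x_i + b))$, so the soft margin objective amounts to minimizing $\frac{1}{2}\|w\|^2$ plus $C$ times the hinge loss.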
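The three regimes of the hinge loss can be seen in a short sketch. The hyperplane and the three positive-class points are assumptions picked so that one point lies outside the margin, one inside it, and one on the wrong side.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(u_j * v_j for u_j, v_j in zip(u, v))


def hinge_loss(w, b, x, y):
    """Hinge loss of one labeled point: max(0, 1 - y * (w . x + b))."""
    return max(0.0, 1.0 - y * (dot(w, x) + b))


w, b = [1.0, 1.0], -3.0  # hypothetical hyperplane

# Three points that all carry the label +1
points = [
    ([3.0, 3.0], 1),    # f = 3.0: outside the margin, no loss
    ([2.25, 1.25], 1),  # f = 0.5: correct side, but inside the margin
    ([1.0, 1.0], 1),    # f = -1.0: wrong side of the hyperplane
]

losses = [hinge_loss(w, b, x_i, y_i) for x_i, y_i in points]
total = sum(losses)
print(losses, total)
```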


### 5. SVM for Regression (Support Vector Regression - SVR)
In regression, the objective is to find a function $f(x) = w \cdot x + b$ that approximates the target variable as closely as possible within a certain margin of tolerance $\epsilon$. The goal is to ensure that most data points lie within an $\epsilon$-distance of the predicted function, forming a "tube" around the regression line.

### 6. Loss Function for Regression (ε-Insensitive Loss)
The **ε-insensitive loss** function used in SVR ignores errors within the $\epsilon$-margin but penalizes those outside it. This is defined as:
$$
\text{Loss} = \sum_{i=1}^n \max(0, |y_i - (w \cdot x_i + b)| - \epsilon)
$$
This loss function allows the SVM to approximate the target variable while ignoring small errors within the $\epsilon$-tube.
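A minimal sketch of the ε-insensitive loss follows. The 1-D data, the linear model parameters, and the value of `epsilon` are assumptions for illustration; the function simply evaluates the formula above for a fixed $w$ and $b$.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(u_j * v_j for u_j, v_j in zip(u, v))


def epsilon_insensitive_loss(w, b, X, y, epsilon):
    """Sum of max(0, |y_i - (w . x_i + b)| - epsilon) over all points."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        residual = abs(y_i - (dot(w, x_i) + b))
        total += max(0.0, residual - epsilon)
    return total


# Hypothetical 1-D regression data and a fixed model f(x) = 2x + 1
X = [[0.0], [1.0], [2.0]]
y = [1.05, 3.5, 4.0]
w, b = [2.0], 1.0

# Residuals are about 0.05, 0.5 and 1.0. With epsilon = 0.1 the first point
# lies inside the tube and contributes nothing; the other two are penalized
# only by the amount they exceed epsilon.
svr_loss = epsilon_insensitive_loss(w, b, X, y, epsilon=0.1)
print(svr_loss)
```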

### Summary of Loss Functions:
- **Classification (Hinge Loss)**: $\sum_{i=1}^n \max(0, 1 - y_i (w \cdot x_i + b))$
- **Regression (ε-Insensitive Loss)**: $\sum_{i=1}^n \max(0, |y_i - (w \cdot x_i + b)| - \epsilon)$

In both cases, the SVM aims to find an optimal balance between minimizing classification or regression error and maximizing the margin (for classification) or maintaining an error tolerance (for regression).