# Support Vector Regression (SVR) – Mathematical Intuition

Unlike classification, in regression the output $$ y $$ is **continuous**.
The goal of SVR is to find a **best-fit line (or hyperplane)** such that most data points lie within a margin of tolerance.

---

## 1. SVR Hyperplane and Margins

* Best-fit line:
  $$
  f(x) = w^T x + b
  $$

* Margins:
  $$
  f(x) + \epsilon \quad \text{and} \quad f(x) - \epsilon
  $$

Here, $$ \epsilon $$ is the **margin of tolerance** (maximum allowed error).
Most points should lie within this “epsilon tube.”
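
As a minimal sketch of the tube, the snippet below evaluates $$ f(x) $$ and the two margins, then checks whether a point falls between them. The helper names and the values of `w`, `b`, and `epsilon` are illustrative assumptions, not fitted parameters.

```python
def predict(w, b, x):
    """Evaluate f(x) = w^T x + b for a feature vector x."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def tube_bounds(w, b, x, epsilon):
    """Return (lower, upper) margins: f(x) - epsilon and f(x) + epsilon."""
    fx = predict(w, b, x)
    return fx - epsilon, fx + epsilon


def inside_tube(w, b, x, y, epsilon):
    """True when |y - f(x)| <= epsilon, i.e. the point is inside the tube."""
    lower, upper = tube_bounds(w, b, x, epsilon)
    return lower <= y <= upper


# Hypothetical parameters chosen only for illustration.
w, b, epsilon = [2.0], 1.0, 0.5
print(tube_bounds(w, b, [3.0], epsilon))       # margins around f(3) = 7
print(inside_tube(w, b, [3.0], 7.4, epsilon))  # point within the tube
print(inside_tube(w, b, [3.0], 8.0, epsilon))  # point above the tube
```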

---

## 2. Constraints

For each data point $$ (x_i, y_i) $$:

$$
|y_i - (w^T x_i + b)| \leq \epsilon
$$

If the point lies inside the margin → no penalty.
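If it lies outside → introduce slack variables $$ \xi_i, \xi_i^* \geq 0 $$ that measure how far it falls beyond the tube:

* Above the tube: $$ y_i - (w^T x_i + b) > \epsilon $$, so $$ \xi_i = y_i - (w^T x_i + b) - \epsilon $$
* Below the tube: $$ (w^T x_i + b) - y_i > \epsilon $$, so $$ \xi_i^* = (w^T x_i + b) - y_i - \epsilon $$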
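
The slack variables can be computed directly from a residual. The sketch below assumes a fixed candidate `w` and `b` (hypothetical values); in actual training the slacks are solved for jointly with the model.

```python
def slacks(w, b, x, y, epsilon):
    """Return (xi, xi_star): how far y lies above / below the epsilon tube."""
    fx = sum(wj * xj for wj, xj in zip(w, x)) + b
    xi = max(0.0, (y - fx) - epsilon)        # positive only above the upper margin
    xi_star = max(0.0, (fx - y) - epsilon)   # positive only below the lower margin
    return xi, xi_star


# Hypothetical model f(x) = 2x + 1 with epsilon = 0.5, evaluated at x = 3 (f = 7).
for y in (8.0, 7.2, 5.0):
    print(y, slacks([2.0], 1.0, [3.0], y, 0.5))
```

At most one of the two slacks is nonzero for any point, since a point cannot be above and below the tube at once.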

---

## 3. SVR Cost Function

The optimization problem is:

$$
\min_{w, b, \xi, \xi^*} \ \frac{1}{2} \| w \|^2 + C \sum_{i=1}^{n} (\xi_i + \xi_i^*)
$$

subject to:

$$
\begin{aligned}
y_i - (w^T x_i + b) &\leq \epsilon + \xi_i \\
(w^T x_i + b) - y_i &\leq \epsilon + \xi_i^* \\
\xi_i, \ \xi_i^* &\geq 0
\end{aligned}
$$
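
To see how the two terms of the objective combine, the sketch below evaluates it for a given candidate `w` and `b`, setting each slack to its smallest feasible value $$ \max(0, |y_i - f(x_i)| - \epsilon) $$. The function name and data are assumptions for illustration; this evaluates the objective and does not solve the optimization.

```python
def svr_objective(w, b, X, y, epsilon, C):
    """0.5 * ||w||^2 + C * sum of slacks, with each slack at its minimum."""
    flatness = 0.5 * sum(wj * wj for wj in w)
    total_slack = 0.0
    for x_vec, y_i in zip(X, y):
        residual = y_i - (sum(wj * xj for wj, xj in zip(w, x_vec)) + b)
        # xi_i + xi_i^* collapses to a single term because only one can be nonzero.
        total_slack += max(0.0, abs(residual) - epsilon)
    return flatness + C * total_slack


# Toy data: the third point sits 1.0 above the line f(x) = 2x + 1.
X = [[1.0], [2.0], [3.0]]
y = [3.0, 5.0, 8.0]
print(svr_objective([2.0], 1.0, X, y, epsilon=0.5, C=1.0))
print(svr_objective([2.0], 1.0, X, y, epsilon=0.5, C=10.0))
```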

---

## 4. Parameters Recap

* $$ w $$ → weight vector (defines hyperplane)
* $$ b $$ → bias
* $$ \epsilon $$ → margin of tolerance ($$ \epsilon $$-insensitive loss)
* $$ \xi_i, \xi_i^* $$ → slack variables (deviation outside margin)
* $$ C $$ → regularization (trade-off between flatness of hyperplane and error penalty)

---

## 5. Key Idea

* **Inside margin ($\epsilon$-tube):** no penalty
* **Outside margin:** penalty proportional to deviation
* SVR balances **flatness of function** with **tolerance to small errors**
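
The bullets above are often summarized by the $$ \epsilon $$-insensitive loss. A short worked case, with numbers chosen purely for illustration:

$$
L_\epsilon\bigl(y, f(x)\bigr) = \max\bigl(0,\ |y - f(x)| - \epsilon\bigr)
$$

With $$ \epsilon = 0.5 $$, a residual of $$ |y - f(x)| = 0.3 $$ gives $$ L_\epsilon = 0 $$ (inside the tube), while a residual of $$ 1.2 $$ gives $$ L_\epsilon = 1.2 - 0.5 = 0.7 $$ (outside the tube).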

---


# Support Vector Regression (SVR) – Complete Notes

In this video, we are going to discuss the **Support Vector Regression (SVR)** machine learning algorithm.

In our previous video, we have already seen **Support Vector Classifier (SVC)** and we have learned:

- What is the cost function  
- What are the constraints  
- About a new loss function called **hinge loss**  

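The cost function built on this hinge loss involves two quantities:  
- $$ C $$ (how strongly margin violations are penalized)  
- $$ \xi_i $$ (slack: how far point $$ i $$ violates the margin)  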

Our main aim is to reduce this cost function by changing $$ w $$ and $$ b $$.

---

## 1. Regression Problem Statement

For a regression problem:

1. We need to find a **best-fit line**.  
2. We need to find **marginal planes**.  
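3. We have to keep the line as flat as possible (small $$ \| w \| $$) while most points stay between these planes; the distance between the planes is fixed at $$ 2\epsilon $$ along the $$ y $$-axis.  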

### Example:

- **Y-axis:** Price of a house  
- **X-axis:** Size of the house  
- Goal: Predict the price of the house based on its size using SVR.

> This is a **regression problem** because the output $$ y $$ is a **continuous value**.

---

## 2. SVR Best-Fit Line and Margins

The best-fit line can be represented as:

$$
f(x) = w^T x + b
$$

- **Top marginal plane:**  
  $$
  w^T x + b + \epsilon
  $$
- **Bottom marginal plane:**  
  $$
  w^T x + b - \epsilon
  $$

Here, $$ \epsilon $$ is the **margin error**, i.e., the distance from the best-fit line to the top/bottom marginal plane, measured along the $$ y $$-axis.
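
To make the two planes concrete, here is a worked case for the house-price example. The values $$ w = 0.1 $$, $$ b = 20 $$, and $$ \epsilon = 5 $$ (price in thousands, size in square feet) are assumptions for illustration only:

$$
\begin{aligned}
f(1000) &= 0.1 \times 1000 + 20 = 120 \\
\text{top plane} &= 120 + 5 = 125 \\
\text{bottom plane} &= 120 - 5 = 115
\end{aligned}
$$

A house of size 1000 priced at 123 lies inside the tube; one priced at 130 lies above the top plane by 5.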

---

## 3. SVR Constraints

- Most points should lie **within the margin**:  

$$
|y_i - (w^T x_i + b)| \leq \epsilon
$$

- If a point lies **inside the margin**, no penalty is applied.  
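- If a point lies **outside the margin**, define its **deviation** as a slack variable $$ \eta_i \geq 0 $$:

  - Above the top margin:  
    $$
    y_i - (w^T x_i + b) > \epsilon \implies \eta_i = y_i - (w^T x_i + b) - \epsilon
    $$

  - Below the bottom margin:  
    $$
    (w^T x_i + b) - y_i > \epsilon \implies \eta_i = (w^T x_i + b) - y_i - \epsilon
    $$

Here, $$ \eta_i $$ is the **distance outside the margin**. It is a variable solved for during optimization together with $$ w $$ and $$ b $$, not a hyperparameter.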

---

## 4. Cost Function for SVR

The SVR cost function is:

$$
\min_{w, b, \eta} \frac{1}{2} \| w \|^2 + C \sum_{i=1}^{n} \eta_i
$$

subject to $$ |y_i - (w^T x_i + b)| \leq \epsilon + \eta_i $$ and $$ \eta_i \geq 0 $$ for every point.

**Where:**  
- $$ w $$ → Weight vector  
- $$ b $$ → Bias  
- $$ \eta_i $$ → Deviation from the margin  
- $$ C $$ → Hyperparameter controlling trade-off between flatness of the line and penalty for errors  

### Explanation:

1. **Minimize $$ \| w \|^2 $$:** Keep the line as flat as possible.  
2. **Minimize $$ \sum \eta_i $$:** Reduce the total deviation of points outside the margin.  
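
One way to see both terms being minimized together is a small subgradient-descent loop on the one-feature version of the cost. This is a sketch under assumptions: the function names, learning rate, and epoch count are illustrative, and practical solvers typically work on a dual formulation rather than this loop.

```python
def objective(w, b, xs, ys, epsilon, C):
    """0.5 * w^2 + C * sum(max(0, |y - (w*x + b)| - epsilon)) for one feature."""
    slack = sum(max(0.0, abs(y - (w * x + b)) - epsilon) for x, y in zip(xs, ys))
    return 0.5 * w * w + C * slack


def fit_svr_1d(xs, ys, epsilon=0.1, C=1.0, lr=0.01, epochs=2000):
    """Plain subgradient descent on the objective above, starting from w = b = 0."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        grad_w, grad_b = w, 0.0          # gradient of the flatness term 0.5 * w^2
        for x, y in zip(xs, ys):
            r = y - (w * x + b)
            if r > epsilon:              # point above the tube: move the line up
                grad_w -= C * x
                grad_b -= C
            elif r < -epsilon:           # point below the tube: move the line down
                grad_w += C * x
                grad_b += C
            # points inside the tube contribute no gradient
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_svr_1d(xs, ys)
print(w, b, objective(w, b, xs, ys, 0.1, 1.0))
```

Because the flatness term keeps pulling $$ w $$ toward zero, the fitted slope can end up slightly below the slope of the data, with the tube absorbing the difference.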

---

## 5. Epsilon (ε) Tube

- $$ \epsilon $$ defines the **margin of tolerance**.  
- Points **inside the ε-tube**: No penalty  
- Points **outside the ε-tube**: Penalized using $$ \eta_i $$  

> $$ \epsilon $$ is a **hyperparameter** that can be tuned; the $$ \eta_i $$ values are not tuned directly but follow from the fitted line.

---

## 6. Hyperparameters in SVR

1. **$$ \epsilon $$ (epsilon):** Maximum allowed error inside the margin.  
2. **$$ C $$:** Regularization hyperparameter controlling **penalty for deviations**.

The deviations $$ \eta_i $$ of points outside the margin are optimization variables, not hyperparameters.

### Relationship between C and Loss:

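- As $$ C $$ **increases**, the total deviation $$ \sum \eta_i $$ on the training data tends to **decrease**.  
- This is because higher $$ C $$ penalizes deviations more, forcing the model to fit points more closely, at the cost of a less flat line and a higher risk of overfitting.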
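
The effect of $$ C $$ can be illustrated by scoring two fixed candidate lines under a small and a large value. The candidates, data, and function name below are assumptions for illustration; no training is performed.

```python
def objective(w, b, xs, ys, epsilon, C):
    """0.5 * w^2 + C * sum(max(0, |y - (w*x + b)| - epsilon)) for one feature."""
    slack = sum(max(0.0, abs(y - (w * x + b)) - epsilon) for x, y in zip(xs, ys))
    return 0.5 * w * w + C * slack


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
epsilon = 0.5

flat_line = (1.0, 3.0)    # flatter, but leaves several points outside the tube
close_line = (2.0, 1.0)   # steeper, passes through every point

for C in (0.1, 10.0):
    flat_cost = objective(*flat_line, xs, ys, epsilon, C)
    close_cost = objective(*close_line, xs, ys, epsilon, C)
    preferred = "flat" if flat_cost < close_cost else "close"
    print(f"C={C}: flat={flat_cost:.2f}, close={close_cost:.2f}, preferred={preferred}")
```

With the small $$ C $$ the flatter line has the lower cost; with the large $$ C $$ the closely fitting line wins, which matches the trade-off described above.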

---

## 7. Summary of SVR Concept

- SVR finds a **best-fit line** with **marginal planes**.  
- Most points should lie **within the ε-tube**.  
- Points outside the tube are handled with **deviation variables $$ \eta_i $$**.  
- **Cost function** balances **flatness** and **penalty for deviations**.  
- Hyperparameters $$ \epsilon $$ and $$ C $$ control model flexibility and tolerance.

---

## 8. Key Points to Remember

1. SVR is used for **continuous output regression problems**.  
2. The **best-fit line**: $$ w^T x + b $$  
3. **Top margin**: $$ w^T x + b + \epsilon $$  
4. **Bottom margin**: $$ w^T x + b - \epsilon $$  
5. **Deviation outside margins**: $$ \eta_i $$  
6. **Hyperparameter C** controls penalty on deviations.  
7. **Epsilon** defines the width of the margin tube.

---

> By adjusting $$ \epsilon $$ and $$ C $$, SVR can handle **overlapping or noisy data**, making it robust for regression tasks.  

