# Loss Function for Regression
- In general, a loss function is a map $(\hat{y}, y) \mapsto l(\hat{y}, y) \in \mathbf{R}$  
- Regression losses usually depend only on the residual $r = y - \hat{y}$.  
- A loss $l(\hat{y},y)$ is called **distance-based** if it depends only on the residual, i.e. $l(\hat{y}, y) = \psi(y - \hat{y})$ for some function $\psi : \mathbf{R} \to \mathbf{R}$  
## Distance-Based Losses are Translation Invariant  
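- A distance-based loss is **translation invariant**: shifting the prediction and the target by the same amount $a$ leaves the loss unchanged
$$\ell(\hat{y}+a, y+a)=\ell(\hat{y}, y)$$  
- Sometimes the relative error $\frac{y - \hat{y}}{y}$ is a more natural quantity to penalize, but a loss built on it is not translation invariant  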
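
The invariance property can be checked numerically. The following is a minimal sketch, assuming square loss as the distance-based loss and the absolute relative error as the alternative; the function names and the sample values are illustrative and not part of the notes.

```python
def square_loss(y_hat, y):
    """Distance-based loss: depends only on the residual y - y_hat."""
    r = y - y_hat
    return r ** 2

def relative_error_loss(y_hat, y):
    """Absolute relative error; assumes y != 0 (illustrative choice)."""
    return abs((y - y_hat) / y)

y_hat, y, a = 9.0, 10.0, 90.0

# Shifting prediction and target by the same amount keeps the residual fixed.
shift_invariant = square_loss(y_hat + a, y + a) == square_loss(y_hat, y)

# The relative error changes from 1/10 to 1/100 under the same shift.
relative_changes = relative_error_loss(y_hat + a, y + a) != relative_error_loss(y_hat, y)

print(shift_invariant, relative_changes)
```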
## Some Losses for Regression  
- residual: $r = y - \hat{y}$
- $l_2$ loss: $l(r) = r^2$ (not robust)  
- $l_1$ loss: $l(r) = |r|$ (robust but not differentiable at $r = 0$)
- Huber loss: quadratic for $|r| \leq \delta$ and linear for $|r| > \delta$ (robust and differentiable)  
Square loss is much more affected by outliers than absolute loss.  
<div align="center"><img src = "./losses.jpg" width = '500' height = '100' align = center /></div>  
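
The three losses above can be written as functions of the residual. This is a sketch rather than a reference implementation: the notes only state that Huber loss is quadratic near zero and linear in the tails, so the $\frac{1}{2}r^2$ scaling and the default `delta=1.0` below are assumptions (one common convention that makes the two pieces meet smoothly at $|r| = \delta$).

```python
def l2_loss(r):
    """Square loss on the residual."""
    return r ** 2

def l1_loss(r):
    """Absolute loss on the residual."""
    return abs(r)

def huber_loss(r, delta=1.0):
    """Quadratic for |r| <= delta, linear beyond.

    The 1/2 scaling is an assumed convention, chosen so that the value and
    the slope of the two pieces agree at |r| = delta.
    """
    if abs(r) <= delta:
        return 0.5 * r ** 2
    return delta * (abs(r) - 0.5 * delta)

# Compare how each loss grows as the residual becomes an outlier.
table = [(r, l2_loss(r), l1_loss(r), huber_loss(r)) for r in (0.5, 1.0, 10.0)]
for row in table:
    print(row)
```

For a residual of 10, the square loss is 100 while the absolute and Huber losses stay near 10, which is the sense in which the latter two are less sensitive to outliers.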

## Loss Function Robustness
Robustness refers to how strongly a learning algorithm is affected by outliers.  
<div align="center"><img src = "./robustness.jpg" width = '500' height = '100' align = center /></div>  
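
A small illustration of robustness, assuming the simplest possible model: a single constant prediction. The constant that minimizes total square loss is the mean of the targets, and a constant that minimizes total absolute loss is the median. The data below is made up for the example.

```python
from statistics import mean, median

clean = [1.0, 2.0, 3.0, 4.0, 5.0]
with_outlier = [1.0, 2.0, 3.0, 4.0, 100.0]  # last target replaced by an outlier

# Best constant under square loss is the mean; under absolute loss, the median.
mean_shift = mean(with_outlier) - mean(clean)
median_shift = median(with_outlier) - median(clean)

print("shift of square-loss minimizer:  ", mean_shift)
print("shift of absolute-loss minimizer:", median_shift)
```

One outlier moves the square-loss minimizer from 3 to 22, while the absolute-loss minimizer does not move at all.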

# Classification Loss
## Notations  
- For the 0-1 loss, a real-valued function $f : X \to \mathbf{R}$ is turned into a class prediction by its sign  
$$\begin{array}{l}
f(x)>0 \Longrightarrow \text { Predict } 1 \\
f(x)<0 \Longrightarrow \text { Predict }-1
\end{array}$$  
## The Score Function
- Notation  
  - Action Space $\mathcal{A} = \mathbf{R}$
  - Output Space $\mathcal{Y} = \{-1, 1 \}$
  - Real-valued prediction function $f : X \to \mathbf{R}$  
- Definition  
The value $f(x)$ is called the **score** for the input $x$
- In this context, $f$ may be called a **score function**  
- Intuitively, magnitude of the score represents the **confidence of our prediction.**  
## Margin  
- Definition  
The **margin (or functional margin)** for predicted score $\hat{y}$ and true class $y \in \{-1,1\}$ is $y\hat{y}$  
- The margin often looks like $yf (x)$, where $f (x)$ is our score function  
- The margin is a measure of how correct we are  
   - If $y$ and $\hat{y}$ have the same sign, the prediction is correct and the margin is positive.
   - If $y$ and $\hat{y}$ have different signs, the prediction is incorrect and the margin is negative.
   
We want to maximize the **margin**  
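
The relationship between score, prediction, and margin can be sketched as follows. The helper names and example scores are hypothetical, and mapping a score of exactly 0 to class $-1$ is an arbitrary choice made here because the notes leave that case unspecified.

```python
def margin(score, y):
    """Functional margin y * f(x) for a label y in {-1, +1}."""
    return y * score

def predict(score):
    """Sign rule; a score of exactly 0 is mapped to -1 (arbitrary choice)."""
    return 1 if score > 0 else -1

# (score, true label) pairs, made up for illustration
examples = [(2.5, 1), (-0.3, 1), (-4.0, -1), (1.2, -1)]

for score, y in examples:
    m = margin(score, y)
    print(f"score={score:5.1f}  label={y:2d}  margin={m:5.1f}  correct={predict(score) == y}")
```

A positive margin coincides with a correct prediction, and its magnitude grows with the confidence of the score.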


## Margin-Based Losses  
- Most classification losses depend only on the margin  
- Such a loss is called a margin-based loss.
- There is a related concept, the **geometric margin**, in the notes on hard-margin SVM

## Classification Losses: 0−1 Loss  
Empirical risk for 0-1 loss: 
$$\hat{R}_{n}(f)=\frac{1}{n} \sum_{i=1}^{n} 1\left(y_{i} f\left(x_{i}\right) \leqslant 0\right)$$  
- Minimizing the empirical 0−1 risk is not computationally feasible in general  
- $\hat{R}_n(f)$ is non-convex and not differentiable (in fact, discontinuous!)  
- The optimization problem is **NP-Hard**
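
Evaluating the empirical 0-1 risk is easy even though minimizing it is not. A minimal sketch, with made-up scores and labels:

```python
def zero_one_risk(scores, labels):
    """Fraction of examples whose margin y_i * f(x_i) is <= 0."""
    errors = sum(1 for s, y in zip(scores, labels) if y * s <= 0)
    return errors / len(labels)

scores = [2.5, -0.3, -4.0, 1.2]   # f(x_i), illustrative values
labels = [1, 1, -1, -1]           # y_i

risk = zero_one_risk(scores, labels)

# Rescaling every score leaves the risk unchanged: the risk is piecewise
# constant in the scores, so its gradient carries no useful information.
risk_scaled = zero_one_risk([10 * s for s in scores], labels)

print(risk, risk_scaled)
```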


### Zero-One loss:  $l_{0-1} = 1(m \leq 0)$  
<div align="center"><img src = "./01_figure.jpg" width = '500' height = '100' align = center /></div>  

## Hinge Loss:  $l_{hinge} = \max(0, 1 - m)$  
Hinge loss is a convex upper bound on the 0−1 loss. It is not differentiable at $m = 1$.  
<div align="center"><img src = "./hinge.jpg" width = '500' height = '100' align = center /></div>  
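
The upper-bound claim can be checked pointwise. The sketch below compares the two losses on a handful of margins chosen for illustration; it is a spot check, not a proof.

```python
def zero_one_loss(m):
    """1 if the margin is <= 0, else 0."""
    return 1.0 if m <= 0 else 0.0

def hinge_loss(m):
    """max(0, 1 - m): zero once the margin reaches 1."""
    return max(0.0, 1.0 - m)

margins = [-2.0, -0.5, 0.0, 0.5, 1.0, 3.0]

upper_bound_holds = all(hinge_loss(m) >= zero_one_loss(m) for m in margins)

for m in margins:
    print(f"m={m:5.1f}  zero-one={zero_one_loss(m):.1f}  hinge={hinge_loss(m):.1f}")
```

Note that a correctly classified point with margin 0.5 still incurs hinge loss 0.5: the hinge loss keeps pushing until the margin is at least 1.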

## (Soft Margin) Linear Support Vector Machine  
- Hypothesis Space $\mathcal{F}=\left\{f(x)=w^{T} x \mid w \in \mathbf{R}^{d}\right\}$  
- Loss $l(m) = (1 - m)_+ = \max(0, 1 - m)$  
- $l_2$ regularization  
$$\min _{w \in \mathbf{R}^{d}} \sum_{i=1}^{n}\left(1-y_{i} f_{w}\left(x_{i}\right)\right)_{+}+\lambda\|w\|_{2}^{2}$$
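
To make the objective concrete, here is a sketch that evaluates it for a given weight vector. It only computes the objective value and does not solve the minimization; the toy data, `lam=0.1`, and the helper names are assumptions for illustration.

```python
def dot(w, x):
    """Inner product w^T x for plain Python lists."""
    return sum(wi * xi for wi, xi in zip(w, x))

def svm_objective(w, X, y, lam):
    """Sum of hinge losses on the margins plus lam * ||w||_2^2."""
    hinge_sum = sum(max(0.0, 1.0 - yi * dot(w, xi)) for xi, yi in zip(X, y))
    return hinge_sum + lam * dot(w, w)

# Toy, linearly separable data (illustrative)
X = [[1.0, 2.0], [2.0, 0.0], [-1.0, -1.0], [0.0, -2.0]]
y = [1, 1, -1, -1]

obj_zero = svm_objective([0.0, 0.0], X, y, lam=0.1)  # every margin is 0, each hinge term is 1
obj_w = svm_objective([1.0, 1.0], X, y, lam=0.1)     # every margin is >= 1, only the penalty remains

print(obj_zero, obj_w)
```

With $w = 0$ every example contributes a hinge loss of 1, while $w = (1, 1)$ reaches margin at least 1 on all four points and pays only the regularization term $\lambda\|w\|_2^2 = 0.2$.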

## Logistic Loss $l_{logistic} = \log(1 + e^{-m})$  
- Logistic loss is differentiable. Logistic loss always wants more margin (loss never 0)  
<div align="center"><img src = "./logistics.jpg" width = '500' height = '100' align = center /></div>  
What if we substitute the residual for the margin?   
This does not work: $\log(1 + e^{-r})$ decreases as the residual $r$ grows, so a prediction that is further from the target would receive a smaller loss, which is unacceptable.
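
A sketch of the logistic loss as a function of the margin. The two-branch form is an implementation choice made here (not something the notes prescribe) to avoid overflow in `exp` for large negative margins; both branches are algebraically equal to $\log(1 + e^{-m})$.

```python
import math

def logistic_loss(m):
    """log(1 + exp(-m)), evaluated without overflowing for very negative m."""
    if m >= 0:
        return math.log1p(math.exp(-m))
    # For m < 0: log(1 + e^{-m}) = -m + log(1 + e^{m})
    return -m + math.log1p(math.exp(m))

for m in (-5.0, 0.0, 1.0, 5.0):
    print(f"m={m:5.1f}  loss={logistic_loss(m):.6f}")
```

The loss keeps shrinking as the margin grows but stays positive for every margin shown, which is the sense in which logistic loss "always wants more margin".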

## What About Square Loss for Classification?  
- Loss $\ell(f(x), y)=(f(x)-y)^{2}$  
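- Since $y \in \{-1, 1\}$ we have $y^2 = 1$, so this can be written in terms of the margin $m = f(x)y$:  
$$\ell(f(x), y)=(f(x)-y)^{2}=y^{2}(f(x)-y)^{2}=(f(x) y-y^{2})^{2}=(1-f(x) y)^{2}=(1-m)^{2}$$  
- Heavily penalizes outliers (large negative margins), and also penalizes large positive margins, i.e. confidently correct predictions  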
<div align="center"><img src = "./square.jpg" width = '500' height = '100' align = center /></div>  
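
The identity $(f(x) - y)^2 = (1 - m)^2$ can be verified numerically for labels in $\{-1, 1\}$. A minimal sketch with arbitrary scores chosen for illustration:

```python
def square_loss(score, y):
    """Square loss between a real-valued score and a label in {-1, +1}."""
    return (score - y) ** 2

def square_loss_margin(m):
    """The same loss expressed through the margin m = y * f(x)."""
    return (1.0 - m) ** 2

checks = [(s, y) for s in (-3.0, -0.5, 0.0, 2.0, 5.0) for y in (-1, 1)]
identity_holds = all(
    abs(square_loss(s, y) - square_loss_margin(y * s)) < 1e-12 for s, y in checks
)

print(identity_holds)
print(square_loss_margin(5.0))   # a confidently correct prediction is still penalized
print(square_loss_margin(-3.0))  # a confidently wrong prediction
```

A margin of 5 (correct and confident) costs 16, the same as a margin of $-3$, which is why square loss is an awkward fit for classification.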