## Loss Functions:
### I am going to cover a range of loss functions, each with an analogy to show when and where it should be used.
    1- MSE
    2- MAE
    3- Huber
    4- CrossEntropy
    5- Hinge Loss
    6- Focal Loss
    7- Triplet Loss
    8- Contrastive Loss
    9- KL (Kullback-Leibler Divergence)
    10- Dice Loss / Jaccard Loss
    11- Label Smoothing Loss
    12- CTC Loss (Connectionist Temporal Classification)
    13- Wasserstein Loss
    14- Perceptual Loss
    15- InfoNCE Loss

## 1- MSE (Mean Squared Error)
Imagine you are throwing darts, and the bullseye is y_true. Each dart lands a little away from the bullseye, at y_pred. MSE is the average squared distance of the darts from the bullseye. <br><br>
<b>MSE = (1/n) * sum((y_true - y_pred)^2)</b> <br>
<br>

Pros: Smooth gradient, easy to optimize <br>
Cons: Very sensitive to outliers, because squaring amplifies large errors

In [1]:
import numpy as np
def MSE(y_true, y_pred):
    return np.mean((y_true-y_pred)**2)

In [3]:
# Gradient of MSE with respect to y_pred
def grad_MSE(y_true, y_pred):
    return 2*(y_pred-y_true)/len(y_true)
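
A quick way to sanity-check a hand-written gradient is to compare it against a central finite-difference estimate. The sketch below assumes the gradient is taken with respect to y_pred; the names `mse`, `grad_mse`, and `numerical_grad` and the sample values are illustrative, not part of the cells above.

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean squared error over all samples
    return np.mean((y_true - y_pred) ** 2)

def grad_mse(y_true, y_pred):
    # Analytic gradient of MSE with respect to y_pred
    return 2 * (y_pred - y_true) / len(y_true)

def numerical_grad(loss_fn, y_true, y_pred, eps=1e-6):
    # Central finite difference, perturbing one prediction at a time
    grad = np.zeros_like(y_pred, dtype=float)
    for i in range(len(y_pred)):
        step = np.zeros_like(y_pred, dtype=float)
        step[i] = eps
        grad[i] = (loss_fn(y_true, y_pred + step)
                   - loss_fn(y_true, y_pred - step)) / (2 * eps)
    return grad

# Made-up sample values for illustration
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.5, 1.0, 3.2])

analytic = grad_mse(y_true, y_pred)
numeric = numerical_grad(mse, y_true, y_pred)
print(analytic)
print(numeric)
```

If the two printed vectors agree to several decimal places, the analytic gradient has the right sign and scale.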

## 2- MAE (Mean Absolute Error)
Here, unlike MSE, you don't get penalized disproportionately for being way off: the penalty grows only linearly with the size of the miss. <br> <br>
<b>MAE = (1/n) * sum(abs(y_true - y_pred))</b> <br>
<br>
Pros: Robust to outliers <br>
Cons: Not differentiable at zero, and the gradient has constant magnitude, so it does not shrink as the error gets small



In [7]:
def MAE(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

In [8]:
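# Gradient of MAE with respect to y_pred (a subgradient: np.sign returns 0 where the error is exactly zero)
def grad_MAE(y_true, y_pred):
    return np.sign(y_pred-y_true)/len(y_true) # negative if underpredicted, positive if overpredicted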
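
To make the outlier claim concrete, the sketch below compares how much MSE and MAE grow when a single prediction goes badly wrong. The data values and the helper names `mse` and `mae` are made up for illustration.

```python
import numpy as np

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_pred_clean = np.array([1.1, 1.9, 3.2, 3.8, 5.1])

# Same predictions, except the last one is now off by 10
y_pred_outlier = y_pred_clean.copy()
y_pred_outlier[-1] = 15.0

# How many times larger each loss becomes because of the one outlier
mse_growth = mse(y_true, y_pred_outlier) / mse(y_true, y_pred_clean)
mae_growth = mae(y_true, y_pred_outlier) / mae(y_true, y_pred_clean)

print("MSE growth factor:", mse_growth)
print("MAE growth factor:", mae_growth)
```

The MSE growth factor comes out far larger than the MAE one, because the single large error is squared before averaging.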

## 3- Huber Loss
a. Imagine you are riding in a self-driving car and the GPS signal is jittery. Most of the time it is accurate, but sometimes it jumps far off.<br>
b. You want to trust the small errors (behave like MSE for precision)<br>
c. You don’t want to panic over rare, huge blips (behave like MAE for robustness)<br> <br>
<b>Huber is the compromise between MSE and MAE: quadratic for errors up to delta, linear beyond it</b>
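
For reference, the usual piecewise form of the Huber loss for a single error can be written as follows. The symbols $e = y_{true} - y_{pred}$ and the threshold $\delta$ are notation assumed here for illustration; the loss over a batch is the mean of this quantity.

$$
L_\delta(e) =
\begin{cases}
\tfrac{1}{2} e^2 & \text{if } |e| \le \delta \\
\delta \left( |e| - \tfrac{1}{2} \delta \right) & \text{otherwise}
\end{cases}
$$

The two branches meet at $|e| = \delta$ with the same value and the same slope, which is why the loss stays smooth where it switches from quadratic to linear.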

In [15]:
def huber_loss(y_true, y_pred, delta):
    error = np.abs(y_true - y_pred)
    condition = error <= delta
    return np.mean(np.where(condition, 0.5*(error)**2, delta*(error-0.5*delta)))

In [16]:
# Gradient of Huber loss with respect to y_pred
def grad_huber_loss(y_true, y_pred, delta):
    error = y_pred-y_true
    return np.where(np.abs(error)<=delta, error, delta*np.sign(error))/len(y_true)
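
The sketch below shows the practical effect of the two regimes: for small errors the Huber gradient follows the error itself, while for a large blip it is capped at delta, unlike the MSE gradient, which keeps growing. The function names `huber` and `grad_huber`, the sample values, and `delta = 1.0` are illustrative assumptions.

```python
import numpy as np

def huber(y_true, y_pred, delta):
    error = np.abs(y_true - y_pred)
    # Quadratic inside the delta band, linear outside it
    return np.mean(np.where(error <= delta,
                            0.5 * error ** 2,
                            delta * (error - 0.5 * delta)))

def grad_huber(y_true, y_pred, delta):
    # Gradient of the mean Huber loss with respect to y_pred
    error = y_pred - y_true
    return np.where(np.abs(error) <= delta,
                    error,
                    delta * np.sign(error)) / len(y_true)

y_true = np.array([0.0, 0.0, 0.0])
y_pred = np.array([0.5, -0.2, 10.0])  # two small errors and one large blip
delta = 1.0

grad = grad_huber(y_true, y_pred, delta)
mse_grad = 2 * (y_pred - y_true) / len(y_true)  # MSE gradient, for comparison

# Central finite-difference check of the Huber gradient
eps = 1e-6
numeric = np.zeros_like(y_pred)
for i in range(len(y_pred)):
    step = np.zeros_like(y_pred)
    step[i] = eps
    numeric[i] = (huber(y_true, y_pred + step, delta)
                  - huber(y_true, y_pred - step, delta)) / (2 * eps)

print("Huber gradient:", grad)
print("Numerical check:", numeric)
print("MSE gradient:", mse_grad)
```

The last component is where the two losses differ most: the Huber gradient is limited to `delta / n`, while the MSE gradient scales with the size of the blip.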