# Loss functions 

Loss functions are used to train neural networks. A loss function takes an (output, target) pair and computes a single number that estimates how far the network's output is from the desired output: the higher the loss, the further the output is from the target. It is a key part of training because it guides how the weights are updated. The optimizer adjusts the weights in the direction that reduces the loss, typically by following its gradient. The goal of training is to minimize this loss value.
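
As a minimal sketch of that idea (not a real neural network), the loop below assumes a one-weight model $\hat{y} = w x$ on synthetic data generated with $w = 3$. It uses the mean squared error as the loss and applies plain gradient descent: at every step the weight moves against the gradient of the loss, so the loss shrinks. The data, learning rate, and number of steps are arbitrary choices made only for illustration.

```python
import numpy as np

# Synthetic data (illustrative assumption): the desired output is y = 3 * x
rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 3.0 * x

w = 0.0               # single trainable weight, initialised to 0
learning_rate = 0.1
losses = []

for step in range(100):
    y_hat = w * x                            # model output
    loss = np.mean((y - y_hat) ** 2)         # MSE between output and target
    grad = np.mean(-2.0 * x * (y - y_hat))   # derivative of the loss w.r.t. w
    w -= learning_rate * grad                # step against the gradient
    losses.append(loss)

print(f"w = {w:.4f}, loss at step 0 = {losses[0]:.4f}, final loss = {losses[-1]:.2e}")
```

The weight approaches 3 and the recorded loss decreases toward 0, which is exactly the behaviour described above: the loss value is what tells the update rule which way to move.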

Different loss functions:

   1. Mean Squared Error (MSE)
   2. Mean Absolute Error (MAE)
   3. Huber Loss
   4. Cross Entropy Loss
   5. Binary Cross Entropy Loss
   6. Kullback-Leibler Divergence Loss
   7. Hinge Loss

```python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
```

### 1. Mean Squared Error (MSE)

MSE is one of the most commonly used loss functions for regression. It is the average of the squared differences between the target values and the predicted values, and it is a common measure of the quality of an estimator. It is always non-negative, and values closer to zero are better. Because each error is squared, large errors are penalized much more heavily than small ones.

Equation:

$$MSE = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y_i})^2$$

where $y_i$ is the target value and $\hat{y_i}$ is the predicted value.

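```python
# example
def mse_loss(y_true, y_pred):
    # mean of the squared differences between targets and predictions
    return np.mean(np.power(y_true - y_pred, 2))

# random inputs, so the printed value differs from run to run
y_true = np.random.normal(0, 1, 100)
y_pred = np.random.normal(0, 1, 100)
print(mse_loss(y_true, y_pred))
```

Output:

```
1.6584520478375584
```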
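
A small hand-checkable case helps confirm the formula. The values below are made up for illustration, and the comments show the arithmetic. The gradient line goes slightly beyond the text: differentiating the MSE with respect to each prediction gives $\frac{2}{n}(\hat{y_i} - y_i)$, which is the signal training uses to adjust the weights.

```python
import numpy as np

def mse_loss(y_true, y_pred):
    # Same definition as above: mean of the squared differences
    return np.mean(np.power(y_true - y_pred, 2))

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# errors: 0.5, -0.5, 0.0, -1.0  ->  squared: 0.25, 0.25, 0.0, 1.0
# sum = 1.5, divided by n = 4  ->  0.375
print(mse_loss(y_true, y_pred))  # 0.375

# gradient of the MSE w.r.t. each prediction: 2 * (y_pred - y_true) / n
grad = 2 * (y_pred - y_true) / y_true.size
print(grad)  # values: -0.25, 0.25, 0.0, 0.5
```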


### 2. Mean Absolute Error (MAE)

MAE is the average of the absolute differences between the target values and the values predicted by the model. Every individual difference has the same weight, so MAE grows linearly with the size of an error and is less sensitive to outliers than MSE.

Equation:

$$MAE = \frac{1}{n}\sum_{i=1}^{n}|y_i - \hat{y_i}|$$

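```python
# example
def mae_loss(y_true, y_pred):
    # mean of the absolute differences between targets and predictions
    return np.mean(np.abs(y_true - y_pred))

# random inputs, so the printed value differs from run to run
y_true = np.random.normal(0, 1, 100)
y_pred = np.random.normal(0, 1, 100)
print(mae_loss(y_true, y_pred))
```

Output:

```
1.024812570272987
```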
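
To see the practical difference between MAE and MSE, the sketch below (with made-up values) scores the same predictions twice, the second time with one prediction off by 10. Because MAE grows linearly with each error while MSE squares it, the single outlier multiplies MAE by 5.5 but MSE by 67.

```python
import numpy as np

def mse_loss(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

def mae_loss(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# Same predictions, except the last one is off by 10 (an outlier error)
y_pred_outlier = np.array([2.5, 0.0, 2.0, 17.0])

print(mae_loss(y_true, y_pred), mse_loss(y_true, y_pred))                  # 0.5 0.375
print(mae_loss(y_true, y_pred_outlier), mse_loss(y_true, y_pred_outlier))  # 2.75 25.125
```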


### 3. Huber Loss

Huber loss is a loss function used in robust regression because it is less sensitive to outliers in the data than the squared error loss. It combines MSE and MAE: it is quadratic for small errors and linear for large ones, with the threshold $\delta$ deciding where one regime switches to the other.

Equation:

$$L_{\delta}(y, f(x)) = \begin{cases} \frac{1}{2}(y - f(x))^2 & \text{for } |y - f(x)| \leq \delta \\ \delta|y - f(x)| - \frac{1}{2}\delta^2 & \text{otherwise} \end{cases}$$

where $y$ is the target value, $f(x)$ is the predicted value and $\delta$ is the threshold.

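```python
# example
def huber_loss(y_true, y_pred, delta=1.0):
    error = y_true - y_pred
    abs_error = np.abs(error)
    # quadratic for |error| <= delta, linear beyond the threshold
    quadratic = 0.5 * np.power(error, 2)
    linear = delta * (abs_error - 0.5 * delta)
    return np.mean(np.where(abs_error <= delta, quadratic, linear))

# random inputs, so the printed value differs from run to run
y_true = np.random.normal(0, 1, 100)
y_pred = np.random.normal(0, 1, 100)
print(huber_loss(y_true, y_pred))
```

Output:

```
0.8757908259200415
```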
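
The piecewise definition is easier to see element by element. The sketch below uses a hypothetical helper `huber_elementwise` that applies the equation to each error without averaging. With $\delta = 1$, errors up to 1 get the quadratic value $\frac{1}{2}e^2$, larger errors grow only linearly, and the two branches meet at $|e| = \delta$ (both give $\frac{1}{2}\delta^2$), so the loss is continuous there. With a very large $\delta$ every error falls in the quadratic branch, so Huber matches half the squared error.

```python
import numpy as np

def huber_elementwise(error, delta=1.0):
    # Per-element Huber loss (hypothetical helper, no averaging)
    abs_err = np.abs(error)
    quadratic = 0.5 * error ** 2
    linear = delta * abs_err - 0.5 * delta ** 2
    return np.where(abs_err <= delta, quadratic, linear)

errors = np.array([0.5, 1.0, 3.0, 10.0])

print(huber_elementwise(errors))               # values: 0.125, 0.5, 2.5, 9.5
print(0.5 * errors ** 2)                       # values: 0.125, 0.5, 4.5, 50.0
print(np.abs(errors))                          # values: 0.5, 1.0, 3.0, 10.0
print(huber_elementwise(errors, delta=100.0))  # every |error| <= 100, so equal to 0.5 * errors**2
```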


### 4. Cross Entropy Loss

Cross entropy loss, also called log loss, measures the performance of a classification model whose outputs are probabilities between 0 and 1. The loss increases as the predicted probability diverges from the actual label. For example, predicting a probability of 0.012 for a class whose actual label is 1 is a bad prediction and results in a high loss, while a perfect model has a loss of 0. Cross entropy loss is used when there are two or more classes.

Equation:

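$$L_{CE} = -\sum_{i=1}^{n}y_i\log(\hat{y_i})$$

where $n$ is the number of classes, $y_i$ is the target value for class $i$ (1 for the true class and 0 otherwise when the target is one-hot) and $\hat{y_i}$ is the predicted probability for class $i$.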
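
With a one-hot target, every term in the sum vanishes except the one for the true class, so the loss reduces to $-\log(\hat{y})$, where $\hat{y}$ is the probability assigned to the true class. The snippet below tabulates this for a few probabilities; `ce_one_hot` is a hypothetical helper name used only here. A probability of 0.012 gives a loss of about 4.42, 0.8 gives about 0.22, and the loss approaches 0 as the probability approaches 1.

```python
import numpy as np

def ce_one_hot(p_true):
    # Cross entropy with a one-hot target: -sum(y_i * log(p_i)) = -log(p_true)
    return -np.log(p_true)

for p in [0.012, 0.5, 0.8, 0.99]:
    print(f"p(true class) = {p:<5}  loss = {ce_one_hot(p):.4f}")
```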

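```python
# example
def cross_entropy_loss(y_true, y_pred, eps=1e-12):
    y_true = np.asarray(y_true, dtype=float)
    # clip predictions away from 0 so that log(0) = -inf cannot occur
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0)
    # sum over the classes, matching the equation above
    return -np.sum(y_true * np.log(y_pred))

y_true = [0, 0, 1, 0, 0]
y_pred = [0.05, 0.05, 0.8, 0.05, 0.05]
print(f'Cross entropy loss: {cross_entropy_loss(y_true, y_pred):.4f}')
```

Output:

```
Cross entropy loss: 0.2231
```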
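
Classifiers usually output raw scores (logits) for a whole batch rather than ready-made probabilities. A common approach, sketched here as an assumption rather than something the text prescribes, is to convert the logits into log-probabilities with a numerically stable log-softmax, take the negative log-probability of each sample's true class, and average over the batch. The function name `softmax_cross_entropy` and the integer-label format are illustrative choices.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean cross entropy over a batch (illustrative sketch).

    logits: array of shape (batch, classes) with raw scores
    labels: integer class indices of shape (batch,)
    """
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels)
    # Subtract the row-wise max before exponentiating (log-sum-exp trick):
    # softmax is unchanged, but np.exp can no longer overflow.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # With one-hot targets the sum over classes keeps only the true class
    per_sample = -log_probs[np.arange(len(labels)), labels]
    # Average over the batch
    return per_sample.mean()

logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3],
                   [1000.0, 0.0, -1000.0]])  # naive np.exp(1000.0) would overflow
labels = np.array([0, 1, 0])
print(softmax_cross_entropy(logits, labels))
```

Subtracting the maximum does not change the softmax, because it scales the numerator and the denominator by the same factor $e^{-\max}$. Without it, the third row would produce `inf` values and the loss would not be finite.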
