In [1]:
import torch 
import torch.nn as nn

Source: https://neptune.ai/blog/pytorch-loss-functions

1.) The Mean Absolute Error (MAE), also called L1 Loss, computes the average of the absolute differences between actual values and predicted values.

It measures the size of the errors in a set of predictions without regard to their direction (sign). The absolute values are necessary because otherwise negative errors could cancel out positive ones.

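The PyTorch L1 Loss (with its default `reduction='mean'`) is expressed as:

$loss(x,y) = \frac{1}{N}\sum_{i=1}^{N} |x_i - y_i|$

where $x_i$ represents the actual value, $y_i$ the predicted value, and $N$ the number of elements.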
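As a quick sanity check, the mean of absolute differences can be computed by hand and compared with `nn.L1Loss`, which averages over all elements by default (`reduction='mean'`). The tensors below are made-up illustrative values, not data from this notebook.

```python
import torch
import torch.nn as nn

# Illustrative values (not taken from the cells in this notebook)
actual = torch.tensor([1.0, 2.0, 3.0, 4.0])
predicted = torch.tensor([1.5, 1.0, 3.0, 6.0])

# Mean of the absolute differences, written out by hand
manual_mae = (actual - predicted).abs().mean()

# nn.L1Loss with its default reduction='mean'
builtin_mae = nn.L1Loss()(predicted, actual)

print(manual_mae.item(), builtin_mae.item())  # both 0.875
```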

When could it be used?

Regression problems, especially when the distribution of the target variable has outliers, i.e. very small or very large values far from the mean. Because each error contributes only linearly, MAE is considered more robust to outliers than MSE.
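To illustrate the robustness claim, the sketch below compares L1 and MSE on a made-up batch with four small errors and one outlier. The outlier raises the L1 loss linearly, while under MSE its squared error dominates the average.

```python
import torch
import torch.nn as nn

target = torch.zeros(5)
# Four small errors of 0.1 plus one outlier error of 10 (illustrative values)
pred = torch.tensor([0.1, 0.1, 0.1, 0.1, 10.0])

l1 = nn.L1Loss()(pred, target)    # (4 * 0.1 + 10) / 5
mse = nn.MSELoss()(pred, target)  # (4 * 0.01 + 100) / 5

print(f"L1: {l1.item():.3f}, MSE: {mse.item():.3f}")
```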

In [2]:
pred = torch.randn(3, 5, requires_grad=True) ### Your model's output.
target = torch.randn(3, 5) ### Ground truth data.

mae_loss = nn.L1Loss()
output = mae_loss(pred, target)
output.backward()

print('pred: ', pred)
print('pred shape: ', pred.shape)
print('target: ', target)
print('output: ', output)

pred:  tensor([[-2.2341, -1.3957,  0.2228, -0.7950,  1.0993],
        [ 0.1686,  0.0910, -0.8216,  0.6300, -0.6463],
        [-1.2265, -0.0884,  0.6268, -0.7210, -0.6643]], requires_grad=True)
pred shape:  torch.Size([3, 5])
target:  tensor([[ 0.2543, -0.9881, -1.7214,  2.4189, -0.5889],
        [-0.9297, -0.6787, -1.2995, -1.0962,  1.9567],
        [-0.5292,  1.0484, -0.7898, -1.1584, -0.2697]])
output:  tensor(1.3667, grad_fn=<L1LossBackward>)


2.) The Mean Squared Error (MSE), also called L2 Loss, computes the average of the squared differences between actual values and predicted values.

The PyTorch MSE Loss is always non-negative, regardless of the signs of the actual and predicted values. Training tries to minimize it; a perfect value is 0.0.

Squaring means that large mistakes are penalized much more heavily than small ones. If the model is off by 100, the squared error is 10,000; if it is off by 0.1, the squared error is 0.01. This punishes the model for large mistakes while tolerating small ones.
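The arithmetic above can be checked with `nn.MSELoss(reduction='none')`, which returns the per-element squared errors instead of averaging them. The two error sizes are the illustrative 100 and 0.1 from the paragraph.

```python
import torch
import torch.nn as nn

target = torch.tensor([0.0, 0.0])
pred = torch.tensor([100.0, 0.1])  # one error of 100, one error of 0.1

# reduction='none' keeps each squared error instead of taking the mean
per_element = nn.MSELoss(reduction='none')(pred, target)
print(per_element)  # roughly [10000.0, 0.01]
```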

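The PyTorch L2 Loss (with its default `reduction='mean'`) is expressed as:

$loss(x,y) = \frac{1}{N}\sum_{i=1}^{N} (x_i - y_i)^2$

where $x_i$ represents the actual value, $y_i$ the predicted value, and $N$ the number of elements.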
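The same hand check works for MSE: `nn.MSELoss` also averages over all elements by default. The values are illustrative.

```python
import torch
import torch.nn as nn

# Illustrative values (not taken from the cells in this notebook)
actual = torch.tensor([1.0, 2.0, 3.0, 4.0])
predicted = torch.tensor([1.5, 1.0, 3.0, 6.0])

# Mean of the squared differences, written out by hand
manual_mse = ((actual - predicted) ** 2).mean()

# nn.MSELoss with its default reduction='mean'
builtin_mse = nn.MSELoss()(predicted, actual)

print(manual_mse.item(), builtin_mse.item())  # both 1.3125
```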

When could it be used?

MSE is the most common choice of loss function for regression problems in PyTorch.

In [3]:
pred = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5)
mse_loss = nn.MSELoss()
output = mse_loss(pred, target)
output.backward()

print('pred: ', pred)
print('pred shape: ', pred.shape)
print('target: ', target)
print('output: ', output)

pred:  tensor([[-0.4049,  1.5857,  0.0674,  0.0907,  1.7957],
        [ 0.2795,  2.1948,  0.6551, -0.5793,  0.6910],
        [ 1.0747, -1.5500, -1.2251,  0.1091, -0.4304]], requires_grad=True)
pred shape:  torch.Size([3, 5])
target:  tensor([[ 0.8597,  0.0467,  0.7292, -1.5397,  0.0193],
        [ 0.6137, -0.8175,  1.4498, -3.1394,  0.5444],
        [ 0.9343, -0.6699,  0.5403,  0.6300,  1.3114]])
output:  tensor(2.2553, grad_fn=<MseLossBackward>)


3.) The Negative Log-Likelihood Loss function (NLL) is used with models whose output layer turns scores into probabilities with the softmax function. In PyTorch, `nn.NLLLoss` expects log-probabilities as input, which is why it is usually paired with `nn.LogSoftmax`. Softmax is an activation function that calculates the normalized exponential function of every unit in the layer.

The Softmax function is expressed as:

$S(f)_k = \frac{e^{f_k}}{\sum_{j=1}^{N} e^{f_j}}$

where $f_j$ is the raw score of unit $j$ among the $N$ units in the layer.

The function takes an input vector of size N and transforms the values so that each one lies between 0 and 1. Furthermore, it normalizes the output so that the N values of the vector sum to 1.
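A minimal sketch of softmax written out by hand (exponentiate, then divide by the sum of the exponentials) and compared with `torch.softmax`; the three scores are arbitrary illustrative values.

```python
import torch

scores = torch.tensor([2.0, 1.0, 0.1])  # illustrative raw scores f_j

# Softmax by hand: exponentiate each score, then normalize by the total
manual = torch.exp(scores) / torch.exp(scores).sum()
builtin = torch.softmax(scores, dim=0)

print(manual)        # every entry lies strictly between 0 and 1
print(manual.sum())  # sums to 1, up to floating-point rounding
```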

The "negative" in NLL comes from negating the logarithm: probabilities (or likelihoods) lie between zero and one, and the logarithm of a value in this range is negative (or zero). Negating it makes the loss value non-negative.

In NLL, minimizing the loss function leads to better outputs. The negative log-likelihood comes from maximum likelihood estimation (MLE): we try to maximize the model's log-likelihood, which is the same as minimizing the NLL.

In NLL, the model is penalized for assigning a small probability to the correct class and rewarded for assigning it a high probability. The logarithm does the punishing: $-\log p$ grows without bound as $p$ approaches 0.
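To see how the logarithm scales the punishment, the sketch below evaluates $-\log p$ for a few illustrative probabilities of the correct class. The penalty grows from about 0.01 at $p = 0.99$ to about 4.6 at $p = 0.01$.

```python
import math

# Loss contribution -log(p) when the correct class receives probability p
probabilities = [0.99, 0.9, 0.5, 0.1, 0.01]
penalties = {p: -math.log(p) for p in probabilities}

for p, penalty in penalties.items():
    print(f"p = {p}: -log(p) = {penalty:.3f}")
```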

NLL cares not only about whether the prediction is correct, but also about whether the model makes it with high confidence.

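The PyTorch NLL Loss (with its default `reduction='mean'`) is expressed as:

$loss(x,y) = -\frac{1}{N}\sum_{n=1}^{N} \log y_{n,x_n}$

where $x_n$ represents the actual class of sample $n$, $y_{n,x_n}$ the predicted probability of that class, and $N$ the batch size. Note that `nn.NLLLoss` receives the log-probabilities $\log y$ directly as its input.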
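The loss can be reproduced by hand: pick the log-probability each sample assigns to its true class, negate it, and average over the batch, which is what `nn.NLLLoss` does with its default `reduction='mean'`. The random scores and `torch.manual_seed(0)` are illustrative; the target `[1, 0, 4]` mirrors the cell below.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(3, 5)        # N = 3 samples, C = 5 classes
target = torch.tensor([1, 0, 4])  # true class index for each sample

log_probs = torch.log_softmax(logits, dim=1)

# Log-probability assigned to each sample's true class
picked = log_probs[torch.arange(3), target]
manual_nll = -picked.mean()

builtin_nll = nn.NLLLoss()(log_probs, target)
print(manual_nll.item(), builtin_nll.item())  # the two values match
```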

When could it be used?

Multi-class classification problems

In [4]:
# Input size (N x C) = 3 x 5
pred = torch.randn(3, 5, requires_grad=True) ### Your model outputs.
# every element in target should have 0 <= value < C
target = torch.tensor([1, 0, 4])

m = nn.LogSoftmax(dim=1)
nll_loss = nn.NLLLoss()
print('log softmax m(pred): ', m(pred))
output = nll_loss(m(pred), target)
output.backward()

print('pred: ', pred)
print('target: ', target)
print('output: ', output)

log softmax m(pred):  tensor([[-2.2447, -1.5042, -1.6114, -1.0294, -2.1627],
        [-2.2217, -1.5101, -2.1768, -2.3715, -0.7680],
        [-3.8714, -1.4713, -2.6377, -0.5809, -2.1320]],
       grad_fn=<LogSoftmaxBackward>)
pred:  tensor([[-0.0477,  0.6928,  0.5856,  1.1675,  0.0343],
        [-0.0233,  0.6884,  0.0217, -0.1731,  1.4304],
        [-1.3067,  1.0935, -0.0730,  1.9839,  0.4328]], requires_grad=True)
target:  tensor([1, 0, 4])
output:  tensor(1.9526, grad_fn=<NllLossBackward>)


4.) The Categorical Cross-Entropy Loss Function measures the difference between two probability distributions over the same set of outcomes or random variables.

It produces a score that summarizes the average difference between the predicted and actual distributions. Training tries to minimize this score; cross-entropy is non-negative but has no upper bound, and a perfect value is 0.

Other loss functions, like the squared loss, also punish incorrect predictions, but Cross-Entropy penalizes especially heavily for being very confident and wrong.

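Cross-Entropy punishes incorrect but confident predictions, as well as correct but less confident predictions. In this respect it behaves exactly like the Negative Log-Likelihood Loss: in PyTorch, `nn.CrossEntropyLoss` combines `nn.LogSoftmax` and `nn.NLLLoss` in a single class, so it applies NLL directly to raw, unnormalized scores (logits).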
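The sketch below compares the cross-entropy of three made-up predictions for a single sample whose true class is 0. With these illustrative scores, the losses come out to roughly 0.04 (confident and right), 0.55 (hesitant but right) and 4.04 (confident and wrong).

```python
import torch
import torch.nn as nn

ce = nn.CrossEntropyLoss()
target = torch.tensor([0])  # the true class of this single sample is 0

# Illustrative raw scores (logits) for one sample over 3 classes
cases = {
    "confident and right": torch.tensor([[4.0, 0.0, 0.0]]),
    "hesitant but right": torch.tensor([[1.0, 0.0, 0.0]]),
    "confident and wrong": torch.tensor([[0.0, 4.0, 0.0]]),
}

losses = {name: ce(logits, target).item() for name, logits in cases.items()}
for name, value in losses.items():
    print(f"{name:20s} loss = {value:.3f}")
```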

The Cross-Entropy function has a wide range of variants, of which the most common type is the Binary Cross-Entropy (BCE). The BCE Loss is mainly used for binary classification models; that is, models having only 2 classes. 

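The PyTorch Cross-Entropy Loss (for a single sample) is expressed as:

$loss(x,y) = -\sum_{c} x_c \log y_c$

where $x_c$ represents the true probability of class $c$ and $y_c$ represents the predicted probability of class $c$. With a hard label, $x$ is one-hot, so the sum reduces to $-\log y$ of the true class; `nn.CrossEntropyLoss` then averages this over the batch by default.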
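A minimal check that `nn.CrossEntropyLoss` on raw scores gives the same value as `nn.LogSoftmax` followed by `nn.NLLLoss`, and as the formula $-\sum_c x_c \log y_c$ with one-hot true distributions averaged over the batch. The random scores are illustrative; the target `[1, 3, 3]` mirrors the recorded output below.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(3, 5)        # raw, unnormalized scores
target = torch.tensor([1, 3, 3])  # class indices

ce = nn.CrossEntropyLoss()(logits, target)

# Same value via LogSoftmax followed by NLLLoss
log_probs = nn.LogSoftmax(dim=1)(logits)
nll = nn.NLLLoss()(log_probs, target)

# Same value via -sum_c x_c log y_c, with one-hot x, averaged over the batch
one_hot = nn.functional.one_hot(target, num_classes=5).float()
formula = -(one_hot * log_probs).sum(dim=1).mean()

print(ce.item(), nll.item(), formula.item())  # all three agree
```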

When could it be used?

Multi-class classification tasks, for which it is the standard loss function in PyTorch.
Creating confident models, where predictions are accurate and made with high probability.

In [5]:
pred = torch.randn(3, 5, requires_grad=True) ### Your model's prediction (raw scores). Does not need to be normalized.
target = torch.empty(3, dtype=torch.long).random_(5) ### Integer class indices in [0, 5) (total classes = 5)
# target = torch.empty(3, dtype=torch.long).random_(9) ### Indices >= 5 are out of range and raise an error
# target = torch.randint(-5, 5, (3,)) ### Negative class indices raise an error
# target = torch.randn(3) ### Float targets of shape (3,) are not valid class indices

print('pred: ', pred)
print('target: ', target)

cross_entropy_loss = nn.CrossEntropyLoss()
output = cross_entropy_loss(pred, target)
output.backward()

print('output: ', output)

pred:  tensor([[-0.3467, -1.3341, -1.4863,  0.4818, -0.3365],
        [ 0.6944,  0.1601, -1.5800, -1.4331, -1.1954],
        [ 0.7142,  0.8041,  0.2729, -0.1831,  0.9941]], requires_grad=True)
target:  tensor([1, 3, 3])
output:  tensor(2.5965, grad_fn=<NllLossBackward>)


5.) The Binary Cross-Entropy Function computes the loss for problems with only two possible outcomes, such as the presence or absence of a feature.

The PyTorch Binary Cross-Entropy Loss is expressed as:

$BCE(t,p) = -\frac{1}{N}\sum_{i=1}^{N}\left[t_i \log(p_i) + (1-t_i)\log(1-p_i)\right]$

where $t_i$ is the label (1 or 0) and $p_i$ is the predicted probability for point $i$, averaged over all $N$ points.

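Make sure that both the predictions and the targets lie between 0 and 1 (the predictions are typically the output of a sigmoid). The input and target tensors may have any number of dimensions, but they must have the same shape.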
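The sketch below evaluates the formula by hand on three illustrative points and compares it with `nn.BCELoss`. It also shows `nn.BCEWithLogitsLoss`, a PyTorch class that applies the sigmoid internally and is the usual choice when the model outputs raw scores; here `torch.logit` is used only to turn the illustrative probabilities back into scores.

```python
import torch
import torch.nn as nn

p = torch.tensor([0.9, 0.2, 0.6])  # predicted probabilities (e.g. sigmoid outputs)
t = torch.tensor([1.0, 0.0, 1.0])  # binary labels

# The BCE formula, averaged over the N = 3 points
manual_bce = -(t * torch.log(p) + (1 - t) * torch.log(1 - p)).mean()
builtin_bce = nn.BCELoss()(p, t)

# Same loss computed from raw scores; sigmoid(logit(p)) recovers p
logits_bce = nn.BCEWithLogitsLoss()(torch.logit(p), t)

print(manual_bce.item(), builtin_bce.item(), logits_bce.item())
```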

In [6]:
loss = nn.BCELoss()
pred = torch.rand(3, 5, requires_grad=True) ### Your model output: probabilities in [0, 1).
target = torch.randn(3, 5).softmax(dim=1) ### Ground truth / label: soft labels in [0, 1].
output = loss(pred, target)
output.backward()

print("pred: ", pred)
print("target: ", target)

pred:  tensor([[0.5801, 0.1129, 0.5670, 0.0796, 0.0400],
        [0.3979, 0.7031, 0.6871, 0.8936, 0.3000],
        [0.6469, 0.8474, 0.3247, 0.3977, 0.0179]], requires_grad=True)
target:  tensor([[0.4882, 0.1352, 0.0435, 0.0546, 0.2785],
        [0.4706, 0.0450, 0.1592, 0.2269, 0.0983],
        [0.2869, 0.2539, 0.2574, 0.1049, 0.0969]])


Extra: You can also create your own custom loss function.

In [7]:
import torch
import torch.nn.functional as F

def myCustomLoss(my_outputs, my_labels):
    # Batch size: number of rows in the (N x C) score matrix
    my_batch_size = my_outputs.size(0)
    # Log of the softmax probabilities for every class
    my_outputs = F.log_softmax(my_outputs, dim=1)
    # Select the log-probability of the correct class in each row
    my_outputs = my_outputs[range(my_batch_size), my_labels]
    # Average negative log-likelihood over the batch
    return -torch.sum(my_outputs) / my_batch_size
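A custom loss can also be packaged as an `nn.Module` subclass, so it is created and called the same way as the built-in losses above. The `RMSELoss` class below is a hypothetical illustration (root mean squared error), not part of PyTorch; the `eps` term is an added assumption that keeps the gradient of the square root finite when the error is exactly zero.

```python
import torch
import torch.nn as nn

class RMSELoss(nn.Module):
    """Hypothetical custom loss: square root of nn.MSELoss (not a PyTorch class)."""

    def __init__(self, eps=1e-8):
        super().__init__()
        self.mse = nn.MSELoss()
        self.eps = eps  # assumption: avoids an infinite sqrt gradient at zero error

    def forward(self, pred, target):
        return torch.sqrt(self.mse(pred, target) + self.eps)

torch.manual_seed(0)
pred = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5)

loss = RMSELoss()(pred, target)
loss.backward()  # autograd derives the gradient; no custom backward() is needed
print(loss, pred.grad.shape)
```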