This notebook is less about applying loss functions in real projects and more about learning how to call them.

In [1]:
import torch
import torch.nn as nn

##### **MSE Loss**
``torch.nn.MSELoss(size_average=None, reduce=None, reduction='mean')``
- `size_average`, `reduce`, and `reduction` control how the loss values are processed. However, `size_average` and `reduce` are deprecated and replaced by `reduction`. 
- `reduction`: how the loss values are processed
    - `none`: no reduction; returns the element-wise loss values, with the same shape as the input
    - `mean`: averages the loss values over all elements
    - `sum`: sums the loss values
- input: (*), where * means any number of dimensions
- target: (*), same shape as the input


In [4]:
prediction = torch.randn(4, 5)
label = torch.randn(4, 5)

In [10]:
mse = nn.MSELoss(reduction='none')
loss = mse(prediction, label)
loss

tensor([[1.2654, 0.0048, 0.4278, 2.3536, 3.4113],
        [1.2562, 2.3183, 0.6317, 2.5777, 3.4099],
        [0.1179, 1.0581, 0.9525, 0.1674, 0.0176],
        [2.3933, 0.1924, 0.2863, 1.8164, 1.9684]])

In [11]:
mse = nn.MSELoss(reduction='mean')
loss = mse(prediction, label)
loss

tensor(1.3314)

In [13]:
# same as
((prediction - label)**2).mean()

tensor(1.3314)

In [12]:
mse = nn.MSELoss(reduction='sum')
loss = mse(prediction, label)
loss

tensor(26.6272)
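
As a quick sanity check on how the three `reduction` modes relate, here is a minimal sketch. It uses fresh random tensors (`pred` and `target` are illustrative names, not the `prediction` and `label` from the cells above), so the numbers will differ from the outputs shown.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # illustrative seed, only for repeatability
pred = torch.randn(4, 5)
target = torch.randn(4, 5)

# Compute the same loss under each reduction mode
per_element = nn.MSELoss(reduction='none')(pred, target)
mean_loss = nn.MSELoss(reduction='mean')(pred, target)
sum_loss = nn.MSELoss(reduction='sum')(pred, target)

print(per_element.shape)                                    # same shape as the input
print(torch.allclose(per_element.mean(), mean_loss))        # 'mean' averages the element-wise losses
print(torch.allclose(per_element.sum(), sum_loss))          # 'sum' adds them up
print(torch.allclose(sum_loss / pred.numel(), mean_loss))   # sum / number of elements == mean
```

The same relationship holds for the outputs above: 26.6272 divided by the 20 elements gives roughly 1.3314.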

##### **BCE Loss**
``torch.nn.BCELoss(weight=None, reduction='mean')``
- `weight`: a manual rescaling weight given to the loss of each batch element.
- input: (*), where * means any number of dimensions
- target: (*), same shape as the input
- output: a scalar by default; if `reduction` is `none`, then (*), same shape as the input
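
To see what `weight` does, here is a small sketch under a few assumptions: the probabilities come from `torch.rand`, so they already lie in [0, 1), and `w` is a hypothetical weight tensor with the same shape as the input that doubles the loss of the first row. With `reduction='mean'`, the weighted loss should match multiplying the unreduced loss by `w` and then averaging.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # illustrative seed, only for repeatability
probs = torch.rand(4, 5)                        # values already in [0, 1)
target = torch.randint(0, 2, (4, 5)).float()    # BCELoss expects float targets

# Hypothetical weights: count the first row twice as much as the others
w = torch.ones(4, 5)
w[0] = 2.0

weighted = nn.BCELoss(weight=w)(probs, target)
unreduced = nn.BCELoss(reduction='none')(probs, target)
manual = (w * unreduced).mean()

print(weighted, manual)
```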

In [17]:
print(prediction)

# Assigns a random integer value to each element of the tensor within the range [low, high)
label = torch.zeros(4, 5).random_(0, 2)
print(label)

tensor([[-0.4305,  1.3474, -1.2240,  1.4649, -0.9720],
        [ 0.5704, -1.5824,  0.6929, -1.2097,  2.5763],
        [ 0.8078, -1.1626,  0.6324,  0.3928,  0.3318],
        [-2.1183,  0.8027,  1.0416, -0.8392, -0.0648]])
tensor([[1., 1., 0., 1., 1.],
        [0., 1., 0., 0., 0.],
        [1., 0., 1., 1., 0.],
        [1., 1., 1., 0., 0.]])


Binary Cross-Entropy (BCE) Loss expects its input values to be probabilities in the range [0, 1], so raw model outputs (logits) must first pass through a sigmoid layer.
- If you use `BCEWithLogitsLoss`, which applies the sigmoid internally (and is more numerically stable than a separate sigmoid followed by `BCELoss`), there's no need to add a separate sigmoid layer.
- If the input values are already guaranteed to lie in the [0, 1] range and you are confident the model will not produce values outside it, the sigmoid layer can be omitted.

In [18]:
# add a sigmoid layer
sigmoid = nn.Sigmoid()

In [23]:
bce = nn.BCELoss(reduction='mean')
loss = bce(sigmoid(prediction), label)
loss

tensor(0.8049)

In [24]:
bces = nn.BCEWithLogitsLoss(reduction='mean')
loss = bces(prediction, label)
loss

tensor(0.8049)
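
In the same spirit as the "same as" cell for MSE, the BCE value can be reproduced by hand from the formula -(y·log(p) + (1 - y)·log(1 - p)), averaged over all elements. This is a minimal sketch with fresh random tensors (`logits`, `target`, and `probs` are illustrative names), so the numbers will not match the 0.8049 above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # illustrative seed, only for repeatability
logits = torch.randn(4, 5)                      # raw model outputs
target = torch.randint(0, 2, (4, 5)).float()    # binary labels as floats

# Manual BCE: squash logits to probabilities, then apply the formula
probs = torch.sigmoid(logits)
manual = -(target * torch.log(probs) + (1 - target) * torch.log(1 - probs)).mean()

# Both built-in routes should agree with the manual value
bce = nn.BCELoss()(probs, target)
bce_logits = nn.BCEWithLogitsLoss()(logits, target)

print(manual, bce, bce_logits)
```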

##### **Cross Entropy Loss**
``torch.nn.CrossEntropyLoss(weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean', label_smoothing=0.0)``
- input: (N, C), where N is the batch size and C is the number of classes; the values are raw, unnormalized scores (logits)
- target: (N), where each value is a class index with 0 <= target[i] <= C-1
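
A minimal sketch of these shapes in use, assuming a batch of `N = 4` samples and `C = 3` classes (both values, and the names `logits` and `target`, are illustrative). It also checks the loss against a manual computation: take the log-softmax over the class dimension, pick the entry for the true class, negate, and average.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)  # illustrative seed, only for repeatability
N, C = 4, 3
logits = torch.randn(N, C)            # raw scores; no softmax applied beforehand
target = torch.randint(0, C, (N,))    # class indices in [0, C-1], dtype int64

ce = nn.CrossEntropyLoss(reduction='mean')
loss = ce(logits, target)

# Manual equivalent: negative log-probability of the true class, averaged over the batch
log_probs = F.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(N), target].mean()

# With reduction='none' there is one loss value per sample
per_sample = nn.CrossEntropyLoss(reduction='none')(logits, target)

print(loss, manual)
print(per_sample.shape)  # torch.Size([4])
```

Because the softmax happens inside the loss, the model's last layer should output logits directly.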
