#### Author: Prakash C. Sukhwal
#### July 2021
#### Associated Lecturer & Consultant
#### Institute of Systems Science, NUS

---

In [None]:
## restore tab autocompletion if it is not working (this disables the jedi completer)
%config Completer.use_jedi = False

###### All the given implementations are in PyTorch version 1.7.1 (the outputs shown in this notebook were produced with 1.8.1)
- to check your version, type the commands below in your notebook
      - import torch
      - torch.__version__

In [None]:
import torch
import torch.nn as nn

In [None]:
# check the version
torch.__version__

'1.8.1+cu101'

##### 1. MAE: Mean Absolute Error or L1 Loss

<img src="https://drive.google.com/uc?id=1swk0KoIIFKH6LUgza_DKw_nC1nEuzE9V" alt="image" 
    width="400" 
    height="180" class="center">
    

    Note:
    - less affected by outliers
    - with mini-batches, n is the batch size; otherwise n is the total number of samples

###### how to invoke it in torch?
       - we invoke it from class torch.nn.L1Loss(size_average=None, reduce=None, reduction='mean')
       - reduction (string, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 
        'none': no reduction will be applied, 
        'mean': the sum of the output will be divided by the number of elements in the output, 
        'sum': the output will be summed. 
         Default: 'mean'
source: https://pytorch.org/docs/stable/generated/torch.nn.L1Loss.html#torch.nn.L1Loss
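
A minimal sketch of how the three `reduction` settings differ. The tensor values are illustrative (not from the notebook) and are chosen so that the absolute errors are 0, 2 and 3.

```python
import torch
import torch.nn as nn

# illustrative values: |pred - act| = [0, 2, 3]
pred = torch.tensor([1.0, 2.0, 4.0])
act = torch.tensor([1.0, 4.0, 1.0])

l1_none = nn.L1Loss(reduction='none')(pred, act)  # element-wise absolute errors
l1_mean = nn.L1Loss(reduction='mean')(pred, act)  # sum of errors / number of elements
l1_sum = nn.L1Loss(reduction='sum')(pred, act)    # sum of errors

print(l1_none, l1_mean, l1_sum)
```

With `'none'` the result keeps the shape of the inputs, so it has to be reduced to a scalar (or given an explicit gradient) before calling `backward()`.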

In [None]:
## let us say you want to get the loss between two tensors pred and act where 
## pred: predicted target values and act: actual target values

loss = nn.L1Loss()
pred = torch.randn(3, 5, requires_grad= True)
act = torch.randn(3, 5)
err = loss(pred, act)
err.backward()

##### 2. MSE: Mean Square Error or L2 Loss

<img src="https://drive.google.com/uc?id=1aNtxU25D0CzDByxSMY2xdZ3UeMrEtYm_" alt="image" 
    width="400" 
    height="180" class="center">

    Note:
    - impact of outliers is more pronounced in MSE than MAE
    - with mini-batches, n is the batch size; otherwise n is the total number of samples

###### how to invoke it in torch?

    - we invoke it from class torch.nn.MSELoss(size_average=None, reduce=None, reduction='mean')
    - reduction (string, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 
    'none': no reduction will be applied, 
    'mean': the sum of the output will be divided by the number of elements in the output, 
    'sum': the output will be summed. 
     Default: 'mean'
source: https://pytorch.org/docs/stable/generated/torch.nn.MSELoss.html#torch.nn.MSELoss
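
To make the note on outliers concrete, here is a small sketch comparing MAE and MSE on the same data with and without one outlier. The numbers are made up for illustration.

```python
import torch
import torch.nn as nn

mae = nn.L1Loss()
mse = nn.MSELoss()

# three predictions that are each off by 1
pred_clean = torch.tensor([1.0, 2.0, 3.0])
act_clean = torch.tensor([2.0, 3.0, 4.0])

# same data plus one outlier that is off by 10
pred_out = torch.tensor([1.0, 2.0, 3.0, 14.0])
act_out = torch.tensor([2.0, 3.0, 4.0, 4.0])

mae_clean, mse_clean = mae(pred_clean, act_clean), mse(pred_clean, act_clean)
mae_out, mse_out = mae(pred_out, act_out), mse(pred_out, act_out)

print('without outlier: MAE', mae_clean.item(), 'MSE', mse_clean.item())
print('with outlier:    MAE', mae_out.item(), 'MSE', mse_out.item())
```

Because the error is squared, the single outlier moves the MSE far more than the MAE.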

In [None]:
## let us say you want to get the loss between two tensors pred and act where 
## pred: predicted target values and act: actual target values

loss = nn.MSELoss(reduction='none')
#print(loss)

In [None]:
# create random pred and act
pred = torch.randn(3, 5)
print(pred)
act = torch.randn(3, 5)
print(act)


err = loss(pred, act)
print('error \n')
print(err)

print(pred.grad)

tensor([[ 0.0463,  0.3386,  0.5060,  0.3870, -0.3964],
        [ 0.6855, -0.8004,  1.0951,  0.1720, -0.3706],
        [ 0.3975,  1.0624,  2.9286, -0.6500,  0.0071]])
tensor([[ 1.2731,  1.1445,  2.0540,  0.3410, -1.0231],
        [ 0.7668, -0.0611, -0.6694, -0.1932,  0.3795],
        [-1.5548, -0.1715, -0.9569, -0.1222,  1.1381]])
error 

tensor([[1.5049e+00, 6.4943e-01, 2.3963e+00, 2.1160e-03, 3.9278e-01],
        [6.6242e-03, 5.4645e-01, 3.1135e+00, 1.3332e-01, 5.6261e-01],
        [3.8116e+00, 1.5224e+00, 1.5098e+01, 2.7849e-01, 1.2792e+00]])
None


    Note:
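        - backpropagation is handled by autograd on tensors (formerly Variables), not by nn.Module; pred.grad is None above because pred was created without requires_grad=True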

In [None]:
# enable gradient tracking on pred (pred.grad stays None until backward() is called)
pred = torch.randn(3, 5, requires_grad= True)
act = torch.randn(3, 5)

err = loss(pred, act)
print('error \n')
print(err)

print(pred.grad)

error 

tensor([[1.0710e+01, 9.2594e-01, 3.9583e-01, 3.9658e+00, 3.2851e+00],
        [9.8965e-03, 7.7764e-01, 5.1768e-01, 4.2635e+00, 6.3811e+00],
        [8.2533e-04, 5.2075e-01, 6.4039e+00, 2.7051e+00, 9.8436e+00]],
       grad_fn=<MseLossBackward>)
None


In [None]:
# with reduction
loss = nn.MSELoss(reduction= 'mean')
# enable gradient tracking on pred (pred.grad stays None until err.backward() is called)
pred = torch.randn(3, 5, requires_grad= True)
act = torch.randn(3, 5)

err = loss(pred, act)

print(err)

print(pred.grad)

tensor(1.3740, grad_fn=<MseLossBackward>)
None


###### Question: How do you get RMSE from the MSE?


In [None]:
class RMSELoss(nn.Module):
    def __init__(self, eps=1e-6):
        super().__init__()
        self.mse = nn.MSELoss()
        self.eps = eps
    def forward(self, pred, act):
        loss = torch.sqrt(self.mse(pred, act)+ self.eps)
        return loss

When mse = 0, the derivative of the square root is infinite, and the backward pass multiplies it by the zero gradient of the MSE, which yields NaN. 
This is why eps is added inside the square root above.
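
A small sketch of the problem, assuming a perfect prediction so that the MSE is exactly 0. The tensor values and the variable names `pred_no_eps` and `pred_eps` are illustrative.

```python
import torch
import torch.nn as nn

mse = nn.MSELoss()
act = torch.tensor([1.0, 2.0, 3.0])

# perfect prediction, no eps: the square root has an infinite slope at 0
pred_no_eps = act.clone().requires_grad_(True)
torch.sqrt(mse(pred_no_eps, act)).backward()
print(pred_no_eps.grad)  # gradients are NaN (0 * inf)

# same prediction, with eps added inside the square root
pred_eps = act.clone().requires_grad_(True)
torch.sqrt(mse(pred_eps, act) + 1e-6).backward()
print(pred_eps.grad)  # gradients are finite
```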

In [None]:
rmse = RMSELoss()

rmse_loss = rmse(pred, act)

print(rmse_loss.backward())

print(pred.grad)

None
tensor([[-0.2314, -0.4303, -0.1116, -0.5244,  0.1890],
        [ 0.0233, -0.1346,  0.3256,  0.3586,  0.0416],
        [-0.2692, -0.1448,  0.2487,  0.3116,  0.1140]])


###### Question: above we see formula for one neuron output; How to incorporate errors from more than one neuron?

    we sum-up the errors from all the neurons

###### Question: Can you combine L1 and L2 losses?

    Yes, it is called smooth L1 loss or Huber loss
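
A hedged sketch using `nn.SmoothL1Loss` with its default settings (threshold `beta=1.0`): errors smaller than the threshold are penalised quadratically like L2, larger errors linearly like L1. The input values are illustrative.

```python
import torch
import torch.nn as nn

smooth_l1 = nn.SmoothL1Loss(reduction='none')

# one small error (0.5) and one large error (2.0)
pred = torch.tensor([0.5, 2.0])
act = torch.tensor([0.0, 0.0])

per_element = smooth_l1(pred, act)
print(per_element)
# small error: 0.5 * 0.5**2 = 0.125 (quadratic branch)
# large error: 2.0 - 0.5   = 1.5   (linear branch)
```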

##### 3. Binary Cross Entropy Loss or Log Loss

<img src="https://drive.google.com/uc?id=11-rib5RXvdzMa2yUr2_LpV0Wuozi6act" alt="image" 
    width="600" 
    height="400" class="center">
    
source: https://ml-cheatsheet.readthedocs.io/en/latest/loss_functions.html

    Note: 
        - we sum over all the samples (1 to N) and multiply by -(1/N) to get the overall error

    Entropy: measures the uncertainty of a probability distribution. E.g., with a binary encoding scheme you need log2(8) = 3 bits to represent 8 equally likely animals, and log2(1024) = 10 bits for 1024 animals. The more evenly the probability is spread over the outcomes, the higher the entropy.
    
    Cross-Entropy: average number of bits needed to encode outcomes from the true distribution when using a code built for the predicted distribution; it grows as the two distributions differ more.
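
The definitions above can be sketched numerically. `entropy_bits` and `cross_entropy_bits` are hypothetical helper names introduced here for illustration, and the skewed distribution is made up.

```python
import torch

def entropy_bits(p):
    """Shannon entropy of probability vector p, in bits (illustrative helper)."""
    p = p[p > 0]  # outcomes with zero probability contribute nothing
    return -(p * torch.log2(p)).sum().item()

def cross_entropy_bits(p, q):
    """Cross-entropy of predicted distribution q relative to true distribution p, in bits."""
    return -(p * torch.log2(q)).sum().item()

uniform8 = torch.full((8,), 1 / 8)           # 8 equally likely animals
skewed8 = torch.tensor([0.93] + [0.01] * 7)  # one animal dominates

h_uniform = entropy_bits(uniform8)
h_skewed = entropy_bits(skewed8)
ce_skewed_uniform = cross_entropy_bits(skewed8, uniform8)

print(h_uniform, h_skewed, ce_skewed_uniform)
```

The uniform distribution reaches the 3 bits mentioned above, the skewed one needs fewer, and encoding the skewed distribution with the uniform code costs more than its own entropy.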

 ###### how to invoke it in torch?

    - we invoke it from class torch.nn.BCELoss(weight=None, size_average=None, reduce=None, reduction='mean')
    - reduction (string, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 
    'none': no reduction will be applied, 
    'mean': the sum of the output will be divided by the number of elements in the output, 
    'sum': the output will be summed. 
     Default: 'mean'
source: https://pytorch.org/docs/stable/generated/torch.nn.BCELoss.html#torch.nn.BCELoss

In [None]:
m = nn.Sigmoid() # activation function
loss = nn.BCELoss()

In [None]:
pred = torch.randn(3, requires_grad=True)
print(pred)
act = torch.empty(3).random_(2)
print(act)

tensor([ 1.4958, -1.8144, -0.4815], requires_grad=True)
tensor([0., 0., 1.])


In [None]:
err = loss(m(pred), act)
err.backward()

###### Question: In a distribution where the target is a set of classes {cat (1), dog (0)} and both classes are equally distributed:
    1. is the entropy very low or very high?
    2. what is the binary cross entropy in this case?

    1. highest for equal distribution
    2. -(1*(log(0.5)) + (1-1)* (log(1-0.5))) => log(2)
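
A quick check of answer 2 with `nn.BCELoss`, assuming a single sample whose true class is cat (1) and a predicted probability of 0.5:

```python
import math
import torch
import torch.nn as nn

pred_prob = torch.tensor([0.5])  # predicted probability of class 1
act = torch.tensor([1.0])        # true label

bce_val = nn.BCELoss()(pred_prob, act)
print(bce_val.item(), math.log(2))  # both are log(2), using the natural logarithm
```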

###### Question: In a multi-label emotion detection exercise (where more than one label can be correct), a person is in reality both happy and energetic, and the NN outputs the probabilities below. Assuming only one sample (N=1), compute the binary cross entropy loss

    pred = [0.2, 0.8, 0.2, 0.4]
    act = [sad, happy, energetic, scared] = [0, 1, 1, 0]

    bce = (-1/1)* [(1-0)*log(1- 0.2) + 1*log(0.8) + 1*log(0.2) + (1-0)*log(1-0.4)]
        = - [-0.223  + (-0.223) + (-1.609) + (-0.511)]    (natural log, as used by pytorch)
        ≈ 2.57
    note: the largest term comes from "energetic", a correct label that received a low predicted probability (0.2). The same happens when a wrong label receives a high predicted probability: the more confidently wrong the network is, the larger the individual error.
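
The hand computation can be checked against `nn.BCELoss`, which uses the natural logarithm. In this sketch `reduction='sum'` adds the four per-label terms as in the formula above; the default `'mean'` would additionally divide by the number of labels (4).

```python
import torch
import torch.nn as nn

pred = torch.tensor([0.2, 0.8, 0.2, 0.4])  # predicted probabilities
act = torch.tensor([0.0, 1.0, 1.0, 0.0])   # sad, happy, energetic, scared

per_label = nn.BCELoss(reduction='none')(pred, act)  # one term per label
bce_sum = nn.BCELoss(reduction='sum')(pred, act)     # sum of the four terms

print(per_label)
print(bce_sum)
```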

##### 4. Cross Entropy Loss or Log Loss

In [None]:
loss = nn.CrossEntropyLoss()
pred = torch.randn(3, 5, requires_grad=True)
print(pred)

act = torch.empty(3, dtype=torch.long).random_(5)# note: 1-D
print(act)

err = loss(pred, act)
err.backward()

tensor([[-0.3726,  0.2790, -1.1045,  1.5715, -1.9348],
        [ 1.2946,  0.6695, -1.2225, -3.9203,  1.0332],
        [ 1.2877,  1.1664, -0.4983, -1.2589,  0.0110]], requires_grad=True)
tensor([1, 3, 4])


    note:
        - cross entropy loss in pytorch applies log-softmax internally, so pred should be raw scores (logits), not probabilities
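
A minimal sketch confirming this: `nn.CrossEntropyLoss` on raw scores gives the same value as `nn.LogSoftmax` followed by `nn.NLLLoss`. The random inputs and the seed are illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(3, 5)          # raw scores for 3 samples, 5 classes
target = torch.tensor([1, 3, 4])    # true class index per sample

ce = nn.CrossEntropyLoss()(logits, target)

# the same computation in two explicit steps
log_probs = nn.LogSoftmax(dim=1)(logits)
manual = nn.NLLLoss()(log_probs, target)

print(ce, manual)
```

Applying softmax or sigmoid to the scores before `nn.CrossEntropyLoss` would therefore normalise them twice.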

##### 5. Hinge Loss (max-margin loss)

<img src="https://drive.google.com/uc?id=1R9KG2V4UOah8PRQialYS3sh-0jupBtZe" alt="image" 
    width="600" 
    height="400" class="center">

<img src="https://drive.google.com/uc?id=11JLZBqZadIY0l3N5lbxYSZ9GgsZJOLg8" alt="image" 
    width="400" 
    height="180" class="center">
            
    source: 
    https://stats.stackexchange.com/questions/372999/confusion-on-hinge-loss-and-svm
    https://en.wikipedia.org/wiki/Hinge_loss

    note:
        - it is used for classification problems, e.g., in SVMs
        - labels used are -1 and +1 for target
        - penalizes wrong predictions and predictions with less confidence in correct class based on a margin
        - works towards maximizing the score for the true class (i.e., true class to have score larger than false classes by a margin)

###### Question
    compute the cross-entropy loss and hinge loss shown in the figure below using pytorch. The blue class is the true class in this case.
    
<img src="https://drive.google.com/uc?id=1sEuCwxpFZS4GmQiTpl4f9SGGwQU6ENmC" alt="image" 
    width="600" 
    height="350" class="center">
source: https://cs231n.github.io/linear-classify/#loss-function

In [None]:
## Let's try the values given in the figure above in pytorch

lg_prob = [-2.85, 0.86, 0.28]
print(type(lg_prob))

# convert to tensor
lg_prob_tens = torch.tensor(lg_prob)
print(type(lg_prob_tens))

<class 'list'>
<class 'torch.Tensor'>


In [None]:
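## 1. Cross Entropy Loss
# apply softmax over the class dimension to turn the scores into probabilities
sft = nn.Softmax(dim=0)
out_prob = sft(lg_prob_tens)

print(out_prob)

# negative log-probability per class; the loss is the entry for the true (blue) class, index 2
-torch.log(out_prob)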

tensor([0.0154, 0.6312, 0.3534])


tensor([4.1702, 0.4602, 1.0402])

In [None]:
## 2. Hinge Loss
target_tens = torch.tensor([-1, -1, 1])
print(type(target_tens))

<class 'torch.Tensor'>


In [None]:
## try-1
def hinge(y_true, y_pred):
    # element-wise max(0, 1 - y_true * y_pred)
    out = torch.clamp(1 - y_true * y_pred, min=0)
    print(out)
    return torch.sum(out)

In [None]:
hinge(target_tens, lg_prob_tens)

tensor([0.0000, 1.8600, 0.7200])


tensor(2.5800)

In [None]:
## try-2
class MyHingeLoss(torch.nn.Module):

    def __init__(self):
        super(MyHingeLoss, self).__init__()

    def forward(self, output, target):

        hinge_loss = 1 - torch.mul(output, target)
        hinge_loss[hinge_loss < 0] = 0
        return hinge_loss

In [None]:
h_loss2 = MyHingeLoss()

#final loss-
print(h_loss2(lg_prob_tens,target_tens))
h_loss2(lg_prob_tens,target_tens).sum()

tensor([0.0000, 1.8600, 0.7200])


tensor(2.5800)

In [None]:
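## try-3
# note: nn.MultiLabelMarginLoss expects target to hold class indices padded with -1;
# a target that starts with -1 contains no labels, so the loss below is 0
h_loss3 = nn.MultiLabelMarginLoss()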

In [None]:
h_loss3(lg_prob_tens,target_tens)

tensor(0.)
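
The multiclass hinge (SVM) loss in the cs231n figure compares the true-class score with every other class score, which differs from the per-class binary hinge used in try-1 and try-2. A hedged sketch using `nn.MultiMarginLoss`, which takes the true class as an index (2 for the blue class) and divides the summed margin violations by the number of classes; multiplying by the number of classes recovers the plain sum. The names `scores`, `target` and `svm_loss` are introduced here for illustration.

```python
import torch
import torch.nn as nn

scores = torch.tensor([[-2.85, 0.86, 0.28]])  # shape (N=1, C=3)
target = torch.tensor([2])                    # index of the true (blue) class

mm_loss = nn.MultiMarginLoss()                # defaults: p=1, margin=1.0
per_class_avg = mm_loss(scores, target)       # sum of violations / number of classes

# undo the division by the number of classes to get the summed hinge loss
svm_loss = per_class_avg * scores.size(1)
print(per_class_avg, svm_loss)
# max(0, -2.85 - 0.28 + 1) + max(0, 0.86 - 0.28 + 1) = 0 + 1.58
```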