# CrossEntropy Loss

---

`CrossEntropyError` in **mlpack** is effectively `BCELoss` in **PyTorch**. It works only when the target is one-hot encoded, and it uses reduction='sum' by default, whereas PyTorch uses reduction='mean' by default. The mlpack implementation does not currently support weights; #2368 by @kartikdutt18 is currently open to address this. None of these assumptions are documented anywhere, in spite of the discussion in #1070, which was merged a long time ago.

The Reconstruction Loss of mlpack does essentially the same thing, though it may use reduction='mean' by default.

---

### General equation of cross entropy loss for a multi-class scenario with C classes

![alt text](https://latex.codecogs.com/gif.latex?CE&space;=&space;-\sum_{i}^{C}t_{i}&space;log&space;(s_{i}))

where `t_i` and `s_i` are the ground truth and the CNN score for each class `i` in `C`.
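
As a rough illustration of the equation above, the sketch below evaluates it for a single sample. The function name `cross_entropy` and the example scores are assumptions made for this illustration; they are not taken from mlpack or PyTorch.

```python
import math

def cross_entropy(targets, scores):
    """CE = -sum_i t_i * log(s_i) for one sample with C classes."""
    # Terms with t_i == 0 contribute nothing, so they are skipped;
    # this also avoids evaluating log(0) for classes with zero score.
    return -sum(t * math.log(s) for t, s in zip(targets, scores) if t != 0)

# One sample with C = 3 classes; the scores are illustrative probabilities.
targets = [0.0, 1.0, 0.0]
scores = [0.2, 0.7, 0.1]

ce = cross_entropy(targets, scores)
print(round(ce, 4))  # 0.3567
```

With a one-hot target, only the score of the true class affects the result.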

---

### Specific equation of the loss function for binary classification (2 classes only), as used in the mlpack implementation

![Equation](https://latex.codecogs.com/gif.latex?CE&space;=&space;-\sum_{i=1}^{C%27=2}t_{i}&space;log&space;(s_{i})&space;=&space;-t_{1}&space;log(s_{1})&space;-&space;(1&space;-&space;t_{1})&space;log(1&space;-&space;s_{1}))
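
The binary form can be checked by hand on the first row of the input used in the cells below. This is a minimal sketch; the helper name `binary_cross_entropy` is an assumption made for the example.

```python
import math

def binary_cross_entropy(t, s):
    """-t*log(s) - (1 - t)*log(1 - s) for a single element."""
    return -(t * math.log(s) + (1.0 - t) * math.log(1.0 - s))

# First row of the 2-class example: targets [0, 1], inputs [0.1778, 0.1203].
first_row = [binary_cross_entropy(t, s)
             for t, s in zip([0.0, 1.0], [0.1778, 0.1203])]
print([round(v, 4) for v in first_row])  # [0.1958, 2.1178]
```

These two values agree with the first row of the reduction='none' loss printed further down.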

---

### Assumptions made by the PyTorch implementation of BCELoss

The assertion `x >= 0. && x <= 1.` ensures that each input value lies between 0 and 1. <br>
Also, each label value is expected to be either 0 or 1 (one-hot encoded).
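
The two assumptions can be written down as an explicit pre-check. The helper `check_bce_inputs` below is hypothetical and is not how PyTorch implements its assertion; it only restates the conditions listed above.

```python
def check_bce_inputs(inputs, targets):
    """Hypothetical helper: raise ValueError if the stated assumptions fail."""
    for row_x, row_t in zip(inputs, targets):
        for x, t in zip(row_x, row_t):
            if not (0.0 <= x <= 1.0):
                raise ValueError(f"input {x} is outside [0, 1]")
            if t not in (0.0, 1.0):
                raise ValueError(f"target {t} is not 0 or 1")

# A valid row passes silently.
check_bce_inputs([[0.1778, 0.1203]], [[0.0, 1.0]])

# An out-of-range input is rejected.
try:
    check_bce_inputs([[1.5, 0.2]], [[0.0, 1.0]])
except ValueError as err:
    message = str(err)
    print(message)  # input 1.5 is outside [0, 1]
```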

---

### Imports and installation of mlpack

In [0]:
%%capture
!sudo apt-get install libmlpack-dev 
import torch
import torch.nn as nn

## 2 classes (Demonstrates that it works for Binary classification)

### PyTorch

#### None Reduction


In [0]:
loss = nn.BCELoss(reduction='none')
input = torch.tensor([[ 0.1778,  0.1203],
                      [ 0.0957,  0.2403],
                      [ 0.1397,  0.1925], 
                      [ 0.2256, 0.3144]], requires_grad=True) # 4 Rows, 2 columns 
target = torch.tensor([[0., 1.],
                       [1., 0.],
                       [0., 0.],
                       [1., 0.]])
output = loss(input, target)
output.backward(torch.ones(input.shape))
print("Input : ")
print(input)
print("Target : ")
print(target)
print("FORWARD : ")
print("Loss : ")
print(output)
print("BACKWARD : ")
print(input.grad)

Input : 
tensor([[0.1778, 0.1203],
        [0.0957, 0.2403],
        [0.1397, 0.1925],
        [0.2256, 0.3144]], requires_grad=True)
Target : 
tensor([[0., 1.],
        [1., 0.],
        [0., 0.],
        [1., 0.]])
FORWARD : 
Loss : 
tensor([[0.1958, 2.1178],
        [2.3465, 0.2748],
        [0.1505, 0.2138],
        [1.4890, 0.3775]], grad_fn=<BinaryCrossEntropyBackward>)
BACKWARD : 
tensor([[  1.2162,  -8.3126],
        [-10.4493,   1.3163],
        [  1.1624,   1.2384],
        [ -4.4326,   1.4586]])
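
The BACKWARD values above follow from differentiating the element-wise loss with respect to the input, which gives `(1 - t) / (1 - x) - t / x`. A minimal sketch that reproduces the first row (the function name `bce_grad` is an assumption made for the example):

```python
def bce_grad(t, x):
    """Derivative of -t*log(x) - (1 - t)*log(1 - x) with respect to x."""
    return (1.0 - t) / (1.0 - x) - t / x

# First row: targets [0, 1], inputs [0.1778, 0.1203].
grads = [bce_grad(t, x) for t, x in zip([0.0, 1.0], [0.1778, 0.1203])]
print([round(g, 4) for g in grads])  # [1.2162, -8.3126]
```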


#### Sum Reduction

In [0]:
loss = nn.BCELoss(reduction='sum')
input = torch.tensor([[ 0.1778,  0.1203],
                      [ 0.0957,  0.2403],
                      [ 0.1397,  0.1925], 
                      [ 0.2256, 0.3144]], requires_grad=True) # 4 Rows, 2 columns 
target = torch.tensor([[0., 1.],
                       [1., 0.],
                       [0., 0.],
                       [1., 0.]])
output = loss(input, target)
output.backward()
print("Input : ")
print(input)
print("Target : ")
print(target)
print("FORWARD : ")
print("Loss : ")
print(output)
print("BACKWARD : ")
print(input.grad)

Input : 
tensor([[0.1778, 0.1203],
        [0.0957, 0.2403],
        [0.1397, 0.1925],
        [0.2256, 0.3144]], requires_grad=True)
Target : 
tensor([[0., 1.],
        [1., 0.],
        [0., 0.],
        [1., 0.]])
FORWARD : 
Loss : 
tensor(7.1656, grad_fn=<BinaryCrossEntropyBackward>)
BACKWARD : 
tensor([[  1.2162,  -8.3126],
        [-10.4493,   1.3163],
        [  1.1624,   1.2384],
        [ -4.4326,   1.4586]])


#### Mean Reduction

In [0]:
loss = nn.BCELoss(reduction='mean')
input = torch.tensor([[ 0.1778,  0.1203],
                      [ 0.0957,  0.2403],
                      [ 0.1397,  0.1925], 
                      [ 0.2256, 0.3144]], requires_grad=True) # 4 Rows, 2 columns 
target = torch.tensor([[0., 1.],
                       [1., 0.],
                       [0., 0.],
                       [1., 0.]])
output = loss(input, target)
output.backward()
print("Input : ")
print(input)
print("Target : ")
print(target)
print("FORWARD : ")
print("Loss : ")
print(output)
print("BACKWARD : ")
print(input.grad)

Input : 
tensor([[0.1778, 0.1203],
        [0.0957, 0.2403],
        [0.1397, 0.1925],
        [0.2256, 0.3144]], requires_grad=True)
Target : 
tensor([[0., 1.],
        [1., 0.],
        [0., 0.],
        [1., 0.]])
FORWARD : 
Loss : 
tensor(0.8957, grad_fn=<BinaryCrossEntropyBackward>)
BACKWARD : 
tensor([[ 0.1520, -1.0391],
        [-1.3062,  0.1645],
        [ 0.1453,  0.1548],
        [-0.5541,  0.1823]])
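
The three reductions differ only in how the element-wise losses are aggregated. The sketch below recomputes the sum and mean from the same 4x2 input in plain Python (flattened to a list for brevity) to make the relation explicit.

```python
import math

inputs = [0.1778, 0.1203, 0.0957, 0.2403, 0.1397, 0.1925, 0.2256, 0.3144]
targets = [0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0]

# Element-wise loss, i.e. what reduction='none' returns (flattened here).
losses = [-(t * math.log(x) + (1.0 - t) * math.log(1.0 - x))
          for t, x in zip(targets, inputs)]

loss_sum = sum(losses)               # reduction='sum'
loss_mean = loss_sum / len(losses)   # reduction='mean' divides by the element count (8)

print(round(loss_sum, 3))   # 7.166
print(round(loss_mean, 4))  # 0.8957
```

The gradients scale the same way: each entry of the mean-reduction gradient is the corresponding sum-reduction entry divided by 8, for example 1.2162 / 8 ≈ 0.1520.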


### mlpack


In [0]:
%%capture
%%writefile test.cpp  

// This uses the computations present in cross_entropy_error_impl.hpp in ann/loss_functions.
#include <iostream>
#include <armadillo>

using namespace std;
using namespace arma;

int main()
{
  // Constructor
  arma::mat x,y;
  arma::mat weight;

  x << 0.1778 << 0.1203 << endr
    << 0.0957 << 0.2403 << endr
    << 0.1397 << 0.1925 << endr
    << 0.2256 << 0.3144 << endr;

  y << 0 << 1 << endr
    << 1 << 0 << endr
    << 0 << 0 << endr
    << 1 << 0 << endr;

  // Forward
  const double eps = 1e-10;
  arma::mat loss_none = -(y % arma::log(x + eps) + (1. - y) % arma::log(1. - x + eps));
  double loss_sum = arma::accu(loss_none);
  double loss_mean = loss_sum / x.n_elem;

  // Backward
  arma::mat output;
  output = (1. - y) / (1. - x + eps) - y / (x + eps);

  // Display
  cout << "------------------------------------------------------------------" << endl;
  cout << "USER-PROVIDED MATRICES : " << endl;
  cout << "------------------------------------------------------------------" << endl;
  cout << "Input shape : "<< x.n_rows << " " << x.n_cols << endl;
  cout << "Input : " << endl << x << endl;
  cout << "Target shape : "<< y.n_rows << " " << y.n_cols << endl;
  cout << "Target : " << endl << y << endl;
  cout << "------------------------------------------------------------------" << endl;
  cout << "SUM " << endl;
  cout << "------------------------------------------------------------------" << endl;
  cout << "FORWARD : " << endl;
  cout << "Loss : \n" << loss_none << '\n';
  cout << "Loss (sum):\n" << loss_sum << '\n';
  cout << "BACKWARD : " << endl;
  cout << "Output shape : "<< output.n_rows << " " << output.n_cols << endl;
  cout << "Output (sum) : " << endl << output << endl;
  cout << "Sum of all values in this matrix : " << arma::as_scalar(arma::accu(output)) << endl;
  cout << "------------------------------------------------------------------" << endl;
  cout << "MEAN " << endl;
  cout << "------------------------------------------------------------------" << endl;
  cout << "FORWARD : " << endl;
  cout << "Loss (mean):\n" << loss_mean << '\n';
  cout << "BACKWARD : " << endl;
  cout << "Output shape : "<< output.n_rows << " " << output.n_cols << endl;
  cout << "Output (mean) : " << endl << output / x.n_elem << endl;
  cout << "Sum of all values in this matrix : " << arma::as_scalar(arma::accu(output / x.n_elem)) << endl;
  cout << "------------------------------------------------------------------" << endl;
  return 0;                                            
}
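
The `eps` term in the program above keeps the logarithm finite when an input is exactly 0 or 1. The Python sketch below mirrors that expression to show the effect; `safe_bce` is a name chosen for this illustration only.

```python
import math

EPS = 1e-10  # same constant as in the C++ program above

def safe_bce(t, x, eps=EPS):
    """Element-wise loss with eps added inside both logarithms."""
    return -(t * math.log(x + eps) + (1.0 - t) * math.log(1.0 - x + eps))

# Without eps, math.log(0.0) raises ValueError; with eps the loss is large but finite.
value = safe_bce(1.0, 0.0)
print(round(value, 4))  # 23.0259
```

For inputs away from 0 and 1, such as the ones used in this notebook, the offset is too small to change the printed digits.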

In [0]:
%%script bash
g++ test.cpp -o test -larmadillo && ./test

------------------------------------------------------------------
USER-PROVIDED MATRICES : 
------------------------------------------------------------------
Input shape : 4 2
Input : 
   0.1778   0.1203
   0.0957   0.2403
   0.1397   0.1925
   0.2256   0.3144

Target shape : 4 2
Target : 
        0   1.0000
   1.0000        0
        0        0
   1.0000        0

------------------------------------------------------------------
SUM 
------------------------------------------------------------------
FORWARD : 
Loss : 
   0.1958   2.1178
   2.3465   0.2748
   0.1505   0.2138
   1.4890   0.3775

Loss (sum):
7.16565
BACKWARD : 
Output shape : 4 2
Output (sum) : 
    1.2162   -8.3126
  -10.4493    1.3163
    1.1624    1.2384
   -4.4326    1.4586

Sum of all values in this matrix : -16.8026
------------------------------------------------------------------
MEAN 
------------------------------------------------------------------
FORWARD : 
Loss (mean):
0.895706
BACKWARD : 
Output shape :

## 3 classes (Demonstrates that it works for Multi Class classification)

### PyTorch

#### None Reduction


In [0]:
loss = nn.BCELoss(reduction='none')
input = torch.tensor([[ 0.1778,  0.1203, 0.2264],
                      [ 0.0957,  0.2403, 0.3400],
                      [ 0.1397,  0.1925, 0.3336], 
                      [ 0.2256,  0.3144, 0.8695]], requires_grad=True) # 4 Rows, 3 columns 
target = torch.tensor([[0., 1., 0.],
                       [1., 0., 0.],
                       [0., 0., 1.],
                       [1., 0., 0.]])
output = loss(input, target)
output.backward(torch.ones(input.shape))
print("Input : ")
print(input)
print("Target : ")
print(target)
print("FORWARD : ")
print("Loss : ")
print(output)
print("BACKWARD : ")
print(input.grad)

Input : 
tensor([[0.1778, 0.1203, 0.2264],
        [0.0957, 0.2403, 0.3400],
        [0.1397, 0.1925, 0.3336],
        [0.2256, 0.3144, 0.8695]], requires_grad=True)
Target : 
tensor([[0., 1., 0.],
        [1., 0., 0.],
        [0., 0., 1.],
        [1., 0., 0.]])
FORWARD : 
Loss : 
tensor([[0.1958, 2.1178, 0.2567],
        [2.3465, 0.2748, 0.4155],
        [0.1505, 0.2138, 1.0978],
        [1.4890, 0.3775, 2.0364]], grad_fn=<BinaryCrossEntropyBackward>)
BACKWARD : 
tensor([[  1.2162,  -8.3126,   1.2927],
        [-10.4493,   1.3163,   1.5152],
        [  1.1624,   1.2384,  -2.9976],
        [ -4.4326,   1.4586,   7.6628]])


#### Sum Reduction

In [0]:
loss = nn.BCELoss(reduction='sum')
input = torch.tensor([[ 0.1778,  0.1203, 0.2264],
                      [ 0.0957,  0.2403, 0.3400],
                      [ 0.1397,  0.1925, 0.3336], 
                      [ 0.2256,  0.3144, 0.8695]], requires_grad=True) # 4 Rows, 3 columns 
target = torch.tensor([[0., 1., 0.],
                       [1., 0., 0.],
                       [0., 0., 1.],
                       [1., 0., 0.]])
output = loss(input, target)
output.backward()
print("Input : ")
print(input)
print("Target : ")
print(target)
print("FORWARD : ")
print("Loss : ")
print(output)
print("BACKWARD : ")
print(input.grad)

Input : 
tensor([[0.1778, 0.1203, 0.2264],
        [0.0957, 0.2403, 0.3400],
        [0.1397, 0.1925, 0.3336],
        [0.2256, 0.3144, 0.8695]], requires_grad=True)
Target : 
tensor([[0., 1., 0.],
        [1., 0., 0.],
        [0., 0., 1.],
        [1., 0., 0.]])
FORWARD : 
Loss : 
tensor(10.9721, grad_fn=<BinaryCrossEntropyBackward>)
BACKWARD : 
tensor([[  1.2162,  -8.3126,   1.2927],
        [-10.4493,   1.3163,   1.5152],
        [  1.1624,   1.2384,  -2.9976],
        [ -4.4326,   1.4586,   7.6628]])


#### Mean Reduction

In [0]:
loss = nn.BCELoss(reduction='mean')
input = torch.tensor([[ 0.1778,  0.1203, 0.2264],
                      [ 0.0957,  0.2403, 0.3400],
                      [ 0.1397,  0.1925, 0.3336], 
                      [ 0.2256,  0.3144, 0.8695]], requires_grad=True) # 4 Rows, 3 columns 
target = torch.tensor([[0., 1., 0.],
                       [1., 0., 0.],
                       [0., 0., 1.],
                       [1., 0., 0.]])
output = loss(input, target)
output.backward()
print("Input : ")
print(input)
print("Target : ")
print(target)
print("FORWARD : ")
print("Loss : ")
print(output)
print("BACKWARD : ")
print(input.grad)

Input : 
tensor([[0.1778, 0.1203, 0.2264],
        [0.0957, 0.2403, 0.3400],
        [0.1397, 0.1925, 0.3336],
        [0.2256, 0.3144, 0.8695]], requires_grad=True)
Target : 
tensor([[0., 1., 0.],
        [1., 0., 0.],
        [0., 0., 1.],
        [1., 0., 0.]])
FORWARD : 
Loss : 
tensor(0.9143, grad_fn=<BinaryCrossEntropyBackward>)
BACKWARD : 
tensor([[ 0.1014, -0.6927,  0.1077],
        [-0.8708,  0.1097,  0.1263],
        [ 0.0969,  0.1032, -0.2498],
        [-0.3694,  0.1215,  0.6386]])
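
For reference, the sketch below compares the element-wise binary form used in these cells with the general multi-class equation from the top of this notebook, on the first row of the 3-class input. It is only an illustration of how the two formulas relate on this data.

```python
import math

target = [0.0, 1.0, 0.0]
score = [0.1778, 0.1203, 0.2264]

# Element-wise binary form, summed over the row (what the cells above compute per element).
bce_row = sum(-(t * math.log(s) + (1.0 - t) * math.log(1.0 - s))
              for t, s in zip(target, score))

# General equation: only the true class contributes.
ce_row = -sum(t * math.log(s) for t, s in zip(target, score))

print(round(bce_row, 3))  # 2.57
print(round(ce_row, 3))   # 2.118
```

The binary form adds a `-log(1 - s_i)` term for every class whose target is 0, so its row total is larger than the value given by the general equation.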


### mlpack


In [0]:
%%capture
%%writefile test.cpp  

// This uses the computations present in cross_entropy_error_impl.hpp in ann/loss_functions.
#include <iostream>
#include <armadillo>

using namespace std;
using namespace arma;

int main()
{
  // Constructor
  arma::mat x,y;
  arma::mat weight;

  x << 0.1778 << 0.1203 << 0.2264 << endr
    << 0.0957 << 0.2403 << 0.3400 << endr
    << 0.1397 << 0.1925 << 0.3336 << endr
    << 0.2256 << 0.3144 << 0.8695 << endr;

  y << 0 << 1 << 0 << endr
    << 1 << 0 << 0 << endr
    << 0 << 0 << 1 << endr
    << 1 << 0 << 0 << endr;
 
  // Forward
  const double eps = 1e-10;
  arma::mat loss_none = -(y % arma::log(x + eps) + (1. - y) % arma::log(1. - x + eps));
  double loss_sum = arma::accu(loss_none);
  double loss_mean = loss_sum / x.n_elem;

  // Backward
  arma::mat output;
  output = (1. - y) / (1. - x + eps) - y / (x + eps);

  // Display
  cout << "------------------------------------------------------------------" << endl;
  cout << "USER-PROVIDED MATRICES : " << endl;
  cout << "------------------------------------------------------------------" << endl;
  cout << "Input shape : "<< x.n_rows << " " << x.n_cols << endl;
  cout << "Input : " << endl << x << endl;
  cout << "Target shape : "<< y.n_rows << " " << y.n_cols << endl;
  cout << "Target : " << endl << y << endl;
  cout << "------------------------------------------------------------------" << endl;
  cout << "SUM " << endl;
  cout << "------------------------------------------------------------------" << endl;
  cout << "FORWARD : " << endl;
  cout << "Loss : \n" << loss_none << '\n';
  cout << "Loss (sum):\n" << loss_sum << '\n';
  cout << "BACKWARD : " << endl;
  cout << "Output shape : "<< output.n_rows << " " << output.n_cols << endl;
  cout << "Output (sum) : " << endl << output << endl;
  cout << "Sum of all values in this matrix : " << arma::as_scalar(arma::accu(output)) << endl;
  cout << "------------------------------------------------------------------" << endl;
  cout << "MEAN " << endl;
  cout << "------------------------------------------------------------------" << endl;
  cout << "FORWARD : " << endl;
  cout << "Loss (mean):\n" << loss_mean << '\n';
  cout << "BACKWARD : " << endl;
  cout << "Output shape : "<< output.n_rows << " " << output.n_cols << endl;
  cout << "Output (mean) : " << endl << output / x.n_elem << endl;
  cout << "Sum of all values in this matrix : " << arma::as_scalar(arma::accu(output / x.n_elem)) << endl;
  cout << "------------------------------------------------------------------" << endl;
  return 0; 
}

In [0]:
%%script bash
g++ test.cpp -o test -larmadillo && ./test

------------------------------------------------------------------
USER-PROVIDED MATRICES : 
------------------------------------------------------------------
Input shape : 4 3
Input : 
   0.1778   0.1203   0.2264
   0.0957   0.2403   0.3400
   0.1397   0.1925   0.3336
   0.2256   0.3144   0.8695

Target shape : 4 3
Target : 
        0   1.0000        0
   1.0000        0        0
        0        0   1.0000
   1.0000        0        0

------------------------------------------------------------------
SUM 
------------------------------------------------------------------
FORWARD : 
Loss : 
   0.1958   2.1178   0.2567
   2.3465   0.2748   0.4155
   0.1505   0.2138   1.0978
   1.4890   0.3775   2.0364

Loss (sum):
10.9721
BACKWARD : 
Output shape : 4 3
Output (sum) : 
    1.2162   -8.3126    1.2927
  -10.4493    1.3163    1.5152
    1.1624    1.2384   -2.9976
   -4.4326    1.4586    7.6628

Sum of all values in this matrix : -9.32954
---------------------------------------------------