# Margin Ranking loss

PyTorch docs -> [Here](https://pytorch.org/docs/stable/nn.html#marginrankingloss) <br>

---

The current mlpack implementation of Forward() is correct. It implements **`mean`** reduction. <br>
I have implemented the other reduction types in the last cell. <br>
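
For reference, the three reduction types can be sketched in plain Python. This is only an illustration, not mlpack or PyTorch code: the helper name `margin_ranking_loss` and its signature are made up here, and it assumes the elementwise loss `max(0, -y * (x1 - x2) + margin)` given in the PyTorch docs.

```python
def margin_ranking_loss(x1, x2, y, margin=0.0, reduction="mean"):
    """Hypothetical helper: elementwise max(0, -y * (x1 - x2) + margin)."""
    per_element = [max(0.0, -t * (a - b) + margin) for a, b, t in zip(x1, x2, y)]
    if reduction == "none":
        return per_element                      # one loss value per pair
    if reduction == "sum":
        return sum(per_element)                 # scalar: total loss
    if reduction == "mean":
        return sum(per_element) / len(per_element)  # scalar: average loss
    raise ValueError(f"unknown reduction: {reduction!r}")


# Tiny hand-checkable example: only the second pair violates the margin.
x1 = [2.0, 0.0]
x2 = [0.0, 1.0]
y = [1.0, 1.0]
print(margin_ranking_loss(x1, x2, y, margin=1.0, reduction="none"))  # [0.0, 2.0]
print(margin_ranking_loss(x1, x2, y, margin=1.0, reduction="sum"))   # 2.0
print(margin_ranking_loss(x1, x2, y, margin=1.0, reduction="mean"))  # 1.0
```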

---

I don't think the Backward() is correct. <br>
If there are 2 input tensors x1 and x2 for a loss function L, aren't we supposed to get 2 different values during backpropagation -> dL/dx1 and dL/dx2? <br>

How does the current code get only one tensor as `output`, and what does that tensor represent? <br>
-> The author of this mlpack function is computing y.grad instead of x1.grad and x2.grad, which is why there is only a single tensor. [See the test case in the PR here](https://github.com/mlpack/mlpack/pull/2264/files/e0b76b7a595ee5cc92e381e575f0c951700f6044#diff-1f48422898330115de61fe91a3cc4242). I am pretty sure that isn't correct: y holds the labels, and there is no sensible reason for this loss function to backpropagate dL/dy instead of dL/dx. <br>
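
Writing the derivatives out makes the difference concrete. The notation below ($N$ for the number of elements, $m$ for the margin, $\mathbb{1}_i$ for the "margin violated" indicator) is introduced here for illustration and assumes the mean-reduced loss from the PyTorch docs:

$$
L = \frac{1}{N}\sum_{i=1}^{N} \max\bigl(0,\; -y_i\,(x_{1,i} - x_{2,i}) + m\bigr),
\qquad
\mathbb{1}_i =
\begin{cases}
1 & \text{if } -y_i\,(x_{1,i} - x_{2,i}) + m > 0 \\
0 & \text{otherwise}
\end{cases}
$$

$$
\frac{\partial L}{\partial x_{1,i}} = -\frac{y_i}{N}\,\mathbb{1}_i,
\qquad
\frac{\partial L}{\partial x_{2,i}} = \frac{y_i}{N}\,\mathbb{1}_i,
\qquad
\frac{\partial L}{\partial y_i} = \frac{x_{2,i} - x_{1,i}}{N}\,\mathbb{1}_i
$$

The last expression has the same form as `(input2 - input1) % mask / target.n_cols`, which is what the mlpack Backward() code further down computes. The first two are the gradients the previous layer actually needs.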

Now that this confusion is resolved, I am not sure how to fix the problem. The Backward() function in mlpack currently supports returning only a single `output` matrix, but here we need to return two matrices to the layer preceding the loss function (one for dL/dx1 and one for dL/dx2). As far as I know, this isn't currently possible in mlpack, so it might require a non-trivial change. I will create a separate issue for this.
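
To make the target concrete, here is a minimal plain-Python sketch of the two gradients a correct Backward() would have to produce. The helper name `margin_ranking_backward` is hypothetical, and for `'none'` it assumes an upstream gradient of ones, as in the PyTorch cells below.

```python
def margin_ranking_backward(x1, x2, y, margin=0.0, reduction="mean"):
    """Hypothetical helper: returns (dL/dx1, dL/dx2) as two lists.

    'sum' and 'none' (with an upstream gradient of ones) share the same
    gradient; 'mean' scales it by 1 / N.
    """
    scale = 1.0 / len(x1) if reduction == "mean" else 1.0
    grad_x1, grad_x2 = [], []
    for a, b, t in zip(x1, x2, y):
        # The hinge is active only where the margin is violated.
        active = 1.0 if -t * (a - b) + margin > 0 else 0.0
        grad_x1.append(-t * active * scale)
        grad_x2.append(t * active * scale)
    return grad_x1, grad_x2


# Only the second pair violates the margin, so only it gets a gradient.
g1, g2 = margin_ranking_backward([2.0, 0.0], [0.0, 1.0], [1.0, 1.0],
                                 margin=1.0, reduction="sum")
print(g1)  # [-0.0, -1.0]
print(g2)  # [0.0, 1.0]
```

Stacking the two lists row-wise, mirroring how the mlpack input stacks x1 over x2, would be one conceivable way to fit both gradients into a single `output` matrix. That layout is only an assumption here, not something mlpack prescribes.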

### Imports and installation of mlpack

In [None]:
%%capture
!sudo apt-get install -y libmlpack-dev
import torch
import torch.nn as nn

### PyTorch

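Input 1, input 2, and target are 1D tensors of the same length here. The reduction examples further down use shape `(1, 10)`, which PyTorch also accepts.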

#### Input generation with fixed seeds

In [None]:
import random
import os
import numpy as np

def fix_seeds(seed=0):
  # Seed every RNG used in this notebook so the inputs are reproducible.
  random.seed(seed)
  os.environ['PYTHONHASHSEED'] = str(seed)
  np.random.seed(seed)
  torch.manual_seed(seed)
  torch.backends.cudnn.deterministic = True
  torch.backends.cudnn.benchmark = False
  if torch.cuda.is_available():
    torch.cuda.manual_seed(seed)

fix_seeds()

In [None]:
x1 = torch.randn(3)
x2 = torch.randn(3)
y = torch.FloatTensor(np.random.choice([1, -1], 3))
x = torch.cat((x1,x2), dim=0).view((2,3))

print("------------------------------------------------------------------")
print('Input 1 : ')
print(x1)
print('Input 1 shape : ')
print(x1.shape)
print("------------------------------------------------------------------")
print('Input 2 : ')
print(x2)
print('Input 2 shape : ')
print(x2.shape)
print("------------------------------------------------------------------")
print('Input : ')
print(x)
print(x.shape)
print("------------------------------------------------------------------")
print('Target : ')
print(y)
print("------------------------------------------------------------------")

------------------------------------------------------------------
Input 1 : 
tensor([ 0.4033,  0.8380, -0.7193])
Input 1 shape : 
torch.Size([3])
------------------------------------------------------------------
Input 2 : 
tensor([-0.4033, -0.5966,  0.1820])
Input 2 shape : 
torch.Size([3])
------------------------------------------------------------------
Input : 
tensor([[ 0.4033,  0.8380, -0.7193],
        [-0.4033, -0.5966,  0.1820]])
torch.Size([2, 3])
------------------------------------------------------------------
Target : 
tensor([ 1., -1., -1.])
------------------------------------------------------------------


#### None Reduction


In [None]:
loss = torch.nn.MarginRankingLoss(margin=1.0, reduction='none')
input1 = torch.tensor([[0.4287, -1.6208, -1.5006, -0.4473, 1.5208, -4.5184, 9.3574, -4.8090, 4.3455, 5.2070]], requires_grad=True)
input2 = torch.tensor([[-4.5288, -9.2766, -0.5882, -5.6643, -6.0175, 8.8506, 3.4759, -9.4886, 2.2755, 8.4951]], requires_grad=True)
target = torch.tensor([[1., 1., -1., 1., -1., 1., 1., 1., -1., 1.]])
output = loss(input1, input2, target)

print("------------------------------------------------------------------")
print("Input 1: ")
print(input1)
print("Input 2: ")
print(input2)
print("------------------------------------------------------------------")
print("Target : ")
print(target)
print("------------------------------------------------------------------")
print("FORWARD : ")
print("Loss : ")
print(output)
print("------------------------------------------------------------------")
output.backward(torch.ones(input1.shape), retain_graph=True)
print("BACKWARD (wrt input 1): ")
print(input1.grad)
print()
print("BACKWARD (wrt input 2): ")
print(input2.grad)
print("------------------------------------------------------------------")

------------------------------------------------------------------
Input 1: 
tensor([[ 0.4287, -1.6208, -1.5006, -0.4473,  1.5208, -4.5184,  9.3574, -4.8090,
          4.3455,  5.2070]], requires_grad=True)
Input 2: 
tensor([[-4.5288, -9.2766, -0.5882, -5.6643, -6.0175,  8.8506,  3.4759, -9.4886,
          2.2755,  8.4951]], requires_grad=True)
------------------------------------------------------------------
Target : 
tensor([[ 1.,  1., -1.,  1., -1.,  1.,  1.,  1., -1.,  1.]])
------------------------------------------------------------------
FORWARD : 
Loss : 
tensor([[ 0.0000,  0.0000,  0.0876,  0.0000,  8.5383, 14.3690,  0.0000,  0.0000,
          3.0700,  4.2881]], grad_fn=<ClampMinBackward>)
------------------------------------------------------------------
BACKWARD (wrt input 1): 
tensor([[-0., -0.,  1., -0.,  1., -1., -0., -0.,  1., -1.]])

BACKWARD (wrt input 2): 
tensor([[ 0.,  0., -1.,  0., -1.,  1.,  0.,  0., -1.,  1.]])
------------------------------------------------------------------

#### Sum Reduction

In [None]:
loss = torch.nn.MarginRankingLoss(margin=1.0, reduction='sum')
input1 = torch.tensor([[0.4287, -1.6208, -1.5006, -0.4473, 1.5208, -4.5184, 9.3574, -4.8090, 4.3455, 5.2070]], requires_grad=True)
input2 = torch.tensor([[-4.5288, -9.2766, -0.5882, -5.6643, -6.0175, 8.8506, 3.4759, -9.4886, 2.2755, 8.4951]], requires_grad=True)
target = torch.tensor([[1., 1., -1., 1., -1., 1., 1., 1., -1., 1.]])
output = loss(input1, input2, target)

print("------------------------------------------------------------------")
print("Input 1: ")
print(input1)
print("Input 2: ")
print(input2)
print("------------------------------------------------------------------")
print("Target : ")
print(target)
print("------------------------------------------------------------------")
print("FORWARD : ")
print("Loss : ")
print(output)
print("------------------------------------------------------------------")
output.backward(retain_graph=True)
print("BACKWARD (wrt input 1): ")
print(input1.grad)
print()
print("BACKWARD (wrt input 2): ")
print(input2.grad)
print("------------------------------------------------------------------")

------------------------------------------------------------------
Input 1: 
tensor([[ 0.4287, -1.6208, -1.5006, -0.4473,  1.5208, -4.5184,  9.3574, -4.8090,
          4.3455,  5.2070]], requires_grad=True)
Input 2: 
tensor([[-4.5288, -9.2766, -0.5882, -5.6643, -6.0175,  8.8506,  3.4759, -9.4886,
          2.2755,  8.4951]], requires_grad=True)
------------------------------------------------------------------
Target : 
tensor([[ 1.,  1., -1.,  1., -1.,  1.,  1.,  1., -1.,  1.]])
------------------------------------------------------------------
FORWARD : 
Loss : 
tensor(30.3530, grad_fn=<SumBackward0>)
------------------------------------------------------------------
BACKWARD (wrt input 1): 
tensor([[-0., -0.,  1., -0.,  1., -1., -0., -0.,  1., -1.]])

BACKWARD (wrt input 2): 
tensor([[ 0.,  0., -1.,  0., -1.,  1.,  0.,  0., -1.,  1.]])
------------------------------------------------------------------


#### Mean reduction

In [None]:
loss = torch.nn.MarginRankingLoss(margin=1.0, reduction='mean')
input1 = torch.tensor([[0.4287, -1.6208, -1.5006, -0.4473, 1.5208, -4.5184, 9.3574, -4.8090, 4.3455, 5.2070]], requires_grad=True)
input2 = torch.tensor([[-4.5288, -9.2766, -0.5882, -5.6643, -6.0175, 8.8506, 3.4759, -9.4886, 2.2755, 8.4951]], requires_grad=True)
target = torch.tensor([[1., 1., -1., 1., -1., 1., 1., 1., -1., 1.]])
output = loss(input1, input2, target)

print("------------------------------------------------------------------")
print("Input 1: ")
print(input1)
print("Input 2: ")
print(input2)
print("------------------------------------------------------------------")
print("Target : ")
print(target)
print("------------------------------------------------------------------")
print("FORWARD : ")
print("Loss : ")
print(output)
print("------------------------------------------------------------------")
output.backward(retain_graph=True)
print("BACKWARD (wrt input 1): ")
print(input1.grad)
print()
print("BACKWARD (wrt input 2): ")
print(input2.grad)
print("------------------------------------------------------------------")

------------------------------------------------------------------
Input 1: 
tensor([[ 0.4287, -1.6208, -1.5006, -0.4473,  1.5208, -4.5184,  9.3574, -4.8090,
          4.3455,  5.2070]], requires_grad=True)
Input 2: 
tensor([[-4.5288, -9.2766, -0.5882, -5.6643, -6.0175,  8.8506,  3.4759, -9.4886,
          2.2755,  8.4951]], requires_grad=True)
------------------------------------------------------------------
Target : 
tensor([[ 1.,  1., -1.,  1., -1.,  1.,  1.,  1., -1.,  1.]])
------------------------------------------------------------------
FORWARD : 
Loss : 
tensor(3.0353, grad_fn=<MeanBackward0>)
------------------------------------------------------------------
BACKWARD (wrt input 1): 
tensor([[-0.0000, -0.0000,  0.1000, -0.0000,  0.1000, -0.1000, -0.0000, -0.0000,
          0.1000, -0.1000]])

BACKWARD (wrt input 2): 
tensor([[ 0.0000,  0.0000, -0.1000,  0.0000, -0.1000,  0.1000,  0.0000,  0.0000,
         -0.1000,  0.1000]])
------------------------------------------------------------------

### mlpack


#### CURRENT IMPLEMENTATION - Backward is not correct. See explanation in topmost cell.


In [None]:
%%capture
%%writefile test.cpp  

#include <iostream>
#include <armadillo>

using namespace std;
using namespace arma;

int main()
{
  // Constructor
  arma::mat input, input1, input2, target;
  const double margin = 1.0;
 
  input1 = arma::mat("0.4287 -1.6208 -1.5006 -0.4473 1.5208 -4.5184 9.3574 "
      "-4.8090 4.3455 5.2070");
  input2 = arma::mat("-4.5288 -9.2766 -0.5882 -5.6643 -6.0175 8.8506 3.4759 "
      "-9.4886 2.2755 8.4951");
  input = arma::join_cols(input1, input2);
  target = arma::mat("1 1 -1 1 -1 1 1 1 -1 1");

  // Forward
  const int inputRowsForward = input.n_rows;
  const arma::mat& input1Forward = input.rows(0, inputRowsForward / 2 - 1);
  const arma::mat& input2Forward = input.rows(inputRowsForward / 2, inputRowsForward - 1);
  double loss_mean =  arma::accu(arma::max(arma::zeros(size(target)), -target % (input1Forward - input2Forward) + margin)) / target.n_cols;

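  // Backward (as in the current mlpack code). The result is
  // (x2 - x1) * mask / N, which is dL/dy (gradient w.r.t. the target),
  // not dL/dx1 or dL/dx2.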
  arma::mat output;
  const int inputRowsBackward = input.n_rows;
  const arma::mat& input1Backward = input.rows(0, inputRowsBackward / 2 - 1);                       // same as input1Forward (x1)
  const arma::mat& input2Backward = input.rows(inputRowsBackward / 2, inputRowsBackward - 1);       // same as input2Forward (x2)
  output = -target % (input1Backward - input2Backward) + margin;
  output.elem(arma::find(output >= 0)).ones();
  output.elem(arma::find(output < 0)).zeros();
  output = (input2Backward - input1Backward) % output / target.n_cols;

  // Display
  cout << "------------------------------------------------------------------" << endl;
  cout << "USER-PROVIDED MATRICES : " << endl;
  cout << "------------------------------------------------------------------" << endl;
  cout << "Input shape : "<< input.n_rows << " " << input.n_cols << endl;
  cout << "Input : " << endl << input << endl;
  cout << "Target shape : "<< target.n_rows << " " << target.n_cols << endl;
  cout << "Target : " << endl << target << endl;
  cout << "------------------------------------------------------------------" << endl;
  cout << "MEAN REDUCTION" << endl;
  cout << "------------------------------------------------------------------" << endl;
  cout << "FORWARD : " << endl;
  cout << "Loss (mean):\n" << loss_mean << '\n';
  cout << "BACKWARD : " << endl;
  cout << "Output shape : "<< output.n_rows << " " << output.n_cols << endl;
  cout << "Output (sum) : " << endl << output << endl;
  cout << "Sum of all values in this matrix : " << arma::as_scalar(arma::accu(output)) << endl;
  cout << "------------------------------------------------------------------" << endl;
  return 0;
}

In [None]:
%%script bash
g++ test.cpp -o test -larmadillo && ./test

------------------------------------------------------------------
USER-PROVIDED MATRICES : 
------------------------------------------------------------------
Input shape : 2 10
Input : 
   0.4287  -1.6208  -1.5006  -0.4473   1.5208  -4.5184   9.3574  -4.8090   4.3455   5.2070
  -4.5288  -9.2766  -0.5882  -5.6643  -6.0175   8.8506   3.4759  -9.4886   2.2755   8.4951

Target shape : 1 10
Target : 
   1.0000   1.0000  -1.0000   1.0000  -1.0000   1.0000   1.0000   1.0000  -1.0000   1.0000

------------------------------------------------------------------
MEAN REDUCTION
------------------------------------------------------------------
FORWARD : 
Loss (mean):
3.0353
BACKWARD : 
Output shape : 1 10
Output (sum) : 
        0        0   0.0912        0  -0.7538   1.3369        0        0  -0.2070   0.3288

Sum of all values in this matrix : 0.79612
------------------------------------------------------------------
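
As a sanity check on the claim that this `output` is dL/dy, the sketch below compares a finite-difference gradient with respect to the target against the expression used in the mlpack code above. It is plain Python written for this note (the helper names are made up), not part of mlpack.

```python
x1 = [0.4287, -1.6208, -1.5006, -0.4473, 1.5208, -4.5184, 9.3574, -4.8090, 4.3455, 5.2070]
x2 = [-4.5288, -9.2766, -0.5882, -5.6643, -6.0175, 8.8506, 3.4759, -9.4886, 2.2755, 8.4951]
y = [1.0, 1.0, -1.0, 1.0, -1.0, 1.0, 1.0, 1.0, -1.0, 1.0]
margin = 1.0


def mean_loss(targets):
    """Mean-reduced margin ranking loss for the fixed x1, x2 above."""
    terms = [max(0.0, -t * (a - b) + margin) for a, b, t in zip(x1, x2, targets)]
    return sum(terms) / len(terms)


def grad_wrt_target(h=1e-6):
    """Central finite differences of the mean loss w.r.t. each target entry."""
    grads = []
    for i in range(len(y)):
        up = y[:i] + [y[i] + h] + y[i + 1:]
        down = y[:i] + [y[i] - h] + y[i + 1:]
        grads.append((mean_loss(up) - mean_loss(down)) / (2 * h))
    return grads


# Same expression as the mlpack Backward() above: (x2 - x1) * mask / N.
mlpack_output = [
    (b - a) / len(y) if -t * (a - b) + margin >= 0 else 0.0
    for a, b, t in zip(x1, x2, y)
]

# Left column: numeric dL/dy. Right column: what mlpack currently returns.
for numeric, current in zip(grad_wrt_target(), mlpack_output):
    print(f"{numeric:8.4f} {current:8.4f}")
```

The two columns agree, while the PyTorch gradients with respect to input 1 and input 2 shown earlier are different values (plus or minus 0.1 on the active elements).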


#### NEW IMPLEMENTATION - Reduction done. Backward is still TODO; I don't yet know how to implement it.

In [None]:
%%capture
%%writefile test.cpp  

#include <iostream>
#include <armadillo>

using namespace std;
using namespace arma;

int main()
{
  // Constructor
  arma::mat input, input1, input2, target;
  const double margin = 1.0;
 
  input1 = arma::mat("0.4287 -1.6208 -1.5006 -0.4473 1.5208 -4.5184 9.3574 "
      "-4.8090 4.3455 5.2070");
  input2 = arma::mat("-4.5288 -9.2766 -0.5882 -5.6643 -6.0175 8.8506 3.4759 "
      "-9.4886 2.2755 8.4951");
  input = arma::join_cols(input1, input2);
  target = arma::mat("1 1 -1 1 -1 1 1 1 -1 1");

  // Forward
  const int inputRowsForward = input.n_rows;
  const arma::mat& input1Forward = input.rows(0, inputRowsForward / 2 - 1);
  const arma::mat& input2Forward = input.rows(inputRowsForward / 2, inputRowsForward - 1);
  arma::mat loss_none = arma::max(arma::zeros(size(target)), -target % (input1Forward - input2Forward) + margin);
  double loss_sum = arma::accu(loss_none);
  double loss_mean = loss_sum / input1Forward.n_elem;

  // Backward
  // TODO: should produce both dL/dx1 and dL/dx2 (see the topmost cell).
  arma::mat output;

  // Display
  cout << "------------------------------------------------------------------" << endl;
  cout << "USER-PROVIDED MATRICES : " << endl;
  cout << "------------------------------------------------------------------" << endl;
  cout << "Input shape : "<< input.n_rows << " " << input.n_cols << endl;
  cout << "Input : " << endl << input << endl;
  cout << "Target shape : "<< target.n_rows << " " << target.n_cols << endl;
  cout << "Target : " << endl << target << endl;
  cout << "------------------------------------------------------------------" << endl;
  cout << "SUM " << endl;
  cout << "------------------------------------------------------------------" << endl;
  cout << "FORWARD : " << endl;
  cout << "Loss : \n" << loss_none << '\n';
  cout << "Loss (sum):\n" << loss_sum << '\n';
  //cout << "BACKWARD : " << endl;
  //cout << "Output shape : "<< output.n_rows << " " << output.n_cols << endl;
  //cout << "Output (sum) : " << endl << output << endl;
  //cout << "Sum of all values in this matrix : " << arma::as_scalar(arma::accu(output)) << endl;
  cout << "------------------------------------------------------------------" << endl;
  cout << "MEAN " << endl;
  cout << "------------------------------------------------------------------" << endl;
  cout << "FORWARD : " << endl;
  cout << "Loss (mean):\n" << loss_mean << '\n';
  //cout << "BACKWARD : " << endl;
  //cout << "Output shape : "<< output.n_rows << " " << output.n_cols << endl;
  //cout << "Output (mean) : " << endl << output / input1Forward.n_elem << endl;
  //cout << "Sum of all values in this matrix : " << arma::as_scalar(arma::accu(output / input1Forward.n_elem)) << endl;
  cout << "------------------------------------------------------------------" << endl;
  return 0;
}

In [None]:
%%script bash
g++ test.cpp -o test -larmadillo && ./test

------------------------------------------------------------------
USER-PROVIDED MATRICES : 
------------------------------------------------------------------
Input shape : 2 10
Input : 
   0.4287  -1.6208  -1.5006  -0.4473   1.5208  -4.5184   9.3574  -4.8090   4.3455   5.2070
  -4.5288  -9.2766  -0.5882  -5.6643  -6.0175   8.8506   3.4759  -9.4886   2.2755   8.4951

Target shape : 1 10
Target : 
   1.0000   1.0000  -1.0000   1.0000  -1.0000   1.0000   1.0000   1.0000  -1.0000   1.0000

------------------------------------------------------------------
SUM 
------------------------------------------------------------------
FORWARD : 
Loss : 
         0         0    0.0876         0    8.5383   14.3690         0         0    3.0700    4.2881

Loss (sum):
30.353
------------------------------------------------------------------
MEAN 
------------------------------------------------------------------
FORWARD : 
Loss (mean):
3.0353
------------------------------------------------------------------