
MarginRankingLoss with multiple examples per batch is broken #9526

Closed
marchbnr opened this issue Jul 18, 2018 · 12 comments

marchbnr commented Jul 18, 2018

Issue description

MarginRankingLoss seems to be broken when used with multiple examples per batch.

It seems this was implemented here:
#972

but broke after this change:
#5346

Code example

import torch
import torch.nn as nn
x1 = torch.ones(64, 128)
x2 = torch.ones(64, 128)
target = torch.ones(64)
loss = nn.MarginRankingLoss(margin=1.0).forward(x1, x2, target)

This results in the following exception:
Traceback (most recent call last):
File "", line 1, in
File "/home/marc/Applications/miniconda3/envs/pytorch-env/lib/python3.6/site-packages/torch/nn/modules/loss.py", line 874, in forward
self.reduce)
File "/home/marc/Applications/miniconda3/envs/pytorch-env/lib/python3.6/site-packages/torch/nn/functional.py", line 1580, in margin_ranking_loss
return torch.margin_ranking_loss(input1, input2, target, margin, size_average, reduce)
RuntimeError: The size of tensor a (64) must match the size of tensor b (128) at non-singleton dimension 1

System Info

Collecting environment information...
PyTorch version: 0.4.0
Is debug build: No
CUDA used to build PyTorch: 8.0.61

OS: Ubuntu 16.04.4 LTS
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609
CMake version: Could not collect

Python version: 3.6
Is CUDA available: No
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA

Versions of relevant libraries:
[pip3] numpy (1.13.1)
[conda] pytorch-ignite 0.1.0
[conda] pytorch-nlp 0.3.5
[conda] pytorch-quasar 0.0.1a0
[conda] torch 0.4.0
[conda] torchfile 0.1.0
[conda] torchtext 0.2.3 <pip>

@marchbnr (Author)

OK, after reading the documentation more carefully I realized that the target tensor has to have the same shape as the inputs.
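
For completeness, a minimal sketch of that fix (assuming the target simply has to match, or broadcast against, the input shape):

import torch
import torch.nn as nn

x1 = torch.ones(64, 128)
x2 = torch.ones(64, 128)
target = torch.ones(64, 128)  # same shape as the inputs instead of (64,)

loss = nn.MarginRankingLoss(margin=1.0)(x1, x2, target)
print(loss)  # tensor(1.) with the default mean reduction, since x1 == x2 and margin is 1.0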

@kristenmasada

Where did you see this in the documentation? When I looked here, it said the following about the input sizes for MarginRankingLoss:
Shape:
Input: (N,D) where N is the batch size and D is the size of a sample.
Target: (N)
Output: scalar. If reduce is False, then (N).

If 64 is your batch size and 128 is the size of your sample, shouldn't it be okay to make target = torch.ones(64), rather than target = torch.ones(128)? Maybe this is a mistake in the documentation and it should say Target: (D) instead of Target: (N)?


MuadDev commented Jan 29, 2020

Or should the size be Target: (N, D)?


ragy-deepbiome commented Jan 29, 2020

To do multiple batches in margin ranking loss:

import torch

batch_size = 2
# True sample + similar sample
x1 = torch.randn(batch_size, 64)  # tensor of positive output, target = 1
# True sample + dissimilar sample
x2 = torch.randn(batch_size, 64)  # tensor of negative output, target = -1
target = torch.tensor([[1.0], [-1.0]])  # shape (batch_size, 1)
# Optimize the margin loss on features of (true, similar, target=1) and (true, dissimilar, target=-1)
torch.nn.functional.margin_ranking_loss(x1, x2, target)


MuadDev commented Jan 29, 2020

In the documentation (https://pytorch.org/docs/stable/nn.html#torch.nn.MarginRankingLoss) it is stated that dissimilar samples should be indicated with -1 instead of 0, if I read it correctly.


ragy-deepbiome commented Jan 29, 2020

Yes, thank you for catching that. I'll fix it in my comment.


MuadDev commented Jan 29, 2020

By the way, I assumed the loss worked differently, more like this:

A row in x1 at index i is compared to a row in x2 also with index i using the target in y defined at index i.

But if I read your comment correctly, it seems that all the samples in x1 should have a single ID, and this single ID should be different from the various IDs in x2.

I find it hard to determine which of us is right. Can you help me, @ragy-deepbiome?


MuadDev commented Jan 29, 2020

Where did you see this in the documentation? When I looked here, it said the following about the input sizes for MarginRankingLoss:
Shape:
Input: (N,D) where N is the batch size and D is the size of a sample.
Target: (N)
Output: scalar. If reduce is False, then (N).

If 64 is your batch size and 128 is the size of your sample, shouldn't it be okay to make target = torch.ones(64), rather than target = torch.ones(128)? Maybe this is a mistake in the documentation and it should say Target: (D) instead of Target: (N)?

I think it works like this:

With a batch size of 64 and 128 features, your target must have this size: target = torch.ones(64, 1). Notice the extra singleton dimension. To add it, you can use target = target.unsqueeze(dim=1).
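
A quick sketch of that, with made-up tensors, in case it helps:

import torch
import torch.nn as nn

x1 = torch.randn(64, 128)
x2 = torch.randn(64, 128)
target = torch.ones(64)           # (N,), as in the docs
target = target.unsqueeze(dim=1)  # -> (N, 1), which broadcasts against (N, 128)

loss = nn.MarginRankingLoss(margin=1.0)(x1, x2, target)
print(loss.shape)  # torch.Size([]) - a scalar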

@ragy-deepbiome

This depends on how you are using it, but yes, the target dimension you mentioned in the previous comment seems correct to me; it is the same as what I mentioned.
From what I understand, you need an (anchor + positive) pair and an (anchor + negative) pair, where
x1 = outputs from the pair (anchor sample + positive samples)
and
x2 = outputs from the pair (anchor sample + negative samples)
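
If it helps, this is roughly how I picture that setup (the cosine-similarity scoring and the all-ones target are just assumptions for illustration):

import torch
import torch.nn.functional as F

batch_size, dim = 2, 64
anchor   = torch.randn(batch_size, dim)
positive = torch.randn(batch_size, dim)  # same ID as the anchor
negative = torch.randn(batch_size, dim)  # different ID than the anchor

# Hypothetical scoring: similarity of each (anchor, other) pair
x1 = F.cosine_similarity(anchor, positive, dim=1)  # should be ranked higher
x2 = F.cosine_similarity(anchor, negative, dim=1)  # should be ranked lower
target = torch.ones(batch_size)                    # 1 means "x1 should outrank x2"

loss = F.margin_ranking_loss(x1, x2, target, margin=1.0)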


MuadDev commented Jan 29, 2020

No, that is not what I meant.

What I mean is this:

# Pseudo code
def get_id(sample):
    """Function that returns (somehow) the ID for a given sample's feature vector."""
    ...

batch_size = ...
for i in range(batch_size):
    if target[i] == 1:
        assert get_id(x1[i, :]) == get_id(x2[i, :])
    elif target[i] == -1:
        assert get_id(x1[i, :]) != get_id(x2[i, :])
    else:
        assert False, "cannot have anything else than -1 or 1 in the target vector"

So it is not necessary to have an "anchor sample": each pair of samples is evaluated on its own. This is different from the triplet loss (which I think is what you are describing), where you do have an "anchor sample".

Also, there is no relation between the IDs of the samples within x1 (they thus need not all have the same ID), just as there is no relation between the IDs of the samples within x2. The only relation is between the samples in x1 and x2 that share the same index.
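
A small numerical sketch of that element-wise reading (made-up numbers; the newer reduction='none' argument keeps the per-pair losses):

import torch
import torch.nn.functional as F

# Three independent pairs; pair i is judged only against target[i]
x1 = torch.tensor([0.9, 0.2, 0.7])
x2 = torch.tensor([0.1, 0.8, 0.6])
target = torch.tensor([1.0, -1.0, 1.0])
margin = 0.5

per_pair = F.margin_ranking_loss(x1, x2, target, margin=margin, reduction='none')
manual = torch.clamp(-target * (x1 - x2) + margin, min=0)  # documented formula
print(torch.allclose(per_pair, manual))  # True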

But please correct me if I am wrong!


MuadDev commented Jan 29, 2020

After reading the following blog post, https://gombru.github.io/2019/04/03/ranking_loss/, I have come to the conclusion that you are indeed right and I am wrong.

So all the IDs of the samples in x1 indeed must be the same, and that ID must be different from all the IDs in x2. Right?

@bharnoufi

I don't understand how we should use this loss, since there is a gap between the documentation and the actual execution regarding the required sizes.

I get
"RuntimeError: The size of tensor a (64) must match the size of tensor b (5) at non-singleton dimension 1"
whether the target size is torch.Size([64, 1]) or torch.Size([64]).

In the documentation the size of the target is (N), where N is the batch size.
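
For what it's worth, here is a combination of shapes that does run for me under broadcasting (the sizes are just an example, not necessarily your case):

import torch
import torch.nn as nn

N, D = 64, 5
x1 = torch.randn(N, D)
x2 = torch.randn(N, D)
target = torch.ones(N, 1)  # (N, 1) broadcasts against (N, D); a plain (N,) target does not

loss = nn.MarginRankingLoss(margin=1.0)(x1, x2, target)
print(loss)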
