
What's the point if we do not gather all outputs from different GPUs to compute the contrastive loss? #20

Closed
BaohaoLiao opened this issue Sep 2, 2021 · 3 comments

Comments

@BaohaoLiao

Hi,

This is really great work. However, I have a general question about the contrastive loss.

In your code, you use 8 GPUs for a total batch size of 256, which means 32 samples per GPU. You first compute the contrastive loss over these 32 samples on each GPU, then gather the losses from the different GPUs to compute the final gradient.

However, it makes little sense to me to increase the batch size this way. One challenge for the contrastive loss is finding hard negatives. Normally we increase the batch size on a single GPU to handle this, since a larger batch gives us a better chance of finding hard negatives. But with DDP, this kind of larger total batch size does not help.

For example, if I use 16 GPUs for a total batch size of 512, I end up with the same number of samples (32) per GPU as above. Would it be better to gather all of the output embeddings from the different GPUs onto one GPU to compute the contrastive loss? A sketch of what I mean is below.
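
To make the question concrete, here is a minimal sketch of what I mean by gathering the embeddings before computing the loss, assuming an InfoNCE-style loss under DDP. The helpers `gather_all` and `info_nce` and the temperature value are hypothetical, not from this repo:

```python
import torch
import torch.distributed as dist
import torch.nn.functional as F

def gather_all(t):
    """All-gather tensor `t` from every rank; only the local slice keeps gradients."""
    gathered = [torch.zeros_like(t) for _ in range(dist.get_world_size())]
    dist.all_gather(gathered, t)
    gathered[dist.get_rank()] = t  # re-insert the local tensor so its gradient is retained
    return torch.cat(gathered, dim=0)

def info_nce(q, k, temperature=0.2):
    """q, k: L2-normalized embeddings of the two views on the local GPU."""
    k_all = gather_all(k)                      # negatives now come from every GPU
    logits = q @ k_all.t() / temperature       # (local_batch, global_batch)
    labels = torch.arange(q.size(0), device=q.device) \
             + dist.get_rank() * q.size(0)     # each query matches its own key
    return F.cross_entropy(logits, labels)
```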

In Table 2 of your paper, how do you change the batch size? By increasing the number of samples on a single GPU with a fixed number of GPUs, or by increasing the number of GPUs with a fixed number of samples per GPU? The result is a little strange to me: the total batch size of 4096 is the worst.

@endernewton
Contributor

Is this posted on the wrong repo? For SimSiam, we use an l2 loss (cosine similarity); we do not use a contrastive loss.
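
For reference, a minimal sketch of the negative cosine similarity loss, following the pseudocode in the paper rather than the exact code in this repository:

```python
import torch
import torch.nn.functional as F

def simsiam_loss(p1, p2, z1, z2):
    """p1, p2: predictor outputs; z1, z2: projector outputs (targets)."""
    # negative cosine similarity with stop-gradient on the targets, symmetrized over the two views
    return -(F.cosine_similarity(p1, z2.detach(), dim=-1).mean()
             + F.cosine_similarity(p2, z1.detach(), dim=-1).mean()) * 0.5
```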

@endernewton
Contributor

The 4096 batch size being worse is due to the difficulty of training with a large batch size; this is observed in other training setups (e.g., supervised ImageNet) as well.

@BaohaoLiao
Author

Thank you for your response. It seems I mixed your method up with MoCo. Sorry about that.
