
All samples from each GPU combined before applying contrastive loss? #5

Open · fedshyvana opened this issue Jul 16, 2022 · 0 comments
Hi, thank you for the great work Jianwei! I was wondering, for distributed training, do you:

  1. combine the mini-batches across GPUs before applying the contrastive loss (so the effective batch size = n_GPUs × batch size per GPU),
    OR
  2. simply compute the contrastive loss separately on each GPU (so the batch size is just the per-GPU batch size)?

I've seen implementations of contrastive pretraining methods such as this one (SimCLR) use the first option:
https://github.com/Spijkervet/SimCLR/blob/cd85c4366d2e6ac1b0a16798b76ac0a2c8a94e58/simclr/modules/gather.py#L5
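
For context, the SimCLR code linked above implements option 1 with an autograd-aware all_gather. Here is a minimal sketch of that pattern; the names `GatherLayer` and `gather_features` are illustrative and not taken from the UniCL code:

```python
import torch
import torch.distributed as dist


class GatherLayer(torch.autograd.Function):
    """All-gather a tensor from every rank while keeping gradients,
    since plain dist.all_gather is not differentiable."""

    @staticmethod
    def forward(ctx, x):
        gathered = [torch.zeros_like(x) for _ in range(dist.get_world_size())]
        dist.all_gather(gathered, x)
        return tuple(gathered)

    @staticmethod
    def backward(ctx, *grads):
        # Sum gradient contributions from all ranks, then return
        # the slice corresponding to this rank's local input.
        all_grads = torch.stack(grads)
        dist.all_reduce(all_grads)
        return all_grads[dist.get_rank()]


def gather_features(image_feat, text_feat):
    # Option 1: concatenate the per-GPU mini-batches so the contrastive
    # loss sees an effective batch of n_GPUs x batch_size_per_GPU.
    all_image = torch.cat(GatherLayer.apply(image_feat), dim=0)
    all_text = torch.cat(GatherLayer.apply(text_feat), dim=0)
    return all_image, all_text
```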

I ask because your code has a comment that says "# gather features from all gpus", but, if I'm not mistaken, I don't actually see where the features are gathered across all GPUs:

UniCL/main.py, line 177 (commit 4f680ff):

# gather features from all gpus
Thanks!
