
About process_group in SyncBN #22

Closed
yxgeee opened this issue Aug 28, 2020 · 5 comments

Comments

@yxgeee

yxgeee commented Aug 28, 2020

Hi,

I noticed that you adopted groups of 8 GPUs in SyncBN (https://github.com/facebookresearch/swav/blob/master/main_swav.py#L158) when training with a large batch size of 4096, i.e., 512 training samples per group for synchronized batch norm. I am wondering: 1) why don't you use global SyncBN for training, and 2) how much does this affect performance?

Thanks!

@yxgeee
Author

yxgeee commented Aug 28, 2020

Also, I ran into some issues when reproducing SimCLR based on your code. As mentioned in your paper, you reproduced SimCLR's performance; could you please provide the main.py file for SimCLR, or share some training tips? For example, besides the different loss and the multi-crop augmentation, are there any other differences from the SwAV training setup?

Your implementation is really clear and easy to extend! Looking forward to your reply. Thanks.

@mathildecaron31
Contributor

Hi @yxgeee, thanks for your interest in this repo.

I use communication groups of 8 GPUs when training with 64 GPUs (8 machines) in order to speed up training. Training with global synchronized batch norm (i.e., statistics shared across all processes) takes roughly 2x longer! Sharing batch statistics only across processes located on the same machine eliminates inter-machine communication, which we found to be the bottleneck.

Surprisingly enough, we did not observe any drop in performance when synchronizing batch norm per machine compared to global SyncBN.
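
For readers who want to reproduce this setup, here is a minimal sketch of per-machine SyncBN groups in plain PyTorch (`torch.distributed.new_group` plus `nn.SyncBatchNorm.convert_sync_batchnorm`). The function name and the default group size of 8 are illustrative and not taken from the repo, which may use a different SyncBN backend at the linked line:

```python
import torch.distributed as dist
import torch.nn as nn

def convert_to_grouped_syncbn(model, group_size=8):
    """Convert BatchNorm layers to SyncBatchNorm layers that synchronize
    statistics only within groups of `group_size` consecutive ranks
    (e.g. the 8 GPUs of one machine) instead of across all processes."""
    world_size = dist.get_world_size()
    rank = dist.get_rank()
    assert world_size % group_size == 0

    # Every process must call new_group() for every group, in the same order.
    process_group = None
    for start in range(0, world_size, group_size):
        ranks = list(range(start, start + group_size))
        group = dist.new_group(ranks=ranks)
        if rank in ranks:
            process_group = group  # the group this rank belongs to

    # BN statistics are then reduced only inside `process_group`.
    return nn.SyncBatchNorm.convert_sync_batchnorm(model, process_group)
```

With 64 GPUs, a global batch size of 4096, and `group_size=8`, each group normalizes over 512 samples, matching the numbers in the question above.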

@mathildecaron31
Contributor

Regarding SimCLR, I have not been planning to share my code since there are already many implementations out there. I might include it if there is interest from the community.

@yxgeee
Author

yxgeee commented Sep 7, 2020

Thanks a lot for your reply!

@BIGBALLON

> Hi @yxgeee, thanks for your interest in this repo.
>
> I use communication groups of 8 GPUs when training with 64 GPUs (8 machines) in order to speed up training. Training with global synchronized batch norm (i.e., statistics shared across all processes) takes roughly 2x longer! Sharing batch statistics only across processes located on the same machine eliminates inter-machine communication, which we found to be the bottleneck.
>
> Surprisingly enough, we did not observe any drop in performance when synchronizing batch norm per machine compared to global SyncBN.

Sounds great! Thanks a lot!
