Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MemoryBankModule - no distributed all gather prior to queueing a batch #1018

Closed
ibro45 opened this issue Dec 22, 2022 · 2 comments · Fixed by #1291
Closed

MemoryBankModule - no distributed all gather prior to queueing a batch #1018

ibro45 opened this issue Dec 22, 2022 · 2 comments · Fixed by #1291

Comments

@ibro45
Copy link
Contributor

ibro45 commented Dec 22, 2022

If I'm not mistaken, the batch is not all_gathered before the queue is updated. I was just comparing the code against the one it was based on (MoCo's memory bank) and noticed this: https://github.com/facebookresearch/moco/blob/78b69cafae80bc74cd1a89ac3fb365dc20d157d3/moco/builder.py#L55

Was this done on purpose?

Is it because different processes would use the same queued batches, unlike the DistributedDataLoader batches which are indexed in DDP way?

@guarin
Copy link
Contributor

guarin commented Jan 3, 2023

Hi, you are right that our implementation differs and that our code does not share memory bank entries between processes. Thanks for pointing it out! Will make an issue to change this, but might take a while until we have time to work on it.

Not sharing has the following benefits:

  • No sync between processes
  • Larger diversity in the memory bank (because memory bank is unique for each process)
    But also a couple of drawbacks:
  • Training performance might slightly change if number of processes is changed
  • Memory bank is updated less frequently

@ibro45
Copy link
Contributor Author

ibro45 commented Jan 3, 2023

We could have both approaches by adding all_gather_batches=False to the MemoryBankModule. But I would list the pros and cons in docstrings, as you might want to avoid gathering batches despite using DDP. What do you think?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

2 participants