
Doubt regarding all_gather_list in case of DDP #230

Open
bhattg opened this issue Aug 29, 2022 · 0 comments

bhattg commented Aug 29, 2022

Hi,

Thanks for the amazing framework. I have a question about the purpose of the all_gather_list function, which gathers tensors across GPUs. When training with DDP, the gradients are already synchronized before the parameter update, so why is this step needed? Is it only to collate the loss, the number of correct predictions, or the rank (during evaluation)? If so, couldn't one gather those quantities after computing the loss, rather than first exchanging the question and context representations and then proceeding from there?
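
For reference, here is a minimal sketch of the pattern I am asking about (my own simplification with placeholder names, not the actual all_gather_list implementation):

```python
import torch
import torch.distributed as dist

def gather_representations(local_reps: torch.Tensor) -> torch.Tensor:
    """Gather same-shaped representation tensors from every rank.

    torch.distributed.all_gather does not backpropagate into tensors
    received from other ranks, so the local shard is re-inserted to
    keep its autograd path intact.
    """
    world_size = dist.get_world_size()
    gathered = [torch.zeros_like(local_reps) for _ in range(world_size)]
    dist.all_gather(gathered, local_reps)
    gathered[dist.get_rank()] = local_reps  # preserve local gradients
    return torch.cat(gathered, dim=0)

# Hypothetical usage with per-GPU batch size B and world size W:
#   q_all   = gather_representations(q_local)    # [W*B, d]
#   ctx_all = gather_representations(ctx_local)  # [W*B, d]
#   scores  = q_all @ ctx_all.t()                # (W*B) x (W*B)
# The score matrix couples representations from different GPUs, so the
# per-GPU losses alone would not see the cross-GPU in-batch negatives.
```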

Thanks!
