Proper seeding for DDP #12

Closed
ajaysaini725 opened this issue Oct 13, 2021 · 0 comments · Fixed by #173
Assignees
Labels
bug Something isn't working

Comments

@ajaysaini725
Member

If the seed is not set in hparams, it is randomly selected in __init__, so each DDP process gets a different random seed when it starts up.

The seed from the rank 0 process is the one saved in checkpoints.

When resuming from a checkpoint, the seed from the rank 0 process is restored across all DDP processes.
This leads to inconsistent behavior, since the non-rank-0 processes now resume with a different seed than the one they first trained with.

To fix: add the seed to the RNG state, and sync it across all DDP processes.
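
A rough sketch of one way to read the proposed fix, not the project's actual implementation. The names `resolve_seed`, `rng_state_dict`, `simulate`, and `fake_broadcast` are all hypothetical, and the `broadcast` callable stands in for a real collective (for example, one built on `torch.distributed`) that returns rank 0's value on every rank. The sketch runs in a single process so the behavior can be checked without launching DDP.

```python
import random
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in for a real collective: takes this rank's value and
# returns rank 0's value on every rank.
Broadcast = Callable[[int], int]


def resolve_seed(hparams_seed: Optional[int], broadcast: Broadcast) -> int:
    """Return one seed that every rank agrees on."""
    # Each rank still draws its own candidate when hparams leaves the seed unset.
    if hparams_seed is not None:
        candidate = hparams_seed
    else:
        candidate = random.SystemRandom().randrange(2**32)
    # Only rank 0's candidate survives the broadcast.
    return broadcast(candidate)


def rng_state_dict(seed: int, rng: random.Random) -> Dict[str, object]:
    """Checkpoint payload (assumed layout): the seed is stored with the RNG state."""
    return {"seed": seed, "python_rng": rng.getstate()}


def simulate(world_size: int, hparams_seed: Optional[int] = None) -> List[int]:
    """Single-process simulation of `world_size` ranks, for illustration only."""
    rank0_value: List[int] = []

    def fake_broadcast(value: int) -> int:
        # The first caller plays rank 0; later callers receive its value.
        if not rank0_value:
            rank0_value.append(value)
        return rank0_value[0]

    return [resolve_seed(hparams_seed, fake_broadcast) for _ in range(world_size)]
```

With the seed synced at startup, the value that rank 0 writes to the checkpoint is the same one every other rank trained with, so restoring it on resume no longer changes any rank's seed.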

@ajaysaini725 ajaysaini725 added the bug Something isn't working label Oct 13, 2021
@ajaysaini725 ajaysaini725 changed the title Store Seeds across all ddp processes Copy rank0 process to all DDP processes Oct 26, 2021
This was referenced Oct 26, 2021
@hanlint hanlint changed the title Copy rank0 process to all DDP processes Proper seeding for DDP Nov 3, 2021
@hanlint hanlint assigned ajaysaini725 and unassigned jbloxham Nov 3, 2021
@hanlint hanlint linked a pull request Dec 17, 2021 that will close this issue