
Update numpy and pytorch seeding for dataloader and multiple processes per machine. #299

Closed
wants to merge 1 commit

Conversation

iseessel
Contributor

Summary:
Current state:

Currently we set a different seed per node, but the same seed for all training processes on a node. Each DataLoader worker's seed also differs each epoch, but non-deterministically.

Proposed State:

Use a different random seed for each dist_rank, and a different, deterministic seed for each DataLoader worker process each epoch.

https://fb.quip.com/hVIcAahpVLo2

Effects:

Fixes randomization for a few losses, hooks, and trunks. Fixes randomization when using the fork multiprocessing option for transformations. Fixes the collapse of all seeds to 0 when the config seed is set to 0.
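
As a rough illustration of the seed-collapse fix: a minimal sketch, assuming an additive derivation from dist_rank (the function name and exact formula here are assumptions, not VISSL's actual code). A purely multiplicative scheme maps every rank to 0 when the config seed is 0; an additive offset cannot.

```python
# Hypothetical sketch: derive each training process's seed from the global
# dist_rank rather than the node id. The exact formula is an assumption;
# the point is that the additive dist_rank offset keeps per-process seeds
# distinct and deterministic even when config_seed == 0.
def get_training_seed(config_seed: int, dist_rank: int, world_size: int) -> int:
    return config_seed * world_size + dist_rank
```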

There are 3 changes, summarized as:

  1. Use dist_rank instead of node_id for seeding training processes.
  2. Create a worker_init_fn to manually set the numpy seed for DataLoader workers (see the sketch after this list).
  3. Add knowledge of the training process seed to the sampler.
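
A minimal sketch of change (2), assuming a simple additive seed layout; make_worker_init_fn and its parameters are illustrative names, not VISSL's API:

```python
import random

import numpy as np
import torch


def make_worker_init_fn(base_seed: int, epoch: int, dist_rank: int, num_workers: int):
    """Return a worker_init_fn giving each DataLoader worker a seed that is
    unique per (epoch, dist_rank, worker) and reproducible across runs."""

    def worker_init_fn(worker_id: int):
        seed = base_seed + epoch * 100_000 + dist_rank * num_workers + worker_id
        random.seed(seed)
        np.random.seed(seed % (2 ** 32))  # numpy requires a seed in [0, 2**32)
        torch.manual_seed(seed)

    return worker_init_fn
```

For change (3), note that PyTorch's own torch.utils.data.DistributedSampler already exposes a seed argument (which must be identical across processes) and reshuffles deterministically per epoch via set_epoch(epoch); the sampler change here is in the same spirit.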

Differential Revision: D27784137

Update numpy and pytorch seeding for dataloader and multiple processes per machine.

Differential Revision: D27784137

fbshipit-source-id: 023c1ded41a4ed7c0b2caaedca5b5d1999d1ce42
@facebook-github-bot added the CLA Signed and fb-exported labels on Apr 21, 2021
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D27784137

facebook-github-bot pushed a commit that referenced this pull request Apr 28, 2021
Update numpy and pytorch seeding for dataloader and multiple processes per machine. (#299)

Summary:
Pull Request resolved: #299

Reviewed By: prigoyal, QuentinDuval

Differential Revision: D27784137

fbshipit-source-id: 55008e1a10d88637f7b8ff44c45fd4658706a465