
Update numpy and pytorch seeding for dataloader and multiple processes per machine. #299

Closed
wants to merge 1 commit

Conversation

iseessel
Contributor

Summary:
Current state:

Currently we set a different seed per node, but the same seed for all training processes on a node. Each DataLoader worker's seed also differs each epoch, but non-deterministically.

Proposed State:

Use a different random seed for each dist_rank, and a different, deterministic seed for each DataLoader worker process each epoch.

https://fb.quip.com/hVIcAahpVLo2

Effects:

Fixes randomization for a few losses, hooks, and trunks. Fixes randomization when using the fork multiprocessing option for transformations. Fixes the collapse of all seeds to 0 when the config seed is set to 0.
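
As a rough illustration of the seed-collapse fix: a minimal sketch, assuming an additive derivation from dist_rank (the function name and exact formula here are assumptions, not VISSL's actual code). A purely multiplicative scheme maps every rank to 0 when the config seed is 0; an additive offset cannot.

```python
# Hypothetical sketch: derive each training process's seed from the global
# dist_rank rather than the node id. The exact formula is an assumption;
# the point is that the additive dist_rank offset keeps per-process seeds
# distinct and deterministic even when config_seed == 0.
def get_training_seed(config_seed: int, dist_rank: int, world_size: int) -> int:
    return config_seed * world_size + dist_rank
```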

There are 3 changes, summarized as:

  1. Use dist_rank instead of node_id for seeding training processes.
  2. Create a worker_init_fn to manually set the numpy seed for DataLoader workers (see the sketch after this list).
  3. Add knowledge of the training process seed to the sampler.
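
A minimal sketch of change (2), assuming a simple additive seed layout; make_worker_init_fn and its parameters are illustrative names, not VISSL's API:

```python
import random

import numpy as np
import torch


def make_worker_init_fn(base_seed: int, epoch: int, dist_rank: int, num_workers: int):
    """Return a worker_init_fn giving each DataLoader worker a seed that is
    unique per (epoch, dist_rank, worker) and reproducible across runs."""

    def worker_init_fn(worker_id: int):
        seed = base_seed + epoch * 100_000 + dist_rank * num_workers + worker_id
        random.seed(seed)
        np.random.seed(seed % (2 ** 32))  # numpy requires a seed in [0, 2**32)
        torch.manual_seed(seed)

    return worker_init_fn
```

For change (3), note that PyTorch's own torch.utils.data.DistributedSampler already exposes a seed argument (which must be identical across processes) and reshuffles deterministically per epoch via set_epoch(epoch); the sampler change here is in the same spirit.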

Differential Revision: D27784137

Update numpy and pytorch seeding for dataloader and multiple processes per machine.

Differential Revision: D27784137

fbshipit-source-id: 023c1ded41a4ed7c0b2caaedca5b5d1999d1ce42
@facebook-github-bot added the CLA Signed and fb-exported labels on Apr 21, 2021
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D27784137

facebook-github-bot pushed a commit that referenced this pull request Apr 28, 2021
Update numpy and pytorch seeding for dataloader and multiple processes per machine. (#299)

Summary:
Pull Request resolved: #299

Reviewed By: prigoyal, QuentinDuval

Differential Revision: D27784137

fbshipit-source-id: 55008e1a10d88637f7b8ff44c45fd4658706a465