Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sampler should be seeded lazily #63609

Closed
ejguan opened this issue Aug 19, 2021 · 0 comments
Closed

Sampler should be seeded lazily #63609

ejguan opened this issue Aug 19, 2021 · 0 comments
Assignees
Labels
module: dataloader Related to torch.utils.data.DataLoader and Sampler module: regression It used to work, and now it doesn't triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Comments

@ejguan
Copy link
Contributor

ejguan commented Aug 19, 2021

🐛 Bug

After #63026 is landed, the generator for Sampler is attached to the instance, which helps to serialize the state of Sampler. But, it brings a problem that it will prevent Sampler's generator being seeded before each epoch.

To Reproduce

Check #63026 (comment)

User would expect same result for the sampler without specifying generator input when set seed for each epoch.

sampler = RandomSampler(ds)
torch.manual_seed(0)
l1 = list(sampler)
torch.manual_seed(0)
l2 = list(sampler)
# Expect same
assert l1 == l2

cc @ssnl @VitalyFedyunin @ejguan @NivekT

@ejguan ejguan added module: dataloader Related to torch.utils.data.DataLoader and Sampler triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module labels Aug 19, 2021
@ejguan ejguan added this to Needs triage in torch.utils.data via automation Aug 19, 2021
@ejguan ejguan self-assigned this Aug 19, 2021
@ejguan ejguan linked a pull request Aug 20, 2021 that will close this issue
ejguan added a commit that referenced this issue Aug 20, 2021
ejguan added a commit that referenced this issue Aug 20, 2021
ejguan added a commit that referenced this issue Aug 20, 2021
@ejguan ejguan moved this from Needs triage to To Do in torch.utils.data Aug 20, 2021
ejguan added a commit that referenced this issue Aug 20, 2021
ejguan added a commit that referenced this issue Aug 20, 2021
ejguan added a commit that referenced this issue Aug 20, 2021
ejguan added a commit that referenced this issue Aug 20, 2021
ejguan added a commit that referenced this issue Sep 29, 2021
…construction"

Fixes #63609


- Revert #63026 
  - Sampler is expected to be re-seeded if user specify seed before each epoch
  - Can not attach generator to self with `__iter__` because multiple iterators will ruin the use case
- Add tests to prevent the same case for different Samplers



Differential Revision: [D30451774](https://our.internmc.facebook.com/intern/diff/D30451774)

[ghstack-poisoned]
ejguan added a commit that referenced this issue Sep 29, 2021
Fixes #63609


- Revert #63026 
  - Sampler is expected to be re-seeded if user specify seed before each epoch
  - Can not attach generator to self with `__iter__` because multiple iterators will ruin the use case
- Add tests to prevent the same case for different Samplers



Differential Revision: [D30451774](https://our.internmc.facebook.com/intern/diff/D30451774)

[ghstack-poisoned]
ejguan added a commit that referenced this issue Sep 29, 2021
…construction"

Fixes #63609


- Revert #63026 
  - Sampler is expected to be re-seeded if user specify seed before each epoch
  - Can not attach generator to self with `__iter__` because multiple iterators will ruin the use case
- Add tests to prevent the same case for different Samplers
- Keep same functionality introduced in #63026 that user can serialize the Sampler.



Differential Revision: [D30451774](https://our.internmc.facebook.com/intern/diff/D30451774)

[ghstack-poisoned]
ejguan added a commit that referenced this issue Sep 29, 2021
Fixes #63609


- Revert #63026 
  - Sampler is expected to be re-seeded if user specify seed before each epoch
  - Can not attach generator to self with `__iter__` because multiple iterators will ruin the use case
- Add tests to prevent the same case for different Samplers
- Keep same functionality introduced in #63026 that user can serialize the Sampler.



Differential Revision: [D30451774](https://our.internmc.facebook.com/intern/diff/D30451774)

[ghstack-poisoned]
ejguan added a commit that referenced this issue Sep 29, 2021
…construction"

Fixes #63609


- Revert #63026 
  - Sampler is expected to be re-seeded if user specify seed before each epoch
  - Can not attach generator to self with `__iter__` because multiple iterators will ruin the use case
- Add tests to prevent the same case for different Samplers


Differential Revision: [D30451774](https://our.internmc.facebook.com/intern/diff/D30451774)

[ghstack-poisoned]
ejguan added a commit that referenced this issue Sep 29, 2021
Fixes #63609


- Revert #63026 
  - Sampler is expected to be re-seeded if user specify seed before each epoch
  - Can not attach generator to self with `__iter__` because multiple iterators will ruin the use case
- Add tests to prevent the same case for different Samplers


Differential Revision: [D30451774](https://our.internmc.facebook.com/intern/diff/D30451774)

[ghstack-poisoned]
torch.utils.data automation moved this from To Do to Done Sep 30, 2021
ejguan added a commit to ejguan/pytorch that referenced this issue Sep 30, 2021
Summary:
Pull Request resolved: pytorch#63646

Fixes pytorch#63609

Test Plan: Imported from OSS

Reviewed By: NivekT

Differential Revision: D30451774

Pulled By: ejguan

fbshipit-source-id: 550d77494326446d1a42b5da0559e0d384c47413
@ejguan ejguan added the module: regression It used to work, and now it doesn't label Oct 6, 2021
malfet pushed a commit that referenced this issue Oct 8, 2021
Summary:
Pull Request resolved: #63646

Fixes #63609

Test Plan: Imported from OSS

Reviewed By: NivekT

Differential Revision: D30451774

Pulled By: ejguan

fbshipit-source-id: 550d77494326446d1a42b5da0559e0d384c47413
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
module: dataloader Related to torch.utils.data.DataLoader and Sampler module: regression It used to work, and now it doesn't triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module
Projects
Development

Successfully merging a pull request may close this issue.

1 participant