Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DataLoader] Improve typing and documentation of Sampler API for customization #97338

Closed
wants to merge 9 commits into from

Conversation

stsouko
Copy link
Contributor

@stsouko stsouko commented Mar 22, 2023

Explanation with examples of sampler customization added.

  • fixed TypeVar
  • removed unused init from Sampler class
  • added examples for custom sampler and batch sampler
  • Distributed sampler typing fixed.
  • _InfiniteConstantSampler fixed

Fixes #92268

@pytorch-bot
Copy link

pytorch-bot bot commented Mar 22, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/97338

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 5585f89:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Mar 22, 2023

CLA Signed

The committers listed above are authorized under a signed CLA.

@pytorch-bot pytorch-bot bot added the release notes: dataloader release notes category label Mar 22, 2023
@zou3519 zou3519 requested review from ejguan and NivekT March 22, 2023 16:37
@zou3519 zou3519 added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Mar 22, 2023
Copy link
Contributor

@NivekT NivekT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for opening this PR! These cleanups are helpful and are always welcomed. Small comments below:

torch/utils/data/sampler.py Outdated Show resolved Hide resolved
torch/utils/data/sampler.py Outdated Show resolved Hide resolved
@stsouko stsouko requested review from NivekT and removed request for ejguan March 23, 2023 08:59
Copy link
Contributor

@NivekT NivekT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! LGTM!

@NivekT
Copy link
Contributor

NivekT commented Mar 23, 2023

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Mar 23, 2023
@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Copy link
Collaborator

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team Raised by workflow job

Failing merge rule: Core Maintainers

Copy link
Contributor

@NivekT NivekT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually some changes are needed (primarily linting):
https://github.com/pytorch/pytorch/actions/runs/4498692677/jobs/7931055601

torch/utils/data/sampler.py Show resolved Hide resolved
DataLoader typing fixed.
@stsouko stsouko requested a review from NivekT March 23, 2023 22:35
Copy link
Contributor

@NivekT NivekT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Let me use CI to double check everything

torch/utils/data/sampler.py Outdated Show resolved Hide resolved
torch/utils/data/sampler.py Outdated Show resolved Hide resolved
torch/utils/data/sampler.py Outdated Show resolved Hide resolved
Copy link
Contributor

@NivekT NivekT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the back-and-forth, but I'm unable to fix the remaining linting error through GitHub (i.e. remove Sequence from import in dataloader.py)

https://github.com/pytorch/pytorch/actions/runs/4515159665/jobs/7952316990?pr=97338

docstring fixed
@stsouko
Copy link
Contributor Author

stsouko commented Mar 24, 2023

import fixed

@stsouko stsouko requested a review from NivekT March 24, 2023 22:31
torch/utils/data/sampler.py Outdated Show resolved Hide resolved
@NivekT
Copy link
Contributor

NivekT commented Mar 27, 2023

@pytorchbot merge

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Copy link
Collaborator

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team Raised by workflow job

Failing merge rule: Core Maintainers

@NivekT
Copy link
Contributor

NivekT commented Mar 28, 2023

@pytorchbot merge

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

Copy link
Contributor

@ejguan ejguan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that we need a follow-up PR to make sure it's not BC-breaking as Sampler might take indices other than integer

@@ -223,8 +217,8 @@ class DataLoader(Generic[T_co]):
__initialized = False

def __init__(self, dataset: Dataset[T_co], batch_size: Optional[int] = 1,
shuffle: Optional[bool] = None, sampler: Union[Sampler, Iterable, None] = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Emmm. This is BC breaking. The given sampler isn't necessarily yielding integers.



class DistributedSampler(Sampler[T_co]):
class DistributedSampler(Sampler[int]):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto

@@ -12,23 +12,57 @@
"WeightedRandomSampler",
]

T_co = TypeVar('T_co', covariant=True)
T_co = TypeVar('T_co', int, List[int], covariant=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is BC-breaking as adding restriction on the expected type for Sampler.

@NivekT NivekT changed the title Sampler API described for customization. [DataLoader] Sampler API described for customization. May 3, 2023
@NivekT NivekT changed the title [DataLoader] Sampler API described for customization. [DataLoader] Improve typing and documentation of Sampler API for customization May 3, 2023
pytorchmergebot pushed a commit that referenced this pull request May 3, 2023
API backward compatibility fixed:
#97338 (comment)

Mapped Dataset can accept noninteger indices from custom Samplers.

Fixes #97338

Pull Request resolved: #100409
Approved by: https://github.com/ejguan, https://github.com/NivekT
@kuraga
Copy link
Contributor

kuraga commented Apr 2, 2024

The data_source argument should be removed, but it hasn't been, #123172.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ciflow/trunk Trigger trunk jobs on your pull request Merged open source release notes: dataloader release notes category triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unify Sampler API
7 participants