Implement len in IterableDatasetShard #13780

Merged: @sgugger merged 1 commit into master from iterable_ds_shard_len on Sep 28, 2021

@sgugger (Collaborator) commented Sep 28, 2021

What does this PR do?

Currently, code using a sized iterable dataset in distributed mode will fail because:

  • the Trainer will detect the dataset is sized and try to get the length of the DataLoader
  • the DataLoader will call the length of its dataset which is an IterableDatasetShard
  • the IterableDatasetShard has no length

This PR adds the __len__ method to IterableDatasetShard to solve that issue. The one thing that is mildly annoying is that it will make all instances of IterableDatasetShard be recognized as collections.abc.Sized instances, since they do implement the __len__ method, even if that method raises an error when the underlying dataset has no length. But that check is only used on the dataset passed along to the Trainer, not on that wrapper, so I think we should be fine.
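To make the trade-off concrete, here is a minimal, self-contained sketch of a shard wrapper that forwards `len` to the dataset it wraps. It is not the code merged in this PR: `ShardSketch`, `SizedStream`, and `UnsizedStream` are hypothetical names, and the per-process length formula (round the dataset length to whole global batches, then take one process's share) is an assumption made for illustration.

```python
import math
from collections.abc import Sized


class SizedStream:
    """Hypothetical iterable dataset that also knows its length."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return iter(range(self.n))

    def __len__(self):
        return self.n


class UnsizedStream:
    """Hypothetical iterable dataset with no __len__."""

    def __iter__(self):
        return iter(range(10))


class ShardSketch:
    """Illustrative stand-in for a per-process shard wrapper (assumed design)."""

    def __init__(self, dataset, batch_size=1, drop_last=False, num_processes=1):
        self.dataset = dataset
        self.batch_size = batch_size
        self.drop_last = drop_last
        self.num_processes = num_processes

    def __len__(self):
        # len() raises TypeError here if the wrapped dataset is not sized.
        total = len(self.dataset)
        global_batch = self.batch_size * self.num_processes
        if self.drop_last:
            # Only complete global batches are counted.
            return (total // global_batch) * self.batch_size
        # The last, incomplete global batch is counted as a full one.
        return math.ceil(total / global_batch) * self.batch_size


sized_shard = ShardSketch(SizedStream(10), batch_size=2, num_processes=2)
unsized_shard = ShardSketch(UnsizedStream(), batch_size=2, num_processes=2)

print(len(sized_shard))                 # 6
print(isinstance(unsized_shard, Sized))  # True, although len() would raise
```

The last line shows the caveat described above: `collections.abc.Sized` only checks that the class defines `__len__`, so the wrapper passes the check even when calling `len` on it would raise a `TypeError`.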

@LysandreJik (Member) left a comment

LGTM! Thanks for taking care of it

@sgugger merged commit a21ee1f into master Sep 28, 2021
@sgugger deleted the iterable_ds_shard_len branch September 28, 2021 22:22
stas00 pushed a commit to stas00/transformers that referenced this pull request Oct 12, 2021
lapisfluvialis pushed a commit to lapisfluvialis/transformers that referenced this pull request Oct 27, 2021