Better support for iterable datasets #3173
Merged
`IterableDataset` objects currently don't work when `_one_pass` is called, because this method attempts to index item `0`, but `None` would be required for an `IterableDataset`. However, passing `None` is incompatible with indexed datasets. This PR aims to fix these issues.

The following changes are made:
- Update `create_item` to be compatible with both iterable and indexed datasets.
- Call `do_item` with `None`, not `0`. For iterable datasets, this gets the next item; for indexable datasets, it gets the first item.
- `DataLoader` is updated to be compatible with this indexing scheme.
- `IterableDataset`s have a stubbed `__getitem__` method (that just raises an error), so don't rely solely on the presence of that method when establishing the `indexed` property; check for `IterableDataset` classes/subclasses too.

While I've tried to familiarise myself with how `_one_pass` is used, I'm a little unsure whether it could be called such that it consumes the first item of the iterator, after which the iterator continues to be used by e.g. the training loop (effectively skipping the first item). Is this a problem, or is the iterator always reset by calling `create_batches`?
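
For reference, here is a minimal, self-contained sketch of the indexing scheme and of the skipped-item concern described above. It is not the actual patch: `IterableDataset` below is a local stand-in for the PyTorch class so the snippet runs without dependencies, and `MiniLoader`, `CountingStream` and `reset` are hypothetical names used only for illustration.

```python
class IterableDataset:
    """Local stand-in (assumption) for torch.utils.data.IterableDataset."""

    def __iter__(self):
        raise NotImplementedError

    def __getitem__(self, idx):
        # Stubbed: the method exists, but it only raises an error.
        raise NotImplementedError("IterableDataset does not support indexing")


class CountingStream(IterableDataset):
    """Hypothetical iterable dataset that yields 0, 1, ..., n - 1."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return iter(range(self.n))


class MiniLoader:
    """Hypothetical toy loader (not the real DataLoader)."""

    def __init__(self, dataset):
        self.dataset = dataset
        # The presence of __getitem__ alone is not enough, because
        # IterableDataset stubs it out; check the class as well.
        self.indexed = hasattr(dataset, "__getitem__") and not isinstance(
            dataset, IterableDataset
        )
        self.it = None if self.indexed else iter(dataset)

    def create_item(self, s):
        if self.indexed:
            # None falls back to index 0, i.e. the first item.
            return self.dataset[s or 0]
        if s is None:
            # Iterable datasets can only hand out the next item.
            return next(self.it)
        raise IndexError("Cannot index an iterable dataset; pass None")

    def reset(self):
        """Hypothetical reset: start a fresh iterator for iterable datasets."""
        if not self.indexed:
            self.it = iter(self.dataset)


indexed_loader = MiniLoader([10, 20, 30])
stream_loader = MiniLoader(CountingStream(3))

first_indexed = indexed_loader.create_item(None)  # first item of the list
first_stream = stream_loader.create_item(None)    # consumes one stream item

# Without a reset, whoever iterates next never sees the consumed item.
remaining = list(stream_loader.it)

# With a reset, iteration starts from the beginning again.
stream_loader.reset()
after_reset = list(stream_loader.it)
```

In this toy version, the item consumed by `create_item(None)` is only seen again if something like `reset` runs before the next pass, which is the behaviour the question about `create_batches` is asking about.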