Better support for iterable datasets #3173

jcaw · 2021-01-18T13:36:26Z

`IterableDataset` objects currently don't work when `_one_pass` is called, because this method indexes item `0`, whereas an `IterableDataset` would need `None`. However, passing `None` is incompatible with indexed datasets. This PR aims to fix both cases.

The following changes are made:

- Fix `create_item` to be compatible with both iterable and indexed datasets.
- If a sample item is required, index `do_item` with `None`, not `0`. For iterable datasets, this gets the next item; for indexed datasets, it gets the first item.
- Subclasses of `DataLoader` are updated to be compatible with this indexing scheme.
- Native PyTorch `IterableDataset`s have a stubbed `__getitem__` method (that just raises an error), so don't rely solely on the presence of that method when establishing the `indexed` property; check for `IterableDataset` classes/subclasses too.
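The scheme above can be illustrated with a dependency-free sketch. `Dataset`, `IterableDataset`, `Squares` and `SketchLoader` are hypothetical stand-ins, not fastai or PyTorch code; the base classes only mimic the PyTorch detail that an iterable dataset still exposes a `__getitem__` stub that raises. Here `create_item(None)` returns the first item of an indexed dataset and the next item of an iterable one, and `indexed` is decided by a `hasattr` check combined with an `isinstance` check.

```python
# Hypothetical stand-ins for torch.utils.data.Dataset / IterableDataset.
# As in PyTorch, the iterable base inherits a __getitem__ stub that only raises.
class Dataset:
    def __getitem__(self, idx):
        raise NotImplementedError


class IterableDataset(Dataset):
    def __iter__(self):
        raise NotImplementedError


class Squares(IterableDataset):
    """Hypothetical iterable dataset yielding 0, 1, 4, ..., (n-1)**2."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return (i * i for i in range(self.n))


class SketchLoader:
    """Hypothetical loader illustrating the None-vs-index sampling scheme."""

    def __init__(self, dataset):
        self.dataset = dataset
        # hasattr alone is not enough: IterableDataset inherits a stub __getitem__.
        self.indexed = hasattr(dataset, "__getitem__") and not isinstance(
            dataset, IterableDataset
        )
        self.it = None if self.indexed else iter(dataset)

    def create_item(self, s):
        if self.indexed:
            # None means "give me a sample item": use the first item.
            return self.dataset[0 if s is None else s]
        if s is None:
            # Iterable datasets can only yield their next item.
            return next(self.it)
        raise IndexError("iterable datasets must be sampled with None, not an index")


lst_loader = SketchLoader([10, 20, 30])
it_loader = SketchLoader(Squares(3))
print(lst_loader.indexed, lst_loader.create_item(None))  # True 10
print(it_loader.indexed, it_loader.create_item(None))    # False 0
```

Checking `isinstance` as well as `hasattr` is what keeps `Squares` from being misclassified as indexed, since it inherits the raising `__getitem__` stub.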

While I've tried to familiarise myself with how `_one_pass` is used, I'm a little unsure whether it could be called in a way that consumes the first item of the iterator, after which the iterator continues to be used by, e.g., the training loop (effectively skipping the first item). Is this a problem, or is the iterator always reset by calling `create_batches`?
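The concern in this question comes down to ordinary Python iterator semantics. A minimal sketch, independent of fastai (`one_pass_sample` is a hypothetical helper, not a fastai function), showing when drawing one sample item skips it for later consumers and when it doesn't:

```python
def one_pass_sample(it):
    """Hypothetical stand-in for drawing a single sample item."""
    return next(it)


data = [1, 2, 3, 4]

# Shared iterator: the sample consumes item 1, so the "training loop" never sees it.
shared = iter(data)
one_pass_sample(shared)
seen_shared = list(shared)       # [2, 3, 4]

# Fresh iterator for the loop (analogous to re-creating batches): nothing is skipped.
one_pass_sample(iter(data))
seen_fresh = list(iter(data))    # [1, 2, 3, 4]

# One-shot source: iter() on a generator returns the generator itself, so it skips.
gen = (x for x in data)
one_pass_sample(iter(gen))
seen_gen = list(iter(gen))       # [2, 3, 4]

print(seen_shared, seen_fresh, seen_gen)
```

So resetting is only safe when the dataset's `__iter__` returns a new iterator each time; a dataset that wraps a single one-shot generator would still lose the sampled item.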


jcaw · 2021-01-18T13:41:54Z

Seems like I've pushed some cruft in the notebooks - let me remove that and force-push before this is reviewed.


jph00 · 2021-02-08T17:45:31Z

Looks good! - Many thanks :)

jcaw requested a review from jph00 as a code owner January 18, 2021 13:36

jcaw force-pushed the fix-iterable-datasets branch from f170a4c to 1c967eb January 18, 2021 13:43

jph00 merged commit 45376f1 into fastai:master Feb 8, 2021

jph00 changed the title from "Fix iterable datasets" to "Better support for iterable datasets" Feb 8, 2021

jph00 added the enhancement label Feb 8, 2021
