enforce pickleability for v2 transforms and wrapped datasets #7860
Conversation
Thanks Philip, some comments below but LGTM if green.
Perhaps we'll want to add more tests w.r.t. multiprocessing_context="spawn"
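For context, with `multiprocessing_context="spawn"` each DataLoader worker receives the dataset, including its transforms, by unpickling it. The following is a minimal sketch of what such a test exercises, assuming a `FakeData` dataset and a small v2 pipeline rather than the fixtures actually used in the test suite:

```python
import pickle

from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import v2

# Any dataset/transform combination used with multiprocessing_context="spawn"
# has to survive a pickle round trip, because spawned workers rebuild the
# dataset by unpickling it instead of inheriting it via fork.
transform = v2.Compose([v2.ToImage(), v2.RandomHorizontalFlip(p=0.5)])
dataset = datasets.FakeData(size=8, transform=transform)

# Roughly what the pickleability checks boil down to:
pickle.loads(pickle.dumps(transform))
pickle.loads(pickle.dumps(dataset))

loader = DataLoader(dataset, batch_size=4, num_workers=2, multiprocessing_context="spawn")

if __name__ == "__main__":  # required, since spawned workers re-import this module
    for _ in loader:
        pass
```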
With c68a6de, we are adding roughly 1.5 minutes of test time.
After 5358620, we are down to roughly 30 seconds of extra time.
After f339e6c, while pretending I'm on macOS, we are down to ~20 seconds of extra runtime.
```diff
@@ -548,7 +549,7 @@ def test_feature_types(self, config):
     @test_all_configs
     def test_num_examples(self, config):
         with self.create_dataset(config) as (dataset, info):
-            assert len(dataset) == info["num_examples"]
+            assert len(list(dataset)) == len(dataset) == info["num_examples"]
```
We never actually consumed the dataset before. Thus, any failure that happens after the first sample was not detected. Fortunately, only one test was broken, which I'll flag below.
```diff
 class Caltech256TestCase(datasets_utils.ImageDatasetTestCase):
     DATASET_CLASS = datasets.Caltech256

     def inject_fake_data(self, tmpdir, config):
         tmpdir = pathlib.Path(tmpdir) / "caltech256" / "256_ObjectCategories"

-        categories = ((1, "ak47"), (127, "laptop-101"), (257, "clutter"))
+        categories = ((1, "ak47"), (2, "american-flag"), (3, "backpack"))
```
datasets.Caltech256 relies on the fact that all categories are present. When actually consuming the dataset (see above), the old fake data setup falls flat. Our options are:
1. Fix the dataset to account for gaps in the categories.
2. Create fake data for all categories.
3. Create fake data that starts at the first category without any gaps, but does not cover all categories.
Option 3 is by far the least amount of work, so I went for that here.
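For illustration, here is roughly the assumption being relied on, paraphrased in a hypothetical helper rather than copied from the actual torchvision implementation: the image filename prefix is derived from the category's position in the sorted directory listing, which only matches the on-disk names when the categories are contiguous and start at 1.

```python
import pathlib

# Hypothetical helper mirroring the assumption: the category at 0-based
# position i in the sorted listing is expected to contain images named
# "{i + 1:03d}_{j:04d}.jpg". With fake data for categories 1, 127, and 257,
# the second directory is "127.laptop-101" but the lookup targets
# "002_0001.jpg" inside it, so consuming the dataset fails past the first class.
def expected_image_path(root: pathlib.Path, position: int, category_dir: str, j: int) -> pathlib.Path:
    prefix = position + 1  # only equals the directory number when there are no gaps
    return root / category_dir / f"{prefix:03d}_{j:04d}.jpg"


root = pathlib.Path("caltech256/256_ObjectCategories")
print(expected_image_path(root, 1, "127.laptop-101", 1))  # looks for 002_0001.jpg -> missing
```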
"occlusion": labels_tensor[:, 7], | ||
"pose": labels_tensor[:, 8], | ||
"invalid": labels_tensor[:, 9], | ||
"bbox": labels_tensor[:, 0:4].clone(), # x, y, width, height |
Views on tensors cannot be pickled correctly. Meaning, regardless of the v2 wrapper, datasets.WIDERFace has never worked in a spawn context.
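A minimal sketch of the pattern behind the fix, using a hypothetical annotation tensor rather than the actual WIDERFace parsing code: `.clone()` gives the slice its own storage, so it round-trips through pickle independently of the tensor it was cut from.

```python
import pickle

import torch

# Hypothetical annotation tensor: one row per target, columns
# x, y, w, h, blur, expression, illumination, occlusion, pose, invalid.
labels_tensor = torch.randint(0, 100, (5, 10))

# A plain slice is a view that still references the full labels_tensor storage.
bbox_view = labels_tensor[:, 0:4]
assert bbox_view.data_ptr() == labels_tensor.data_ptr()

# .clone() materializes the slice into its own contiguous tensor, which is
# what gets stored in the annotation dict and later pickled when a DataLoader
# worker is started with multiprocessing_context="spawn".
bbox_owned = labels_tensor[:, 0:4].clone()

restored = pickle.loads(pickle.dumps(bbox_owned))
assert torch.equal(restored, bbox_owned)
```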
Still LGTM, but the test duration is a bit unfortunate. I think we'd be better off testing just 2-3 of these datasets (or even just 1 TBH, as one test is still infinitely better than zero). Also, ideally we'd let them have …
…#7860)
Summary: (Note: this ignores all push blocking failures!)
Reviewed By: matteobettini
Differential Revision: D48900370
fbshipit-source-id: be1b23dcab58d2a8b5bca7190f94c0123263d036
Fixes #6753 (comment).
Since the wrapper class for a dataset is created dynamically, we need to help `pickle` a little for it to be able to deserialize the object. We only need to overwrite the `__reduce__` method. Again, we also add tests to enforce pickleability in the future.

cc @vfdev-5
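As a rough sketch of the idea, with made-up names rather than the actual torchvision wrapper: because the wrapper class is created at runtime, pickle cannot look it up by module attribute, so `__reduce__` tells it to rebuild the object by calling the factory again.

```python
import pickle


class Wrapper:
    def __init__(self, dataset):
        self._dataset = dataset

    def __getitem__(self, idx):
        return self._dataset[idx]

    def __reduce__(self):
        # (callable, args): on unpickling, pickle calls wrap(self._dataset)
        # to rebuild an equivalent wrapper instead of looking up the
        # dynamically created class by name (which would fail).
        return wrap, (self._dataset,)


def wrap(dataset):
    # The concrete class only exists at runtime, mimicking how the v2 wrapper
    # builds a per-dataset class on the fly.
    cls = type(f"Wrapped{type(dataset).__name__}", (Wrapper,), {})
    return cls(dataset)


wrapped = wrap(["a", "b", "c"])
restored = pickle.loads(pickle.dumps(wrapped))
assert restored[1] == "b"
```

Without the `__reduce__` override, pickling fails because the dynamically created class cannot be found on the module; with it, only the wrapped dataset itself needs to be pickleable.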