Skip to content

Conversation

@pmeier
Copy link
Contributor

@pmeier pmeier commented Dec 27, 2021

Addresses #5108.

cc @pmeier @bjuncek

@facebook-github-bot
Copy link
Contributor

facebook-github-bot commented Dec 27, 2021

💊 CI failures summary and remediations

As of commit 35f464c (more details on the Dr. CI page):


None of the CI failures appear to be your fault 💚



🚧 5 ongoing upstream failures:

These were probably caused by upstream breakages that are not fixed yet.


This comment was automatically generated by Dr. CI (expand for details).

Please report bugs/suggestions to the (internal) Dr. CI Users group.

Click here to manually regenerate this comment.

Copy link
Member

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @pmeier , I made a few comments below but this looks good

Comment on lines +121 to +124
def _getattr_closure(obj: Any, *, attrs: Sequence[str]) -> Any:
for attr in attrs:
obj = getattr(obj, attr)
return obj
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you help me understand why this change is needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This enables chained getattr calls. In turn, this allows me to use this filter function

images_dp = Filter(images_dp, path_comparator("parent.name", config.split))

rather than writing a custom function that extracts the name of the parent folder.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll leave it up to you to decide, as the entire logic of path_comparator and path_accessor is honestly too complex for me to handle ATM.

But as rule of thumb, I tend to avoid changes to helper functions when they just only address one single use-case.

Copy link
Contributor Author

@pmeier pmeier Jan 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as the entire logic of path_comparator and path_accessor is honestly too complex for me to handle ATM.

The complexity stems from the fact that we cannot use lambdas or local function together with the datapipes, since they cannot be serialized safely. Quick explanation:

  • path_accessor: When using datapipes you often have path-handle-tuples for files. path_accessor takes such a tuple, turns the the first item into a pathlib.Path and accesses it according to the input. So path_accessor("parent.name") is equivalent to

    def extract_folder_name(data):
        path = pathlib.Path(data[0])
        return path.parent.name

    This is useful when you want to merge two datapipes and need a function that generates merge keys.

  • path_comparator: This is one layer on top of path_accessor by providing an equality comparison for the extract path information. For example Filter(scenes_dp, path_comparator("name", f"CLEVR_{config.split}_scenes.json")) will select all files where the file name matches the second argument. (This is what I used to refactor your suggestion about the regex).

But as rule of thumb, I tend to avoid changes to helper functions when they just only address one single use-case.

It is true, that this currently only addresses this, but I already needed to make the same changes to getitem a while back. So I'm guessing if I don't do it now, I'll have to do it in the future anyway, but then I also have to remember that I need to go back to here and fix this.

Copy link
Member

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @pmeier , LGTM!

@NicolasHug NicolasHug merged commit f948d79 into pytorch:main Jan 6, 2022
facebook-github-bot pushed a commit that referenced this pull request Jan 8, 2022
Summary:
* add prototype dataset

* add old-style dataset

* appease mypy

* simplify prototype scenes

* Update torchvision/datasets/clevr.py

Reviewed By: sallysyw

Differential Revision: D33479266

fbshipit-source-id: c9a130490a572546fdc6bc8b88ba21b8e9aebc30

Co-authored-by: Nicolas Hug <contact@nicolas-hug.com>
Co-authored-by: Nicolas Hug <contact@nicolas-hug.com>
@pmeier pmeier deleted the datasets/clevr branch January 11, 2022 10:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants