-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Add DataPipes Graph Functions #61235
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add DataPipes Graph Functions #61235
Conversation
[ghstack-poisoned]
💊 CI failures summary and remediationsAs of commit 15bfa2c (more details on the Dr. CI page and at hud.pytorch.org/pr/61235):
🕵️ 2 new failures recognized by patternsThe following CI failures do not appear to be due to upstream breakages:
|
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
self.assertEqual(count, len(self.temp_files)) | ||
|
||
|
||
# TODO(VitalyFedyunin): Generates unclosed buffer warning, need to investigate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did look at this problem before, but cannot find a feasible solution.
Using tar as an example, we open the tar file here:
tar = tarfile.open(fileobj=cast(Optional[IO[bytes]], data_stream), mode="r:*") |
and attach a reference of this tar file handle to each file within it to prevent source tar file handle is closed after yield here:
pytorch/torch/utils/data/datapipes/iter/readfilesfromtar.py
Lines 50 to 52 in 864dcbb
# Add a reference of the source tarfile into extracted_fobj, so the source | |
# tarfile handle won't be released until all the extracted file objs are destroyed. | |
extracted_fobj.source_ref = tar # type: ignore[attr-defined] |
I added a PR #58938 to close each file explicitly after read. But, it would not close the reference of tar file handle. And python gc would close the tar file stream, which generates this warning.
captured_connections.append(obj) | ||
return stub_unpickler, () | ||
|
||
# TODO(VitalyFedyunin): Better do it as `with` context for safety |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with comment. We need to also take care of the existing reduce_ex_hook
before we set hook to capture the connected datapipes.
[ghstack-poisoned]
[ghstack-poisoned]
@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
@VitalyFedyunin merged this pull request in f285788. |
Stack from ghstack:
is_shardable
andapply_sharding
#61236 Implement usage ofis_shardable
andapply_sharding
Differential Revision: D29588834