-
Notifications
You must be signed in to change notification settings - Fork 169
Refactor ShardingFilterIterDataPipe
into a seperate file (#94095)
#987
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This pull request was exported from Phabricator. Differential Revision: D43014692 |
…rch#987) Summary: Pull Request resolved: meta-pytorch#987 X-link: pytorch/pytorch#94095 Differential Revision: D43014692 Move `ShardingFilterIterDataPipe` into a dedicated file. Also, propose to have a dedicated parent class (`_ShardingIterDataPipe`) for sharding data pipe, as this seems more like a "system/engine-level" datapipe that gives strong hints to RS on how to execute, and needs first-class citizen treatment in RS (compared with other "user-level" datapipe that are mostly composable `Callable[[Iterable], Iterable]`. But open to other discussions. ### Open question Should [ShardingRoundRobinDispatcherIterDataPipe](https://github.com/pytorch/data/blob/01fc76200354501b057bb439b43a1f05f609dd0a/torchdata/datapipes/iter/util/sharding.py#L16-L17) also be considered as a `_ShardingIterDataPipe`? (e.g. this sharding is executed by replicating (the metadata), while `ShardingRoundRobinDispatcherIterDataPipe` hints too expensive to replicate so requires round robin data exchange/dispatch). D43014692 fbshipit-source-id: 223a9503fc5c3c78b2aebb30fe9665bb081f37ee
4c15f34
to
bfe7125
Compare
This pull request was exported from Phabricator. Differential Revision: D43014692 |
1 similar comment
This pull request was exported from Phabricator. Differential Revision: D43014692 |
…rch#987) Summary: Pull Request resolved: meta-pytorch#987 X-link: pytorch/pytorch#94095 Differential Revision: D43014692 Move `ShardingFilterIterDataPipe` into a dedicated file. Also, propose to have a dedicated parent class (`_ShardingIterDataPipe`) for sharding data pipe, as this seems more like a "system/engine-level" datapipe that gives strong hints to RS on how to execute, and needs first-class citizen treatment in RS (compared with other "user-level" datapipe that are mostly composable `Callable[[Iterable], Iterable]`. But open to other discussions. ### Open question Should [ShardingRoundRobinDispatcherIterDataPipe](https://github.com/pytorch/data/blob/01fc76200354501b057bb439b43a1f05f609dd0a/torchdata/datapipes/iter/util/sharding.py#L16-L17) also be considered as a `_ShardingIterDataPipe`? (e.g. this sharding is executed by replicating (the metadata), while `ShardingRoundRobinDispatcherIterDataPipe` hints too expensive to replicate so requires round robin data exchange/dispatch). D43014692 fbshipit-source-id: 74713ac091b1dcb215687a9f572dc96d79d89c7e
bfe7125
to
e778b3c
Compare
…rch#987) Summary: Pull Request resolved: meta-pytorch#987 X-link: pytorch/pytorch#94095 Differential Revision: D43014692 Move `ShardingFilterIterDataPipe` into a dedicated file. Also, propose to have a dedicated parent class (`_ShardingIterDataPipe`) for sharding data pipe, as this seems more like a "system/engine-level" datapipe that gives strong hints to RS on how to execute, and needs first-class citizen treatment in RS (compared with other "user-level" datapipe that are mostly composable `Callable[[Iterable], Iterable]`. But open to other discussions. ### Open question Should [ShardingRoundRobinDispatcherIterDataPipe](https://github.com/pytorch/data/blob/01fc76200354501b057bb439b43a1f05f609dd0a/torchdata/datapipes/iter/util/sharding.py#L16-L17) also be considered as a `_ShardingIterDataPipe`? (e.g. this sharding is executed by replicating (the metadata), while `ShardingRoundRobinDispatcherIterDataPipe` hints too expensive to replicate so requires round robin data exchange/dispatch). D43014692 fbshipit-source-id: 917937c78c6839a43b8595f649ada9fe829a01fd
e778b3c
to
11315e0
Compare
This pull request was exported from Phabricator. Differential Revision: D43014692 |
…rch#987) Summary: Pull Request resolved: meta-pytorch#987 X-link: pytorch/pytorch#94095 Differential Revision: D43014692 Move `ShardingFilterIterDataPipe` into a dedicated file. Also, propose to have a dedicated parent class (`_ShardingIterDataPipe`) for sharding data pipe, as this seems more like a "system/engine-level" datapipe that gives strong hints to RS on how to execute, and needs first-class citizen treatment in RS (compared with other "user-level" datapipe that are mostly composable `Callable[[Iterable], Iterable]`. But open to other discussions. ### Open question Should [ShardingRoundRobinDispatcherIterDataPipe](https://github.com/pytorch/data/blob/01fc76200354501b057bb439b43a1f05f609dd0a/torchdata/datapipes/iter/util/sharding.py#L16-L17) also be considered as a `_ShardingIterDataPipe`? (e.g. this sharding is executed by replicating (the metadata), while `ShardingRoundRobinDispatcherIterDataPipe` hints too expensive to replicate so requires round robin data exchange/dispatch). D43014692 fbshipit-source-id: a1d3bcb02309ca7ee4d991a31945d4550f4174b9
11315e0
to
f6d31be
Compare
This pull request was exported from Phabricator. Differential Revision: D43014692 |
…rch#987) Summary: Pull Request resolved: meta-pytorch#987 X-link: pytorch/pytorch#94095 Differential Revision: D43014692 Move `ShardingFilterIterDataPipe` into a dedicated file. Also, propose to have a dedicated parent class (`_ShardingIterDataPipe`) for sharding data pipe, as this seems more like a "system/engine-level" datapipe that gives strong hints to RS on how to execute, and needs first-class citizen treatment in RS (compared with other "user-level" datapipe that are mostly composable `Callable[[Iterable], Iterable]`. But open to other discussions. ### Open question Should [ShardingRoundRobinDispatcherIterDataPipe](https://github.com/pytorch/data/blob/01fc76200354501b057bb439b43a1f05f609dd0a/torchdata/datapipes/iter/util/sharding.py#L16-L17) also be considered as a `_ShardingIterDataPipe`? (e.g. this sharding is executed by replicating (the metadata), while `ShardingRoundRobinDispatcherIterDataPipe` hints too expensive to replicate so requires round robin data exchange/dispatch). D43014692 Reviewed By: ejguan fbshipit-source-id: d7b450b89989f0a05c6e70fb8d7290893918a32a
f6d31be
to
31c3a2e
Compare
This pull request was exported from Phabricator. Differential Revision: D43014692 |
…rch#987) Summary: Pull Request resolved: meta-pytorch#987 X-link: pytorch/pytorch#94095 Differential Revision: D43014692 Move `ShardingFilterIterDataPipe` into a dedicated file. Also, propose to have a dedicated parent class (`_ShardingIterDataPipe`) for sharding data pipe, as this seems more like a "system/engine-level" datapipe that gives strong hints to RS on how to execute, and needs first-class citizen treatment in RS (compared with other "user-level" datapipe that are mostly composable `Callable[[Iterable], Iterable]`. But open to other discussions. ### Open question Should [ShardingRoundRobinDispatcherIterDataPipe](https://github.com/pytorch/data/blob/01fc76200354501b057bb439b43a1f05f609dd0a/torchdata/datapipes/iter/util/sharding.py#L16-L17) also be considered as a `_ShardingIterDataPipe`? (e.g. this sharding is executed by replicating (the metadata), while `ShardingRoundRobinDispatcherIterDataPipe` hints too expensive to replicate so requires round robin data exchange/dispatch). D43014692 Reviewed By: seemethere fbshipit-source-id: 79a2ee53d279a4dc8da84eb2af3e5c801ff3d1ea
31c3a2e
to
1af1c08
Compare
This pull request was exported from Phabricator. Differential Revision: D43014692 |
Summary: X-link: meta-pytorch/data#987 Pull Request resolved: pytorch#94095 Differential Revision: D43014692 Move `ShardingFilterIterDataPipe` into a dedicated file. Also, propose to have a dedicated parent class (`_ShardingIterDataPipe`) for sharding data pipe, as this seems more like a "system/engine-level" datapipe that gives strong hints to RS on how to execute, and needs first-class citizen treatment in RS (compared with other "user-level" datapipe that are mostly composable `Callable[[Iterable], Iterable]`. But open to other discussions. ### Open question Should [ShardingRoundRobinDispatcherIterDataPipe](https://github.com/pytorch/data/blob/01fc76200354501b057bb439b43a1f05f609dd0a/torchdata/datapipes/iter/util/sharding.py#L16-L17) also be considered as a `_ShardingIterDataPipe`? (e.g. this sharding is executed by replicating (the metadata), while `ShardingRoundRobinDispatcherIterDataPipe` hints too expensive to replicate so requires round robin data exchange/dispatch). D43014692 Test Plan: sandcastle and CI How to run unit tests in buck related to such changes? :) Reviewed By: seemethere fbshipit-source-id: fcd2a4e57b15fdf7411c959a38c377ac114f0ecb
This pull request has been merged in 914d3dd. |
This is BC breaking. The original module ( For us it is not a large problem to change, but other libraries that rely on that might break in the upcoming release. |
Just to give some unsolicited advice: I would push to make all modules below Using such an architecture would have prevented this, because the datapipe would still be exposed under |
@pmeier |
I was more concern on the |
Huh, I guess we never looked for that given that the module it is in is public. But fair enough, I'm gonna change that on our side. |
Sounds good. Let me import them back . Thanks for reporting this! |
…eprecated (pytorch#94527) Summary: Pull Request resolved: pytorch#94527 pytorch#94095 moves this into `tud.datapipes.iter.sharding`. However, since previously this is a public API, this is a BC break change. As discussed in meta-pytorch/data#987 (comment), we will have backward compatbile support but with deprecated warning. Test Plan: ``` buck test mode/opt //caffe2/test:datapipe ``` Reviewed By: ejguan Differential Revision: D43161015 fbshipit-source-id: 82049979981b83aefd247843fbc2c004a7e7a90a
…eprecated (#94527) Summary: #94095 moves this into `tud.datapipes.iter.sharding`. However, since previously this is a public API, this is a BC break change. As discussed in meta-pytorch/data#987 (comment), we will have backward compatbile support but with deprecated warning. Differential Revision: D43161015 Pull Request resolved: #94527 Approved by: https://github.com/ejguan, https://github.com/NivekT
Summary:
Move
ShardingFilterIterDataPipe
into a dedicated file.Also, propose to have a dedicated parent class (
_ShardingIterDataPipe
) for sharding data pipe, as this seems more like a "system/engine-level" datapipe that gives strong hints to RS on how to execute, and needs first-class citizen treatment in RS (compared with other "user-level" datapipe that are mostly composableCallable[[Iterable], Iterable]
. But open to other discussions.Open question: Should
ShardingRoundRobinDispatcherIterDataPipe also be considered as a
_ShardingIterDataPipe
? (e.g. this sharding is executed by replicating (the metadata), whileShardingRoundRobinDispatcherIterDataPipe
hints too expensive to replicate so requires round robin data exchange/dispatch).Differential Revision:
D43014692
X-link: pytorch/pytorch#94095
D43014692