-
Notifications
You must be signed in to change notification settings - Fork 25.7k
support allgather on GPU #14576
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
support allgather on GPU #14576
Conversation
9f8b3c6 to
877d4de
Compare
877d4de to
1ed0ff7
Compare
1ed0ff7 to
8261ece
Compare
8261ece to
36d6678
Compare
36d6678 to
299ebd2
Compare
299ebd2 to
25f5f9a
Compare
25f5f9a to
2897144
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some nits and one structural thing.
Right now there is an implied assumption that all outputs in the nested output vector are placed on the same device. We should test that this is the case and throw if it isn't. If some part of this functionality is reused and the assumption is false then is can lead to synchronization issues.
There is also the possibility of using flattenDenseTensors directly instead of first copying into temporary tensors on the CPU side and then flattening them. That would save another copy. Since this is a backfill op it is not critical but will result in improved performance. Can you file an issue to track this? It would make for a good starter task for somebody new to the code base.
|
Pieter, regarding possibility of using flattenDenseTensors. I think this also requires the nested output vector tensors on the same device? |
2897144 to
0bd2876
Compare
0bd2876 to
9fe8a88
Compare
9fe8a88 to
a6d10e3
Compare
a6d10e3 to
acd3b8d
Compare
acd3b8d to
4a4a285
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good. There is a problem in CI though, something about invalid events.
|
Re: flattenDenseTensors, I think it would work with multiple devices, looking at the implementation: pytorch/aten/src/THC/generic/THCTensorMath.cu Lines 87 to 255 in 7e4a5b8
|
|
Regarding DeviceIndex, also see #14729. |
|
@pietern it seems the failures is randomly. When I ran individual test, it always pass. But when I run all the tests, sometimes, it failed. investigating... |
|
crated #14812 for new comers. |
4a4a285 to
c6383a6
Compare
c6383a6 to
42f312b
Compare
42f312b to
0940e75
Compare
0940e75 to
6fa1498
Compare
6fa1498 to
2f08151
Compare
2f08151 to
de5cd78
Compare
de5cd78 to
21f0f38
Compare
21f0f38 to
9449e82
Compare
9449e82 to
667f1d9
Compare
Summary: as titled Reviewed By: pietern Differential Revision: D13266063 fbshipit-source-id: 413140d80df24f4d6db26d1fcb5051fc41b2ab9a
667f1d9 to
a1de0d9
Compare
Summary: as titled
Differential Revision: D13266063