UCT/CUDA/CUDA_IPC: add per stream refcount; avoid callbacks when no pending ops on stream #4646
…o outstanding ops on stream
Can one of the admins verify this patch?
Mellanox CI: PASSED on 25 workers. Note: the logs will be deleted after 11-Jan-2020.
ok to test
@Akshay-Venkatesh I just had a quick chat with @quasiben and pointed out that we don't currently use CUDA streams with Dask. I filed rapidsai/dask-cuda#96 a while ago to discuss how we could, but for now using streams with Dask is not possible or beneficial. I agree this is an important part of getting the most performance out of CUDA; I just wanted to note that, unfortunately, we can't get any performance benefit from this work today.
Is it possible to add a test for this?
Will work on this.
@brminich It seems that finding out how many times a file descriptor has been written to is not trivial. As …
@brminich Can you let me know if you have any further thoughts on #4646 (comment)?
@alinask, please review.
…stream-refcount UCT/CUDA/CUDA_IPC: add per stream refcount; avoid callbacks when no pending ops on stream
Why?
When we arm the CUDA-IPC transport to signal an event descriptor through callback threads, we currently launch work on every stream the process has created, even though there may be work on only a subset of them. As a result, the process wakes up almost immediately after arming if at least one stream is empty. This is highly likely: some peers are not even CUDA-IPC accessible, yet streams are created for them, and no real work is ever issued on the streams dedicated to those peer GPUs. The fix is to maintain a per-stream refcount and issue a callback only if there are outstanding ops on the stream.
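The idea can be illustrated with a minimal host-only sketch in C. It is not the code from this PR: the `sketch_*` names, the struct layout, and the `armed_ids` output array are all hypothetical, and the CUDA stream callback is replaced by simply recording which streams would have received one. In a real transport, that is the point where a stream callback would be enqueued (for example through the CUDA runtime's `cudaStreamAddCallback`).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a CUDA stream and its bookkeeping. */
typedef struct {
    int      id;       /* illustrative stream identifier */
    unsigned refcount; /* operations issued but not yet completed */
} sketch_stream_t;

/* Call when an operation is enqueued on the stream. */
void sketch_stream_op_issued(sketch_stream_t *stream)
{
    stream->refcount++;
}

/* Call when a previously issued operation has completed. */
void sketch_stream_op_completed(sketch_stream_t *stream)
{
    assert(stream->refcount > 0); /* completion without a matching issue */
    stream->refcount--;
}

/*
 * Arm the transport: request a callback only on streams that still have
 * outstanding operations. Idle streams are skipped, so they cannot wake
 * the process immediately. The ids of the armed streams are written to
 * armed_ids (which must hold at least num_streams entries) and the
 * number of armed streams is returned.
 */
size_t sketch_arm(const sketch_stream_t *streams, size_t num_streams,
                  int *armed_ids)
{
    size_t armed = 0;
    size_t i;

    for (i = 0; i < num_streams; i++) {
        if (streams[i].refcount == 0) {
            continue; /* nothing pending: no callback needed */
        }
        /* A real implementation would enqueue a stream callback here. */
        armed_ids[armed++] = streams[i].id;
    }
    return armed;
}
```

With three streams of which only one has a pending operation, `sketch_arm` arms exactly that stream. Once the operation completes, no stream is armed at all.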
cc @bureddy @quasiben