
UCT/CUDA/CUDA_IPC: add per stream refcount; avoid callbacks when no pending ops on stream #4646

Merged

Conversation

Akshay-Venkatesh
Contributor

Why?

When we arm the CUDA-IPC transport to notify an event descriptor through callback threads, we currently launch work on all streams the process has created, even though there may be pending work on only a subset of them. As a result, the process wakes up almost immediately after being armed if at least one stream is empty. This is highly likely in practice: some peers are not even CUDA-IPC accessible, yet streams are created for them, and no real work is ever issued on the streams dedicated to those peer GPUs. The way around this is to maintain a per-stream refcount and issue callbacks only when there are outstanding operations on the stream.
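
A minimal sketch of the idea (illustrative only, not the actual UCX data structures or function names; the types, `notify_cb`, `op_issued`, and `arm_streams` are hypothetical, and a real implementation would update the refcount atomically or under a lock):

```c
#include <cuda.h>
#include <stdint.h>

typedef struct {
    CUstream stream;
    uint32_t refcount; /* outstanding async ops on this stream */
} stream_ctx_t;

/* Hypothetical completion callback that would signal the event fd. */
static void CUDA_CB notify_cb(CUstream stream, CUresult status, void *arg)
{
    stream_ctx_t *ctx = (stream_ctx_t*)arg;
    ctx->refcount--; /* a real implementation would decrement atomically */
    /* ... write to the event descriptor to wake the progress thread ... */
}

/* Called when an async copy is issued on a stream. */
static void op_issued(stream_ctx_t *ctx)
{
    ctx->refcount++;
}

/* Arming: attach callbacks only to streams with pending work, so an
 * empty stream can no longer wake the process spuriously. */
static void arm_streams(stream_ctx_t *ctxs, int num_streams)
{
    for (int i = 0; i < num_streams; i++) {
        if (ctxs[i].refcount > 0) {
            cuStreamAddCallback(ctxs[i].stream, notify_cb, &ctxs[i], 0);
        }
    }
}
```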

cc @bureddy @quasiben

@swx-jenkins3
Collaborator

Can one of the admins verify this patch?

@mellanox-github
Contributor

Mellanox CI: PASSED on 25 workers

Note: the logs will be deleted after 11-Jan-2020

Agent/Stage Status
_main ✔️ SUCCESS
hpc-arm-cavium-jenkins_W0 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W1 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W2 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W3 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W0 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W1 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W2 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W3 ✔️ SUCCESS
hpc-test-node-gpu_W0 ✔️ SUCCESS
hpc-test-node-gpu_W1 ✔️ SUCCESS
hpc-test-node-gpu_W2 ✔️ SUCCESS
hpc-test-node-gpu_W3 ✔️ SUCCESS
hpc-test-node-legacy_W0 ✔️ SUCCESS
hpc-test-node-legacy_W1 ✔️ SUCCESS
hpc-test-node-legacy_W2 ✔️ SUCCESS
hpc-test-node-legacy_W3 ✔️ SUCCESS
hpc-test-node-new_W0 ✔️ SUCCESS
hpc-test-node-new_W1 ✔️ SUCCESS
hpc-test-node-new_W2 ✔️ SUCCESS
hpc-test-node-new_W3 ✔️ SUCCESS
r-vmb-ppc-jenkins_W0 ✔️ SUCCESS
r-vmb-ppc-jenkins_W1 ✔️ SUCCESS
r-vmb-ppc-jenkins_W2 ✔️ SUCCESS
r-vmb-ppc-jenkins_W3 ✔️ SUCCESS

@dmitrygx
Member

dmitrygx commented Jan 4, 2020

ok to test

@pentschev
Contributor

@Akshay-Venkatesh I just had a quick chat with @quasiben and pointed out that we don't currently use CUDA streams with Dask. I filed rapidsai/dask-cuda#96 a while back to discuss how we could, but for now using streams with Dask is neither possible nor beneficial. I do agree this is an important part of getting the most performance out of CUDA; I just wanted to point out that, unfortunately, we can't really get any performance benefit from this work today.

@brminich
Contributor

Is it possible to add a test for this?

@Akshay-Venkatesh
Contributor Author

Is it possible to add a test for this?

Will work on this.

@Akshay-Venkatesh
Contributor Author

Is it possible to add a test for this?

@brminich It seems that finding out how many times a file descriptor has been written is not trivial. Since epoll_wait returns only the number of file descriptors that are ready, multiple writes to the same file descriptor cannot be distinguished. Also, if we use an external file descriptor passed in through worker params, we can expect the number of ready file descriptors to be just one, so we cannot see how many streams wrote to the cuda_ipc file descriptor unless we peek into the internal state of the cuda_ipc UCT. I don't see an easy way to test this case with gtest/uct. Do you have any suggestions on how we could test it?
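
For reference, a minimal standalone sketch (not UCX code; the eventfd stands in for the transport's wakeup descriptor, and error checks are omitted for brevity) showing why the write count is not recoverable from epoll_wait alone:

```c
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>

int main(void)
{
    int efd = eventfd(0, EFD_NONBLOCK);
    int ep  = epoll_create1(0);
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = efd };
    epoll_ctl(ep, EPOLL_CTL_ADD, efd, &ev);

    uint64_t one = 1;
    write(efd, &one, sizeof(one)); /* first "stream" signals  */
    write(efd, &one, sizeof(one)); /* second "stream" signals */

    struct epoll_event out[8];
    int n = epoll_wait(ep, out, 8, 0);
    printf("ready fds: %d\n", n);  /* prints 1, not 2: writes coalesce */

    /* The count survives only in the eventfd counter itself, which
     * requires reading the fd, not just polling it. */
    uint64_t val;
    read(efd, &val, sizeof(val));
    printf("eventfd counter: %llu\n", (unsigned long long)val);
    return 0;
}
```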

@Akshay-Venkatesh
Contributor Author

@brminich Can you let me know if you have any further thoughts on #4646 (comment)?

@brminich
Contributor

@alinask, plz review

@brminich brminich merged commit 65e7ce3 into openucx:master Jan 17, 2020
pentschev pushed a commit to pentschev/ucx that referenced this pull request Mar 5, 2020
…stream-refcount

UCT/CUDA/CUDA_IPC: add per stream refcount; avoid callbacks when no pending ops on stream
pentschev pushed a commit to pentschev/ucx that referenced this pull request Apr 6, 2020
…stream-refcount

UCT/CUDA/CUDA_IPC: add per stream refcount; avoid callbacks when no pending ops on stream