Check if CUDA context was created in distributed.comm.ucx #722

Merged

pentschev
Member

Because communications in the Nanny are initialized before Dask preload plugins run, and UCX creates the CUDA context directly within its own initializer in Distributed, Dask-CUDA always concludes that the CUDA context was incorrectly initialized ahead of time when using UCX, which isn't true. With the globals added here, Dask-CUDA can verify that the CUDA contexts are indeed valid.
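To make the intended check concrete, here is a minimal sketch of the decision logic. It is not the code in this PR: `should_warn_existing_context`, `comms_state`, and `context_created_by_comms` are hypothetical names chosen for illustration, and the real check lives in Dask-CUDA and Distributed.

```python
from types import SimpleNamespace
from typing import Optional


def should_warn_existing_context(
    existing_device: Optional[int], created_by_comms: bool
) -> bool:
    """Decide whether a pre-existing CUDA context deserves a warning.

    existing_device: index of the device that already holds a context,
        or None if no context exists yet (hypothetical convention).
    created_by_comms: True if the communication layer reported that it
        created the context itself during its own initialization.
    """
    # A context created by the comms layer is expected, so only warn when
    # a context exists and the comms layer did not create it.
    return existing_device is not None and not created_by_comms


# Hypothetical stand-in for module-level state exposed by a comms module.
comms_state = SimpleNamespace(context_created_by_comms=False)

# A context exists on device 0 and nobody claimed it: worth a warning.
print(should_warn_existing_context(0, comms_state.context_created_by_comms))

# The comms initializer ran first and recorded that it created the context.
comms_state.context_created_by_comms = True
print(should_warn_existing_context(0, comms_state.context_created_by_comms))
```

The point of the flag is ordering: the comms layer initializes before preload plugins, so without a record of who created the context, a later check cannot tell an expected context from an accidental one.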

Depends on dask/distributed#5308.

Fixes #721.

@pentschev requested a review from a team as a code owner September 10, 2021 20:28
@github-actions (bot) added the "python" label Sep 10, 2021
@pentschev added the "3 - Ready for Review", "bug", and "non-breaking" labels Sep 10, 2021
@pentschev changed the title from "Check distributed context creation" to "Check if CUDA context was created in distributed.comm.ucx" Sep 10, 2021
@pentschev
Member Author

rerun tests

@codecov-commenter
codecov-commenter commented Sep 13, 2021

Codecov Report

Merging #722 (488ef3d) into branch-21.10 (8e6ab70) will increase coverage by 1.02%.
The diff coverage is 88.17%.

@@               Coverage Diff                @@
##           branch-21.10     #722      +/-   ##
================================================
+ Coverage         87.63%   88.65%   +1.02%     
================================================
  Files                15       15              
  Lines              1658     1737      +79     
================================================
+ Hits               1453     1540      +87     
+ Misses              205      197       -8     

Impacted Files                                 Coverage Δ
dask_cuda/cuda_worker.py                       77.64% <ø> (ø)
dask_cuda/get_device_memory_objects.py         90.00% <0.00%> (+21.94%) ⬆️
dask_cuda/local_cuda_cluster.py                77.88% <50.00%> (ø)
dask_cuda/utils.py                             81.74% <65.95%> (-5.53%) ⬇️
dask_cuda/proxify_device_objects.py            95.45% <80.00%> (+6.56%) ⬆️
dask_cuda/initialize.py                        94.73% <90.90%> (+5.84%) ⬆️
dask_cuda/proxify_host_file.py                 93.46% <92.15%> (-5.94%) ⬇️
dask_cuda/proxy_object.py                      90.59% <98.11%> (+0.94%) ⬆️
dask_cuda/device_host_file.py                  71.66% <100.00%> (+1.50%) ⬆️
dask_cuda/explicit_comms/dataframe/shuffle.py  98.69% <100.00%> (+0.65%) ⬆️
... and 10 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update b6a7448...488ef3d.

@pentschev
Member Author

rerun tests

@pentschev
Member Author

@gpucibot merge

@rapids-bot (bot) merged commit af0e678 into rapidsai:branch-21.10 Sep 14, 2021
@pentschev
Member Author

Thanks @jakirkham for reviewing!

@pentschev deleted the check-distributed-context-creation branch September 22, 2021 10:10
Labels
3 - Ready for Review, bug, non-breaking, python
Development

Successfully merging this pull request may close these issues.

Warnings about existing CUDA contexts on dask-cuda cluster startup
3 participants