UserWarning: A CUDA context for device 0 already exists #867
With the latest dask_cudf and dask-cuda nightlies, if I've already imported dask_cudf by the time I start my LocalCUDACluster, I get a warning:

`UserWarning: A CUDA context for device 0 already exists`

nvidia-smi shows I do indeed have two client processes running on GPU 0.

I'm not sure if this is new dask_cudf behavior, or if dask-cuda just now detects and warns about this?

Comments
This must be new dask_cudf behavior - this warning was introduced specifically to catch this kind of change. @quasiben @shwina, any ideas whether anything may have changed in cuDF regarding CUDA context creation? It may be relevant to mention what we have in dask-cuda/dask_cuda/cuda_worker.py, lines 73 to 75 at 0bed313.
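For reference, a rough sketch of what a pre-existing-context check can look like - this is an illustration using pynvml, not the actual code at that location in dask-cuda:

```python
# Illustrative sketch only (assumes the pynvml package; NOT the code at
# cuda_worker.py lines 73-75): detect whether the current process already
# holds a CUDA context by inspecting each device's compute processes.
import os

import pynvml

def has_cuda_context():
    """Return the index of the device owning a context in this process, else None."""
    pynvml.nvmlInit()
    try:
        for index in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(index)
            for proc in pynvml.nvmlDeviceGetComputeRunningProcesses(handle):
                if proc.pid == os.getpid():
                    return index
        return None
    finally:
        pynvml.nvmlShutdown()
```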
There has been some warning cleanup recently by @bdice, but that should be moving in the direction of fewer warnings.
My warning cleanup has been limited to the cuDF API thus far.
The warning from the description of this issue is raised by Dask-CUDA, so I don't think warnings in cuDF are relevant. The relevant part here is CUDA context management.
Did some digging into this - it looks like the issue here stems from the fact that, as of 22.02, cuDF creates a CUDA context at import time.

EDIT: Hmm, that doesn't explain the warning though - in my reproducer using the stable 22.02 packages:

```python
import dask_cudf
from dask_cuda import LocalCUDACluster

if __name__ == "__main__":
    cluster = LocalCUDACluster()
```

I end up with two CUDA contexts on device 0, but don't get any warnings about a CUDA context already being created on one of the worker processes.
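As an illustration of that import-time behavior, here's a rough sketch of one way to observe it - this assumes Numba's ctypes-based CUDA driver binding and is not code from cuDF or Dask-CUDA:

```python
# Rough sketch (assumes numba's ctypes-based driver binding; not cuDF or
# Dask-CUDA code): query the current CUDA context before and after
# importing cudf.
from ctypes import byref, c_void_p

from numba.cuda.cudadrv.driver import driver

ctx = c_void_p()
driver.cuCtxGetCurrent(byref(ctx))
print("context before import:", ctx.value)  # expected: None (no context yet)

import cudf  # as of 22.02, this creates a CUDA context at import time

driver.cuCtxGetCurrent(byref(ctx))
print("context after import:", ctx.value)  # expected: a non-null handle
```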
But in @charlesbluca's example from #867 (comment), the relevant environment variable isn't set. For now, the user simply must be mindful of not importing anything that causes a CUDA context to be created before the cluster is created, and that includes importing dask_cudf - see the sketch below.
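To make that ordering concrete, here's a minimal sketch using only public Dask-CUDA/Distributed APIs:

```python
# Minimal sketch of the safe ordering: create the cluster *before* importing
# anything (such as dask_cudf) that creates a CUDA context in the parent.
from dask_cuda import LocalCUDACluster
from distributed import Client

if __name__ == "__main__":
    # No CUDA context exists in the parent process yet, so each worker can
    # create its context on its assigned device without triggering the warning.
    cluster = LocalCUDACluster()
    client = Client(cluster)

    import dask_cudf  # safe: workers already own their devices
```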
Right - this only occurs because that variable hasn't been set, either through Dask-CUDA's cluster initialization or manually - it might be worth adding documentation to Dask-CUDA encouraging use of this environment variable, as it seems like issues with context management have created larger problems in the past (xref rapidsai/cudf#4827).

I would argue the warning here is still partially unexplained - we understand why cuDF is now creating a CUDA context when it once wasn't, but we still don't know why this process ends up being used for the worker assigned to device 0 (unless I'm missing some configuration option where something like this would be trivially possible).
This is the parent Python process; it isn't being used by any workers, but rather only to spawn the workers, including another process which will then use device 0 and thus raise the warning.
Actually, to be more precise, this process is the same one that will eventually be used by the client as well (besides spawning the workers).
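A quick way to observe the process layout described above - this diagnostic is an illustration, using only documented Distributed APIs:

```python
# Compare the parent/client PID with the worker PIDs: the parent spawns the
# workers and later hosts the client, but is not itself a worker.
import os

from dask_cuda import LocalCUDACluster
from distributed import Client

if __name__ == "__main__":
    cluster = LocalCUDACluster()
    client = Client(cluster)
    print("parent/client PID:", os.getpid())
    # Client.run executes a function once on every worker process.
    print("worker PIDs:", client.run(os.getpid))
```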
Thanks for the clarification @pentschev 🙂 I now understand that the warnings are the result of Dask-CUDA attempting to initialize a CUDA context on the parent process for UCX purposes - the following reproducer does give the warnings that @randerzander encountered:

```python
import dask_cudf
from dask_cuda import LocalCUDACluster

if __name__ == "__main__":
    cluster = LocalCUDACluster(protocol="ucx")
```

I think the best solution here is to add documentation for this behavior.
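For reference, a rough sketch of the client-side counterpart to that parent-process context creation - this assumes dask_cuda.initialize and its create_cuda_context flag behave as documented:

```python
# Sketch of the client/parent-side UCX setup that LocalCUDACluster(protocol="ucx")
# performs internally; parameter names assume dask_cuda.initialize as documented.
from dask_cuda.initialize import initialize

# By default this creates a CUDA context on the calling (parent) process -
# the same context the warning in this issue refers to.
initialize(enable_tcp_over_ucx=True, create_cuda_context=True)
```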
I've added the documentation. Is there anything else we wanted to discuss here, or can we close this issue?
I think we're good to close. I'll tentatively do that, and if there's anything left, please feel free to reopen.