Relax the pin on pynvml again #1130

wence- · 2023-02-24T13:06:57Z

Handling the str vs. bytes discrepancy should have been covered by the changes in #1118.

ajschmidt8 · 2023-02-24T15:29:29Z

There were two pins in the PR below, but only one unpin in this PR.

https://github.com/rapidsai/dask-cuda/pull/1128/files

Should pyproject.toml also be unpinned?

wence- · 2023-02-24T17:35:22Z

Oh thanks, I suspect so (pushed that change). Thanks for the sharp eyes!

jakirkham · 2023-02-24T21:33:06Z

One CI job is failing with this error:

Unable to start CUDA Context
Traceback (most recent call last):
  File "/opt/conda/envs/test/lib/python3.8/site-packages/pynvml/nvml.py", line 850, in _nvmlGetFunctionPointer
    _nvmlGetFunctionPointer_cache[name] = getattr(nvmlLib, name)
  File "/opt/conda/envs/test/lib/python3.8/ctypes/__init__.py", line 386, in __getattr__
    func = self.__getitem__(name)
  File "/opt/conda/envs/test/lib/python3.8/ctypes/__init__.py", line 391, in __getitem__
    func = self._FuncPtr((name_or_ordinal, self))
AttributeError: /usr/lib64/libnvidia-ml.so.1: undefined symbol: nvmlDeviceGetComputeRunningProcesses_v3

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/envs/test/lib/python3.8/site-packages/dask_cuda/initialize.py", line 31, in _create_cuda_context
    distributed.comm.ucx.init_once()
  File "/opt/conda/envs/test/lib/python3.8/site-packages/distributed/comm/ucx.py", line 136, in init_once
    pre_existing_cuda_context = has_cuda_context()
  File "/opt/conda/envs/test/lib/python3.8/site-packages/distributed/diagnostics/nvml.py", line 219, in has_cuda_context
    if _running_process_matches(handle):
  File "/opt/conda/envs/test/lib/python3.8/site-packages/distributed/diagnostics/nvml.py", line 179, in _running_process_matches
    running_processes = pynvml.nvmlDeviceGetComputeRunningProcesses(handle)
  File "/opt/conda/envs/test/lib/python3.8/site-packages/pynvml/nvml.py", line 2608, in nvmlDeviceGetComputeRunningProcesses
    return nvmlDeviceGetComputeRunningProcesses_v3(handle);
  File "/opt/conda/envs/test/lib/python3.8/site-packages/pynvml/nvml.py", line 2576, in nvmlDeviceGetComputeRunningProcesses_v3
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetComputeRunningProcesses_v3")
  File "/opt/conda/envs/test/lib/python3.8/site-packages/pynvml/nvml.py", line 853, in _nvmlGetFunctionPointer
    raise NVMLError(NVML_ERROR_FUNCTION_NOT_FOUND)
pynvml.nvml.NVMLError_FunctionNotFound: Function Not Found

jakirkham · 2023-02-25T04:40:34Z

Rerunning CI to see if the Dask 2023.2.1 release helped

wence- · 2023-02-25T09:22:51Z

> Rerunning CI to see if the Dask 2023.2.1 release helped

I imagine the problem is that pynvml has been updated to require the v3 version of an NVML function (nvmlDeviceGetComputeRunningProcesses_v3 in the traceback above), but that doesn't exist in CUDA 11.2?

wence- · 2023-02-28T16:36:57Z

This is WIP until a solution for backwards compatibility is decided on in nvidia-ml-py (and/or pynvml). Until then we should just keep pynvml at 11.4.1.
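A minimal sketch of checking at runtime whether the installed pynvml is older than 11.5, the release that comes up later in this thread. The helper names (`parse_release`, `is_below`, `installed_pynvml_is_pre_11_5`) are hypothetical, and the parsing only handles plain dotted numeric versions. The actual pin lives in the dependency files, not in code like this.

```python
from importlib import metadata


def parse_release(version):
    """Turn '11.4.1' into (11, 4, 1); stop at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def is_below(version, bound):
    """True if the dotted version string sorts strictly below the bound tuple."""
    return parse_release(version) < bound


def installed_pynvml_is_pre_11_5():
    """Return True/False for the installed pynvml, or None if it is not installed."""
    try:
        installed = metadata.version("pynvml")
    except metadata.PackageNotFoundError:
        return None
    return is_below(installed, (11, 5))
```

With this, `is_below("11.4.1", (11, 5))` is true and `is_below("11.5.0", (11, 5))` is false, which matches keeping pynvml at 11.4.1.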

jakirkham · 2023-02-28T23:07:27Z

Going to double check this, but my understanding is we want PyNVML 11.5 for CUDA 12 support

pentschev · 2023-03-01T08:32:26Z

Agreed, but it seems we need that fix to land in nvidia-ml-py first as we can't work around that in a reasonable manner.

wence- · 2023-03-01T10:45:52Z

> Going to double check this, but my understanding is we want PyNVML 11.5 for CUDA 12 support

I don't think that is necessary, unless we need features in NVML that were only introduced in CUDA 12.

Specifically, I have CTK 12 on my system, I install pynvml < 11.5, and all the queries work. The C API preserves backwards compatibility, so old versions of pynvml work fine with new versions of libnvidia-ml.so. The problem is the other way round: new versions of pynvml don't work with old versions of libnvidia-ml.so.
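To illustrate the direction of the incompatibility, here is a rough sketch of probing a library handle for the newest versioned symbol it exports and falling back to older ones. This is not how pynvml or nvidia-ml-py resolve symbols: `resolve_first_available` and `FakeOldLibrary` are hypothetical names, and the suffix list is an assumption. On a real system `lib` would be something like `ctypes.CDLL("libnvidia-ml.so.1")`, where a missing symbol raises `AttributeError` as in the traceback above. Picking a symbol is only part of the job, since the struct layouts passed to each variant can differ.

```python
def resolve_first_available(lib, base_name, suffixes=("_v3", "_v2", "")):
    """Return (symbol_name, function) for the first variant that lib exports.

    Variants are tried newest first; a missing attribute is treated the same
    way ctypes reports an undefined symbol (AttributeError).
    """
    for suffix in suffixes:
        name = base_name + suffix
        try:
            return name, getattr(lib, name)
        except AttributeError:
            continue
    raise LookupError(f"no variant of {base_name} found")


class FakeOldLibrary:
    """Stand-in for an older libnvidia-ml.so that lacks the _v3 symbol."""

    def nvmlDeviceGetComputeRunningProcesses_v2(self):
        return []

    def nvmlDeviceGetComputeRunningProcesses(self):
        return []


name, fn = resolve_first_available(
    FakeOldLibrary(), "nvmlDeviceGetComputeRunningProcesses"
)
print(name)  # prints "nvmlDeviceGetComputeRunningProcesses_v2"
```

An old binding against a new library never hits the fallback path, because the older symbols are still exported. A new binding against an old library fails only if it insists on the newest symbol.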

pentschev · 2023-07-28T19:45:34Z

This is pending resolution of NVBug 4008080.

jakirkham · 2024-06-25T06:31:42Z

Curious where things landed here. AFAICT this is still pinned.

copy-pr-bot · 2024-07-10T15:00:49Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

pentschev · 2024-07-10T15:05:16Z

/ok to test

pentschev · 2024-07-10T15:19:31Z

/ok to test

Relax the pin on pynvml again

703497a

wence- requested a review from a team as a code owner February 24, 2023 13:06

wence- mentioned this pull request Feb 24, 2023

Merge branch-23.02 into branch-23.04 #1128

Merged

wence- added the improvement (Improvement / enhancement to an existing function) and non-breaking (Non-breaking change) labels Feb 24, 2023

And in pyproject

0d52fd9

ajschmidt8 approved these changes Feb 24, 2023

pentschev added 2 commits February 28, 2023 10:10

Merge branch 'branch-23.04' into wence/relax-pynvml-pin

718d8df

Merge branch 'branch-23.04' into wence/relax-pynvml-pin

1deff4c

jakirkham mentioned this pull request Feb 28, 2023

Dask-CUDA: CUDA 12 Conda Packages #1115

Closed

wence- marked this pull request as draft March 6, 2023 16:42

pentschev added the 0 - Blocked (Cannot progress due to external reasons) label Jul 28, 2023

pentschev changed the base branch from branch-23.04 to branch-24.08 July 10, 2024 14:59

Merge branch 'branch-24.08' into wence/relax-pynvml-pin

06f29b7

Update RAPIDS dependencies files

e67a2c9

github-actions bot added the conda (conda issue) label Jul 10, 2024
