
gpu_temp_max_gpu_threshold missing #7

Closed
leinardi opened this issue Dec 30, 2018 · 13 comments
leinardi commented Dec 30, 2018

I just found out that the GPU Max Operating Temp, exported with the XML tag gpu_temp_max_gpu_threshold, is missing from py3nvml.

Do you have any plan to add it?

Also, another missing tag is the cuda_version.

fbcotter (Owner) commented Jan 3, 2019

Hi @leinardi. I'm not sure I totally understand what you mean. Are you referring to the XML dump from py3nvml.nvidia_smi? There is a tag called gpu_temp_max_threshold in there. As for the cuda_version, I don't know whether it is possible to get from NVML, although I may be wrong. You can certainly get the driver version, but the CUDA version will depend on which library file you have installed on your machine.

When you say it is missing, do you mean it is available in nvml but not in py3nvml? If so, I can probably find a way to wrap the nvml function and add it.

leinardi (Author) commented Jan 3, 2019

> There is a tag called gpu_temp_max_threshold in there.

Hi @fbcotter, gpu_temp_max_threshold is actually another temperature:

		<temperature>
			<gpu_temp>38 C</gpu_temp>
			<gpu_temp_max_threshold>94 C</gpu_temp_max_threshold>
			<gpu_temp_slow_threshold>91 C</gpu_temp_slow_threshold>
			<gpu_temp_max_gpu_threshold>89 C</gpu_temp_max_gpu_threshold>
			<memory_temp>N/A</memory_temp>
			<gpu_temp_max_mem_threshold>N/A</gpu_temp_max_mem_threshold>
		</temperature>

> As for the cuda_version, I don't know if this is possible to get from NVML

The cuda_version is now part of the nvidia-smi output:

<nvidia_smi_log>
	<timestamp>Thu Jan  3 13:25:23 2019</timestamp>
	<driver_version>415.25</driver_version>
	<cuda_version>10.0</cuda_version>
	<attached_gpus>1</attached_gpus>
...
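Until those tags are wrapped, both values can be read straight out of the XML dump with the standard library. A minimal sketch (the sample XML is abbreviated from the dump above; in real use you would feed in the stdout of `nvidia-smi -q -x`):

```python
import xml.etree.ElementTree as ET

def read_smi_xml(xml_text):
    """Pull cuda_version and each GPU's max-GPU temperature threshold
    out of an `nvidia-smi -q -x` dump."""
    root = ET.fromstring(xml_text)
    cuda = root.findtext('cuda_version')
    thresholds = [
        gpu.findtext('temperature/gpu_temp_max_gpu_threshold')
        for gpu in root.iter('gpu')
    ]
    return cuda, thresholds

# Abbreviated sample of the dump shown above:
sample = """<nvidia_smi_log>
  <driver_version>415.25</driver_version>
  <cuda_version>10.0</cuda_version>
  <gpu>
    <temperature>
      <gpu_temp_max_gpu_threshold>89 C</gpu_temp_max_gpu_threshold>
    </temperature>
  </gpu>
</nvidia_smi_log>"""
print(read_smi_xml(sample))  # ('10.0', ['89 C'])
```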

fbcotter (Owner) commented Jan 3, 2019

Ah, I see. Are these snippets taken from the XML dump of nvidia-smi? I can look at whether this info is possible to get.

leinardi (Author) commented Jan 3, 2019

Yep, that's just the output of nvidia-smi -q -x.

fbcotter (Owner) commented Jan 3, 2019

I think I can add the video clock. The gpu_temp_max_gpu_threshold tag is not available on my GPU system, so I can't check it. Can you check the following things for me, please? If you run:

from py3nvml.py3nvml import *
handle = nvmlDeviceGetHandleByIndex(0)
nvmlDeviceGetClockInfo(handle, 3)

Does this give you the expected video_clock output?

Also, I think the temperature threshold you are looking for can be obtained with

from py3nvml.py3nvml import *
handle = nvmlDeviceGetHandleByIndex(0)
nvmlDeviceGetTemperatureThreshold(handle, 2)

Is that right?

If so, we can add this into the py3nvml.nvidia_smi function. The CUDA version we might be able to query from nvmlSystemGetNVMLVersion(): calling that for me gives '10.410.72', where 410.72 is my driver version.

However, in saying all this: the nvidia_smi output will be continually updated by NVIDIA, and I worry that constantly updating the py3nvml.nvidia_smi function to keep them providing the same info will be a laborious endeavour. If only these 3 tags (plus perhaps a few more) are missing, we can update it this time, but I don't know the full extent. I'm also happy for you to keep doing pull requests if you need the py3nvml.nvidia_smi function to stay up to date.

Let me know whether the above code works for you.
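If the '10.410.72' format holds generally, which is an assumption based on this single sample rather than any documented behaviour, the leading field could be split off to recover the CUDA major version:

```python
def split_nvml_version(version):
    """Split an NVML version string like '10.410.72' into a CUDA major
    component and a driver version. The format is assumed from one
    observed sample, not a documented guarantee."""
    cuda_major, driver = version.split('.', 1)
    return cuda_major, driver

print(split_nvml_version('10.410.72'))  # ('10', '410.72')
```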

leinardi (Author) commented Jan 3, 2019

from py3nvml.py3nvml import *
handle = nvmlDeviceGetHandleByIndex(0)
nvmlDeviceGetClockInfo(handle, 3)

This works 👍

from py3nvml.py3nvml import *
handle = nvmlDeviceGetHandleByIndex(0)
nvmlDeviceGetTemperatureThreshold(handle, 2)

This gives me this error:

Traceback (most recent call last):
  File "/home/leinardi/Workspace/gitlab/gwe/run", line 29, in <module>
    print("Device {}: {}".format(i,  nvmlDeviceGetTemperatureThreshold(handle, 2)))
  File "/home/leinardi/.local/lib/python3.6/site-packages/py3nvml/py3nvml.py", line 1113, in nvmlDeviceGetTemperatureThreshold
    _nvmlCheckReturn(ret)
  File "/home/leinardi/.local/lib/python3.6/site-packages/py3nvml/py3nvml.py", line 317, in _nvmlCheckReturn
    raise NVMLError(ret)
py3nvml.py3nvml.NVMLError_NotSupported: Not Supported
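Since not every GPU or driver supports every counter, app code may want to treat this NVMLError_NotSupported as "no data" rather than a crash. A small generic wrapper can do that; this is a sketch, not part of py3nvml itself:

```python
def try_nvml(query, *args, default=None):
    """Call an NVML query function and return `default` instead of
    raising when the GPU or driver does not support it."""
    try:
        return query(*args)
    except Exception as err:
        # py3nvml error classes are named NVMLError or NVMLError_*,
        # so match on the class name rather than importing each one.
        if type(err).__name__.startswith('NVMLError'):
            return default
        raise

# Usage (hardware-dependent, assuming an initialized handle):
# threshold = try_nvml(nvmlDeviceGetTemperatureThreshold, handle, 2)
```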

leinardi (Author) commented Jan 7, 2019

Hi @fbcotter, I just found out that the right value for the max gpu threshold is 3 and not 2:
https://github.com/NVIDIA/nvidia-settings/blob/master/src/nvml.h#L518

I tested it and it works fine 👍

from py3nvml.py3nvml import *
handle = nvmlDeviceGetHandleByIndex(0)
nvmlDeviceGetTemperatureThreshold(handle, 3)
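For readability, the magic index can be replaced by names mirroring the nvmlTemperatureThresholds_t enum in the linked nvml.h (values copied from that header; newer py3nvml releases may export equivalent constants):

```python
# nvmlTemperatureThresholds_t values, per nvml.h
NVML_TEMPERATURE_THRESHOLD_SHUTDOWN = 0  # HW shutdown temperature
NVML_TEMPERATURE_THRESHOLD_SLOWDOWN = 1  # HW slowdown temperature
NVML_TEMPERATURE_THRESHOLD_MEM_MAX = 2   # memory max operating temperature
NVML_TEMPERATURE_THRESHOLD_GPU_MAX = 3   # GPU max operating temperature

# With these names, the call above reads:
# nvmlDeviceGetTemperatureThreshold(handle, NVML_TEMPERATURE_THRESHOLD_GPU_MAX)
```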

I am planning to use your library for my app, GWE:
[screenshot of GWE]

I don't care about py3nvml.nvidia_smi being maintained (I am not using it), but I would like to know whether you are still planning to maintain the library, or whether it would be better to add and maintain the source inside my app instead. I am also fine with making pull requests to this repo, but before investing effort in them I would like to know how long it could take for one to be approved and for a new release of the lib to be published.

fbcotter (Owner) commented Jan 8, 2019

Oh nice work, good find. I'll look at how easy it will be to update the package; I think there are several enums that need updating. Your app looks really nice!

As for updates, I regularly add features to py3nvml that I find useful so I do plan on maintaining it for the near future at least.

@fbcotter (Owner)

I'm currently in the process of updating py3nvml to match with the newer version of nvml, so will keep this issue open until I finish this work, hopefully in the next week or so.

@leinardi (Author)

Hey @fbcotter, I finally managed to publish GWE on Flathub and I just want to say thank you for the nice library 👍

fbcotter (Owner) commented Feb 7, 2019

Well done, that looks really nice!

I just pushed to master an update that cleans up the old enums, the root of the problem you were talking about in this thread. I also added docstrings (copied the C style ones).

I haven't decided whether I want to update the xml function, as you can now get everything you want from the low-level functions. I'll keep the issue open as I think there's a lot more to think about.

leinardi (Author) commented Feb 7, 2019

Thanks a lot, looking forward to a new release 👍

fbcotter (Owner) commented Mar 4, 2019

Published the new release. Thanks for pointing out the problems.

@fbcotter fbcotter closed this as completed Mar 4, 2019