
gpu_temp_max_gpu_threshold missing #7

Closed
leinardi opened this issue Dec 30, 2018 · 13 comments
leinardi commented Dec 30, 2018

I just found out that the GPU Max Operating Temp, exported with the XML tag gpu_temp_max_gpu_threshold, is missing from py3nvml.

Do you have any plan to add it?

Also, another missing tag is the cuda_version.

fbcotter (Owner) commented Jan 3, 2019

Hi @leinardi. I'm not sure I totally understand what you mean. Are you referring to the XML dump from py3nvml.nvidia_smi? There is a tag called gpu_temp_max_threshold in there. As for the cuda_version, I don't know whether it is possible to get from NVML, although I may be wrong. You can certainly get the driver version, but the CUDA version will depend on which library file you have installed on your machine.

When you say it is missing, do you mean it is available in nvml but not in py3nvml? If so, I can probably find a way to wrap the nvml function and add it.

leinardi (Author) commented Jan 3, 2019

> There is a tag called gpu_temp_max_threshold in there.

Hi @fbcotter, gpu_temp_max_threshold is actually another temperature:

		<temperature>
			<gpu_temp>38 C</gpu_temp>
			<gpu_temp_max_threshold>94 C</gpu_temp_max_threshold>
			<gpu_temp_slow_threshold>91 C</gpu_temp_slow_threshold>
			<gpu_temp_max_gpu_threshold>89 C</gpu_temp_max_gpu_threshold>
			<memory_temp>N/A</memory_temp>
			<gpu_temp_max_mem_threshold>N/A</gpu_temp_max_mem_threshold>
		</temperature>

> As for the cuda_version, I don't know if this is possible to get from NVML

The cuda_version is now part of the nvidia-smi output:

<nvidia_smi_log>
	<timestamp>Thu Jan  3 13:25:23 2019</timestamp>
	<driver_version>415.25</driver_version>
	<cuda_version>10.0</cuda_version>
	<attached_gpus>1</attached_gpus>
...
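Until those tags are wrapped, both values can be read straight out of the XML dump with the standard library. A minimal sketch (the sample XML is abbreviated from the dump above; in real use you would feed in the stdout of `nvidia-smi -q -x`):

```python
import xml.etree.ElementTree as ET

def read_smi_xml(xml_text):
    """Pull cuda_version and each GPU's max-GPU temperature threshold
    out of an `nvidia-smi -q -x` dump."""
    root = ET.fromstring(xml_text)
    cuda = root.findtext('cuda_version')
    thresholds = [
        gpu.findtext('temperature/gpu_temp_max_gpu_threshold')
        for gpu in root.iter('gpu')
    ]
    return cuda, thresholds

# Abbreviated sample of the dump shown above:
sample = """<nvidia_smi_log>
  <driver_version>415.25</driver_version>
  <cuda_version>10.0</cuda_version>
  <gpu>
    <temperature>
      <gpu_temp_max_gpu_threshold>89 C</gpu_temp_max_gpu_threshold>
    </temperature>
  </gpu>
</nvidia_smi_log>"""
print(read_smi_xml(sample))  # ('10.0', ['89 C'])
```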

fbcotter (Owner) commented Jan 3, 2019

Ah, I see. Are these snippets taken from the XML dump of nvidia-smi? I can look at whether this info is possible to get.

leinardi (Author) commented Jan 3, 2019

Yep, that's just the output of nvidia-smi -q -x.

fbcotter (Owner) commented Jan 3, 2019

I think I can add the video clock. The gpu_temp_max_gpu_threshold tag is not available on my GPU system, so I can't check it. Can you check the following things for me, please? If you run:

from py3nvml.py3nvml import *
handle = nvmlDeviceGetHandleByIndex(0)
nvmlDeviceGetClockInfo(handle, 3)

Does this give you the expected video_clock output?

Also, I think the temperature threshold you are looking for can be obtained with

from py3nvml.py3nvml import *
handle = nvmlDeviceGetHandleByIndex(0)
nvmlDeviceGetTemperatureThreshold(handle, 2)

Is that right?

If so, we can add this into the py3nvml.nvidia_smi function. The CUDA version we might be able to query from nvmlSystemGetNVMLVersion(): calling that for me gives '10.410.72', where 410.72 is my driver version.

However, in saying all this: the nvidia_smi output will be continually updated by NVIDIA, and I worry that constantly updating the py3nvml.nvidia_smi function to keep them providing the same info will be a laborious endeavour. If only these 3 tags (plus perhaps a few more) are missing, we can update it this time, but I don't know the full extent. I'm also happy for you to keep doing pull requests if you need the py3nvml.nvidia_smi function to stay up to date.

Let me know whether the above code works for you.
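If the '10.410.72' format holds generally, which is an assumption based on this single sample rather than any documented behaviour, the leading field could be split off to recover the CUDA major version:

```python
def split_nvml_version(version):
    """Split an NVML version string like '10.410.72' into a CUDA major
    component and a driver version. The format is assumed from one
    observed sample, not a documented guarantee."""
    cuda_major, driver = version.split('.', 1)
    return cuda_major, driver

print(split_nvml_version('10.410.72'))  # ('10', '410.72')
```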

leinardi (Author) commented Jan 3, 2019

from py3nvml.py3nvml import *
handle = nvmlDeviceGetHandleByIndex(0)
nvmlDeviceGetClockInfo(handle, 3)

This works 👍

from py3nvml.py3nvml import *
handle = nvmlDeviceGetHandleByIndex(0)
nvmlDeviceGetTemperatureThreshold(handle, 2)

This gives me this error:

Traceback (most recent call last):
  File "/home/leinardi/Workspace/gitlab/gwe/run", line 29, in <module>
    print("Device {}: {}".format(i,  nvmlDeviceGetTemperatureThreshold(handle, 2)))
  File "/home/leinardi/.local/lib/python3.6/site-packages/py3nvml/py3nvml.py", line 1113, in nvmlDeviceGetTemperatureThreshold
    _nvmlCheckReturn(ret)
  File "/home/leinardi/.local/lib/python3.6/site-packages/py3nvml/py3nvml.py", line 317, in _nvmlCheckReturn
    raise NVMLError(ret)
py3nvml.py3nvml.NVMLError_NotSupported: Not Supported
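Since not every GPU or driver supports every counter, app code may want to treat this NVMLError_NotSupported as "no data" rather than a crash. A small generic wrapper can do that; this is a sketch, not part of py3nvml itself:

```python
def try_nvml(query, *args, default=None):
    """Call an NVML query function and return `default` instead of
    raising when the GPU or driver does not support it."""
    try:
        return query(*args)
    except Exception as err:
        # py3nvml error classes are named NVMLError or NVMLError_*,
        # so match on the class name rather than importing each one.
        if type(err).__name__.startswith('NVMLError'):
            return default
        raise

# Usage (hardware-dependent, assuming an initialized handle):
# threshold = try_nvml(nvmlDeviceGetTemperatureThreshold, handle, 2)
```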

leinardi (Author) commented Jan 7, 2019

Hi @fbcotter, I just found out that the right value for the max gpu threshold is 3 and not 2:
https://github.com/NVIDIA/nvidia-settings/blob/master/src/nvml.h#L518

I tested it and it works fine 👍

from py3nvml.py3nvml import *
handle = nvmlDeviceGetHandleByIndex(0)
nvmlDeviceGetTemperatureThreshold(handle, 3)
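For readability, the magic index can be replaced by names mirroring the nvmlTemperatureThresholds_t enum in the linked nvml.h (values copied from that header; newer py3nvml releases may export equivalent constants):

```python
# nvmlTemperatureThresholds_t values, per nvml.h
NVML_TEMPERATURE_THRESHOLD_SHUTDOWN = 0  # HW shutdown temperature
NVML_TEMPERATURE_THRESHOLD_SLOWDOWN = 1  # HW slowdown temperature
NVML_TEMPERATURE_THRESHOLD_MEM_MAX = 2   # memory max operating temperature
NVML_TEMPERATURE_THRESHOLD_GPU_MAX = 3   # GPU max operating temperature

# With these names, the call above reads:
# nvmlDeviceGetTemperatureThreshold(handle, NVML_TEMPERATURE_THRESHOLD_GPU_MAX)
```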

I am planning to use your library for my app, GWE:
[screenshot of GWE]

I don't care about py3nvml.nvidia_smi being maintained (I am not using it), but I would like to know whether you are still planning to maintain the library, or whether it would be better to add and maintain the source inside my app instead. I am also fine with making pull requests to this repo, but before investing effort in them I would like to know how long it could take for one to be approved and for a new release of the lib to be published.

fbcotter (Owner) commented Jan 8, 2019

Oh nice work, good find. I'll look at how easy it will be to update the package; I think there are several enums that need updating. Your app looks really nice!

As for updates, I regularly add features to py3nvml that I find useful so I do plan on maintaining it for the near future at least.

@fbcotter (Owner)

I'm currently in the process of updating py3nvml to match with the newer version of nvml, so will keep this issue open until I finish this work, hopefully in the next week or so.

@leinardi (Author)

Hey @fbcotter, I finally managed to publish GWE on Flathub and I just want to say thank you for the nice library 👍

fbcotter (Owner) commented Feb 7, 2019

Well done, that looks really nice!

I just pushed to master an update that cleans up the old enums, the root of the problem you were talking about in this thread. I also added docstrings (copied the C style ones).

I haven't decided whether I want to update the xml function, as you can now get everything you want from the low-level functions. I'll keep the issue open as I think there's a lot more to think about.

leinardi (Author) commented Feb 7, 2019

Thanks a lot, looking forward to a new release 👍

fbcotter (Owner) commented Mar 4, 2019

Published the new release. Thanks for pointing out the problems.

@fbcotter fbcotter closed this as completed Mar 4, 2019