Incorrect GPU stats shown (drm card1 APU shown, card0 discrete GPU wanted) #114

Closed
john-tho opened this issue Apr 4, 2020 · 9 comments

Comments

@john-tho

john-tho commented Apr 4, 2020

Using 0.3.1, with a Ryzen APU (drm card1), and an RX580 (drm card0).

The RX580 is the active card with the monitor attached.

The HUD shows the stats from card1 (always 0% load, GPU temp matches CPU temp, and reading the load file fails with cat: /sys/class/drm/card1/device/gpu_busy_percent: Invalid argument) rather than the wanted discrete GPU stats.

I'm guessing the values selected in the first loop pass (card0) are overwritten by a later pass over card1?

for (auto& dir : dirs) {

Possible workarounds / fixes?:

  • OVERLAY_PARAM_CUSTOM drm_gpuname
  • Support multiple GPUs
  • Determine which GPU is active
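The guess above, and one way a selection parameter could avoid it, can be sketched as follows. This is an illustration only, not MangoHud's code: the Card struct and the functions pick_last_match and pick_by_pci are hypothetical names, and the card list is passed in instead of being read from /sys/class/drm.

```cpp
#include <string>
#include <vector>

// Hypothetical description of one DRM card; not a MangoHud type.
struct Card {
    std::string name;      // e.g. "card0"
    std::string pci_addr;  // e.g. "0000:01:00.0"
    bool is_amdgpu;
};

// Suspected pattern: every matching card overwrites the previous choice,
// so the last amdgpu card in the list wins.
const Card* pick_last_match(const std::vector<Card>& cards) {
    const Card* chosen = nullptr;
    for (const auto& card : cards) {
        if (card.is_amdgpu)
            chosen = &card;  // no break: a later card replaces an earlier one
    }
    return chosen;
}

// One possible fix: return the first amdgpu card whose PCI address
// matches the one the user asked for, or nullptr if none matches.
const Card* pick_by_pci(const std::vector<Card>& cards, const std::string& wanted) {
    for (const auto& card : cards) {
        if (card.is_amdgpu && card.pci_addr == wanted)
            return &card;
    }
    return nullptr;
}
```

With two amdgpu cards in the list, pick_last_match returns the second one regardless of which card drives the monitor, while pick_by_pci returns only the card that was asked for.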

Happy to try implementing a parameter, just looking for discussion for best option.

Cheers!

@plasticbomb1986

plasticbomb1986 commented Apr 7, 2020

Similar issue.

Dual Vega64 system: when the game runs on GPU0, the HUD shows GPU1 info. It hasn't been working correctly for a while and I thought I had messed something up, but today while playing I left Folding@home running on the secondary GPU and was surprised to see GPU activity shown, so I did a quick check. GPU usage, frequency, temperature, and VRAM: everything shown belongs to GPU1, the secondary GPU in my system.

Since Vulkan itself has had issues with dual-GPU systems (systems with two identical GPUs), I use an extension that lets me override which GPU a game renders on: vkdevicechooser, from aejsmith. I can verify the result simply by looking at my GPUs: the VEGA64 has usage indicator LEDs called GPUTach (which for some reason AMD did not bring over to the VEGA7).

@VortexAcherontic
Contributor

VortexAcherontic commented Apr 10, 2020

I am not sure if this is related, since I do not have an AMD dGPU, but I am experiencing the same issue (or a similar one).

The GPU load and VRAM seem not to be measured, but the temps seem to be correct (3rd screenshot).

In the nvidia-settings window you can see the actual GPU load.

Screenshot_20200410_150631

Screenshot_20200410_150928

Screenshot_20200410_162729

Specs:
OS: openSUSE Tumbleweed
CPU: i5-3230M
iGPU: Intel HD 3000
dGPU: NVidia GT 730M (Driver: 440.86)
MangoHUD: 0.3.1
Prime support by: suse-prime (aka. Prime Manager or prime-select)

@jackun
Collaborator

jackun commented Apr 10, 2020

@VortexAcherontic Kind of. The latest release uses only NVML, which apparently doesn't work with old cards. You need XNVCtrl, which will be in the next release, but usually these "prime" launchers run the NVIDIA GPU in a separate Xorg session with DISPLAY=:8 or something similar. The current XNVCtrl code path tries to use the current session's $DISPLAY, which is usually :0. I have no idea which way is best for detecting the correct DISPLAY env var, so it might still be borked in the next release.

@VortexAcherontic
Contributor

VortexAcherontic commented Apr 10, 2020

@jackun Thanks for your reply. I checked which X screen the NVidia card is running on: both nvidia-settings and a plain echo $DISPLAY indicate that it is running on :0.
Also, as far as I understand how suse-prime/prime-select works, you need to log out and log back in to switch between the iGPU and dGPU, so the whole X11 session runs on whichever GPU was selected.

Furthermore, I checked whether libXNVCtrl is installed on my system. My distro seems to offer two versions, libNVCtrl0 and libXNVCtrl. I installed both, but this does not seem to change anything.

Can you point me somewhere where I can check whether NVML is supported/working or not?
I think it may be related to the NVML thing.

Screenshot_20200410_164951

@jackun
Collaborator

jackun commented Apr 10, 2020

If you can, try building the develop branch to see if selecting the GPU with MANGOHUD_PCI_DEV works.

For example: MANGOHUD_PCI_DEV=0000:xx:xx.x MANGOHUD=1 vkcube, where the address has the form domain:bus:slot.function. Check with lspci -D.
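As a rough illustration of the domain:bus:slot.function form, the sketch below checks whether a string has that shape. The function name looks_like_pci_address is made up for this example, and it assumes the common layout with a 4-digit domain, 2-digit bus and slot, and a single function digit, all hexadecimal. It does not check that such a device actually exists.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Returns true if s has the shape "hhhh:hh:hh.h", where each 'h' is a
// hexadecimal digit (the form used in the example above).
// Assumption: the domain is exactly four digits wide.
bool looks_like_pci_address(const std::string& s) {
    static const std::string pattern = "hhhh:hh:hh.h";
    if (s.size() != pattern.size())
        return false;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (pattern[i] == 'h') {
            // Position must hold a hex digit.
            if (!std::isxdigit(static_cast<unsigned char>(s[i])))
                return false;
        } else if (s[i] != pattern[i]) {
            // Position must hold the literal ':' or '.' separator.
            return false;
        }
    }
    return true;
}
```

A placeholder such as 0000:xx:xx.x is rejected by this check, since x is not a hexadecimal digit, while a filled-in address such as 0000:01:00.0 is accepted.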

@john-tho
Author

Yes, that works for me, thanks.

readlink /sys/class/drm/card0
../../devices/pci0000:00/0000:00:01.1/0000:01:00.0/drm/card0

MANGOHUD=1 MANGOHUD_CONFIG=full MANGOHUD_PCI_DEV="0000:01:00.0" vkcube
Gives me values for the wanted GPU.
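The lookup done by hand above (reading the PCI address off the card0 link target) could be sketched like this. The function name pci_address_from_link is hypothetical, and the sketch assumes the link target has the same layout as the readlink output shown, with the PCI address as the path component directly before /drm/.

```cpp
#include <cstddef>
#include <string>

// Given a sysfs link target such as ".../0000:01:00.0/drm/card0",
// return the path component directly before "/drm/".
// Returns an empty string if the target does not have that layout.
std::string pci_address_from_link(const std::string& target) {
    const std::string marker = "/drm/";
    const std::size_t end = target.rfind(marker);
    if (end == std::string::npos || end == 0)
        return "";
    // Find the slash that starts the component ending at `end`.
    const std::size_t slash = target.rfind('/', end - 1);
    const std::size_t begin = (slash == std::string::npos) ? 0 : slash + 1;
    return target.substr(begin, end - begin);
}
```

On a live system the target string could come from std::filesystem::read_symlink (C++17) applied to /sys/class/drm/card0. Here it is passed in as a plain string so the parsing can be checked on its own.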

The console shows MANGOHUD: skipping GPU, no PCI ID match for the skipped device. Is it worth showing the skipped pci_device here, or hiding this message in debug output?

Cheers

@flightlessmango
Owner

@VortexAcherontic I'm pretty sure these are two different issues: this fix was for NVML, but your problem has to do with XNVCtrl. I think you should open a separate issue for it.

@plasticbomb1986

Another option could be a setting in the MangoHud config to show GPU x, or x+y. Like:

GPU0 Usage% Freq Temp
GPU0 VRAMused
GPU1 Usage% Freq Temp
GPU1 VRAMused
CPU Usage% Average freq temp
RAM usage
IO RD/RW
DXVK/OGL FPS frametime
dxvkver
Frametime graph

With AMD GPUs, these are the "options" presented for me under /sys/class/drm/:
card0 card0-DP-1 card0-DP-2 card0-DP-3 card0-HDMI-A-1 card1 card1-DP-4 card1-DP-5 card1-DP-6 card1-HDMI-A-2 renderD128 renderD129 ttm version
and under /sys/class/drm/card0/device and /sys/class/drm/card1/device:

Screenshot from 2020-04-20 18-41-35

jackun mentioned this issue Apr 25, 2020
@john-tho
Author

john-tho commented May 3, 2020

Thanks for the fix.
Works fine for me with the latest release:
MANGOHUD=1 MANGOHUD_CONFIG="full,pci_dev=0000\:01\:00.0" vkcube
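The backslashes before the colons are presumably needed because ':' would otherwise be read as a separator inside MANGOHUD_CONFIG. That is an assumption, not something confirmed in this thread. The sketch below shows the general idea with a made-up function, split_config, which splits on ',' and ':' unless the character is escaped with a backslash. It is not MangoHud's parser.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Split s on ',' and ':' (assumed separators), treating a backslash as
// an escape that keeps the following character literally.
std::vector<std::string> split_config(const std::string& s) {
    std::vector<std::string> out;
    std::string cur;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] == '\\' && i + 1 < s.size()) {
            cur += s[++i];  // escaped character: copy it, skip the backslash
        } else if (s[i] == ',' || s[i] == ':') {
            out.push_back(cur);  // unescaped separator ends the current token
            cur.clear();
        } else {
            cur += s[i];
        }
    }
    out.push_back(cur);  // last token
    return out;
}
```

Under these assumptions, the escaped string from the command above splits into two tokens, full and pci_dev=0000:01:00.0, whereas the same string without backslashes would be cut at every colon.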

john-tho closed this as completed May 3, 2020