Vega 56 voltages might not apply #5

Closed
FKleinebreil opened this issue Oct 8, 2019 · 11 comments

@FKleinebreil

FKleinebreil commented Oct 8, 2019

I'm using an Asus Vega 56 Strix with Ubuntu Budgie 19.04. I wanted to use your tool (which is great, by the way!) to undervolt the card and overclock the memory. I suspect that the clocks apply but the voltages don't.

My custom power states (core clocks unchanged, custom voltages (P7: 1.2V -> 1.0V), overclocked P3 memory):

OD_SCLK:
0: 852Mhz 800mV
1: 991Mhz 900mV
2: 1138Mhz 906mV
3: 1269Mhz 912mV
4: 1312Mhz 918mV
5: 1474Mhz 975mV
6: 1538Mhz 987mV
7: 1590Mhz 1000mV
OD_MCLK:
0: 167Mhz 800mV
1: 500Mhz 800mV
2: 700Mhz 900mV
3: 940Mhz 975mV
OD_RANGE:
SCLK: 852MHz 2400MHz
MCLK: 167MHz 1500MHz
VDDC: 800mV 1000mV

After running the script, /sys/class/drm/cardX/device/pp_od_clk_voltage is identical to /etc/default/amdgpu-custom-state.card0 except for the VDDC line under OD_RANGE:

VDDC: 800mV 1200mV

1200mV was the default value, too.
When I use WattmanGTK to monitor the Vega 56 during Unigine Superposition the reported vddgfx is 1.2V. Also the GPU temperature hits 80°C, which should not happen if 1.0V were actually applied. The memory clock is read correctly at 940MHz.

Do you have any idea what is going on? Many thanks in advance!

@sibradzic
Owner

sibradzic commented Oct 8, 2019

Hi there @FKleinebreil. Can you please share:

  1. output of uname -a
  2. output of sudo cat /sys/class/drm/card0/device/pp_od_clk_voltage before you apply any changes
  3. output of sudo cat /etc/default/amdgpu-custom-states.card0
  4. output of sudo amdgpu-clocks, i.e. when you apply the changes
  5. output of sudo cat /sys/class/drm/card0/device/pp_od_clk_voltage after you apply any changes

(please try to wrap these outputs in "Insert code" when adding comments, for clarity)

BTW, you should not worry too much about the OD_RANGE: outputs, such as VDDC: 800mV 1200mV. These only show the allowed clock and voltage ranges, which should not change at all unless you mess with FORCE_POWER_CAP.
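
A quick way to confirm that the applied table matches the custom states while ignoring OD_RANGE: is to compare only the OD_SCLK: and OD_MCLK: blocks. The following is a minimal sketch in POSIX shell, not part of amdgpu-clocks. The function names od_states and tables_match are made up for illustration, and it assumes the custom file lists every state (a partial file will not compare equal to the full sysfs table).

```shell
# Hypothetical helpers, not part of amdgpu-clocks.

# od_states FILE
# Print only the OD_SCLK: and OD_MCLK: blocks of FILE, with whitespace
# collapsed so that column padding does not affect the comparison.
od_states() {
    awk '
        /^OD_(SCLK|MCLK):/  { keep = 1; $1 = $1; print; next }
        /^[A-Z_]+:/         { keep = 0; next }   # other headers, e.g. OD_RANGE: or FORCE_SCLK:
        keep && /^[0-9]+:/  { $1 = $1; print }   # state lines such as "7: 1590Mhz 1000mV"
    ' "$1"
}

# tables_match FILE_A FILE_B
# Succeed if both files describe the same SCLK and MCLK states.
tables_match() {
    a=$(od_states "$1")
    b=$(od_states "$2")
    [ "$a" = "$b" ]
}

# Example (run as root, paths as used in this thread):
#   tables_match /etc/default/amdgpu-custom-states.card0 \
#       /sys/class/drm/card0/device/pp_od_clk_voltage && echo "states match"
```

A match here only says that the driver accepted the table. It says nothing about the voltage the card actually runs at.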

@FKleinebreil
Author

FKleinebreil commented Oct 8, 2019

Thank you for the fast reply! I removed the OD_RANGE: constraints from amdgpu-custom-states.card0.

uname -a:

Linux XXXXX 5.0.0-31-generic #33-Ubuntu SMP Mon Sep 30 18:51:59 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

pp_od_clk_voltage before applying changes:

OD_SCLK:
0:        852Mhz        800mV
1:        991Mhz        900mV
2:       1138Mhz        950mV
3:       1269Mhz       1000mV
4:       1312Mhz       1050mV
5:       1474Mhz       1100mV
6:       1538Mhz       1150mV
7:       1590Mhz       1200mV
OD_MCLK:
0:        167Mhz        800mV
1:        500Mhz        800mV
2:        700Mhz        900mV
3:        800Mhz        950mV
OD_RANGE:
SCLK:     852MHz       2400MHz
MCLK:     167MHz       1500MHz
VDDC:     800mV        1200mV

Custom state file:

OD_SCLK:
0:        852Mhz        800mV
1:        991Mhz        900mV
2:       1138Mhz        906mV
3:       1269Mhz       912mV
4:       1312Mhz       918mV
5:       1474Mhz       975mV
6:       1538Mhz       987mV
7:       1590Mhz       1000mV
OD_MCLK:
0:        167Mhz        800mV
1:        500Mhz        800mV
2:        700Mhz        900mV
3:        940Mhz        975mV

Output of sudo amdgpu-clocks:

Detecting the state values at /sys/class/drm/card0/device/pp_od_clk_voltage:
  SCLK state 0: 852Mhz, 800mV
  SCLK state 1: 991Mhz, 900mV
  SCLK state 2: 1138Mhz, 950mV
  SCLK state 3: 1269Mhz, 1000mV
  SCLK state 4: 1312Mhz, 1050mV
  SCLK state 5: 1474Mhz, 1100mV
  SCLK state 6: 1538Mhz, 1150mV
  SCLK state 7: 1590Mhz, 1200mV
  MCLK state 0: 167Mhz, 800mV
  MCLK state 1: 500Mhz, 800mV
  MCLK state 2: 700Mhz, 900mV
  MCLK state 3: 800Mhz, 950mV
  Maximum clocks & voltages:
    SCLK clock 2400MHz
    MCLK clock 1500MHz
    VDDC voltage 1200mV
  Curent power cap: 260W
Verifying user state values at /etc/default/amdgpu-custom-state.card0:
  SCLK state 0: 852Mhz, 800mV
  SCLK state 1: 991Mhz, 900mV
  SCLK state 2: 1138Mhz, 906mV
  SCLK state 3: 1269Mhz, 912mV
  SCLK state 4: 1312Mhz, 918mV
  SCLK state 5: 1474Mhz, 975mV
  SCLK state 6: 1538Mhz, 987mV
  SCLK state 7: 1590Mhz, 1000mV
  MCLK state 0: 167Mhz, 800mV
  MCLK state 1: 500Mhz, 800mV
  MCLK state 2: 700Mhz, 900mV
  MCLK state 3: 940Mhz, 975mV
Committing custom states to /sys/class/drm/card0/device/pp_od_clk_voltage:
  Done

pp_od_clk_voltage after applying changes:

OD_SCLK:
0:        852Mhz        800mV
1:        991Mhz        900mV
2:       1138Mhz        906mV
3:       1269Mhz        912mV
4:       1312Mhz        918mV
5:       1474Mhz        975mV
6:       1538Mhz        987mV
7:       1590Mhz       1000mV
OD_MCLK:
0:        167Mhz        800mV
1:        500Mhz        800mV
2:        700Mhz        900mV
3:        940Mhz        975mV
OD_RANGE:
SCLK:     852MHz       2400MHz
MCLK:     167MHz       1500MHz
VDDC:     800mV        1200mV

However, WattmanGTK still reports a vddgfx (which I assume is core voltage) of 1.2V and temps reach 80°C under load quickly, as I said. Or to put it differently: temps are identical before and after the change.
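
One way to make this mismatch explicit is to compare the voltage of the top SCLK state in the applied table with the voltage the driver reports. The sketch below rests on the assumption that VDDGFX should not exceed the voltage of the highest SCLK state. That assumption and the helper names top_sclk_mv and voltage_within_top_state are illustrative and do not come from amdgpu-clocks.

```shell
# Hypothetical helpers, not part of amdgpu-clocks.

# top_sclk_mv FILE
# Print the voltage in mV of the last state listed under OD_SCLK: in FILE.
top_sclk_mv() {
    awk '
        /^OD_SCLK:/            { in_sclk = 1; next }
        /^[A-Z_]+:/            { in_sclk = 0 }      # any other section header ends the block
        in_sclk && /^[0-9]+:/  { mv = $3 }          # third field looks like "1000mV"
        END { sub(/mV$/, "", mv); print mv }
    ' "$1"
}

# voltage_within_top_state FILE REPORTED_MV
# Succeed if the reported voltage does not exceed the top SCLK state voltage.
voltage_within_top_state() {
    [ "$2" -le "$(top_sclk_mv "$1")" ]
}

# Example with the values from this thread (top state 1000mV, reported 1200mV):
#   voltage_within_top_state /sys/class/drm/card0/device/pp_od_clk_voltage 1200 \
#       || echo "reported voltage is above the top SCLK state"
```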

@sibradzic
Copy link
Owner

sibradzic commented Oct 8, 2019

However, WattmanGTK still reports a vddgfx (which I assume is core voltage) of 1.2V and temps reach 80°C under load quickly, as I said. Or to put it differently: temps are identical before and after the change.

I have no clue about WattmanGTK. Just to be sure its readings are sane, compare them with the output of (as root)
watch -n1 "cat /sys/kernel/debug/dri/0/amdgpu_pm_info | tail -n16" (keep the terminal "always on top" and compare SCLK, MCLK, voltage, wattage and temperature outputs)
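
If watching the full output is inconvenient, single values can be pulled out of amdgpu_pm_info with a small filter. This is only a sketch: pm_value is a hypothetical helper name, and it assumes lines of the form "<number> <unit> (<LABEL>)" with a single-word label, so it will not match "(average GPU)".

```shell
# Hypothetical helper, not part of amdgpu-clocks.

# pm_value LABEL
# Read amdgpu_pm_info-style text on stdin and print the number on the first
# line whose last field is exactly "(LABEL)", e.g. "1200 mV (VDDGFX)".
pm_value() {
    awk -v label="($1)" '$NF == label { print $1; exit }'
}

# Example (as root):
#   cat /sys/kernel/debug/dri/0/amdgpu_pm_info | pm_value VDDGFX
#   cat /sys/kernel/debug/dri/0/amdgpu_pm_info | pm_value SCLK
```

Because the match is on the whole last field, asking for SCLK does not pick up the PSTATE_SCLK line.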

@FKleinebreil
Author

FKleinebreil commented Oct 8, 2019

Sorry, this is my first time using GitHub. There were proper newlines before I used "Insert code", and I don't know how to add newlines in that environment. I'll google it and edit it.

I cross checked with the command you gave me and it too reports a VDDGFX of 1200mV, while the changed memory clock (940 MHz) seems to be applied correctly.

Typical output:

GFX Clocks and Power:
        940 MHz (MCLK)
        1591 MHz (SCLK)
        1269 MHz (PSTATE_SCLK)
        700 MHz (PSTATE_MCLK)
        1200 mV (VDDGFX)
        169.0 W (average GPU)

GPU Temperature: 60 C
GPU Load: 99 %

@sibradzic
Owner

sibradzic commented Oct 8, 2019

Thanks for the output fix! According to the output, there is nothing wrong with the script itself; the clocks and voltages are being set correctly. As for the VDDGFX report, it is indeed strange. It could be related to the requirement to set the clocks and voltages in order (see ROCm/ROCm#463 for details). Perhaps you can test applying the following /etc/default/amdgpu-custom-states.card0:

OD_SCLK:
7:       1590Mhz       1170mV
OD_MCLK:
3:        800Mhz        930mV
FORCE_SCLK: 7
FORCE_MCLK: 1
FORCE_PERF_LEVEL: manual

Reboot before you try, and check the output of cat /sys/kernel/debug/dri/0/amdgpu_pm_info | tail -n16 before and after you apply the custom states (no need to generate GPU load, since both SCLK and MCLK states are being forced here), then let me know.

Also, I don't have a Vega to test this on, but I know that Polaris, for example, has some strange limitations in how state voltages and memory voltages relate (the highest voltage of the current SCLK and MCLK states will "prevail" as VDDGFX). Vega 56/64 may have similar quirks, so try simpler custom states first (changing just one state at a time) and see what result you end up with.
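
For the one-state-at-a-time approach, a small helper can generate such minimal files. This is only a sketch: single_state_file is a hypothetical name, and the output simply follows the OD_SCLK:/OD_MCLK: layout used in this thread.

```shell
# Hypothetical helper, not part of amdgpu-clocks.

# single_state_file DOMAIN STATE CLOCK_MHZ VOLTAGE_MV
# Print a custom-state fragment that overrides exactly one state.
# DOMAIN must be SCLK or MCLK.
single_state_file() {
    case "$1" in
        SCLK|MCLK) ;;
        *) echo "domain must be SCLK or MCLK" >&2; return 1 ;;
    esac
    printf 'OD_%s:\n%s: %sMhz %smV\n' "$1" "$2" "$3" "$4"
}

# Example: lower only the voltage of SCLK state 7, then review the result
# before copying it into place as root:
#   single_state_file SCLK 7 1590 1170 > /tmp/amdgpu-custom-states.card0
```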

@FKleinebreil
Author

FKleinebreil commented Oct 8, 2019

I edited /etc/default/amdgpu-custom-states.card0 as you suggested, rebooted and applied the changes. This is amdgpu_pm_info:

GFX Clocks and Power:
	800 MHz (MCLK)
	1633 MHz (SCLK)
	1269 MHz (PSTATE_SCLK)
	700 MHz (PSTATE_MCLK)
	1200 mV (VDDGFX)
	37.0 W (average GPU)

GPU Temperature: 48 C
GPU Load: 0 %

SMC Feature Mask: 0x000000001ba1fb4f
UVD: Disabled

VCE: Disabled

So the issue remains. However, I get that the issue is not with your script but somewhere else. So there is probably not much you can do about it. I will see if I can find someone with similar issues.

Thank you very much for your time!

@sibradzic
Owner

If you really applied the custom state I suggested correctly, that output of amdgpu_pm_info is super strange indeed. Are you sure you don't have some other overclocking or undervolting software active at the same time? Do you perhaps have multiple graphics cards in your system?

@FKleinebreil
Author

FKleinebreil commented Oct 23, 2019

I don't have other software active and only a single Vega 56. No iGPU either, CPU is a Ryzen 5 3600.

However, I consider the issue closed as it's most likely not due to your script.

@sibradzic
Owner

This seems to be the bug affecting you, and apparently the fix was just applied to 5.4 RC:
https://bugs.freedesktop.org/show_bug.cgi?id=109887
https://bugzilla.kernel.org/show_bug.cgi?id=205277
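
To see whether a running kernel is at least 5.4, the release string from uname -r can be compared numerically. A rough sketch follows, with kernel_at_least as a hypothetical helper name; it assumes the release string starts with MAJOR.MINOR. A passing check only means the kernel is new enough to carry the fix in principle, not that the voltage is actually applied.

```shell
# Hypothetical helper, not part of amdgpu-clocks.

# kernel_at_least RELEASE MAJOR MINOR
# Succeed if RELEASE (as printed by `uname -r`) is at least MAJOR.MINOR.
kernel_at_least() {
    rel=${1%%-*}                 # drop a suffix such as "-31-generic"
    major=${rel%%.*}             # text before the first dot
    rest=${rel#*.}               # text after the first dot
    minor=${rest%%.*}            # text before the next dot
    minor=${minor%%[!0-9]*}      # strip trailing non-digits such as "+"
    [ "$major" -gt "$2" ] || { [ "$major" -eq "$2" ] && [ "$minor" -ge "$3" ]; }
}

# Example:
#   kernel_at_least "$(uname -r)" 5 4 || echo "kernel predates 5.4"
```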

@FKleinebreil
Author

Thank you very much for the update!

@xcom169

xcom169 commented Apr 14, 2022

I'm on kernel 5.15.28, but I think I have the same issue with a Vega 56.
