Degraded ETH hashrate on Vega 56 GPU between ROCm stack 2.10 and 3.x #172

Closed
perestoronin opened this issue Sep 25, 2020 · 7 comments

@perestoronin

Please help me recover the 20%+ hashrate (measured in tests) on Vega 56 that was lost when upgrading the ROCm stack from 2.10 to 3.*.

This tool does not work with the new stack and new drivers: Eliovp/amdmemorytweak#46

@perestoronin perestoronin changed the title Degradate eth hashrate gpu vega 56 between rocm stack 2.10 and 3.+ Degradated eth hashrate gpu vega 56 between rocm stack 2.10 and 3.+ Sep 25, 2020
@justxi
Owner

justxi commented Sep 25, 2020

I think this is a question for https://github.com/RadeonOpenCompute/ROCm/issues?

@perestoronin
Author

perestoronin commented Sep 25, 2020

Added: ROCm/ROCm#1246

The problem affects not only ROCm 3.*, but also the new amdgpu-pro driver, same issue :( Eliovp/amdmemorytweak#46

@a-repko

a-repko commented Sep 26, 2020

@perestoronin
I'm using older versions of the ROCm stack as well (each version has different problems, so I had to pick the one best suited to my purposes). I noticed that these versions were removed from the Gentoo repository, so I decided to collect all the relevant ebuilds and make them available to others as well (versions 2.10, 3.0, 3.1, 3.3, 3.5, 3.7 and 3.8).

Here you are: ebuilds and, for your convenience, distfiles (30 MiB, excluding llvm-roc, which is too large but can be downloaded automatically by emerge anyway).

I'm also attaching the ebuilds here: rocm_ebuilds.tar.gz. You will need to replace the corresponding subdirectories in /var/db/repos/gentoo/ (if you keep the various ...-9999.ebuild files, emerge will complain). This approach is admittedly crude, but it should hopefully work.

BTW: problems with memory overclocking may also be related to the kernel driver, because you don't actually need the ROCm stack for it: rocm-smi appears to be just a Python script that communicates directly with the kernel (through its /sys interface).
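To illustrate that the memory clock state can be inspected without the ROCm stack, here is a minimal Python sketch that parses the text the amdgpu kernel driver exposes in sysfs files such as `pp_dpm_mclk`. It assumes the usual format of one `index: frequencyMhz` entry per line, with the active level marked by `*`; the default `card1` path and the helper names `parse_dpm_levels` / `read_mclk_levels` are assumptions for illustration, so adjust them for your system. The sample values are made up.

```python
import re
from pathlib import Path

# One DPM level per line, e.g. "2: 800Mhz *" ('*' marks the active level).
# This line format is an assumption based on typical amdgpu sysfs output.
LEVEL_RE = re.compile(r"^\s*(\d+):\s*(\d+)\s*Mhz\s*(\*)?\s*$", re.IGNORECASE)


def parse_dpm_levels(text):
    """Return (levels, active): levels maps level index -> MHz, active is the starred index."""
    levels = {}
    active = None
    for line in text.splitlines():
        match = LEVEL_RE.match(line)
        if not match:
            continue  # skip blank or unexpected lines
        index, mhz = int(match.group(1)), int(match.group(2))
        levels[index] = mhz
        if match.group(3):
            active = index
    return levels, active


def read_mclk_levels(card="card1"):
    """Read the memory-clock DPM table straight from sysfs (no ROCm userspace needed)."""
    path = Path("/sys/class/drm") / card / "device" / "pp_dpm_mclk"
    return parse_dpm_levels(path.read_text())


if __name__ == "__main__":
    # Illustrative sample text, not real Vega 56 values.
    sample = "0: 167Mhz \n1: 500Mhz \n2: 800Mhz *\n3: 945Mhz \n"
    print(parse_dpm_levels(sample))
```

On a real system, `read_mclk_levels()` can be compared before and after running an overclock script to check whether the kernel driver actually accepted the new memory clock.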

@perestoronin
Author

perestoronin commented Sep 27, 2020

Thank you for the old 2.10 ebuilds. In the case I described, the ROCm 2.10 driver gave 44+ MH/s, but with the ROCm 3.8 drivers or the Linux 5.8.11 kernel driver I now get only 36 MH/s in ethminer, although the estimated rate in both cases is 50 MH/s.

Overclock scripts used: vega56-50-all, vega56-50, show.sh, tweak50.sh.work (https://gist.github.com/raw/2eb3345074fe5141219c714301f98543)

amdmeminfo:
Found Card: 1002:687f rev c3 (AMD Radeon RX Vega 56)
Chip Type: Vega10
BIOS Version: 113-D0500300-102
PCI: 16:00.0
OpenCL Platform: 0
OpenCL ID: 4
Subvendor: 0x1002
Subdevice: 0x0b36
Sysfs Path: /sys/bus/pci/devices/0000:16:00.0
Memory Type: HBM2
Memory Model: Samsung KHA843801B

rocm-smi --showdriverversion:
Driver version: 5.8.11-gentoo

PS: I used the patch /etc/portage/patches/dev-util/opencl-headers/rocm-opencl-headers.patch (https://gist.github.com/raw/429ba545d2d42135dcc2121cce079777) to compile amdmeminfo from https://github.com/perestoronin/rocmnew/tree/master/dev-util/amdmeminfo

@justxi
Owner

justxi commented Oct 10, 2020

As mentioned above, this does not seem to be related to the ebuilds.
If there is a solution, please let us know.

@justxi justxi closed this as completed Oct 10, 2020
@perestoronin
Author

This is not fixed for newer kernels (5.7+), but the hashrate can be restored by downgrading the Linux kernel to 5.4.

@a-repko

a-repko commented Apr 8, 2021

Hi, just in case, I'm posting a new collection of ROCm ebuilds here, covering versions 2.10 up to 4.1:
rocm_ebuilds.tar.gz
These are mainly aimed at OpenCL; version 4.0 also includes rocm-smi, HIP and some additional ROC machinery. The off-site links posted above have been updated as well.

A few comments about hardware support and (off-topic) undervolting are in order here:

Raven Ridge (2000G APU series) worked well up to 3.3, and started working again from 3.10 onward; see ROCm/ROCm#1219.

Renoir (4000G APU series) still does not work properly. Kernel 5.8.18 (ebuild) reports the correct number of CUs, but newer kernels up to 5.11 still report 20 extra CUs, although support seems to improve slightly in ROCm 3.10, 4.0 and 4.1 with newer kernels. Version 4.1 in Gentoo produces a lot of error messages that clutter the test program's output. I recommend installing OpenCL from AMDGPU-PRO 20.40 using this script (versions newer than 20.40 are messy because of an added ROCm-derived OpenCL for Big Navi).

Radeon (Pro) VII is reportedly not supported in ROCm 4.1 with the upstream kernel (which is what Gentoo uses).

Vega FE apparently cannot be undervolted through the human-readable /sys interface. You need to edit the binary /sys/class/drm/card1/device/pp_table instead, as discussed in ROCm/ROCm#463. The main point is that the SoC voltage values are stored first, and the sclk and mclk levels then reference them; see the source file vega10_pptable.h. Radeon Pro VII cannot be undervolted through this binary interface either (at least I didn't manage to do it, so its power consumption is about 25% higher than optimal).
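The indirection described for the Vega FE pp_table (SoC voltages stored first, clock levels referencing them) can be modeled in a few lines. The sketch below is a simplified, hypothetical in-memory model, not the real binary layout from vega10_pptable.h; the class and field names (`ClockLevel`, `PowerPlayModel`, `voltage_index`, etc.) are invented for illustration, and the numbers are arbitrary. It shows why undervolting means editing the shared voltage entry a level points to, and why that change affects every sclk/mclk level that references the same entry.

```python
from dataclasses import dataclass


@dataclass
class ClockLevel:
    # Hypothetical stand-in for an sclk/mclk DPM entry: it stores a clock
    # and an *index* into the voltage table, not a voltage value itself.
    clock_mhz: int
    voltage_index: int


@dataclass
class PowerPlayModel:
    voltages_mv: list  # voltage lookup table, stored first
    sclk_levels: list  # ClockLevel entries referencing voltages_mv
    mclk_levels: list  # ClockLevel entries referencing voltages_mv

    def level_voltage(self, level):
        """Resolve a clock level to its voltage through the lookup table."""
        return self.voltages_mv[level.voltage_index]

    def undervolt_entry(self, index, new_mv):
        """Lower one shared voltage entry; every level referencing it is affected."""
        if new_mv > self.voltages_mv[index]:
            raise ValueError("new voltage is higher than the current one")
        self.voltages_mv[index] = new_mv


if __name__ == "__main__":
    table = PowerPlayModel(
        voltages_mv=[800, 900, 1000, 1100],  # arbitrary illustrative values
        sclk_levels=[ClockLevel(852, 0), ClockLevel(1400, 3)],
        mclk_levels=[ClockLevel(800, 3)],
    )
    table.undervolt_entry(3, 1000)
    # The top sclk level and the mclk level share entry 3, so both now resolve to 1000 mV.
    print(table.level_voltage(table.sclk_levels[1]), table.level_voltage(table.mclk_levels[0]))
```

Editing the real pp_table means doing the same thing at the byte level, at offsets defined by the structures in vega10_pptable.h.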
