Error executing xmr-stak VM_CONTEXT1_PROTECTION_FAULT #1587

Open
yoburtu opened this issue May 21, 2018 · 38 comments

Comments

@yoburtu

yoburtu commented May 21, 2018

Please provide as much information as possible to reproduce the issue.

Basic information

  • Type of the CPU. Intel(R) Core(TM)2 Quad CPU Q9400 @ 2.66GHz
  • Type of the GPU (if you try to mine with the GPU). Sapphire RX550 2G.

Compile issues

  • Which OS do you use?

Arch Linux. Linux Gondor 4.16.9-1-ARCH #1 SMP PREEMPT Thu May 17 02:10:09 UTC 2018 x86_64 GNU/Linux

add all commands you used and the full compile output here
yaourt -S aur/xmr-stak-nvidia-git

Issue with the execution

  • Did you compile the miner on your own?
    Via yaourt/AUR.
    run ./xmr-stak --version-long and add the output here
    Version: xmr-stak/2.4.3/26a5d65/makepkg/lin/amd-cpu/aeon-cryptonight-monero/0

AMD OpenCl issue

run `clinfo` and add the output here

opencl-amd 18.20.579836

Stability issue

  • Is the CPU or GPU overclocked? No.
  • Is the main memory of the CPU or GPU undervolted? No.

may 21 17:35:53 Gondor kernel: amdgpu 0000:28:00.0: GPU fault detected: 147 0x05a00402
may 21 17:35:53 Gondor kernel: amdgpu 0000:28:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x001E8CB4
may 21 17:35:53 Gondor kernel: amdgpu 0000:28:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A004002
may 21 17:35:53 Gondor kernel: amdgpu 0000:28:00.0: VM fault (0x02, vmid 5) at page 2002100, read from 'TC3' (0x544>
may 21 17:35:53 Gondor kernel: amdgpu 0000:28:00.0: GPU fault detected: 147 0x05e80802
may 21 17:35:53 Gondor kernel: amdgpu 0000:28:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x001938BD
may 21 17:35:53 Gondor kernel: amdgpu 0000:28:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A008002
may 21 17:35:53 Gondor kernel: amdgpu 0000:28:00.0: VM fault (0x02, vmid 5) at page 1652925, read from 'TC2' (0x544>
may 21 17:35:53 Gondor kernel: amdgpu 0000:28:00.0: GPU fault detected: 147 0x05c04802
may 21 17:35:53 Gondor kernel: amdgpu 0000:28:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x001E26B8
may 21 17:35:53 Gondor kernel: amdgpu 0000:28:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A048002
may 21 17:35:53 Gondor kernel: amdgpu 0000:28:00.0: VM fault (0x02, vmid 5) at page 1975992, read from 'TC0' (0x544>
may 21 17:35:53 Gondor kernel: amdgpu 0000:28:00.0: GPU fault detected: 147 0x08184802
may 21 17:35:53 Gondor kernel: amdgpu 0000:28:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00184D03
may 21 17:35:53 Gondor kernel: amdgpu 0000:28:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A048002
may 21 17:35:53 Gondor kernel: amdgpu 0000:28:00.0: VM fault (0x02, vmid 5) at page 1592579, read from 'TC0' (0x544>
may 21 17:35:53 Gondor kernel: amdgpu 0000:28:00.0: GPU fault detected: 147 0x07a04802
may 21 17:35:53 Gondor kernel: amdgpu 0000:28:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x001944F4
may 21 17:35:53 Gondor kernel: amdgpu 0000:28:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A048002
may 21 17:35:53 Gondor kernel: amdgpu 0000:28:00.0: VM fault (0x02, vmid 5) at page 1656052, read from 'TC0' (0x544>
may 21 17:35:53 Gondor kernel: amdgpu 0000:28:00.0: GPU fault detected: 147 0x0dd04802
may 21 17:35:53 Gondor kernel: amdgpu 0000:28:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x001CABBA
may 21 17:35:53 Gondor kernel: amdgpu 0000:28:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A048002
may 21 17:35:53 Gondor kernel: amdgpu 0000:28:00.0: VM fault (0x02, vmid 5) at page 1878970, read from 'TC0' (0x544>
may 21 17:35:53 Gondor kernel: amdgpu 0000:28:00.0: GPU fault detected: 147 0x0d380802
may 21 17:35:53 Gondor kernel: amdgpu 0000:28:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x001FCBA7
may 21 17:35:53 Gondor kernel: amdgpu 0000:28:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A008002
may 21 17:35:53 Gondor kernel: amdgpu 0000:28:00.0: VM fault (0x02, vmid 5) at page 2083751, read from 'TC2' (0x544>
may 21 17:35:53 Gondor kernel: amdgpu 0000:28:00.0: GPU fault detected: 147 0x07284402
may 21 17:35:53 Gondor kernel: amdgpu 0000:28:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x001F9C03
may 21 17:35:53 Gondor kernel: amdgpu 0000:28:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A004002
may 21 17:35:53 Gondor kernel: amdgpu 0000:28:00.0: VM fault (0x02, vmid 5) at page 2071555, read from 'TC3' (0x544>
may 21 17:35:53 Gondor kernel: amdgpu 0000:28:00.0: GPU fault detected: 147 0x0d380802
may 21 17:35:53 Gondor kernel: amdgpu 0000:28:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x001975A7
may 21 17:35:53 Gondor kernel: amdgpu 0000:28:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A008002
may 21 17:35:53 Gondor kernel: amdgpu 0000:28:00.0: VM fault (0x02, vmid 5) at page 1668519, read from 'TC2' (0x544>
may 21 17:35:53 Gondor kernel: amdgpu 0000:28:00.0: GPU fault detected: 147 0x0cf04802
may 21 17:35:53 Gondor kernel: amdgpu 0000:28:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00192D9E
may 21 17:35:53 Gondor kernel: amdgpu 0000:28:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A048002
may 21 17:35:53 Gondor kernel: amdgpu 0000:28:00.0: VM fault (0x02, vmid 5) at page 1650078, read from 'TC0' (0x544>
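
For readers who want to condense a long burst of these messages, here is a minimal Python sketch that counts the "VM fault" summary lines per vmid, access type and memory client. It assumes the line format shown in the log above; the names VM_FAULT_RE and summarize_vm_faults are hypothetical, and the sample lines are copied from this log.

```python
import re
from collections import Counter

# Matches the "VM fault" summary line in the format shown in the log above.
# Other kernel versions may print a different layout (assumption).
VM_FAULT_RE = re.compile(
    r"VM fault \((?P<status>0x[0-9a-fA-F]+), vmid (?P<vmid>\d+)\) "
    r"at page (?P<page>\d+), (?P<access>\w+) from '(?P<client>[^']+)'"
)


def summarize_vm_faults(lines):
    """Count VM faults per (vmid, access, client) tuple."""
    counts = Counter()
    for line in lines:
        match = VM_FAULT_RE.search(line)
        if match:
            key = (int(match["vmid"]), match["access"], match["client"])
            counts[key] += 1
    return counts


# Three sample lines taken from the log in this comment.
SAMPLE_LINES = [
    "may 21 17:35:53 Gondor kernel: amdgpu 0000:28:00.0: VM fault (0x02, vmid 5) at page 2002100, read from 'TC3' (0x544>",
    "may 21 17:35:53 Gondor kernel: amdgpu 0000:28:00.0: VM fault (0x02, vmid 5) at page 1975992, read from 'TC0' (0x544>",
    "may 21 17:35:53 Gondor kernel: amdgpu 0000:28:00.0: VM fault (0x02, vmid 5) at page 1592579, read from 'TC0' (0x544>",
]

for (vmid, access, client), n in summarize_vm_faults(SAMPLE_LINES).most_common():
    print(f"vmid={vmid} {access} from {client}: {n} fault(s)")
```

On a real system the lines would come from the kernel log (for example the output of `journalctl -k` or `dmesg`) instead of the sample list.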
@psychocrypt
Collaborator

psychocrypt commented May 21, 2018 via email

@yoburtu
Author

yoburtu commented May 21, 2018

$ clinfo
Number of platforms 1
Platform Name AMD Accelerated Parallel Processing
Platform Vendor Advanced Micro Devices, Inc.
Platform Version OpenCL 2.1 AMD-APP (2633.3)
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
Platform Host timer resolution 1ns
Platform Extensions function suffix AMD

Platform Name AMD Accelerated Parallel Processing
Number of devices 1
Device Name gfx804
Device Vendor Advanced Micro Devices, Inc.
Device Vendor ID 0x1002
Device Version OpenCL 1.2 AMD-APP (2633.3)
Driver Version 2633.3
Device OpenCL C Version OpenCL C 1.2
Device Type GPU
Device Board Name (AMD) Radeon RX 550 Series
Device Topology (AMD) PCI-E, 28:00.0
Device Profile FULL_PROFILE
Device Available Yes
Compiler Available Yes
Linker Available Yes
Max compute units 8
SIMD per compute unit (AMD) 4
SIMD width (AMD) 16
SIMD instruction width (AMD) 1
Max clock frequency 1183MHz
Graphics IP (AMD) 8.0
Device Partition (core)
Max number of sub-devices 8
Supported partition types (n/a)
Supported affinity domains (n/a)
Max work item dimensions 3
Max work item sizes 1024x1024x1024
Max work group size 256
Preferred work group size (AMD) 256
Max work group size (AMD) 1024
Preferred work group size multiple 64
Wavefront width (AMD) 64
Preferred / native vector sizes
char 4 / 4
short 2 / 2
int 1 / 1
long 1 / 1
half 1 / 1 (cl_khr_fp16)
float 1 / 1
double 1 / 1 (cl_khr_fp64)
Half-precision Floating-point support (cl_khr_fp16)
Denormals No
Infinity and NANs No
Round to nearest No
Round to zero No
Round to infinity No
IEEE754-2008 fused multiply-add No
Support is emulated in software No
Single-precision Floating-point support (core)
Denormals No
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations Yes
Double-precision Floating-point support (cl_khr_fp64)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Address bits 64, Little-Endian
Global memory size 1971896320 (1.836GiB)
Global free memory (AMD) 1905932 (1.818GiB)
Global memory channels (AMD) 4
Global memory banks per channel (AMD) 16
Global memory bank width (AMD) 256 bytes
Error Correction support No
Max memory allocation 1463539302 (1.363GiB)
Unified memory for Host and Device No
Minimum alignment for any data type 128 bytes
Alignment of base address 2048 bits (256 bytes)
Global Memory cache type Read/Write
Global Memory cache size 16384 (16KiB)
Global Memory cache line size 64 bytes
Image support Yes
Max number of samplers per kernel 16
Max size for 1D images from buffer 134217728 pixels
Max 1D or 2D image array size 2048 images
Base address alignment for 2D image buffers 256 bytes
Pitch alignment for 2D image buffers 256 pixels
Max 2D image size 16384x16384 pixels
Max 3D image size 2048x2048x2048 pixels
Max number of read image args 128
Max number of write image args 8
Local memory type Local
Local memory size 32768 (32KiB)
Local memory syze per CU (AMD) 65536 (64KiB)
Local memory banks (AMD) 32
Max number of constant args 8
Max constant buffer size 1463539302 (1.363GiB)
Preferred constant buffer size (AMD) 16384 (16KiB)
Max size of kernel argument 1024
Queue properties
Out-of-order execution No
Profiling Yes
Prefer user sync for interop Yes
Profiling timer resolution 1ns
Profiling timer offset since Epoch (AMD) 1526928045086352374ns (Mon May 21 20:40:45 2018)
Execution capabilities
Run OpenCL kernels Yes
Run native kernels No
Thread trace supported (AMD) Yes
Number of async queues (AMD) 2
Max real-time compute queues (AMD) 0
Max real-time compute units (AMD) 0
SPIR versions 1.2
printf() buffer size 4194304 (4MiB)
Built-in kernels (n/a)
Device Extensions cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event

NULL platform behavior
clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) AMD Accelerated Parallel Processing
clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) Success [AMD]
clCreateContext(NULL, ...) [default] Success [AMD]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT) Success (1)
Platform Name AMD Accelerated Parallel Processing
Device Name gfx804
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) Success (1)
Platform Name AMD Accelerated Parallel Processing
Device Name gfx804
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) Success (1)
Platform Name AMD Accelerated Parallel Processing
Device Name gfx804

ICD loader properties
ICD loader Name OpenCL ICD Loader
ICD loader Vendor OCL Icd free software
ICD loader Version 2.2.12
ICD loader Profile OpenCL 2.2

@yoburtu
Author

yoburtu commented May 22, 2018

Can you deduce anything from the clinfo output?

Best regards.

@yoburtu
Author

yoburtu commented May 24, 2018

I have installed Ubuntu and it works fine. The message has disappeared.

But now I have another problem :-(((. I can't overclock the GPUs. I don't understand!!

The value in pp_sclk_od doesn't change:

root@galadriel:~# cat /sys/class/drm/card2/device/pp_sclk_od
0
root@galadriel:~# echo "5" > /sys/class/drm/card2/device/pp_sclk_od
root@galadriel:~# cat /sys/class/drm/card2/device/pp_sclk_od
0
root@galadriel:~# cat /sys/class/drm/card2/device/pp_dpm_sclk
0: 214Mhz
1: 551Mhz
2: 734Mhz
3: 980Mhz
4: 1046Mhz
5: 1098Mhz
6: 1124Mhz
7: 1183Mhz *

root@galadriel:~# uname -a
Linux galadriel 4.13.0-43-generic #48~16.04.1-Ubuntu SMP Thu May 17 12:56:46 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

Any ideas??.
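
For anyone checking which DPM state is currently active, here is a minimal Python sketch that parses pp_dpm_sclk output in the format quoted in this comment, where the active state is marked with a trailing "*". The helper names parse_dpm_states and active_state are hypothetical. The sketch only reads text; it does not explain why the write to pp_sclk_od is ignored.

```python
import re

# One line of pp_dpm_sclk output, e.g. "7: 1183Mhz *".
# The trailing "*" marks the active state.
DPM_LINE_RE = re.compile(r"^\s*(\d+):\s*(\d+)\s*MHz\s*(\*)?\s*$", re.IGNORECASE)


def parse_dpm_states(text):
    """Return a list of (index, clock_mhz, is_active) tuples."""
    states = []
    for line in text.splitlines():
        match = DPM_LINE_RE.match(line)
        if match:
            index = int(match.group(1))
            clock_mhz = int(match.group(2))
            is_active = match.group(3) is not None
            states.append((index, clock_mhz, is_active))
    return states


def active_state(states):
    """Return the first state flagged as active, or None."""
    for state in states:
        if state[2]:
            return state
    return None


# Output copied from the comment above.
SAMPLE = """\
0: 214Mhz
1: 551Mhz
2: 734Mhz
3: 980Mhz
4: 1046Mhz
5: 1098Mhz
6: 1124Mhz
7: 1183Mhz *
"""

print(active_state(parse_dpm_states(SAMPLE)))  # prints (7, 1183, True)
```

On a live system the text would be read from /sys/class/drm/card2/device/pp_dpm_sclk (the card index differs per machine).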

@cdarken

cdarken commented Jun 7, 2018

I get the same errors on Arch, with the same kernel and opencl-amd versions. The issue is with xmr-stak, because other miners such as ethminer, tdxminer and cpuminer-multi-opencl (for purk coin) work just fine.

@yoburtu
Author

yoburtu commented Jun 7, 2018

I have tried xmrig and the problem occurs there too, so it does not seem to be specific to xmr-stak.

@cdarken

cdarken commented Jun 8, 2018

I think there are some issues with the driver in the 4.16 kernel; it was working fine with 4.15.

@yoburtu
Author

yoburtu commented Jun 8, 2018

I will try with 4.15 kernel.

@cdarken

cdarken commented Jun 17, 2018

@yoburtu did you get a chance to test with kernel 4.15?
I had the idea to test with Claymore 11.3, released at the beginning of April. That didn't work either.

@cdarken

cdarken commented Jun 20, 2018

I got around to installing kernel 4.15.9. It works, but only with opencl-amd 18.10; when I install 18.20 it breaks. Same on kernel 4.16: it works fine with opencl-amd 18.10. I will run some tests using kernel 4.17 when I have time and post the results here.

@yoburtu
Author

yoburtu commented Jun 21, 2018

@cdarken I have tested with kernel 4.14 and it doesn't work. I will try amdgpu-pro 18.10.

@cdarken

cdarken commented Jun 21, 2018

I'm 99% sure it has something to do with amdgpu-pro 18.20; even the latest build, 606296, from June 15 didn't work.

@christiankakesa

christiankakesa commented Jun 26, 2018

I have the same issue on Ubuntu Server 18.04 and amdgpu-pro 18.20.
But it works with another miner : lukminer-0.11.0 https://sites.google.com/site/lukxmrminer/

@beni-sandu

@FeNicks Thanks for the suggestion, I'm curious if that will change anything since I have a similar issue.

It looks like newer kernels and the 18.20 amdgpu-pro driver don't get along very well.

I'm getting the same type of errors on both Ubuntu 18.04 and Ubuntu 16.04 with the amdgpu-pro 18.20 driver. I will try some other combinations when I have some time.

@nick-perchev

nick-perchev commented Jul 9, 2018

I am observing this too. Built xmr-stak from sources, separate cpu, nvidia and amd miner binaries.

I'm using Fedora Rawhide, but I compile my own kernels with the Fedora config, lightly modified: some features disabled (SELinux, paravirt, Meltdown fix, retpolines) and with the amdgpu driver enabled (so that OpenCL works using it). Currently I'm on 4.14.44.

I installed parts of amdgpu-pro-18.20-606296.tar.xz. (Unfortunately, this filename seems to be the same for different OSes on AMD site. I downloaded one from the link "Radeon™ Software for Linux® version 18.20 for RHEL 7.4 / CentOS 7.4", as this one is closest to Fedora.)

By "installed parts", I mean the following: by looking carefully at all the RPMs in that file, which are many, I realized that since I don't need kernel modules (since I use ones from vanilla kernel), I only need a few RPMs. I sorted them out into directories:

One RPM for compilation:
minrpm_devel/opencl-amdgpu-pro-devel-18.20-606296.el7.x86_64.rpm

RPMs with libs to support older hardware:
minrpm_legacy/amdgpu-core-18.20-606296.el7.noarch.rpm
minrpm_legacy/amdgpu-pro-core-18.20-606296.el7.noarch.rpm
minrpm_legacy/libopencl-amdgpu-pro-18.20-606296.el7.x86_64.rpm
minrpm_legacy/opencl-orca-amdgpu-pro-icd-18.20-606296.el7.x86_64.rpm

RPMs with libs to support newer hardware:
minrpm_pal/amdgpu-core-18.20-606296.el7.noarch.rpm
minrpm_pal/amdgpu-pro-core-18.20-606296.el7.noarch.rpm
minrpm_pal/libopencl-amdgpu-pro-18.20-606296.el7.x86_64.rpm
minrpm_pal/opencl-amdgpu-pro-icd-18.20-606296.el7.x86_64.rpm

I just "dnf install *.rpm" these. amdgpu-core will fail to install (it wants to be on RHEL), but that's OK: it's a meta-package with no content. The rest install fine, mostly under /opt/amdgpu-pro.

Building xmr-stak purely for AMD mining with:

CC=gcc cmake .. \
-DCPU_ENABLE=OFF \
-DHWLOC_ENABLE=OFF \
-DOpenSSL_ENABLE=OFF \
-DCUDA_ENABLE=OFF \
-DOpenCL_ENABLE=ON \
-DOpenCL_INCLUDE_DIR=/opt/amdgpu-pro/include \
-DOpenCL_LIBRARY=/opt/amdgpu-pro/lib64 \
-DMICROHTTPD_ENABLE=OFF \
-DCMAKE_LINK_STATIC=ON \
-DCMAKE_BUILD_TYPE=Release \
-DXMR-STAK_COMPILE=generic \
&& make

works.

So. With this setup, I'm getting "GPU fault detected: 147" with amd miner on this card:
https://www.asus.com/us/Graphics-Cards/ROG-STRIX-RX470-O4G-GAMING/

(I also have an old GCN card based on R7 240, with it this miner works)

Going back to amdgpu-pro-18.10-572953.tar.xz results in a binary which immediately segfaults, so I don't know whether the "GPU fault detected: 147" thing happens on it too.

lukminer on this hardware and 18.20 works fine. Did not try 18.10.

On which kernels did you guys have success building and running against 18.10? Were you using the in-kernel amdgpu modules, or the ones from the amdgpu-pro-18.10 tarball?

@nick-perchev

Can confirm that 18.10 works fine with 4.14 and 4.16 kernels. I have not yet got around to checking whether 18.20 fails on the very same setup, which would confirm a regression.

"binary which immediately segfaults" problem was caused by AMD not being able to use standard libdrm: install
libgbm-amdgpu-pro-18.10-572953.el7.x86_64.rpm
in addition to packages mentioned in my previous post.

Also, OpenCL_LIBRARY should point to the library itself, not to its directory:
...
-DOpenCL_LIBRARY=/opt/amdgpu-pro/lib64/libOpenCL.so.1
...

@beni-sandu

@nick-perchev Hey Nick, thanks a lot for the input. Do you know where I can find the 18.10 AMD driver? I can't seem to find it on the AMD site.

@cdarken

cdarken commented Jul 16, 2018

@BeniSandu use this wget --referer https://support.amd.com/en-us/kb-articles/Pages/AMDGPU-PRO-Driver-for-Linux-Release-Notes.aspx -N https://www2.ati.com/drivers/linux/ubuntu/amdgpu-pro-18.10-572953.tar.xz

@beni-sandu

It looks like the issue is within AMD kernel modules, I could make it work on a fresh Ubuntu 18.04 install with the latest AMD 18.20 driver, using the modules that come with the 4.15.0-23-generic kernel.

A quick guide thanks to the replies from @cdarken and @nick-perchev:

  1. Download latest AMD 18.20 driver from AMD website.

  2. Unpack and cd to the amdgpu-pro-18.20-606296 directory:

sudo dpkg --install amdgpu-core_18.20-606296_all.deb amdgpu-pro-core_18.20-606296_all.deb opencl-amdgpu-pro_18.20-606296_amd64.deb opencl-amdgpu-pro-dev_18.20-606296_amd64.deb opencl-amdgpu-pro-icd_18.20-606296_amd64.deb opencl-orca-amdgpu-pro-icd_18.20-606296_amd64.deb ids-amdgpu_1.0.0-606296_all.deb libdrm-amdgpu-amdgpu1_2.4.91-606296_amd64.deb libdrm2-amdgpu_2.4.91-606296_amd64.deb libgbm1-amdgpu-pro_18.20-606296_amd64.deb libgbm1-amdgpu-pro-base_18.20-606296_all.deb libopencl1-amdgpu-pro_18.20-606296_amd64.deb clinfo-amdgpu-pro_18.20-606296_amd64.deb

You'll get a warning about the amd-dkms package missing, just ignore it.

  3. If this is a fresh install, add hugepages for xmr-stak and add yourself to the video group:

echo 'vm.nr_hugepages=128' | sudo tee --append /etc/sysctl.conf

sudo sysctl -p

sudo usermod -a -G video $LOGNAME

  4. Clone and compile xmr-stak:

sudo apt install libmicrohttpd-dev libssl-dev cmake build-essential libhwloc-dev

git clone https://github.com/fireice-uk/xmr-stak.git

mkdir xmr-stak/build && cd xmr-stak/build

CC=gcc cmake .. -DCPU_ENABLE=OFF -DHWLOC_ENABLE=OFF -DOpenSSL_ENABLE=OFF -DCUDA_ENABLE=OFF -DOpenCL_ENABLE=ON -DOpenCL_INCLUDE_DIR=/opt/amdgpu-pro/include -DOpenCL_LIBRARY=/opt/amdgpu-pro/lib/x86_64-linux-gnu/libOpenCL.so -DMICROHTTPD_ENABLE=OFF -DCMAKE_LINK_STATIC=ON -DCMAKE_BUILD_TYPE=Release -DXMR-STAK_COMPILE=generic && make

Here you can use your own config options, the important part is the path to the libOpenCL.so library.

  5. Reboot and start mining.
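
One way to confirm that the hugepages step in the guide above took effect is to look at the HugePages_ counters in /proc/meminfo. Here is a minimal Python sketch; the helper name hugepage_counts is hypothetical, and the sample string is made up for illustration and stands in for the real file contents.

```python
def hugepage_counts(meminfo_text):
    """Extract the HugePages_* counters from /proc/meminfo-style text."""
    counts = {}
    for line in meminfo_text.splitlines():
        key, _, value = line.partition(":")
        # Only the HugePages_* lines are plain page counts; lines such as
        # "Hugepagesize" carry a unit and are skipped here.
        if key.startswith("HugePages_") and value.strip():
            counts[key] = int(value.split()[0])
    return counts


# Illustrative sample, not taken from a real machine.
SAMPLE_MEMINFO = """\
MemTotal:        8052340 kB
HugePages_Total:     128
HugePages_Free:      128
Hugepagesize:       2048 kB
"""

counts = hugepage_counts(SAMPLE_MEMINFO)
if counts.get("HugePages_Total", 0) < 128:
    print("fewer than 128 hugepages reserved")
else:
    print("hugepages reserved:", counts["HugePages_Total"])
```

On a real system, pass the contents of /proc/meminfo, for example `hugepage_counts(open("/proc/meminfo").read())`.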

@JerichoJones

@yoburtu

I will try with amdgpu-pro 18.10.

Did you try this?

@yoburtu
Author

yoburtu commented Jul 21, 2018

Yes. It doesn't work.

Regards.

@pcca-matrix

I had the same problem before, but only with the Sapphire Baffin 2 GB GPU; the Sapphire Lexa 2 GB GPU works! I fixed it by using kernel 4.16 and driver amdgpu-pro-16.60-379184 (Ubuntu 16.04 LTS).

The only problem with this driver is that about 40 H/s is lost per card (484 before, but crashing; 446 now, but stable).

@Spudz76
Contributor

Spudz76 commented Aug 1, 2018

amdgpu, 17.50-552542, 4.13.0-45-generic, x86_64: installed

That combo works good for me as far as I know. 18.x is poison until we have a chance to fix the OpenCL code.

@yoburtu
Author

yoburtu commented Aug 6, 2018

Driver 17.40 doesn't detect my RX550 GPU.

And I have the same problem with the 18.x drivers:

may 21 17:35:53 Gondor kernel: amdgpu 0000:28:00.0: GPU fault detected: 147 0x05a00402
may 21 17:35:53 Gondor kernel: amdgpu 0000:28:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x001E8CB4
may 21 17:35:53 Gondor kernel: amdgpu 0000:28:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A004002
may 21 17:35:53 Gondor kernel: amdgpu 0000:28:00.0: VM fault (0x02, vmid 5) at page 2002100, read from 'TC3' (0x544>
may 21 17:35:53 Gondor kernel: amdgpu 0000:28:00.0: GPU fault detected: 147 0x05e80802
may 21 17:35:53 Gondor kernel: amdgpu 0000:28:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x001938BD
may 21 17:35:53 Gondor kernel: amdgpu 0000:28:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A008002
may 21 17:35:53 Gondor kernel: amdgpu 0000:28:00.0: VM fault (0x02, vmid 5) at page 1652925, read from 'TC2' (0x544>
m

I am tired...

@beni-sandu

I have found a combination that works out of the box:

  • Ubuntu 16.04
  • Kernel: 4.15.18-041518-generic
  • AMD driver: amdgpu-pro-18.20-606296

I have 14 * RX 550 boards on the mobo (both POLARIS11 and POLARIS12 chips), all are detected and mining without any errors.

@pcca-matrix

@BeniSandu are you able to use a dual-thread config for your Polaris 11?

@beni-sandu

@pcca-matrix Not really, if I use 2 threads on Polaris 11, the hashrate is too low. But on the same note, I get around 420 H/s on Polaris 11 card with 1 thread, compared to around 460 with 2 threads on Windows, so it's not that bad. On Polaris 12, I can use 2 threads just fine.

@zhumingyu

I get this problem too. Are there any solutions?

@beni-sandu

@zhumingyu AMD released a new driver for Linux (18.30) which I didn't try yet, but for 18.20 you can use a working configuration from the previous messages. The one that I use is:

  • Ubuntu 16.04
  • Kernel: 4.15.18-041518-generic
  • AMD driver: amdgpu-pro-18.20-606296

With this one I get about 40 H/s less on POLARIS 11 chips, but other than that it works without errors.

@zhumingyu

@BeniSandu Thanks! I tried the new driver for Linux (18.30), but I still get this error.

@pcca-matrix

for now the best combination for rx550 polaris 10 and 11 with dual thread working is:

Ubuntu 16.04
Kernel: 4.16.0-041600-generic
AMD driver: amdgpu-pro-17.40-483984 (blockchain beta driver)
vm_fragmentsize=9 hugepage=128
490-515 h/s with polaris 11 475-490 polaris 10
bios mod one-click
gpu :1075Mhz 935 , memory : 1780 Mhz for polaris 11
gpu :1063Mhz 900 , memory : 1942 Mhz for polaris 10
gpu temp : 58-62 C

Polaris 10

{ "index" : 1,
"intensity" : 490, "worksize" : 14,
"affine_to_cpu" : false, "strided_index" : 1, "mem_chunk" : 2,
"comp_mode" : true
},
{ "index" : 1,
"intensity" : 490, "worksize" : 14,
"affine_to_cpu" : false, "strided_index" : 1, "mem_chunk" : 2,
"comp_mode" : true
},

Polaris 11

// gpu: Baffin memory:1395
// compute units: 12
{ "index" : 2,
"intensity" : 432, "worksize" : 8,
"affine_to_cpu" : false, "strided_index" : 2, "mem_chunk" : 2,
"comp_mode" : true
},
{ "index" : 2,
"intensity" : 432, "worksize" : 8,
"affine_to_cpu" : false, "strided_index" : 2, "mem_chunk" : 2,
"comp_mode" : true
},

@Spudz76
Contributor

Spudz76 commented Sep 3, 2018

Drivers newer than 18.10 do not work, yet. They changed the compiler inside the driver.
Best is that blockchain driver, or 17.50.

@zhumingyu

@Spudz76 Do we have to wait for a new updated version?

@Spudz76
Contributor

Spudz76 commented Sep 3, 2018

Yes, watch #1797 as that is where any action is happening.

Or run the 17.x and give up on the broken 18.x

Someone should write Vulkan backend soon... OpenCL is deprecated so we're chasing a dead rabbit, really.

@cdarken

cdarken commented Sep 4, 2018

@Spudz76 you're confusing OpenCL with OpenGL. Vulkan is for graphics, not computing.

@Spudz76
Contributor

Spudz76 commented Sep 4, 2018

@cdarken nope

Vulkan-Compute exists and will replace OpenCL, MacOS is dropping OpenCL in preference of Metal... etc

@MalakiLab

While playing with pp_od_clk_voltage on Linux AMDGPU, I discovered that I get the same error when the memory is undervolted. When I set "m 2 2075 900" (900 mV for 2075 MHz, memory DPM state 2) on one of my RX 580s, this is what I get. Bumping it up to 905 mV for that particular card makes it work flawlessly. It's probably down to the silicon lottery, as my other RX 580s, on the exact same system with the exact same configuration, work flawlessly. If someone has the same issue, pushing a little more voltage into the memory might resolve it. I guess version 18.x might be more aggressive on DPM states.
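
To make the string format in this comment concrete, here is a minimal Python sketch that builds a pp_od_clk_voltage entry of the form "m 2 2075 905" (domain, DPM level, clock in MHz, voltage in mV). The helper name od_command is hypothetical. The sketch only formats and validates the string; it does not write to sysfs. As far as I know, the kernel's amdgpu documentation describes writing "c" afterwards to commit and "r" to reset, but whether the file accepts this format depends on the GPU generation and kernel version, so treat that as an assumption.

```python
def od_command(domain, level, clock_mhz, voltage_mv):
    """Format one pp_od_clk_voltage entry.

    domain: "s" for the core clock (sclk) or "m" for the memory clock (mclk).
    """
    if domain not in ("s", "m"):
        raise ValueError("domain must be 's' (sclk) or 'm' (mclk)")
    if level < 0 or clock_mhz <= 0 or voltage_mv <= 0:
        raise ValueError("level must be >= 0; clock and voltage must be positive")
    return f"{domain} {level} {clock_mhz} {voltage_mv}"


# Values from the comment above: memory DPM state 2, 2075 MHz, raised to 905 mV.
print(od_command("m", 2, 2075, 905))  # prints: m 2 2075 905
```

Applying the entry would mean writing that string to the card's pp_od_clk_voltage file as root, which is deliberately left out of the sketch.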

@Spudz76
Contributor

Spudz76 commented Jul 12, 2019

Yes, I usually use these as signs of needing to back off on clock or raise volts or etc. The GPU crashed the job, effectively. Note the thermal efficiency of everything involved in cooling gets worse over time and eventually the stock clocks can be too much. So it can also be a sign your card needs a repaste (or maybe just a dusting, if it is clogged up severely). Some R9 cards and maybe some RX also, from some manufacturers, had less than ideal VRM cooling (or less than ideal VRMs for 100%/24/7 mining) and the on-card voltages would have ripple that caused random errors (even at stock clocks, even at underclocked, etc). Luckily if it's that, shortly the card will stop working completely as it only takes one VRM (of 6-12 units) to completely die and the whole thing stops working (or acts like you didn't connect the additional power whips, but most of them stop showing up on PCI at all). Some even 'crowbar' and will just blow any PCIe slot or mobo you put it in. But more common they melt-open and don't short out.

But it still can be driver issues, it's a very generic crash message. You can intentionally cause it with bad clocking/voltage setup however.
