Error executing xmr-stak VM_CONTEXT1_PROTECTION_FAULT #1587
Comments
what is the output of clinfo? I think you have driver issues.
|
$ clinfo Platform Name AMD Accelerated Parallel Processing NULL platform behavior ICD loader properties |
Can you deduce anything from the clinfo output? Best regards. |
I have installed Ubuntu and it works fine. The message has disappeared. But now I have another problem :-((( I can't overclock the GPUs. I don't understand! The value in pp_mclk_od doesn't change: root@galadriel: root@galadriel: Any ideas? |
I have the same errors on Arch, with the same kernel version and opencl-amd. The issue is with xmr-stak, because other miners like ethminer, tdxminer, cpuminer-multi-opencl (for purk coin) work just fine. |
I have tried with xmrig and the problem also occurs, so it seems not to be specific to xmr-stak. |
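For anyone checking the same thing, here is a minimal sketch of writing the overdrive value and reading it back, which makes a silently rejected write visible. It assumes the card's sysfs directory is `/sys/class/drm/card0/device` (the card index is an assumption, adjust it for your rig) and that the commands are run as root. The function names are made up for this example.

```shell
# Sketch only: read and set the amdgpu memory-clock overdrive percentage.
# ASSUMPTION: the default device directory below; pass another one for other cards.

# Print the current overdrive percentage.
get_mclk_od() {
    cat "${1:-/sys/class/drm/card0/device}/pp_mclk_od"
}

# Write a new percentage, then read it back to confirm the driver accepted it.
set_mclk_od() {
    dev_dir="$1"; percent="$2"
    case "$percent" in
        ''|*[!0-9]*) echo "percent must be a non-negative integer" >&2; return 2 ;;
    esac
    echo "$percent" > "$dev_dir/pp_mclk_od" || return 1
    now="$(get_mclk_od "$dev_dir")"
    if [ "$now" != "$percent" ]; then
        echo "driver kept $now instead of $percent" >&2
        return 1
    fi
}

# Example (as root): set_mclk_od /sys/class/drm/card0/device 5
```

If the read-back still shows the old value, the write was not applied, which matches the symptom described above.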
I think there are some issues with the driver in 4.16 kernel, it was working fine with 4.15 |
I will try with 4.15 kernel. |
@yoburtu did you get a chance to test with kernel 4.15 ? |
I got around to installing kernel 4.15.9. It works, but only with opencl-amd 18.10; when I install 18.20 it breaks. It's the same on kernel 4.16: it works fine with opencl-amd 18.10. I will run some tests using kernel 4.17 when I have time and post the results here. |
@cdarken I have tested with kernel 4.14 and it doesn't work. I will try with amdgpu-pro 18.10. |
I'm 99% sure it's something to do with amdgpu-pro v 18.20; even the latest one, 606296, from June 15 didn't work. |
I have the same issue on Ubuntu Server 18.04 and amdgpu-pro 18.20. |
@FeNicks Thanks for the suggestion, I'm curious whether that will change anything, since I have a similar issue. It looks like newer kernels and the 18.20 amdgpu-pro driver don't get along very well. I'm getting the same type of errors on both Ubuntu 18.04 and Ubuntu 16.04 with the amdgpu-pro 18.20 driver. I will try some other combinations when I have some time. |
I am observing this too. I built xmr-stak from source, with separate cpu, nvidia and amd miner binaries.

I'm using Fedora Rawhide, but compile my own kernels with the Fedora config, lightly modified: some features disabled (selinux, paravirt, Meltdown fix, retpolines) and with the amdgpu driver enabled (so that OpenCL works using it). Currently I'm on 1.14.44.

I installed parts of amdgpu-pro-18.20-606296.tar.xz. (Unfortunately, this filename seems to be the same for different OSes on the AMD site. I downloaded the one from the link "Radeon™ Software for Linux® version 18.20 for RHEL 7.4 / CentOS 7.4", as this one is closest to Fedora.)

By "installed parts", I mean the following: after looking carefully at all the RPMs in that file, which are many, I realized that since I don't need the kernel modules (I use the ones from the vanilla kernel), I only need a few RPMs. I sorted them out into directories: One RPM for compilation: RPMs with libs to support older hardware: RPMs with libs to support newer hardware: I just "dnf install *.rpm" these. amdgpu-core will fail to install (it wants to be on RHEL), but that's OK, it's a meta-package with no content. The rest install fine, mostly under /opt/amdgpu-pro

Building xmr-stak purely for AMD mining with: CC=gcc cmake .. \ works.

So, with this setup, I'm getting "GPU fault detected: 147" with the amd miner on this card: (I also have an old GCN card based on R7 240; with it this miner works)

Going back to amdgpu-pro-18.10-572953.tar.xz results in a binary which immediately segfaults, so I don't know whether the "GPU fault detected: 147" thing happens on it too.

lukminer on this hardware and 18.20 works fine. I did not try 18.10.

On which kernels did you guys have success building and running against 18.10? Were you using the in-kernel amdgpu modules, or the ones from the amdgpu-pro-18.10 tarball? |
Can confirm that 18.10 works fine with 4.14 and 4.16 kernels. I have not yet gotten around to checking whether 18.20 fails on the very same setup, which would confirm a regression. The "binary which immediately segfaults" problem was caused by AMD not being able to use the standard libdrm: install Also, OpenCL_LIBRARY should point to the library itself, not to its directory: |
@nick-perchev Hey Nick, thanks a lot for the input. Do you know where I can find the 18.10 AMD driver? I can't seem to find it on the AMD site. |
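The OpenCL_LIBRARY point is easy to get wrong, so here is a small sketch of a check to run before cmake. The helper name `check_opencl_library` and the example path under /opt/amdgpu-pro are assumptions for illustration; the exact location depends on the distro and driver package.

```shell
# Sketch only: make sure OpenCL_LIBRARY names the library file, not its directory.
# ASSUMPTION: the path in the example at the bottom is illustrative, not taken from the thread.

# Succeeds only if the argument is an existing file named libOpenCL.so*.
check_opencl_library() {
    lib="$1"
    if [ -d "$lib" ]; then
        echo "error: $lib is a directory; point at libOpenCL.so itself" >&2
        return 1
    fi
    if [ ! -f "$lib" ]; then
        echo "error: $lib does not exist" >&2
        return 1
    fi
    case "$(basename "$lib")" in
        libOpenCL.so*) return 0 ;;
        *) echo "error: $lib is not a libOpenCL.so file" >&2; return 1 ;;
    esac
}

# Example (not run here):
#   lib=/opt/amdgpu-pro/lib64/libOpenCL.so
#   check_opencl_library "$lib" && CC=gcc cmake .. -DCUDA_ENABLE=OFF -DOpenCL_LIBRARY="$lib"
```

Other cmake options are left out on purpose; use whatever configuration you normally build with.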
@BeniSandu use this |
It looks like the issue is within AMD kernel modules, I could make it work on a fresh Ubuntu 18.04 install with the latest AMD 18.20 driver, using the modules that come with the 4.15.0-23-generic kernel. A quick guide thanks to the replies from @cdarken and @nick-perchev:
You'll get a warning about the amd-dkms package missing, just ignore it.
Here you can use your own config options, the important part is the path to the libOpenCL.so library.
|
@yoburto
Did you try this? |
Yes. It doesn't work. Regards. |
I had the same problem before, but only with the Sapphire Baffin 2 GB GPU; the Sapphire Lexa 2 GB GPU works! The only problem with this driver is that about 40 H/s is lost per card (484 before, but crashing; 446 now, but stable). |
That combo works good for me as far as I know. 18.x is poison until we have a chance to fix the OpenCL code. |
Driver 17.40 doesn't detect my RX550 GPU. And I have the same problem with driver 18.X: may 21 17:35:53 Gondor kernel: amdgpu 0000:28:00.0: GPU fault detected: 147 0x05a00402 I am tired... |
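On a multi-GPU rig it helps to see which PCI device the fault comes from. A minimal sketch, assuming the kernel log lines have the same shape as the one quoted above; `parse_gpu_faults` is a made-up helper name, and the meaning of the trailing hex value is not interpreted here.

```shell
# Sketch only: pull the PCI address, fault number and trailing hex value
# out of "GPU fault detected" kernel log lines read from stdin.
# ASSUMPTION: lines look like the one quoted in this thread.
parse_gpu_faults() {
    sed -n 's/.*amdgpu \([0-9a-fA-F:.]*\): GPU fault detected: \([0-9]*\) \(0x[0-9a-fA-F]*\).*/\1 \2 \3/p'
}

# Example (not run here):
#   dmesg | parse_gpu_faults | sort | uniq -c
#   journalctl -k | parse_gpu_faults
```

Lines that do not match are dropped, so the output only contains the faults.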
I have found a combination that works out of the box:
I have 14 * RX 550 boards on the mobo (both POLARIS11 and POLARIS12 chips), all are detected and mining without any errors. |
@BeniSandu are you able to use a dual-thread config for your Polaris 11? |
@pcca-matrix Not really, if I use 2 threads on Polaris 11, the hashrate is too low. But on the same note, I get around 420 H/s on Polaris 11 card with 1 thread, compared to around 460 with 2 threads on Windows, so it's not that bad. On Polaris 12, I can use 2 threads just fine. |
I get this problem too. Are there any solutions? |
@zhumingyu AMD released a new driver for Linux (18.30) which I didn't try yet, but for 18.20 you can use a working configuration from the previous messages. The one that I use is:
With this one I get -40 H/s on POLARIS 11 chips, but other than that it works without errors. |
@BeniSandu Thanks! I tried the new driver for Linux (18.30), but I get this error. |
For now, the best combination for RX550 Polaris 10 and 11 with dual threads working is: Ubuntu 16.04 Polaris 10{ "index" : 1, Polaris 11// gpu: Baffin memory:1395 |
Drivers newer than 18.10 do not work, yet. They changed the compiler inside the driver. |
@Spudz76 Do we have to wait for a new updated version? |
Yes, watch #1797 as that is where any action is happening. Or run 17.x and give up on the broken 18.x. Someone should write a Vulkan backend soon... OpenCL is deprecated so we're chasing a dead rabbit, really. |
@Spudz76 you're confusing OpenCL with OpenGL. Vulkan is for graphics, not computing. |
While playing with pp_od_clk_voltage on Linux AMDGPU, I discovered that I get the same error when the memory is undervolted. When I set "m 2 2075 900" (900 mV for 2075 MHz, memory DPM state 2) on one of my RX 580s, this is what I get. Bumping it up to 905 mV for that particular card makes it work flawlessly. It's probably down to the silicon lottery, as the other RX 580s, on the exact same system with the exact same configuration, work flawlessly. If someone has the same issue, pushing a little more voltage into the memory might resolve it. I guess version 18.x might be more aggressive on DPM states. |
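A minimal sketch of applying such a memory-state entry, assuming the usual amdgpu flow of writing the "m" line to pp_od_clk_voltage and then committing it with "c". The helper names and the default card path are assumptions for this example; the values are the ones from the comment above. Run as root.

```shell
# Sketch only: stage and commit one memory DPM state via pp_od_clk_voltage.
# ASSUMPTION: the device directory (e.g. /sys/class/drm/card0/device) is passed in by the caller.

# Build the "m <state> <MHz> <mV>" line after checking that all three values are integers.
mem_state_cmd() {
    if [ "$#" -ne 3 ]; then
        echo "usage: mem_state_cmd STATE MHZ MV" >&2
        return 2
    fi
    for v in "$@"; do
        case "$v" in
            ''|*[!0-9]*) echo "arguments must be non-negative integers" >&2; return 2 ;;
        esac
    done
    echo "m $1 $2 $3"
}

# Write the entry, then commit it with "c".
apply_mem_state() {
    dev_dir="$1"; shift
    cmd="$(mem_state_cmd "$@")" || return 2
    echo "$cmd" > "$dev_dir/pp_od_clk_voltage" || return 1
    echo "c" > "$dev_dir/pp_od_clk_voltage"
}

# Example (as root): apply_mem_state /sys/class/drm/card0/device 2 2075 905
```

Raising the voltage in small steps, as described above, keeps the change easy to revert if the card still faults.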
Yes, I usually use these as signs of needing to back off on clock or raise volts or etc. The GPU crashed the job, effectively. Note the thermal efficiency of everything involved in cooling gets worse over time and eventually the stock clocks can be too much. So it can also be a sign your card needs a repaste (or maybe just a dusting, if it is clogged up severely). Some R9 cards and maybe some RX also, from some manufacturers, had less than ideal VRM cooling (or less than ideal VRMs for 100%/24/7 mining) and the on-card voltages would have ripple that caused random errors (even at stock clocks, even at underclocked, etc). Luckily if it's that, shortly the card will stop working completely as it only takes one VRM (of 6-12 units) to completely die and the whole thing stops working (or acts like you didn't connect the additional power whips, but most of them stop showing up on PCI at all). Some even 'crowbar' and will just blow any PCIe slot or mobo you put it in. But more common they melt-open and don't short out. But it still can be driver issues, it's a very generic crash message. You can intentionally cause it with bad clocking/voltage setup however. |
Please provide as much information as possible to reproduce the issue.
Basic information
Compile issues
Arch Linux. Linux Gondor 4.16.9-1-ARCH #1 SMP PREEMPT Thu May 17 02:10:09 UTC 2018 x86_64 GNU/Linux
add all commands you used and the full compile output here
yaourt -S aur/xmr-stak-nvidia-git
Issue with the execution
By yaourt/aur
run
./xmr-stak --version-long
and add the output here
Version: xmr-stak/2.4.3/26a5d65/makepkg/lin/amd-cpu/aeon-cryptonight-monero/0
AMD OpenCl issue