Stuck Device Error #25

Closed
BallowandSunder opened this issue Jan 14, 2020 · 7 comments

@BallowandSunder

BallowandSunder commented Jan 14, 2020

Setup: Ubuntu 18.04.1 (kernel 4.15), lolminer 0.9.4, Intel 3855U CPU, 8x PCIe, 8GB RAM, 5x RX570 16GB and 1x Vega FE 16GB, 1600W EVGA Titanium PSU.

Problem: After about 2-4 hours, the miner reports an error saying one of the cards is "Stuck", and that card's hashrate drops to 0.

What I tried so far:

  1. I tried without the Vega in the system and still hit the same problem with the 5x RX570 16GB cards alone.

  2. I've also switched between AMD drivers 18.20, 18.50, and 19.50 to no avail. I used opencl=legacy,pal (legacy for the RX cards and pal for the Vega).

  3. I double-checked using the ePIC miner for the 5x RX570 16GB cards, and was able to continuously mine no problem.

  4. I tried running just one card in the system: lolminer works perfectly with one card, continuously and long term, with no errors.

I prefer to use lolminer due to the performance enhancements and the fact I can use both the Vega and RX570s at the same time unlike the ePIC miner (where I can only use the RX570 16GB cards and only Legacy drivers in the system), but performance is moot if I cannot continuously run a rig, especially for solo mining.

Any suggestions?

[Screenshot: Error]

This is what happens right after I use lolminer. This does not happen at all with the ePIC miner. So unfortunately I am going to stop using lolminer until there is a fix or solution to this issue.

[Screenshot: More Errors due to Miner]

Nevertheless, I appreciate all the hard work you are doing, keep up the good work!

@Lolliedieb
Owner

Can you tell me whether it is always the same card failing?
There are several possible reasons for this. The three most common are:

  • GPU clock too high
  • Voltage too low
  • Algorithm error (writing to a non-allocated memory address)

I cannot say which of the three is causing the issue here, but maybe you can share your settings for the 570 cards. Also note that the C32 8G solver is rather experimental at the moment. It will get an update for 16G cards with 0.9.6 (and a slight speed improvement with 0.9.5, but that's only a small gain plus slightly lower power use).

But I will recheck memory limits / bounds again :)
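
For anyone wanting to share clock settings as requested above, here is a minimal sketch of how the current clock levels might be read on Linux. It assumes the amdgpu driver exposes `pp_dpm_sclk` and `pp_dpm_mclk` under `/sys/class/drm/cardN/device/`, which depends on the driver and kernel in use. The helper names `parse_dpm_levels` and `read_card_clocks` are illustrative and not part of lolMiner.

```python
import glob
import os
import re

def parse_dpm_levels(text):
    """Parse amdgpu pp_dpm_* text such as '0: 300Mhz' / '1: 1244Mhz *'.

    Returns (levels, active), where levels maps level index to MHz and
    active is the index of the level marked with '*', or None.
    """
    levels = {}
    active = None
    for line in text.splitlines():
        m = re.match(r"\s*(\d+):\s*(\d+)\s*mhz\s*(\*)?", line, re.IGNORECASE)
        if not m:
            continue
        idx = int(m.group(1))
        levels[idx] = int(m.group(2))
        if m.group(3):
            active = idx
    return levels, active

def read_card_clocks(drm_root="/sys/class/drm"):
    """Collect core and memory clock levels for every cardN found (assumed paths)."""
    report = {}
    for path in sorted(glob.glob(os.path.join(drm_root, "card*"))):
        # Skip connector entries such as card0-HDMI-A-1.
        if not re.fullmatch(r"card\d+", os.path.basename(path)):
            continue
        entry = {}
        for name in ("pp_dpm_sclk", "pp_dpm_mclk"):
            try:
                with open(os.path.join(path, "device", name)) as fh:
                    entry[name] = parse_dpm_levels(fh.read())
            except OSError:
                entry[name] = None  # file missing or not readable on this driver
        report[path] = entry
    return report

if __name__ == "__main__":
    for card, clocks in read_card_clocks().items():
        print(card, clocks)
```

Comparing the active level of a card that gets stuck against one that does not may help narrow down whether clocks are involved, though it does not show voltage.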

@BallowandSunder
Author

BallowandSunder commented Jan 14, 2020

Thank you for the speedy response!

No, it isn't always the same card failing, but I have noticed that it's always an RX570 16GB card that gets stuck, never the Vega Frontier Edition.

The RX570 16GB cards are at stock settings, no modifications.

I tried both 120V and 240V outlets with the miner and still arrived at the same issue. The PSU is a brand new EVGA SuperNOVA 1600 T2 (80+ Titanium, 1600W). The setup only draws 900W-950W when running lolminer.
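
As a rough check on the power supply argument, the figures above can be turned into a load fraction and remaining headroom. This is only a sketch using the numbers quoted in this comment. It treats the 900-950 W draw as if it were the load on the PSU's output, which is an assumption: if the draw was measured at the wall, the real output load would be somewhat lower.

```python
def psu_load(draw_watts, psu_rated_watts):
    """Return (load as a fraction of rated capacity, remaining headroom in watts)."""
    return draw_watts / psu_rated_watts, psu_rated_watts - draw_watts

PSU_RATED_W = 1600  # rated capacity quoted above

for draw in (900, 950):  # observed draw range quoted above
    load, headroom = psu_load(draw, PSU_RATED_W)
    print(f"{draw} W draw: {load:.0%} of rated capacity, {headroom} W headroom")
```

Under these assumptions the rig stays well below the rated capacity, so total PSU capacity looks like an unlikely cause, though this says nothing about per-card voltage.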

I ran some power draw tests comparing lolminer to Sapphire's ePIC miner and noticed that with lolminer the power draw for C32 was 145W-155W per RX570 card, whereas ePIC was drawing 121W-125W per card.

The extra power draw fits with lolminer's higher C32 hashrate: 0.15-0.20 GPS, whereas ePIC is at 0.101-0.103 GPS. This is all with stock settings for the RX570 16GB cards.
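
To put the two miners on the same footing, the per-card figures above can be expressed as hashrate per watt. A minimal sketch using only the ranges quoted in this comment. The pairing of lowest hashrate with highest power draw for the "worst" case is an assumption made for illustration, not something that was measured.

```python
def efficiency_range(gps_range, watts_range):
    """Return (worst, best) GPS per watt for the given (low, high) ranges."""
    gps_lo, gps_hi = gps_range
    w_lo, w_hi = watts_range
    # Worst case: lowest hashrate at highest draw. Best case: the reverse.
    return gps_lo / w_hi, gps_hi / w_lo

# Per-card C32 figures quoted above: (GPS range, watts range).
measurements = {
    "lolminer": ((0.15, 0.20), (145, 155)),
    "ePIC": ((0.101, 0.103), (121, 125)),
}

for miner, (gps, watts) in measurements.items():
    worst, best = efficiency_range(gps, watts)
    print(f"{miner}: {worst * 1000:.3f} to {best * 1000:.3f} mGPS per watt")
```

With these numbers, lolminer's worst case still comes out ahead of ePIC's best case per watt, so the higher draw is more than matched by the higher hashrate.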

I've tested C31 and hit the same "Stuck" error. I will try a completely different algorithm than Grin's, like Beam or Zel, to see if it's an issue with lolminer and these cards for Grin.

If you have any other suggestions or troubleshooting techniques, or need more information or screenshots, please let me know. I would definitely like to use lolminer more full-time.

That is refreshing to hear that you've been working on the next updates, excited to test out your latest updated lolminer when it comes out! Thanks again for your time.

@BallowandSunder
Author

BallowandSunder commented Jan 18, 2020

@Lolliedieb Tried the 0.9.5.1 update, and so far I am still getting the stuck device error. Also, on Ubuntu Linux, 0.9.5 and 0.9.5.1 lag when displaying information as the miner runs. Looking forward to the next update. Hopefully it takes advantage of the 16GB and brings a few other adjustments that lead to more stability, so I can begin using this miner 24/7. Cheers!

@BallowandSunder
Author

BallowandSunder commented Jan 21, 2020

@Lolliedieb Tried the 0.9.6 update, and I am unfortunately still getting the stuck device error. Taking advantage of the 16GB has produced spectacular hashrates, but the stability issue is the same: a card still gets stuck after 2-4 hours and its hashrate goes to 0.
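
Until the root cause is found, one way to track the failures is to watch per-card hashrates and record which card drops to 0. The sketch below is hypothetical: the `StuckDetector` class and the sample readings are made up for illustration, and it does not read lolMiner's actual output, whose format is not shown in this thread. It also keeps a tally per card, which helps answer whether it is always the same card failing.

```python
from collections import Counter

class StuckDetector:
    """Flag a device once its hashrate has been 0 for `threshold` samples in a row."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.zero_streak = Counter()  # consecutive zero readings per device
        self.incidents = Counter()    # how many times each device got stuck

    def update(self, rates):
        """rates: dict of device index -> hashrate. Returns newly stuck devices."""
        newly_stuck = []
        for device, rate in rates.items():
            if rate > 0:
                self.zero_streak[device] = 0
                continue
            self.zero_streak[device] += 1
            # Report only at the moment the streak reaches the threshold.
            if self.zero_streak[device] == self.threshold:
                self.incidents[device] += 1
                newly_stuck.append(device)
        return newly_stuck

if __name__ == "__main__":
    # Made-up readings for two cards; card 1 stops hashing.
    samples = [
        {0: 0.18, 1: 0.17},
        {0: 0.18, 1: 0.0},
        {0: 0.19, 1: 0.0},
        {0: 0.18, 1: 0.0},
    ]
    detector = StuckDetector(threshold=3)
    for sample in samples:
        for device in detector.update(sample):
            print(f"device {device} looks stuck")
    print(dict(detector.incidents))
```

A wrapper script could restart the miner when a device is flagged, but that only works around the problem and would not fix an underlying clock, voltage, or memory issue.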

Perhaps an intensity adjustment feature for both the CPU and GPUs would help. The problem might be due to a low-end CPU bottleneck (I read somewhere that Grin's algorithms tend to like mid to high-end CPU power to solve). If I could lower the intensity to reduce CPU usage while shifting more of the workload to the GPUs, maybe the GPUs wouldn't get stuck.

In the meantime, keep up the good work and cheers!

@Lolliedieb
Owner

Just wanted you to know that I had a similar issue on my 16G 570.

Reason: The settings coded into its firmware are more like ETH settings, and the voltage is coded too low for the memory clock. I got a firmware update for my 570 16G that makes it work.

@alanng22

> Just wanted you to know that I had a similar issue on my 16G 570.
>
> Reason: The settings coded into its firmware are more like eth settings and voltage is coded too low for the memclock. There is a firmware update that I got for my 570 16G that makes it work.

Could you point me to which firmware it is?

@BallowandSunder
Author

@alanng22 I want to know that too.

@Lolliedieb Got a link to that firmware update?
