Error -4: Enqueueing kernel onto command queue. (clEnqueueNDRangeKernel) #246

Closed
platinum4 opened this issue Jun 9, 2014 · 43 comments

@platinum4
Contributor

When building a bin file for scrypt kernel [zuikkis and/or alexkarnew] on a Hawaii chipset R9 290/X architecture, sgminer5 throws this error.

[00:30:50] Probing for an alive pool
[00:30:51] Switching to NiceHash_Scrypt_backup - first alive pool
[00:30:52] Initialising kernel alexkarnew.cl with bitalign, unpatched BFI, nfactor 10, n 1024
[00:30:52] Initialising kernel alexkarnew.cl with bitalign, unpatched BFI, nfactor 10, n 1024
[00:30:52] Initialising kernel alexkarnew.cl with bitalign, unpatched BFI, nfactor 10, n 1024
[00:30:52] Initialising kernel alexkarnew.cl with bitalign, unpatched BFI, nfactor 10, n 1024
[00:30:52] Initialising kernel alexkarnew.cl with bitalign, unpatched BFI, nfactor 10, n 1024
[00:30:52] Initialising kernel alexkarnew.cl with bitalign, unpatched BFI, nfactor 10, n 1024
[00:30:54] Error -4: Enqueueing kernel onto command queue. (clEnqueueNDRangeKernel)
[00:30:54] Error -4: Enqueueing kernel onto command queue. (clEnqueueNDRangeKernel)
[00:30:54] Error -4: Enqueueing kernel onto command queue. (clEnqueueNDRangeKernel)
[00:30:54] GPU 0 failure, disabling!
[00:30:54] GPU 2 failure, disabling!
[00:30:54] GPU 1 failure, disabling!

This issue does not occur when running scrypt bins under kalroth's cgminer.

The settings for the scrypt bin are as follows, which are the preferred/max settings for a Hawaii R9 290.

worksize 256, TC 32765: the bins initialize just fine on cgminer and provide 995Kh/s.
Under this new sgminer there is nothing but Error -4, with no change in memory or architecture, as I can close sgminer and go directly over to cgminer.

This is the current working alex scrypt bin file that I have: scrypt130511_alexeyHawaiiglg2tc32765w256l4

I can get sgminer to build alexkarnewHawaiiglg2tc32765w128l4, but it never respects the pool-worksize setting.

And regardless, if you force it to 256, it still throws Error -4.
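
The bin file names quoted above appear to encode the build settings, which makes it easier to see that sgminer built a w128 binary where w256 was expected. The following is a minimal sketch that splits such a name into fields. The pattern is inferred only from the names in this thread (lg for lookup-gap, tc for thread-concurrency, nf for nfactor, w for worksize) and is an assumption, not the naming code from sgminer; the trailing `l4` field is left uninterpreted.

```python
import re

# Hypothetical pattern inferred from the file names quoted in this thread;
# it is not taken from the sgminer source.
BIN_NAME = re.compile(
    r"^(?P<kernel>[a-z0-9_-]+)"       # kernel name, e.g. ckolivas
    r"(?P<device>[A-Z][A-Za-z]*?)"    # device name, e.g. Hawaii
    r"g?lg(?P<lookup_gap>\d+)"        # optional "g", then lookup-gap
    r"tc(?P<thread_concurrency>\d+)"  # thread-concurrency
    r"(?:nf(?P<nfactor>\d+))?"        # nfactor, absent in some names
    r"w(?P<worksize>\d+)"             # worksize
    r"l(?P<l>\d+)"                    # trailing field, left uninterpreted
    r"(?:\.bin)?$"
)

NUMERIC_FIELDS = ("lookup_gap", "thread_concurrency", "nfactor", "worksize", "l")


def parse_bin_name(name):
    """Split a kernel binary name into its fields, or return None if it does not match."""
    match = BIN_NAME.match(name)
    if match is None:
        return None
    fields = match.groupdict()
    for key in NUMERIC_FIELDS:
        if fields[key] is not None:
            fields[key] = int(fields[key])
    return fields
```

Comparing `parse_bin_name(...)["worksize"]` for the cgminer and sgminer binaries shows the 256 versus 128 mismatch directly.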

@mrbrdo
Contributor

mrbrdo commented Jun 9, 2014

There is no pool-worksize setting.

Can we try a few things:

  • make cgminer_kalroth build the bin file with appropriate settings, then copy it into sgminer and make the filename match (it might be different than in cgminer_kalroth). is there any difference?
  • please use ckolivas kernel when comparing to kalroth, because this is what kalroth uses
  • try lower thread-concurrency (but don't immediately go very low, and make it a multiple of shaders - e.g. for R9 290 it should be X * 2560)
  • show me output of when it's building (and then loading) the kernel (I only see loading from what you've shown)
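
The thread-concurrency suggestion in the list above (a multiple of the shader count, lowered gradually) can be sketched as follows. The function names are hypothetical and the default of 2560 shaders is the R9 290 figure given above; this is only an illustration of the arithmetic.

```python
def round_to_shader_multiple(thread_concurrency, shaders=2560):
    """Round thread-concurrency down to the nearest multiple of the shader count."""
    if thread_concurrency < shaders:
        raise ValueError("thread-concurrency is below one multiple of the shader count")
    return (thread_concurrency // shaders) * shaders


def candidate_values(start, shaders=2560):
    """List multiples of the shader count from `start` downwards, largest first."""
    top = round_to_shader_multiple(start, shaders)
    return list(range(top, 0, -shaders))
```

For the TC 32765 setting from the report, the nearest lower multiple is 30720 (12 * 2560), followed by 28160 and 25600 as the next steps down.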

Also, it seems people had this error before: https://www.weminecryptos.com/forum/topic/2299-getting-error-4-enqueueing-kernel-onto-command-queue/ - it seems to be related to low system RAM. I think you might just be getting this because alexkarnew might need different settings, so you need to try ckolivas first.

@platinum4
Contributor Author

Ok this has been mitigated by replacing alexkarnew with ckolivas as algorithm; however, this now predictably crashes every 60s from this error.

[22:28:03] Started sgminer 4.2.1
[22:28:03] Loaded configuration file C:\sgminer_v5_0_06062014\sgminer-nicehash.conf
[22:28:03] Initialising kernel marucoin-mod.cl with bitalign, unpatched BFI, nfactor 10, n 1024
[22:28:03] Initialising kernel marucoin-mod.cl with bitalign, unpatched BFI, nfactor 10, n 1024
[22:28:03] Initialising kernel marucoin-mod.cl with bitalign, unpatched BFI, nfactor 10, n 1024
[22:28:03] Initialising kernel marucoin-mod.cl with bitalign, unpatched BFI, nfactor 10, n 1024
[22:28:03] Initialising kernel marucoin-mod.cl with bitalign, unpatched BFI, nfactor 10, n 1024
[22:28:03] Initialising kernel marucoin-mod.cl with bitalign, unpatched BFI, nfactor 10, n 1024
[22:28:03] Probing for an alive pool
[22:28:04] Switching to Marucoin suprnova - first alive pool
[22:28:04] Network diff set to 19
[22:28:04] NiceHash_X13_multi alive, testing stability
[22:28:04] Switching to NiceHash_X13_multi
[22:28:04] Network diff set to 1.66K
[22:28:04] New block detected on network before pool notification
[22:28:06] Network diff set to 22
[22:28:06] Stratum from Marucoin suprnova detected new block
[22:28:06] Switching mrr x13 platinum4.2 to stratum+tcp://us-east01.miningrigrentals.com:50100
[22:28:07] Network diff set to 1.3K
[22:28:07] Stratum from NiceHash_X13_multi detected new block
[22:28:14] Switching to NiceHash_Scrypt
[22:28:15] NiceHash_Scrypt difficulty changed to 512
[22:28:19] Network diff set to 438K
[22:28:19] New block detected on network before pool notification
[22:28:19] Initialising kernel ckolivas.cl with bitalign, unpatched BFI, nfactor 10, n 1024
[22:28:19] Initialising kernel ckolivas.cl with bitalign, unpatched BFI, nfactor 10, n 1024
[22:28:19] Initialising kernel ckolivas.cl with bitalign, unpatched BFI, nfactor 10, n 1024
[22:28:19] Network diff set to 481K
[22:28:19] Stratum from NiceHash_Scrypt detected new block
[22:28:22] Network diff set to 510K
[22:28:22] Stratum from NiceHash_Scrypt detected new block
[22:28:26] Network diff set to 520
[22:28:26] Stratum from tmb x13 multiport detected new block
[22:28:36] Network diff set to 1.39K
[22:28:36] Stratum from NiceHash_X13_multi detected new block
[22:28:37] NiceHash_Scrypt stale share detected, submitting as user requested
[22:28:37] Accepted Coin 510491 Diff 1.1K/512 GPU 1 at NiceHash_Scrypt
[22:29:00] Accepted Coin 510491 Diff 3.38K/512 GPU 0 at NiceHash_Scrypt
[22:29:00] Accepted Coin 510491 Diff 3.23K/512 GPU 2 at NiceHash_Scrypt
[22:29:04] Network diff set to 562K
[22:29:04] Stratum from NiceHash_Scrypt detected new block
[22:29:11] Accepted Coin 562426 Diff 661/512 GPU 0 at NiceHash_Scrypt
[22:29:19] thread was not cancelled in 60 seconds after restart_mining_threads
[22:29:19]
Summary of runtime statistics:

[22:29:19] Started at [2014-06-08 22:28:04]
[22:29:19] Runtime: 0 hrs : 1 mins : 0 secs
[22:29:19] Average hashrate: 2.7 Megahash/s
[22:29:19] Solved blocks: 0
[22:29:19] Best share difficulty: 3.38K
[22:29:19] Share submissions: 4
[22:29:19] Accepted shares: 4
[22:29:19] Rejected shares: 0
[22:29:19] Accepted difficulty shares: 2048
[22:29:19] Rejected difficulty shares: 0
[22:29:19] Reject ratio: 0.0%
[22:29:19] Hardware errors: 0
[22:29:19] Utility (accepted shares / min): 4.00/min
[22:29:19] Work Utility (diff1 shares solved / min): 2494.52/min

[22:29:19] Stale submissions discarded due to new blocks: 0
[22:29:19] Unable to get work from server occasions: 0
[22:29:19] Work items generated locally: 136
[22:29:19] Submitting work remotely delay occasions: 0
[22:29:19] New blocks detected on network: 10

[22:29:19] Summary of per device statistics:

[22:29:19] GPU0 | (5s):937.7K (avg):944.7Kh/s | A:1024 R:0 HW:0 WU:1006.016/m
[22:29:19] GPU1 | (5s):936.4K (avg):944.7Kh/s | A:512 R:0 HW:0 WU:704.711/m
[22:29:19] GPU2 | (5s):935.2K (avg):944.7Kh/s | A:512 R:0 HW:0 WU:783.791/m
[22:29:19]
[22:29:19] Stratum connection to NiceHash_Scrypt interrupted

Then it crashes out.

This is with pool-gpu-threads set to 1. I continue to get -4 errors if pool-gpu-threads is not set to 1, since my main gpu-threads is 2; gpu-threads 2 always leads to a -4 error.

@platinum4
Contributor Author

I am trying with "no-restart" : true right now.

@platinum4
Contributor Author

[22:33:41] Started sgminer 4.2.1
[22:33:41] Loaded configuration file C:\sgminer_v5_0_06062014\sgminer-nicehash.conf
[22:33:42] Initialising kernel marucoin-mod.cl with bitalign, unpatched BFI, nfactor 10, n 1024
[22:33:42] Initialising kernel marucoin-mod.cl with bitalign, unpatched BFI, nfactor 10, n 1024
[22:33:42] Initialising kernel marucoin-mod.cl with bitalign, unpatched BFI, nfactor 10, n 1024
[22:33:42] Initialising kernel marucoin-mod.cl with bitalign, unpatched BFI, nfactor 10, n 1024
[22:33:42] Initialising kernel marucoin-mod.cl with bitalign, unpatched BFI, nfactor 10, n 1024
[22:33:42] Initialising kernel marucoin-mod.cl with bitalign, unpatched BFI, nfactor 10, n 1024
[22:33:42] Probing for an alive pool
[22:33:43] Switching to Marucoin suprnova - first alive pool
[22:33:43] Network diff set to 20
[22:33:44] NiceHash_X13_multi alive, testing stability
[22:33:44] Switching to NiceHash_X13_multi
[22:33:44] Network diff set to 1.48K
[22:33:44] New block detected on network before pool notification
[22:33:44] Marucoin suprnova stale share detected, submitting as user requested
[22:33:44] Accepted Coin 21 Diff 0.039/0.004 GPU 0 at Marucoin suprnova
[22:33:45] Switching mrr x13 platinum4.2 to stratum+tcp://us-east01.miningrigrentals.com:50100
[22:33:53] Stratum from NiceHash_X13_multi requested work restart
[22:33:54] Accepted Coin 1484 Diff 0.019/0.008 GPU 0 at NiceHash_X13_multi
[22:33:57] Accepted Coin 1484 Diff 0.098/0.008 GPU 0 at NiceHash_X13_multi
[22:33:57] Accepted Coin 1484 Diff 0.010/0.008 GPU 0 at NiceHash_X13_multi
[22:33:58] Accepted Coin 1484 Diff 0.046/0.008 GPU 1 at NiceHash_X13_multi
[22:34:07] Switching to NiceHash_Scrypt
[22:34:08] NiceHash_Scrypt difficulty changed to 512
[22:34:11] Network diff set to 162K
[22:34:11] Stratum from NiceHash_Scrypt detected new block
[22:34:12] Network diff set to 162K
[22:34:12] Stratum from NiceHash_Scrypt detected new block
[22:34:12] Initialising kernel ckolivas.cl with bitalign, unpatched BFI, nfactor 10, n 1024
[22:34:12] Initialising kernel ckolivas.cl with bitalign, unpatched BFI, nfactor 10, n 1024
[22:34:12] Initialising kernel ckolivas.cl with bitalign, unpatched BFI, nfactor 10, n 1024
[22:34:16] Network diff set to 162K
[22:34:16] Stratum from NiceHash_Scrypt detected new block
[22:34:19] Accepted Coin 162013 Diff 848/512 GPU 1 at NiceHash_Scrypt
[22:34:21] Network diff set to 2.54M
[22:34:21] Stratum from NiceHash_Scrypt detected new block
[22:34:40] Accepted Coin 2539405 Diff 1.05K/512 GPU 0 at NiceHash_Scrypt
[22:34:40] Accepted Coin 2539405 Diff 2.85K/512 GPU 2 at NiceHash_Scrypt
[22:34:45] NiceHash_Scrypt extranonce change requested
[22:34:45] Network diff set to 309K
[22:34:45] Stratum from NiceHash_Scrypt detected new block
[22:34:47] Accepted Coin 308808 Diff 2.93K/512 GPU 0 at NiceHash_Scrypt
[22:34:56] Accepted Coin 308808 Diff 2.4K/512 GPU 0 at NiceHash_Scrypt
[22:34:58] Accepted Coin 308808 Diff 752/512 GPU 2 at NiceHash_Scrypt
[22:34:59] Stratum from NiceHash_Scrypt requested work restart
[22:35:06] Accepted Coin 308808 Diff 1.01K/512 GPU 2 at NiceHash_Scrypt
[22:35:12] thread was not cancelled in 60 seconds after restart_mining_threads
[22:35:12]

@mrbrdo
Contributor

mrbrdo commented Jun 9, 2014

Just for start, can you try around line 6260 in sgminer.c, just above quit(1, "thread was not cancelled in 60 seconds after restart_mining_threads");, add a new line with pthread_testcancel(); and then run make again to recompile. I don't think it will help but just in case let's try it.

Edit: Oh right you are on Windows. Are you able to recompile?

@platinum4
Contributor Author

To be honest, I wait for the Windows builds provided by Elun on bitcointalk. I have already asked him to make a new set of binaries based on your recent commits. Can you add that line to sgminer.c, since you don't think it will be detrimental, or do you want me to add it to my repo and have Elun build it? No matter what I try, that stupid win-build guide does not work for me, and I have followed it from the beginning at least 5 times now, enough to get frustrated.


@mrbrdo
Contributor

mrbrdo commented Jun 9, 2014

Yes I also had problems with Windows build. Here is some discussion about it, it seems they figured it out: #229

I wouldn't like to add it to the branch, because I don't want to add code that has no effect.
As a side note, if you remove all "pool-gpu-threads" from your config, this should not happen.

@mrbrdo
Contributor

mrbrdo commented Jun 9, 2014

I think I figured out how to reproduce this. Seems it is indeed a bug. Only happens when "pool-gpu-threads" is configured.

@Bllacky
Contributor

Bllacky commented Jun 9, 2014

@mrbrdo Have you managed to build SGminer with the latest commits? I tried last night and I had some major issues. But I don't know if it's my fault or it's something from the source.

@platinum4
Contributor Author

"I think I figured out how to reproduce this. Seems it is indeed a bug. Only happens when "pool-gpu-threads" is configured."

Ditto - any thoughts on how to side-step this bug?

Edit: hardcoding gpu-threads 1 in the bottom portion of .conf file & removing all instances of pool-gpu-threads appears to have allowed me to build ckolivas24000nf11 and start hashing on scrypt-N for more than a minute; however, if you switch pools down to scrypt with an nf10 it immediately fails.

[00:42:46] Started sgminer 4.2.1
[00:42:46] Loaded configuration file C:\sgminer_v5_0_06062014\sgminer-nicehash.conf
[00:42:46] Initialising kernel marucoin-mod.cl with bitalign, unpatched BFI, nfactor 10, n 1024
[00:42:46] Initialising kernel marucoin-mod.cl with bitalign, unpatched BFI, nfactor 10, n 1024
[00:42:46] Initialising kernel marucoin-mod.cl with bitalign, unpatched BFI, nfactor 10, n 1024
[00:42:46] Probing for an alive pool
[00:42:47] Switching to Marucoin suprnova - first alive pool
[00:42:47] Network diff set to 16
[00:42:47] NiceHash_X13_multi alive, testing stability
[00:42:47] Switching to NiceHash_X13_multi
[00:42:47] Network diff set to 1.07K
[00:42:47] New block detected on network before pool notification
[00:42:49] Switching mrr x13 platinum4.2 to stratum+tcp://us-east01.miningrigrentals.com:50100
[00:42:49] Accepted Coin 1071 Diff 0.008/0.005 GPU 2 at NiceHash_X13_multi
[00:42:49] Stratum from NiceHash_X13_multi requested work restart
[00:42:52] Accepted Coin 1071 Diff 0.075/0.005 GPU 0 at NiceHash_X13_multi
[00:42:53] Network diff set to 1.09K
[00:42:53] Stratum from NiceHash_X13_multi detected new block
[00:42:53] Accepted Coin 1089 Diff 0.015/0.005 GPU 1 at NiceHash_X13_multi
[00:42:55] Accepted Coin 1089 Diff 0.022/0.005 GPU 2 at NiceHash_X13_multi
[00:42:55] Accepted Coin 1089 Diff 0.005/0.005 GPU 0 at NiceHash_X13_multi
[00:42:56] Accepted Coin 1089 Diff 0.012/0.005 GPU 1 at NiceHash_X13_multi
[00:42:57] Accepted Coin 1089 Diff 0.009/0.005 GPU 2 at NiceHash_X13_multi
[00:42:58] Accepted Coin 1089 Diff 0.006/0.005 GPU 1 at NiceHash_X13_multi
[00:43:00] Accepted Coin 1089 Diff 0.006/0.005 GPU 2 at NiceHash_X13_multi
[00:43:01] Accepted Coin 1089 Diff 0.009/0.005 GPU 2 at NiceHash_X13_multi
[00:43:05] Accepted Coin 1089 Diff 0.284/0.005 GPU 1 at NiceHash_X13_multi
[00:43:08] Switching to NiceHash_Scrypt-N
[00:43:08] NiceHash_Scrypt-N difficulty changed to 512
[00:43:12] Network diff set to 654M
[00:43:12] Stratum from NiceHash_Scrypt detected new block
[00:43:13] Network diff set to 15.6M
[00:43:13] New block detected on network before pool notification
[00:43:13] Building binary ckolivasHawaiiglg2tc24000nf11w128l4.bin
[00:43:20] Initialising kernel ckolivas.cl with bitalign, unpatched BFI, nfactor 11, n 2048
[00:43:20] Initialising kernel ckolivas.cl with bitalign, unpatched BFI, nfactor 11, n 2048
[00:43:20] Initialising kernel ckolivas.cl with bitalign, unpatched BFI, nfactor 11, n 2048
[00:43:34] Network diff set to 1.12K
[00:43:34] Stratum from tmb x13 multiport detected new block
[00:43:50] Accepted Coin 15580881 Diff 1.08K/512 GPU 0 at NiceHash_Scrypt-N
[00:43:50] Stratum from NiceHash_Scrypt-N requested work restart
[00:43:58] Network diff set to 15.6M
[00:43:58] Stratum from NiceHash_Scrypt-N detected new block
[00:44:06] Stratum from NiceHash_Scrypt-N requested work restart
[00:44:12] Accepted Coin 15574040 Diff 705/512 GPU 2 at NiceHash_Scrypt-N
[00:44:42] Switching to NiceHash_Scrypt
[00:44:43] NiceHash_Scrypt difficulty changed to 512
[00:44:47] Network diff set to 654M
[00:44:47] New block detected on network before pool notification
[00:44:47] Initialising kernel ckolivas.cl with bitalign, unpatched BFI, nfactor 10, n 1024
[00:44:47] Initialising kernel ckolivas.cl with bitalign, unpatched BFI, nfactor 10, n 1024
[00:44:48] Initialising kernel ckolivas.cl with bitalign, unpatched BFI, nfactor 10, n 1024
[00:44:48] Error -4: Enqueueing kernel onto command queue. (clEnqueueNDRangeKernel)
[00:44:48] Error -4: Enqueueing kernel onto command queue. (clEnqueueNDRangeKernel)
[00:44:48] Error -4: Enqueueing kernel onto command queue. (clEnqueueNDRangeKernel)
[00:44:48] GPU 1 failure, disabling!
[00:44:48] GPU 2 failure, disabling!
[00:44:48] GPU 0 failure, disabling!

@platinum4
Contributor Author

Here is an attempt to trick the miner into using 2 different kernel.cl files as the algorithm

[00:47:33] Started sgminer 4.2.1
[00:47:33] Loaded configuration file C:\sgminer_v5_0_06062014\sgminer-nicehash.conf
[00:47:33] Initialising kernel marucoin-mod.cl with bitalign, unpatched BFI, nfactor 10, n 1024
[00:47:33] Initialising kernel marucoin-mod.cl with bitalign, unpatched BFI, nfactor 10, n 1024
[00:47:33] Initialising kernel marucoin-mod.cl with bitalign, unpatched BFI, nfactor 10, n 1024
[00:47:33] Probing for an alive pool
[00:47:34] Switching to Marucoin suprnova - first alive pool
[00:47:34] Network diff set to 16
[00:47:34] NiceHash_X13_multi alive, testing stability
[00:47:34] Switching to NiceHash_X13_multi
[00:47:35] Switching mrr x13 platinum4.2 to stratum+tcp://us-east01.miningrigrentals.com:50100
[00:47:35] Network diff set to 1.11K
[00:47:35] New block detected on network before pool notification
[00:47:35] Marucoin suprnova stale share detected, submitting as user requested
[00:47:35] Accepted Coin 16 Diff 0.008/0.004 GPU 2 at Marucoin suprnova
[00:47:41] Accepted Coin 1107 Diff 0.016/0.008 GPU 0 at NiceHash_X13_multi
[00:47:41] Stratum from NiceHash_X13_multi requested work restart
[00:47:44] Switching to NiceHash_Scrypt
[00:47:45] NiceHash_Scrypt difficulty changed to 512
[00:47:48] Network diff set to 1.16K
[00:47:48] Stratum from tmb x13 multiport east2 detected new block
[00:47:49] Network diff set to 51.9M
[00:47:49] New block detected on network before pool notification
[00:47:50] Building binary zuikkisHawaiiglg2tc32765nf10w128l4.bin
[00:47:54] NiceHash_Scrypt extranonce change requested
[00:47:54] Stratum from NiceHash_Scrypt requested work restart
[00:47:55] Stratum from NiceHash_Scrypt requested work restart
[00:47:57] Initialising kernel zuikkis.cl with bitalign, unpatched BFI, nfactor 10, n 1024
[00:47:57] Initialising kernel zuikkis.cl with bitalign, unpatched BFI, nfactor 10, n 1024
[00:47:57] Initialising kernel zuikkis.cl with bitalign, unpatched BFI, nfactor 10, n 1024
[00:48:00] Accepted Coin 51916139 Diff 2.83K/512 GPU 0 at NiceHash_Scrypt
[00:48:02] Network diff set to 654M
[00:48:02] Stratum from NiceHash_Scrypt detected new block
[00:48:14] Switching to NiceHash_Scrypt-N
[00:48:15] Network diff set to 15.6M
[00:48:15] New block detected on network before pool notification
[00:48:15] Initialising kernel ckolivas.cl with bitalign, unpatched BFI, nfactor 11, n 2048
[00:48:15] Initialising kernel ckolivas.cl with bitalign, unpatched BFI, nfactor 11, n 2048
[00:48:15] Initialising kernel ckolivas.cl with bitalign, unpatched BFI, nfactor 11, n 2048
[00:48:15] Error -4: Enqueueing kernel onto command queue. (clEnqueueNDRangeKernel)
[00:48:15] Error -4: Enqueueing kernel onto command queue. (clEnqueueNDRangeKernel)
[00:48:15] Error -4: Enqueueing kernel onto command queue. (clEnqueueNDRangeKernel)
[00:48:15] GPU 1 failure, disabling!
[00:48:15] GPU 0 failure, disabling!
[00:48:15] GPU 2 failure, disabling!
[00:48:16] Network diff set to 654M
[00:48:16] Stratum from NiceHash_Scrypt detected new block

Again, trying to go from nf10 -> nf11, or nf11 -> nf10, throws failures. For now, I can add scrypt as a backup without an issue, since ckolivas works with nf10.

@platinum4
Contributor Author

As it is right now, this miner can effectively algo flip between scrypt-N, keccak, x11, x13 [no scrypt nf10]

Which, for right now, is all of the feasible mining algorithms for GPUs. It looks like ASICs have increased scrypt difficulty to a phenomenal level already.

@mrbrdo
Contributor

mrbrdo commented Jun 9, 2014

If you don't use "pool-gpu-threads" in your config (at all) then it should not happen. Does it?

Also, I experience this problem with Scrypt-N too.

@platinum4
Contributor Author

"If you don't use "pool-gpu-threads" in your config (at all) then it should not happen. Does it?"

Confirmed - we are past the 60-second shutoff. However, I have removed all instances of pool-gpu-threads, and hardcoded gpu-threads to 1 in the .conf file. It allows the miner to run, but when you flip from scrypt to scrypt-N it throws the -4 error as shown above.

@platinum4
Contributor Author

It's not the "pool-nfactor" setting, because that one works effectively. However, I have NOT tried to flip back to an X13 after that. Experimenting now.

@platinum4
Contributor Author

Yeah it algo flips amongst nfactors ok. The problem lies in how it flips amongst nfactors within the same algo (ie scrypt). It fails consistently trying to switch only from scrypt10 to scrypt11 and/or either way back.

@troky
Contributor

troky commented Jun 9, 2014

@platinum4 How much RAM do you have installed? What is configured pagefile size? 3x R9 290, right?

@mrbrdo
Contributor

mrbrdo commented Jun 9, 2014

Well, it seems somehow it cannot allocate enough memory for kernel/buffers. This seems to work fine when pool-gpu-threads is not set (which means soft restart, mining threads are not completely stopped and restarted). But when it is set (hard restart, mining threads completely stopped and then restarted), it seems like there is some memory left reserved in the devices. But I cannot figure what it could be and why it is not happening with soft restart. It definitely happens because it is out of memory, for example if I set gpu-threads to 1 (which means less VRAM use), then it works fine.
For me it happens when switching from darkcoin, darkcoin-mod or maxcoin kernel to scrypt (I didn't try others).
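
To get a rough feel for why gpu-threads and nfactor change VRAM use, here is a back-of-the-envelope estimate. It assumes the scrypt scratchpad for one GPU thread is about 128 bytes * ceil(N / lookup-gap) * thread-concurrency, with N = 2^nfactor as the logs show (nfactor 10, n 1024; nfactor 11, n 2048). The formula and function names are assumptions for illustration, not code taken from sgminer, and other buffers are ignored.

```python
def scrypt_n(nfactor):
    """N as shown in the logs: nfactor 10 -> n 1024, nfactor 11 -> n 2048."""
    return 1 << nfactor


def scratchpad_bytes(thread_concurrency, nfactor, lookup_gap=2):
    """Estimated scrypt scratchpad size for one GPU thread (assumed formula)."""
    n = scrypt_n(nfactor)
    items_per_thread = -(-n // lookup_gap)  # ceiling division
    return 128 * items_per_thread * thread_concurrency


def total_gib(thread_concurrency, nfactor, gpu_threads, lookup_gap=2):
    """Estimated scratchpad total across all threads on one GPU, in GiB."""
    per_thread = scratchpad_bytes(thread_concurrency, nfactor, lookup_gap)
    return per_thread * gpu_threads / 2**30
```

Under that assumption, thread-concurrency 32765 at nfactor 10 comes to roughly 2 GiB per thread, so gpu-threads 2 would ask for close to 4 GiB of scratchpad on one card, and the same thread-concurrency at nfactor 11 would double the per-thread figure.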

(also http://www.popekim.com/2012/07/opencl-getting-outofresource-or.html this is the error we get - -4 CL_MEM_OBJECT_ALLOCATION_FAILURE)
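
For reference, a small lookup of the first few OpenCL runtime error codes makes log lines like "Error -4" easier to read. The values are the ones defined in the Khronos `cl.h` header; the helper name is hypothetical.

```python
# Subset of OpenCL runtime error codes as defined in the Khronos cl.h header.
CL_ERRORS = {
    0: "CL_SUCCESS",
    -1: "CL_DEVICE_NOT_FOUND",
    -2: "CL_DEVICE_NOT_AVAILABLE",
    -3: "CL_COMPILER_NOT_AVAILABLE",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
}


def cl_error_name(code):
    """Return the symbolic name for an OpenCL error code, if it is in the table."""
    return CL_ERRORS.get(code, "unknown OpenCL error %d" % code)
```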

@platinum4
Contributor Author

@troky 8GB RAM, no problem mining any algos with it. Pagefile is set to automatic I am assuming. I'll try 12GB pagefile size and report back.

Pagefile shouldn't matter. The same error occurs on these rigs - 2x 290X, 3x 290X, 3x 390X, so pagefiles should ideally be 8 GB, 12 GB, 12 GB.

Expanded all page files to 12228MB; still -4 errors when switching between scrypt and scrypt-N.

@platinum4
Contributor Author

And still getting SICK -> DEAD errors on a few cards (mainly only R9 290X Tri-X OC 1040/1300)

@mrbrdo
Contributor

mrbrdo commented Jun 9, 2014

That's probably unrelated, would probably happen with sph-sgminer-x11mod too. It's probably just misconfiguration.

@platinum4
Contributor Author

Well, not when Elun had extended that SICK timer... ;D

But yeah, we'll go with a 'continuous configuration error' after all this time if it's easier to stomach. ;)

@mrbrdo
Contributor

mrbrdo commented Jun 9, 2014

Extending SICK timer is not a solution, it's a workaround... And if we start with those, we will get nowhere again. But I might just throw that restarting SICK GPUs out some day because it doesn't seem to ever help anything.

One guy was complaining about SICK/DEAD on X13-mod, but then he found some different settings on some forum and no more SICK, and he even got better hashrate/WU out of it. I'm not saying the X13-mod kernel doesn't have problems, but we use the same that everyone does (from girino), so the problems should be the same no matter which miner you use. Also for example someone told me that he got better hashrate with intensity 18 instead of 20. So it's not necessary to always put everything to absolute max. Similarly keccak seems to work better with gpu-threads 1 instead of 2 on R9 280X (without changing any other settings).

But back to the point, I spent some time looking at what could be causing this weird bug with scrypt kernels, but I have no idea at all yet :/

@platinum4
Contributor Author

This issue is still open until more reports in on a successful flip to nscrypt and back.

@mrbrdo
Contributor

mrbrdo commented Jun 30, 2014

@platinum4 so are you saying you yourself do not experience the problem anymore?

@platinum4
Contributor Author

Still with nf11, and I think others do as well; there are still reports of it from a few members in sgminer-dev IRC and on bitcointalk.


@mrbrdo
Contributor

mrbrdo commented Jun 30, 2014

Hm, well the new v5_0 does not do a hard reset unless gpu-threads is really different. Does this still happen for you when switching from scrypt to scrypt-n or vice versa (assuming that you use the same gpu-threads for both)? Or does it only happen when switching from some algo where you use a different gpu-threads (e.g. Keccak)?
I tested it a few minutes ago without hard restart (so gpu-threads was same), and it worked fine (as before, I only experienced this on hard-restart).
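
The rule described above (a hard restart only when gpu-threads differs between the old and new settings) can be modelled in a few lines to spot which pool or profile switches are the risky ones. This is only a sketch of the behaviour as described in this comment; the function names and the dictionary layout are hypothetical and do not come from sgminer.

```python
def restart_kind(current_gpu_threads, next_gpu_threads):
    """Return 'hard' when the thread count changes, else 'soft'."""
    return "hard" if current_gpu_threads != next_gpu_threads else "soft"


def risky_switches(profiles, default_gpu_threads):
    """List (from, to) profile pairs whose switch would be a hard restart.

    `profiles` maps a profile name to its settings; a profile without its
    own "gpu-threads" entry falls back to `default_gpu_threads`.
    """
    threads = {
        name: settings.get("gpu-threads", default_gpu_threads)
        for name, settings in profiles.items()
    }
    return [
        (a, b)
        for a in threads
        for b in threads
        if a != b and restart_kind(threads[a], threads[b]) == "hard"
    ]
```

With one profile at gpu-threads 2 and the scrypt profiles at 1, only the switches into and out of the 2-thread profile are reported, which matches the observation that scrypt to scrypt-N worked when gpu-threads was the same.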

@platinum4
Contributor Author

I'll have to test on rigs later tonight. I will get back to you, ok?


@platinum4
Contributor Author

@mrbrdo I cannot replicate this issue anymore [Windows binaries, possibly fixed by the slew of commits in June 2014]; closing this issue for now unless others can repeat it.

@platinum4
Contributor Author

Re-opened this issue as it is being experienced by others, such as @evolvia31 #308

@platinum4 platinum4 reopened this Jul 1, 2014
@mrbrdo mrbrdo added bug labels Jul 2, 2014
@mrbrdo mrbrdo added this to the 5.0 milestone Jul 2, 2014
@evolvia31

Hi all, I tried the latest commit on branch v5_0 this morning and the issue persists :(
My sgminer version is: 4.2.2-240-gec8b
If you need more logs or debug output, tell me what to post to help you solve this issue.
(My full config is posted in my issue #308.)
I tried all the kernels that support scrypt-N (zuikkis, alexkar, ckolivas) and I get the same bug when I switch from x11, x13 or Keccak to scrypt-N.
This is my output log:
[07:50:34] Initialising kernel darkcoin-mod.cl with bitalign, unpatched BFI, nfactor 10, n 1024
[07:50:34] Initialising kernel darkcoin-mod.cl with bitalign, unpatched BFI, nfactor 10, n 1024
[07:50:34] Initialising kernel darkcoin-mod.cl with bitalign, unpatched BFI, nfactor 10, n 1024
[07:50:34] Initialising kernel darkcoin-mod.cl with bitalign, unpatched BFI, nfactor 10, n 1024
[07:50:34] Initialising kernel darkcoin-mod.cl with bitalign, unpatched BFI, nfactor 10, n 1024
[07:50:34] Initialising kernel darkcoin-mod.cl with bitalign, unpatched BFI, nfactor 10, n 1024
[07:51:01] Accepted 12b76333 Diff 0.053/0.040 GPU 1 at NiceHash_X11
[07:51:14] Accepted 115262c9 Diff 0.058/0.040 GPU 0 at NiceHash_X11
[07:51:44] Accepted 0b3411a8 Diff 0.089/0.040 GPU 1 at NiceHash_X11
[07:52:04] Stratum connection to Waffle_X11 interrupted
[07:52:12] Accepted 10394a79 Diff 0.062/0.040 GPU 0 at NiceHash_X11
[07:52:15] Accepted 17755b6a Diff 0.043/0.040 GPU 0 at NiceHash_X11
[07:53:26] Accepted 081c27b0 Diff 0.123/0.040 GPU 0 at NiceHash_X11
[07:53:42] Accepted 037b5162 Diff 0.287/0.040 GPU 1 at NiceHash_X11
[07:54:02] Accepted 5415e379 Diff 3.044/0.040 GPU 0 at NiceHash_X11
[07:54:08] Accepted 0a602be4 Diff 0.096/0.040 GPU 1 at NiceHash_X11
[07:54:20] Switching to NiceHash_N
[07:54:20] NiceHash_N difficulty changed to 128
[07:54:25] Applying pool settings for NiceHash_N...
[07:54:25] Initialising kernel zuikkis.cl with bitalign, unpatched BFI, nfactor 11, n 2048
[07:54:25] Initialising kernel zuikkis.cl with bitalign, unpatched BFI, nfactor 11, n 2048
[07:54:25] Initialising kernel zuikkis.cl with bitalign, unpatched BFI, nfactor 11, n 2048
[07:54:25] Initialising kernel zuikkis.cl with bitalign, unpatched BFI, nfactor 11, n 2048
[07:54:25] Initialising kernel zuikkis.cl with bitalign, unpatched BFI, nfactor 11, n 2048
[07:54:25] Initialising kernel zuikkis.cl with bitalign, unpatched BFI, nfactor 11, n 2048
[07:54:25] Error -4: Enqueueing kernel onto command queue. (clEnqueueNDRangeKernel)
[07:54:25] Error -4: Enqueueing kernel onto command queue. (clEnqueueNDRangeKernel)
[07:54:25] GPU 1 failure, disabling!
[07:54:25] GPU 2 failure, disabling!
[07:54:27] Accepted 8bdaabb6 Diff 468/128 GPU 0 at NiceHash_N
[07:54:29] Accepted 01109c02 Diff 240/128 GPU 0 at NiceHash_N
[07:54:30] Accepted 40e8602c Diff 1.01K/128 GPU 0 at NiceHash_N
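
When comparing reports like the one above, it can help to pull the error codes and the disabled GPUs out of a log automatically. The sketch below matches only the two line shapes that appear in the logs in this thread; the function name is hypothetical.

```python
import re

# Matches e.g. "Error -4: Enqueueing kernel onto command queue. (clEnqueueNDRangeKernel)"
ERROR_LINE = re.compile(r"Error (-?\d+): .*\((\w+)\)")
# Matches e.g. "GPU 1 failure, disabling!"
GPU_FAIL_LINE = re.compile(r"GPU (\d+) failure, disabling!")


def scan_log(lines):
    """Collect (error code, OpenCL call) pairs and the sorted list of disabled GPUs."""
    errors = []
    disabled = set()
    for line in lines:
        error = ERROR_LINE.search(line)
        if error:
            errors.append((int(error.group(1)), error.group(2)))
        failure = GPU_FAIL_LINE.search(line)
        if failure:
            disabled.add(int(failure.group(1)))
    return errors, sorted(disabled)
```

Run against the log above, this would report GPUs 1 and 2 as disabled while GPU 0 keeps submitting shares.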

@mrbrdo
Contributor

mrbrdo commented Jul 28, 2014

@evolvia31 I have time to look into it this week, can you confirm it is still a problem, or has it been fixed?

@ystarnaud
Contributor

@evolvia31 can you paste your full config not just the profiles section please?

@ystarnaud
Contributor

Also your specs might be helpful. What model GPUs are you using? How much system memory?

@evolvia31

Hi, yes, I tried again this morning and the bug is still there, with the crash.
My config is:
Ubuntu 13.10 (GNU/Linux 3.11.0-15-generic x86_64)
Driver ATI 14.6 beta
sgminer version: sgminer 4.2.2-255-gb6aef
graphic card: R9 280X MSI Gaming
RAM: 2 GB
GPU infos:
05:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tahiti XT [Radeon HD 7970/R9 280X] (prog-if 00 [VGA controller])
Subsystem: Micro-Star International Co., Ltd. Device 2775
Flags: bus master, fast devsel, latency 0, IRQ 52
Memory at c0000000 (64-bit, prefetchable) [size=256M]
Memory at f7b00000 (64-bit, non-prefetchable) [size=256K]
I/O ports at b000 [size=256]
Expansion ROM at f7b40000 [disabled] [size=128K]
Capabilities: [48] Vendor Specific Information: Len=08
Capabilities: [50] Power Management version 3
Capabilities: [58] Express Legacy Endpoint, MSI 00
Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010
Capabilities: [150] Advanced Error Reporting
Capabilities: [270] #19
Capabilities: [2b0] Address Translation Service (ATS)
Capabilities: [2c0] #13
Capabilities: [2d0] #1b
Kernel driver in use: fglrx_pci

My sgminer config file:
"profiles":[
{
"name":"x11",
"algorithm":"darkcoin-mod",
"nfactor" : "10"
},{
"name":"x13",
"algorithm":"marucoin-mod",
"nfactor" : "10"
},{
"name":"Scrypt",
"algorithm":"zuikkis",
"nfactor" : "10"
},{
"name":"ScryptN",
"algorithm":"zuikkis",
"nfactor" : "11"
},{
"name":"keccak",
"algorithm":"maxcoin",
"nfactor" : "10"
},{
"name":"x15",
"algorithm":"bitblock",
"nfactor" : "10",
"intensity" : "19",
"worksize" : "64",
"gpu-memclock" : "1250",
"gpu-engine" : "1100"
},{
"name":"nist5",
"algorithm":"talkcoin-mod",
"nfactor" : "10"
}
],
"intensity" : "13",
"vectors" : "1",
"worksize" : "256",
"kernel" : "zuikkis",
"lookup-gap" : "2",
"thread-concurrency" : "8192",
"shaders" : "2048",
"gpu-engine" : "1050",
"gpu-fan" : "80",
"gpu-memclock" : "1500",
"gpu-memdiff" : "0",
"gpu-powertune" : "5",
"gpu-vddc" : "0.000",
"temp-cutoff" : "88",
"temp-overheat" : "83",
"temp-target" : "75",
"api-mcast-port" : "4028",
"api-port" : "4028",
"api-listen" : true,
"api-allow" : "W:192.168.0.45",
"expiry" : "30",
"gpu-dyninterval" : "7",
"gpu-platform" : "0",
"gpu-threads" : "2",
"log" : "60",
"no-pool-disable" : true,
"queue" : "0",
"scan-time" : "5",
"scrypt" : true,
"temp-hysteresis" : "3",
"shares" : "0",
"kernel-path" : "/home/sgminer/v50/kernel"
}

@ystarnaud
Copy link
Contributor

Where are your pools in the config? Do you only have 1 GPU? I could have sworn the enqueue error was with GPU 1 and 2 while 0 was ok.

Also, using gpu-threads 2 across the board seems a bit dangerous. I haven't dealt with non-Xn algorithms in a while, but I'm pretty sure some of the more intense algorithms needed only 1 thread.

I would recommend you test each algorithm individually with only 1 pool and no switching to make sure you have the correct settings before putting them all in 1 file.

One last thing... Do you run as root? I notice you have /home/sgminer/
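
A rough way to sanity-check the gpu-threads concern is to estimate how much scrypt scratchpad memory each GPU thread would need under the config posted above. The sketch below is an assumption, not sgminer's actual code: it supposes the buffer is sized as 128 bytes × ceil(N / lookup-gap) × thread-concurrency, with N = 2^nfactor (consistent with the "nfactor 11, n 2048" log line). The function name `scratchpad_bytes` is hypothetical; the input values (lookup-gap 2, thread-concurrency 8192, gpu-threads 2) are taken from the posted config.

```python
def scratchpad_bytes(nfactor, lookup_gap, thread_concurrency):
    """Estimate the scrypt scratchpad size for one GPU thread, in bytes.

    Assumed sizing rule (not confirmed against sgminer's source):
    128 bytes per stored entry, one entry kept for every `lookup_gap`
    iterations, repeated for each concurrent hash.
    """
    n = 1 << nfactor                       # nfactor 10 -> 1024, nfactor 11 -> 2048
    entries = -(-n // lookup_gap)          # ceiling division
    return 128 * entries * thread_concurrency


MIB = 1024 * 1024
GPU_THREADS = 2                            # "gpu-threads" from the posted config

for name, nfactor in (("Scrypt", 10), ("ScryptN", 11)):
    per_thread = scratchpad_bytes(nfactor, lookup_gap=2, thread_concurrency=8192)
    total = per_thread * GPU_THREADS
    print(f"{name}: {per_thread // MIB} MiB per thread, "
          f"{total // MIB} MiB for {GPU_THREADS} threads")
```

Under these assumptions the ScryptN profile needs twice the scratchpad of the Scrypt profile per thread (1024 MiB against 512 MiB), and gpu-threads 2 doubles that again. Treat the numbers as a starting point for choosing thread-concurrency and lookup-gap per profile, not as a diagnosis.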

@mrbrdo
Contributor

mrbrdo commented Jul 29, 2014

@ystarnaud it's an older issue.. I was able to reproduce it too. It seems to happen when switching from Scrypt to Scrypt-N.

@Bllacky
Contributor

Bllacky commented Jul 29, 2014

Scrypt-N is working in a very strange way.
For me it never starts all cards/threads the same. For instance, I will have one card at 300 KH/s, one at 330 KH/s, one at 350 KH/s, and one at 370 KH/s. All my cards top out at 372 KH/s. And to reach speeds close to 365-372 KH/s I have to restart sgminer several times, as well as the rig.

Scrypt-N, or its kernel, is very capricious.

@evolvia31

Hi, I have 3 or 4 GPU cards per server. I use more than 20 different pools, so I won't publish them, but I built my config with one pool, and each algo config has worked fine for a few months. The problem began during May, but I don't know exactly which sgminer version it started with.

Everything works fine for weeks at a time, except when I try to switch from x11, x13 or x15 to Scrypt-N.

I have no problem when I switch from Scrypt-N to any other algo.
I have no problem when I switch from Scrypt to Scrypt-N.
I have no problem when I switch from X11, X13 or X15 to Scrypt.

The only problem is from X11, X13 or X15 to Scrypt-N.
To reproduce the bug, I use a single X11 pool. Once each card has solved 2 shares, I try to switch; I then get the error message, and 1 or 2 of the 3 GPUs fail.

Restarting sgminer does not re-enable the GPUs; I need to quit sgminer, wait 30 seconds, and relaunch it.

Yes, all instances run as root.

@platinum4
Contributor Author

I've given up on nscrypt until at least the winter time; I can build bins fine now, but with the 14.7RC drivers the TC was sliced in half, down to a maximum of 8192. Even then, hardware errors were experienced and the hashrate sucked. If I ever go back to nscrypt it will be with the 13.12 drivers on a dedicated rig, unless AMD comes out with better ones in the meantime.

The only error I notice now when building nf11 is Error -61 (memory size), which is the "decrease TC or increase lookup gap" error, so it's unrelated.
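
For reference, the numeric codes in these messages are OpenCL status values. A minimal sketch for decoding them from a log line follows. The constant names are the ones defined in the Khronos `cl.h` header; the `decode_cl_error` helper and the regular expression are illustrative assumptions based only on the log format shown in this thread.

```python
import re

# Subset of OpenCL status codes (names as defined in the Khronos cl.h header).
CL_ERROR_NAMES = {
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
    -61: "CL_INVALID_BUFFER_SIZE",
}

# Assumed log shape: "... Error <negative integer> ..."
_ERROR_RE = re.compile(r"Error (-\d+)\b")


def decode_cl_error(log_line):
    """Return the OpenCL constant name for the error code in a log line.

    Returns None when the line carries no "Error -N" marker.
    """
    match = _ERROR_RE.search(log_line)
    if match is None:
        return None
    code = int(match.group(1))
    return CL_ERROR_NAMES.get(code, f"unknown OpenCL error {code}")


line = "[07:54:25] Error -4: Enqueueing kernel onto command queue. (clEnqueueNDRangeKernel)"
print(decode_cl_error(line))
```

Both codes discussed in this thread relate to memory: -4 reports that memory for a buffer object could not be allocated, and -61 reports a buffer size the device rejects.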

@platinum4
Contributor Author

Also, as an aside: I've found that this particular kernel, https://github.com/exeminer/exeminer/blob/master/scrypt140202.cl, does not cause any HW errors, and it's different from bufius, ckolivas, alexkarnew & alexkarold.

@Bllacky
Contributor

Bllacky commented Jul 30, 2014

If you intend to merge this kernel into SGminer, please try to keep some consistency. We have different commits in master, 5.0 and developer.

@platinum4
Contributor Author

This is largely dependent on scrypt n-factor 11, which sucks balls anyway; for the time being, I shall close this.
