This repository has been archived by the owner on Apr 24, 2022. It is now read-only.

Runtime error - ethash_cuda_miner::search, line 365 #94

Closed
ericalandouglas opened this issue Jun 30, 2017 · 43 comments

@ericalandouglas

ericalandouglas commented Jun 30, 2017

I am hitting an error when attempting to run the newest pre-release of ethminer with enhanced CUDA support. Running on Windows 10, NVIDIA driver 382.53, MSI GeForce GTX 1060s. I noticed there was a newer driver, 384.76, when searching here. The line I believe the error is being thrown on is here.

Should I upgrade to this driver to fix the error I am seeing in the attached screenshot? I wasn't given any more output on the error than what is shown.

[screenshot: 19650256_10213483955722965_867713340_o]

@ericalandouglas
Author

If this is suspected to be caused by an unstable overclock, let me know and I'll try different Afterburner settings as well.

@rizwansarwar

I have got this error on stock clocks. Your issue is a duplicate of #72 and #80.

@ericalandouglas
Author

Thanks for the heads up. I will review the issues and progress made.

@twist2k

twist2k commented Jul 1, 2017

The same issue has popped up twice since the 0.11.0RC1 release.

@ghost

ghost commented Jul 2, 2017

I have been getting the same error. Almost the same as #72 and #80, but not quite.

CUDA error in func 'search' at line 365 : unknown error.

1070's running on Ubuntu 16.04.1, CUDA version 8.0.61, Nvidia Driver 381.22
1070's have stock settings, no overclocking
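
For anyone collecting these reports from miner logs, the message format quoted above (and the longer `ethash_cuda_miner::search` variant reported elsewhere in this thread) can be picked apart with a small pattern. This is only a sketch: the pattern is inferred from the messages quoted in this issue, not taken from the ethminer source, and `parse_cuda_error` is a hypothetical helper name.

```python
import re

# Pattern inferred from the messages quoted in this thread (an assumption,
# not ethminer's documented log format), e.g.
#   CUDA error in func 'search' at line 365 : unknown error.
#   error in func 'ethash_cuda_miner::search' at line 365 : unspecified launch failure
CUDA_ERROR_RE = re.compile(
    r"error in func '(?P<func>[^']+)' at line (?P<line>\d+)\s*:\s*(?P<reason>.+?)\.?\s*$",
    re.IGNORECASE,
)


def parse_cuda_error(log_line):
    """Return (func, line, reason) if log_line looks like a CUDA error, else None."""
    match = CUDA_ERROR_RE.search(log_line)
    if match is None:
        return None
    return match.group("func"), int(match.group("line")), match.group("reason")
```

Grouping reports by the `reason` field would make it easier to see whether "unknown error" and "unspecified launch failure" show up under different conditions.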

@rizwansarwar

@dereknixon my impression of -L is that it is only effective at startup, while the DAG is being loaded, and not afterwards. Did your crashes happen randomly while the miner was running? I.e., had the miner been running and mining for an hour when it crashed?

@ghost

ghost commented Jul 2, 2017

@rizwansarwar

I've edited my comment above to remove mention of -L to prevent confusion.

Initially the error was occurring anywhere from a few minutes to maybe an hour after it had been started. This occurred multiple times yesterday. However, it did just run overnight without crashing (without -L).

It seems to occur after "received new job".

Also, I'm new to mining. I have the setup for running machine learning algorithms. Figured I'd try this to put it to use in its spare time.

@chfast added the bug label Jul 3, 2017
@MichaelA2014

I've been getting them when I overclock the memory clock by +650... I lowered it to below +500 (+430) and the error went away. The hashrate wasn't affected that much.

@davilizh
Contributor

davilizh commented Jul 4, 2017

@ericalandouglas, @dereknixon, can you upgrade to the 384 driver and give it a try?
I have run the code on my GTX 1060 for hours with driver 384 and cannot reproduce the issue.

@ericalandouglas
Author

@davilizh are you using stock clocks?

@davilizh
Contributor

davilizh commented Jul 4, 2017

@ericalandouglas Yes, I'm using stock clocks.

@ericalandouglas
Author

Ok @davilizh, I think I should lower my overclock settings and give this another try, as per @MichaelA2014's comment. If that doesn't help I will upgrade my driver.

@tex6246

tex6246 commented Jul 5, 2017

@MichaelA2014, @davilizh I can also confirm that I get this error with mem clocks over +650, and even a small core clock increase of 100 can cause this issue.

@davilizh
Contributor

davilizh commented Jul 5, 2017

@tex6246 Thanks. Which driver version are you using? Can you upgrade to the 384 driver and give it a try?

@tex6246

tex6246 commented Jul 5, 2017

@davilizh I am on 384.76, the latest version. The error occurs on both my 1060 and my 1080 Ti cards.
I do realize this was more aligned with 1060s.

@davilizh
Contributor

davilizh commented Jul 5, 2017

@chfast I used the old version, "ethminer 0.10.0", from https://github.com/ethereum-mining/ethminer/releases/tag/v0.10.0, and this issue occurs when I increase the gpcclk by 500 on my GTX 1060 with driver 384.
I think we need to add some guard code to ethminer to recover from such issues. Do you agree?

@chfast
Contributor

chfast commented Jul 5, 2017

If it is possible to handle this error and continue, sure!

@davilizh
Contributor

davilizh commented Jul 5, 2017

Actually, I do not know how to do this. But there must be ways.

@chfast
Contributor

chfast commented Jul 5, 2017

I see 2 possible improvements:

  1. For memory allocation issues, we should try to allocate the DAG first and the smaller buffers later. In case of failure, retry a few times.
  2. In case of an exception like "invalid instruction" catch it, log it, and try to restart the CUDA mining.
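
The two ideas above can be sketched roughly as follows. This is written in Python for brevity and is not ethminer's C++ code: `DeviceError`, `retry`, `init_device` and `mine_with_restart` are hypothetical names, and the allocation and search steps are passed in as plain callables. Whether a CUDA context can actually be recovered inside the same process after a failure like the one in this issue is not established in this thread, so treat the restart loop as an outline of the control flow only.

```python
import logging
import time

log = logging.getLogger("miner-sketch")


class DeviceError(RuntimeError):
    """Hypothetical stand-in for a CUDA runtime failure."""


def retry(action, attempts=3, delay_s=1.0, sleep=time.sleep):
    """Call action() up to `attempts` times, pausing between failed attempts."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except DeviceError as exc:
            log.warning("attempt %d/%d failed: %s", attempt, attempts, exc)
            if attempt == attempts:
                raise  # out of attempts: let the caller decide what to do
            sleep(delay_s)


def init_device(alloc_dag, alloc_small_buffers):
    """Idea 1: allocate the large DAG buffer first, then the smaller buffers."""
    dag = retry(alloc_dag)
    buffers = retry(alloc_small_buffers)
    return dag, buffers


def mine_with_restart(init, search, max_restarts=5):
    """Idea 2: on a device error, log it and start again from initialisation."""
    restarts = 0
    while True:
        state = init()
        try:
            return search(state)
        except DeviceError as exc:
            restarts += 1
            log.error("search failed (%s); restart %d/%d", exc, restarts, max_restarts)
            if restarts >= max_restarts:
                raise
```

Capping the number of restarts keeps a permanently broken device from looping forever; an external watchdog can then take over.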

@ghost

ghost commented Jul 5, 2017

@davilizh Stock clocks, very odd... It was crashing repeatedly with this issue a couple of days ago, but it has been running smoothly since early Sunday. I've now tried both the 375.66 driver and 381.22 (for about a day each) without issue. 384 is not available for Ubuntu 16.04 (unless someone could direct me to a resource I'm missing).

Outside of this topic, if anyone is able to help me overclock the cards, I'd be happy to try that and help test any changes you work on. However, I'm new to using these cards and can't figure out how to overclock them. I keep getting an error from nvidia-smi stating 'setting application clocks is not supported'. It's a side topic, but I was hoping someone here might be able to help, and that I could then help test changes.

@maarten74

Wanted to add another data point. I was consistently getting "error in func 'ethash_cuda_miner::search' at line 365 : unspecified launch failure", similar to #72. I would typically get the error around the 12-hour mark, but sometimes after 1 hour. This happened on both earlier drivers and 384.

By decreasing the OC to <700 mem and <100 core clock, I have been running for >24 hours without issue. Similar findings to @tex6246. I am using Win 10, 6x GTX 1060 (138-140 MH/s), 384 driver, G3900 CPU.

Thanks for all the hard work on the project!

@davilizh
Contributor

davilizh commented Jul 5, 2017

@dereknixon 384 is here: http://www.nvidia.com/download/driverResults.aspx/120294/en-us. I do not know whether it supports Ubuntu 16 or not. BTW, one of my friends uses GPUMonitor to overclock on Windows.

@ghost

ghost commented Jul 5, 2017

@davilizh Thanks! I'll give it a shot sometime this evening. Off-topic... I'm beginning to wonder if I should just move to a Windows install... generally speaking, everything I'm finding seems to hint at better support and utilities...

@derubm

derubm commented Jul 5, 2017

@dereknixon here you go (settings for a Zotac GTX 1060 Mini, 6 GB GDDR5, Samsung RAM):
[image]

@h1sfy

h1sfy commented Jul 6, 2017

I experienced this issue on 0.11 RC1, with a 1060 3 GB OC'd at +0/+750 (Samsung) and a 1070 at +0/+500 (Micron); drivers were 382.53, Windows 10 x64.

Was getting this error constantly after ~0.5-2 hours of mining.

Rechecked on release 0.11 with the driver updated to 384.76 and the OC lowered to +0/+650 and +0/+450 respectively. No issue for 20+ hours straight, so I assume it's a stable workaround for now. Btw the latest hashrate improvement for the 1060 is still 6% for me even with the lowered OC, that's just great!

Hope it helps. Thanks for the updates!

@rizwansarwar

@new9uy I have had the same observations as you. The driver is probably the root cause of the crashes.

In Linux, you can see Xid messages in the log files. According to Nvidia, Xid 31 and Xid 32 indicate application/driver memory corruption issues. I think they might have fixed it in the most recent release of the driver, which is why we are seeing the improvement.
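
A quick way to check a Linux log for these Xid codes is sketched below. The line layout (`NVRM: Xid (PCI:0000:01:00): 31, ...`) is an assumption about how the NVIDIA kernel driver usually reports Xids and may differ between driver versions; `find_xids` is a hypothetical helper name.

```python
import re

# Assumed layout of an Xid report in the kernel log; older drivers may omit
# the "PCI:" prefix, so it is treated as optional here.
XID_RE = re.compile(r"NVRM: Xid \((?:PCI:)?(?P<device>[0-9A-Fa-f:.]+)\): (?P<code>\d+)")


def find_xids(lines, codes=(31, 32)):
    """Return (device, code) pairs for every Xid line whose code is in `codes`."""
    hits = []
    for line in lines:
        match = XID_RE.search(line)
        if match and int(match.group("code")) in codes:
            hits.append((match.group("device"), int(match.group("code"))))
    return hits
```

The function accepts any iterable of lines, so it can be fed an open log file or captured `dmesg` output split into lines.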

@gremo23

gremo23 commented Jul 7, 2017

Same issue here. Win 10, 382 drivers, 7x 1060, no OC. It happens every time after 2-3 minutes.

Edit: the command line is Ethminer -U -S eu1.ethermine.org:4444 -0 wallet.stratum --cuda-parallel-hash 4

@MichaelA2014

MichaelA2014 commented Jul 8, 2017

I am wondering if this issue is related to something other than overclocking. I haven't changed the OC settings in a few days. Ethminer ran fine for three or four days non-stop. Now I have started getting this issue three or four times per day. The last time it happened I had a Windows prompt that some updates were available. I am wondering if it's related to Windows 10 instead of the code. Otherwise, why would it run fine for several days in a row and then all of a sudden start crashing?

Either that or a hardware failure (risers).

As a workaround, it would be great if @chfast or @davilizh could program a restart function for when the crash is detected, so that whenever we get this illegal operation crash the miner would close and restart on its own until a permanent fix can be implemented.

@derubm

derubm commented Jul 8, 2017

When there was an epoch change yesterday, the miner created a new DAG. Could this routine be used for the error? The miner went to 0 MH/s, created a new DAG, and ran fine after that... Maybe a way to work around the crash? 0 MH/s for some seconds ->> use that routine to restart mining.

@ghost

ghost commented Jul 9, 2017

I have the same issue: 48 hours of straight mining without any kind of issue, then these errors kicked in.
Now I have deleted the DAG files and started the miner again; we will see how it goes.
Win10, 4xEVGA 1070, driver 384.76, ethminer 0.11.0

EDIT: crashed after 1.5 hours

@babuloseo

Lowering your mem overclocks to around 750 seems to make it more stable, and I haven't seen a crash yet.

@MichaelA2014

I tried with stock clocks and with an overclock. It does not depend on overclocking. I have no idea what causes the error because, like @aiden1408 said, sometimes it runs for 2-3 days without any issues and sometimes it crashes 5 minutes after start... Most frequently at night when I am asleep.

@feracon

feracon commented Jul 10, 2017

Getting the same issue. It usually happens after 1-12 hours, and the program doesn't terminate and prompt a restart; it just hangs there doing nothing until you come and look.

image

Has there been any progress on this?

6x asus strix 1070s, asrock h81, win 10h, core -100, mem +500

@feracon

feracon commented Jul 10, 2017

Does anyone have a fallback miner that they're using in the meantime while this one is being fixed? I was thinking about using Claymore but don't like the idea of it blasting my CPU to mine for the devs 24/7.

@evilny0
Contributor

evilny0 commented Jul 10, 2017

The miner sometimes hangs even without displaying a CUDA error.

I do not think Claymore is using your CPU. It's switching pools, but still using your GPU.
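
Since the miner can hang without printing any CUDA error, a watchdog that only searches for error text would miss those cases. One way to catch a silent hang is to check how long ago the miner last wrote to its log. The sketch below assumes the miner's output is redirected to a file that is updated regularly while mining is healthy; `log_is_stale` and the 120-second default are illustrative choices, not anything from ethminer.

```python
import os
import time


def log_is_stale(path, max_silence_s=120.0, now=None):
    """Return True if the file at `path` has not been modified for more than
    `max_silence_s` seconds, which suggests the miner has stopped producing output."""
    if now is None:
        now = time.time()
    return (now - os.path.getmtime(path)) > max_silence_s
```

The threshold has to be longer than any normal quiet period (for example, DAG generation), otherwise a healthy miner would be restarted needlessly.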

@jimmykl
Contributor

jimmykl commented Jul 10, 2017

@feracon @evilny0 Claymore v9.7 hashes almost as fast on nvidia now as ethminer. Of course there are still fees but if you want to use it as a fallback miner it's not as much of a loss as it used to be.

@jimmykl
Contributor

jimmykl commented Jul 10, 2017

I know there's a lot of discussion here but I wonder if it should be closed and further discussion moved to the original issue for this bug #72?

@derubm

derubm commented Jul 17, 2017

Windows only temporary fix

(Linux should be pretty easy to do)
https://github.com/derubm/Ethminer_Watchdog
Workaround: it writes the miner output to a logfile and runs in a loop (10 seconds of sleep each round) until "error" appears in the logfile, then kills the old process and restarts the mining process.
Hint: I have not been able to reproduce the error at the moment, so please comment on it ;)
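
For readers on Linux, or anyone who wants to see the idea in one place, the loop described above can be sketched in Python. This is not the code from the linked repository: `log_has_error` and `run_watchdog` are hypothetical names, and the command to run is passed in by the caller. Matching on the bare word "error" is as crude here as in the description, so a harmless log line containing that word would also trigger a restart.

```python
import subprocess
import time


def log_has_error(log_path, needle="error"):
    """Return True if `needle` occurs (case-insensitively) anywhere in the log."""
    needle = needle.lower()
    with open(log_path, "r", errors="replace") as log:
        return any(needle in line.lower() for line in log)


def run_watchdog(cmd, log_path, poll_s=10.0, max_restarts=3):
    """Run cmd with its output written to log_path and restart it whenever the
    log shows an error. Returns the number of restarts performed."""
    restarts = 0
    while True:
        # Opening with "w" truncates the log, so an old error cannot retrigger.
        with open(log_path, "w") as log:
            proc = subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)
            try:
                while not log_has_error(log_path):
                    if proc.poll() is not None:
                        # The process ended; look once more for a final error line.
                        if log_has_error(log_path):
                            break
                        return restarts
                    time.sleep(poll_s)
            finally:
                # Make sure the old process is gone before starting a new one.
                if proc.poll() is None:
                    proc.kill()
                proc.wait()
        if restarts >= max_restarts:
            return restarts
        restarts += 1
```

A call might look like `run_watchdog(["ethminer", "-U", ...], "miner.log")`, with the arguments filled in from your own command line. Note that `proc.kill()` only stops the process that was started directly, not any children it may have spawned.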

@oleng

oleng commented Jul 18, 2017

@jimmykl Claymore 9.7 is around 1 MH/s slower than ethminer in my case. Is that also what you see? Latest driver, Win 10.

@NightsBest

NightsBest commented Jul 19, 2017

I added an EVGA 1060 3GB mini to my rig last night and keep getting this error. I am trying the solution provided by @derubm; all of my cards are running stock speeds for this test. I can't seem to run longer than 15 minutes without crashing out or freezing. I will keep this updated with what I find.

Update 1: Unfortunately it looks like my new 1060 3GB came with the dreaded Micron memory, and that's why I keep crashing every 15 minutes.

Update 2: I pulled the EVGA 1060 3GB mini, and the Ethminer_Watchdog is running fast and flawlessly for me even with an aggressive OC.

@derubm

derubm commented Jul 19, 2017

@NightsBest you don't need to run stock speeds; at least lower the power limit, there is no need to run the cards at full wattage. A 15-minute crash cycle is very unstable. I added a 4-hour auto-restart to my batch files; keep that in mind, as that might also be a reason the miner restarts regularly.

@Malapha

Malapha commented Jul 21, 2017

Very nice workaround for Windows:
https://github.com/orkblutt/MinerLamp

See issue #72: https://github.com/ethereum-mining/ethminer/issues/72

@DeadManWalkingTO
Contributor

GTX1060 +150/+500/65%TDP @ 23-24MHs

  1. Try updating the drivers.
    Download and install the latest.

  2. Try updating ethminer.
    Download (or better, build) the latest.

  3. Try using -U for CUDA devices.
    CUDA Hardware Test Launch Command:
    ethminer -RH -U -S eu1.ethermine.org:4444 -FS us1.ethermine.org:4444 -O 0x7013275311fc37ccc1e40193D75086293eCb43A4.issue94

  4. Try changing the P2 state and the power management mode.
    You can use NVidiaProfileInspectorDmW.
    For the best mining hashrate, set the following in section "5 - Common":

    • CUDA - Force P2 State (set to "Off")
    • Power management mode (set to "Prefer maximum performance")

  5. Try tweaking Windows 10.
    You can use Windows10MiningTweaksDmW (Solution for 100% CPU usage (Win10 - CUDA - OpenCL) #695).

  6. Try optimizing/overclocking the GPUs.
    You can use MSI Afterburner to overclock/optimize the GPUs.

  7. Try using a watchdog.
    You can use ETHminerWatchDogDmW (Simple Script WatchDog #735).

Please give feedback.
Thank you!

@chfast closed this as completed May 5, 2018