Memory leak when restarting thread #106

agrajag9 · 2017-09-07T13:00:52Z

First reported here: https://www.reddit.com/r/EtherMining/comments/6yfqxg/not_enough_gpu_memory_to_place_dag_you_cannot/

When a pool connection fails, we see an error in STDOUT that the miner thread is hanging and needs to be restarted. The thread is then restarted successfully, but without reinitializing the VRAM, as shown below (ellipsized for brevity):

DevFee: ETH: Stratum - connecting to 'us1.ethpool.org' <149.56.26.222> port 3333
...
ETH: Stratum - Cannot connect to us1.ethpool.org:3333
DevFee: ETH: Stratum - Failed to connect, retry in 20 sec...
...
Miner thread hangs, need to restart miner!


ETH: 2 pools are specified
Main Ethereum pool is us1.ethermine.org:4444
At least 16 GB of Virtual Memory is required for multi-GPU systems
Make sure you defined GPU_MAX_ALLOC_PERCENT 100
Be careful with overclocking, use default clocks for first tests
Press "s" for current statistics, "0".."9" to turn on/off cards, "r" to reload pools, "e" or "d" to select current pool
OpenCL initializing...

AMD Cards available: 1
GPU #0: Ellesmere, 1461 MB available, 36 compute units
GPU #0 recognized as Radeon RX 480/580

On first initialization, the same card is detected with 8169 MB available.
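The comparison between the first and the post-restart "MB available" values can be automated by scanning the miner's output. The following is a minimal sketch, assuming the log line format shown above; the function names, the 256 MB tolerance and the two-line sample log (built from the 8169 MB and 1461 MB figures quoted in this report) are hypothetical and only for illustration.

```python
import re

# Matches lines such as "GPU #0: Ellesmere, 1461 MB available, 36 compute units".
# The pattern is copied from the logs in this thread; other miner versions may differ.
AVAILABLE_RE = re.compile(r"GPU #(\d+): [^,]+, (\d+) MB available")


def available_mb_history(log_lines):
    """Return {gpu_index: [MB available at each initialization, in log order]}."""
    history = {}
    for line in log_lines:
        match = AVAILABLE_RE.search(line)
        if match:
            gpu, mb = int(match.group(1)), int(match.group(2))
            history.setdefault(gpu, []).append(mb)
    return history


def shrinking_gpus(log_lines, tolerance_mb=256):
    """Return GPUs whose latest report is more than tolerance_mb below the first one.

    tolerance_mb is an arbitrary threshold chosen for this sketch.
    """
    return sorted(
        gpu
        for gpu, values in available_mb_history(log_lines).items()
        if values[0] - values[-1] > tolerance_mb
    )


# Illustrative log built from the numbers quoted in this report
# (8169 MB at first start, 1461 MB after the in-process restart).
sample_log = [
    "GPU #0: Ellesmere, 8169 MB available, 36 compute units",
    "Miner thread hangs, need to restart miner!",
    "GPU #0: Ellesmere, 1461 MB available, 36 compute units",
]
print(available_mb_history(sample_log))  # {0: [8169, 1461]}
print(shrinking_gpus(sample_log))        # [0]
```

Lines such as "GPU #0 recognized as Radeon RX 480/580" or "GPU #0: algorithm ASM" do not match the pattern, so only the memory reports are collected.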

Hardware:
https://pcpartpicker.com/list/f6Cyhq

Version:
0ebb105bd3a6cdd35d94663eabf245e9 Claymore.s.Dual.Ethereum.Decred_Siacoin_Lbry_Pascal.AMD.NVIDIA.GPU.Miner.v9.8.-.LINUX.tar.gz
a919e303d2250f2719c7a28bfebd9a79 ethdcrminer64

Drivers:
AMDGPU-PRO Driver Version 17.30 for Ubuntu 16.04.3

OS:
Ubuntu 16.04.3 LTS

[ 2017-09-07T12:57:26 agrajag9@eth1.srv.a9development.com:/home/agrajag9 ]
$ uname -a
Linux eth1.srv.a9development.com 4.4.0-93-generic #116-Ubuntu SMP Fri Aug 11 21:17:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux


koenvandenberge · 2017-09-15T15:19:17Z

Same issue here on Windows 10

LionRelaxe · 2017-09-21T14:05:47Z

Same here: the available memory systematically drops lower and lower at each restart-inducing error, on two separate miners.
Setup:
Xubuntu 16.04 LTS, one miner with 1x RX580 4 GB, one miner with 3x RX580 4 GB.
Noticed on Claymore 10.0, pretty sure it happened on 9.7 too.
AMD Drivers 17.10, and also with 17.30 with the ROCm kernel.

Note that pressing CTRL-C and manually restarting the miner restores the memory to maximum.

====
GPU0 t=57C fan=92%
DCR: 09/21/17-09:52:52 - New job from dcr-us.coinmine.pl:2222
DCR: 09/21/17-09:52:56 - New job from dcr-us.coinmine.pl:2222
ETH: Stratum - Cannot connect to us1.ethermine.org:4444
DevFee: ETH: Stratum - Failed to connect, retry in 20 sec...
ETH: 09/21/17-09:53:20 - New job from us2.ethermine.org:14444
ETH - Total Speed: 24.548 Mh/s, Total Shares: 6, Rejected: 0, Time: 00:21
ETH: GPU0 24.548 Mh/s
DCR - Total Speed: 1153.749 Mh/s, Total Shares: 35, Rejected: 1
DCR: GPU0 1153.749 Mh/s
GPU0 t=57C fan=92%
Miner thread hangs, need to restart miner!

Claymore's Dual ETH + DCR/SC/LBC/PASC GPU Miner v10.0

ETH: 9 pools are specified
Main Ethereum pool is us2.ethermine.org:14444
DCR: 1 pool is specified
Main Decred pool is dcr-us.coinmine.pl:2222
At least 16 GB of Virtual Memory is required for multi-GPU systems
Make sure you defined GPU_MAX_ALLOC_PERCENT 100
Be careful with overclocking, use default clocks for first tests
Press "s" for current statistics, "0".."9" to turn on/off cards, "r" to reload pools, "e" or "d" to select current pool
OpenCL initializing...

AMD Cards available: 1
GPU #0: Ellesmere, 1714 MB available, 36 compute units
GPU #0 recognized as Radeon RX 480/580
POOL/SOLO version
GPU #0: algorithm ASM
No NVIDIA CUDA GPUs detected.
Total cards: 1
AMD ADL library not found.
ETH: Stratum - connecting to 'us2.ethermine.org' <45.79.103.105> port 14444
DUAL MINING MODE ENABLED: ETHEREUM+DECRED
ETH: eth-proxy stratum mode
"-allpools" option is set, default pools can be used for devfee, check "Readme" file for details.
Watchdog enabled
Remote management (READ-ONLY MODE) is enabled on port 3333

costinh · 2017-11-10T04:32:55Z

I'm starting to have this issue on all of my miners. Did you guys find a workaround?

JusCallMeRico · 2017-11-14T17:07:07Z

I'm having the same problem. Manually relaunching the miner does restore memory to the maximum, but it only runs for a few hours at best.

Running 3 ASUS RX 570 ROG Strix OC cards with 1112 core clock (cclock) and 2000 memory clock (mclock); in total the rig only pulls 420 W, so I don't think this is thermal. Running Xubuntu and Claymore v10.

Glad I'm not alone, but sad there doesn't seem to be a fix at the moment.

LionRelaxe · 2017-11-14T17:38:13Z

I've found a workaround.
Usually, Claymore enters this state and then loops: retry, fail, "take more memory", retry, fail again, repeat.
Closing Claymore (CTRL-C) before the computer is totally jammed works and frees the VRAM.
Restarting Claymore then works.

My workaround is to use the -r 1 option, which forces Claymore to close. You can reboot if you wish. I invoke Claymore from a bash script with an infinite loop, forcing it to restart whenever it closes.
So when the snag hits, Claymore kills itself and the script restarts it.
Hope this helps.
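The "forever loop" wrapper described above can be sketched as follows. The original is a bash script that is not shown in the thread; this is a hypothetical Python equivalent, and `run_forever`, `delay_s` and `max_runs` are names invented for the sketch. It relies only on the behaviour reported here: once the miner process exits, its VRAM is freed and a fresh process starts with full memory.

```python
import subprocess
import sys
import time


def run_forever(cmd, delay_s=5.0, max_runs=None):
    """Relaunch `cmd` whenever it exits, like a bash `while true; do ...; done` loop.

    Each run is a brand-new process, so (per the reports in this thread) the
    VRAM held by the previous run is released when it exits. max_runs=None
    loops forever; a number is handy for testing. Returns the exit codes seen.
    """
    exit_codes = []
    while True:
        result = subprocess.run(cmd)  # blocks until the miner exits
        exit_codes.append(result.returncode)
        if max_runs is not None and len(exit_codes) >= max_runs:
            return exit_codes
        print(f"miner exited with status {result.returncode}; "
              f"restarting in {delay_s} s", file=sys.stderr)
        time.sleep(delay_s)
```

On a rig like the ones in this thread, the call would look something like `run_forever(["./ethdcrminer64", "-r", "1"])` with the usual pool and wallet arguments appended; those are omitted here. The short delay avoids a tight restart loop if the miner fails immediately on startup.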

agrajag9 · 2017-11-14T18:00:17Z

Yes, killing and restarting the process resolves the issue temporarily, as the kernel frees the memory once the process exits. However, this is not a viable long-term solution, as it doesn't fix the memory leak itself; it only clears it by creating a new process.

To resolve the memory leak problem, the code should exit with a non-zero status when it enters a failed state. This is standard procedure for applications that enter unrecoverable failed states. Although in my particular case the memory leak is recoverable, in other situations where the GPU is unresponsive it may not be. So if the two situations cannot be told apart and handled separately, both should be handled as the worst-case scenario (unresponsive hardware).

A non-zero exit status also allows the parent process (e.g. a script) to handle the failure itself.
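Here is a minimal sketch of the exit-status convention proposed above, split into a child side and a parent side. The status values 0 and 2 and the function names are assumptions made for this illustration; they are not documented Claymore exit codes.

```python
import subprocess
import sys

# Hypothetical status values for this sketch; they are not documented Claymore codes.
EXIT_CLEAN = 0          # operator asked the program to stop
EXIT_UNRECOVERABLE = 2  # e.g. a hung miner thread


def on_thread_hang():
    """Child side: report the failure and exit non-zero instead of
    re-initializing the GPU inside the same process."""
    print("Miner thread hangs, exiting with status", EXIT_UNRECOVERABLE,
          file=sys.stderr)
    sys.exit(EXIT_UNRECOVERABLE)


def next_action(returncode):
    """Parent side: a clean exit means stop; any other status means restart.

    On POSIX, a negative returncode means the child was killed by a signal,
    which is also treated as a failure here.
    """
    return "stop" if returncode == EXIT_CLEAN else "restart"


def run_once(cmd):
    """Run the child to completion and return the parent's decision."""
    return next_action(subprocess.run(cmd).returncode)
```

Treating every non-zero status as "restart" follows the worst-case rule above: the parent does not need to know whether the failure was a recoverable leak or unresponsive hardware.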

imperialgames · 2017-11-24T20:15:36Z

Do you guys mine only ETH by chance? We are having the same problem (available memory decreasing over time until I can't allocate the DAG file).

JusCallMeRico · 2017-12-02T06:47:41Z

Yes, ETH only. To be honest, I switched to ethOS with the exact same settings and it ran faster and completely stable. That was about two weeks ago, and it's still stable on ethOS; I'm getting ready to add a couple more cards.

agrajag9 · 2017-12-13T15:34:08Z

I am mining ETH only, but I suspect the leak persists across dual mining as well since it appears to be related to how ETH-mining threads are killed.

It sure would be nice if @nanopool would show up in this thread. This bug should be fixable with a simple destructor update for the thread class.
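The miner's source is not part of this thread, so the following is only a toy model of the suspected bug and of the "destructor" fix. It assumes, hypothetically, that a thread restart allocates a new DAG buffer without releasing the old one. In C++ the fix would live in the thread class's destructor (RAII). In Python the closest idiom is an explicit `close()` plus a context manager, because `__del__` timing is not guaranteed. `ToyDevice`, `LeakyMinerThread`, `MinerThread` and the 2300 MB DAG size are all invented for the illustration.

```python
class ToyDevice:
    """Hypothetical stand-in for one GPU's memory pool (sizes in MB)."""

    def __init__(self, total_mb):
        self.free_mb = total_mb

    def alloc(self, mb):
        if mb > self.free_mb:
            raise MemoryError("not enough GPU memory to place DAG")
        self.free_mb -= mb

    def release(self, mb):
        self.free_mb += mb


class LeakyMinerThread:
    """Models the reported bug: restart() allocates a new DAG buffer
    without releasing the one it already holds."""

    def __init__(self, device, dag_mb):
        self.device, self.dag_mb = device, dag_mb
        device.alloc(dag_mb)

    def restart(self):
        self.device.alloc(self.dag_mb)  # old buffer is never freed


class MinerThread:
    """Same thread with buffer ownership tied to the object: close() frees
    the buffer, restart() calls it first, and `with` guarantees it on exit."""

    def __init__(self, device, dag_mb):
        self.device, self.dag_mb = device, dag_mb
        device.alloc(dag_mb)
        self.holds_buffer = True

    def close(self):
        if self.holds_buffer:
            self.device.release(self.dag_mb)
            self.holds_buffer = False

    def restart(self):
        self.close()
        self.device.alloc(self.dag_mb)
        self.holds_buffer = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False


leaky_gpu = ToyDevice(8169)
leaky = LeakyMinerThread(leaky_gpu, dag_mb=2300)  # 2300 MB is an illustrative DAG size
leaky.restart()
print(leaky_gpu.free_mb)  # 3569: two buffers held, only one in use

fixed_gpu = ToyDevice(8169)
with MinerThread(fixed_gpu, dag_mb=2300) as thread:
    thread.restart()
    thread.restart()
    print(fixed_gpu.free_mb)  # 5869: still exactly one buffer
print(fixed_gpu.free_mb)      # 8169: released on exit
```

In the leaky version, each restart costs one more buffer until the allocation fails, which matches the "retry, fail, take more memory" loop described earlier in the thread.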

Mr10001 · 2017-12-23T13:42:14Z

I have this problem on EthOS as well.

mrsags · 2017-12-23T14:22:48Z

Increase virtual memory to match the combined memory of ALL cards. I have 8x 4 GB and 1x 8 GB, so virtual memory = 40 GB (I set it to 45 GB to leave room for the Windows program cache). This worked :)
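The sizing rule in this comment amounts to a simple sum. It is sketched below only as a worked example of that arithmetic. It is one user's rule of thumb, and the next comment disputes that virtual memory is the cause. The function name and the `headroom_gb` parameter are hypothetical.

```python
def virtual_memory_gb(card_vram_gb, headroom_gb=0):
    """Rule of thumb from this comment: total VRAM of all cards, plus optional headroom."""
    return sum(card_vram_gb) + headroom_gb


rig = [4] * 8 + [8]  # eight 4 GB cards and one 8 GB card
print(virtual_memory_gb(rig))                 # 40
print(virtual_memory_gb(rig, headroom_gb=5))  # 45
```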

ghost · 2017-12-31T18:02:11Z

This is not a virtual memory issue; at best it could be a driver problem or memory errors clogging up over time. Sometimes this occurs immediately after restarting a miner, and other times it takes one or two days for it to shit the bed.

mrsags · 2018-01-01T14:17:04Z

Has anyone tried the fix and still hit the error? Also, make sure you DON'T allow "Lock pages in memory" under Group Policy. This causes tons of problems when mining all kinds of coins.

mrsags · 2018-01-01T14:18:27Z

My fix is meant to prevent the hanging and restarting in the first place.

YasserGomaa · 2018-01-03T08:26:28Z

Same problem here with ethOS. Any help?

imperialgames · 2018-01-03T08:41:27Z

The only way we found on ethOS is to remove the dev fee with:
claymore=flags -r 1 -nofee

YasserGomaa · 2018-01-03T08:49:50Z

Where do I type this line?

imperialgames · 2018-01-03T08:50:39Z

In either your local or remote config file.

YasserGomaa · 2018-01-03T08:52:37Z

So after adding the line, what will happen?

JusCallMeRico · 2018-01-03T08:53:22Z

Going to attempt this now with fresh ethOS images; are we sure it's not supposed to be in claymore.stub.conf?

YasserGomaa · 2018-01-03T08:54:14Z

This is my old config:
flags --cl-global-work 8192 --farm-recheck 200

imperialgames · 2018-01-03T08:55:53Z

The line means: Claymore-only flags = (reboot 1 if crash) and (no fee to dev). The problem is that the dev fee pool keeps disconnecting; that's why the memory does not get flushed.

imperialgames · 2018-01-03T08:56:35Z

Keep the regular flags line.

YasserGomaa · 2018-01-03T08:56:50Z

So should it be "flags -r 1 -nofee" or "claymore=flags -r 1 -nofee"?

imperialgames · 2018-01-03T08:57:16Z

claymore=flags -r 1 -nofee
because it applies only when you use globalminer claymore.

YasserGomaa · 2018-01-03T08:59:47Z

globaldriver amdgpu
maxgputemp 85
globalminer claymore
stratumproxy enabled
globalfan 85
proxywallet Mywallet
proxypool1 eu1.ethermine.org:4444
globalcore 1400
globalmem 2000
globalfan 90
globalpowertune 4
flags --cl-global-work 8192 --farm-recheck 200
claymore=flags -r 1 -nofee

YasserGomaa · 2018-01-03T09:00:04Z

Is this a good configuration?

imperialgames · 2018-01-03T09:01:28Z

maxgputemp 90
stratumproxy enabled

#ETH POOL
globalminer claymore
proxypool1 us1.ethermine.org:14444
proxywallet WALLET
dualminer enabled
dualminer-coin lbry
dualminer-pool lbry.suprnova.cc:6256
dualminer-wallet WALLET
claymore=flags -r 1 -nofee 1 -mport 3333 -allcoins 1
flags --cl-global-work 16384 --farm-recheck 200

That's mine.

JusCallMeRico · 2018-01-03T11:00:02Z

imperialgames:
"the line says, claymore only flags = (reboot 1 if crash) and (no fee to dev). the problem is that the dev fee pool keep disconnecting. that's why the memory does not get flushed."

I had been running with -allpools 1 but am going to try taking that out. I had also been leaving -allcoins 1 out. Not sure if it'll help, but we'll see.

YasserGomaa · 2018-01-03T15:00:49Z

So is there any way to mine Ethereum without Claymore?

JusCallMeRico · 2018-01-03T15:23:16Z

Yup, ethminer. Slower, but honestly slow and steady may just win the race.

YasserGomaa · 2018-01-03T23:13:48Z

Dears, I found something strange. Even though I changed the username and password of ethOS, I found that strange commands had been typed in the shell: the following link was added to my PC and scripts from it ran on my system: https://github.com/pooler/cpuminer

Does this mean I've been hacked? Haha

YasserGomaa · 2018-01-04T00:17:32Z

ETH+LBRY: how do I dual mine?

What do I type in claymore.stub.conf?

imperialgames · 2018-01-04T00:30:21Z

Check the settings I posted above; you do it in local.conf or remote.conf.

YasserGomaa · 2018-01-04T00:34:17Z

So what about claymore.stub.conf?

imperialgames · 2018-01-04T00:44:24Z

You don't have to touch it.

YasserGomaa · 2018-01-06T13:04:10Z

imperialgames, what is the default for claymore.stub.conf?

LionRelaxe mentioned this issue Sep 21, 2017

Miner thread is hanging, Baffin memory is decreasing continually after few hours of mining and miner exists with an ERROR (Happened last 2 days) #82 (open)
