This repository has been archived by the owner on May 18, 2023. It is now read-only.

Memory leak when restarting thread #106

Open
agrajag9 opened this issue Sep 7, 2017 · 37 comments

@agrajag9

agrajag9 commented Sep 7, 2017

First reported here: https://www.reddit.com/r/EtherMining/comments/6yfqxg/not_enough_gpu_memory_to_place_dag_you_cannot/

When a pool connection fails, we see an error in STDOUT that the miner thread is hanging and needs to be restarted. The thread is then restarted successfully, but without reinitializing the VRAM, as shown below (ellipsized for brevity):

DevFee: ETH: Stratum - connecting to 'us1.ethpool.org' <149.56.26.222> port 3333
...
ETH: Stratum - Cannot connect to us1.ethpool.org:3333
DevFee: ETH: Stratum - Failed to connect, retry in 20 sec...
...
Miner thread hangs, need to restart miner!

ETH: 2 pools are specified
Main Ethereum pool is us1.ethermine.org:4444
At least 16 GB of Virtual Memory is required for multi-GPU systems
Make sure you defined GPU_MAX_ALLOC_PERCENT 100
Be careful with overclocking, use default clocks for first tests
Press "s" for current statistics, "0".."9" to turn on/off cards, "r" to reload pools, "e" or "d" to select current pool
OpenCL initializing...

AMD Cards available: 1
GPU #0: Ellesmere, 1461 MB available, 36 compute units
GPU #0 recognized as Radeon RX 480/580

This same card at first initialization is detected with 8169 MB available.
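
To make the shrinkage easier to spot across restarts, the "MB available" figure can be pulled out of the miner output. This is only a rough sketch: the regular expression is written against the log lines quoted above, and the `tolerance_mb` threshold is an arbitrary assumption, not something the miner defines.

```python
import re

# Matches detection lines such as:
#   "GPU #0: Ellesmere, 1461 MB available, 36 compute units"
GPU_LINE = re.compile(r"GPU #(\d+): .*?, (\d+) MB available")

def available_mb(log_text):
    """Return (gpu_index, mb_available) pairs in the order they appear."""
    return [(int(gpu), int(mb)) for gpu, mb in GPU_LINE.findall(log_text)]

def shrinking_gpus(log_text, tolerance_mb=64):
    """Return GPU indices whose reported memory fell below their first reading.

    tolerance_mb is a hypothetical allowance for normal fluctuation.
    """
    first_seen = {}
    flagged = []
    for gpu, mb in available_mb(log_text):
        baseline = first_seen.setdefault(gpu, mb)
        if baseline - mb > tolerance_mb and gpu not in flagged:
            flagged.append(gpu)
    return flagged

# Two detection lines for the same card: first start, then after a restart.
sample = (
    "GPU #0: Ellesmere, 8169 MB available, 36 compute units\n"
    "GPU #0: Ellesmere, 1461 MB available, 36 compute units\n"
)
print(available_mb(sample))
print(shrinking_gpus(sample))
```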

Hardware:
https://pcpartpicker.com/list/f6Cyhq

Version:
0ebb105bd3a6cdd35d94663eabf245e9 Claymore.s.Dual.Ethereum.Decred_Siacoin_Lbry_Pascal.AMD.NVIDIA.GPU.Miner.v9.8.-.LINUX.tar.gz
a919e303d2250f2719c7a28bfebd9a79 ethdcrminer64

Drivers:
AMDGPU-PRO Driver Version 17.30 for Ubuntu 16.04.3

OS:
Ubuntu 16.04.3 LTS

[ 2017-09-07T12:57:26 agrajag9@eth1.srv.a9development.com:/home/agrajag9 ]
$ uname -a
Linux eth1.srv.a9development.com 4.4.0-93-generic #116-Ubuntu SMP Fri Aug 11 21:17:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
@koenvandenberge

Same issue here on Windows 10

@LionRelaxe

LionRelaxe commented Sep 21, 2017

Same here: the available memory drops lower and lower with each restart-inducing error, on two separate miners.
Setup:
Xubuntu 16.04 LTS, one miner with 1x RX 580 4 GB, one miner with 3x RX 580 4 GB.
Noticed on Claymore 10.0; pretty sure it happened on 9.7 too.
AMD drivers 17.10, and also 17.30 with the ROCm kernel.

Note that pressing CTRL-C and manually restarting the miner restores the memory to its maximum.

====
GPU0 t=57C fan=92%
DCR: 09/21/17-09:52:52 - New job from dcr-us.coinmine.pl:2222
DCR: 09/21/17-09:52:56 - New job from dcr-us.coinmine.pl:2222
ETH: Stratum - Cannot connect to us1.ethermine.org:4444
DevFee: ETH: Stratum - Failed to connect, retry in 20 sec...
ETH: 09/21/17-09:53:20 - New job from us2.ethermine.org:14444
ETH - Total Speed: 24.548 Mh/s, Total Shares: 6, Rejected: 0, Time: 00:21
ETH: GPU0 24.548 Mh/s
DCR - Total Speed: 1153.749 Mh/s, Total Shares: 35, Rejected: 1
DCR: GPU0 1153.749 Mh/s
GPU0 t=57C fan=92%
Miner thread hangs, need to restart miner!

Claymore's Dual ETH + DCR/SC/LBC/PASC GPU Miner v10.0

ETH: 9 pools are specified
Main Ethereum pool is us2.ethermine.org:14444
DCR: 1 pool is specified
Main Decred pool is dcr-us.coinmine.pl:2222
At least 16 GB of Virtual Memory is required for multi-GPU systems
Make sure you defined GPU_MAX_ALLOC_PERCENT 100
Be careful with overclocking, use default clocks for first tests
Press "s" for current statistics, "0".."9" to turn on/off cards, "r" to reload pools, "e" or "d" to select current pool
OpenCL initializing...

AMD Cards available: 1
GPU #0: Ellesmere, 1714 MB available, 36 compute units
GPU #0 recognized as Radeon RX 480/580
POOL/SOLO version
GPU #0: algorithm ASM
No NVIDIA CUDA GPUs detected.
Total cards: 1
AMD ADL library not found.
ETH: Stratum - connecting to 'us2.ethermine.org' <45.79.103.105> port 14444
DUAL MINING MODE ENABLED: ETHEREUM+DECRED
ETH: eth-proxy stratum mode
"-allpools" option is set, default pools can be used for devfee, check "Readme" file for details.
Watchdog enabled
Remote management (READ-ONLY MODE) is enabled on port 3333

@costinh

costinh commented Nov 10, 2017

I'm starting to have this issue on all of my miners. Did you guys find a workaround?

@JusCallMeRico

I'm having the same problem. Manually relaunching the miner does restore memory to max, but it only runs for a few hours at best...

Running 3 ASUS RX 570 ROG Strix OC with 1112 cclock and 2000 mclock; the whole rig is only pulling 420 W, so I don't think this is thermal. Running Xubuntu and Claymore v10.

Glad I'm not alone; sad there doesn't seem to be a fix at the moment.

@LionRelaxe

I've found a workaround.
Usually, Claymore enters this state and then loops: retry, fail, "take more memory", retry, fail again, repeat.
Closing Claymore (CTRL-C) before the computer is totally jammed works and frees the VRAM.
Restarting Claymore then works.

My workaround is to use the -r 1 option, which forces Claymore to close. You can reboot if you wish. I invoke Claymore from a bash script with a forever loop, so it is restarted whenever it closes.
So when the snag hits, Claymore kills itself and the script restarts it.
Hope this helps.
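
For anyone who wants to try the same idea, here is a minimal sketch of such a restart loop. It is written in Python rather than bash, and it is not the script used above. The binary name `./ethdcrminer64` is taken from the original report and `-r 1` from this comment; the function name, the pause length, and the `max_runs` limit are all assumptions for illustration.

```python
import subprocess
import sys
import time

# Assumed command line; adjust the path and options for your own rig.
MINER_CMD = ["./ethdcrminer64", "-r", "1"]

def supervise(cmd, run=subprocess.call, sleep=time.sleep, delay_s=10, max_runs=None):
    """Run cmd and start it again every time it exits.

    Returns the exit codes that were observed. max_runs=None loops forever;
    a finite value is only there so the loop can be tried out safely.
    """
    exit_codes = []
    while max_runs is None or len(exit_codes) < max_runs:
        exit_codes.append(run(cmd))  # blocks until the process exits
        sleep(delay_s)               # brief pause before relaunching
    return exit_codes

# Harmless demonstration with a stand-in child that exits immediately.
demo_cmd = [sys.executable, "-c", "raise SystemExit(1)"]
print(supervise(demo_cmd, delay_s=0, max_runs=2))
```

Calling `supervise(MINER_CMD)` would give the forever loop. Because every restart is a brand new process, the kernel reclaims whatever the previous one was holding.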

@agrajag9
Author

agrajag9 commented Nov 14, 2017

Yes, killing and restarting the process resolves the issue temporarily, as the kernel frees the memory once the PID no longer needs it. However, this is not a viable long-term solution, as it doesn't effectively mitigate the memory leak when creating the new process.

In order to resolve the memory leak problem, the code should exit with a non-zero return value when it enters a failed state. This is standard procedure for applications that enter unrecoverable failed states. Although in my particular case the memory leak is recoverable, in other situations where the GPU is unresponsive it may not be. As such, if the situations cannot be handled independently, they should all be handled as the worst-case scenario (unresponsive hardware).

The non-zero return value also allows the parent process (e.g. a script) to handle the failure itself.
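
A small sketch of what that looks like from the parent's side. The children here are stand-in Python processes, not the miner, and the two labels are made up for the example; the only real mechanism shown is the exit status that `subprocess.run` reports.

```python
import subprocess
import sys

def run_and_classify(cmd):
    """Run cmd and report whether it ended cleanly or signalled a failure."""
    status = subprocess.run(cmd).returncode
    # Every non-zero status is treated alike, i.e. as the worst case.
    return "clean-exit" if status == 0 else "failed"

# Stand-in children that exit with a chosen status.
ok_child = [sys.executable, "-c", "import sys; sys.exit(0)"]
bad_child = [sys.executable, "-c", "import sys; sys.exit(1)"]

print(run_and_classify(ok_child))
print(run_and_classify(bad_child))
```

With this in place the parent can decide what "failed" should trigger (relaunch, reboot, alert) without the child having to know.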

@imperialgames

Do you guys mine only ETH, by chance? We are having the same problem (available memory decreasing over time until I can't allocate the DAG file).

@JusCallMeRico

Yes ETH only... tbh switched to ethOS with exact same settings and it ran faster and completely stable... this was about two weeks ago and it's still stable on ethOS getting ready to add a couple more cards

@agrajag9
Author

I am mining ETH only, but I suspect the leak persists across dual mining as well since it appears to be related to how ETH-mining threads are killed.

It sure would be nice if @nanopool would show up in this thread. This bug should be fixable with a simple destructor update for the thread class.

@Mr10001

Mr10001 commented Dec 23, 2017

I have this problem on EthOS as well.

@mrsags

mrsags commented Dec 23, 2017

Increase virtual memory to match the combined memory of ALL cards. I have 8x 4 GB and 1x 8 GB, so virtual memory = 40 GB (set at 45 GB for Windows program cache). This worked :)
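
The arithmetic behind that figure, as a tiny sketch. The helper name is made up, and the rule itself (virtual memory equal to the sum of VRAM across cards) is just the suggestion from this comment, not a documented requirement.

```python
def total_vram_gb(cards):
    """cards: iterable of (count, gb_per_card) pairs."""
    return sum(count * gb for count, gb in cards)

# 8x 4 GB cards plus 1x 8 GB card, as described in the comment.
rig = [(8, 4), (1, 8)]
print(total_vram_gb(rig))  # 40
```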

@ghost

ghost commented Dec 31, 2017

This is not a virtual-memory-related issue; the best guess is a driver problem or memory errors building up over time. Sometimes this occurs immediately after restarting a miner, and other times it takes one or two days for it to fail.

@mrsags

mrsags commented Jan 1, 2018

Has anyone tried the fix and reproduced the error? Also, make sure you DON’T allow “lock pages in memory” under Group Policy. This causes tons of problems across mining all kinds of coins...

@mrsags

mrsags commented Jan 1, 2018

My fix is to prevent the hanging and restarting in the first place..

@YasserGomaa

Same problem here with ethOS. Any help?

@imperialgames

The only way we found on ethOS is to remove the dev fee with
claymore=flags -r 1 -nofee

@YasserGomaa

Where do I type this line?

@imperialgames

In either your local or remote config file.

@YasserGomaa

So what will happen after I add the line?

@JusCallMeRico

Going to attempt this now with fresh ethOS images; are we sure it's not supposed to be in claymore.stub.conf?

@YasserGomaa

YasserGomaa commented Jan 3, 2018

this is my old config
flags --cl-global-work 8192 --farm-recheck 200

@imperialgames

The line says: Claymore-only flags = (reboot 1 if crash) and (no fee to dev). The problem is that the dev fee pool keeps disconnecting. That's why the memory does not get flushed.

@imperialgames

Keep the regular flags line.

@YasserGomaa

So should it be "flags -r 1 -nofee" or "claymore=flags -r 1 -nofee"?

@imperialgames

claymore=flags -r 1 -nofee
Because it will apply only when globalminer is set to claymore.

@YasserGomaa

globaldriver amdgpu
maxgputemp 85
globalminer claymore
stratumproxy enabled
globalfan 85
proxywallet Mywallet
proxypool1 eu1.ethermine.org:4444
globalcore 1400
globalmem 2000
globalfan 90
globalpowertune 4
flags --cl-global-work 8192 --farm-recheck 200
claymore=flags -r 1 -nofee

@YasserGomaa

Is this a good configuration?

@imperialgames

maxgputemp 90
stratumproxy enabled

#ETH POOL
globalminer claymore
proxypool1 us1.ethermine.org:14444
proxywallet WALLET
dualminer enabled
dualminer-coin lbry
dualminer-pool lbry.suprnova.cc:6256
dualminer-wallet WALLET
claymore=flags -r 1 -nofee 1 -mport 3333 -allcoins 1
flags --cl-global-work 16384 --farm-recheck 200

that's mine

@JusCallMeRico

imperialgames:
"The line says: Claymore-only flags = (reboot 1 if crash) and (no fee to dev). The problem is that the dev fee pool keeps disconnecting. That's why the memory does not get flushed."

I had been running with -allpools 1 but am going to try taking that out... also had been leaving -allcoins 1 out as well... not sure if it'll help but we'll see

@YasserGomaa

So is there any way to mine Ethereum without Claymore?

@JusCallMeRico

Yup, Etherminer.... slower but honestly slow and steady may just win the race

@YasserGomaa

I found something strange. Even though I changed the ethOS user name and password, I found that unfamiliar commands had been typed in the shell. The following link was added to my PC, and scripts from it were run on my system: https://github.com/pooler/cpuminer

Does this mean that I have been hacked?

@YasserGomaa

How do I dual mine ETH+LBRY?

What do I type in claymore.stub.conf?

@imperialgames

Check the settings I posted above. You do it in local.conf or remote.conf.

@YasserGomaa

So what about claymore.stub.conf?

@imperialgames

You don't have to touch it.

@YasserGomaa

@imperialgames, what is the default for claymore.stub.conf?
