This repository has been archived by the owner on Nov 6, 2020. It is now read-only.

unreasonably high memory usage (without crash) and won't shut down #10821

Closed

iFA88 opened this issue Jun 30, 2019 · 48 comments
Labels
F7-footprint 🐾 An enhancement to provide a smaller (system load, memory, network or disk) footprint. M4-core ⛓ Core client code / Rust.

iFA88 commented Jun 30, 2019

Greetings. Sadly, my Parity-Ethereum/v2.4.6-stable-94164e1-20190514/x86_64-linux-gnu/rustc1.34.1 node uses an unreasonably large amount of memory.
Node log and process statistics in CSV: https://www.fusionsolutions.io/doc/memlog.tar.gz

Start parameters are:

--ipc-apis all --reserved-peers /own/config/archiveEthNode.txt --no-serve-light --no-periodic-snapshot --jsonrpc-allow-missing-blocks --no-persistent-txqueue --jsonrpc-server-threads 8 --ipc-path=/own/sockets/ethNode.ipc --min-gas-price=10000000 --tx-queue-mem-limit=4096 --tx-queue-size=256000 --reseal-on-txs=all --force-sealing --base-path "/mnt/node-1/eth" --rpcport 8548 --port 30306 --no-ws --no-secretstore --cache-size 4096 --log-file /own/log/nodes/eth/parity_eth_$DATE.log"

The memory usage will not go higher than 12 GB.

At 16:20:20 I killed the process with SIGKILL; this is the only way I can shut the process down.

I am glad to help with any trace parameters or statistics.

dvdplm (Collaborator) commented Jun 30, 2019

How did you collect the memory stats shown in the CSV?

iFA88 (Author) commented Jun 30, 2019

Like this, in Python:

import psutil

process = psutil.Process(PID)  # PID of the parity process
print(process.memory_info().rss)  # resident set size, in bytes

It gives the same value as htop.
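
For reference, a minimal sketch of how such a CSV of RSS samples could be produced with psutil; this is an assumption about the reporter's setup, and the PID, file name and interval are placeholders:

import csv, time
import psutil

PID = 12345  # placeholder: the parity process id

with open("memlog.csv", "a", newline="") as f:
    writer = csv.writer(f)
    proc = psutil.Process(PID)
    while True:
        # one row per sample: unix timestamp, resident set size in bytes
        writer.writerow([int(time.time()), proc.memory_info().rss])
        f.flush()
        time.sleep(60)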

dvdplm (Collaborator) commented Jun 30, 2019

So it's RSS, perfect! :)

The number of pending txs is pretty high; is that a normal amount in your setup?

iFA88 (Author) commented Jun 30, 2019

Yeah, I'm parsing pending transactions into my DB. Check my start parameters :)

dvdplm (Collaborator) commented Jun 30, 2019

Other than staying in sync, what is the node doing? I.e. what kind of RPC traffic is it used for?

iFA88 (Author) commented Jun 30, 2019

For every new block I'm using these RPCs: trace_block, eth_getBlockByNumber, eth_getUncleByBlockHashAndIndex, eth_blockNumber, eth_getTransactionReceipt.
And for pending transactions, every minute: parity_allTransactionHashes, eth_getTransactionByHash.
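
As an illustration only (not the reporter's actual code), a minimal sketch of this kind of polling over HTTP JSON-RPC, assuming the node's HTTP endpoint from the start parameters (--rpcport 8548) is reachable locally:

import requests

RPC_URL = "http://127.0.0.1:8548"

def rpc(method, params=None):
    # single JSON-RPC 2.0 call over HTTP
    payload = {"jsonrpc": "2.0", "id": 1, "method": method, "params": params or []}
    return requests.post(RPC_URL, json=payload, timeout=30).json()["result"]

# per new block
latest = rpc("eth_blockNumber")                      # hex string, e.g. "0x7b4a2e"
block = rpc("eth_getBlockByNumber", [latest, True])  # full transaction objects
traces = rpc("trace_block", [latest])                # needs tracing enabled on the node
receipts = [rpc("eth_getTransactionReceipt", [tx["hash"]]) for tx in block["transactions"]]

# once a minute, for pending transactions
pending_hashes = rpc("parity_allTransactionHashes")
pending = [rpc("eth_getTransactionByHash", [h]) for h in pending_hashes]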

dvdplm (Collaborator) commented Jun 30, 2019

I've been running a recent master build with your params now for ~6h and memory usage seems stable. While it's possible that this has been fixed in master, it is more probable that the leak is somewhere in the RPC layer. I need to set up some kind of load testing script to debug this further.

dvdplm (Collaborator) commented Jun 30, 2019

@iFA88 Do you have the possibility to confirm my findings by running a node without RPC traffic, just to check that it is indeed the RPC layer causing issues? Also, if you have a load testing script or something similar already written, that'd be helpful too ofc. Thanks!

dvdplm (Collaborator) commented Jul 1, 2019

In the log I see you must be running with --tracing on, but that's not present in the startup params from the original ticket. Are you using a config.toml file too? Can you post the full config please?

iFA88 (Author) commented Jul 1, 2019

I have run the node without any RPC calls, but the memory has increased continuously. Here is the log, but please ignore the peer and pending TX values:

https://www.fusionsolutions.io/doc/memlog2.tar.gz

Without any RPC calls, shutdown works very quickly.

iFA88 (Author) commented Jul 1, 2019

In the log I see you must be running with --tracing on, but that's not present in the startup params from the original ticket. Are you using a config.toml file too? Can you post the full config please?

You are right, tracing was ON while I synced from scratch, and after that it is automatically enabled.
No, I don't use any configuration file, only the parameters that I gave in the first ticket.

dvdplm (Collaborator) commented Jul 1, 2019

Ok, not a problem. It explains why I couldn't repeat it. I'd have to slow-sync the whole chain to reproduce it now, I think, so I'm going to try the Goerli testnet and see if the issue shows up there. If you have the means to do so, it would be great if you could try on Goerli as well using 2.4.x.

Thanks!

iFA88 (Author) commented Jul 1, 2019

@dvdplm Sadly not, but if you wish I can set some trace parameters.

dvdplm (Collaborator) commented Jul 1, 2019

Is your node synched?

iFA88 (Author) commented Jul 2, 2019

Is your node synched?

Ofc, and you have seen that in the logs.

dvdplm (Collaborator) commented Jul 2, 2019

Yeah, still synching Kovan here with traces. Goerli is synched and after 12+ hours shows no signs of memory leaks.

ordian (Collaborator) commented Jul 2, 2019

@dvdplm are you testing this on macOS? The problem could be related to heapsize, which uses jemallocator only on macOS.
@iFA88 could you test it with a recent master build? We've removed heapsize in #10432.

dvdplm (Collaborator) commented Jul 2, 2019

@ordian yes, and yes it is possible that this is a platform issue, but we'll see. For now I'm trying to rule out the obvious stuff. I'm not sure how long it takes to slow-sync mainnet with tracing on, but judging by how long it takes on Kovan I think it could take weeks, so I was hoping to find an easier way to reproduce this.

iFA88 (Author) commented Jul 2, 2019

@ordian I will upgrade my Parity to https://github.com/paritytech/parity-ethereum/releases/tag/v2.4.9; I see that this build has the commit.

I need to SIGKILL the process, because it doesn't shut down.
I have another node on the classic chain which is not affected by the issue (same Parity version, but it is an archive node with tracing).

ordian (Collaborator) commented Jul 2, 2019

@iFA88 I don't think so, #10432 wasn't backported to stable and beta.

iFA88 (Author) commented Jul 2, 2019

@ordian Isn't that the commit?:
v2.4.9...master
Sorry if I'm wrong.

ordian (Collaborator) commented Jul 2, 2019

@iFA88 you're comparing v2.4.9 with master, so it shows you the difference, i.e. the commits that are in master and not in 2.4.9.

iFA88 (Author) commented Jul 2, 2019

@ordian Yes, I was wrong! If you can build the current master branch for Linux, then I can use that; sadly I don't have any build tools right now.

dvdplm (Collaborator) commented Jul 3, 2019

@iFA88 I think you can download a recent nightly from here (click the "Download" button on the right). It would be great if you could repeat the problem using that.

An update on my end: Goerli is synched and does not leak any memory. Kovan is still synching (and has been really stable, but that is irrelevant here).

iFA88 (Author) commented Jul 3, 2019

@dvdplm Alright, I'm now running that binary. Idk why, but the classic chain works flawlessly.

I have a trace about the shutdown, please look at it:
https://www.fusionsolutions.io/doc/shutdownerror.tar.gz

dvdplm (Collaborator) commented Jul 3, 2019

@dvdplm Alright, I'm now running that binary. Idk why, but the classic chain works flawlessly.

You mean running with --chain classic using the master build does not leak memory? Or using stable?

I have a trace about the shutdown, please look at it: https://www.fusionsolutions.io/doc/shutdownerror.tar.gz

That is 2.4.6, so the latest fixes for shutdown problems are not included. It would be best to debug this further using the latest releases (or master builds). For shutdown issues it'd be good to enable shutdown=trace level logging. I don't think logging is going to provide enough info here, but best to keep it on.

iFA88 (Author) commented Jul 3, 2019

@dvdplm Yes, I have a classic node which runs in archive trace mode and its RES usage does not go above ~1.3 GB, not even with Parity-Ethereum/v2.4.6-stable-94164e1-20190514/x86_64-linux-gnu/rustc1.34.1 or Parity-Ethereum/v2.4.9-stable-691580c-20190701/x86_64-linux-gnu/rustc1.35.0.

I have left the shutdown trace parameter on and am now running Parity-Ethereum/v2.6.0-nightly-b4af8df-20190702/x86_64-linux-gnu/rustc1.35.0.

iFA88 (Author) commented Jul 3, 2019

Sadly the new Parity (Parity-Ethereum/v2.6.0-nightly-b4af8df-20190702/x86_64-linux-gnu/rustc1.35.0) didn't solve the memory issue:
https://www.fusionsolutions.io/doc/memlog3.tar.gz

dvdplm (Collaborator) commented Jul 3, 2019

Parity-Ethereum/v2.6.0-nightly

Ok, and just to be clear: you ran it on mainnet with tracing on just like before, same settings except for shutdown logging?

Did you also experience shutdown problems with Parity-Ethereum/v2.6.0-nightly?

iFA88 (Author) commented Jul 3, 2019

@dvdplm yes and yes :(

dvdplm (Collaborator) commented Jul 3, 2019

Ok, so @ordian, this tells us that this is not related to jemalloc, do you agree?

iFA88 (Author) commented Jul 4, 2019

Parity-Ethereum/v2.6.0-nightly-b4af8df-20190702/x86_64-linux-gnu/rustc1.35.0 crashed during the night. The process is still running and I can communicate with it through RPC, but the current block height is 8080446, so syncing has stopped. There was no incident in the kernel log or syslog. Free space was more than enough. I have switched back to Parity-Ethereum/v2.4.9-stable-691580c-20190701/x86_64-linux-gnu/rustc1.35.0.
Last log:

2019-07-04 00:42:30  Verifier #7 INFO import  Imported #8081174 0xc041…70bb (92 txs, 7.64 Mgas, 78 ms, 24.82 KiB)
2019-07-04 00:42:33  Verifier #8 INFO import  Imported #8081175 0x38ca…d29f (50 txs, 7.98 Mgas, 73 ms, 17.97 KiB)
2019-07-04 00:42:43  IO Worker #0 INFO import    35/50 peers    208 MiB chain  145 MiB db  0 bytes queue    7 MiB sync  RPC:  0 conn,    0 req/s,    0 µs
2019-07-04 00:42:43  Verifier #6 INFO import  Import
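
A small sketch of the kind of check described above (a hypothetical example, not from the ticket): since the node still answers RPC, sampling eth_blockNumber twice shows whether import has stalled. The endpoint is assumed from the start parameters (--rpcport 8548):

import time
import requests

RPC_URL = "http://127.0.0.1:8548"

def block_number():
    # returns the node's current block height as an int
    payload = {"jsonrpc": "2.0", "id": 1, "method": "eth_blockNumber", "params": []}
    return int(requests.post(RPC_URL, json=payload, timeout=10).json()["result"], 16)

first = block_number()
time.sleep(120)  # mainnet normally produces several blocks in two minutes
if block_number() == first:
    print("sync appears to be stalled at block", first)
else:
    print("node is still importing blocks")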

dvdplm (Collaborator) commented Jul 4, 2019

Ouch that doesn't sound good. When you say "crashed" do you mean that the process hung in some way or did it actually crash? I mean, you write that you could still query the node over RPC right?

I am still synching mainnet, am about half-way through but I anticipate it'll take a long while still.

I wonder if there's any way you could share your database with us to speed up the investigation?

iFA88 (Author) commented Jul 4, 2019

Ouch that doesn't sound good. When you say "crashed" do you mean that the process hung in some way or did it actually crash? I mean, you write that you could still query the node over RPC right?

I called it crashed because the logging and the syncing have stopped. Maybe the main thread has hung?! Yeah, I queried the block number to check whether the sync works or not.

I wonder if there's anyway you could share your database with us to speed up the investigation?

I would be glad to help, but I don't see how we can speed this up. If you wish I can set some parameters for Parity. If you have any ideas, share them.

iFA88 (Author) commented Jul 14, 2019

Is there anything I can do? The two Parity nodes which run on the main network eat all of my RAM after 1-2 days. A daily restart is not the best solution :(

@andrewheadricke commented:

I am facing similar issues with the latest Parity releases. I used to be able to sync easily and run other applications; however, now, after an hour or two, syncing consumes all my RAM and running other applications is not possible. Even Parity alone causes the computer to lock up.

Parity used to be faster to sync and lighter on RAM than Geth, but now I can control the RAM usage in Geth, so I am looking to switch back.

jam10o-new added the F7-footprint 🐾 and M4-core ⛓ labels Jul 15, 2019
iFA88 (Author) commented Aug 3, 2019

I suspect the shutdown problem occurs when I send a shutdown signal to the node but the node still accepts RPC calls, and that prevents the shutdown process.

iFA88 (Author) commented Aug 11, 2019

I have discovered that when I don't use the --cache-size parameter, the Parity RES usage doesn't go above 2 GB. When I use that parameter with ANY value, the memory usage goes up to 14 GB (probably more, but I don't have more free) within 24 hours.

iFA88 (Author) commented Sep 14, 2019

Hey @dvdplm ! Can you please check my last comment with the --cache-size issue? Thank you!

dvdplm (Collaborator) commented Sep 14, 2019

@iFA88 apologies for the late answer. I have not been able to reproduce the problem with RAM usage and --cache-size, and I have tried many different versions and chains. On my machine, running macOS with 32 GB, memory usage is very stable. I know this is kind of useless and it's much more interesting to see what happens on a machine with less RAM.
What happens on your end if you run with the other caching-related switches? This is what I am currently running: --cache-size-db=32096 --cache-size-blocks=2048 --cache-size-queue=32512 --cache-size-state=16096 (don't read too much into the specific numbers, I mostly picked them at random tbh). Do you still see RES ballooning after a while?

iFA88 (Author) commented Sep 14, 2019

@dvdplm Do we have any command to get cache statuses (usable/limit) or any debug level/trace?

dvdplm (Collaborator) commented Sep 14, 2019

No, not that I know of. It would be quite useful.

iFA88 (Author) commented Sep 14, 2019

I'm now running with the --cache-size-blocks=128 --cache-size-db=2048 parameters. I don't use --cache-size now.

iFA88 (Author) commented Sep 16, 2019

The node now uses 9150 MB RES after 2 days with the above parameters.

dvdplm (Collaborator) commented Sep 16, 2019

So I think I'm seeing something similar here: omitting the --cache* parameters seems to keep memory usage within limits. What I also see is that the sync speed slows down significantly as memory usage goes up (after a restart the sync speed goes back up). So until we fix the bug, I'd say the best work-around is to avoid using those params.

iFA88 (Author) commented Sep 16, 2019

I cannot measure the import speed because every block has very different EVM calls. I will now try using --cache-size again to check the issue.

iFA88 (Author) commented Sep 21, 2019

Ok, it seems the issue is somehow solved. When I'm using --cache-size (with --cache-size 2048) the process RES usage doesn't go above 7-9 GB. If I face this issue again I will reopen the thread. Thanks for the support!
