Skip to content
This repository has been archived by the owner on Nov 6, 2020. It is now read-only.

RPC randomly hangs for no reason #10344

Closed
Pzixel opened this issue Feb 13, 2019 · 32 comments
Closed

RPC randomly hangs for no reason #10344

Pzixel opened this issue Feb 13, 2019 · 32 comments
Labels
F2-bug 🐞 The client fails to follow expected behavior. M6-rpcapi 📣 RPC API.
Milestone

Comments

@Pzixel
Copy link
Contributor

Pzixel commented Feb 13, 2019

  • Parity Ethereum version: 2.2.8
  • Operating system: Linux
  • Installation: docker
  • Fully synchronized: yes
  • Network: private (POA, chain.json specification)
  • Restarted: yes

For some reason after several hours of intensive block generations parity hangs for no reason.

Friday night I ran my batcher, that just call some contract in batches (50 items in batch).
Here are my logs:

2019-02-08 23:35:49.024 +03:00 [Information] "51,38%" done. Elapsed = 09:49:30.2509107.
2019-02-08 23:35:58.938 +03:00 [Information] "51,40%" done. Elapsed = 09:49:40.1648875.
2019-02-08 23:36:08.794 +03:00 [Information] "51,41%" done. Elapsed = 09:49:50.0207078.
2019-02-08 23:36:18.833 +03:00 [Information] "51,43%" done. Elapsed = 09:50:00.0597905.
2019-02-08 23:36:28.819 +03:00 [Information] "51,44%" done. Elapsed = 09:50:10.0459107.
2019-02-08 23:36:38.562 +03:00 [Information] "51,46%" done. Elapsed = 09:50:19.7880563.
2019-02-08 23:36:48.819 +03:00 [Information] "51,47%" done. Elapsed = 09:50:30.0456974.
2019-02-08 23:36:58.739 +03:00 [Information] "51,49%" done. Elapsed = 09:50:39.9653453.
2019-02-08 23:37:08.770 +03:00 [Information] "51,50%" done. Elapsed = 09:50:49.9966959.
2019-02-08 23:37:18.834 +03:00 [Information] "51,52%" done. Elapsed = 09:51:00.0605550.
2019-02-08 23:37:29.025 +03:00 [Information] "51,53%" done. Elapsed = 09:51:10.2512043.
2019-02-08 23:37:38.840 +03:00 [Information] "51,55%" done. Elapsed = 09:51:20.0662434.
2019-02-08 23:37:48.780 +03:00 [Information] "51,56%" done. Elapsed = 09:51:30.0061202.
2019-02-08 23:37:58.858 +03:00 [Information] "51,57%" done. Elapsed = 09:51:40.0840885.
2019-02-09 00:10:17.189 +03:00 [Information] "51,59%" done. Elapsed = 10:23:58.4150452.
2019-02-09 01:00:36.062 +03:00 [Information] "51,60%" done. Elapsed = 11:14:17.2886432.
2019-02-09 01:52:39.829 +03:00 [Information] "51,62%" done. Elapsed = 12:06:21.0557069.
2019-02-09 02:19:17.703 +03:00 [Information] "51,63%" done. Elapsed = 12:32:58.9298274.
2019-02-09 03:05:46.144 +03:00 [Information] "51,65%" done. Elapsed = 13:19:27.3706997.
2019-02-09 03:50:14.702 +03:00 [Information] "51,66%" done. Elapsed = 14:03:55.9286172.
2019-02-09 04:42:49.453 +03:00 [Information] "51,68%" done. Elapsed = 14:56:30.6793864.
2019-02-09 06:10:58.173 +03:00 [Information] "51,69%" done. Elapsed = 16:24:39.3998215.
2019-02-09 06:17:52.391 +03:00 [Information] "51,71%" done. Elapsed = 16:31:33.6170246.
2019-02-09 06:58:17.406 +03:00 [Information] "51,72%" done. Elapsed = 17:11:58.6323074.
2019-02-09 07:30:30.453 +03:00 [Information] "51,74%" done. Elapsed = 17:44:11.6792151.
2019-02-09 09:00:30.000 +03:00 [Information] "51,75%" done. Elapsed = 19:14:11.2265639.
2019-02-09 11:05:01.001 +03:00 [Information] "51,77%" done. Elapsed = 21:18:42.2271033.

As you can see, it worked fine for almost 10 hours. My setup is 5 sec so it took 10 sec (or two blocks) to insert the entire batch.

Then things get weirder, becasue now it takes almost an hour to insert the same amout of data in blockchain (I used "retry forever" policy with 30 sec awaiting between calls).

When I go to parity logs I see:

2019-02-13 15:08:45 UTC IO Worker #3 DEBUG txqueue  Removed 20 stalled transactions. Pool: 0/16384 (0 senders; 0/1048576 kB) [minGasPrice: 0 Mwei, maxGas: 6000000000]
2019-02-13 15:08:45 UTC jsonrpc-eventloop-1 WARN parity_rpc::v1::impls::eth  Fallback to BlockId::Latest
2019-02-13 15:08:45 UTC jsonrpc-eventloop-0 WARN parity_rpc::v1::impls::eth  Fallback to BlockId::Latest
2019-02-13 15:08:45 UTC IO Worker #0 DEBUG txqueue  Re-computing pending set for block: 15268
2019-02-13 15:08:45 UTC jsonrpc-eventloop-1 WARN parity_rpc::v1::impls::eth  Fallback to BlockId::Latest
2019-02-13 15:08:45 UTC jsonrpc-eventloop-0 DEBUG txqueue  [0x9694bf54f5d9344e9faec2c0488d3895ab7547d79a8365b3d4262c8e93bb35aa] Added to the pool.
2019-02-13 15:08:45 UTC jsonrpc-eventloop-0 DEBUG txqueue  [0x9694bf54f5d9344e9faec2c0488d3895ab7547d79a8365b3d4262c8e93bb35aa] Sender: 0x006d…6ecd, nonce: 354841, gasPrice: 0, gas: 1000000000, value: 0, dataLen: 2148))
2019-02-13 15:08:45 UTC IO Worker #2 DEBUG txqueue  Re-computing pending set for block: 15268
2019-02-13 15:08:45 UTC jsonrpc-eventloop-1 WARN parity_rpc::v1::impls::eth  Fallback to BlockId::Latest
2019-02-13 15:08:45 UTC jsonrpc-eventloop-1 DEBUG txqueue  [0x3bf547012ce2011c1c72f4bb0ba696e362a70e9e147c98a61c390a8877af3464] Added to the pool.
2019-02-13 15:08:45 UTC jsonrpc-eventloop-1 DEBUG txqueue  [0x3bf547012ce2011c1c72f4bb0ba696e362a70e9e147c98a61c390a8877af3464] Sender: 0x006d…6ecd, nonce: 354842, gasPrice: 0, gas: 1000000000, value: 0, dataLen: 2148))
2019-02-13 15:08:45 UTC IO Worker #2 DEBUG txqueue  Re-computing pending set for block: 15268
2019-02-13 15:08:45 UTC jsonrpc-eventloop-1 WARN parity_rpc::v1::impls::eth  Fallback to BlockId::Latest
2019-02-13 15:08:45 UTC jsonrpc-eventloop-0 DEBUG txqueue  [0xc115fdcfc1809511d1ca0c848047dd3e19f8adb22ad49045795fe2d4ba43d556] Added to the pool.
2019-02-13 15:08:45 UTC jsonrpc-eventloop-0 DEBUG txqueue  [0xc115fdcfc1809511d1ca0c848047dd3e19f8adb22ad49045795fe2d4ba43d556] Sender: 0x006d…6ecd, nonce: 354843, gasPrice: 0, gas: 1000000000, value: 0, dataLen: 2148))
2019-02-13 15:08:45 UTC IO Worker #2 DEBUG txqueue  Re-computing pending set for block: 15268
2019-02-13 15:08:45 UTC jsonrpc-eventloop-0 WARN parity_rpc::v1::impls::eth  Fallback to BlockId::Latest
2019-02-13 15:08:45 UTC jsonrpc-eventloop-1 DEBUG txqueue  [0x9e961c17d48ae430951c7f124d42721bc06bdab14c69260a25c7edfb4ecf40ed] Added to the pool.
2019-02-13 15:08:45 UTC jsonrpc-eventloop-1 DEBUG txqueue  [0x9e961c17d48ae430951c7f124d42721bc06bdab14c69260a25c7edfb4ecf40ed] Sender: 0x006d…6ecd, nonce: 354844, gasPrice: 0, gas: 1000000000, value: 0, dataLen: 2148))
2019-02-13 15:08:45 UTC IO Worker #1 DEBUG txqueue  Re-computing pending set for block: 15268
2019-02-13 15:08:45 UTC jsonrpc-eventloop-0 WARN parity_rpc::v1::impls::eth  Fallback to BlockId::Latest
2019-02-13 15:08:45 UTC jsonrpc-eventloop-0 DEBUG txqueue  [0x6404b3047d379d12eba7e583accfe11fbd4e3f7d151fe7b0b31711b656ae96c4] Added to the pool.
2019-02-13 15:08:45 UTC jsonrpc-eventloop-0 DEBUG txqueue  [0x6404b3047d379d12eba7e583accfe11fbd4e3f7d151fe7b0b31711b656ae96c4] Sender: 0x006d…6ecd, nonce: 354845, gasPrice: 0, gas: 1000000000, value: 0, dataLen: 2148))
2019-02-13 15:08:45 UTC IO Worker #1 DEBUG txqueue  Re-computing pending set for block: 15268
2019-02-13 15:08:45 UTC jsonrpc-eventloop-0 WARN parity_rpc::v1::impls::eth  Fallback to BlockId::Latest
2019-02-13 15:08:45 UTC jsonrpc-eventloop-1 DEBUG txqueue  [0xe5dce08e69701c6ec31443c9a5421a4e3c3dba02b32470625d4499740e27bfee] Added to the pool.
2019-02-13 15:08:45 UTC jsonrpc-eventloop-1 DEBUG txqueue  [0xe5dce08e69701c6ec31443c9a5421a4e3c3dba02b32470625d4499740e27bfee] Sender: 0x006d…6ecd, nonce: 354846, gasPrice: 0, gas: 1000000000, value: 0, dataLen: 2084))
2019-02-13 15:08:45 UTC IO Worker #1 DEBUG txqueue  Re-computing pending set for block: 15268
2019-02-13 15:08:45 UTC jsonrpc-eventloop-1 WARN parity_rpc::v1::impls::eth  Fallback to BlockId::Latest
2019-02-13 15:08:45 UTC jsonrpc-eventloop-1 DEBUG txqueue  [0x83634f25f430c1c09a015b0b78b93f50db427f37
7c5e3f597b95fedef23b985e] Added to the pool.
2019-02-13 15:08:45 UTC jsonrpc-eventloop-1 DEBUG txqueue  [0x83634f25f430c1c09a015b0b78b93f50db427f377c5e3f597b95fedef23b985e] Sender: 0x006d…6ecd, nonce: 354847, gasPrice: 0, gas: 1000000000, value: 0, dataLen: 2148))
2019-02-13 15:08:45 UTC IO Worker #1 DEBUG txqueue  Re-computing pending set for block: 15268
2019-02-13 15:08:45 UTC jsonrpc-eventloop-0 WARN parity_rpc::v1::impls::eth  Fallback to BlockId::Latest
2019-02-13 15:08:45 UTC jsonrpc-eventloop-0 DEBUG txqueue  [0xb88d9bb8990cb48499b78aaa2f84caa108249cda11fbe68623e01dacc12032f9] Added to the pool.
2019-02-13 15:08:45 UTC jsonrpc-eventloop-0 DEBUG txqueue  [0xb88d9bb8990cb48499b78aaa2f84caa108249cda11fbe68623e01dacc12032f9] Sender: 0x006d…6ecd, nonce: 354848, gasPrice: 0, gas: 1000000000, value: 0, dataLen: 2052))
2019-02-13 15:08:45 UTC IO Worker #1 DEBUG txqueue  Re-computing pending set for block: 15268
2019-02-13 15:08:45 UTC jsonrpc-eventloop-0 WARN parity_rpc::v1::impls::eth  Fallback to BlockId::Latest
2019-02-13 15:08:45 UTC jsonrpc-eventloop-1 DEBUG txqueue  [0x61fc58bbfce1862f93d0a9391fc42178d32203c64ec518300953a782c21d9a7e] Added to the pool.
2019-02-13 15:08:45 UTC jsonrpc-eventloop-1 DEBUG txqueue  [0x61fc58bbfce1862f93d0a9391fc42178d32203c64ec518300953a782c21d9a7e] Sender: 0x006d…6ecd, nonce: 354849, gasPrice: 0, gas: 1000000000, value: 0, dataLen: 2084))
2019-02-13 15:08:45 UTC IO Worker #1 DEBUG txqueue  Re-computing pending set for block: 15268
2019-02-13 15:08:46 UTC jsonrpc-eventloop-0 WARN parity_rpc::v1::impls::eth  Fallback to BlockId::Latest
2019-02-13 15:08:46 UTC jsonrpc-eventloop-0 DEBUG txqueue  [0x841276230b793ccf1cee34710a4ca21dd3888b91b9d6935f412e7f33e89a3e2e] Added to the pool.
2019-02-13 15:08:46 UTC jsonrpc-eventloop-0 DEBUG txqueue  [0x841276230b793ccf1cee34710a4ca21dd3888b91b9d6935f412e7f33e89a3e2e] Sender: 0x006d…6ecd, nonce: 354850, gasPrice: 0, gas: 1000000000, value: 0, dataLen: 2084))
2019-02-13 15:08:46 UTC IO Worker #1 DEBUG txqueue  Re-computing pending set for block: 15268
2019-02-13 15:08:46 UTC jsonrpc-eventloop-0 WARN parity_rpc::v1::impls::eth  Fallback to BlockId::Latest
2019-02-13 15:08:46 UTC jsonrpc-eventloop-1 DEBUG txqueue  [0x5cadbdd499376946800299f2013cec8c1ebd222e247aad035dd91449329c40f2] Added to the pool.
2019-02-13 15:08:46 UTC jsonrpc-eventloop-1 DEBUG txqueue  [0x5cadbdd499376946800299f2013cec8c1ebd222e247aad035dd91449329c40f2] Sender: 0x006d…6ecd, nonce: 354851, gasPrice: 0, gas: 1000000000, value: 0, dataLen: 2148))
2019-02-13 15:08:46 UTC jsonrpc-eventloop-1 WARN parity_rpc::v1::impls::eth  Fallback to BlockId::Latest
2019-02-13 15:08:46 UTC jsonrpc-eventloop-1 DEBUG txqueue  [0x63baf2a1f4a6c82a0f4a065001887cbd90cdafcf7768db1c7c0664b83430e700] Added to the pool.
2019-02-13 15:08:46 UTC jsonrpc-eventloop-1 DEBUG txqueue  [0x63baf2a1f4a6c82a0f4a065001887cbd90cdafcf7768db1c7c0664b83430e700] Sender: 0x006d…6ecd, nonce: 354852, gasPrice: 0, gas: 1000000000, value: 0, dataLen: 2148))
2019-02-13 15:08:46 UTC IO Worker #3 DEBUG txqueue  Re-computing pending set for block: 15268
2019-02-13 15:08:46 UTC jsonrpc-eventloop-0 WARN parity_rpc::v1::impls::eth  Fallback to BlockId::Latest
2019-02-13 15:08:46 UTC jsonrpc-eventloop-0 DEBUG txqueue  [0x3cd41b2b4141f88868daae3b040a809bc0c017d8e856ba77dcad03f94c0c5c0e] Added to the pool.
2019-02-13 15:08:46 UTC jsonrpc-eventloop-0 DEBUG txqueue  [0x3cd41b2b4141f88868daae3b040a809bc0c017d8e856ba77dcad03f94c0c5c0e] Sender: 0x006d…6ecd, nonce: 354853, gasPrice: 0, gas: 1000000000, value: 0, dataLen: 2148))
2019-02-13 15:08:46 UTC IO Worker #3 DEBUG txqueue  Re-computing pending set for block: 15268
2019-02-13 15:08:46 UTC jsonrpc-eventloop-0 WARN parity_rpc::v1::impls::eth  Fallback to BlockId::Latest
2019-02-13 15:08:46 UTC jsonrpc-eventloop-1 DEBUG txqueue  [0xfa6fb85f6bed3ee30c32689a67bc98f8f53b303f3cce69e2585ebea473d9336e] Added to the pool.
2019-02-13 15:08:46 UTC jsonrpc-eventloop-1 DEBUG txqueue  [0xfa6fb85f6bed3ee30c32689a67bc98f8f53b303f3cce69e2585ebea473d9336e] Sender: 0x006d…6ecd, nonce: 354854, gasPr
ice: 0, gas: 1000000000, value: 0, dataLen: 836))
2019-02-13 15:08:46 UTC IO Worker #3 DEBUG txqueue  Re-computing pending set for block: 15268
2019-02-13 15:08:46 UTC jsonrpc-eventloop-0 WARN parity_rpc::v1::impls::eth  Fallback to BlockId::Latest
2019-02-13 15:08:46 UTC jsonrpc-eventloop-0 DEBUG txqueue  [0xbdada581aab12f4deb6d120fdde505b780e6f756171bce7edcd0056fcd4362e8] Added to the pool.
2019-02-13 15:08:46 UTC jsonrpc-eventloop-0 DEBUG txqueue  [0xbdada581aab12f4deb6d120fdde505b780e6f756171bce7edcd0056fcd4362e8] Sender: 0x006d…6ecd, nonce: 354855, gasPrice: 0, gas: 1000000000, value: 0, dataLen: 2148))
2019-02-13 15:08:46 UTC IO Worker #3 DEBUG txqueue  Re-computing pending set for block: 15268
2019-02-13 15:08:46 UTC jsonrpc-eventloop-0 WARN parity_rpc::v1::impls::eth  Fallback to BlockId::Latest
2019-02-13 15:08:46 UTC jsonrpc-eventloop-0 DEBUG txqueue  [0xb94c83628fd456d62690419b78f929875fed6e203a92c786b1fe53aa2ae8cce6] Added to the pool.
2019-02-13 15:08:46 UTC jsonrpc-eventloop-0 DEBUG txqueue  [0xb94c83628fd456d62690419b78f929875fed6e203a92c786b1fe53aa2ae8cce6] Sender: 0x006d…6ecd, nonce: 354856, gasPrice: 0, gas: 1000000000, value: 0, dataLen: 2148))
2019-02-13 15:08:46 UTC IO Worker #3 DEBUG txqueue  Re-computing pending set for block: 15268
2019-02-13 15:08:46 UTC jsonrpc-eventloop-1 WARN parity_rpc::v1::impls::eth  Fallback to BlockId::Latest
2019-02-13 15:08:46 UTC jsonrpc-eventloop-0 DEBUG txqueue  [0x450ddcd1dc144867befe4f7737d8be4fdeed9d8637670cb7a142f547f4e00afa] Added to the pool.
2019-02-13 15:08:46 UTC jsonrpc-eventloop-0 DEBUG txqueue  [0x450ddcd1dc144867befe4f7737d8be4fdeed9d8637670cb7a142f547f4e00afa] Sender: 0x006d…6ecd, nonce: 354857, gasPrice: 0, gas: 1000000000, value: 0, dataLen: 2148))
2019-02-13 15:08:46 UTC IO Worker #3 DEBUG txqueue  Re-computing pending set for block: 15268
2019-02-13 15:08:46 UTC jsonrpc-eventloop-1 WARN parity_rpc::v1::impls::eth  Fallback to BlockId::Latest
2019-02-13 15:08:46 UTC jsonrpc-eventloop-1 DEBUG txqueue  [0x9a68a5b7d9c6d52011a5883b93f5bca14869132cac11e228e0e63227f73afb4e] Added to the pool.
2019-02-13 15:08:46 UTC jsonrpc-eventloop-1 DEBUG txqueue  [0x9a68a5b7d9c6d52011a5883b93f5bca14869132cac11e228e0e63227f73afb4e] Sender: 0x006d…6ecd, nonce: 354858, gasPrice: 0, gas: 1000000000, value: 0, dataLen: 1988))
2019-02-13 15:08:46 UTC IO Worker #3 DEBUG txqueue  Re-computing pending set for block: 15268
2019-02-13 15:08:46 UTC jsonrpc-eventloop-0 DEBUG txqueue  [0x058aed54ec29e4c446bf8e65e7045958b22793f9dc917ea745a3a0634be1aa40] Added to the pool.
2019-02-13 15:08:46 UTC jsonrpc-eventloop-0 DEBUG txqueue  [0x058aed54ec29e4c446bf8e65e7045958b22793f9dc917ea745a3a0634be1aa40] Sender: 0x006d…6ecd, nonce: 354859, gasPrice: 0, gas: 1000000000, value: 0, dataLen: 2148))
2019-02-13 15:08:46 UTC IO Worker #1 DEBUG txqueue  Re-computing pending set for block: 15268
2019-02-13 15:08:46 UTC jsonrpc-eventloop-1 DEBUG txqueue  [0x9f2c51a0fb088a79efc26a3c143d6a502d600752a6201880b877b877d33f3ea6] Added to the pool.
2019-02-13 15:08:46 UTC jsonrpc-eventloop-1 DEBUG txqueue  [0x9f2c51a0fb088a79efc26a3c143d6a502d600752a6201880b877b877d33f3ea6] Sender: 0x006d…6ecd, nonce: 354860, gasPrice: 0, gas: 1000000000, value: 0, dataLen: 2052))
2019-02-13 15:08:46 UTC IO Worker #1 DEBUG txqueue  Re-computing pending set for block: 15268

Nothing more appears, it just writes it and in 20 sec my client fails with RPC timeout. Then it retries and I get another bunch of logs, then it again writes nothing more, repeat.

I restarted all nodes, I tried to leave single node to prevent desync, I double checked system time. I don't know what can I do.

Here is a code that gets called:

function updateStatus(bytes30 serviceNumber, uint64 responseDate, uint64 statusCode, string memory note) public onlyOwner {
	int index = getRequestIndex(serviceNumber);
	Request storage request = nodes[uint64(index)].request;
	updateStatusInternal(request, responseDate, statusCode, note);
}

function updateStatusInternal(Request storage request, uint64 responseDate, uint64 statusCode, string memory note) private {
	request.statusUpdates.push(Types.StatusUpdate(responseDate, statusCode, note));
	request.statusUpdatesNotes = strConcat(request.statusUpdatesNotes, "\x1f", note);
}

function getRequestIndex(bytes30 serviceNumber) public view returns (int) {
	uint64 index = requestServiceNumberToIndex[serviceNumber];

	if (index == 0 && (nodes.length == 0 || nodes[0].request.serviceNumber != serviceNumber)) {
		return - 1;
	}

	return int(index);
}

edit I tried another contract and had exactly same behaviour. It seems that actual code doesn't matter. Maybe some kind of deadlock?

It just does a lookup in the map and then insert one item in array. Nothing complicated that has to take hours to complete.


I provide logs and all info I possess. If I can do anything else I'd be rigorous in providing all required additional information. It just randomly stops to work with no sane reason. It seems that it depend on database somehow, because I restarted service with --no-persistent-queue, but still have the same behaviour. I already did all kind of optimizations: client-side signing and so on, but it still doesn't work.

cc @tomusdrw , could you take a look, please?

@Pzixel
Copy link
Contributor Author

Pzixel commented Feb 14, 2019

It's currently a hard blocked for me because I have a hard deadline in two weeks when my contract ends. If I can't get parity working until this date then I just don't know what to do... Probably try geth or something else... I just give up here: I have everything work, but parity just randomly hangs and I have no way to learn what's wrong and I have no power to change anything, because learning the code itself will take more time than I have left.

@jam10o-new jam10o-new added F2-bug 🐞 The client fails to follow expected behavior. M6-rpcapi 📣 RPC API. labels Feb 14, 2019
@jam10o-new jam10o-new added this to Needs Assignment in Core via automation Feb 14, 2019
@jam10o-new jam10o-new added this to the 2.4 milestone Feb 14, 2019
@jam10o-new
Copy link
Contributor

jam10o-new commented Feb 14, 2019

@Pzixel I don't think no-persistent-txqueue will help for your issue - can you share any of the rpc and mining settings in your config?

Specifically things like the server_threads setting in the rpc section and your txqueue related settings? (in mining ie, tx_queue_per_sender and tx_queue_mem_limit)

@Pzixel
Copy link
Contributor Author

Pzixel commented Feb 14, 2019

I only wanted to say that the only thing persisted between restart is db itself.

My current authority.toml:

[parity]
chain = "/parity/config/chain.json"

[rpc]
interface = "0.0.0.0"
cors = ["all"]
hosts = ["all"]
apis = ["web3", "eth", "net", "parity", "traces", "rpc", "personal", "parity_accounts", "signer", "parity_set"]

[network]
bootnodes = [
  "enode://147573f46fe9f5cc38fbe070089a31390baec5dd2827c8f2ef168833e4d0254fbee3969a02c5b9910ea5d5b23d86a6ed5eabcda17cc12007b7d9178b6c697aa5@172.16.0.10:30303",
  "enode://1412ee9b9e23700e4a67a8fe3d8d02e10376b6e1cb748eaaf8aa60d4652b27872a8e1ad65bb31046438a5d3c1b71b00ec3ce0b4b42ac71464b28026a3d0b53af@172.16.0.11:30303",
  "enode://9076c143a487aa163437a86f7d009f257f405c50bb2316800b9c9cc40e5a38fef5b414a47636ec38fdabc8a1872b563effa8574a7f8f85dc6bde465c368f1bf5@172.16.0.12:30303"
]

[account]
password = ["/parity/authority.pwd"]

[mining]
reseal_on_txs = "none"
gas_floor_target = "0x165A0BC00"
tx_queue_size = 16384
tx_queue_mem_limit = 1024

@jam10o-new
Copy link
Contributor

Can you perhaps try setting server_threads in the rpc section? Likely won't solve whatever the source of the problem is but it might be an improvement on the current state.

@Pzixel
Copy link
Contributor Author

Pzixel commented Feb 14, 2019

Gonna try it out. I see it might speed up things a bit, and I see it unlikely be the solution. However, is a good spot indeed.

@tomusdrw
Copy link
Collaborator

@Pzixel Are you running with pruning (i.e. default settings)? Could you re-run with -lminer=trace on the authority nodes? Also could you provide some logs with -lrpc=trace to see how much time the RPC requests take and which ones are the slowest?

I suspect it might be because of some underyling DB compaction that happens. Can you check the load on the machine (CPU and IO) and what threads are doing work?

@Pzixel
Copy link
Contributor Author

Pzixel commented Feb 15, 2019

I didn't recall this setting so I do believe it's default.

I'l retry with all logging on and will be monitoring resources as well. Brb when done. Thank you for attention.


P.S.

When I run with -ltxqueue=trace lminer=trace -lrpc=trace I get

Attaching to authority0_authority0_1
authority0_1  | error: The argument '--logging <LOGGING>' was provided more than once, but cannot be used multiple times
authority0_1  |
authority0_1  | USAGE:
authority0_1  |     parity --no-persistent-txqueue --unsafe-expose
authority0_1  |
authority0_1  | For more information try --help
authority0_authority0_1 exited with code 1

Is it intended?

@Pzixel
Copy link
Contributor Author

Pzixel commented Feb 15, 2019

With -lminer=trace

authority0_1  | 2019-02-15 12:22:20 UTC IO Worker #2 DEBUG miner  Skipping adding transaction to block because of invalid nonce: 0xc4fa9ea4e2a0366e1222e468bbd49c4b83df8136852f7580617fa55bcc2f7331 (expected: 24772, got: 20578)
authority0_1  | 2019-02-15 12:22:20 UTC IO Worker #2 DEBUG miner  Adding tx 0xba9895fd78b9f34d146bf5d486758b8a46c29e3722e03a0753e3860effd7c26b took 0 ms
authority0_1  | 2019-02-15 12:22:20 UTC IO Worker #2 DEBUG miner  Skipping adding transaction to block because of invalid nonce: 0xba9895fd78b9f34d146bf5d486758b8a46c29e3722e03a0753e3860effd7c26b (expected: 24772, got: 20579)
authority0_1  | 2019-02-15 12:22:20 UTC IO Worker #2 DEBUG miner  Adding tx 0x8ce056a8b3b50ad47f1c1f050336b23e6ddeb7ec6c87589996c93175d98e645c took 6 ms
authority0_1  | 2019-02-15 12:22:20 UTC IO Worker #2 DEBUG miner  Skipping adding transaction to block because of invalid nonce: 0x8ce056a8b3b50ad47f1c1f050336b23e6ddeb7ec6c87589996c93175d98e645c (expected: 24772, got: 20580)
authority0_1  | 2019-02-15 12:22:20 UTC IO Worker #2 DEBUG miner  Adding tx 0xd4e9fb519afe2bc1fab4f751bca0fdf6a81cdb32b8c88f6c8c4247a95c3cbec3 took 0 ms
authority0_1  | 2019-02-15 12:22:20 UTC IO Worker #2 DEBUG miner  Skipping adding transaction to block because of invalid nonce: 0xd4e9fb519afe2bc1fab4f751bca0fdf6a81cdb32b8c88f6c8c4247a95c3cbec3 (expected: 24772, got: 20581)
authority0_1  | 2019-02-15 12:22:20 UTC IO Worker #2 DEBUG miner  Adding tx 0x028c1c9469ecab9eb2b85ab69e8da764deae0553e2eeabba13d3a6f9c802a32e took 0 ms
authority0_1  | 2019-02-15 12:22:20 UTC IO Worker #2 DEBUG miner  Skipping adding transaction to block because of invalid nonce: 0x028c1c9469ecab9eb2b85ab69e8da764deae0553e2eeabba13d3a6f9c802a32e (expected: 24772, got: 20582)
authority0_1  | 2019-02-15 12:22:20 UTC IO Worker #2 DEBUG miner  Adding tx 0x7665a3e564edba9b30c114cd7d9f70517818e9146d11a31eeb9ed87993c94d2b took 0 ms
authority0_1  | 2019-02-15 12:22:20 UTC IO Worker #2 DEBUG miner  Skipping adding transaction to block because of invalid nonce: 0x7665a3e564edba9b30c114cd7d9f70517818e9146d11a31eeb9ed87993c94d2b (expected: 24772, got: 20583)
authority0_1  | 2019-02-15 12:22:20 UTC IO Worker #2 DEBUG miner  Adding tx 0x6412dea1276721fdac515bc69a68110ffde43e3b556a4c1dc19a8f047306421e took 0 ms
authority0_1  | 2019-02-15 12:22:20 UTC IO Worker #2 DEBUG miner  Skipping adding transaction to block because of invalid nonce: 0x6412dea1276721fdac515bc69a68110ffde43e3b556a4c1dc19a8f047306421e (expected: 24772, got: 20584)
authority0_1  | 2019-02-15 12:22:20 UTC IO Worker #2 DEBUG miner  Adding tx 0x1dcec46d8902a259ffbaa6348e6e4a11b1f7139f7d1b716dd0fcc77f78986953 took 0 ms
authority0_1  | 2019-02-15 12:22:20 UTC IO Worker #2 DEBUG miner  Skipping adding transaction to block because of invalid nonce: 0x1dcec46d8902a259ffbaa6348e6e4a11b1f7139f7d1b716dd0fcc77f78986953 (expected: 24772, got: 20585)
authority0_1  | 2019-02-15 12:22:20 UTC IO Worker #2 DEBUG miner  Adding tx 0x412330bef39a99c50059794d598329b1f0d28d38d24ac90af09f13bab34e5831 took 0 ms
authority0_1  | 2019-02-15 12:22:20 UTC IO Worker #2 DEBUG miner  Skipping adding transaction to block because of invalid nonce: 0x412330bef39a99c50059794d598329b1f0d28d38d24ac90af09f13bab34e5831 (expected: 24772, got: 20586)
authority0_1  | 2019-02-15 12:22:20 UTC IO Worker #2 DEBUG miner  Pushed 0 transactions in 10 ms
authority0_1  | 2019-02-15 12:22:20 UTC IO Worker #2 TRACE miner  update_sealing: engine indicates internal sealing
authority0_1  | 2019-02-15 12:22:21 UTC IO Worker #3 INFO import  Syncing    #1888 0x7478…407f     1.00 blk/s   99.2 tx/s   82.0 Mgas/s      0+  142 Qed     #2032    2/25 peers    0 bytes chain  382 MiB db   51 MiB queue  722 KiB sync  RPC:  0 conn, 1071 req/s,   36 µs

RPC responds nothing (no faliures/etc), the entire process is just hanged.

Maybe it's something like invalid client-side signing?

@tomusdrw
Copy link
Collaborator

You can merge multiple logging flags like this: -lminer=trace,rpc=trace,txqueue=trace. Seems that curently you have a bunch of old transactions to the pool, maybe importing takes long cause they are not cleared correctly. Will need txqueue logs to figure that out thouh

@Pzixel
Copy link
Contributor Author

Pzixel commented Feb 15, 2019

@joshua-mir

One thing I notice, which isn't entirely related to the hanging rpc, is that you appear to be sending many transactions with duplicate nonces - are you calculating the nonce of your transactions manually, or leaving them undefined and relying on parity's nonce calculation?

It's a client side nonce calculation as suggested in #8829 (comment). It was never an issue, but I suspect myself my library that perform a nonce generation. Need deeper investigation, so do I.

@Pzixel
Copy link
Contributor Author

Pzixel commented Feb 15, 2019

I see tons of request/response in logs

authority0_1  | 2019-02-15 12:46:30 UTC jsonrpc-eventloop-1 TRACE rpc  Request: {"id":1,"jsonrpc":"2.0","method":"eth_getTransactionReceipt","params":["0xce11feb39e34d6c3cfe1f5937feec28fba8edeb5152952d77167852b2f0aa3c5"]}.
authority0_1  | 2019-02-15 12:46:30 UTC jsonrpc-eventloop-0 TRACE rpc  Request: {"id":1,"jsonrpc":"2.0","method":"eth_getTransactionReceipt","params":["0xeeb95faf19a557f771d67f99ccfd07d75f5a6f4a50dd175060723519fe2af374"]}.
authority0_1  | 2019-02-15 12:46:30 UTC jsonrpc-eventloop-1 DEBUG rpc  Response: {"jsonrpc":"2.0","result":null,"id":1}.
authority0_1  | 2019-02-15 12:46:30 UTC jsonrpc-eventloop-0 DEBUG rpc  Response: {"jsonrpc":"2.0","result":null,"id":1}.
authority0_1  | 2019-02-15 12:46:30 UTC jsonrpc-eventloop-0 TRACE rpc  Request: {"id":1,"jsonrpc":"2.0","method":"eth_getTransactionReceipt","params":["0x041565d73e3c8d4f30ca689c40f0c43b0c197ec39a578b4777bb5a49b4d68672"]}.
authority0_1  | 2019-02-15 12:46:30 UTC jsonrpc-eventloop-0 DEBUG rpc  Response: {"jsonrpc":"2.0","result":null,"id":1}.
authority0_1  | 2019-02-15 12:46:30 UTC jsonrpc-eventloop-0 TRACE rpc  Request: {"id":1,"jsonrpc":"2.0","method":"eth_getTransactionReceipt","params":["0x7d830c49effccf745a79bc97debdb13584009b9c3338ba8420b0bbcbfa6e45d4"]}.
authority0_1  | 2019-02-15 12:46:30 UTC jsonrpc-eventloop-0 DEBUG rpc  Response: {"jsonrpc":"2.0","result":null,"id":1}.
authority0_1  | 2019-02-15 12:46:30 UTC jsonrpc-eventloop-0 TRACE rpc  Request: {"id":1,"jsonrpc":"2.0","method":"eth_getTransactionReceipt","params":["0x93ee17366c07fec05be594b220a7a640ee076807cbe342b11654577130ddbd5e"]}.
authority0_1  | 2019-02-15 12:46:30 UTC jsonrpc-eventloop-0 DEBUG rpc  Response: {"jsonrpc":"2.0","result":null,"id":1}.
authority0_1  | 2019-02-15 12:46:30 UTC jsonrpc-eventloop-0 TRACE rpc  Request: {"id":1,"jsonrpc":"2.0","method":"eth_getTransactionReceipt","params":["0xaaa3a55ab6d389327be3f1dd44ed6b253c6499015dba37e4de5fc6b5b079d5a7"]}.
authority0_1  | 2019-02-15 12:46:30 UTC jsonrpc-eventloop-0 DEBUG rpc  Response: {"jsonrpc":"2.0","result":null,"id":1}.
authority0_1  | 2019-02-15 12:46:30 UTC jsonrpc-eventloop-0 TRACE rpc  Request: {"id":1,"jsonrpc":"2.0","method":"eth_getTransactionReceipt","params":["0x24dfdf2bd68d7975df60b89d7f44db37763fecf68c5ff39deb26a9766088a483"]}.
authority0_1  | 2019-02-15 12:46:30 UTC jsonrpc-eventloop-0 DEBUG rpc  Response: {"jsonrpc":"2.0","result":null,"id":1}.
authority0_1  | 2019-02-15 12:46:30 UTC jsonrpc-eventloop-0 TRACE rpc  Request: {"id":1,"jsonrpc":"2.0","method":"eth_getTransactionReceipt","params":["0xbdffab24c877e3943acee1caea6bf8711900bea9b34c3c5960e32f727b771e4a"]}.
authority0_1  | 2019-02-15 12:46:30 UTC jsonrpc-eventloop-0 DEBUG rpc  Response: {"jsonrpc":"2.0","result":null,"id":1}.
authority0_1  | 2019-02-15 12:46:30 UTC jsonrpc-eventloop-0 TRACE rpc  Request: {"id":1,"jsonrpc":"2.0","method":"eth_getTransactionReceipt","params":["0xe7df41a6743460252d240748693cf99e81428415eca0ad43e195762c6a4f34eb"]}.
authority0_1  | 2019-02-15 12:46:30 UTC jsonrpc-eventloop-0 DEBUG rpc  Response: {"jsonrpc":"2.0","result":null,"id":1}.
authority0_1  | 2019-02-15 12:46:30 UTC jsonrpc-eventloop-0 TRACE rpc  Request: {"id":1,"jsonrpc":"2.0","method":"eth_getTransactionReceipt","params":["0xf043e1f57a34d2aef1e783101219d91005b0e58022245fafb5ebddd4795ada7d"]}.
authority0_1  | 2019-02-15 12:46:30 UTC jsonrpc-eventloop-0 DEBUG rpc  Response: {"jsonrpc":"2.0","result":null,"id":1}.
authority0_1  | 2019-02-15 12:46:30 UTC jsonrpc-eventloop-0 TRACE rpc  Request: {"id":1,"jsonrpc":"2.0","method":"eth_getTransactionReceipt","params":["0xc18c985ad3d5b56073e69218a3d7903876cc964ac8c517aa5126f9db7d2ad90f"]}.
authority0_1  | 2019-02-15 12:46:30 UTC jsonrpc-eventloop-0 DEBUG rpc  Response: {"jsonrpc":"2.0","result":null,"id":1}.
authority0_1  | 2019-02-15 12:46:30 UTC jsonrpc-eventloop-0 TRACE rpc  Request: {"id":1,"jsonrpc":"2.0","method":"eth_getTransactionReceipt","params":["0x559e4cc3b76a9cb957e37fa521bfcc68c1b9f8ea64270192a1f5a2da0d0fa938"]}.
authority0_1  | 2019-02-15 12:46:30 UTC jsonrpc-eventloop-0 DEBUG rpc  Response: {"jsonrpc":"2.0","result":null,"id":1}.
authority0_1  | 2019-02-15 12:46:30 UTC jsonrpc-eventloop-0 TRACE rpc  Request: {"id":1,"jsonrpc":"2.0","method":"eth_getTransactionReceipt","params":["0x41dca9ab18cd7802bd6467b59c2844bea7a34ec488cbdb24ac52cec38ae81bdf"]}.
authority0_1  | 2019-02-15 12:46:30 UTC jsonrpc-eventloop-0 DEBUG rpc  Response: {"jsonrpc":"2.0","result":null,"id":1}.
authority0_1  | 2019-02-15 12:46:30 UTC jsonrpc-eventloop-0 TRACE rpc  Request: {"id":1,"jsonrpc":"2.0","method":"eth_getTransactionReceipt","params":["0xa744674c0c8f918f25fe541a7d2f6f18f23d8fc59ca08c12b084a4c7a734ccd8"]}.
authority0_1  | 2019-02-15 12:46:30 UTC jsonrpc-eventloop-0 DEBUG rpc  Response: {"jsonrpc":"2.0","result":null,"id":1}.
authority0_1  | 2019-02-15 12:46:30 UTC jsonrpc-eventloop-0 TRACE rpc  Request: {"id":1,"jsonrpc":"2.0","method":"eth_getTransactionReceipt","params":["0x1fdd3c29c66d6e4ae09448e07271333e310ae1d26f0c71cf8e57dd9e1e20eb64"]}.
authority0_1  | 2019-02-15 12:46:30 UTC jsonrpc-eventloop-0 DEBUG rpc  Response: {"jsonrpc":"2.0","result":null,"id":1}.
authority0_1  | 2019-02-15 12:46:30 UTC jsonrpc-eventloop-0 TRACE rpc  Request: {"id":1,"jsonrpc":"2.0","method":"eth_getTransactionReceipt","params":["0xf76fd0873d09805e68250504b39d32c1d688b369fe084a1a29a6952b3a0bee43"]}.
authority0_1  | 2019-02-15 12:46:30 UTC jsonrpc-eventloop-0 DEBUG rpc  Response: {"jsonrpc":"2.0","result":null,"id":1}.
authority0_1  | 2019-02-15 12:46:30 UTC jsonrpc-eventloop-0 TRACE rpc  Request: {"id":1,"jsonrpc":"2.0","method":"eth_getTransactionReceipt","params":["0x850278280436e31eceb99a18613907bd9a391e703b904daece555a273d820308"]}.
authority0_1  | 2019-02-15 12:46:30 UTC jsonrpc-eventloop-0 DEBUG rpc  Response: {"jsonrpc":"2.0","result":null,"id":1}.
authority0_1  | 2019-02-15 12:46:30 UTC jsonrpc-eventloop-0 TRACE rpc  Request: {"id":1,"jsonrpc":"2.0","method":"eth_getTransactionReceipt","params":["0x9463d7185be56f62293d38864ea69607a015ffcb613faa495401f78a210d6b84"]}.
authority0_1  | 2019-02-15 12:46:30 UTC jsonrpc-eventloop-0 DEBUG rpc  Response: {"jsonrpc":"2.0","result":null,"id":1}.
authority0_1  | 2019-02-15 12:46:30 UTC jsonrpc-eventloop-0 TRACE rpc  Request: {"id":1,"jsonrpc":"2.0","method":"eth_getTransactionReceipt","params":["0x1e9d2dae33026bd884818be29ac0abd6bcd62f1b1438070a32b9081e699d9318"]}.
authority0_1  | 2019-02-15 12:46:30 UTC jsonrpc-eventloop-0 DEBUG rpc  Response: {"jsonrpc":"2.0","result":null,"id":1}.
authority0_1  | 2019-02-15 12:46:30 UTC jsonrpc-eventloop-0 TRACE rpc  Request: {"id":1,"jsonrpc":"2.0","method":"eth_getTransactionReceipt","params":["0x075d8e4ad4ab22ad8ff454b2801a72b4cf9315489b9e5b61ba316934ca9b8aa9"]}.
authority0_1  | 2019-02-15 12:46:30 UTC jsonrpc-eventloop-0 DEBUG rpc  Response: {"jsonrpc":"2.0","result":null,"id":1}.
authority0_1  | 2019-02-15 12:46:30 UTC jsonrpc-eventloop-0 TRACE rpc  Request: {"id":1,"jsonrpc":"2.0","method":"eth_getTransactionReceipt","params":["0x468617c21a3199f78469203caf7095cc77f7428b3dcd4597e1d98e1821e63ae3"]}.
authority0_1  | 2019-02-15 12:46:30 UTC jsonrpc-eventloop-0 DEBUG rpc  Response: {"jsonrpc":"2.0","result":null,"id":1}.
authority0_1  | 2019-02-15 12:46:30 UTC jsonrpc-eventloop-0 TRACE rpc  Request: {"id":1,"jsonrpc":"2.0","method":"eth_getTransactionReceipt","params":["0x10d05af0a44a946e1ddecbce6728bd1d2b07b3df2f638a9881530320f48eb430"]}.
authority0_1  | 2019-02-15 12:46:30 UTC jsonrpc-eventloop-0 DEBUG rpc  Response: {"jsonrpc":"2.0","result":null,"id":1}.
authority0_1  | 2019-02-15 12:46:30 UTC jsonrpc-eventloop-0 TRACE rpc  Request: {"id":1,"jsonrpc":"2.0","method":"eth_getTransactionReceipt","params":["0xc8f8dc2608ab068b717b75da035db0f577c6d028787251401a9f09907afb0b2b"]}.
authority0_1  | 2019-02-15 12:46:30 UTC jsonrpc-eventloop-1 TRACE rpc  Request: {"id":1,"jsonrpc":"2.0","method":"eth_getTransactionReceipt","params":["0xbce77c7be789f6d521cc374a17921576f199a4a4d532a685208c299135d19919"]}.
authority0_1  | 2019-02-15 12:46:30 UTC jsonrpc-eventloop-0 DEBUG rpc  Response: {"jsonrpc":"2.0","result":null,"id":1}.
authority0_1  | 2019-02-15 12:46:30 UTC jsonrpc-eventloop-1 DEBUG rpc  Response: {"jsonrpc":"2.0","result":null,"id":1}.
authority0_1  | 2019-02-15 12:46:30 UTC jsonrpc-eventloop-0 TRACE rpc  Request: {"id":1,"jsonrpc":"2.0","method":"eth_getTransactionReceipt","params":["0x97daaa0ebd910e2d40f1b7d84d7f2837077349ffb3bdb051cd77a58c631297d9"]}.
authority0_1  | 2019-02-15 12:46:30 UTC jsonrpc-eventloop-0 DEBUG rpc  Response: {"jsonrpc":"2.0","result":null,"id":1}.
authority0_1  | 2019-02-15 12:46:30 UTC jsonrpc-eventloop-1 TRACE rpc  Request: {"id":1,"jsonrpc":"2.0","method":"eth_getTransactionReceipt","params":["0x59342de8b8a344297a8d83d6c68a6cb34cc0fe57600a595d065edb616f0184d3"]}.
authority0_1  | 2019-02-15 12:46:30 UTC jsonrpc-eventloop-1 DEBUG rpc  Response: {"jsonrpc":"2.0","result":null,"id":1}.
authority0_1  | 2019-02-15 12:46:30 UTC jsonrpc-eventloop-0 TRACE rpc  Request: {"id":1,"jsonrpc":"2.0","method":"eth_getTransactionReceipt","params":["0x79ce54d38b233292087a9a6e79dd2386beb695106768472ec07cd2035d522add"]}.
authority0_1  | 2019-02-15 12:46:30 UTC jsonrpc-eventloop-0 DEBUG rpc  Response: {"jsonrpc":"2.0","result":null,"id":1}.

It looks like it generated several transactions and then trying to poll their receipts that never gets created

Code that gets executed (for reference):

var receipt = await createRequest.SendDefaultTransactionAndWaitForReceiptAsync(params);

Where SendDefaultTransactionAndWaitForReceiptAsync creates a transaction and tries to poll it:
https://github.com/Nethereum/Nethereum/blob/master/src/Nethereum.RPC/TransactionReceipts/TransactionReceiptPollingService.cs#L51

Where transactionFunction is SignAndSendTransactionAsync https://github.com/Nethereum/Nethereum/blob/master/src/Nethereum.Accounts/AccountSignerTransactionManager.cs#L125

@Pzixel
Copy link
Contributor Author

Pzixel commented Feb 15, 2019

How I see it:

  1. I send a raw transaction in parity
  2. I'm trying to poll its receipt
  3. Parity returns null
  4. Goto 2
  5. Infinite loop?
  6. Hanged app

Transaction queue is empty, so this process is infinte.

P.S. I actually see several hundreds of txs in the tx queue but it doesn't get smaller.

@tomusdrw
Copy link
Collaborator

@Pzixel that's quite possible. If the nonce is managed incorrectly transactions may end up occupying the pool forever (waiting for some future transaction). Also if you get null in the response of getTransactionReceipt it means that the transaction is nowhere to be found (neither in blockchain nor in the queue) so ti will never return anything else. I'd suggest making a safe-valve for this - if you receive null print a warning, if it's like 50th time then print an error that something went wrong.

Having a huge number of such requests will definitely affect performance of other requests.

We need to figure out WHY the transactions were lost (either because of some nonce issue or a bug). Could you please collect more logs so that we can analyze a path of one particular transaction that get's lost? We need info:

  1. When it's submitted (identify by hash)
  2. Why it didn't get to the block
  3. When it's lost from the pool (potentially dropped?

This will require collecting logs with -lminer=trace,txqueue=trace,rpc=debug and then filtering them for one concrete transaction hash.

@Pzixel
Copy link
Contributor Author

Pzixel commented Feb 16, 2019

@tomusdrw right, I had the same thoughts :) Unfortunately when I do tail 100000 on log file terminal hangs too, this is why I didn't provide them yet.

I'l send them the monday, if you don't mind.

@tomusdrw
Copy link
Collaborator

tomusdrw commented Feb 16, 2019

@Pzixel yeah, you need to find transaction hash (maybe log it in your application) and then grep -A5 <hash> ./parity.logifle to get filtered results.

@Pzixel
Copy link
Contributor Author

Pzixel commented Feb 18, 2019

I see following

2019-02-18 12:31:15 UTC jsonrpc-eventloop-1 DEBUG txqueue  [0x652763bcfaa956698c6f61164b42c4c9d23876e1392a68b04da58848ea6d844e] Added to the pool.
2019-02-18 12:31:15 UTC jsonrpc-eventloop-1 DEBUG txqueue  [0x652763bcfaa956698c6f61164b42c4c9d23876e1392a68b04da58848ea6d844e] Sender: 0x006d…6ecd, nonce: 1, gasPrice: 0, gas: 4700000, value: 0, dataLen: 17801))
2019-02-18 12:31:15 UTC jsonrpc-eventloop-1 DEBUG rpc  Response: {"jsonrpc":"2.0","result":"0x652763bcfaa956698c6f61164b42c4c9d23876e1392a68b04da58848ea6d844e","id":1}.
2019-02-18 12:31:15 UTC jsonrpc-eventloop-0 TRACE rpc  Request: {"id":1,"jsonrpc":"2.0","method":"eth_getTransactionReceipt","params":["0x652763bcfaa956698c6f61164b42c4c9d23876e1392a68b04da58848ea6d844e"]}.
2019-02-18 12:31:15 UTC jsonrpc-eventloop-0 DEBUG rpc  Response: {"jsonrpc":"2.0","result":null,"id":1}.
2019-02-18 12:31:15 UTC jsonrpc-eventloop-0 TRACE rpc  Request: {"id":1,"jsonrpc":"2.0","method":"eth_getTransactionReceipt","params":["0x652763bcfaa956698c6f61164b42c4c9d23876e1392a68b04da58848ea6d844e"]}.
2019-02-18 12:31:15 UTC jsonrpc-eventloop-0 DEBUG rpc  Response: {"jsonrpc":"2.0","result":null,"id":1}.
2019-02-18 12:31:15 UTC jsonrpc-eventloop-0 TRACE rpc  Request: {"id":1,"jsonrpc":"2.0","method":"eth_getTransactionReceipt","params":["0x652763bcfaa956698c6f61164b42c4c9d23876e1392a68b04da58848ea6d844e"]}.
2019-02-18 12:31:15 UTC jsonrpc-eventloop-0 DEBUG rpc  Response: {"jsonrpc":"2.0","result":null,"id":1}.
2019-02-18 12:31:15 UTC jsonrpc-eventloop-0 TRACE rpc  Request: {"id":1,"jsonrpc":"2.0","method":"eth_getTransactionReceipt","params":["0x652763bcfaa956698c6f61164b42c4c9d23876e1392a68b04da58848ea6d844e"]}.
2019-02-18 12:31:15 UTC jsonrpc-eventloop-0 DEBUG rpc  Response: {"jsonrpc":"2.0","result":null,"id":1}.
2019-02-18 12:31:15 UTC jsonrpc-eventloop-0 TRACE rpc  Request: {"id":1,"jsonrpc":"2.0","method":"eth_getTransactionReceipt","params":["0x652763bcfaa956698c6f61164b42c4c9d23876e1392a68b04da58848ea6d844e"]}.
2019-02-18 12:31:15 UTC jsonrpc-eventloop-0 DEBUG rpc  Response: {"jsonrpc":"2.0","result":null,"id":1}.
2019-02-18 12:31:15 UTC jsonrpc-eventloop-0 TRACE rpc  Request: {"id":1,"jsonrpc":"2.0","method":"eth_getTransactionReceipt","params":["0x652763bcfaa956698c6f61164b42c4c9d23876e1392a68b04da58848ea6d844e"]}.
2019-02-18 12:31:15 UTC jsonrpc-eventloop-0 DEBUG rpc  Response: {"jsonrpc":"2.0","result":null,"id":1}.
2019-02-18 12:31:15 UTC jsonrpc-eventloop-0 TRACE rpc  Request: {"id":1,"jsonrpc":"2.0","method":"eth_getTransactionReceipt","params":["0x652763bcfaa956698c6f61164b42c4c9d23876e1392a68b04da58848ea6d844e"]}.
2019-02-18 12:31:15 UTC jsonrpc-eventloop-0 DEBUG rpc  Response: {"jsonrpc":"2.0","result":null,"id":1}.
2019-02-18 12:31:16 UTC jsonrpc-eventloop-1 TRACE rpc  Request: {"id":1,"jsonrpc":"2.0","method":"eth_getTransactionReceipt","params":["0x652763bcfaa956698c6f61164b42c4c9d23876e1392a68b04da58848ea6d844e"]}.
2019-02-18 12:31:16 UTC jsonrpc-eventloop-1 DEBUG rpc  Response: {"jsonrpc":"2.0","result":null,"id":1}.
2019-02-18 12:31:16 UTC jsonrpc-eventloop-1 TRACE rpc  Request: {"id":1,"jsonrpc":"2.0","method":"eth_getTransactionReceipt","params":["0x652763bcfaa956698c6f61164b42c4c9d23876e1392a68b04da58848ea6d844e"]}.
2019-02-18 12:31:16 UTC jsonrpc-eventloop-1 DEBUG rpc  Response: {"jsonrpc":"2.0","result":null,"id":1}.
2019-02-18 12:31:20 UTC IO Worker #3 TRACE miner  update_sealing
2019-02-18 12:31:20 UTC IO Worker #3 TRACE miner  requires_reseal: sealing enabled
2019-02-18 12:31:20 UTC IO Worker #3 TRACE miner  requires_reseal: should_disable_sealing=false; forced=false, has_local=true, internal=Some(true), had_requests=false
2019-02-18 12:31:20 UTC IO Worker #3 TRACE miner  update_sealing: preparing a block
2019-02-18 12:31:20 UTC IO Worker #3 TRACE miner  prepare_block: No existing work - making new block
2019-02-18 12:31:20 UTC IO Worker #3 DEBUG txqueue  Re-computing pending set for block: 1
2019-02-18 12:31:20 UTC IO Worker #3 DEBUG miner  Attempting to push 1 transactions.
2019-02-18 12:31:20 UTC IO Worker #3 DEBUG miner  Adding tx 0x652763bcfaa956698c6f61164b42c4c9d23876e1392a68b04da58848ea6d844e took 0 ms
2019-02-18 12:31:20 UTC IO Worker #3 DEBUG miner  Pushed 1 transactions in 0 ms
2019-02-18 12:31:20 UTC IO Worker #3 TRACE miner  update_sealing: engine indicates internal sealing
2019-02-18 12:31:20 UTC IO Worker #3 TRACE miner  seal_block_internally: attempting internal seal.
2019-02-18 12:31:25 UTC IO Worker #2 TRACE miner  update_sealing
2019-02-18 12:31:25 UTC IO Worker #2 TRACE miner  requires_reseal: sealing enabled
2019-02-18 12:31:25 UTC IO Worker #2 TRACE miner  requires_reseal: should_disable_sealing=false; forced=false, has_local=true, internal=Some(true), had_requests=false
2019-02-18 12:31:25 UTC IO Worker #2 TRACE miner  update_sealing: preparing a block
2019-02-18 12:31:25 UTC IO Worker #2 TRACE miner  prepare_block: No existing work - making new block
2019-02-18 12:31:25 UTC IO Worker #2 DEBUG miner  Attempting to push 1 transactions.
2019-02-18 12:31:25 UTC IO Worker #2 DEBUG miner  Adding tx 0x652763bcfaa956698c6f61164b42c4c9d23876e1392a68b04da58848ea6d844e took 0 ms
2019-02-18 12:31:25 UTC IO Worker #2 DEBUG miner  Pushed 1 transactions in 0 ms
2019-02-18 12:31:25 UTC IO Worker #2 TRACE miner  update_sealing: engine indicates internal sealing
2019-02-18 12:31:25 UTC IO Worker #2 TRACE miner  seal_block_internally: attempting internal seal.
2019-02-18 12:31:30 UTC IO Worker #1 TRACE miner  update_sealing
2019-02-18 12:31:30 UTC IO Worker #1 TRACE miner  requires_reseal: sealing enabled
2019-02-18 12:31:30 UTC IO Worker #1 TRACE miner  requires_reseal: should_disable_sealing=false; forced=false, has_local=true, internal=Some(true), had_requests=false
2019-02-18 12:31:30 UTC IO Worker #1 TRACE miner  update_sealing: preparing a block
2019-02-18 12:31:30 UTC IO Worker #1 TRACE miner  prepare_block: No existing work - making new block
2019-02-18 12:31:30 UTC IO Worker #1 DEBUG miner  Attempting to push 1 transactions.
2019-02-18 12:31:30 UTC IO Worker #1 DEBUG miner  Adding tx 0x652763bcfaa956698c6f61164b42c4c9d23876e1392a68b04da58848ea6d844e took 0 ms
2019-02-18 12:31:30 UTC IO Worker #1 DEBUG miner  Pushed 1 transactions in 0 ms
2019-02-18 12:31:30 UTC IO Worker #1 TRACE miner  update_sealing: engine indicates internal sealing
2019-02-18 12:31:30 UTC IO Worker #1 TRACE miner  seal_block_internally: attempting internal seal.
2019-02-18 12:31:30 UTC IO Worker #1 TRACE miner  Received a Regular seal.
2019-02-18 12:31:30 UTC IO Worker #1 TRACE miner  chain_new_blocks
2019-02-18 12:31:30 UTC IO Worker #1 TRACE miner  minimal_gas_price: recalibrating...
2019-02-18 12:31:30 UTC IO Worker #1 DEBUG miner  minimal_gas_price: Got gas price! 0
2019-02-18 12:31:30 UTC IO Worker #1 INFO import  Imported #2 0x57e5…74a7 (1 txs, 4.68 Mgas, 0 ms, 18.04 KiB)
2019-02-18 12:31:30 UTC IO Worker #1 INFO own_tx  Transaction mined (hash 0x652763bcfaa956698c6f61164b42c4c9d23876e1392a68b04da58848ea6d844e)
2019-02-18 12:31:30 UTC IO Worker #1 DEBUG txqueue  [0x652763bcfaa956698c6f61164b42c4c9d23876e1392a68b04da58848ea6d844e] Culled or mined.
2019-02-18 12:31:30 UTC IO Worker #1 DEBUG txqueue  Removed 1 stalled transactions. Pool: 0/16384 (0 senders; 0/1048576 kB) [minGasPrice: 0 Mwei, maxGas: 6000000000]

So getting nulls seems to be OK, because transaction should be mined in the future

@tomusdrw
Copy link
Collaborator

@Pzixel Right, sorry. I incorrectly thought that rececipts are available for pool transactions as well and pool data is only considered for eth_getTransactionByHash.

@tomusdrw
Copy link
Collaborator

So it still doesn't solve the original issue, could you provide some logs from the time the hang actually happens?

@Pzixel
Copy link
Contributor Author

Pzixel commented Feb 18, 2019

I'm not sure. For example here is my txs sent in 200tx batches. Surrounded logs were trimmed

2019-02-18 12:41:36 UTC jsonrpc-eventloop-1 DEBUG txqueue  [0x9501a5c6d5f63c6cf9ed5df52794f0db5b2082f6424cddc84210334a275596e7] Added to the pool.
2019-02-18 12:41:36 UTC jsonrpc-eventloop-1 DEBUG txqueue  [0x9501a5c6d5f63c6cf9ed5df52794f0db5b2082f6424cddc84210334a275596e7] Sender: 0x006d…6ecd, nonce: 247, gasPrice: 0, gas: 1000000000, value: 0, dataLen: 2180))
2019-02-18 12:41:36 UTC jsonrpc-eventloop-1 DEBUG rpc  Response: {"jsonrpc":"2.0","result":"0x9501a5c6d5f63c6cf9ed5df52794f0db5b2082f6424cddc84210334a275596e7","id":1}.
2019-02-18 12:41:36 UTC jsonrpc-eventloop-1 TRACE rpc  Request: {"id":1,"jsonrpc":"2.0","method":"eth_getTransactionReceipt","params":["0x9501a5c6d5f63c6cf9ed5df52794f0db5b2082f6424cddc84210334a275596e7"]}.
2019-02-18 12:41:36 UTC jsonrpc-eventloop-1 DEBUG rpc  Response: {"jsonrpc":"2.0","result":null,"id":1}.
...
2019-02-18 12:41:37 UTC jsonrpc-eventloop-1 TRACE rpc  Request: {"id":1,"jsonrpc":"2.0","method":"eth_getTransactionReceipt","params":["0x9501a5c6d5f63c6cf9ed5df52794f0db5b2082f6424cddc84210334a275596e7"]}.
2019-02-18 12:41:37 UTC jsonrpc-eventloop-1 DEBUG rpc  Response: {"jsonrpc":"2.0","result":null,"id":1}.
...
2019-02-18 12:41:37 UTC jsonrpc-eventloop-1 TRACE rpc  Request: {"id":1,"jsonrpc":"2.0","method":"eth_getTransactionReceipt","params":["0x9501a5c6d5f63c6cf9ed5df52794f0db5b2082f6424cddc84210334a275596e7"]}.
2019-02-18 12:41:37 UTC jsonrpc-eventloop-1 DEBUG rpc  Response: {"jsonrpc":"2.0","result":null,"id":1}.
...
... //378 more polls collapsed and surrounding logs are collapsed
...
2019-02-18 12:42:27 UTC jsonrpc-eventloop-0 TRACE rpc  Request: {"id":1,"jsonrpc":"2.0","method":"eth_getTransactionReceipt","params":["0x9501a5c6d5f63c6cf9ed5df52794f0db5b2082f6424cddc84210334a275596e7"]}.
2019-02-18 12:42:27 UTC jsonrpc-eventloop-0 DEBUG rpc  Response: {"jsonrpc":"2.0","result":null,"id":1}.
2019-02-18 12:42:30 UTC IO Worker #2 TRACE miner  update_sealing
2019-02-18 12:42:30 UTC IO Worker #2 TRACE miner  requires_reseal: sealing enabled
2019-02-18 12:42:30 UTC IO Worker #2 TRACE miner  requires_reseal: should_disable_sealing=false; forced=false, has_local=true, internal=Some(true), had_requests=false
2019-02-18 12:42:30 UTC IO Worker #2 TRACE miner  update_sealing: preparing a block
2019-02-18 12:42:30 UTC IO Worker #2 TRACE miner  prepare_block: No existing work - making new block
2019-02-18 12:42:30 UTC IO Worker #2 DEBUG txqueue  Re-computing pending set for block: 9
2019-02-18 12:42:30 UTC IO Worker #2 DEBUG miner  Attempting to push 0 transactions.
2019-02-18 12:42:30 UTC IO Worker #2 DEBUG miner  Pushed 0 transactions in 0 ms
2019-02-18 12:42:30 UTC IO Worker #2 TRACE miner  update_sealing: engine indicates internal sealing
2019-02-18 12:42:33 UTC IO Worker #1 INFO import     0/25 peers    180 KiB chain    4 MiB db  0 bytes queue 448 bytes sync  RPC:  0 conn,    0 req/s,   27 µs

You can see that tx was in the queue for 1 minute but wasn't mined. It may be a nonce problem but I'm not sure how to check it.

@Pzixel
Copy link
Contributor Author

Pzixel commented Feb 18, 2019

I provide the entire log so you could grep it yourself if you want.

full_log.zip

@tomusdrw
Copy link
Collaborator

tomusdrw commented Feb 18, 2019

Yeah, I can clearly see that the transaciton is not considered for pending block, but since there is a lot of logs trimmed I can't figure out what actually happened with it.

The issue is that, what I see is:

  1. Transactions enters the pool
  2. You poll RPC for it and get null (which basically means nothing)
  3. ???
  4. The pending block is created by no transactions are pushed.

What I want to know:

  1. What's the pool status (Is the transaciton still in the pool or was it dropped)
  2. What previous transactions were sent - is the nonce: 247 correct, i.e. the transaction might be in the pool, but it's not ready for the next block becacuse transaction with nonce: 246 is missing.

EDIT: just noticed the full logs, analysing now.

@tomusdrw
Copy link
Collaborator

@Pzixel what pool flags do you have? It seems that you are reaching per-sender limit and the transactions replace each other in the pool:
<hash> Dropped. Replaced by <hash>
entries mean that a better transaction replaces some old one in the pool.

Please make sure that authority0 is running with --tx-queue-per-sender <somehighnumber>

@tomusdrw
Copy link
Collaborator

It happens because transactions are received out of order (most likely multiple json-rpc server threads), which causes a transaction with nonce: 169 (0x52ebbd7fbc7ce8b39f5a367d593147ca7205dc6268a4904aa230da3b1088d3fc) to be replaced by 0x1c3e624ebe4478a50da7e9d671158b642c823eba1039ded3f9d321c2b6006b9a which has nonce: 167.

nonce: 167 eventually gets mined, same as nonce: 168 (0xdc36e..89d52), but nonce: 169 (0x52ebbd...) not. Same stuff happens for all subsequent nonces.

@tomusdrw
Copy link
Collaborator

Interestingly since it seems the transactions are local, after #9002 local transactions should be accepted above all possible limits, so it needs some investigation why it's being dropped.

As a workaround running with some high value of --tx-queue-max-per-sender should help.

@Pzixel
Copy link
Contributor Author

Pzixel commented Feb 18, 2019

Yes, I see. It seems that with client-side nonce management you are almost guaranteed to loose a tx. And if you loose any then your mining hangs because client doesn't expect tx to suddenly dissapear, and infinite loop occurs.

I just tried to run batch 1000 with --tx-queue-per-sender 1000 and it hanged. I'l try 1100 now to check if you need some "extra txs per sender" to make it work.


I could configure it via TOML tx_queue_per_sender right? Does parity logs somewhere which params it is running?

@tomusdrw
Copy link
Collaborator

  1. Yes, you can use the toml config for this.
  2. Not sure if it's printed in the logs, maybe wihen running with -txqueue=trace?
  3. Please set it close to infitity to first validate that it fixes the problem, if not could you provide the full logs?

@Pzixel
Copy link
Contributor Author

Pzixel commented Feb 18, 2019

I ran it with batch size 1000 (which was almost always failing before) and tx_queue_per_sender = 10000 in toml. And it seems that it works. It doesn't explain 100% reproducability when I was restarting my service since in this case nonce gets reseted and we're fine, but maybe I'm just unlucky or there is some correlation between db size and dropping probability (just random thoughts).

Thank you for your great analysis. I close it for now and I'l come back if it won't help.

My infinite gratitude is yours (personally ❤️ :) )


P.S. I hope #10324 will get some attention :)

@Pzixel Pzixel closed this as completed Feb 18, 2019
Core automation moved this from Needs Assignment to Done Feb 18, 2019
@Pzixel Pzixel reopened this Feb 19, 2019
Core automation moved this from Done to Needs Assignment Feb 19, 2019
@Pzixel
Copy link
Contributor Author

Pzixel commented Feb 19, 2019

@tomusdrw sorry, it seems that it's not working yet. As I said, symptoms were different and it's indeed wasn't fixed by the queue length.

I started a stress test several hours ago and it failed with status code 134 when it was running as docker -d service. Gathering logs to see what happened, but it seems that there are several gigabytes so it takes some time to get them from the docker.

@jam10o-new
Copy link
Contributor

@Pzixel #10375 should address this when merged and backported.

@Pzixel
Copy link
Contributor Author

Pzixel commented Feb 19, 2019

Nope, it's completely different thing. Or maybe I should create another issue. My guess is that parity shutdowns under load in some cases.

CONTAINER ID        IMAGE                              COMMAND                 CREATED             STATUS                        PORTS               NAMES
2253bc13438e        parity-poa-playground_authority0   "./entrypoint.sh --c…"  4 hours ago         Exited (134) 37 minutes ago                       parity-poa-playground_authority0_1

@tomusdrw
Copy link
Collaborator

@Pzixel yeah, let's re-open a new issue if it's unrelated to this one

@Pzixel
Copy link
Contributor Author

Pzixel commented Feb 19, 2019

Got it. I'm going to reproduce this once again, and if so I'l open the new issue

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
F2-bug 🐞 The client fails to follow expected behavior. M6-rpcapi 📣 RPC API.
Projects
None yet
Development

No branches or pull requests

3 participants