Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

jormungandr 0.8.2 stops syncing and node stats inconsistent #1374

Open
aalkilani opened this issue Dec 16, 2019 · 15 comments
Open

jormungandr 0.8.2 stops syncing and node stats inconsistent #1374

aalkilani opened this issue Dec 16, 2019 · 15 comments

Comments

@aalkilani
Copy link

Describe the bug
A clear and concise description of what the bug is.
jormungandr running. cardano-wallet (12/13 release) serve mode running connected to jormungandr. As far as I can tell, this likely happened the first time cardano-wallet connected.

Symptoms:
Node stats (jcli rest v0 node stats get) reports a recent, up to date, lastBlockTime however upon examining the lastBlockHeight, it appears the height is not moving forward and does not match what the lastBlockTime is saying it should be.
At first, did not notice an issue however the cardano-wallet sync percent kept decreasing so I figuring something isn't fetching correctly. Given the lastblockTime, I thought Jormungandr was fine. Finally noticed lastBlockHeight was not moving at all. No error messages on either side (wallet or jormungandr).

Additional thoughts:
Reason I think it was the initial (1st time) cardano-wallet connected to jormungandr that caused this is that I never say cardano-wallet report that it was in sync and ready.. it started showing in the 99% and then just kept dropping so looks like upon connection, something happened.

Resolution:
Stopped cardano-wallet
Stopped jurmangandr
Started jurmangandr
Started cardano-wallet

Mandatory Information

  1. jcli --full-version output;
    jcli 0.8.2 (HEAD-fe327f9, release, linux [x86_64]) - [rustc 1.39.0 (4560ea788 2019-11-04)]
  2. jormungandr --full-version output;
    jormungandr 0.8.2 (HEAD-fe327f9, release, linux [x86_64]) - [rustc 1.39.0 (4560ea788 2019-11-04)]

To Reproduce
See above

Expected behavior
See above

Additional context
See above

@rickymac68
Copy link

I saw a similar problem where my block height stopped going up and got greater than 200 behind. I only saw it once, and a restart of the node temporarily cleared the problem.

@rickymac68
Copy link

Dec 16 06:43:16.124 WARN blockchain is not moving up, the last block was 3603 seconds ago, task: stuck_notifier

It happened again.

@Willburn
Copy link

Willburn commented Dec 16, 2019

Had the problem as well. Needed reboot.
Screenshot_20191216-130714_JuiceSSH

I should add server did not have the uptime in seconds as the reported last block time so quite wierd.

@rickymac68
Copy link

Same problem on https://pooltool.io apparently

Screen Shot 2019-12-16 at 7 29 26 AM

@adavibesp
Copy link

adavibesp commented Dec 16, 2019

Seeing the same issue. Apparantly I'm receiving blocks in the wrong order and afterwards it stops syncing. I see a critical Task Panick.
Dec 16 14:32:01.695 CRIT Task panicked, reason: assertion failed: !self.candidate_map.contains_key(&root_hash), task: block

It looks like I'm getting block announcements but they are no longer being applied.

Dec 16 14:32:01.694 WARN response canceled by the processing task, request: UploadBlocks, sub_task: server, task: network
Dec 16 14:39:01.784 INFO receiving header stream from network, task: block
Dec 16 14:39:01.785 WARN response canceled by the processing task, request: PushHeaders, sub_task: server, task: network
Dec 16 14:39:01.789 INFO received block announcement from network, from_node_id: 2b48352a5dea929467ed7a3476e25b18b48056db4df62d1b, date: 2.33137, parent: 99571b1173c8b9f927a0de54e32393bf49fdd6f76ba8abd0885dd3dff92cbc4d, hash: af76563f6e701f6738a5c98302ff224def07e9697f829276fe3fb66e5bb19062, task: block
Dec 16 14:39:23.128 INFO block successfully applied, date: 2.33152, parent: 6f703ab76c9600bb1b3d8663ea4d3c092241f1355348dd79733017bbf4ef2c90, hash: 224eb03defccc5cd6c28017bddadb96bd487c0813f2f69e7c1d1e3bc2434fc37, task: block
Dec 16 14:39:23.128 WARN 1 more blocks have arrived out of order, but I don't know what to do with them yet!, date: 2.33152, parent: 6f703ab76c9600bb1b3d8663ea4d3c092241f1355348dd79733017bbf4ef2c90, hash: 224eb03defccc5cd6c28017bddadb96bd487c0813f2f69e7c1d1e3bc2434fc37, task: block
Dec 16 14:39:23.128 INFO update current branch tip, task: block
Dec 16 14:42:01.911 INFO receiving block stream from network, task: block
Dec 16 14:42:01.912 WARN response canceled by the processing task, request: PushHeaders, sub_task: server, task: network
Dec 16 14:42:01.915 INFO received block announcement from network, from_node_id: da13f35c351b9c4544f638b67fcfed354d96fe820957aa5a, date: 2.33222, parent: 38c2e59e5e231ee442ff4b3d980c01f8e61e6f28f563d505cba5aa6bed22ebed, hash: cf666a2487a6e1b9f624a6e63b3d58379f228f7b06314e1f6cda386391231561, task: block
Dec 16 15:02:31.819 INFO receiving header stream from network, task: block
Dec 16 15:02:31.823 WARN response canceled by the processing task, request: PushHeaders, sub_task: server, task: network
Dec 16 15:02:31.837 INFO incoming connection closed, peer_addr: 78.47.82.38:48462, task: network
Dec 16 15:11:12.042 INFO receiving header stream from network, task: block
Dec 16 15:11:12.047 WARN response canceled by the processing task, request: PushHeaders, sub_task: server, task: network
Dec 16 15:11:12.164 INFO incoming connection closed, peer_addr: 207.35.173.123:48838, task: network
Dec 16 15:17:02.000 INFO receiving header stream from network, task: block
Dec 16 15:17:02.002 WARN response canceled by the processing task, request: PushHeaders, sub_task: server, task: network
Dec 16 15:17:02.072 INFO receiving block stream from network, task: block
Dec 16 15:17:22.240 INFO receiving header stream from network, task: block
Dec 16 15:17:22.248 WARN response canceled by the processing task, request: PushHeaders, sub_task: server, task: network
Dec 16 15:17:31.605 INFO connecting to peer, node_id: 99cb10f53185fbef110472d45a36082905ee12df8a049b74, peer_addr: 3.124.116.145:3000, task: network
Dec 16 15:17:32.051 INFO receiving header stream from network, task: block
Dec 16 15:17:32.054 WARN response canceled by the processing task, request: PushHeaders, sub_task: server, task: network
Dec 16 15:17:32.055 INFO receiving block stream from network, task: block
Dec 16 15:17:41.776 INFO block successfully applied, date: 2.34298, parent: cd4f64156652ee558bf9723e8ea289e7300283c76aeda2aa30d525df59e0fee7, hash: a3bb7e6b9d8745bfe808aa6790fbfbe8ea6ef8bf90303b555cff88d719123678, task: block
Dec 16 15:17:41.776 WARN 3 more blocks have arrived out of order, but I don't know what to do with them yet!, date: 2.34298, parent: cd4f64156652ee558bf9723e8ea289e7300283c76aeda2aa30d525df59e0fee7, hash: a3bb7e6b9d8745bfe808aa6790fbfbe8ea6ef8bf90303b555cff88d719123678, task: block
Dec 16 15:17:41.776 INFO update current branch tip, task: block`

After this I'm not getting any blocks applied.

Finally I also notice a lot of task responces being canceled.

@rickymac68
Copy link

Now I am seeing this error: Dec 16 16:17:08.946 ERRO cannot compute the time passed since the last block: second time provided was later than self, task: stuck_notifier

@lordwotton29
Copy link

Had the same problem with stuck node, restarted it and I'm having lots of these problems:
Dec 16 22:16:23.293 ERRO cannot compute the time passed since the last block: second time provided was later than self, task: stuck_notifier Dec 16 22:17:23.291 ERRO cannot compute the time passed since the last block: second time provided was later than self, task: stuck_notifier

I'll try stop the node and delete the storage to see if it makes a difference

@rickymac68
Copy link

Entering this for Mr. Bojangles:

I got Dec 16 21:52:48.911 ERRO cannot compute the time passed since the last block: second time provided was later than self, task: stuck_notifier

@Rugbertl
Copy link

Rugbertl commented Dec 17, 2019

I have a similar problem. This morning when I called the REST API with the following command: ./jcli rest v0 node stats get -h "http://127.0.0.1:3100/api". I only got the information that the call run into a timeout. All the other calls seem to work. (Balance check, Logs ...)

After restarting the node i had many WARN entries in my LOG:

Dec 17 11:08:54.238 WARN unable to reach peer for initial bootstrap, reason: HTTP/2.0 connection error, peer_addr: 3.124.116.145:3000, task: bootstrap
Dec 17 11:08:54.243 WARN unable to reach peer for initial bootstrap, reason: HTTP/2.0 connection error, peer_addr: 3.125.75.156:3000, task: bootstrap
Dec 17 11:08:54.518 WARN unable to reach peer for initial bootstrap, reason: HTTP/2.0 connection error, peer_addr: 13.112.181.42:3000, task: bootstrap
Dec 17 11:08:54.524 WARN unable to reach peer for initial bootstrap, reason: HTTP/2.0 connection error, peer_addr: 52.28.91.178:3000, task: bootstrap
Dec 17 11:08:54.783 WARN unable to reach peer for initial bootstrap, reason: HTTP/2.0 connection error, peer_addr: 13.114.196.228:3000, task: bootstrap
Dec 17 11:08:54.944 WARN unable to reach peer for initial bootstrap, reason: HTTP/2.0 connection error, peer_addr: 52.8.15.52:3000, task: bootstrap
Dec 17 11:08:55.102 WARN unable to reach peer for initial bootstrap, reason: HTTP/2.0 connection error, peer_addr: 52.9.132.248:3000, task: bootstrap
Dec 17 11:09:15.529 WARN 1 more blocks have arrived out of order, but I don't know what to do with them yet!, date: 3.3530, parent: b4105ecceb37a6ff0f1c50df49b7903cf140d782f24e8838dbadf2ae81c3838a, hash: d4258921a3c6346e0ec9e0e18f523693a5691f9e9a58c2c643adea0c3f33d98b, task: block
Dec 17 11:09:55.145 WARN blockchain is not moving up, the last block was 46718 seconds ago, task: stuck_notifier
Dec 17 11:10:53.321 WARN response canceled by the processing task, request: PushHeaders, sub_task: server, task: network
Dec 17 11:10:55.099 WARN response canceled by the processing task, request: PushHeaders, sub_task: server, task: network
Dec 17 11:10:55.144 WARN blockchain is not moving up, the last block was 46778 seconds ago, task: stuck_notifier
Dec 17 11:10:56.455 WARN response canceled by the processing task, request: PushHeaders, sub_task: server, task: network
Dec 17 11:10:56.633 WARN response canceled by the processing task, request: PushHeaders, sub_task: server, task: network
Dec 17 11:10:56.818 WARN response canceled by the processing task, request: PushHeaders, sub_task: server, task: network

Yesterday i was able to connect to the trusted peers.

@aalkilani
Copy link
Author

Deadalus ITN having same problem now (in addition to standalone jormungandr).
Deadalus on Windows 64 bit.
jormungandr on Centos 8

Possibly relevant Deadalus logs:
Dec 17 00:21:21.951 ERRO cannot process network blocks, reason: deadline has elapsed, task: block
....
.....
Dec 17 00:21:34.915 WARN blockchain is not moving up, the last block was 1859 seconds ago, task: stuck_notifier

@Rugbertl
Copy link

I just rebooted my server and restarted jormungandr. There seems to be a problem with the following trusted peers:

Dec 17 11:30:27.164 WARN unable to reach peer for initial bootstrap, reason: HTTP/2.0 connection error, peer_addr: 13.114.196.228:3000, task: bootstrap
Dec 17 11:30:27.325 WARN unable to reach peer for initial bootstrap, reason: HTTP/2.0 connection error, peer_addr: 52.8.15.52:3000, task: bootstrap
Dec 17 11:30:27.330 WARN unable to reach peer for initial bootstrap, reason: HTTP/2.0 connection error, peer_addr: 52.28.91.178:3000, task: bootstrap
Dec 17 11:30:27.608 WARN unable to reach peer for initial bootstrap, reason: HTTP/2.0 connection error, peer_addr: 13.112.181.42:3000, task: bootstrap
Dec 17 11:30:27.614 WARN unable to reach peer for initial bootstrap, reason: HTTP/2.0 connection error, peer_addr: 3.125.75.156:3000, task: bootstrap

The other ones seem to be fine. So the good thing is that my node is synching again. lastBlockHeight and blockRecvCnt is going up again and also lastBlockTime seems fine again.

I will keep monitoring the logs.

@Rugbertl
Copy link

I just received a CRIT in my logs.

Dec 17 11:36:38.995 WARN response canceled by the processing task, request: PushHeaders, sub_task: server, task: network
thread 'block1' panicked at 'assertion failed: !self.candidate_map.contains_key(&root_hash)', jormungandr/src/blockchain/candidate.rs:468:17
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
Dec 17 11:38:37.622 WARN response canceled by the processing task, request: UploadBlocks, sub_task: server, task: network
Dec 17 11:38:37.622 CRIT Task panicked, reason: assertion failed: !self.candidate_map.contains_key(&root_hash), task: block
Dec 17 11:38:37.792 WARN response canceled by the processing task, request: PushHeaders, sub_task: server, task: network
Dec 17 11:38:37.827 WARN response canceled by the processing task, request: PushHeaders, sub_task: server, task: network
Dec 17 11:38:40.672 WARN response canceled by the processing task, request: PushHeaders, sub_task: server, task: network
thread 'block0' panicked at 'invalid key', /home/peter/.cargo/registry/src/github.com-1ecc6299db9ec823/slab-0.4.2/src/lib.rs:789:18
Dec 17 11:38:45.316 CRIT Task panicked, reason: invalid key, task: block
Dec 17 11:44:25.721 WARN response canceled by the processing task, request: PushHeaders, sub_task: server, task: network

@JSCSJSCS
Copy link

My logs filled with these:

{"msg":"cannot process block announcement","level":"ERRO","ts":"2019-12-17T07:25:59.427184700-08:00","from_node_id":"b852c1d8ee440fc1b05956dc780319ca11aadf84b46a53d1","date":"3.36365","parent":"cead4e7337003b8b2aa83c05295df45e779af4a9c597ec7da130896c989840e2","hash":"0368f69800180e53783985c4c18ea04842656cd2c9307414368d0b660f3245c8","task":"block","reason":"deadline has elapsed"}
{"msg":"cannot process network headers","level":"ERRO","ts":"2019-12-17T07:26:02.418174700-08:00","task":"block","reason":"deadline has elapsed"}

@laplasz
Copy link

laplasz commented Dec 21, 2019

still having this issue with version 0.8.3

@rinor
Copy link
Contributor

rinor commented Jan 6, 2020

@aalkilani regarding lastBlockTime there was an issue #1402 and it was fixed in #1426 so v0.8.5 should behave better (regarding lastBlockTime changing while lastBlockHeight remaining unchanged).

@Mixxaz @Rugbertl regarding assertion failed: !self.candidate_map.contains_key(&root_hash) / panicked at 'invalid key'reported at #1351 and is already fixed in v0.8.5.

@lordwotton29 @rickymac68 regarding ERRO cannot compute the time passed since the last block: second time provided was later than self, task: stuck_notifier reported at #1404. Can you check if this makes sense to you #1404 (comment)

@Rugbertl regarding timeouts of jcli rest v0 node stats get is fixed and were related to storage locks issue fixed in #1412 and some improvements will be merged #1441 (you may already be aware due to your other issue #1392 (comment))

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

9 participants