reorg: error - node is not healthy - context deadline exceeded #1327

Open
ThomasBlock opened this issue Feb 21, 2024 · 2 comments
Labels
bug Something isn't working

Comments

ThomasBlock commented Feb 21, 2024

Describe the bug
I am experimenting with different SSV setups. Most are working fine, but this configuration crashes several times a day. The node restarts on its own, but it would be nice to avoid these crashes completely.

To Reproduce
ubuntu22
ethdocker
geth
nimbus-cl-only
SSV-Node:v1.2.3

Logs

execution-1  | INFO [02-21|16:11:34.196] Chain reorg detected                     number=19,276,837 hash=16adfc..de9b88 drop=1 dropfrom=f2211e..a3d250 add=1 addfrom=3c7661..7af515
execution-1  | INFO [02-21|16:11:34.300] Chain head was updated                   number=19,276,838 hash=3c7661..7af515 root=04cfd6..bcde79 elapsed=103.333878ms
consensus-1  | INF 2024-02-21 16:11:21.158+01:00 Slot end                                   topics="beacnde" slot=8475354 nextActionWait=n/a nextAttestationSlot=-1 nextProposalSlot=-1 syncCommitteeDuties=current head=71c706d7:8475354
consensus-1  | INF 2024-02-21 16:11:23.000+01:00 Slot start                                 topics="beacnde" head=71c706d7:8475354 delay=93us530ns finalized=264852:4dc8c933 peers=49 slot=8475355 sync=synced epoch=264854
consensus-1  | INF 2024-02-21 16:11:33.470+01:00 State replayed                             topics="chaindag" blocks=25 slots=27 current=71c706d7:8475354@8475355 ancestor=7e7ee1df:8475327@8475328 target=fd3d9e94:8475353@8475355 ancestorStateRoot=c6d1174e targetStateRoot=58af0665 found=false assignDur=680ms106us201ns replayDur=6s746ms924us5ns
consensus-1  | NTC 2024-02-21 16:11:34.060+01:00 Updated head block with chain reorg        topics="chaindag" headParent=fd3d9e94:8475353 stateRoot=20fe506f justified=264853:9afd5506 finalized=264852:4dc8c933 isOptHead=false newHead=f703c6d5:8475355 lastHead=71c706d7:8475354
consensus-1  | INF 2024-02-21 16:11:34.062+01:00 Missed multiple heartbeats                 topics="libp2p gossipsub" heartbeat=GossipSub delay=6s937ms303us522ns hinterval=700ms
consensus-1  | INF 2024-02-21 16:11:34.194+01:00 Slot end                                   topics="beacnde" slot=8475355 nextActionWait=n/a nextAttestationSlot=-1 nextProposalSlot=-1 syncCommitteeDuties=current head=f703c6d5:8475355

ssv-node-1  | 2024-02-21T15:11:33.357383Z       error   node is not healthy     {"node": "consensus client", "error": "failed to request syncing: failed to call GET endpoint: Get \"http://consensus:5052/eth/v1/node/syncing\": context deadline exceeded", "errorVerbose": "Get \"http://consensus:5052/eth/v1/node/syncing\": context deadline exceeded\nfailed to call GET endpoint\ngithub.com/attestantio/go-eth2-client/http.(*Service).get\n\t/go/pkg/mod/github.com/attestantio/go-eth2-client@v0.16.3/http/http.go:66\ngithub.com/attestantio/go-eth2-client/http.(*Service).NodeSyncing\n\t/go/pkg/mod/github.com/attestantio/go-eth2-client@v0.16.3/http/nodesyncing.go:30\ngithub.com/bloxapp/ssv/beacon/goclient.(*goClient).Healthy\n\t/go/src/github.com/bloxapp/ssv/beacon/goclient/goclient.go:203\ngithub.com/bloxapp/ssv/nodeprobe.(*Prober).probe.func1\n\t/go/src/github.com/bloxapp/ssv/nodeprobe/nodeprobe.go:96\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1598\nfailed to request syncing\ngithub.com/attestantio/go-eth2-client/http.(*Service).NodeSyncing\n\t/go/pkg/mod/github.com/attestantio/go-eth2-client@v0.16.3/http/nodesyncing.go:32\ngithub.com/bloxapp/ssv/beacon/goclient.(*goClient).Healthy\n\t/go/src/github.com/bloxapp/ssv/beacon/goclient/goclient.go:203\ngithub.com/bloxapp/ssv/nodeprobe.(*Prober).probe.func1\n\t/go/src/github.com/bloxapp/ssv/nodeprobe/nodeprobe.go:96\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1598"}
ssv-node-1  | 2024-02-21T15:11:33.358330Z       error   not all nodes are healthy
ssv-node-1  | 2024-02-21T15:11:33.358346Z       fatal   ethereum node(s) are either out of sync or down. Ensure the nodes are healthy to resume.
ssv-node-1  | make: *** [Makefile:102: start-node] Error 1
ssv-node-1  | make: go: No such file or directory
ssv-node-1  | make: go: No such file or directory
ssv-node-1  | make: go: No such file or directory
ssv-node-1  | Build /go/bin/ssvnode
ssv-node-1  | Build /config/config.yaml
ssv-node-1  | Build 
ssv-node-1  | Command --config=/config/config.yaml
ssv-node-1  | Running node on address: *)
ssv-node-1  | 2024-02-21T15:11:36.958535Z       info    starting SSV-Node:v1.2.3-e5a6d711958f5043615bf8f6a95005a6083e714f
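
The two log sources above use different UTC offsets: the consensus lines are stamped +01:00 and the ssv-node lines are in UTC (Z), so the 16:11:33 "State replayed" line and the 15:11:33 health check failure fall within the same second. The sketch below lines them up. It assumes that the "State replayed" line is written when the replay finishes and that replayDur covers the time immediately before it; neither is confirmed in this thread, and the helper names are hypothetical rather than part of any client.

```python
import re
from datetime import datetime, timedelta, timezone

# Only the units that appear in the log excerpt above are handled.
_UNIT_NS = {"s": 1_000_000_000, "ms": 1_000_000, "us": 1_000, "ns": 1}
_PART = re.compile(r"(\d+)(ms|us|ns|s)")
_WHOLE = re.compile(r"(?:\d+(?:ms|us|ns|s))+")


def parse_nimbus_duration(text):
    """Convert a duration such as '6s746ms924us5ns' to integer nanoseconds."""
    if not _WHOLE.fullmatch(text):
        raise ValueError(f"unrecognised duration: {text!r}")
    return sum(int(number) * _UNIT_NS[unit] for number, unit in _PART.findall(text))


def parse_utc(text):
    """Parse a log timestamp with an explicit offset (or 'Z') into UTC."""
    return datetime.fromisoformat(text.replace("Z", "+00:00")).astimezone(timezone.utc)


def busy_window(end_text, duration_text):
    """Return (start, end) in UTC, assuming the line was logged at the end
    of the work and the duration covers the time right before it."""
    end = parse_utc(end_text)
    # timedelta has microsecond resolution, so sub-microsecond digits are dropped.
    micros = parse_nimbus_duration(duration_text) // 1000
    return end - timedelta(microseconds=micros), end


if __name__ == "__main__":
    start, end = busy_window("2024-02-21 16:11:33.470+01:00", "6s746ms924us5ns")
    failure = parse_utc("2024-02-21T15:11:33.357383Z")
    print("replay window:", start.isoformat(), "to", end.isoformat())
    print("health check failure inside window:", start <= failure <= end)
```

If the assumption holds, the failed request to /eth/v1/node/syncing overlapped the roughly 6.7 s state replay, which is also about the size of the GossipSub heartbeat delay logged at 16:11:34.062.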
ThomasBlock added the bug label Feb 21, 2024
ThomasBlock (Author) commented

Update on this: three of the systems work fine.
One setup with Eth Docker still causes problems:
ssv-node keeps restarting although the execution and consensus clients are totally fine.

ssv-node-1  | {"level":"info","time":"2024-03-26T13:17:45.292192Z","name":"execution_client","msg":"fetched registry events","from_block":19518848,"to_block":19518848,"target_block":19518848,"progress":"100.00%","events":0,"took":"5.556743ms"}
ssv-node-1  | {"level":"warn","time":"2024-03-26T13:17:45.601071Z","name":"Controller","msg":"failed to update validators metadata","error":"failed to get validator data from Beacon: failed to get validators data from beacon: failed to obtain validators: failed to obtain chunk: failed to request validators: failed to call GET endpoint: Get \"http://consensus:5052/eth/v1/beacon/states/head/validators?id=0x83d179a1f091fb06
.... 
context deadline exceeded\nfailed to get validators data from beacon\ngithub.com/bloxapp/ssv/protocol/v2/blockchain/beacon.FetchValidatorsMetadata\n\t/go/src/github.com/bloxapp/ssv/protocol/v2/blockchain/beacon/validator_metadata.go:113\ngithub.com/bloxapp/ssv/protocol/v2/blockchain/beacon.UpdateValidatorsMetadata\n\t/go/src/github.com/bloxapp/ssv/protocol/v2/blockchain/beacon/validator_metadata.go:71\ngithub.com/bloxapp/ssv/operator/validator.(*controller).UpdateValidatorMetaDataLoop\n\t/go/src/github.com/bloxapp/ssv/operator/validator/controller.go:858\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1598\nfailed to get validator data from Beacon\ngithub.com/bloxapp/ssv/protocol/v2/blockchain/beacon.UpdateValidatorsMetadata\n\t/go/src/github.com/bloxapp/ssv/protocol/v2/blockchain/beacon/validator_metadata.go:73\ngithub.com/bloxapp/ssv/operator/validator.(*controller).UpdateValidatorMetaDataLoop\n\t/go/src/github.com/bloxapp/ssv/operator/validator/controller.go:858\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1598"}
ssv-node-1  | {"level":"info","time":"2024-03-26T13:18:07.316673Z","name":"P2PNetwork","msg":"Verified handshake nodeinfo","selfPeer":"16Uiu2HAmVd4pPhEMR5RnAboqftLzuqZHiZy1CzLpdRP9qbFsoWxh","peer_id":"16Uiu2HAm6bqkkkkKpnHqzgrxjmJ57mNCe9Ph4MN7LdhkPedKG77h","peer_id":"16Uiu2HAm6bqkkkkKpnHqzgrxjmJ57mNCe9Ph4MN7LdhkPedKG77h","metadata":{"NodeVersion":"v1.3.2-97d20e67d83cad1fd0d8d12ff179f7a9fe090daa","ExecutionNode":"","ConsensusNode":"","Subnets":"f5ffffffffbe3ebbdbf7fffbff766c6b"},"networkID":"0x00000000"}
ssv-node-1  | {"level":"info","time":"2024-03-26T13:18:07.852493Z","name":"execution_client","msg":"fetched registry events","from_block":19518849,"to_block":19518849,"target_block":19518849,"progress":"100.00%","events":0,"took":"660.656µs"}
ssv-node-1  | {"level":"error","time":"2024-03-26T13:18:07.874067Z","msg":"node is not healthy","node":"consensus client","error":"failed to obtain node syncing status: failed to call GET endpoint: Get \"http://consensus:5052/eth/v1/node/syncing\": context deadline exceeded"}
ssv-node-1  | {"level":"error","time":"2024-03-26T13:18:07.874151Z","msg":"not all nodes are healthy"}
ssv-node-1  | {"level":"fatal","time":"2024-03-26T13:18:07.874164Z","msg":"ethereum node(s) are either out of sync or down. Ensure the nodes are healthy to resume."}
This is Eth Docker v2.8.0.0

ssvnode version v1.3.2-97d20e67d83cad1fd0d8d12ff179f7a9fe090daa

beacon-chain version Prysm/v5.0.1/a1a81d1720a0a3b850992d4825d0a023baa8e65a. Built at: 2024-03-08 20:21:37+00:00

validator version Prysm/v5.0.1/a1a81d1720a0a3b850992d4825d0a023baa8e65a. Built at: 2024-03-08 20:22:56+00:00

besu/v24.3.0/linux-x86_64/openjdk-java-17

mev-boost v1.7.1
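
To check whether the beacon endpoint itself is slow, independent of ssv-node, the same request can be timed from another container on the same network. This is a minimal sketch, not how ssv-node implements its probe: the endpoint path and the base URL http://consensus:5052 come from the log lines above, while the function name, the 5 second default timeout, and the injectable `opener` parameter are assumptions made for illustration.

```python
import json
import socket
import time
import urllib.error
import urllib.request


def probe_syncing(base_url, timeout_s=5.0, opener=urllib.request.urlopen):
    """Time one GET /eth/v1/node/syncing call and classify the result.

    `opener` defaults to urllib's urlopen; it is a parameter only so the
    function can be exercised without a live beacon node.
    """
    url = base_url.rstrip("/") + "/eth/v1/node/syncing"
    started = time.monotonic()
    outcome, detail, is_syncing = "ok", "", None
    try:
        with opener(url, timeout=timeout_s) as response:
            body = json.load(response)
        is_syncing = body.get("data", {}).get("is_syncing")
    except socket.timeout as exc:
        # Raised directly when the server accepts the connection but is slow to answer.
        outcome, detail = "timeout", str(exc)
    except urllib.error.URLError as exc:
        # Connection-phase timeouts arrive wrapped in URLError.
        timed_out = isinstance(exc.reason, (socket.timeout, TimeoutError))
        outcome, detail = ("timeout" if timed_out else "error"), str(exc.reason)
    except (OSError, ValueError) as exc:
        outcome, detail = "error", str(exc)
    return {
        "outcome": outcome,
        "detail": detail,
        "is_syncing": is_syncing,
        "elapsed_s": time.monotonic() - started,
    }


if __name__ == "__main__":
    # Hostname and port as seen in the ssv-node error message.
    print(probe_syncing("http://consensus:5052"))
```

Running it in a loop and logging `elapsed_s` would show whether the response time occasionally approaches the deadline even while the consensus client reports itself as synced.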

ThomasBlock commented Jun 16, 2024

Update: it was okay for a long time. 6 of 7 nodes work fine. Now there are problems with this setup.
10 restarts of ssv-node a day bring performance down to 86%, all while the consensus and execution clients are fine.

[image] Yellow = good node, Blue = bad node

ethd version
This is Eth Docker v2.9.2.0
ssvnode version v1.3.4-39046e4aa45ab4b2d8bd48af41d62bc5858c59ad
beacon-chain version Prysm/v5.0.3/38f208d70dc95b12c08403f5c72009aaa10dfe2f. Built at: 2024-04-04 18:29:14+00:00
2024-06-16 15-09-07.7946|Nethermind starting initialization.
2024-06-16 15-09-07.8395|Client version: Nethermind/v1.26.0+0068729c/linux-x64/dotnet8.0.4
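
The restart count can be taken from the logs directly rather than estimated. The sketch below counts fatal entries per day, assuming the ssv-node lines have the JSON shape shown in the March 26 excerpt earlier in this thread (a container prefix, then "| ", then a JSON object with "level" and "time" fields). The function name is hypothetical, and non-JSON lines such as the make output are skipped.

```python
import json
from collections import Counter


def count_fatal_exits_per_day(lines):
    """Count log entries with level 'fatal', keyed by the UTC date in 'time'."""
    per_day = Counter()
    for line in lines:
        # Drop the container prefix, e.g. 'ssv-node-1  | '.
        _, separator, payload = line.partition("| ")
        payload = payload.strip()
        if not separator or not payload.startswith("{"):
            continue
        try:
            entry = json.loads(payload)
        except ValueError:
            continue  # truncated or non-JSON line
        if entry.get("level") == "fatal":
            per_day[str(entry.get("time", ""))[:10]] += 1
    return per_day


if __name__ == "__main__":
    sample = [
        'ssv-node-1  | {"level":"error","time":"2024-03-26T13:18:07.874151Z","msg":"not all nodes are healthy"}',
        'ssv-node-1  | {"level":"fatal","time":"2024-03-26T13:18:07.874164Z","msg":"ethereum node(s) are either out of sync or down. Ensure the nodes are healthy to resume."}',
    ]
    print(dict(count_fatal_exits_per_day(sample)))
```

Comparing the per-day counts between a good node and the bad node would show whether the drop to 86% tracks the number of fatal exits.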
