gc-merge-devnet-1 TBD-issue Jul 3 #9

Closed
dapplion opened this issue Jul 5, 2022 · 5 comments
dapplion (Member) commented Jul 5, 2022


Nethermind (user list) and Lighthouse (user list) devs can SSH to the hosts (IP list here) with ssh devops@<ip>. Same as Pari's deployments.


  • gc-merge-devnet-1 was launched at Jun 30 2022 12:05:00 GMT+0000 (Add gc-merge-devnet-1 #7). 10 validating clients with diversity on the CL; the EL is all Nethermind. Each node has 1000 keys, for a network of 10,000 validators total. The first two nodes are POSDAO validators.
  • TTD was reached at Jun 30 2022 13:56:30 GMT+0000
  • At Jul 2 2022 20:00:00 GMT+0000 participation dropped to 90%, as lodestar-nethermind-1 started to experience issues
  • At Jul 4 2022 15:00:00 GMT+0000 participation dropped to 60%. Some nodes' CL and/or EL stopped progressing and lost all peers

Screenshot from 2022-07-05 13-21-01
Screenshot from 2022-07-05 13-21-35
Screenshot from 2022-07-05 13-21-50
Screenshot from 2022-07-05 13-22-11
Screenshot from 2022-07-05 13-22-24


NOTE: Currently digging through the logs, will post updates

@dapplion dapplion changed the title gc-merge-devnet-1 0% participation rate Jul 3 gc-merge-devnet-1 TBD-issue Jul 3 Jul 5, 2022
dapplion (Member, Author) commented Jul 6, 2022

lodestar-nethermind-1 was the first host to experience issues. The beacon node (lodestar) stalled at ~ Jul 3 2022 6:00:00 GMT+0000

Screenshot from 2022-07-06 23-42-40
Screenshot from 2022-07-06 23-42-54

Note that Lodestar is mostly single-threaded, except for BLS signature verification

Screenshot from 2022-07-06 23-43-08
Screenshot from 2022-07-06 23-43-25

Something in the system kernel is taking 50% of the CPU, so by Jul 3 2022 7:30:00 all cores are at 100% usage.

Screenshot from 2022-07-06 23-43-40

Nethermind appears to go offline, as Prometheus can't scrape it

Screenshot from 2022-07-06 23-44-09

From the logs, authentication errors started happening at Jul-02 12:36:36, repeating every ~5 min at semi-random intervals until ~ Jul-03 3:00:00. After that, authentication errors started happening more frequently, including for produceBlockV2

Jul-02 12:36:36.029[CHAIN]           error: Error pushing notifyForkchoiceUpdate() headBlockHash=0xd351b6dd86228d902ddddff2e0b4a79c84c25256a048395c933c554937e30a38, finalizedBlockHash=0x0008ffc99a1dd3685
a7d330449f8534ef96ab44bcfcaaef7017cf4d9a993abba JSON RPC error: Authentication error, engine_forkchoiceUpdatedV1
...
Jul-03 03:01:21.877[API]             error: Req req-lov4 produceBlockV2 error  JSON RPC error: Authentication error, engine_forkchoiceUpdatedV1
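
For context on what these errors mean: the Engine API endpoint is authenticated, and (as a rough illustration, not Lodestar's actual code) each engine call carries a short-lived HS256 JWT derived from the shared jwtsecret file. If the token is signed with the wrong secret or its iat claim drifts too far from the EL's clock (the Engine API spec only allows a small drift, on the order of a minute), the EL answers with exactly this kind of JSON-RPC authentication error. A minimal TypeScript sketch, assuming Node 18+, the jsonwebtoken package, a jwtsecret at /data/jwtsecret, and an engine endpoint at http://127.0.0.1:8551:

// Minimal sketch of an authenticated Engine API call (illustrative, not Lodestar code).
// Assumes the shared 32-byte hex secret in /data/jwtsecret and Node 18+ (global fetch).
import { readFileSync } from "node:fs";
import jwt from "jsonwebtoken";

const secretHex = readFileSync("/data/jwtsecret", "utf8").trim();
const secret = Buffer.from(secretHex.replace(/^0x/, ""), "hex");

async function engineCall(method: string, params: unknown[]): Promise<unknown> {
  // Fresh token per request; jsonwebtoken adds the iat (issued-at) claim automatically,
  // and ELs reject tokens whose iat drifts too far from their own clock.
  const token = jwt.sign({}, secret, { algorithm: "HS256" });
  const res = await fetch("http://127.0.0.1:8551", {
    method: "POST",
    headers: { "content-type": "application/json", authorization: `Bearer ${token}` },
    body: JSON.stringify({ jsonrpc: "2.0", id: 1, method, params }),
  });
  const body = (await res.json()) as { result?: unknown; error?: { message: string } };
  if (body.error) throw new Error(`JSON RPC error: ${body.error.message}, ${method}`); // as seen in the logs above
  return body.result;
}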

Also, other errors appear, indicating inconsistent communication and/or clock timing between the beacon node and the validator client

Jul-03 04:41:12.692[API]             error: Req req-m1vg getProposerDuties error  Proposer duties for epoch 2906 not supported, current epoch 2907
Jul-03 04:41:22.024[API]             error: Req req-m1wg getAggregatedAttestation error  No attestation for slot=46514 dataRoot=0xc86fc2f724c1933efc0848406190e80441548648a09ade8469b8342ebb2d2590

Some errors with execution communication?

Jul-03 04:43:36.306[API]             error: Req req-m21v produceBlockV2 error  Received invalid payloadId=null
Jul-03 05:19:55.814[SYNC]            error: Unexpect error - processBlock  Non-base58btc character
Jul-03 05:30:18.188[CHAIN]           error: Error pushing notifyForkchoiceUpdate() headBlockHash=0xb9b9a081e5e9899b067c48781bcf77ca3fbddb5444fcd61bf9aedcfa525743e4, finalizedBlockHash=0x4a250a8e2804d0a30
1b62d4e926429d9bf511794713c7bd0cc7d9f42eb0d201f Timeout request
Jul-03 05:33:20.199[CHAIN]           error: Block error slot=47114, errCode=BLOCK_ERROR_EXECUTION_ERROR code=BLOCK_ERROR_EXECUTION_ERROR, execStatus=ELERROR, errorMessage=JSON RPC error: Authentication e
rror, engine_newPayloadV1
Jul-03 05:37:53.951[CHAIN]           error: Block error slot=47157, errCode=BLOCK_ERROR_EXECUTION_ERROR code=BLOCK_ERROR_EXECUTION_ERROR, execStatus=UNAVAILABLE, errorMessage=Timeout request
Jul-03 05:38:32.289[CHAIN]           error: Block error slot=47162, errCode=BLOCK_ERROR_EXECUTION_ERROR code=BLOCK_ERROR_EXECUTION_ERROR, execStatus=UNAVAILABLE, errorMessage=Timeout request
Jul-03 05:38:40.409[CHAIN]           error: Block error slot=47162, errCode=BLOCK_ERROR_EXECUTION_ERROR code=BLOCK_ERROR_EXECUTION_ERROR, execStatus=ELERROR, errorMessage=JSON RPC error: Authentication e
rror, engine_newPayloadV1

At some point the beacon node is unable to process any blocks due to timeouts

Jul-03 06:00:14.122[CHAIN]           error: Block error slot=47251, errCode=BLOCK_ERROR_EXECUTION_ERROR code=BLOCK_ERROR_EXECUTION_ERROR, execStatus=UNAVAILABLE, errorMessage=Timeout request
Jul-03 06:00:26.606[CHAIN]           error: Block error slot=47251, errCode=BLOCK_ERROR_EXECUTION_ERROR code=BLOCK_ERROR_EXECUTION_ERROR, execStatus=UNAVAILABLE, errorMessage=Timeout request
Jul-03 06:00:39.063[CHAIN]           error: Block error slot=47251, errCode=BLOCK_ERROR_EXECUTION_ERROR code=BLOCK_ERROR_EXECUTION_ERROR, execStatus=UNAVAILABLE, errorMessage=Timeout request
Jul-03 06:00:51.350[CHAIN]           error: Block error slot=47251, errCode=BLOCK_ERROR_EXECUTION_ERROR code=BLOCK_ERROR_EXECUTION_ERROR, execStatus=UNAVAILABLE, errorMessage=Timeout request
Jul-03 06:01:04.124[CHAIN]           error: Block error slot=47251, errCode=BLOCK_ERROR_EXECUTION_ERROR code=BLOCK_ERROR_EXECUTION_ERROR, execStatus=UNAVAILABLE, errorMessage=Timeout request
Jul-03 06:01:16.561[CHAIN]           error: Block error slot=47251, errCode=BLOCK_ERROR_EXECUTION_ERROR code=BLOCK_ERROR_EXECUTION_ERROR, execStatus=UNAVAILABLE, errorMessage=Timeout request
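
For reference, a hedged sketch (not Lodestar's implementation) of how the execStatus values above can arise from a single engine call, reusing the hypothetical engineCall helper sketched earlier: a transport timeout surfaces as UNAVAILABLE, a JSON-RPC level error (such as the authentication failures) as ELERROR, and an explicit verdict from the EL is passed through.

// Illustrative mapping of engine_newPayloadV1 outcomes to the execStatus values in the logs.
// Other Engine API statuses (e.g. ACCEPTED) omitted for brevity; timer not cleared, fine for a sketch.
type ExecStatus = "VALID" | "INVALID" | "SYNCING" | "ELERROR" | "UNAVAILABLE";

async function notifyNewPayload(payload: unknown, timeoutMs = 12_000): Promise<ExecStatus> {
  const timeout = new Promise<never>((_, reject) =>
    setTimeout(() => reject(new Error("Timeout request")), timeoutMs)
  );
  try {
    const result = (await Promise.race([
      engineCall("engine_newPayloadV1", [payload]),
      timeout,
    ])) as { status: "VALID" | "INVALID" | "SYNCING" };
    return result.status; // the EL answered with a verdict
  } catch (e) {
    const msg = (e as Error).message;
    // Timeouts look like an unreachable EL; JSON-RPC errors mean the EL answered but refused the call.
    return msg.startsWith("Timeout") ? "UNAVAILABLE" : "ELERROR";
  }
}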

And later Lodestar gets stuck trying to execute block 47316

Jul-03 06:34:47.195[CHAIN]           error: Block error slot=47316, errCode=BLOCK_ERROR_EXECUTION_ERROR code=BLOCK_ERROR_EXECUTION_ERROR, execStatus=INVALID, errorMessage=Block 47941 (0x8fcc12fc15a72e45f
e3a2b909bf867542a27fafd9394be540f95be3c7e136c3e) is known to be a part of an invalid chain.
Jul-03 06:34:58.874[CHAIN]           error: Block error slot=47316, errCode=BLOCK_ERROR_EXECUTION_ERROR code=BLOCK_ERROR_EXECUTION_ERROR, execStatus=INVALID, errorMessage=Block 47941 (0x8fcc12fc15a72e45f
e3a2b909bf867542a27fafd9394be540f95be3c7e136c3e) is known to be a part of an invalid chain.
Jul-03 06:35:01.544[CHAIN]           error: Block error slot=47316, errCode=BLOCK_ERROR_EXECUTION_ERROR code=BLOCK_ERROR_EXECUTION_ERROR, execStatus=INVALID, errorMessage=Block 47941 (0x8fcc12fc15a72e45f
e3a2b909bf867542a27fafd9394be540f95be3c7e136c3e) is known to be a part of an invalid chain.
Jul-03 06:35:08.373[CHAIN]           error: Block error slot=47316, errCode=BLOCK_ERROR_EXECUTION_ERROR code=BLOCK_ERROR_EXECUTION_ERROR, execStatus=INVALID, errorMessage=Block 47941 (0x8fcc12fc15a72e45f
e3a2b909bf867542a27fafd9394be540f95be3c7e136c3e) is known to be a part of an invalid chain.
Jul-03 06:35:13.809[CHAIN]           error: Block error slot=47316, errCode=BLOCK_ERROR_EXECUTION_ERROR code=BLOCK_ERROR_EXECUTION_ERROR, execStatus=INVALID, errorMessage=Block 47941 (0x8fcc12fc15a72e45f
e3a2b909bf867542a27fafd9394be540f95be3c7e136c3e) is known to be a part of an invalid chain.
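
The repeated "is known to be a part of an invalid chain" responses suggest the EL remembers invalidated ancestors and immediately rejects any descendant built on them, so once an ancestor has been judged INVALID this block can never be imported. A purely illustrative sketch of that bookkeeping (not Nethermind's code):

// Illustrative invalid-ancestor cache: once a block is marked INVALID, every descendant
// that builds on it is rejected without being re-executed.
const invalidBlockHashes = new Set<string>();

function checkKnownInvalidChain(blockHash: string, parentHash: string): "INVALID" | "UNKNOWN" {
  if (invalidBlockHashes.has(parentHash) || invalidBlockHashes.has(blockHash)) {
    invalidBlockHashes.add(blockHash); // propagate the verdict down the chain
    return "INVALID"; // "... is known to be a part of an invalid chain"
  }
  return "UNKNOWN"; // fall through to normal payload execution
}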

dapplion (Member, Author) commented Jul 6, 2022

lodestar-nethermind-1's validator performance clearly suffered due to the communication issues with the EL and the inability to follow the head. This caused the participation drop observed in the network.

  • But why did the host's performance start to degrade in the first place?

validator performance summary
Screenshot from 2022-07-06 23-51-57

Receive + import block delay shows a steady, progressive degradation in block validation times
Screenshot from 2022-07-06 23-56-50

State caches appear stable in size, but somehow the rate of requests increases linearly with time. This is not normal
Screenshot from 2022-07-07 00-03-39

dapplion (Member, Author) commented Jul 7, 2022

@michaelsproul lighthouse-nethermind-3 logs around the incident

Recurring logs that happen extremely often after ~ Jul-03 00:00:00.

Jul 03 15:42:30.547 ERRO Failure verifying sync committee signature for gossip, validator_index: 2872, slot: 54449, request_index: 0, error: PastSlot { message_slot: Slot(54449), earliest_permissible_slot: Slot(
54450) }
...
Jul 03 18:19:51.079 ERRO Failure verifying signed contribution and proof, contribution_slot: 56337, subcommittee_index: 1, aggregator_index: 2222, request_index: 2, error: PastSlot { message_slot: Slot(56337), e
arliest_permissible_slot: Slot(56338) }
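
These PastSlot rejections are the standard gossip timeliness check: sync committee messages are only valid for the current slot (with a small clock-disparity allowance), so a node that is even one slot behind its peers produces a wall of them. A sketch of the check, assuming this devnet's 5 s slot time (consistent with the block timestamps later in this comment) and a mainnet-style 500 ms gossip clock disparity:

// Illustrative "PastSlot" gossip check: reject messages whose slot is older than the
// earliest permissible slot, tolerating a small clock disparity in the peer's favour.
const SECONDS_PER_SLOT = 5;         // gc-merge-devnet-1 runs 5 s slots
const MAX_CLOCK_DISPARITY_MS = 500; // assumed, mainnet-style value

function earliestPermissibleSlot(genesisTimeSec: number, nowMs = Date.now()): number {
  // Compute the slot as if our clock were MAX_CLOCK_DISPARITY_MS fast, i.e. be lenient.
  return Math.floor((nowMs - MAX_CLOCK_DISPARITY_MS - genesisTimeSec * 1000) / (SECONDS_PER_SLOT * 1000));
}

function verifySyncMessageSlot(messageSlot: number, genesisTimeSec: number): void {
  const earliest = earliestPermissibleSlot(genesisTimeSec);
  if (messageSlot < earliest) {
    throw new Error(`PastSlot { message_slot: ${messageSlot}, earliest_permissible_slot: ${earliest} }`);
  }
}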

Some notable logs I found. For temporal reference, lighthouse-nethermind-3 stalls at Jul-04 15:00:00.

Screenshot from 2022-07-07 16-45-07

Jul 04 14:47:25.918 DEBG Execution engine call failed            id: http://127.0.0.1:8560/, error: Reqwest(reqwest::Error { kind: Request, url: Url { scheme: "http", cannot_be_a_base: false, username: "", password: None, host: Some(Ipv4(127.0.0.1)), port: Some(8560), path: "/", query: None, fragment: None }, source: TimedOut }), service: exec
Jul 04 14:47:25.919 ERRO Unable to get transition config         execution_endpoint: 0, error: Api { id: "http://127.0.0.1:8560/", error: Reqwest(reqwest::Error { kind: Request, url: Url { scheme: "http", cannot_be_a_base: false, username: "", password: None, host: Some(Ipv4(127.0.0.1)), port: Some(8560), path: "/", query: None, fragment: None }, source: TimedOut }) }, service: exec
Jul 04 14:47:25.919 ERRO Failed to check transition config       error: EngineErrors([Api { id: "http://127.0.0.1:8560/", error: Reqwest(reqwest::Error { kind: Request, url: Url { scheme: "http", cannot_be_a_base: false, username: "", password: None, host: Some(Ipv4(127.0.0.1)), port: Some(8560), path: "/", query: None, fragment: None }, source: TimedOut }) }]), service: exec
Jul 04 14:48:26.421 DEBG Execution engine call failed            id: http://127.0.0.1:8560/, error: Reqwest(reqwest::Error { kind: Request, url: Url { scheme: "http", cannot_be_a_base: false, username: "", password: None, host: Some(Ipv4(127.0.0.1)), port: Some(8560), path: "/", query: None, fragment: None }, source: TimedOut }), service: exec
Jul 04 14:48:26.421 ERRO Unable to get transition config         execution_endpoint: 0, error: Api { id: "http://127.0.0.1:8560/", error: Reqwest(reqwest::Error { kind: Request, url: Url { scheme: "http", cannot_be_a_base: false, username: "", password: None, host: Some(Ipv4(127.0.0.1)), port: Some(8560), path: "/", query: None, fragment: None }, source: TimedOut }) }, service: exec
Jul 04 14:48:26.421 ERRO Failed to check transition config       error: EngineErrors([Api { id: "http://127.0.0.1:8560/", error: Reqwest(reqwest::Error { kind: Request, url: Url { scheme: "http", cannot_be_a_base: false, username: "", password: None, host: Some(Ipv4(127.0.0.1)), port: Some(8560), path: "/", query: None, fragment: None }, source: TimedOut }) }]), service: exec
Jul 04 14:58:30.834 WARN Execution engine offline                id: http://127.0.0.1:8560/, error: Reqwest(reqwest::Error { kind: Request, url: Url { scheme: "http", cannot_be_a_base: false, username: "", password: None, host: Some(Ipv4(127.0.0.1)), port: Some(8560), path: "/", query: None, fragment: None }, source: TimedOut }), service: exec
Jul 04 14:58:30.835 WARN Error whilst processing payload status  error: Offline { id: "http://127.0.0.1:8560/" }, service: exec
Jul 04 14:58:30.836 WARN Peer sent invalid block in single block lookup, peer_id: 16Uiu2HAm9yXeivvqHTxXdChDeh7XJgsRGzisRkSTZEZVYvmLfXF4, error: ExecutionPayloadError(RequestFailed(EngineErrors([Offline { id: "http://127.0.0.1:8560/" }]))), root: 0xe28d…a9fe, service: sync
...
Jul 04 15:04:37.892 ERRO Failure verifying attestation for gossip, attestation_slot: 71275, committee_index: 2, request_index: 0, error: InvalidSignature
Jul 04 15:04:37.895 ERRO Failure verifying attestation for gossip, attestation_slot: 71275, committee_index: 1, request_index: 1, error: InvalidSignature

At this point it's just a wall of errors as the EL appears offline. @michaelsproul Lighthouse seems to be downscoring peers as a result of the inability to validate blocks with the EL?

Jul 04 15:08:34.345 ERRO Proposer prepare routine failed         error: ExecutionForkChoiceUpdateFailed(EngineErrors([Api { id: "http://127.0.0.1:8560/", error: Reqwest(reqwest::Error { kind: Request, url: Url { scheme: "http", cannot_be_a_base: false, username: "", password: None, host: Some(Ipv4(127.0.0.1)), port: Some(8560), path: "/", query: None, fragment: None }, source: TimedOut }) }])), service: beacon
Jul 04 15:08:34.614 WARN Error whilst processing payload status  error: Offline { id: "http://127.0.0.1:8560/" }, service: exec
Jul 04 15:08:34.616 DEBG Batch processing failed                 service: sync, error: Peer sent invalid block. Reason: ExecutionPayloadError(RequestFailed(EngineErrors([Offline { id: "http://127.0.0.1:8560/" }]))), imported_blocks: 0, last_block_slot: 71312, chain: 7106048644422852136, first_block_slot: 71283, batch_epoch: 4455

As a control test I'm looking at all debug logs from Jul 04 14:58:00, since at Jul 04 14:58:30 the EL was reported offline.

The node seems to be processing gossip blocks fine. That means the CL-EL communication is okay, right? There is no explicit log about the EL responding OK.

Jul 04 14:58:00.217 DEBG Successfully processed gossip block     root: 0xd89b862273630a2240c28ef5a6f1d58effe1f9b7b8772cdf60abfa89fdcb2cf1, slot: 71196, graffiti: prysm-nethermind, service: beacon
Jul 04 14:58:00.217 INFO New block received                      root: 0xd89b862273630a2240c28ef5a6f1d58effe1f9b7b8772cdf60abfa89fdcb2cf1, slot: 71196
Jul 04 14:58:05.036 DEBG Successfully processed gossip block     root: 0x1731df27369a49784b7750ba3a709166ccf19190be37132c6ad2aa436d5608d7, slot: 71197, graffiti: teku-nethermind, service: beacon
Jul 04 14:58:05.036 INFO New block received                      root: 0x1731df27369a49784b7750ba3a709166ccf19190be37132c6ad2aa436d5608d7, slot: 71197
Jul 04 14:58:10.056 DEBG Successfully processed gossip block     root: 0x159f7958146cf4a7660bcfbb512bcc6af6a0afdbcea7a9ffad6849feb7f60d06, slot: 71198, graffiti: lighthouse-nethermind, service: beacon
Jul 04 14:58:10.057 INFO New block received                      root: 0x159f7958146cf4a7660bcfbb512bcc6af6a0afdbcea7a9ffad6849feb7f60d06, slot: 71198
Jul 04 14:58:15.049 DEBG Successfully processed gossip block     root: 0xfb479a84ef1e28ab411909c316374bfea018b7f64d5880a21f704031bcd86c67, slot: 71199, graffiti: teku-nethermind, service: beacon
Jul 04 14:58:15.049 INFO New block received                      root: 0xfb479a84ef1e28ab411909c316374bfea018b7f64d5880a21f704031bcd86c67, slot: 71199
Jul 04 14:58:20.272 DEBG Successfully processed gossip block     root: 0xa5178bec903c17e7d07a0eb5c8b247834758d168104c79219dcfe029062ff278, slot: 71200, graffiti: prysm-nethermind, service: beacon
Jul 04 14:58:20.273 INFO New block received                      root: 0xa5178bec903c17e7d07a0eb5c8b247834758d168104c79219dcfe029062ff278, slot: 71200
Jul 04 14:58:25.067 INFO New block received                      root: 0x44abe5fd2c745e0ee5105b94653476c61ad02271a1f05102a00fb5e4e0154766, slot: 71201
Jul 04 14:58:25.164 DEBG Searching for block                     block: 0x44ab…4766, peer_id: 16Uiu2HAmHDwrrDAESfotLpR7rQuUVgFju6bJEpPSE2TSmsJUYFiR, service: sync
Jul 04 14:58:28.100 DEBG Execution engine call failed            id: http://127.0.0.1:8560/, error: Reqwest(reqwest::Error { kind: Request, url: Url { scheme: "http", cannot_be_a_base: false, username: "", password: None, host: Some(Ipv4(127.0.0.1)), port: Some(8560), path: "/", query: None, fragment: None }, source: TimedOut }), service: exec
Jul 04 14:58:28.100 ERRO Unable to get transition config         execution_endpoint: 0, error: Api { id: "http://127.0.0.1:8560/", error: Reqwest(reqwest::Error { kind: Request, url: Url { scheme: "http", cannot_be_a_base: false, username: "", password: None, host: Some(Ipv4(127.0.0.1)), port: Some(8560), path: "/", query: None, fragment: None }, source: TimedOut }) }, service: exec
Jul 04 14:58:28.101 ERRO Failed to check transition config       error: EngineErrors([Api { id: "http://127.0.0.1:8560/", error: Reqwest(reqwest::Error { kind: Request, url: Url { scheme: "http", cannot_be_a_base: false, username: "", password: None, host: Some(Ipv4(127.0.0.1)), port: Some(8560), path: "/", query: None, fragment: None }, source: TimedOut }) }]), service: exec
Jul 04 14:58:30.230 DEBG Successfully processed gossip block     root: 0xe28d7dbcd2acf7ea37e635d79243f86de8eb12d0bc6b5ec7a95e7fc76409a9fe, slot: 71202, graffiti: prysm-nethermind, service: beacon
Jul 04 14:58:30.230 INFO New block received                      root: 0xe28d7dbcd2acf7ea37e635d79243f86de8eb12d0bc6b5ec7a95e7fc76409a9fe, slot: 71202
Jul 04 14:58:30.445 DEBG Searching for block                     block: 0xe28d…a9fe, peer_id: 16Uiu2HAm9yXeivvqHTxXdChDeh7XJgsRGzisRkSTZEZVYvmLfXF4, service: sync
Jul 04 14:58:30.458 DEBG Searching for block                     block: 0xe28d…a9fe, peer_id: 16Uiu2HAm9yXeivvqHTxXdChDeh7XJgsRGzisRkSTZEZVYvmLfXF4, service: sync
Jul 04 14:58:30.485 DEBG Searching for block                     block: 0xe28d…a9fe, peer_id: 16Uiu2HAm9yXeivvqHTxXdChDeh7XJgsRGzisRkSTZEZVYvmLfXF4, service: sync
Jul 04 14:58:30.492 DEBG Searching for block                     block: 0xe28d…a9fe, peer_id: 16Uiu2HAm9yXeivvqHTxXdChDeh7XJgsRGzisRkSTZEZVYvmLfXF4, service: sync
Jul 04 14:58:30.834 WARN Execution engine offline                id: http://127.0.0.1:8560/, error: Reqwest(reqwest::Error { kind: Request, url: Url { scheme: "http", cannot_be_a_base: false, username: "", password: None, host: Some(Ipv4(127.0.0.1)), port: Some(8560), path: "/", query: None, fragment: None }, source: TimedOut }), service: exec
Jul 04 14:58:30.835 CRIT No synced execution engines             service: exec
Jul 04 14:58:30.835 WARN Error whilst processing payload status  error: Offline { id: "http://127.0.0.1:8560/" }, service: exec
Jul 04 14:58:30.836 WARN Peer sent invalid block in single block lookup, peer_id: 16Uiu2HAm9yXeivvqHTxXdChDeh7XJgsRGzisRkSTZEZVYvmLfXF4, error: ExecutionPayloadError(RequestFailed(EngineErrors([Offline { id: "http://127.0.0.1:8560/" }]))), root: 0xe28d…a9fe, service: sync
Jul 04 14:58:30.836 DEBG Sync reporting peer                     action: Mid Tolerance Error, peer_id: 16Uiu2HAm9yXeivvqHTxXdChDeh7XJgsRGzisRkSTZEZVYvmLfXF4, service: sync
Jul 04 14:58:30.836 DEBG Peer score adjusted                     score: -10.38, peer_id: 16Uiu2HAm9yXeivvqHTxXdChDeh7XJgsRGzisRkSTZEZVYvmLfXF4, msg: single_block_failure, service: libp2p

@michaelsproul why does Lighthouse go looking for a block (Searching for block: 0xe28d…a9fe) right after receiving it and making it head (New block received root: 0xe28d7dbcd2acf..)?

At the next slot, it rejects the block at 71203 because of a missing parent, but it can request its parent and process it this time.

Jul 04 14:58:35.187 DEBG Rejected gossip block                   slot: 71203, graffiti: prysm-nethermind, error: ParentUnknown(parent_root:0xe28d…a9fe), service: beacon
Jul 04 14:58:35.187 DEBG Unknown parent for gossip block         root: 0xdc493e6003617eaf00c5b36edc7b6041a90b992a01fdec331b5f3f5d088990d7
Jul 04 14:58:35.188 DEBG Requesting parent                       chain_hash: 0xdc49…90d7, request: parent, hash: 0xe28d…a9fe, downloading_peer: 16Uiu2HAmCpP9i47Nb7Z51f6KgUPtAA2RFerdEzxoJEHdRpR2HSNA, downloaded_blocks: 1, service: sync
Jul 04 14:58:35.486 DEBG Searching for block                     block: 0xdc49…90d7, peer_id: 16Uiu2HAmAA7XEykFxAXHzVLpMNuS192nDQ2mRnN2Hs9VCAyQuFEd, service: sync
Jul 04 14:58:35.534 DEBG Processing parent lookup                blocks: 1, chain_hash: 0xdc49…90d7
Jul 04 14:58:35.534 DEBG Parent lookup failed                    error: Block has an unknown parent: 0xe28d…a9fe
Jul 04 14:58:35.534 DEBG Parent chain processed                  result: Failed { imported_blocks: false, peer_action: Some(LowToleranceError) }, chain_hash: 0xdc49…90d7, service: sync
Jul 04 14:58:35.536 DEBG Sync reporting peer                     action: Low Tolerance Error, peer_id: 16Uiu2HAmCpP9i47Nb7Z51f6KgUPtAA2RFerdEzxoJEHdRpR2HSNA, service: sync
Jul 04 14:58:35.547 DEBG Peer score adjusted                     score: -9.94, peer_id: 16Uiu2HAmCpP9i47Nb7Z51f6KgUPtAA2RFerdEzxoJEHdRpR2HSNA, msg: parent_chain_failure, service: libp2p
Jul 04 14:58:35.563 DEBG Block is from a past failed chain. Dropping, block_slot: 71203, block_root: 0xdc493e6003617eaf00c5b36edc7b6041a90b992a01fdec331b5f3f5d088990d7, service: sync
Jul 04 14:58:35.565 DEBG Searching for block                     block: 0xdc49…90d7, peer_id: 16Uiu2HAmAA7XEykFxAXHzVLpMNuS192nDQ2mRnN2Hs9VCAyQuFEd, service: sync
Jul 04 14:58:35.578 DEBG Block is from a past failed chain. Dropping, block_slot: 71203, block_root: 0xdc493e6003617eaf00c5b36edc7b6041a90b992a01fdec331b5f3f5d088990d7, service: sync
Jul 04 14:58:35.587 DEBG Searching for block                     block: 0xdc49…90d7, peer_id: 16Uiu2HAmAA7XEykFxAXHzVLpMNuS192nDQ2mRnN2Hs9VCAyQuFEd, service: sync
Jul 04 14:58:35.606 DEBG Block is from a past failed chain. Dropping, block_slot: 71203, block_root: 0xdc493e6003617eaf00c5b36edc7b6041a90b992a01fdec331b5f3f5d088990d7, service: sync
Jul 04 14:58:35.608 DEBG Searching for block                     block: 0xdc49…90d7, peer_id: 16Uiu2HAmJiEJCx6vASFZqSC9ixQPXz4bnyyrDsvwvxiUipfQnbix, service: sync
Jul 04 14:58:35.617 DEBG Block is from a past failed chain. Dropping, block_slot: 71203, block_root: 0xdc493e6003617eaf00c5b36edc7b6041a90b992a01fdec331b5f3f5d088990d7, service: sync
Jul 04 14:58:35.904 INFO New RPC block received                  hash: 0xe28d…a9fe, slot: 71202
Jul 04 14:58:35.921 DEBG Head beacon block                       slot: 71202, root: 0xe28d7dbcd2acf7ea37e635d79243f86de8eb12d0bc6b5ec7a95e7fc76409a9fe, finalized_epoch: 4448, finalized_root: 0xde493a7cb5cc746d9e4a515b3351e464ef4d27b757c54ec0b6b50e61f21e9060, justified_epoch: 4449, justified_root: 0x7cc07984082018de2f1e54141012ddafd6cba4781f9ffd8b168b202bfeb5b0b4, service: beacon
Jul 04 14:58:35.929 DEBG Delayed head block                      set_as_head_delay: Some(24.184237ms), imported_delay: Some(2.469206008s), observed_delay: Some(3.435621541s), block_delay: 5.929011786s, slot: 71202, proposer_index: 7495, block_root: 0xe28d7dbcd2acf7ea37e635d79243f86de8eb12d0bc6b5ec7a95e7fc76409a9fe, service: beacon
....
Jul 04 14:58:37.502 DEBG Slot timer                              sync_state: Synced, current_slot: 71203, head_slot: 71202, head_block: 0xe28d…a9fe, finalized_epoch: 4448, finalized_root: 0xde49…9060, peers: 6, service: slot_notifier
Jul 04 14:58:37.503 INFO Synced                                  slot: 71203, block:    …  empty, epoch: 4450, finalized_epoch: 4448, finalized_root: 0xde49…9060, exec_hash: 0xc7ce…aa3c (verified), peers: 6, service: slot_notifier

Note that the snooper does not acknowledge any request at Jul 04 14:58:30.835 (WARN Error whilst processing payload status error: Offline { id: "http://127.0.0.1:8560/" }, service: exec). Did Lighthouse attempt to contact the EL?

Jul  4 14:58:16.448 2022 REQUEST engine_newPayloadV1
      "blockHash": "0xe8ed8f1a394209515b67804187ce0e4350f3984d6512c5fbc52323b81730299b",
      "blockNumber": "0x10f85",
Jul  4 14:58:21.328 2022 REQUEST engine_newPayloadV1
      "blockHash": "0x0e23074168480a760cd62a3415877ad72690e546e0d0f533c1ba5b212084712c",
      "blockNumber": "0x10f86",
Jul  4 14:58:25.207 2022 REQUEST engine_newPayloadV1
      "blockHash": "0x3ecf05dcdfea088788179e38078235f60993c524a8df73db6e445344bb3bec3a",
      "blockNumber": "0x10f87",
Jul  4 14:58:35.794 2022 REQUEST engine_newPayloadV1
      "blockHash": "0xc7ce98646e8649f92360b0d0adbb59828e4c7bd1da6c7e0f2290e4d3a4d8aa3c",
      "blockNumber": "0x10f88",
Jul  4 14:58:38.073 2022 REQUEST engine_newPayloadV1
      "blockHash": "0xc3f054ad052da97a970d0141f27bc616400d37f613a465e84b51fa070cd5bb24",
      "blockNumber": "0x10f89",

michaelsproul commented

Lighthouse seems to be downscoring peers as a result of the inability to validate blocks with the EL?

This was an issue in v2.3.0 but has since been fixed in this commit which is part of v2.3.2-rc.0: sigp/lighthouse@f428719. I'd suggest running the latest release (candidate) or even just unstable directly for these nets seeing as we're at the cutting edge.

That means the CL-EL communication is okay, right? There is no explicit log about the EL responding OK.

Yeah that's right, although we should maybe add a debug log

why does Lighthouse go looking for a block (Searching for block: 0xe28d…a9fe) right after receiving it and making it head (New block received root: 0xe28d7dbcd2acf..)?

There's a mostly harmless race where we'll go looking for a block on RPC if we haven't seen it on gossip, at around the same time as it arrives on gossip. This is improved by a new PR we haven't merged yet: sigp/lighthouse#3317
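
In other words (an illustrative sketch, not Lighthouse's code), the single-block lookup is keyed only on whether the root has already been imported or is already being looked up, so a gossip arrival and an RPC lookup for the same root can overlap harmlessly:

// Illustrative race between gossip import and RPC single-block lookup for the same root.
const importedBlocks = new Set<string>();
const inFlightLookups = new Set<string>();

function onUnknownBlockRoot(root: string, requestBlockByRoot: (root: string) => void): void {
  if (importedBlocks.has(root) || inFlightLookups.has(root)) return;
  inFlightLookups.add(root);
  requestBlockByRoot(root); // logged as "Searching for block"
}

function onGossipBlockImported(root: string): void {
  importedBlocks.add(root);     // logged as "New block received"
  inFlightLookups.delete(root); // any in-flight RPC lookup for this root is now redundant
}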

Note that the snooper does not acknowledge any request at Jul 04 14:58:30.835 WARN Error whilst processing payload status error: Offline { id: "http://127.0.0.1:8560/" }, service: exec Did Lighthouse attempted to contact the EL?

Lighthouse caches the offline status, and will do an "upcheck" (hitting eth_syncing) before properly sending the request. If the upcheck fails then the actual request won't get sent. This is mostly a hangover from the redundant EE design, and we're working on simplifying it now that we only support a single EE. This is the tracking issue (which was relevant to the last devnet explosion too): sigp/lighthouse#3176
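
Sketching that flow for clarity (illustrative, not Lighthouse's code): the engine's last-known state is cached between calls, and while it is marked offline every new request first issues an eth_syncing upcheck; only a successful upcheck lets the real Engine API call go out, which is why the snooper saw nothing during this window.

// Illustrative cached-offline + upcheck flow (JWT auth omitted for brevity).
let engineState: "online" | "offline" = "online";

async function rpc(method: string, params: unknown[] = []): Promise<unknown> {
  const res = await fetch("http://127.0.0.1:8560/", { // assumed endpoint, as in the logs
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ jsonrpc: "2.0", id: 1, method, params }),
  });
  return ((await res.json()) as { result: unknown }).result;
}

async function engineRequest(method: string, params: unknown[]): Promise<unknown> {
  if (engineState === "offline") {
    try {
      await rpc("eth_syncing"); // upcheck probes the EL before the real call
      engineState = "online";
    } catch {
      // Upcheck failed: the real request is never sent, so the snooper logs nothing.
      throw new Error('Offline { id: "http://127.0.0.1:8560/" }');
    }
  }
  try {
    return await rpc(method, params);
  } catch (e) {
    engineState = "offline"; // cache the failure for subsequent calls
    throw e;
  }
}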

dapplion (Member, Author) commented

Closing stale issue
