Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix NewPayload Validation during header download #10093

Merged
merged 2 commits into from
Jun 5, 2024

Conversation

mininny
Copy link
Contributor

@mininny mininny commented Apr 27, 2024

This PR addresses an issue with the current implementation of the engine_newPayload request handling during the initial sync process.

Problem

The engine_newPayload request triggers validation of newly received payload, and during the initial sync process, Erigon downloads the block headers and blocks since more data is required for validation.

Inside this download logic is validating the chain data after it has finished downloading all the headers. When the node is initially downloading the headers, there is no validated blocks in the Erigon yet as it hasn't run the staged sync yet. Because of this, chain validation is skipped and mark the download process as complete.

This in turn marks the execution as synced and currently returns ValidStatus to the engine_newPayload request even though no validation has actually been done.

I think this behavior deviates from the engine API specification and incorrectly returns VALID even though Erigon doesn't have any synced blocks.

Proposed Change

This PR modifies the handleNewPayload function to additionally validate the chain. If this chain validation is failed(or skipped), it will return SYNCING as a result for the engine_newPayload. When the header download and staged sync is complete, only then will newPayload return VALID after actually validating the chain data.


This is my first time contributing to Erigon, and I'd appreciate any feedback for this kind of change. I'd also love to know if current logic is an effort for performance improvement or if there's other rationale for it.

@mininny mininny marked this pull request as ready for review April 29, 2024 16:22
@yperbasis yperbasis changed the base branch from release/2.60 to main May 1, 2024 07:24
turbo/engineapi/engine_server.go Outdated Show resolved Hide resolved
@mininny mininny force-pushed the feature/fix-engine-newpayload branch from 23c4a14 to 3ba0e19 Compare May 16, 2024 12:57
@AskAlexSharov
Copy link
Collaborator

@Giulio2002 what do you think?

@Giulio2002
Copy link
Collaborator

Validation occurs directly in the EngineDownloader class, this PR does not add new behaviour

@Giulio2002 Giulio2002 closed this May 27, 2024
@mininny
Copy link
Contributor Author

mininny commented May 27, 2024

@Giulio2002 @AskAlexSharov Thanks for looking at this! I agree that the block validation occurs in EngineDownloader so usually this isn't needed.

But when the downloaded block range is too large (ExecutionStatus_TooFarAway), it just skips the block validation and return synced. This is problematic because the newPayload finishes downloading the block headers/bodies and returns VALID to the consensus layer when only the block download has been finished. As far as I understand, the execution client should return VALID when the block is executed and validated (which is done in validateChain)

This came up when I was working on https://github.com/testinprod-io/op-erigon which is a fork of erigon to work with the optimism protocol. Returning VALID incorrectly informs the consensus layer of optimism of finalized block in execution client, so having this merged will make execution layer return SYNCING when the staged sync is yet to run.

Another solution for this is to always return SYNCING after downloading block headers since the next invocation of newPayload will insert the block and return VALID

@Giulio2002 Giulio2002 reopened this May 28, 2024
@Giulio2002
Copy link
Collaborator

Actually, you are right. apologies

@Giulio2002 Giulio2002 enabled auto-merge (squash) May 28, 2024 23:20
@mininny
Copy link
Contributor Author

mininny commented Jun 4, 2024

Hi @AskAlexSharov @Giulio2002 , do you guys have any more comments on this? Would it be okay to merge this please? 😄

Also, i wonder if it would be possible to get this merged into release/v2.60 as well? I'm unsure if this would qualify as a critical bug, but having this released in the next release v2.60.2 would allow us to have this fixed in our fork as well :)

@Giulio2002
Copy link
Collaborator

Hey, auto-merge was enabled but your CI failed. let me rerun it

@awskii awskii dismissed AskAlexSharov’s stale review June 5, 2024 07:53

prohibits approved merge

@Giulio2002 Giulio2002 merged commit 45f4a63 into ledgerwatch:main Jun 5, 2024
17 of 18 checks passed
@taratorio taratorio added this to the 2.60.2-fixes milestone Jun 6, 2024
yperbasis pushed a commit that referenced this pull request Jun 21, 2024
This PR addresses an issue with the current implementation of the
`engine_newPayload` request handling during the initial sync process.

## Problem
The `engine_newPayload` request triggers validation of newly received
payload, and during the initial sync process, Erigon downloads the block
headers and blocks since more data is required for validation.

Inside this download logic is validating the chain data after it has
finished downloading all the headers. When the node is initially
downloading the headers, there is no validated blocks in the Erigon yet
as it hasn't run the staged sync yet. Because of this, chain validation
is skipped and mark the [download process as
complete](https://github.com/ledgerwatch/erigon/blob/5d92302004b81af3ab95d0af31b0de63a6cb3ac7/turbo/engineapi/engine_block_downloader/core.go#L94-L98).

This in turn marks the execution as synced and [currently
returns](https://github.com/ledgerwatch/erigon/blob/ab0f6336a2cdf69d0a60d19b557085692ea938d5/turbo/engineapi/engine_server.go#L757-L771)
`ValidStatus` to the `engine_newPayload` request even though no
validation has actually been done.

I think this behavior deviates from the engine API specification and
incorrectly returns VALID even though Erigon doesn't have any synced
blocks.

## Proposed Change
This PR modifies the `handleNewPayload` function to additionally
validate the chain. If this chain validation is failed(or skipped), it
will return `SYNCING` as a result for the `engine_newPayload`. When the
header download and staged sync is complete, only then will `newPayload`
return `VALID` after actually validating the chain data.

---

This is my first time contributing to Erigon, and I'd appreciate any
feedback for this kind of change. I'd also love to know if current logic
is an effort for performance improvement or if there's other rationale
for it.
yperbasis added a commit that referenced this pull request Jun 21, 2024
Cherry pick PR #10093 into the release branch

Co-authored-by: Minhyuk Kim <kimminhyuk1004@gmail.com>
blxdyx added a commit to node-real/bsc-erigon that referenced this pull request Jun 27, 2024
* fix Consensus specification tests CI (ledgerwatch#10391) (ledgerwatch#10396)

Cherry-pick:
ledgerwatch@bc5fa6f

Need this to get PR CI green for v2.60.1 patches, e.g. -
ledgerwatch#10390

Co-authored-by: Andrew Ashikhmin <34320705+yperbasis@users.noreply.github.com>

* rpc/handler: do not append null to stream when json may be valid (ledgerwatch#10390)

Cherry-pick:
ledgerwatch@4d1c954
Relates to: ledgerwatch#10376

* Fixed Bor Log appearing on Ethereum Mainnet (ledgerwatch#10405) (ledgerwatch#10420)

Cherry-pick:
ledgerwatch@be889f6

Co-authored-by: Giulio rebuffo <giulio.rebuffo@gmail.com>

* fix gas price not right problem (ledgerwatch#10456)

Cherry pick PR ledgerwatch#10451 into the release branch

Co-authored-by: mars <marshalys@gmail.com>

* eth_estimateGas: default feeCap to base fee (ledgerwatch#10499)

Copy PR ledgerwatch#10495 into the release branch

* Add flag for bor waypoint types (ledgerwatch#10501)

Cherry pick PR ledgerwatch#10281 into the release branch

Co-authored-by: Mark Holt <135143369+mh0lt@users.noreply.github.com>
Co-authored-by: alex.sharov <AskAlexSharov@gmail.com>

* try to fix 'method handler crashed' for debug_traceCall of ledgerwatch#9090 (ledgerwatch#10502)

Cherry pick PR ledgerwatch#10401 into the release branch

Co-authored-by: mars <marshalys@gmail.com>

* diagnostics: cherry pick speedtest disable (ledgerwatch#10509)

Cherry pick PR ledgerwatch#10449 into the release branch

* Enable DNS p2p discovery on holesky (ledgerwatch#10507)

Cherry pick PR ledgerwatch#10460 into the release branch

Co-authored-by: Willian Mitsuda <wmitsuda@gmail.com>

* fix eth_call 'method handler crashed' error when tx has set maxFeePerBlobGas (ledgerwatch#10506)

Cherry pick PR ledgerwatch#10452 into the release branch

Co-authored-by: mars <marshalys@gmail.com>

* e2: remove overlapped files only after merge (ledgerwatch#10487)

Otherwise: if start after `kill -9` in the middle of merge - may remove
small files of 1 type of file, but leave small files of another type of
files (which merge was not finished) - and leave node in un-mergable
state: ledgerwatch#10485

---------

Co-authored-by: awskii <awskii@users.noreply.github.com>

* add flag checking for pruning waypoints (ledgerwatch#10508)

Cherry pick PR ledgerwatch#10468 into the release branch

Co-authored-by: Mark Holt <135143369+mh0lt@users.noreply.github.com>

* p2p/sentry: sentry doesn't start with ErrNoHead (ledgerwatch#10454) (ledgerwatch#10523)

cherry-pick ledgerwatch#10494 to
release/2.60

* add lock to purgeMilestoneIDsList (ledgerwatch#10524)

Cherry pick PR ledgerwatch#10493 into the release branch

Co-authored-by: Mark Holt <135143369+mh0lt@users.noreply.github.com>

* polygon/heimdall: fix checkpoint json marshalling (ledgerwatch#10530)

Fixes a recent regression causing unwinds due to checkpoints having zero
root hash:
```
[WARN] [05-18|23:58:54.662] [bor] Root hash mismatch while whitelisting checkpoint expected=ac1c57270479250af3ce8eee90075cd8b2ba1bac55353105e063d9a4c87c743e got=0000000000000000000000000000000000000000000000000000000000000000
[WARN] [05-18|23:58:54.662] [bor] Rewinding chain due to checkpoint root hash mismatch number=57125727
```

Note this has already been fixed on Erigon 3 branch but as part of a
non-related PR -
https://github.com/ledgerwatch/erigon/pull/10124/files#diff-47d4532f399f2d6a45e6f19944a45c80bac573b4d1b5cb51485d0254229d1b16

* Fix capacity for immediate appends (ledgerwatch#10539)

Cherry pick PR ledgerwatch#10528 into the release branch

Co-authored-by: Shoham Chakraborty <shhmchk@gmail.com>

* core/vm: set tracer-observable value of a delegatecall to match parent value (ledgerwatch#10370)

requested by ledgerwatch#9549

port of ethereum/go-ethereum#26632

* params: version 2.60.1 (ledgerwatch#10555)

* blobGasPrice should be marshalled as hex (ledgerwatch#10571)

Cherry pick PR ledgerwatch#10551 into the release branch

* Caplin: Fixed reforwarding of Bls Execution changes (ledgerwatch#10577)

Cherry pick PR ledgerwatch#10546 into the release branch

Co-authored-by: Giulio rebuffo <giulio.rebuffo@gmail.com>

* Caplin: Proper "Normalization" of length of ForkVersions to 8 hex characters (ledgerwatch#10578)

Cherry pick PR ledgerwatch#10512 into the release branch

Co-authored-by: Giulio rebuffo <giulio.rebuffo@gmail.com>

* Caplin: Update BlobSidecars Beacon API endpoint to the latest specs (ledgerwatch#10580)

Cherry pick PR ledgerwatch#10576 into the release branch

Co-authored-by: Giulio rebuffo <giulio.rebuffo@gmail.com>

* bor blocks retire: infinity loop fix (ledgerwatch#10596)

Problem: `+1` was added to maxBlockNum instead of minBlockNum
for: ledgerwatch#10554

* txpool: EIP-3860 should only apply to create transactions (ledgerwatch#10609)

This fixes Issue ledgerwatch#10607

* qa-tests: update 2.60.x test workflows from main (ledgerwatch#10627)

* Fix potential p2p shutdown hangup (ledgerwatch#10626)

This is a fix for: 

ledgerwatch#10192

This fixes is a deadlock in v4_udp.go where 
* Thread A waits on mutex.Lock() in resetTimeout() called after reading
listUpdate channel.
* Thread B waits on listUpdate <- plist.PushBack(p) called after locking
mutex.Lock()
  
This fix decouples the list operations which need locking from the
channel operations which don't by storing the changes in local
variables. These updates are used for resetting a timeout - which is not
order dependent.

* downloader: Number of DNS requests seem excessive (ledgerwatch#5145) (ledgerwatch#10739)

cherry-pick ledgerwatch#10693 to release

* rpc: Fix incorrect txfeecap (ledgerwatch#10643)

Cherry pick PR ledgerwatch#10636 to
Erigon 2

* downloader: don't block erigon startup if devs deploy new hashes (of same files) (ledgerwatch#10761)

* skip hidden files when list files with given extension  (ledgerwatch#10654)

for ledgerwatch#10644

* qa-tests: backport to release/2.60 improvements made to e3 github action workflows (ledgerwatch#10778)

This PR backports improvements that we added to the E3 tests: recording
runner name and db version used for testing on MongoDB database.

* e2: more snaps (all networks) (ledgerwatch#10794)

* e2: configurable hashers amount (ledgerwatch#10785)

* Revert "e2: configurable hashers amount" (ledgerwatch#10834)

* diagnostics: move E3 changes to E2 (ledgerwatch#10806)

Merged all the work done from main branch to keep diagnostics up to
date.

* Downloader: fix staticpeers flag (ledgerwatch#10798)

Cherry pick ledgerwatch#10792

* Fix NewPayload Validation during header download (ledgerwatch#10837)

Cherry pick PR ledgerwatch#10093 into the release branch

Co-authored-by: Minhyuk Kim <kimminhyuk1004@gmail.com>

* e2: mainnet blob 9.3M (ledgerwatch#10842)

* Fix gas fee calculation for debug calls (ledgerwatch#10880)

Cherry pick PR ledgerwatch#10825 into the release branch

Co-authored-by: Minhyuk Kim <kimminhyuk1004@gmail.com>

* Revert "eth_estimateGas: default feeCap to base fee (ledgerwatch#10499)" (ledgerwatch#10904)

This reverts PR ledgerwatch#10499. See
ledgerwatch#10495 (comment)
and PR ledgerwatch#10901

* params: version 2.60.2 (ledgerwatch#10905)

* upstream v2.60.2

* fix ci lint

---------

Co-authored-by: milen <94537774+taratorio@users.noreply.github.com>
Co-authored-by: Andrew Ashikhmin <34320705+yperbasis@users.noreply.github.com>
Co-authored-by: Giulio rebuffo <giulio.rebuffo@gmail.com>
Co-authored-by: mars <marshalys@gmail.com>
Co-authored-by: Mark Holt <135143369+mh0lt@users.noreply.github.com>
Co-authored-by: alex.sharov <AskAlexSharov@gmail.com>
Co-authored-by: Dmytro <vovk.dimon@gmail.com>
Co-authored-by: Willian Mitsuda <wmitsuda@gmail.com>
Co-authored-by: awskii <awskii@users.noreply.github.com>
Co-authored-by: battlmonstr <battlmonstr@users.noreply.github.com>
Co-authored-by: Shoham Chakraborty <shhmchk@gmail.com>
Co-authored-by: Michelangelo Riccobene <michelangelo.riccobene@gmail.com>
Co-authored-by: Minhyuk Kim <kimminhyuk1004@gmail.com>
MakarovSg pushed a commit to chainstack/bsc-erigon that referenced this pull request Jul 2, 2024
* fix Consensus specification tests CI (ledgerwatch#10391) (ledgerwatch#10396)

Cherry-pick:
ledgerwatch@bc5fa6f

Need this to get PR CI green for v2.60.1 patches, e.g. -
ledgerwatch#10390

Co-authored-by: Andrew Ashikhmin <34320705+yperbasis@users.noreply.github.com>

* rpc/handler: do not append null to stream when json may be valid (ledgerwatch#10390)

Cherry-pick:
ledgerwatch@4d1c954
Relates to: ledgerwatch#10376

* Fixed Bor Log appearing on Ethereum Mainnet (ledgerwatch#10405) (ledgerwatch#10420)

Cherry-pick:
ledgerwatch@be889f6

Co-authored-by: Giulio rebuffo <giulio.rebuffo@gmail.com>

* fix gas price not right problem (ledgerwatch#10456)

Cherry pick PR ledgerwatch#10451 into the release branch

Co-authored-by: mars <marshalys@gmail.com>

* eth_estimateGas: default feeCap to base fee (ledgerwatch#10499)

Copy PR ledgerwatch#10495 into the release branch

* Add flag for bor waypoint types (ledgerwatch#10501)

Cherry pick PR ledgerwatch#10281 into the release branch

Co-authored-by: Mark Holt <135143369+mh0lt@users.noreply.github.com>
Co-authored-by: alex.sharov <AskAlexSharov@gmail.com>

* try to fix 'method handler crashed' for debug_traceCall of ledgerwatch#9090 (ledgerwatch#10502)

Cherry pick PR ledgerwatch#10401 into the release branch

Co-authored-by: mars <marshalys@gmail.com>

* diagnostics: cherry pick speedtest disable (ledgerwatch#10509)

Cherry pick PR ledgerwatch#10449 into the release branch

* Enable DNS p2p discovery on holesky (ledgerwatch#10507)

Cherry pick PR ledgerwatch#10460 into the release branch

Co-authored-by: Willian Mitsuda <wmitsuda@gmail.com>

* fix eth_call 'method handler crashed' error when tx has set maxFeePerBlobGas (ledgerwatch#10506)

Cherry pick PR ledgerwatch#10452 into the release branch

Co-authored-by: mars <marshalys@gmail.com>

* e2: remove overlapped files only after merge (ledgerwatch#10487)

Otherwise: if start after `kill -9` in the middle of merge - may remove
small files of 1 type of file, but leave small files of another type of
files (which merge was not finished) - and leave node in un-mergable
state: ledgerwatch#10485

---------

Co-authored-by: awskii <awskii@users.noreply.github.com>

* add flag checking for pruning waypoints (ledgerwatch#10508)

Cherry pick PR ledgerwatch#10468 into the release branch

Co-authored-by: Mark Holt <135143369+mh0lt@users.noreply.github.com>

* p2p/sentry: sentry doesn't start with ErrNoHead (ledgerwatch#10454) (ledgerwatch#10523)

cherry-pick ledgerwatch#10494 to
release/2.60

* add lock to purgeMilestoneIDsList (ledgerwatch#10524)

Cherry pick PR ledgerwatch#10493 into the release branch

Co-authored-by: Mark Holt <135143369+mh0lt@users.noreply.github.com>

* polygon/heimdall: fix checkpoint json marshalling (ledgerwatch#10530)

Fixes a recent regression causing unwinds due to checkpoints having zero
root hash:
```
[WARN] [05-18|23:58:54.662] [bor] Root hash mismatch while whitelisting checkpoint expected=ac1c57270479250af3ce8eee90075cd8b2ba1bac55353105e063d9a4c87c743e got=0000000000000000000000000000000000000000000000000000000000000000
[WARN] [05-18|23:58:54.662] [bor] Rewinding chain due to checkpoint root hash mismatch number=57125727
```

Note this has already been fixed on Erigon 3 branch but as part of a
non-related PR -
https://github.com/ledgerwatch/erigon/pull/10124/files#diff-47d4532f399f2d6a45e6f19944a45c80bac573b4d1b5cb51485d0254229d1b16

* Fix capacity for immediate appends (ledgerwatch#10539)

Cherry pick PR ledgerwatch#10528 into the release branch

Co-authored-by: Shoham Chakraborty <shhmchk@gmail.com>

* core/vm: set tracer-observable value of a delegatecall to match parent value (ledgerwatch#10370)

requested by ledgerwatch#9549

port of ethereum/go-ethereum#26632

* params: version 2.60.1 (ledgerwatch#10555)

* blobGasPrice should be marshalled as hex (ledgerwatch#10571)

Cherry pick PR ledgerwatch#10551 into the release branch

* Caplin: Fixed reforwarding of Bls Execution changes (ledgerwatch#10577)

Cherry pick PR ledgerwatch#10546 into the release branch

Co-authored-by: Giulio rebuffo <giulio.rebuffo@gmail.com>

* Caplin: Proper "Normalization" of length of ForkVersions to 8 hex characters (ledgerwatch#10578)

Cherry pick PR ledgerwatch#10512 into the release branch

Co-authored-by: Giulio rebuffo <giulio.rebuffo@gmail.com>

* Caplin: Update BlobSidecars Beacon API endpoint to the latest specs (ledgerwatch#10580)

Cherry pick PR ledgerwatch#10576 into the release branch

Co-authored-by: Giulio rebuffo <giulio.rebuffo@gmail.com>

* bor blocks retire: infinity loop fix (ledgerwatch#10596)

Problem: `+1` was added to maxBlockNum instead of minBlockNum
for: ledgerwatch#10554

* txpool: EIP-3860 should only apply to create transactions (ledgerwatch#10609)

This fixes Issue ledgerwatch#10607

* qa-tests: update 2.60.x test workflows from main (ledgerwatch#10627)

* Fix potential p2p shutdown hangup (ledgerwatch#10626)

This is a fix for: 

ledgerwatch#10192

This fixes is a deadlock in v4_udp.go where 
* Thread A waits on mutex.Lock() in resetTimeout() called after reading
listUpdate channel.
* Thread B waits on listUpdate <- plist.PushBack(p) called after locking
mutex.Lock()
  
This fix decouples the list operations which need locking from the
channel operations which don't by storing the changes in local
variables. These updates are used for resetting a timeout - which is not
order dependent.

* downloader: Number of DNS requests seem excessive (ledgerwatch#5145) (ledgerwatch#10739)

cherry-pick ledgerwatch#10693 to release

* rpc: Fix incorrect txfeecap (ledgerwatch#10643)

Cherry pick PR ledgerwatch#10636 to
Erigon 2

* downloader: don't block erigon startup if devs deploy new hashes (of same files) (ledgerwatch#10761)

* skip hidden files when list files with given extension  (ledgerwatch#10654)

for ledgerwatch#10644

* qa-tests: backport to release/2.60 improvements made to e3 github action workflows (ledgerwatch#10778)

This PR backports improvements that we added to the E3 tests: recording
runner name and db version used for testing on MongoDB database.

* e2: more snaps (all networks) (ledgerwatch#10794)

* e2: configurable hashers amount (ledgerwatch#10785)

* Revert "e2: configurable hashers amount" (ledgerwatch#10834)

* diagnostics: move E3 changes to E2 (ledgerwatch#10806)

Merged all the work done from main branch to keep diagnostics up to
date.

* Downloader: fix staticpeers flag (ledgerwatch#10798)

Cherry pick ledgerwatch#10792

* Fix NewPayload Validation during header download (ledgerwatch#10837)

Cherry pick PR ledgerwatch#10093 into the release branch

Co-authored-by: Minhyuk Kim <kimminhyuk1004@gmail.com>

* e2: mainnet blob 9.3M (ledgerwatch#10842)

* Fix gas fee calculation for debug calls (ledgerwatch#10880)

Cherry pick PR ledgerwatch#10825 into the release branch

Co-authored-by: Minhyuk Kim <kimminhyuk1004@gmail.com>

* Revert "eth_estimateGas: default feeCap to base fee (ledgerwatch#10499)" (ledgerwatch#10904)

This reverts PR ledgerwatch#10499. See
ledgerwatch#10495 (comment)
and PR ledgerwatch#10901

* params: version 2.60.2 (ledgerwatch#10905)

* upstream v2.60.2

* fix ci lint

---------

Co-authored-by: milen <94537774+taratorio@users.noreply.github.com>
Co-authored-by: Andrew Ashikhmin <34320705+yperbasis@users.noreply.github.com>
Co-authored-by: Giulio rebuffo <giulio.rebuffo@gmail.com>
Co-authored-by: mars <marshalys@gmail.com>
Co-authored-by: Mark Holt <135143369+mh0lt@users.noreply.github.com>
Co-authored-by: alex.sharov <AskAlexSharov@gmail.com>
Co-authored-by: Dmytro <vovk.dimon@gmail.com>
Co-authored-by: Willian Mitsuda <wmitsuda@gmail.com>
Co-authored-by: awskii <awskii@users.noreply.github.com>
Co-authored-by: battlmonstr <battlmonstr@users.noreply.github.com>
Co-authored-by: Shoham Chakraborty <shhmchk@gmail.com>
Co-authored-by: Michelangelo Riccobene <michelangelo.riccobene@gmail.com>
Co-authored-by: Minhyuk Kim <kimminhyuk1004@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants