engineapi: defer V3 payloadAttributes checks until head is VALID#20891

Merged
AskAlexSharov merged 6 commits into main from yperbasis/fcu-defer-v3-attrs
Apr 29, 2026

@yperbasis
Member

@yperbasis yperbasis commented Apr 28, 2026

Summary

  • Hive engine-cancun / Invalid PayloadAttributes, Missing BeaconRoot, Syncing=True (Cancun) (test 376 of this run) was failing because Erigon validated parentBeaconBlockRoot before the SYNCING short-circuit and returned -38003 when the head was unknown.
  • Split validatePayloadAttributes into request-level (pre-FCU) and point-(8)-extension (post-FCU) halves so the spec-mandated ordering is preserved on both the SYNCING and VALID paths.
  • The test that motivated engineapi: validate payloadAttributes before SYNCING short-circuit #20728 (hive engine-withdrawals / Empty Withdrawals (Paris)) still passes: its check (wrong withdrawals presence on V2) is in the pre-FCU group.

Why

cancun.md says fcuV3 extends point (8) of engine_forkchoiceUpdatedV1:

  1. Extend point (8) of the engine_forkchoiceUpdatedV1 specification by defining the following sequence of checks that MUST be run over payloadAttributes:
    1. payloadAttributes matches the PayloadAttributesV3 structure, return -38003: Invalid payload attributes on failure.
    2. payloadAttributes.timestamp falls within the time frame of the Cancun fork, return -38005: Unsupported fork on failure.
    3. payloadAttributes.timestamp is greater than timestamp of a block referenced by forkchoiceState.headBlockHash, return -38003: Invalid payload attributes on failure.

…and point (8) of V1 (paris.md) gates the entire processing flow on the head being VALID:

  1. Client software MUST process provided payloadAttributes after successfully applying the forkchoiceState and only if the payload referenced by forkchoiceState.headBlockHash is VALID.
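As a rough illustration of the three cancun.md checks quoted above, here is a minimal Go sketch that runs them in spec order and returns the first failing error code. The names attrsV3 and checkV3 are hypothetical and are not Erigon's. The structure check is reduced to "parentBeaconBlockRoot is present", and the Cancun window is simplified to timestamp >= cancunTime, ignoring later forks.

```go
package main

import "fmt"

// Engine API error codes quoted in the spec excerpt.
const (
	codeInvalidPayloadAttributes = -38003
	codeUnsupportedFork          = -38005
)

// attrsV3 is a hypothetical, trimmed-down stand-in for PayloadAttributesV3.
type attrsV3 struct {
	Timestamp             uint64
	ParentBeaconBlockRoot *[32]byte // nil models a missing field
}

// checkV3 runs the three checks in order and returns the first failing
// error code, or 0 when all of them pass.
func checkV3(a attrsV3, headTimestamp, cancunTime uint64) int {
	// (1) Structure: simplified here to "the beacon root must be present".
	if a.ParentBeaconBlockRoot == nil {
		return codeInvalidPayloadAttributes
	}
	// (2) Fork window: simplified assumption, Cancun is the latest fork.
	if a.Timestamp < cancunTime {
		return codeUnsupportedFork
	}
	// (3) Timestamp must be strictly greater than the head block's.
	if a.Timestamp <= headTimestamp {
		return codeInvalidPayloadAttributes
	}
	return 0
}

func main() {
	root := [32]byte{}
	fmt.Println(checkV3(attrsV3{Timestamp: 120, ParentBeaconBlockRoot: &root}, 110, 100)) // 0
	fmt.Println(checkV3(attrsV3{Timestamp: 120}, 110, 100))                               // -38003
}
```

Note that check (3) needs the head block's timestamp, so this sequence can only run when the head is known.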

So with an unknown head, point (8) doesn't fire and the response is the SYNCING one in point (9). The Syncing=True family of hive cancun tests pins exactly this.

The pre-FCU "wrong version of structure" rule is different — it lives in the V2 Request section and has no precondition on head state, which is why #20728 correctly moved it before the short-circuit.

What changed

  • validatePayloadAttributesPreFCU keeps the request-level checks: parentBeaconBlockRoot on V1/V2, V1/V2 fcu at a Cancun timestamp, withdrawals presence vs Shanghai. Runs before the SYNCING short-circuit.
  • validatePayloadAttributesPostFCU runs the V3 point-(8) extensions: parentBeaconBlockRoot != null for V3+, Cancun-window for V3+, SlotNumber != null for V4+. Runs only when status.Status == ValidStatus.
  • Two regression tests in testing_api_test.go:
    • TestForkchoiceUpdatedV3DefersAttributesValidationWhenSyncing — pins the hive Syncing=True scenario (no error, status=SYNCING).
    • TestForkchoiceUpdatedV3RejectsMissingBeaconRootWhenValid — pins the spec-mandated -38003 when the head is VALID so the deferral can't regress into a permanent swallow.

Hive engine-cancun "Invalid PayloadAttributes, Missing BeaconRoot,
Syncing=True (Cancun)" was failing because Erigon validated
parentBeaconBlockRoot before the SYNCING short-circuit and returned
-38003 when the head was unknown.

Per cancun.md, fcuV3 "Extends point (8) of engine_forkchoiceUpdatedV1"
with the structure / fork-window / timestamp checks. Point (8) of V1
explicitly fires "only if the payload referenced by
forkchoiceState.headBlockHash is VALID", so when the head is unknown
those checks must NOT run and the response is the SYNCING one defined
in point (9). The "wrong version of structure" rule that motivated
#20728 lives in the V2 Request section instead, and remains
head-state-independent — that path still needs to return -38003 even
while syncing.

Split validatePayloadAttributes into:
- validatePayloadAttributesPreFCU: V1/V2 wrong-version checks
  (parentBeaconBlockRoot on V1/V2, V1/V2 fcu at Cancun timestamp,
  withdrawals presence vs Shanghai). Runs before the SYNCING
  short-circuit.
- validatePayloadAttributesPostFCU: V3 "Extend point (8)" checks
  (parentBeaconBlockRoot != null, Cancun-window, V4 SlotNumber).
  Runs only when status.Status == ValidStatus.

Two regression tests added:
- TestForkchoiceUpdatedV3DefersAttributesValidationWhenSyncing pins the
  hive engine-cancun "Syncing=True" scenario.
- TestForkchoiceUpdatedV3RejectsMissingBeaconRootWhenValid pins the
  spec-mandated -38003 when the head is VALID, so a future change
  cannot regress it into a permanent SYNCING swallow.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@yperbasis yperbasis requested a review from mh0lt as a code owner April 28, 2026 19:06
The two STEEL/Geth secondary-client failures and the "Blob Transaction
Ordering, Multiple Clients" parallelism flake that previously needed an
allowance of 3 are no longer expected. Tighten the gate to 0 so any new
regression in engine-cancun fails CI loudly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor

Copilot AI left a comment

Pull request overview

This PR adjusts Engine API engine_forkchoiceUpdatedV3+ payload attribute validation to match the spec ordering: only run the V3 “extend point (8)” checks (e.g., parentBeaconBlockRoot != null) once the FCU head is VALID, while still enforcing request-level “wrong version of structure” checks before the SYNCING short-circuit.

Changes:

  • Split payload attributes validation into pre-FCU (request-level) vs post-FCU (point-(8)-extension) phases and update forkchoiceUpdated to call them in spec order.
  • Add regression tests covering (a) deferral on SYNCING and (b) rejection on VALID for missing parentBeaconBlockRoot.
  • Tighten Hive CI for the engine/cancun simulator by requiring zero failures.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| execution/engineapi/engine_server.go | Splits validation into pre/post FCU phases and runs post-FCU checks only when head is VALID. |
| execution/engineapi/testing_api_test.go | Adds targeted regression tests for SYNCING deferral vs VALID rejection of missing beacon root in FCU V3. |
| .github/workflows/test-hive.yml | Sets engine/cancun Hive suite max-allowed-failures to 0. |

@yperbasis yperbasis marked this pull request as draft April 28, 2026 19:22
This reverts commit 6cde6d63ae5fa1bb389afa68a5b85ee99f51876e.

The hive run on this branch
(https://github.com/erigontech/erigon/actions/runs/25072568682/job/73456575125)
showed 2 of 226 cancun tests still fail with the same TEST ISSUE
"missing trie node" errors the original comment described:

- Invalid Missing Ancestor Syncing ReOrg, Timestamp, EmptyTxs=False,
  CanonicalReOrg=False, Invalid P8
- Invalid Missing Ancestor Syncing ReOrg, Timestamp, EmptyTxs=False,
  CanonicalReOrg=True, Invalid P8

These come from Geth (the secondary EL Hive uses to produce the
invalid blocks for the test) failing to commit state, not from
Erigon — STEEL hasn't resolved them yet. Restore max-allowed-failures
to 3 and the original comment so the gate matches reality.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@yperbasis yperbasis marked this pull request as ready for review April 28, 2026 19:35
@yperbasis yperbasis enabled auto-merge April 28, 2026 19:35
@yperbasis yperbasis marked this pull request as draft April 28, 2026 19:39
auto-merge was automatically disabled April 28, 2026 19:39

Pull request was converted to draft

yperbasis and others added 2 commits April 28, 2026 21:46
The previous pinned ref (684d4add) lacks ethereum/hive#1395
"simulators/engine: make SetBlock robust for reorg chains", which is
the upstream fix for the two cancun TEST ISSUE failures observed in
the prior run on this branch:

- Invalid Missing Ancestor Syncing ReOrg, ..., CanonicalReOrg=False, Invalid P8
- Invalid Missing Ancestor Syncing ReOrg, ..., CanonicalReOrg=True, Invalid P8

Bumping to current master (02075f47) so the gate tightening to
max-allowed-failures: 0 in the previous commit is testable against a
hive that no longer trips on the SetBlock reorg path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@yperbasis
Member Author

Reapplied the gate tightening and bumped hive_ref from 684d4add to 02075f47 (ethereum/hive master). The bump pulls in ethereum/hive#1395 "simulators/engine: make SetBlock robust for reorg chains", which is the upstream fix for the two Invalid Missing Ancestor Syncing ReOrg, ..., Invalid P8 TEST ISSUE failures from the previous run.

Re-dispatched Test Hive: https://github.com/erigontech/erigon/actions/runs/25074084347

@yperbasis yperbasis marked this pull request as ready for review April 28, 2026 20:05
@AskAlexSharov AskAlexSharov added this pull request to the merge queue Apr 29, 2026
Merged via the queue into main with commit 02e42de Apr 29, 2026
38 checks passed
@AskAlexSharov AskAlexSharov deleted the yperbasis/fcu-defer-v3-attrs branch April 29, 2026 03:57
3 participants