engineapi: defer V3 payloadAttributes checks until head is VALID#20891
engineapi: defer V3 payloadAttributes checks until head is VALID#20891AskAlexSharov merged 6 commits intomainfrom
Conversation
Hive engine-cancun "Invalid PayloadAttributes, Missing BeaconRoot, Syncing=True (Cancun)" was failing because Erigon validated parentBeaconBlockRoot before the SYNCING short-circuit and returned -38003 when the head was unknown. Per cancun.md, fcuV3 "Extends point (8) of engine_forkchoiceUpdatedV1" with the structure / fork-window / timestamp checks. Point (8) of V1 explicitly fires "only if the payload referenced by forkchoiceState.headBlockHash is VALID", so when the head is unknown those checks must NOT run and the response is the SYNCING one defined in point (9). The "wrong version of structure" rule that motivated #20728 lives in the V2 Request section instead, and remains head-state-independent — that path still needs to return -38003 even while syncing. Split validatePayloadAttributes into: - validatePayloadAttributesPreFCU: V1/V2 wrong-version checks (parentBeaconBlockRoot on V1/V2, V1/V2 fcu at Cancun timestamp, withdrawals presence vs Shanghai). Runs before the SYNCING short-circuit. - validatePayloadAttributesPostFCU: V3 "Extend point (8)" checks (parentBeaconBlockRoot != null, Cancun-window, V4 SlotNumber). Runs only when status.Status == ValidStatus. Two regression tests added: - TestForkchoiceUpdatedV3DefersAttributesValidationWhenSyncing pins the hive engine-cancun "Syncing=True" scenario. - TestForkchoiceUpdatedV3RejectsMissingBeaconRootWhenValid pins the spec-mandated -38003 when the head is VALID, so a future change cannot regress it into a permanent SYNCING swallow. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The two STEEL/Geth secondary-client failures and the "Blob Transaction Ordering, Multiple Clients" parallelism flake that previously needed an allowance of 3 are no longer expected. Tighten the gate to 0 so any new regression in engine-cancun fails CI loudly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR adjusts Engine API engine_forkchoiceUpdatedV3+ payload attribute validation to match the spec ordering: only run the V3 “extend point (8)” checks (e.g., parentBeaconBlockRoot != null) once the FCU head is VALID, while still enforcing request-level “wrong version of structure” checks before the SYNCING short-circuit.
Changes:
- Split payload attributes validation into pre-FCU (request-level) vs post-FCU (point-(8)-extension) phases and update
forkchoiceUpdatedto call them in spec order. - Add regression tests covering (a) deferral on SYNCING and (b) rejection on VALID for missing
parentBeaconBlockRoot. - Tighten Hive CI for the
engine/cancunsimulator by requiring zero failures.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
execution/engineapi/engine_server.go |
Splits validation into pre/post FCU phases and runs post-FCU checks only when head is VALID. |
execution/engineapi/testing_api_test.go |
Adds targeted regression tests for SYNCING deferral vs VALID rejection of missing beacon root in FCU V3. |
.github/workflows/test-hive.yml |
Sets engine/cancun Hive suite max-allowed-failures to 0. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
This reverts commit 6cde6d63ae5fa1bb389afa68a5b85ee99f51876e. The hive run on this branch (https://github.com/erigontech/erigon/actions/runs/25072568682/job/73456575125) showed 2 of 226 cancun tests still fail with the same TEST ISSUE "missing trie node" errors the original comment described: - Invalid Missing Ancestor Syncing ReOrg, Timestamp, EmptyTxs=False, CanonicalReOrg=False, Invalid P8 - Invalid Missing Ancestor Syncing ReOrg, Timestamp, EmptyTxs=False, CanonicalReOrg=True, Invalid P8 These come from Geth (the secondary EL Hive uses to produce the invalid blocks for the test) failing to commit state, not from Erigon — STEEL hasn't resolved them yet. Restore max-allowed-failures to 3 and the original comment so the gate matches reality. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This reverts commit b5016cc.
The previous pinned ref (684d4add) lacks ethereum/hive#1395 "simulators/engine: make SetBlock robust for reorg chains", which is the upstream fix for the two cancun TEST ISSUE failures observed in the prior run on this branch: - Invalid Missing Ancestor Syncing ReOrg, ..., CanonicalReOrg=False, Invalid P8 - Invalid Missing Ancestor Syncing ReOrg, ..., CanonicalReOrg=True, Invalid P8 Bumping to current master (02075f47) so the gate tightening to max-allowed-failures: 0 in the previous commit is testable against a hive that no longer trips on the SetBlock reorg path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Reapplied the gate tightening and bumped Re-dispatched Test Hive: https://github.com/erigontech/erigon/actions/runs/25074084347 |
Summary
engine-cancun / Invalid PayloadAttributes, Missing BeaconRoot, Syncing=True (Cancun)(test 376 of this run) was failing because Erigon validatedparentBeaconBlockRootbefore the SYNCING short-circuit and returned-38003when the head was unknown.validatePayloadAttributesinto request-level (pre-FCU) and point-(8)-extension (post-FCU) halves so the spec-mandated ordering is preserved on both the SYNCING and VALID paths.engine-withdrawals / Empty Withdrawals (Paris)) still passes — its check (V2 wrong withdrawals presence) is in the pre-FCU group.Why
cancun.mdsays fcuV3 extends point (8) ofengine_forkchoiceUpdatedV1:…and point (8) of V1 (paris.md) gates the entire processing flow on the head being
VALID:So with an unknown head, point (8) doesn't fire and the response is the SYNCING one in point (9). The
Syncing=Truefamily of hive cancun tests pin exactly this.The pre-FCU "wrong version of structure" rule is different — it lives in the V2 Request section and has no precondition on head state, which is why #20728 correctly moved it before the short-circuit.
What changed
validatePayloadAttributesPreFCUkeeps the request-level checks:parentBeaconBlockRooton V1/V2, V1/V2 fcu at a Cancun timestamp, withdrawals presence vs Shanghai. Runs before the SYNCING short-circuit.validatePayloadAttributesPostFCUruns the V3 point-(8) extensions:parentBeaconBlockRoot != nullfor V3+, Cancun-window for V3+,SlotNumber != nullfor V4+. Runs only whenstatus.Status == ValidStatus.testing_api_test.go:TestForkchoiceUpdatedV3DefersAttributesValidationWhenSyncing— pins the hiveSyncing=Truescenario (no error, status=SYNCING).TestForkchoiceUpdatedV3RejectsMissingBeaconRootWhenValid— pins the spec-mandated-38003when the head is VALID so the deferral can't regress into a permanent swallow.