cmd/utils/app, execution/vm: fix hive legacy-cancun blockchain failures#20892
cmd/utils/app, execution/vm: fix hive legacy-cancun blockchain failures#20892
Conversation
Two fixes for the 29 failures in Hive `legacy-cancun` (https://hive.ethpandaops.io/#/test/generic/1777223434-c411e5be836d6d6f8a466489c2874cf3). ## (a) cmd/utils/app — TD-based fork choice for `erigon import` The Hive consensus simulator writes each block to its own `00NN.rlp` file and runs `erigon import /blocks/00NN.rlp` for each — the simulator does not implement TD/fork-choice itself, it relies on the client's native chain selection. `InsertChain` always called `UpdateForkChoice(importedTip)`, so every imported block became head even when its TD was lower than the current canonical head's. For `lotsOfLeafs` block 6 (TD `0x9CCF4`) is the canonical winner, but Erigon kept switching head through blocks 7-13 (TD `0x9CC34`) and ended on `0x26ad10c0…` instead of `0xf7f9ea97…`. This affects 23 tests (all Berlin/Istanbul/London variants): `lotsOfLeafs`, `ChainAtoChainB_difficultyB`, `ForkStressTest`, `CallContractFromNotBestBlock`, `lotsOfBranchesOverrideAtTheMiddle`, `sideChainWithMoreTransactions`, `uncleBlockAtBlock3afterBlock4`, `blockChainFrontierWithLargerTDvsHomesteadBlockchain[2]`. Fix: before calling `UpdateForkChoice`, compare the imported chain's total difficulty against the current head's TD. If strictly less, write the headers/bodies/TDs straight to the DB as a side chain (no execution, no head change) so future side-chain extensions can still validate. Once a side-chain extension surpasses the canonical TD, the regular `InsertBlocks` + `UpdateForkChoice` path triggers the reorg and executes those blocks. Strict-`<` (rather than `≤`) keeps PoS imports (all `difficulty=0`, TD never grows) flowing through the canonical path; equal-TD competing PoW branches still trigger `UpdateForkChoice` and let the existing fork-choice machinery decide. ## (b) execution/vm — STATICCALL touch for EIP-161 state clearing `RevertPrecompiledTouch_d3g0v0_{Berlin,Istanbul,London}` and `RevertPrecompiledTouch_storage_d3g0v0_{Berlin,Istanbul,London}` (6 tests) all touch precompile 3 (RIPEMD-160) via STATICCALL inside a frame that runs out of gas. Expected: ripemd state-cleared (deleted) at end of tx via the `journal.dirty(ripemd)` consensus quirk. Erigon left it. The CALL path in `evm.call` calls `Exist(addr)` first, which loads ripemd into `stateObjects`; the subsequent `AddBalance(addr, 0)` then falls through to `TouchAccount` and the famous quirk fires, so `FinalizeTx` deletes ripemd. But the STATICCALL path skipped `Exist` and called `AddBalance(addr, 0)` directly — in serial mode that hits the `versionMap == nil && addr == ripemd && amount.IsZero()` shortcut that uses `balanceIncrease` instead of `GetOrNewStateObject`. ripemd never enters `stateObjects`, so on revert `FinalizeTx`'s `if !exist { continue }` branch skips it. Fix: replace `AddBalance(addr, u256.Num0, ...)` with `TouchAccount(addr)` for STATICCALL — same end behavior as CALL's post-`Exist` flow (loads the account, hits the dirty quirk on touch). Identical for non-ripemd / non-serial paths. `d0` (CALL), `d1` (DELEGATECALL), `d2` (CALLCODE) variants were unaffected: CALL goes through `Exist`+`Transfer`; DELEGATECALL and CALLCODE don't touch the callee at all (matching geth and the post- state expectations). ## Test plan - `make lint` clean. - `./execution/{vm,state,execmodule,tests/...}` and `./cmd/utils/app` short tests pass. - `TestLegacyBlockchain` (UncleFromSideChain Cancun/Prague) still passes. - Full hive validation requires running the simulator; this branch needs a hive run to confirm zero remaining failures in `legacy-cancun`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Fixes Hive legacy-cancun failures by aligning Erigon’s import/head selection with TD-based fork choice for PoW-era blockchain tests, and by ensuring STATICCALL correctly “touches” the RIPEMD-160 precompile so EIP-161 state clearing applies even on OOG/revert paths.
Changes:
cmd/utils/app: Compare imported chain TD vs current head TD; persist lower-TD imports as side chains without updating head.execution/vm: ReplaceSTATICCALL’sAddBalance(0)touch withTouchAccountto ensure the RIPEMD-160 precompile is properly materialized/touched for EIP-161 clearing.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| cmd/utils/app/import_cmd.go | Adds TD comparison logic to avoid updating fork choice/head when importing a lower-TD branch; persists side-chain data for later extensions. |
| execution/vm/evm.go | Ensures STATICCALL triggers a true account touch (incl. RIPEMD-160 precompile) so state clearing matches expected consensus/test behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…rg helper Hoist the PoW total-difficulty fork-choice rule out of FeedHeaderPoW into a standalone ShouldReorg(localTd, localHeight, localHash, newTd, newHeight, newHash) helper, and call it from both the legacy header sync and the offline import command in the previous commit. Behavior unchanged on the existing path. For import_cmd.go this replaces the earlier strict-less-than check with the proper geth-style tie-break (shorter chain wins on equal TD; lex hash on equal TD + equal height). PoS imports — where every block has difficulty=0 so TD never grows — explicitly skip the side-chain branch so the head can still advance via UpdateForkChoice; without the bypass ShouldReorg's "shorter wins on tie" rule would never let the head past genesis. Add a focused unit test (TestShouldReorg) covering the strict-greater, strict-less, equal-TD shallower/deeper, and equal-TD-equal-height hash tie-break cases. The pre-existing TestSideChainInsert (currently t.Skip'd) covers the same scenarios end-to-end through FeedHeaderPoW and is unaffected. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
TestState only walks the EEST static_tests, which don't carry every variant of the original ethereum/tests fixtures — e.g. RevertPrecompiledTouch_d3 across Berlin/Istanbul/London, which catches the ripemd-touch state-clearing path that the Hive `legacy-cancun` simulator flagged in this branch. TestLegacyBlockchain in block_test.go explicitly skips the LegacyTests subtree under the assumption that GeneralStateTests run natively elsewhere; in practice they don't. Add TestLegacyCancunState (state-test format, mirroring TestState's structure) walking legacy-tests/LegacyTests/Cancun/GeneralStateTests so this class of regression is caught locally on every test-all run rather than only by the weekly out-of-tree Hive run. Verified: with the STATICCALL fix in execution/vm/evm.go reverted, TestLegacyCancunState/stRevertTest/RevertPrecompiledTouch.json fails at indexes Berlin/1, Istanbul/1, London/1 (d3 variant) — all three failures the Hive legacy-cancun run reported. With the fix applied, all 12 sub-cases pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Extract the shared boilerplate from TestStateCornerCases / TestState /
TestLegacyCancunState into two helpers:
- stateTestSetup(t): t.Parallel + Windows skip + log-handler scoping.
- runStateTests(t, st, dir): the Walk + per-subtest harness (temp
datadir, fresh DB, withTrace, ExpectException short-circuit, matcher
CheckFailure).
The three test functions now just describe what's different — the
directory and any matcher Slow/SkipLoad patterns — instead of repeating
the same ~25-line per-subtest scaffold.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tLegacyCancunState
Mirror the matcher patterns TestState already uses:
- SkipLoad ^stTimeConsuming/ — 26 sstore-combinations / blake2f
fixtures whose unit cost dwarfs the rest of the suite.
- Slow ^stPreCompiledContracts/precompsEIP2929Cancun — pushed off
short / -short=false treadmills only when explicitly opted in.
Same intent as TestState: keep the per-PR test-all run within budget
on machines with modest GOMAXPROCS while still covering the d3 ripemd
state-clearing case (stRevertTest/RevertPrecompiledTouch.json) that
motivated this test in the first place.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
) When TestLegacyCancunState walks legacy-tests/LegacyTests/Cancun/ GeneralStateTests, six fixtures fail on the Constantinople fork only with post-state-root mismatches: sstoreGas, CREATE_HighNonce, CREATE2_HighNonce, CREATE2_HighNonceDelegatecall, CallEcrecover_Overflow, ecrecoverShortBuff. They cluster around three orthogonal Constantinople- era behaviors — EIP-1283 SSTORE gas, max-uint64 nonce CREATE/CREATE2, and ECRecover input handling — each likely its own small consensus fix. Pre-existing on Erigon (verified by reverting this branch's STATICCALL change and re-running). Geth's local runner walks the older LegacyTests/Constantinople/GeneralStateTests, which doesn't contain these fixtures, so geth doesn't catch them locally either — even though its evm generated them, so they pass against geth by definition. SkipLoad them by file with a comment pointing at #20894 so the test catches future regressions on the d3 RIPEMD-160 path (stRevertTest/RevertPrecompiledTouch.json) without gating PRs on this orthogonal cluster. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous "pre-Istanbul" wording was copy-pasted from block_test.go and is wrong for the Cancun-era snapshot — those fixtures cover Frontier through Cancun, including the Berlin/Istanbul/London RevertPrecompiledTouch variants this test was added to catch. Reword per PR review feedback (#20892 (comment)) to make the snapshot name's role explicit (it's a snapshot version, not a fork bound) and list the fork range concretely. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| // Both the legacy header sync (FeedHeaderPoW) and the offline import command | ||
| // (cmd/utils/app/import_cmd.go) consult this so PoW canonical-chain selection | ||
| // is identical along both code paths. | ||
| func ShouldReorg(localTd *big.Int, localHeight uint64, localHash common.Hash, newTd *big.Int, newHeight uint64, newHash common.Hash) bool { |
There was a problem hiding this comment.
im planning on removing headerdownload, bodydownload and stage_headers and bodies soon since they are no longer used.
can this func be moved in the import cmd instead pls? since that is the only place that will ever need it
There was a problem hiding this comment.
OK, I moved it to package ethash
|
@claude review |
|
Claude finished @taratorio's task in 1m 57s —— View job Code Review
Required change (echoing taratorio's CHANGES_REQUESTED)
Concretely, move Bug: panic on empty
|
The headerdownload package is slated for removal along with the legacy stage_headers/bodies pipeline. Park ShouldReorg in ethash so the offline import command can keep calling it after headerdownload is gone. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Without the guard, the TD-comparison block dereferences chain.Blocks[0] and panics on an empty slice instead of returning nil. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Two fixes addressing all 29 failures in the Hive
legacy-cancunrun
against
erigon_default, plus a regression test that catches the EVM-side bug locally.(a)
cmd/utils/app— TD-based fork choice forerigon importThe Hive consensus simulator writes each block to its own
00NN.rlpfile and runserigon import /blocks/00NN.rlpfor each — the simulator does not implement TD/fork-choice itself, it relies on the client's native chain selection.InsertChainalways calledUpdateForkChoice(importedTip), so every imported block became head even when its TD was lower than the current canonical head's. ForlotsOfLeafsblock 6 (TD0x9CCF4) is the canonical winner, but Erigon kept switching head through blocks 7-13 (TD0x9CC34) and ended on0x26ad10c0…instead of0xf7f9ea97….This affects 23 tests (all Berlin/Istanbul/London variants):
lotsOfLeafs,ChainAtoChainB_difficultyB,ForkStressTest,CallContractFromNotBestBlock,lotsOfBranchesOverrideAtTheMiddle,sideChainWithMoreTransactions,uncleBlockAtBlock3afterBlock4,blockChainFrontierWithLargerTDvsHomesteadBlockchain[2].Fix: before calling
UpdateForkChoice, compare the imported chain's total difficulty against the current head's TD using a sharedheaderdownload.ShouldReorghelper (extracted from the existingFeedHeaderPoWtie-break: TD → height → lex hash). If the imported chain doesn't beat the current head, write the headers/bodies/TDs straight to the DB as a side chain (no execution, no head change) so future side-chain extensions can still validate. Once a side-chain extension surpasses the canonical TD, the regularInsertBlocks+UpdateForkChoicepath triggers the reorg and executes those blocks. PoS imports (alldifficulty=0, TD never grows) explicitly bypass this branch —ShouldReorg's "shorter wins on tie" rule would otherwise prevent the head from advancing past genesis.(b)
execution/vm— STATICCALL touch for EIP-161 state clearingRevertPrecompiledTouch_d3g0v0_{Berlin,Istanbul,London}andRevertPrecompiledTouch_storage_d3g0v0_{Berlin,Istanbul,London}(6 tests) all touch precompile 3 (RIPEMD-160) via STATICCALL inside a frame that runs out of gas. Expected: ripemd state-cleared (deleted) at end of tx via thejournal.dirty(ripemd)consensus quirk. Erigon left it.The CALL path in
evm.callcallsExist(addr)first, which loads ripemd intostateObjects; the subsequentAddBalance(addr, 0)then falls through toTouchAccountand the famous quirk fires, soFinalizeTxdeletes ripemd. But the STATICCALL path skippedExistand calledAddBalance(addr, 0)directly — in serial mode that hits theversionMap == nil && addr == ripemd && amount.IsZero()shortcut that usesbalanceIncreaseinstead ofGetOrNewStateObject. ripemd never entersstateObjects, so on revertFinalizeTx'sif !exist { continue }branch skips it.Fix: replace
AddBalance(addr, u256.Num0, …)withTouchAccount(addr)for STATICCALL — same end behavior as CALL's post-Existflow (loads the account, hits the dirty quirk on touch). Identical for non-ripemd / non-serial paths.d0(CALL),d1(DELEGATECALL),d2(CALLCODE) variants were unaffected: CALL goes throughExist+Transfer; DELEGATECALL and CALLCODE don't touch the callee at all (matching geth and the post-state expectations).(c)
execution/tests— local regression test for the d3 RIPEMD pathAdd
TestLegacyCancunState, walkinglegacy-tests/LegacyTests/Cancun/GeneralStateTestsin state-test format.TestLegacyBlockchain(block_test.go) explicitly skips theLegacyTests/subtree andTestStatewalks the EEST static_tests, neither of which covers the d3 ripemd-touch case. Verified the new test fails at baseline (Berlin/1, Istanbul/1, London/1 — the d3 indexes) and passes with the fix in (b). Refactored the now-three state-test runners (TestStateCornerCases/TestState/TestLegacyCancunState) to share arunStateTests(t, st, dir)helper plus astateTestSetup(t)helper for parallel/log/Windows-skip boilerplate.Six fixtures fail on Constantinople-only post-state-root mismatches (sstoreGas / _HighNonce / Ecrecover_Overflow / ecrecoverShortBuff) — pre-existing Erigon-side divergences from geth, unrelated to this PR. Geth doesn't catch them locally because its runner walks
LegacyTests/Constantinople/GeneralStateTests(older snapshot, doesn't include these). Tracked separately in #20894 andSkipLoad-ed for now with a comment pointing at the issue.