Skip to content

cmd/utils/app, execution/vm: fix hive legacy-cancun blockchain failures#20892

Open
yperbasis wants to merge 14 commits intomainfrom
yperbasis/hive-legacy-cancun-fixes
Open

cmd/utils/app, execution/vm: fix hive legacy-cancun blockchain failures#20892
yperbasis wants to merge 14 commits intomainfrom
yperbasis/hive-legacy-cancun-fixes

Conversation

@yperbasis
Copy link
Copy Markdown
Member

@yperbasis yperbasis commented Apr 28, 2026

Summary

Two fixes addressing all 29 failures in the Hive legacy-cancun
run
against erigon_default, plus a regression test that catches the EVM-side bug locally.

(a) cmd/utils/app — TD-based fork choice for erigon import

The Hive consensus simulator writes each block to its own 00NN.rlp file and runs erigon import /blocks/00NN.rlp for each — the simulator does not implement TD/fork-choice itself, it relies on the client's native chain selection. InsertChain always called UpdateForkChoice(importedTip), so every imported block became head even when its TD was lower than the current canonical head's. For lotsOfLeafs block 6 (TD 0x9CCF4) is the canonical winner, but Erigon kept switching head through blocks 7-13 (TD 0x9CC34) and ended on 0x26ad10c0… instead of 0xf7f9ea97….

This affects 23 tests (all Berlin/Istanbul/London variants):
lotsOfLeafs, ChainAtoChainB_difficultyB, ForkStressTest, CallContractFromNotBestBlock, lotsOfBranchesOverrideAtTheMiddle, sideChainWithMoreTransactions, uncleBlockAtBlock3afterBlock4, blockChainFrontierWithLargerTDvsHomesteadBlockchain[2].

Fix: before calling UpdateForkChoice, compare the imported chain's total difficulty against the current head's TD using a shared headerdownload.ShouldReorg helper (extracted from the existing FeedHeaderPoW tie-break: TD → height → lex hash). If the imported chain doesn't beat the current head, write the headers/bodies/TDs straight to the DB as a side chain (no execution, no head change) so future side-chain extensions can still validate. Once a side-chain extension surpasses the canonical TD, the regular InsertBlocks + UpdateForkChoice path triggers the reorg and executes those blocks. PoS imports (all difficulty=0, TD never grows) explicitly bypass this branch — ShouldReorg's "shorter wins on tie" rule would otherwise prevent the head from advancing past genesis.

(b) execution/vm — STATICCALL touch for EIP-161 state clearing

RevertPrecompiledTouch_d3g0v0_{Berlin,Istanbul,London} and RevertPrecompiledTouch_storage_d3g0v0_{Berlin,Istanbul,London} (6 tests) all touch precompile 3 (RIPEMD-160) via STATICCALL inside a frame that runs out of gas. Expected: ripemd state-cleared (deleted) at end of tx via the journal.dirty(ripemd) consensus quirk. Erigon left it.

The CALL path in evm.call calls Exist(addr) first, which loads ripemd into stateObjects; the subsequent AddBalance(addr, 0) then falls through to TouchAccount and the famous quirk fires, so FinalizeTx deletes ripemd. But the STATICCALL path skipped Exist and called AddBalance(addr, 0) directly — in serial mode that hits the versionMap == nil && addr == ripemd && amount.IsZero() shortcut that uses balanceIncrease instead of GetOrNewStateObject. ripemd never enters stateObjects, so on revert FinalizeTx's if !exist { continue } branch skips it.

Fix: replace AddBalance(addr, u256.Num0, …) with TouchAccount(addr) for STATICCALL — same end behavior as CALL's post-Exist flow (loads the account, hits the dirty quirk on touch). Identical for non-ripemd / non-serial paths.

d0 (CALL), d1 (DELEGATECALL), d2 (CALLCODE) variants were unaffected: CALL goes through Exist+Transfer; DELEGATECALL and CALLCODE don't touch the callee at all (matching geth and the post-state expectations).

(c) execution/tests — local regression test for the d3 RIPEMD path

Add TestLegacyCancunState, walking legacy-tests/LegacyTests/Cancun/GeneralStateTests in state-test format. TestLegacyBlockchain (block_test.go) explicitly skips the LegacyTests/ subtree and TestState walks the EEST static_tests, neither of which covers the d3 ripemd-touch case. Verified the new test fails at baseline (Berlin/1, Istanbul/1, London/1 — the d3 indexes) and passes with the fix in (b). Refactored the now-three state-test runners (TestStateCornerCases / TestState / TestLegacyCancunState) to share a runStateTests(t, st, dir) helper plus a stateTestSetup(t) helper for parallel/log/Windows-skip boilerplate.

Six fixtures fail on Constantinople-only post-state-root mismatches (sstoreGas / _HighNonce / Ecrecover_Overflow / ecrecoverShortBuff) — pre-existing Erigon-side divergences from geth, unrelated to this PR. Geth doesn't catch them locally because its runner walks LegacyTests/Constantinople/GeneralStateTests (older snapshot, doesn't include these). Tracked separately in #20894 and SkipLoad-ed for now with a comment pointing at the issue.

Two fixes for the 29 failures in Hive `legacy-cancun`
(https://hive.ethpandaops.io/#/test/generic/1777223434-c411e5be836d6d6f8a466489c2874cf3).

## (a) cmd/utils/app — TD-based fork choice for `erigon import`

The Hive consensus simulator writes each block to its own `00NN.rlp`
file and runs `erigon import /blocks/00NN.rlp` for each — the simulator
does not implement TD/fork-choice itself, it relies on the client's
native chain selection. `InsertChain` always called
`UpdateForkChoice(importedTip)`, so every imported block became head
even when its TD was lower than the current canonical head's. For
`lotsOfLeafs` block 6 (TD `0x9CCF4`) is the canonical winner, but
Erigon kept switching head through blocks 7-13 (TD `0x9CC34`) and
ended on `0x26ad10c0…` instead of `0xf7f9ea97…`.

This affects 23 tests (all Berlin/Istanbul/London variants):
`lotsOfLeafs`, `ChainAtoChainB_difficultyB`,
`ForkStressTest`, `CallContractFromNotBestBlock`,
`lotsOfBranchesOverrideAtTheMiddle`, `sideChainWithMoreTransactions`,
`uncleBlockAtBlock3afterBlock4`,
`blockChainFrontierWithLargerTDvsHomesteadBlockchain[2]`.

Fix: before calling `UpdateForkChoice`, compare the imported chain's
total difficulty against the current head's TD. If strictly less, write
the headers/bodies/TDs straight to the DB as a side chain (no execution,
no head change) so future side-chain extensions can still validate.
Once a side-chain extension surpasses the canonical TD, the regular
`InsertBlocks` + `UpdateForkChoice` path triggers the reorg and
executes those blocks. Strict-`<` (rather than `≤`) keeps PoS imports
(all `difficulty=0`, TD never grows) flowing through the canonical
path; equal-TD competing PoW branches still trigger
`UpdateForkChoice` and let the existing fork-choice machinery decide.

## (b) execution/vm — STATICCALL touch for EIP-161 state clearing

`RevertPrecompiledTouch_d3g0v0_{Berlin,Istanbul,London}` and
`RevertPrecompiledTouch_storage_d3g0v0_{Berlin,Istanbul,London}`
(6 tests) all touch precompile 3 (RIPEMD-160) via STATICCALL inside a
frame that runs out of gas. Expected: ripemd state-cleared (deleted)
at end of tx via the `journal.dirty(ripemd)` consensus quirk. Erigon
left it.

The CALL path in `evm.call` calls `Exist(addr)` first, which loads
ripemd into `stateObjects`; the subsequent `AddBalance(addr, 0)` then
falls through to `TouchAccount` and the famous quirk fires, so
`FinalizeTx` deletes ripemd. But the STATICCALL path skipped `Exist`
and called `AddBalance(addr, 0)` directly — in serial mode that hits
the `versionMap == nil && addr == ripemd && amount.IsZero()` shortcut
that uses `balanceIncrease` instead of `GetOrNewStateObject`. ripemd
never enters `stateObjects`, so on revert `FinalizeTx`'s
`if !exist { continue }` branch skips it.

Fix: replace `AddBalance(addr, u256.Num0, ...)` with
`TouchAccount(addr)` for STATICCALL — same end behavior as CALL's
post-`Exist` flow (loads the account, hits the dirty quirk on touch).
Identical for non-ripemd / non-serial paths.

`d0` (CALL), `d1` (DELEGATECALL), `d2` (CALLCODE) variants were
unaffected: CALL goes through `Exist`+`Transfer`; DELEGATECALL and
CALLCODE don't touch the callee at all (matching geth and the post-
state expectations).

## Test plan

- `make lint` clean.
- `./execution/{vm,state,execmodule,tests/...}` and `./cmd/utils/app`
  short tests pass.
- `TestLegacyBlockchain` (UncleFromSideChain Cancun/Prague) still
  passes.
- Full hive validation requires running the simulator; this branch
  needs a hive run to confirm zero remaining failures in
  `legacy-cancun`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes Hive legacy-cancun failures by aligning Erigon’s import/head selection with TD-based fork choice for PoW-era blockchain tests, and by ensuring STATICCALL correctly “touches” the RIPEMD-160 precompile so EIP-161 state clearing applies even on OOG/revert paths.

Changes:

  • cmd/utils/app: Compare imported chain TD vs current head TD; persist lower-TD imports as side chains without updating head.
  • execution/vm: Replace STATICCALL’s AddBalance(0) touch with TouchAccount to ensure the RIPEMD-160 precompile is properly materialized/touched for EIP-161 clearing.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
cmd/utils/app/import_cmd.go Adds TD comparison logic to avoid updating fork choice/head when importing a lower-TD branch; persists side-chain data for later extensions.
execution/vm/evm.go Ensures STATICCALL triggers a true account touch (incl. RIPEMD-160 precompile) so state clearing matches expected consensus/test behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread cmd/utils/app/import_cmd.go
Comment thread cmd/utils/app/import_cmd.go Outdated
yperbasis and others added 4 commits April 28, 2026 22:26
…rg helper

Hoist the PoW total-difficulty fork-choice rule out of FeedHeaderPoW
into a standalone ShouldReorg(localTd, localHeight, localHash, newTd,
newHeight, newHash) helper, and call it from both the legacy header
sync and the offline import command in the previous commit. Behavior
unchanged on the existing path.

For import_cmd.go this replaces the earlier strict-less-than check
with the proper geth-style tie-break (shorter chain wins on equal TD;
lex hash on equal TD + equal height). PoS imports — where every block
has difficulty=0 so TD never grows — explicitly skip the side-chain
branch so the head can still advance via UpdateForkChoice; without
the bypass ShouldReorg's "shorter wins on tie" rule would never let
the head past genesis.

Add a focused unit test (TestShouldReorg) covering the strict-greater,
strict-less, equal-TD shallower/deeper, and equal-TD-equal-height hash
tie-break cases. The pre-existing TestSideChainInsert (currently
t.Skip'd) covers the same scenarios end-to-end through FeedHeaderPoW
and is unaffected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
TestState only walks the EEST static_tests, which don't carry every
variant of the original ethereum/tests fixtures — e.g.
RevertPrecompiledTouch_d3 across Berlin/Istanbul/London, which catches
the ripemd-touch state-clearing path that the Hive `legacy-cancun`
simulator flagged in this branch. TestLegacyBlockchain in block_test.go
explicitly skips the LegacyTests subtree under the assumption that
GeneralStateTests run natively elsewhere; in practice they don't.

Add TestLegacyCancunState (state-test format, mirroring TestState's
structure) walking legacy-tests/LegacyTests/Cancun/GeneralStateTests
so this class of regression is caught locally on every test-all run
rather than only by the weekly out-of-tree Hive run.

Verified: with the STATICCALL fix in execution/vm/evm.go reverted,
TestLegacyCancunState/stRevertTest/RevertPrecompiledTouch.json fails
at indexes Berlin/1, Istanbul/1, London/1 (d3 variant) — all three
failures the Hive legacy-cancun run reported. With the fix applied,
all 12 sub-cases pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Extract the shared boilerplate from TestStateCornerCases / TestState /
TestLegacyCancunState into two helpers:

  - stateTestSetup(t): t.Parallel + Windows skip + log-handler scoping.
  - runStateTests(t, st, dir): the Walk + per-subtest harness (temp
    datadir, fresh DB, withTrace, ExpectException short-circuit, matcher
    CheckFailure).

The three test functions now just describe what's different — the
directory and any matcher Slow/SkipLoad patterns — instead of repeating
the same ~25-line per-subtest scaffold.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tLegacyCancunState

Mirror the matcher patterns TestState already uses:

  - SkipLoad ^stTimeConsuming/ — 26 sstore-combinations / blake2f
    fixtures whose unit cost dwarfs the rest of the suite.
  - Slow ^stPreCompiledContracts/precompsEIP2929Cancun — pushed off
    short / -short=false treadmills only when explicitly opted in.

Same intent as TestState: keep the per-PR test-all run within budget
on machines with modest GOMAXPROCS while still covering the d3 ripemd
state-clearing case (stRevertTest/RevertPrecompiledTouch.json) that
motivated this test in the first place.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread execution/tests/state_test.go Outdated
yperbasis and others added 2 commits April 28, 2026 23:48
)

When TestLegacyCancunState walks legacy-tests/LegacyTests/Cancun/
GeneralStateTests, six fixtures fail on the Constantinople fork only
with post-state-root mismatches: sstoreGas, CREATE_HighNonce,
CREATE2_HighNonce, CREATE2_HighNonceDelegatecall, CallEcrecover_Overflow,
ecrecoverShortBuff. They cluster around three orthogonal Constantinople-
era behaviors — EIP-1283 SSTORE gas, max-uint64 nonce CREATE/CREATE2,
and ECRecover input handling — each likely its own small consensus fix.

Pre-existing on Erigon (verified by reverting this branch's STATICCALL
change and re-running). Geth's local runner walks the older
LegacyTests/Constantinople/GeneralStateTests, which doesn't contain
these fixtures, so geth doesn't catch them locally either — even though
its evm generated them, so they pass against geth by definition.

SkipLoad them by file with a comment pointing at #20894 so the test
catches future regressions on the d3 RIPEMD-160 path
(stRevertTest/RevertPrecompiledTouch.json) without gating PRs on this
orthogonal cluster.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous "pre-Istanbul" wording was copy-pasted from block_test.go
and is wrong for the Cancun-era snapshot — those fixtures cover Frontier
through Cancun, including the Berlin/Istanbul/London RevertPrecompiledTouch
variants this test was added to catch.

Reword per PR review feedback (#20892 (comment)) to make the snapshot
name's role explicit (it's a snapshot version, not a fork bound) and
list the fork range concretely.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@yperbasis yperbasis marked this pull request as ready for review April 28, 2026 21:54
@yperbasis yperbasis requested a review from taratorio April 28, 2026 21:54
// Both the legacy header sync (FeedHeaderPoW) and the offline import command
// (cmd/utils/app/import_cmd.go) consult this so PoW canonical-chain selection
// is identical along both code paths.
func ShouldReorg(localTd *big.Int, localHeight uint64, localHash common.Hash, newTd *big.Int, newHeight uint64, newHash common.Hash) bool {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

im planning on removing headerdownload, bodydownload and stage_headers and bodies soon since they are no longer used.

can this func be moved in the import cmd instead pls? since that is the only place that will ever need it

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I moved it to package ethash

@taratorio
Copy link
Copy Markdown
Member

@claude review

@claude
Copy link
Copy Markdown

claude Bot commented Apr 29, 2026

Claude finished @taratorio's task in 1m 57s —— View job


Code Review

  • Read all changed files
  • Review cmd/utils/app/import_cmd.go
  • Review execution/stagedsync/headerdownload/header_algos.go
  • Review execution/stagedsync/headerdownload/header_algo_test.go
  • Review execution/vm/evm.go
  • Review execution/tests/state_test.go

Required change (echoing taratorio's CHANGES_REQUESTED)

ShouldReorg was extracted into execution/stagedsync/headerdownload/header_algos.go, but headerdownload (along with bodydownload and the stage_headers/bodies pipeline) is slated for removal. Once that package is gone, import_cmd.go will lose its only dependency on it. The function should live in cmd/utils/app/import_cmd.go instead — it's only 9 lines and has no dependencies beyond math/big and common. FeedHeaderPoW in header_algos.go can call the unexported inline copy for now; once the package is removed the copy there disappears with it.

Concretely, move ShouldReorg (with its doc comment) to import_cmd.go as an unexported shouldReorg, update the call sites, and drop the headerdownload import from import_cmd.go.

Bug: panic on empty chain.Blocks (import_cmd.go:276)

firstBlock := chain.Blocks[0]   // panics if len == 0

InsertChain is a public function and the loop at the top of it (for _, block := range chain.Blocks) would silently succeed on an empty slice, reaching this line and panicking. Add an early return before the TD-comparison block:

if len(chain.Blocks) == 0 {
    return nil
}

Fix this →

execution/vm/evm.go — STATICCALL fix (correct)

The swap from AddBalance(addr, u256.Num0, …) to TouchAccount(addr) for STATICCALL is the right fix. The comment explaining the serial-mode shortcut in the RIPEMD-160 path that bypasses GetOrNewStateObject is accurate and non-obvious — good to have it. The fix matches CALL's post-Exist behavior and is consistent with geth.

execution/stagedsync/headerdownload/header_algos.goShouldReorg extraction (correct logic, wrong location)

The extracted function itself is correct. The tie-break matches geth: higher TD wins; on equal TD shorter chain wins; on equal TD+height the lexicographically larger hash wins. The behavior of FeedHeaderPoW is unchanged (it now delegates to ShouldReorg rather than inlining the same logic).

One minor nit: the doc comment on ShouldReorg says "Both the legacy header sync (FeedHeaderPoW) and the offline import command … consult this" — once headerdownload is removed that sentence will be stale. When the function moves to import_cmd.go, simplify the comment to just describe the rule.

execution/stagedsync/headerdownload/header_algo_test.goTestShouldReorg (good)

Coverage is complete: strictly-higher TD, strictly-lower TD, equal-TD shallower, equal-TD deeper, equal-TD equal-height hash-larger, hash-smaller, equal-hash. All seven cases are correct. Good addition.

execution/tests/state_test.go — refactoring + TestLegacyCancunState (clean)

The stateTestSetup / runStateTests helpers eliminate the duplication cleanly. TestLegacyCancunState closes the coverage gap that let the RIPEMD-160 d3 case slip through. The six SkipLoad entries for pre-existing Constantinople divergences are explained with a link to #20894 — appropriate.

One note: TestLegacyCancunState creates a new temporaltest.NewTestDB per call to st.Walk (one per fixture file), which is fine. The beginRwNoContention helper is a thoughtful addition for catching accidental DB sharing under -parallel.

Summary

The STATICCALL and TD fork-choice fixes are correct. The primary blocker is moving ShouldReorg out of headerdownload per taratorio's review. The empty-slice guard on chain.Blocks[0] is a secondary safety fix worth adding at the same time.

yperbasis and others added 2 commits April 29, 2026 15:53
The headerdownload package is slated for removal along with the legacy
stage_headers/bodies pipeline. Park ShouldReorg in ethash so the offline
import command can keep calling it after headerdownload is gone.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Without the guard, the TD-comparison block dereferences chain.Blocks[0]
and panics on an empty slice instead of returning nil.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@yperbasis yperbasis enabled auto-merge April 29, 2026 14:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants