Skip to content

ci: comprehensive-spec parity gate for bench↔CI#83

Merged
CPerezz merged 16 commits into
ethereum:mainfrom
CPerezz:feat/ci-bench-parity
May 22, 2026
Merged

ci: comprehensive-spec parity gate for bench↔CI#83
CPerezz merged 16 commits into
ethereum:mainfrom
CPerezz:feat/ci-bench-parity

Conversation

@CPerezz
Copy link
Copy Markdown
Collaborator

@CPerezz CPerezz commented May 21, 2026

After the latest benchmarks done while working in #75 we saw that the CI wasn't actually making sure that everything worked.

This fixes it. Now CI checks pre-spamoor state (injected state). It checks that the DB is functional and works (spamoor) and checks the post-spamoor state.

CPerezz added 16 commits May 21, 2026 15:38
examples/spec-ci-comprehensive.yaml is the canonical fixture for the
bench↔CI parity gate: 22 entities covering every --spec feature (3
templates, 3 address modes, EIP-7702 delegation markers, raw bytecode,
balance/nonce overrides, approximate_size_bytes storage synthesis,
explicit + bulk ERC-20 owners/allowances, hex-string balance form,
zero-balance EOA).

The spamoor sender is pinned to oracle.SpamoorSenderAddr — same as
spec-ci-baseline.yaml — so the existing
TestCISpecMatchesSpamoorSender drift gate fires for both fixtures.
spec_setup_test.go extends to iterate over both YAMLs.

internal/specbuild/comprehensive_test.go pins the entity count and
asserts cross-client invariance of the resulting PreAlloc shape (same
length under each client calibration with fixedSizer{bytesPerSlot:64}).

No CI gate wired yet — that lands in a follow-up commit that adds the
shared verify-spec.sh script and the per-client CI jobs that drive
this fixture through gen + boot + spamoor + verify.
State-actor's main scripts/ dir was a mix of forensic Go tools (used
manually for debugging) and CI/bench-orchestration files (used only by
.github/workflows/ci.yml and the bench host). Moving the latter into
.github/scripts/ keeps the main repo dirs focused on what state-actor
itself needs at runtime and test-time.

git mv (tracked files):
  scripts/run-bloatnet.sh             -> .github/scripts/
  scripts/reth-v8-solo.sh             -> .github/scripts/
  scripts/nethermind-v8-solo.sh       -> .github/scripts/
  scripts/nethermind-v8-postgen.sh    -> .github/scripts/
  scripts/engine-driver/              -> .github/scripts/

mv (previously-untracked files):
  scripts/besu-v8-reboot.sh           -> .github/scripts/
  scripts/reth-v7b-rerun.sh           -> .github/scripts/
  scripts/verify-cross-client-root.sh -> .github/scripts/
  scripts/gen-bloatnet-spec/          -> .github/scripts/

Path refs inside the moved scripts updated to point at the new
.github/scripts/{engine-driver,gen-bloatnet-spec} locations. The
verify-bloatnet.sh refs are left pointing at scripts/verify-bloatnet.sh
intentionally — that file stays in place for the upcoming
side-by-side regression gate (commit 3); commit 5 deletes it.

Stays in scripts/ (NOT CI-only — human-driven forensic + dev tools):
  probe-account/        dump-storages-trie/
  strip-shallow-bncs/   verify-trie-consistency/
  hashof                regen-erc20-bytecode.sh
.github/scripts/verify-spec.sh — fail-fast (set -euo pipefail) shell
script that walks the YAML at \$SPEC, derives each entity's address via
the same rule as internal/specbuild/derive.go (literal / name-derived
= keccak256(BE_u64(seed) || utf8(name))[12:] / position-derived =
same with "anon-N"), and asserts via cast:

  - balance, nonce, code per entity (explicit-only assertions).
  - ERC-20 template-specific: name(), symbol(), decimals(), totalSupply()
    queryable.

Plus fixed assertions:

  - chain-id == 1337
  - All 5 canonical system contracts (BeaconRoots / HistoryStorage /
    WithdrawalQueue / ConsolidationQueue / DepositContract) have
    non-empty code at canonical addresses — closes task #127's gap
    between bench and CI.
  - In MODE=post: chain advanced past genesis, BEACON_ROOTS ring buffer
    slot (timestamp % 8191) is non-zero (EIP-4788 pre-exec fired),
    spamoor sender nonce > 0 (txs landed).

First failing assertion exits non-zero. CI step inherits the exit;
bench-host orchestrator checks $?.

Not invoked anywhere yet. The next commit wires it into ci.yml as
the per-client comprehensive-spec job; the final commit deletes
scripts/verify-bloatnet.sh after a side-by-side regression check
against a known-good v8 bench DB.
New CI job comprehensive-spec-geth runs the full bench pipeline against
the geth client on every PR:

  1. Generate DB with examples/spec-ci-comprehensive.yaml (every --spec
     feature: 3 templates, 3 address modes, EIP-7702 markers, raw
     bytecode, balance/nonce overrides, approximate_size_bytes synth,
     explicit + bulk ERC-20 owners/allowances).
  2. Boot geth in --dev mode against the produced datadir.
  3. Wait for RPC.
  4. .github/scripts/verify-spec.sh MODE=pre — chain-id, canonical
     syscontracts, per-entity balance/nonce/code, ERC-20 metadata
     queryable. Fail-fast: first failing assertion aborts the workflow.
  5. spamoor erc20_bloater for 100 blocks (privkey=1 = the spamoor
     sender entity in the spec).
  6. verify-spec.sh MODE=post — adds chain advance, BEACON_ROOTS ring
     buffer (EIP-4788 pre-exec slot timestamp%8191 non-zero), spamoor
     sender nonce > 0.

This is the same shell script the bench host invokes via run-bloatnet.sh
(after the script migration to .github/scripts/ in the prior commit),
so any divergence between the four clients now fails CI before reaching
the bench host.

Geth-only for now: it runs as a pure-Go suite on the GHA runner without
DinD, so foundry/cast install + docker run geth is straightforward. The
besu/nethermind/reth equivalents need extending the cgo-suite composite
action's docker-in-docker scaffolding and are tracked as a follow-up.
The previous invocation passed --db=/tmp/parity-data which state-actor
took verbatim as DBPath. Geth client's writer + the geth daemon both
expect the canonical <datadir>/geth/chaindata layout (per
client/geth/e2e_test.go:89). The result: state-actor wrote at the
wrong path, geth booted with an empty datadir, --dev auto-deployed
its 4 hardcoded syscontracts but the Prysm-vendored DepositContract
(not in geth's params) was missing → verify-spec.sh's Phase I check
flagged the gap correctly.

Switch the gen step to write at /tmp/parity-data/geth/chaindata and
mount geth's --datadir at /data (without the /geth subpath, since geth
appends it itself).
state-actor's --spec path appends spec entities ON TOP of the synthetic
fill (default --accounts=1000 + --contracts=100), so the local CI run
was generating ~1000 random EOAs + ~100 random contracts whose addresses
could collide with the spec's name/position-derived entities. The bench
scripts pass --accounts=0 --contracts=0 to suppress the synthetic loop
on --spec runs; match that here.

Empirical evidence: pre-fix run had the name-derived 'named-token'
contract at 0x094d…7Fa41 reading back as empty code, even though the
specbuild golden test confirms state-actor writes 1723 bytes there.
The most likely cause is a collision with one of the synthetic
contracts overwriting the address at write-time.
Surfaces the per-entity address verify-spec.sh resolved to, so a
collision or derivation mismatch in CI is greppable from the workflow
log instead of requiring local repro.
TestSpecIntrospect (client/geth/spec_introspect_test.go) mirrors main.go's
--spec pipeline against examples/spec-ci-comprehensive.yaml + seed=0 +
--accounts=0 --contracts=0, then opens the produced Pebble DB and asserts
for every PreAllocEntity:
  (1) account snapshot row at "a" + keccak256(addr) exists
  (2) snapshot CodeHash matches keccak256(pe.Code)
  (3) Pebble row "c" + CodeHash returns pe.Code byte-equal

Locally PASSES for all 22 entities — confirming state-actor's writer
produces correct on-disk state (snapshot + code DB), including the
name-derived named-token at 0x094d…7Fa41. The bench/CI failure must
therefore originate downstream — between disk and geth's runtime
state.GetCode path.

The CI diagnostic block in comprehensive-spec-geth (before verify-spec.sh)
queries `cast code` for both explicit-token (works) and named-token
(fails) at block 0x0 vs latest. Three possible signatures will pin the
mode:
  - both addrs empty at 0x0 → geth didn't load state-actor's genesis
  - named-token empty at 0x0 AND latest → genesis-load drops the entry
  - named-token has code at 0x0 but empty at latest → snapshot
    regeneration / state pruning on chain advance drops it

Permanent gate: the Go test stays in the default CI job (no build
tags), catching writer-side regressions. The CI shell probes stay
until the named-token bug is root-caused.

KEEP_DATADIR=<path> env var lets the test preserve the produced datadir
for manual `geth --dev --datadir=<path>` boot + cast inspection.
Match the working TestE2ESuite (client/geth/e2e_test.go:78-79) which
passes BuildSynthetic("osaka", 1337, 60M, 1_700_000_000, 0xdeadbeef).
The default --timestamp=0 + empty --extra-data may produce a genesis
state-root/hash that geth --dev silently treats differently than the
osaka-fork-with-recent-timestamp genesis it embeds.

If the named-token entity now resolves at block 0, the bug was a
genesis-hash mismatch causing geth to discard part of the alloc. If
not, root cause is elsewhere.
Root cause of named-token (and every name/position-derived) entity
returning empty `eth_getCode` in the comprehensive-spec-geth CI job:
main.go:84-86 silently rewrites `*seed = time.Now().UnixNano()` when
`--seed=0` is passed. Documented contract, but CI was passing
`--seed=0` to opt-into determinism — the opposite of what the trap
does. State-actor wrote derived-address entities at random addresses;
verify-spec.sh probed the SEED=0-derived addresses; mismatch.

Pinned via TestSpecIntrospect, which exercises the same in-process
Populate path on /tmp datadir and walks every PreAllocEntity's
snapshot + code-DB rows. In-process gen passes seed=0 directly to
specbuild.Build (bypassing main.go) and is byte-correct. Binary gen
through main.go --seed=0 randomized.

Changes:
- .github/workflows/ci.yml — --seed=0 → --seed=42 (matches bench);
  SEED env 0 → 42 in pre + post verify steps; drop the diagnostic
  probe step (no longer needed).
- .github/scripts/verify-spec.sh — SEED default 0 → 42; explicit
  fail-fast when SEED=0 (so a future engineer hitting the same trap
  gets a loud error instead of empty cast outputs).
- client/geth/spec_introspect_test.go — strip investigation env-var
  crutches (REUSE_DATADIR / KEEP_DATADIR); polish docstring to call
  out the seed=0 footgun as the why-this-test-exists context.
In post-spamoor mode the sender's balance is drained on gas + ERC-20
deposits, so the exact-equality check from MODE=pre is structurally
wrong. The nonce > 0 assertion (check_spamoor_sender_nonce) is the
right post-spamoor invariant.

Surfaced when CI's comprehensive-spec-geth post-step failed with
spamoor-sender balance 999999983986772447577000069 vs the spec's
999999999000000000000000000 — i.e., ~15 ETH spent over 100 blocks of
spamoor.
Same shape as the balance skip: spamoor advances the sender's nonce
on every signed tx, so the spec-pinned genesis nonce (0) no longer
holds in post mode. The check_spamoor_sender_nonce() function already
asserts nonce > 0 as the proper post-spamoor invariant.

Caught by the previous CI run after the balance-skip fix landed.
Investigation artifacts (added while chasing the seed=0 footgun before
e48e6de pinned the root cause; not needed for the PR scope):
- client/geth/spec_introspect_test.go (per-entity Pebble introspection
  test; the in-process writer is already covered by the per-client
  TestE2ESuite + verify-spec.sh end-to-end).
- .github/workflows/ci.yml: drop --timestamp + --extra-data overrides
  (added on the false-positive genesis-hash-mismatch theory).
- .github/scripts/verify-spec.sh: drop "(addr=$addr)" debug-print noise
  from c120219.

Bench-host one-off scripts removed (not referenced anywhere):
- besu-v8-reboot.sh
- nethermind-v8-postgen.sh
- nethermind-v8-solo.sh
- reth-v7b-rerun.sh
- reth-v8-solo.sh

Kept:
- run-bloatnet.sh (referenced by scripts/verify-bloatnet.sh)
- verify-cross-client-root.sh (ad-hoc operator tool for live-RPC
  cross-client invariant verification; separate category from the
  v*-solo one-offs).
The bench host already owns its own runtime tooling out-of-tree; with
the CI parity gate now covering every spec parameter combination via
.github/scripts/verify-spec.sh + examples/spec-ci-comprehensive.yaml,
nothing in this repo benefits from having the bench-orchestration
scripts committed here.

Removed:
- .github/scripts/gen-bloatnet-spec/ (Go YAML generator for 100GB bench)
- .github/scripts/run-bloatnet.sh   (bench master orchestrator)
- .github/scripts/engine-driver/    (CL mock for engine-API runs)
- .github/scripts/verify-cross-client-root.sh (ad-hoc live-RPC operator tool)
- scripts/verify-bloatnet.sh        (bench RPC verifier)

Kept:
- .github/scripts/verify-spec.sh (wired into the CI workflow's
  comprehensive-spec-geth job; the only script the repo actually needs).

Also folds in two unstaged edits that fell through 24415f3's commit:
- .github/scripts/verify-spec.sh: revert "(addr=$addr)" debug-print noise
- .github/workflows/ci.yml: revert --timestamp + --extra-data overrides
- client/geth/genesis_block.go: trim a stale bench-script reference in
  WritePathDBMetadata's docstring.
Removes the standalone comprehensive-spec-geth job + verify-spec.sh
shell verifier, ports their 4 missing checks into the Go oracle, and
switches every per-client TestE2ESuite to the renamed full-matrix
fixture. Result: every client (geth/besu/neth/reth) now exercises
the same broader fixture under the same richer oracle, with the
cross-client genesis-root invariant gating the whole thing.

Coverage delta (Go oracle gains, all four clients):
- CheckChainID — eth_chainId asserts against cfg.Genesis.Config.ChainID.
- CheckCanonicalSyscontracts — eth_getCode for the 5 canonical mainnet
  syscontract addresses at block 0.
- CheckChainAdvanced — post-spamoor block-number > 0.
- CheckBeaconRootsRingBuffer — EIP-4788 slot at (latest.ts %% 8191)
  is non-zero, proving the pre-execution actually ran.
- CheckInjections now also asserts nonce when cfg.GenesisAccounts[addr]
  has non-zero nonce (catches dropped explicit-nonce overrides).

The Go path was already richer in ERC-20 owner/allowance sampling,
storage-slot probes, and spamoor-execution assertion; with these 4
additions, it's now a strict superset of verify-spec.sh.

File moves:
- examples/spec-ci-comprehensive.yaml → full-matrix-spec-feature.yaml.
- internal/specbuild/comprehensive_test.go → full_matrix_test.go
  (renamed test to TestBuildFullMatrix to match).
- examples/spec-ci-baseline.yaml deleted (full-matrix is a superset).

Per-client e2e suites (client/geth/, client/besu/, client/nethermind/,
client/reth/oracle_test.go) now load the full-matrix YAML instead of
the baseline.

Workflow surgery:
- comprehensive-spec-geth job deleted from .github/workflows/ci.yml.
- .github/scripts/verify-spec.sh deleted (last script in that dir —
  the whole .github/scripts/ tree is empty now).
- cross-client-genesis-root keystone gate untouched.

If any of besu/nethermind/reth turns red against the broader fixture,
the failure surface is now exactly the missing-coverage signal we
want — that's the point of this change.
Address findings from a three-agent PR review (code-reviewer,
pr-test-analyzer, silent-failure-hunter):

T1.1 — CheckBeaconRootsRingBuffer: reject literal "0x0" timestamp
The previous guard only caught "" (empty after 0x trim), so a
client returning a literal-zero timestamp ("0x0") parsed to ts=0,
slot=0, then emitted the misleading "EIP-4788 pre-exec didn't fire"
error. Now the guard rejects both "" and "0" with a message that
disambiguates "missing field" from "literal zero timestamp."

T2.1 — CheckChainID: fail loudly when cfg.Genesis.Config.ChainID is nil
The 4-deep nil-guard in runphases.go silently skipped chain-id
verification if any link was nil. The class of bug the check exists
to catch (besu chainspec drift) would be invisible whenever a future
caller forgot to populate Genesis.Config. Pulled the nil-chain into
a chainIDFromCfg helper that returns 0, and the caller now t.Fatalfs
on 0 with a message pointing at the per-client test setup.

T2.2 — Phase 6b results land in result.json
CheckChainAdvanced + CheckBeaconRootsRingBuffer now populate two new
bool fields on SuiteResult so the cross-client aggregator (or a
log-pruned post-mortem) has a machine-readable signal even when
test-log output is gone.

T2.3 — Unit tests for check_chain.go (14 tests, ~280 LOC)
Mirrors internal/rpcprobe/probe_test.go's fakeRPCServer pattern but
local to e2e_testing so we don't expose a test-only API from rpcprobe.
Highest-value coverage:
- TestCheckBeaconRootsRingBuffer_LiteralZeroTimestamp pins T1.1
- TestCheckBeaconRootsRingBuffer_ModulusRollover pins the 8191 modulus
- TestCheckCanonicalSyscontracts_AccumulatesAllFailures + _LastMissing
  pin that the loop reaches every address (guards against a future
  short-circuit refactor)

T2.4 — Zero-balance entities now verified
Dropped the !acct.Balance.IsZero() skip in CheckInjections. The
"balloon RPC calls" concern doesn't apply because the e2e suites
use --accounts=0 --contracts=0 (no synthetic-fill). Now the spec's
zero-balance-eoa entity and the canonical syscontracts get an
eth_getBalance call asserting their on-chain balance == 0.

T3.1 — CheckBeaconRootsRingBuffer docstring drift
The previous docstring claimed the function reads the
parent-beacon-block-root slot, but it actually reads the timestamp
slot (correct choice in --dev mode where the parent root is often a
literal zero hash, but the doc had drifted). Rewrote to call out the
two-slot contract per EIP-4788 and the --dev caveat.

T3.2 — slotHash encoding idiom
Swapped binary.BigEndian.PutUint64(slotHash[24:], slot) for the
conventional common.BigToHash(new(big.Int).SetUint64(slot)) — drops
the encoding/binary import.

T3.3 — Short-circuit BEACON_ROOTS when chain didn't advance
If CheckChainAdvanced fails, calling CheckBeaconRootsRingBuffer
produces a second redundant error from BlockByNumber's zero-stateRoot
guard. Now the second check is skipped (result.PostSpamoorBeaconRootsOK
stays false, which is correct).

Verification: all 14 new check_chain unit tests pass; full
internal/... + generator/... + genesis/... suites still green.
@CPerezz CPerezz merged commit cca6924 into ethereum:main May 22, 2026
7 checks passed
@CPerezz CPerezz deleted the feat/ci-bench-parity branch May 22, 2026 13:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant