ci: comprehensive-spec parity gate for bench↔CI#83
Merged
Conversation
examples/spec-ci-comprehensive.yaml is the canonical fixture for the
bench↔CI parity gate: 22 entities covering every --spec feature (3
templates, 3 address modes, EIP-7702 delegation markers, raw bytecode,
balance/nonce overrides, approximate_size_bytes storage synthesis,
explicit + bulk ERC-20 owners/allowances, hex-string balance form,
zero-balance EOA).
The spamoor sender is pinned to oracle.SpamoorSenderAddr — same as
spec-ci-baseline.yaml — so the existing
TestCISpecMatchesSpamoorSender drift gate fires for both fixtures.
spec_setup_test.go extends to iterate over both YAMLs.
internal/specbuild/comprehensive_test.go pins the entity count and
asserts cross-client invariance of the resulting PreAlloc shape (same
length under each client calibration with fixedSizer{bytesPerSlot:64}).
No CI gate wired yet — that lands in a follow-up commit that adds the
shared verify-spec.sh script and the per-client CI jobs that drive
this fixture through gen + boot + spamoor + verify.
State-actor's main scripts/ dir was a mix of forensic Go tools (used
manually for debugging) and CI/bench-orchestration files (used only by
.github/workflows/ci.yml and the bench host). Moving the latter into
.github/scripts/ keeps the main repo dirs focused on what state-actor
itself needs at runtime and test-time.
git mv (tracked files):
scripts/run-bloatnet.sh -> .github/scripts/
scripts/reth-v8-solo.sh -> .github/scripts/
scripts/nethermind-v8-solo.sh -> .github/scripts/
scripts/nethermind-v8-postgen.sh -> .github/scripts/
scripts/engine-driver/ -> .github/scripts/
mv (previously-untracked files):
scripts/besu-v8-reboot.sh -> .github/scripts/
scripts/reth-v7b-rerun.sh -> .github/scripts/
scripts/verify-cross-client-root.sh -> .github/scripts/
scripts/gen-bloatnet-spec/ -> .github/scripts/
Path refs inside the moved scripts updated to point at the new
.github/scripts/{engine-driver,gen-bloatnet-spec} locations. The
verify-bloatnet.sh refs are left pointing at scripts/verify-bloatnet.sh
intentionally — that file stays in place for the upcoming
side-by-side regression gate (commit 3); commit 5 deletes it.
Stays in scripts/ (NOT CI-only — human-driven forensic + dev tools):
probe-account/ dump-storages-trie/
strip-shallow-bncs/ verify-trie-consistency/
hashof regen-erc20-bytecode.sh
.github/scripts/verify-spec.sh — fail-fast (set -euo pipefail) shell
script that walks the YAML at \$SPEC, derives each entity's address via
the same rule as internal/specbuild/derive.go (literal / name-derived
= keccak256(BE_u64(seed) || utf8(name))[12:] / position-derived =
same with "anon-N"), and asserts via cast:
- balance, nonce, code per entity (explicit-only assertions).
- ERC-20 template-specific: name(), symbol(), decimals(), totalSupply()
queryable.
Plus fixed assertions:
- chain-id == 1337
- All 5 canonical system contracts (BeaconRoots / HistoryStorage /
WithdrawalQueue / ConsolidationQueue / DepositContract) have
non-empty code at canonical addresses — closes task #127's gap
between bench and CI.
- In MODE=post: chain advanced past genesis, BEACON_ROOTS ring buffer
slot (timestamp % 8191) is non-zero (EIP-4788 pre-exec fired),
spamoor sender nonce > 0 (txs landed).
First failing assertion exits non-zero. CI step inherits the exit;
bench-host orchestrator checks $?.
Not invoked anywhere yet. The next commit wires it into ci.yml as
the per-client comprehensive-spec job; the final commit deletes
scripts/verify-bloatnet.sh after a side-by-side regression check
against a known-good v8 bench DB.
New CI job comprehensive-spec-geth runs the full bench pipeline against
the geth client on every PR:
1. Generate DB with examples/spec-ci-comprehensive.yaml (every --spec
feature: 3 templates, 3 address modes, EIP-7702 markers, raw
bytecode, balance/nonce overrides, approximate_size_bytes synth,
explicit + bulk ERC-20 owners/allowances).
2. Boot geth in --dev mode against the produced datadir.
3. Wait for RPC.
4. .github/scripts/verify-spec.sh MODE=pre — chain-id, canonical
syscontracts, per-entity balance/nonce/code, ERC-20 metadata
queryable. Fail-fast: first failing assertion aborts the workflow.
5. spamoor erc20_bloater for 100 blocks (privkey=1 = the spamoor
sender entity in the spec).
6. verify-spec.sh MODE=post — adds chain advance, BEACON_ROOTS ring
buffer (EIP-4788 pre-exec slot timestamp%8191 non-zero), spamoor
sender nonce > 0.
This is the same shell script the bench host invokes via run-bloatnet.sh
(after the script migration to .github/scripts/ in the prior commit),
so any divergence between the four clients now fails CI before reaching
the bench host.
Geth-only for now: it runs as a pure-Go suite on the GHA runner without
DinD, so foundry/cast install + docker run geth is straightforward. The
besu/nethermind/reth equivalents need extending the cgo-suite composite
action's docker-in-docker scaffolding and are tracked as a follow-up.
The previous invocation passed --db=/tmp/parity-data which state-actor took verbatim as DBPath. Geth client's writer + the geth daemon both expect the canonical <datadir>/geth/chaindata layout (per client/geth/e2e_test.go:89). The result: state-actor wrote at the wrong path, geth booted with an empty datadir, --dev auto-deployed its 4 hardcoded syscontracts but the Prysm-vendored DepositContract (not in geth's params) was missing → verify-spec.sh's Phase I check flagged the gap correctly. Switch the gen step to write at /tmp/parity-data/geth/chaindata and mount geth's --datadir at /data (without the /geth subpath, since geth appends it itself).
state-actor's --spec path appends spec entities ON TOP of the synthetic fill (default --accounts=1000 + --contracts=100), so the local CI run was generating ~1000 random EOAs + ~100 random contracts whose addresses could collide with the spec's name/position-derived entities. The bench scripts pass --accounts=0 --contracts=0 to suppress the synthetic loop on --spec runs; match that here. Empirical evidence: pre-fix run had the name-derived 'named-token' contract at 0x094d…7Fa41 reading back as empty code, even though the specbuild golden test confirms state-actor writes 1723 bytes there. The most likely cause is a collision with one of the synthetic contracts overwriting the address at write-time.
Surfaces the per-entity address verify-spec.sh resolved to, so a collision or derivation mismatch in CI is greppable from the workflow log instead of requiring local repro.
TestSpecIntrospect (client/geth/spec_introspect_test.go) mirrors main.go's
--spec pipeline against examples/spec-ci-comprehensive.yaml + seed=0 +
--accounts=0 --contracts=0, then opens the produced Pebble DB and asserts
for every PreAllocEntity:
(1) account snapshot row at "a" + keccak256(addr) exists
(2) snapshot CodeHash matches keccak256(pe.Code)
(3) Pebble row "c" + CodeHash returns pe.Code byte-equal
Locally PASSES for all 22 entities — confirming state-actor's writer
produces correct on-disk state (snapshot + code DB), including the
name-derived named-token at 0x094d…7Fa41. The bench/CI failure must
therefore originate downstream — between disk and geth's runtime
state.GetCode path.
The CI diagnostic block in comprehensive-spec-geth (before verify-spec.sh)
queries `cast code` for both explicit-token (works) and named-token
(fails) at block 0x0 vs latest. Three possible signatures will pin the
mode:
- both addrs empty at 0x0 → geth didn't load state-actor's genesis
- named-token empty at 0x0 AND latest → genesis-load drops the entry
- named-token has code at 0x0 but empty at latest → snapshot
regeneration / state pruning on chain advance drops it
Permanent gate: the Go test stays in the default CI job (no build
tags), catching writer-side regressions. The CI shell probes stay
until the named-token bug is root-caused.
KEEP_DATADIR=<path> env var lets the test preserve the produced datadir
for manual `geth --dev --datadir=<path>` boot + cast inspection.
Match the working TestE2ESuite (client/geth/e2e_test.go:78-79) which
passes BuildSynthetic("osaka", 1337, 60M, 1_700_000_000, 0xdeadbeef).
The default --timestamp=0 + empty --extra-data may produce a genesis
state-root/hash that geth --dev silently treats differently than the
osaka-fork-with-recent-timestamp genesis it embeds.
If the named-token entity now resolves at block 0, the bug was a
genesis-hash mismatch causing geth to discard part of the alloc. If
not, root cause is elsewhere.
Root cause of named-token (and every name/position-derived) entity returning empty `eth_getCode` in the comprehensive-spec-geth CI job: main.go:84-86 silently rewrites `*seed = time.Now().UnixNano()` when `--seed=0` is passed. Documented contract, but CI was passing `--seed=0` to opt-into determinism — the opposite of what the trap does. State-actor wrote derived-address entities at random addresses; verify-spec.sh probed the SEED=0-derived addresses; mismatch. Pinned via TestSpecIntrospect, which exercises the same in-process Populate path on /tmp datadir and walks every PreAllocEntity's snapshot + code-DB rows. In-process gen passes seed=0 directly to specbuild.Build (bypassing main.go) and is byte-correct. Binary gen through main.go --seed=0 randomized. Changes: - .github/workflows/ci.yml — --seed=0 → --seed=42 (matches bench); SEED env 0 → 42 in pre + post verify steps; drop the diagnostic probe step (no longer needed). - .github/scripts/verify-spec.sh — SEED default 0 → 42; explicit fail-fast when SEED=0 (so a future engineer hitting the same trap gets a loud error instead of empty cast outputs). - client/geth/spec_introspect_test.go — strip investigation env-var crutches (REUSE_DATADIR / KEEP_DATADIR); polish docstring to call out the seed=0 footgun as the why-this-test-exists context.
In post-spamoor mode the sender's balance is drained on gas + ERC-20 deposits, so the exact-equality check from MODE=pre is structurally wrong. The nonce > 0 assertion (check_spamoor_sender_nonce) is the right post-spamoor invariant. Surfaced when CI's comprehensive-spec-geth post-step failed with spamoor-sender balance 999999983986772447577000069 vs the spec's 999999999000000000000000000 — i.e., ~15 ETH spent over 100 blocks of spamoor.
Same shape as the balance skip: spamoor advances the sender's nonce on every signed tx, so the spec-pinned genesis nonce (0) no longer holds in post mode. The check_spamoor_sender_nonce() function already asserts nonce > 0 as the proper post-spamoor invariant. Caught by the previous CI run after the balance-skip fix landed.
Investigation artifacts (added while chasing the seed=0 footgun before e48e6de pinned the root cause; not needed for the PR scope): - client/geth/spec_introspect_test.go (per-entity Pebble introspection test; the in-process writer is already covered by the per-client TestE2ESuite + verify-spec.sh end-to-end). - .github/workflows/ci.yml: drop --timestamp + --extra-data overrides (added on the false-positive genesis-hash-mismatch theory). - .github/scripts/verify-spec.sh: drop "(addr=$addr)" debug-print noise from c120219. Bench-host one-off scripts removed (not referenced anywhere): - besu-v8-reboot.sh - nethermind-v8-postgen.sh - nethermind-v8-solo.sh - reth-v7b-rerun.sh - reth-v8-solo.sh Kept: - run-bloatnet.sh (referenced by scripts/verify-bloatnet.sh) - verify-cross-client-root.sh (ad-hoc operator tool for live-RPC cross-client invariant verification; separate category from the v*-solo one-offs).
The bench host already owns its own runtime tooling out-of-tree; with the CI parity gate now covering every spec parameter combination via .github/scripts/verify-spec.sh + examples/spec-ci-comprehensive.yaml, nothing in this repo benefits from having the bench-orchestration scripts committed here. Removed: - .github/scripts/gen-bloatnet-spec/ (Go YAML generator for 100GB bench) - .github/scripts/run-bloatnet.sh (bench master orchestrator) - .github/scripts/engine-driver/ (CL mock for engine-API runs) - .github/scripts/verify-cross-client-root.sh (ad-hoc live-RPC operator tool) - scripts/verify-bloatnet.sh (bench RPC verifier) Kept: - .github/scripts/verify-spec.sh (wired into the CI workflow's comprehensive-spec-geth job; the only script the repo actually needs). Also folds in two unstaged edits that fell through 24415f3's commit: - .github/scripts/verify-spec.sh: revert "(addr=$addr)" debug-print noise - .github/workflows/ci.yml: revert --timestamp + --extra-data overrides - client/geth/genesis_block.go: trim a stale bench-script reference in WritePathDBMetadata's docstring.
Removes the standalone comprehensive-spec-geth job + verify-spec.sh shell verifier, ports their 4 missing checks into the Go oracle, and switches every per-client TestE2ESuite to the renamed full-matrix fixture. Result: every client (geth/besu/neth/reth) now exercises the same broader fixture under the same richer oracle, with the cross-client genesis-root invariant gating the whole thing. Coverage delta (Go oracle gains, all four clients): - CheckChainID — eth_chainId asserts against cfg.Genesis.Config.ChainID. - CheckCanonicalSyscontracts — eth_getCode for the 5 canonical mainnet syscontract addresses at block 0. - CheckChainAdvanced — post-spamoor block-number > 0. - CheckBeaconRootsRingBuffer — EIP-4788 slot at (latest.ts %% 8191) is non-zero, proving the pre-execution actually ran. - CheckInjections now also asserts nonce when cfg.GenesisAccounts[addr] has non-zero nonce (catches dropped explicit-nonce overrides). The Go path was already richer in ERC-20 owner/allowance sampling, storage-slot probes, and spamoor-execution assertion; with these 4 additions, it's now a strict superset of verify-spec.sh. File moves: - examples/spec-ci-comprehensive.yaml → full-matrix-spec-feature.yaml. - internal/specbuild/comprehensive_test.go → full_matrix_test.go (renamed test to TestBuildFullMatrix to match). - examples/spec-ci-baseline.yaml deleted (full-matrix is a superset). Per-client e2e suites (client/geth/, client/besu/, client/nethermind/, client/reth/oracle_test.go) now load the full-matrix YAML instead of the baseline. Workflow surgery: - comprehensive-spec-geth job deleted from .github/workflows/ci.yml. - .github/scripts/verify-spec.sh deleted (last script in that dir — the whole .github/scripts/ tree is empty now). - cross-client-genesis-root keystone gate untouched. If any of besu/nethermind/reth turns red against the broader fixture, the failure surface is now exactly the missing-coverage signal we want — that's the point of this change.
Address findings from a three-agent PR review (code-reviewer,
pr-test-analyzer, silent-failure-hunter):
T1.1 — CheckBeaconRootsRingBuffer: reject literal "0x0" timestamp
The previous guard only caught "" (empty after 0x trim), so a
client returning a literal-zero timestamp ("0x0") parsed to ts=0,
slot=0, then emitted the misleading "EIP-4788 pre-exec didn't fire"
error. Now the guard rejects both "" and "0" with a message that
disambiguates "missing field" from "literal zero timestamp."
T2.1 — CheckChainID: fail loudly when cfg.Genesis.Config.ChainID is nil
The 4-deep nil-guard in runphases.go silently skipped chain-id
verification if any link was nil. The class of bug the check exists
to catch (besu chainspec drift) would be invisible whenever a future
caller forgot to populate Genesis.Config. Pulled the nil-chain into
a chainIDFromCfg helper that returns 0, and the caller now t.Fatalfs
on 0 with a message pointing at the per-client test setup.
T2.2 — Phase 6b results land in result.json
CheckChainAdvanced + CheckBeaconRootsRingBuffer now populate two new
bool fields on SuiteResult so the cross-client aggregator (or a
log-pruned post-mortem) has a machine-readable signal even when
test-log output is gone.
T2.3 — Unit tests for check_chain.go (14 tests, ~280 LOC)
Mirrors internal/rpcprobe/probe_test.go's fakeRPCServer pattern but
local to e2e_testing so we don't expose a test-only API from rpcprobe.
Highest-value coverage:
- TestCheckBeaconRootsRingBuffer_LiteralZeroTimestamp pins T1.1
- TestCheckBeaconRootsRingBuffer_ModulusRollover pins the 8191 modulus
- TestCheckCanonicalSyscontracts_AccumulatesAllFailures + _LastMissing
pin that the loop reaches every address (guards against a future
short-circuit refactor)
T2.4 — Zero-balance entities now verified
Dropped the !acct.Balance.IsZero() skip in CheckInjections. The
"balloon RPC calls" concern doesn't apply because the e2e suites
use --accounts=0 --contracts=0 (no synthetic-fill). Now the spec's
zero-balance-eoa entity and the canonical syscontracts get an
eth_getBalance call asserting their on-chain balance == 0.
T3.1 — CheckBeaconRootsRingBuffer docstring drift
The previous docstring claimed the function reads the
parent-beacon-block-root slot, but it actually reads the timestamp
slot (correct choice in --dev mode where the parent root is often a
literal zero hash, but the doc had drifted). Rewrote to call out the
two-slot contract per EIP-4788 and the --dev caveat.
T3.2 — slotHash encoding idiom
Swapped binary.BigEndian.PutUint64(slotHash[24:], slot) for the
conventional common.BigToHash(new(big.Int).SetUint64(slot)) — drops
the encoding/binary import.
T3.3 — Short-circuit BEACON_ROOTS when chain didn't advance
If CheckChainAdvanced fails, calling CheckBeaconRootsRingBuffer
produces a second redundant error from BlockByNumber's zero-stateRoot
guard. Now the second check is skipped (result.PostSpamoorBeaconRootsOK
stays false, which is correct).
Verification: all 14 new check_chain unit tests pass; full
internal/... + generator/... + genesis/... suites still green.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
After the latest benchmarks done while working in #75 we saw that the CI wasn't actually making sure that everything worked.
This fixes it. Now CI checks pre-spamoor state (injected state). It checks that the DB is functional and works (spamoor) and checks the post-spamoor state.