Skip to content

perf(l1): batch account-state prefetch via rocksdb multi_get_cf#6712

Open
edg-l wants to merge 4 commits into
mainfrom
perf/bal-prefetch-accounts-multi-get
Open

perf(l1): batch account-state prefetch via rocksdb multi_get_cf#6712
edg-l wants to merge 4 commits into
mainfrom
perf/bal-prefetch-accounts-multi-get

Conversation

@edg-l
Copy link
Copy Markdown
Contributor

@edg-l edg-l commented May 22, 2026

Summary

The BAL prewarmer's CachingDatabase::prefetch_accounts issues one rocksdb point-get per address, parallelized via rayon. For a BAL with thousands of addresses, that is thousands of independent column-family lookups, each going through bloom + block-cache + (cache miss) SST read.

This change collapses the prefetch into a single batched_multi_get_cf call on ACCOUNT_FLATKEYVALUE for addresses past the FKV generator's cursor, falling back to per-address trie walks for paths not yet swept by the generator. The diff-layer cache is checked first per address, matching the existing Trie::get semantics.

Plumbing:

  • StorageReadView::multi_get (default loops get; RocksDB override uses batched_multi_get_cf)
  • Store::get_account_states_batch_by_root ; FKV multi_get + trie fallback
  • VmDatabase::get_account_states_batch ; trait method, overridden on StoreVmDatabase
  • LevmDatabase::get_account_states_batch ; trait method, overridden on DynVmDatabase
  • CachingDatabase::prefetch_accounts ; dispatches to the batch path; default impl loops for non-rocksdb backends

Benchmark

Fixture: bal-devnet-7-mainnet-mix-460 (460 blocks, ~30 Ggas, transfer/EVM mix). release-with-debug, import-bench --with-bal. Baseline = b0b9a11c5d (no perf change).

metric baseline this PR delta
wall time 4.632 s 3.636 s -996 ms (-21.5%)
agg Ggas/s 8.016 11.066 +38.0%
avg ms / block 8.217 5.952 -2.27 ms (-27.6%)
exec 7.286 ms 5.073 ms -2.21 ms (-30.4%)
warmer 6.777 ms 2.369 ms -4.41 ms (-65.0%)
store 0.656 ms 0.660 ms ~flat

Compare dashboard: https://edgl.dev/share/compare-bal-devnet-7-baseline-vs-fkv-multi_get.html

The warmer phase collapsing to a third of baseline is the smoking gun ; warm_block_from_bal → prefetch_accounts was the dominant cost there, and is now a single multi_get_cf for the FKV-covered subset.

Notes

  • Only applies to the BAL-driven parallel execution path (warm_block_from_bal); non-BAL imports go through warm_block and don't touch this method.
  • Same optimization on storage prefetch was tried and reverted: each storage lookup is already a single rocksdb get (no trie walk to collapse), so par_iter point-gets beat multi_get's batching amortization on the warmer's wall-clock budget. See PR discussion for the perf data.

Test plan

  • cargo check --workspace
  • make -C tooling/ef_tests/blockchain test ; 8819 passed, 0 failed
  • Hive bal group
  • CI green

Replace N parallel point-gets in CachingDatabase::prefetch_accounts with
a single multi_get_cf on ACCOUNT_FLATKEYVALUE for FKV-covered addresses,
falling back to per-address trie walks for paths past the FKV cursor.
Reduces warmer-phase cost on the bal-devnet-7 fixture from 6.78 ms to
2.37 ms per block (-65%); total wall time -21.5%.
@edg-l edg-l requested a review from a team as a code owner May 22, 2026 10:32
@github-actions github-actions Bot added L1 Ethereum client performance Block execution throughput and performance in general labels May 22, 2026
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 22, 2026

⚠️ Known Issues — intentionally skipped tests

Source: docs/known_issues.md

Known Issues

Tests intentionally excluded from CI. Source of truth for the Known
Issues
section the L1 workflow appends to each ef-tests job summary
and posts as a sticky PR comment.

EF Tests — Stateless coverage narrowed to EIP-8025 optional-proofs

make -C tooling/ef_tests/blockchain test calls test-stateless-zkevm
instead of test-stateless. The zkevm@v0.3.3 fixtures are filled against
bal@v5.6.1, out of sync with current bal spec; the broad target trips ~549
fixtures. Re-broaden once the zkevm bundle is regenerated.

Why and resolution path

PR #6527 broadened
test-stateless to extract the entire for_amsterdam/ tree from the
zkevm bundle and run all of it under --features stateless; combined with
this branch's bal-devnet-7 semantics that scope produces ~549
GasUsedMismatch / ReceiptsRootMismatch /
BlockAccessListHashMismatch failures.

test-stateless-zkevm filters cargo to the eip8025_optional_proofs
suite, which still validates the stateless harness without the bal-version
mismatch.

Re-broaden by switching test: back to test-stateless in
tooling/ef_tests/blockchain/Makefile once the zkevm bundle is regenerated
against the current bal spec.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 22, 2026

Lines of code report

Total lines added: 180
Total lines removed: 0
Total lines changed: 180

Detailed view
+------------------------------------------+-------+------+
| File                                     | Lines | Diff |
+------------------------------------------+-------+------+
| ethrex/crates/blockchain/vm.rs           | 269   | +54  |
+------------------------------------------+-------+------+
| ethrex/crates/storage/api/mod.rs         | 44    | +7   |
+------------------------------------------+-------+------+
| ethrex/crates/storage/backend/rocksdb.rs | 343   | +20  |
+------------------------------------------+-------+------+
| ethrex/crates/storage/store.rs           | 2761  | +62  |
+------------------------------------------+-------+------+
| ethrex/crates/vm/backends/levm/db.rs     | 122   | +11  |
+------------------------------------------+-------+------+
| ethrex/crates/vm/db.rs                   | 25    | +9   |
+------------------------------------------+-------+------+
| ethrex/crates/vm/levm/src/db/mod.rs      | 165   | +17  |
+------------------------------------------+-------+------+

@github-actions
Copy link
Copy Markdown

🤖 Claude Code Review

Now I have enough context for a thorough review. Here's the full analysis:


Code Review: perf(l1): batch account-state prefetch via rocksdb multi_get_cf

Overview

This PR collapses the per-address RocksDB point-gets in CachingDatabase::prefetch_accounts into a single batched_multi_get_cf call for addresses whose hashed paths fall within the FKV generator's cursor. The architecture is clean — new default trait methods loop single-get, and the RocksDB-backed path overrides with the batch read. The benchmark results (−65% warmer phase, +38% Ggas/s) validate the approach. The implementation is largely correct, but there are a few issues to address.


Correctness Issues

FKV cursor comparison semantics differ from the existing helper

store.rs:2513-2514:

let fkv_cursor = Nibbles::from_hex(last_written.clone());
if fkv_cursor >= path_nibbles {    // >=

store.rs:3032-3035 (flatkeyvalue_computed_with_last_written):

fn flatkeyvalue_computed_with_last_written(account: H256, last_written: &[u8]) -> bool {
    let account_nibbles = Nibbles::from_bytes(account.as_bytes());
    &last_written[0..64] > account_nibbles.as_ref()  // strict >
}

The new batch path uses >= (consistent with BackendTrieDB::flatkeyvalue_computed at trie.rs:128), while the existing single-account helper uses strict >. For the edge case where the cursor equals the address path exactly, the batch path routes to FKV while the single-account path falls back to the trie. Both are safe (the trie always has the correct answer), but the inconsistency means the two code paths can return identical results via different routes. It is worth either:

  • Unifying by fixing the helper to use >= (it appears the trie's own semantics already use >=), or
  • Adding a comment explaining why the helper is deliberately conservative.

Two separate read transactions when both FKV and trie paths are active

store.rs:2521-2543: the FKV path opens self.backend.begin_read(), and then the trie fallback calls self.open_state_trie(state_root) which internally opens another read view. If both branches are active, two independent snapshot handles are acquired. For RocksDB this means two separate snapshot acquisitions — subtly, they could see different states if a write commits between them. The existing pattern in get_account_state_by_root (line 2255) acquires a single shared read_view for both trie and FKV to avoid this. Consider plumbing the same shared read_view through both branches.


Performance Issues

fkv_cursor re-allocated on every loop iteration

store.rs:2513:

for (i, path) in leaf_paths.iter().enumerate() {
    // ...
    let fkv_cursor = Nibbles::from_hex(last_written.clone());  // cloned every iteration

last_written never changes within the loop. Nibbles::from_hex + the Vec<u8> clone runs once per address. Move it before the loop:

let fkv_cursor = Nibbles::from_hex(last_written.clone());
for (i, path) in leaf_paths.iter().enumerate() {
    // ...
    let path_nibbles = Nibbles::from_hex(path.clone());
    if fkv_cursor >= path_nibbles {

leaf_paths[i].clone() inside routing loop

store.rs:2510: Nibbles::from_hex(path.clone()) allocates a fresh Vec<u8> per address just to wrap it in a Nibbles. from_hex likely wraps the underlying data — if it accepts a borrow, pass path.as_slice() instead to avoid the clone. If ownership is required, the allocation is unavoidable, but it's worth verifying the API.


Code Quality Issues

Dead diff_hits counter

store.rs:2499 and 2546:

let mut diff_hits: usize = 0;
// ...
diff_hits += 1;
// ...
let _ = diff_hits; // surface via tracing if needed.

Computing a counter only to suppress it with let _ = ... is dead code. Either emit it via tracing::trace! now, or remove it entirely. The let _ = pattern is typically reserved for ignoring return values at call sites, not for dead state.

Double-wrapping error in rocksdb.rs

backend/rocksdb.rs:311-314:

let err = StoreError::Custom(format!("Table {} not found", table));
return (0..keys.len())
    .map(|_| Err(StoreError::Custom(err.to_string())))
    .collect();

err.to_string() calls Display on a StoreError::Custom, then wraps the resulting string in another StoreError::Custom. Replace with:

let msg = format!("Table {} not found", table);
return (0..keys.len())
    .map(|_| Err(StoreError::Custom(msg.clone())))
    .collect();

Relies silently on Copy for AccountStateCacheEntry

blockchain/vm.rs:180-181:

cache.insert(*addr, cached);      // takes ownership (or copies)
results[*slot] = cached.map(|e| e.state);  // reuses cached

This compiles because AccountStateCacheEntry derives Copy (confirmed at vm.rs:17), but the intent is opaque — a reader unfamiliar with the Copy impl will flag this as a use-after-move. If someone later adds a heap-allocated field (removing Copy), this silently breaks. A clearer pattern:

results[*slot] = state;
let entry = state.map(|s| AccountStateCacheEntry {
    state: s,
    hashed_address: H256::from(keccak_hash(addr.to_fixed_bytes())),
});
cache.insert(*addr, entry);

This also avoids recomputing keccak_hash for the sole purpose of populating a field that isn't needed for the result.


Minor Notes

  • storage/api/mod.rs: The new multi_get default impl is clean and the doc comment clearly states ordering guarantees. Good.
  • vm/db.rs and levm/src/db/mod.rs: The layered trait defaults (loop single-get → override with batch) are the right pattern for this kind of extensibility.
  • rocksdb.rs:316 comment: sorted_input=false — accurate. The sorted_input parameter tells RocksDB whether the caller has pre-sorted keys (an optimization hint), not whether the output is sorted. The comment could be slightly more precise on this distinction, but it's fine as-is.
  • levm/db.rs: unwrap_or_default() for absent accounts in DynVmDatabase::get_account_states_batch is consistent with the existing get_account_state behavior.
  • The trie fallback opens the trie once and iterates trie_indices over it (store.rs:2540-2543). This is correct and avoids re-opening per address.
  • The #[cfg(all(feature = "rayon", not(feature = "eip-8025")))] guard is preserved on prefetch_accounts. The new code no longer uses par_iter — verify there isn't a version of prefetch_accounts under other feature flag combinations that still needs updating.

Automated review by Claude (Anthropic) · sonnet · custom prompt

@github-actions
Copy link
Copy Markdown

🤖 Codex Code Review

No correctness or security bugs stood out in the diff. I do see two performance concerns:

  1. prefetch_accounts now always routes through the new batch API at crates/vm/levm/src/db/mod.rs, but the default get_account_states_batch implementation is still a plain sequential loop at crates/vm/levm/src/db/mod.rs and crates/vm/db.rs. Also, the store batch path still falls back to a sequential trie walk for non-FKV-covered accounts at crates/storage/store.rs. That means this change removes the previous rayon fan-out for any backend without a specialized batch impl, and for partially covered FKV batches it can still serialize the slow path. On BAL warming/import hot paths, that can increase latency noticeably. I’d keep the old parallel fallback when the backend does not override batching, or parallelize the trie fallback branch.

  2. The new classifier in crates/storage/store.rs rebuilds Nibbles and clones last_written for every address. This is a hot path added specifically for throughput, so the repeated heap work is avoidable. Precompute fkv_cursor once outside the loop and compare against prebuilt paths directly.

Open question: this code manually reproduces the trie/FKV selection logic instead of going through Trie::get, so I’d want a regression test proving batch lookup matches repeated single-account lookup for diff-layer hits, FKV-covered hits/misses, and trie fallback around the cursor boundary.

I didn’t run tests in this environment.


Automated review by OpenAI Codex · gpt-5.4 · custom prompt

@greptile-apps
Copy link
Copy Markdown

greptile-apps Bot commented May 22, 2026

Greptile Summary

This PR replaces the BAL prewarmer's per-address par_iter point-gets with a single batched_multi_get_cf call on ACCOUNT_FLATKEYVALUE for addresses whose hashed path falls within the FKV generator's cursor, falling back to per-address trie walks for the rest. The plumbing adds StorageReadView::multi_get (with a loop-based default and a RocksDB override), Store::get_account_states_batch_by_root, and VmDatabase/LevmDatabase::get_account_states_batch trait methods wired through the LEVM adapter stack.

  • The FKV boundary comparison (fkv_cursor >= path_nibbles, full 65-byte nibble path) correctly mirrors BackendTrieDB::flatkeyvalue_computed's >= semantics and handles the empty-cursor (FKV not started) and [0xff]-cursor (FKV complete) edge cases without panicking.
  • Cache coherence in StoreVmDatabase::get_account_states_batch is correctly maintained: the read-lock is dropped before the store fetch, and the write-back uses insert to allow overwrite — matching the original single-address path.
  • CachingDatabase::prefetch_accounts now pre-filters already-cached addresses and calls the batch path under a single read-lock then a single write-lock, removing the rayon parallel scatter.

Confidence Score: 4/5

The batch account-lookup path is logically correct; the two findings are minor cleanup items that do not affect correctness or data integrity.

The FKV cursor comparison matches the trie's own semantics exactly, the cache split/reassembly in StoreVmDatabase is correct, and the RocksDB batched_multi_get_cf override propagates errors faithfully. Only two non-blocking style issues exist: a per-iteration clone of last_written that can be hoisted outside the loop, and a diff_hits counter that is computed but never surfaced.

crates/storage/store.rs — the new get_account_states_batch_by_root function contains both findings; all other files are straightforward adapters or trait additions.

Important Files Changed

Filename Overview
crates/storage/store.rs Adds get_account_states_batch_by_root: FKV path uses multi_get for addresses below the FKV cursor, falls back to trie for the rest. Cursor comparison correctly mirrors BackendTrieDB::flatkeyvalue_computed semantics. Minor: fkv_cursor cloned per iteration; diff_hits is dead code.
crates/storage/backend/rocksdb.rs Overrides multi_get on RocksDBReadTx using batched_multi_get_cf. Handles missing CF gracefully, propagates per-entry errors correctly, and maps PinnableSlice to Vec<u8> correctly.
crates/storage/api/mod.rs Adds multi_get with a loop-based default impl; RocksDB overrides it with the batched path. Trait contract (same-order results) is clearly documented.
crates/blockchain/vm.rs Adds get_account_states_batch to StoreVmDatabase. Cache-split logic, index-to-result reassembly, and cache population are all correct. AccountStateCacheEntry is Copy so the double-use after cache.insert compiles and is safe.
crates/vm/db.rs Adds get_account_states_batch default to VmDatabase trait; default impl loops get_account_state, and the RocksDB-backed StoreVmDatabase overrides it with the batch path.
crates/vm/backends/levm/db.rs Implements get_account_states_batch for DynVmDatabase, bridging VmDatabase's Option<AccountState> to LEVM's non-optional AccountState via unwrap_or_default, consistent with the existing get_account_state adapter.
crates/vm/levm/src/db/mod.rs Updates CachingDatabase::prefetch_accounts (rayon path) to pre-filter cached addresses, then dispatches to inner.get_account_states_batch rather than parallel point-gets. Cache write-back with or_insert correctly handles concurrent population by other threads.

Sequence Diagram

sequenceDiagram
    participant W as warm_block_from_bal
    participant CD as CachingDatabase
    participant DVM as DynVmDatabase
    participant SVM as StoreVmDatabase
    participant S as Store
    participant RDB as RocksDB

    W->>CD: prefetch_accounts([addr1..addrN])
    CD->>CD: read_accounts() — filter cached addresses
    CD->>DVM: get_account_states_batch(missing[])
    DVM->>SVM: get_account_states_batch(addresses[])
    SVM->>SVM: check account_state_cache (per address)
    SVM->>S: get_account_states_batch_by_root(state_root, misses[])
    S->>S: last_written() — FKV cursor
    S->>S: trie_cache.get() — diff-layer hits
    alt FKV-covered addresses
        S->>RDB: multi_get(ACCOUNT_FLATKEYVALUE, keys[])
        RDB-->>S: "Vec<Result<Option<Vec<u8>>>>"
    end
    alt Trie-only addresses
        S->>S: open_state_trie(state_root)
        loop per address
            S->>RDB: trie walk
            RDB-->>S: encoded AccountState
        end
    end
    S-->>SVM: "Vec<Option<AccountState>>"
    SVM->>SVM: populate account_state_cache
    SVM-->>DVM: "Vec<Option<AccountState>>"
    DVM-->>CD: "Vec<AccountState> (None→default)"
    CD->>CD: write_accounts() — or_insert each
    CD-->>W: Ok(())
Loading
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
crates/storage/store.rs:2501-2514
`fkv_cursor` is reconstructed on every non-cache-hit loop iteration by cloning `last_written`, which is invariant across all iterations. For a BAL with thousands of addresses this causes thousands of unnecessary `Vec<u8>` clones and `Nibbles::from_hex` wraps. Hoist both allocations outside the loop.

```suggestion
        // `last_computed_flatkeyvalue >= path` ⇒ FKV row is authoritative
        // (either present with value, or absent meaning the account does not exist).
        let fkv_cursor = Nibbles::from_hex(last_written.clone());
        for (i, path) in leaf_paths.iter().enumerate() {
            if let Some(value) = trie_cache.get(state_root, path.as_slice()) {
                if !value.is_empty() {
                    results[i] = Some(AccountState::decode(&value)?);
                }
                diff_hits += 1;
                continue;
            }
            // Reuse the trie's FKV-cursor check semantics via the leaf path.
            let path_nibbles = Nibbles::from_hex(path.clone());
            if fkv_cursor >= path_nibbles {
```

### Issue 2 of 2
crates/storage/store.rs:2499-2546
`diff_hits` counter is dead code — it is incremented on every trie-cache hit, but immediately discarded via `let _ = diff_hits`. If this is meant to be surfaced as a tracing span field (as the comment suggests), it should be added to the `#[instrument]` span or emitted via `tracing::debug!`. As written it adds per-iteration bookkeeping with no observable effect.

Reviews (1): Last reviewed commit: "perf(l1): batch account-state prefetch v..." | Re-trigger Greptile

Comment thread crates/storage/store.rs
Comment on lines +2501 to +2514
for (i, path) in leaf_paths.iter().enumerate() {
if let Some(value) = trie_cache.get(state_root, path.as_slice()) {
if !value.is_empty() {
results[i] = Some(AccountState::decode(&value)?);
}
diff_hits += 1;
continue;
}
// Reuse the trie's FKV-cursor check semantics via the leaf path.
let path_nibbles = Nibbles::from_hex(path.clone());
// `last_computed_flatkeyvalue >= path` ⇒ FKV row is authoritative
// (either present with value, or absent meaning the account does not exist).
let fkv_cursor = Nibbles::from_hex(last_written.clone());
if fkv_cursor >= path_nibbles {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 fkv_cursor is reconstructed on every non-cache-hit loop iteration by cloning last_written, which is invariant across all iterations. For a BAL with thousands of addresses this causes thousands of unnecessary Vec<u8> clones and Nibbles::from_hex wraps. Hoist both allocations outside the loop.

Suggested change
for (i, path) in leaf_paths.iter().enumerate() {
if let Some(value) = trie_cache.get(state_root, path.as_slice()) {
if !value.is_empty() {
results[i] = Some(AccountState::decode(&value)?);
}
diff_hits += 1;
continue;
}
// Reuse the trie's FKV-cursor check semantics via the leaf path.
let path_nibbles = Nibbles::from_hex(path.clone());
// `last_computed_flatkeyvalue >= path` ⇒ FKV row is authoritative
// (either present with value, or absent meaning the account does not exist).
let fkv_cursor = Nibbles::from_hex(last_written.clone());
if fkv_cursor >= path_nibbles {
// `last_computed_flatkeyvalue >= path` ⇒ FKV row is authoritative
// (either present with value, or absent meaning the account does not exist).
let fkv_cursor = Nibbles::from_hex(last_written.clone());
for (i, path) in leaf_paths.iter().enumerate() {
if let Some(value) = trie_cache.get(state_root, path.as_slice()) {
if !value.is_empty() {
results[i] = Some(AccountState::decode(&value)?);
}
diff_hits += 1;
continue;
}
// Reuse the trie's FKV-cursor check semantics via the leaf path.
let path_nibbles = Nibbles::from_hex(path.clone());
if fkv_cursor >= path_nibbles {
Prompt To Fix With AI
This is a comment left during a code review.
Path: crates/storage/store.rs
Line: 2501-2514

Comment:
`fkv_cursor` is reconstructed on every non-cache-hit loop iteration by cloning `last_written`, which is invariant across all iterations. For a BAL with thousands of addresses this causes thousands of unnecessary `Vec<u8>` clones and `Nibbles::from_hex` wraps. Hoist both allocations outside the loop.

```suggestion
        // `last_computed_flatkeyvalue >= path` ⇒ FKV row is authoritative
        // (either present with value, or absent meaning the account does not exist).
        let fkv_cursor = Nibbles::from_hex(last_written.clone());
        for (i, path) in leaf_paths.iter().enumerate() {
            if let Some(value) = trie_cache.get(state_root, path.as_slice()) {
                if !value.is_empty() {
                    results[i] = Some(AccountState::decode(&value)?);
                }
                diff_hits += 1;
                continue;
            }
            // Reuse the trie's FKV-cursor check semantics via the leaf path.
            let path_nibbles = Nibbles::from_hex(path.clone());
            if fkv_cursor >= path_nibbles {
```

How can I resolve this? If you propose a fix, please make it concise.

Comment thread crates/storage/store.rs Outdated
Comment on lines +2499 to +2546
let mut diff_hits: usize = 0;

for (i, path) in leaf_paths.iter().enumerate() {
if let Some(value) = trie_cache.get(state_root, path.as_slice()) {
if !value.is_empty() {
results[i] = Some(AccountState::decode(&value)?);
}
diff_hits += 1;
continue;
}
// Reuse the trie's FKV-cursor check semantics via the leaf path.
let path_nibbles = Nibbles::from_hex(path.clone());
// `last_computed_flatkeyvalue >= path` ⇒ FKV row is authoritative
// (either present with value, or absent meaning the account does not exist).
let fkv_cursor = Nibbles::from_hex(last_written.clone());
if fkv_cursor >= path_nibbles {
fkv_indices.push(i);
} else {
trie_indices.push(i);
}
}

if !fkv_indices.is_empty() {
let read_view = self.backend.begin_read()?;
let keys: Vec<&[u8]> = fkv_indices
.iter()
.map(|&i| leaf_paths[i].as_slice())
.collect();
let raw = read_view.multi_get(ACCOUNT_FLATKEYVALUE, &keys);
for (slot, res) in fkv_indices.iter().zip(raw.into_iter()) {
let Some(encoded) = res? else { continue };
if encoded.is_empty() {
continue;
}
results[*slot] = Some(AccountState::decode(&encoded)?);
}
}

if !trie_indices.is_empty() {
// Fall back to the regular trie path for any addresses whose path
// hasn't been swept by the FKV generator yet.
let state_trie = self.open_state_trie(state_root)?;
for &i in &trie_indices {
results[i] = self.get_account_state_from_trie(&state_trie, addresses[i])?;
}
}

let _ = diff_hits; // surface via tracing if needed.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 diff_hits counter is dead code — it is incremented on every trie-cache hit, but immediately discarded via let _ = diff_hits. If this is meant to be surfaced as a tracing span field (as the comment suggests), it should be added to the #[instrument] span or emitted via tracing::debug!. As written it adds per-iteration bookkeeping with no observable effect.

Prompt To Fix With AI
This is a comment left during a code review.
Path: crates/storage/store.rs
Line: 2499-2546

Comment:
`diff_hits` counter is dead code — it is incremented on every trie-cache hit, but immediately discarded via `let _ = diff_hits`. If this is meant to be surfaced as a tracing span field (as the comment suggests), it should be added to the `#[instrument]` span or emitted via `tracing::debug!`. As written it adds per-iteration bookkeeping with no observable effect.

How can I resolve this? If you propose a fix, please make it concise.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 22, 2026

Benchmark Results Comparison

No significant difference was registered for any benchmark run.

Detailed Results

Benchmark Results: BubbleSort

Command Mean [s] Min [s] Max [s] Relative
main_revm_BubbleSort 3.288 ± 0.055 3.226 3.382 1.12 ± 0.02
main_levm_BubbleSort 2.947 ± 0.038 2.920 3.034 1.00
pr_revm_BubbleSort 3.294 ± 0.047 3.236 3.403 1.12 ± 0.02
pr_levm_BubbleSort 2.974 ± 0.103 2.921 3.264 1.01 ± 0.04

Benchmark Results: ERC20Approval

Command Mean [s] Min [s] Max [s] Relative
main_revm_ERC20Approval 1.060 ± 0.016 1.040 1.090 1.01 ± 0.02
main_levm_ERC20Approval 1.116 ± 0.032 1.097 1.205 1.06 ± 0.03
pr_revm_ERC20Approval 1.053 ± 0.012 1.043 1.085 1.00
pr_levm_ERC20Approval 1.101 ± 0.007 1.089 1.108 1.04 ± 0.01

Benchmark Results: ERC20Mint

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_ERC20Mint 144.2 ± 4.5 141.0 155.6 1.00 ± 0.03
main_levm_ERC20Mint 163.2 ± 2.2 161.1 167.2 1.14 ± 0.02
pr_revm_ERC20Mint 143.5 ± 1.5 141.6 147.0 1.00
pr_levm_ERC20Mint 164.0 ± 2.0 161.7 167.8 1.14 ± 0.02

Benchmark Results: ERC20Transfer

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_ERC20Transfer 252.9 ± 4.8 247.4 261.4 1.00
main_levm_ERC20Transfer 279.2 ± 10.8 271.8 305.6 1.10 ± 0.05
pr_revm_ERC20Transfer 253.1 ± 2.4 250.8 258.1 1.00 ± 0.02
pr_levm_ERC20Transfer 274.7 ± 3.2 271.2 282.1 1.09 ± 0.02

Benchmark Results: Factorial

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_Factorial 224.1 ± 2.3 222.1 229.9 1.00
main_levm_Factorial 266.6 ± 1.0 264.6 268.1 1.19 ± 0.01
pr_revm_Factorial 227.1 ± 6.0 223.0 239.6 1.01 ± 0.03
pr_levm_Factorial 270.2 ± 4.1 267.3 280.7 1.21 ± 0.02

Benchmark Results: FactorialRecursive

Command Mean [s] Min [s] Max [s] Relative
main_revm_FactorialRecursive 1.672 ± 0.040 1.585 1.719 1.07 ± 0.03
main_levm_FactorialRecursive 1.568 ± 0.012 1.552 1.591 1.00
pr_revm_FactorialRecursive 1.696 ± 0.032 1.636 1.742 1.08 ± 0.02
pr_levm_FactorialRecursive 1.579 ± 0.011 1.560 1.599 1.01 ± 0.01

Benchmark Results: Fibonacci

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_Fibonacci 211.7 ± 0.6 211.0 212.7 1.00
main_levm_Fibonacci 241.3 ± 4.2 238.2 250.1 1.14 ± 0.02
pr_revm_Fibonacci 214.4 ± 3.7 211.0 224.2 1.01 ± 0.02
pr_levm_Fibonacci 242.4 ± 3.4 238.4 249.0 1.15 ± 0.02

Benchmark Results: FibonacciRecursive

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_FibonacciRecursive 899.3 ± 12.7 878.2 924.2 1.25 ± 0.02
main_levm_FibonacciRecursive 720.6 ± 7.7 708.5 735.3 1.00
pr_revm_FibonacciRecursive 901.9 ± 6.1 893.6 914.7 1.25 ± 0.02
pr_levm_FibonacciRecursive 725.6 ± 6.3 720.9 742.1 1.01 ± 0.01

Benchmark Results: ManyHashes

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_ManyHashes 9.2 ± 0.6 9.0 10.8 1.02 ± 0.06
main_levm_ManyHashes 11.0 ± 0.8 10.5 12.5 1.20 ± 0.09
pr_revm_ManyHashes 9.1 ± 0.1 9.0 9.4 1.00
pr_levm_ManyHashes 10.7 ± 0.2 10.4 11.2 1.17 ± 0.03

Benchmark Results: MstoreBench

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_MstoreBench 273.2 ± 3.8 270.8 283.5 1.17 ± 0.02
main_levm_MstoreBench 233.4 ± 1.2 232.2 235.6 1.00
pr_revm_MstoreBench 279.9 ± 8.4 270.9 291.9 1.20 ± 0.04
pr_levm_MstoreBench 234.7 ± 2.8 231.7 239.8 1.01 ± 0.01

Benchmark Results: Push

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_Push 315.5 ± 2.2 313.7 321.2 1.07 ± 0.01
main_levm_Push 294.6 ± 1.9 293.0 299.4 1.00
pr_revm_Push 315.9 ± 1.2 314.1 317.4 1.07 ± 0.01
pr_levm_Push 295.0 ± 3.1 291.9 303.0 1.00 ± 0.01

Benchmark Results: SstoreBench_no_opt

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_SstoreBench_no_opt 180.8 ± 4.0 178.3 188.4 1.56 ± 0.04
main_levm_SstoreBench_no_opt 115.8 ± 0.9 114.5 116.8 1.00
pr_revm_SstoreBench_no_opt 186.7 ± 8.0 177.8 199.9 1.61 ± 0.07
pr_levm_SstoreBench_no_opt 116.2 ± 4.9 114.0 130.1 1.00 ± 0.04

Address review feedback on PR #6712:

- store.rs: hoist `fkv_cursor` out of the per-address loop and document
  why the comparison matches `BackendTrieDB::flatkeyvalue_computed`'s
  semantics (not the more-conservative
  `flatkeyvalue_computed_with_last_written`). Drop the dead
  `diff_hits` counter.
- vm.rs: comment on why `cache.insert` (vs `or_insert`) is intentional.
- rocksdb.rs: clone a `String` instead of formatting the error per key.
edg-l added a commit that referenced this pull request May 22, 2026
Address review feedback on PR #6712:

- store.rs: hoist `fkv_cursor` out of the per-address loop and document
  why the comparison matches `BackendTrieDB::flatkeyvalue_computed`'s
  semantics (not the more-conservative
  `flatkeyvalue_computed_with_last_written`). Drop the dead
  `diff_hits` counter.
- vm.rs: comment on why `cache.insert` (vs `or_insert`) is intentional.
- rocksdb.rs: clone a `String` instead of formatting the error per key.
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 22, 2026

Benchmark Block Execution Results Comparison Against Main

Command Mean [s] Min [s] Max [s] Relative
base 61.450 ± 0.363 61.172 62.266 1.00
head 62.559 ± 0.680 61.898 63.819 1.02 ± 0.01

@edg-l edg-l moved this to In Review in ethrex_l1 May 22, 2026
Comment thread crates/storage/store.rs
// hasn't been swept by the FKV generator yet.
let state_trie = self.open_state_trie(state_root)?;
for &i in &trie_indices {
results[i] = self.get_account_state_from_trie(&state_trie, addresses[i])?;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sequential trie fallback (was parallel in CachingDatabase::prefetch_accounts). The old par_iter got free parallelism on per-address trie walks for every address; the new path serializes them when trie_indices is non-empty.

For the bal-devnet-7 benchmark this is invisible (almost every address hits FKV), but during initial sync — when the FKV cursor is small and trie_indices.len() >> fkv_indices.len() — this is a regression vs main for the prefetch hot path.

If Trie (from open_state_trie) is Send + Sync (or clonable), the cheap version is:

use rayon::prelude::*;
let results_pairs: Result<Vec<_>, _> = trie_indices
    .par_iter()
    .map(|&i| self.get_account_state_from_trie(&state_trie, addresses[i]).map(|s| (i, s)))
    .collect();
for (i, s) in results_pairs? { results[i] = s; }

If Trie isn't Send, this needs a small refactor (each parallel branch opens its own trie tx). Worth a quick check on the worst-case workload (low FKV cursor) before merge.

Comment thread crates/storage/store.rs
}
continue;
}
let path_nibbles = Nibbles::from_hex(path.clone());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Nibbles::from_hex(path.clone()) allocates a fresh 65-byte Vec<u8> per address inside the loop, just to wrap it in Nibbles for the >= comparison. Since Nibbles::from_hex is a const wrapper (no normalization) and PartialOrd<Nibbles> is Vec<u8> lex compare under the hood, comparing slices avoids the alloc:

if fkv_cursor.as_ref() >= path.as_slice() {
    fkv_indices.push(i);
}

or expose Nibbles::cmp_slice(&self, &[u8]). For a BAL with ~5k addresses this is 5k tiny allocations; not material on the benchmark you posted, but free to fix and tightens the hot loop.

Comment thread crates/storage/api/mod.rs
&self,
table: &'static str,
keys: &[&[u8]],
) -> Vec<Result<Option<Vec<u8>>, StoreError>> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small doc-callout: the default impl is a serial get loop, so for non-RocksDB backends (in-memory, etc.) callers see no speedup from multi_get vs N gets — only the alloc/dispatch shape changes. The current rustdoc says "Backends that support batched reads ... should override this for better throughput" which implies the converse correctly. Worth one extra line for callers:

Callers should not assume multi_get is asymptotically faster than get; it is only an optimization for backends that have a batched read primitive. In particular, the in-memory backend's multi_get has the same cost as N independent gets.

Non-blocking; just a hint to future readers picking between get and multi_get for new code paths.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

L1 Ethereum client performance Block execution throughput and performance in general

Projects

Status: In Review
Status: Todo

Development

Successfully merging this pull request may close these issues.

2 participants