perf(l1): batch account-state prefetch via rocksdb multi_get_cf #6712
edg-l wants to merge 4 commits into
Replace N parallel point-gets in CachingDatabase::prefetch_accounts with a single multi_get_cf on ACCOUNT_FLATKEYVALUE for FKV-covered addresses, falling back to per-address trie walks for paths past the FKV cursor. Reduces warmer-phase cost on the bal-devnet-7 fixture from 6.78 ms to 2.37 ms per block (-65%); total wall time -21.5%.
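The split described above can be sketched with standard-library types only. This is a minimal illustration, not the PR's code: `Fkv`, `partition_by_cursor`, `batch_lookup`, and the `trie_lookup` callback are hypothetical names, and a `BTreeMap` stands in for the `ACCOUNT_FLATKEYVALUE` column family.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the flat key-value table: path -> encoded account.
type Fkv = BTreeMap<Vec<u8>, Vec<u8>>;

/// Splits `paths` into indices answerable from the FKV table (path <= cursor)
/// and indices that still need a trie walk (path > cursor).
fn partition_by_cursor(cursor: &[u8], paths: &[Vec<u8>]) -> (Vec<usize>, Vec<usize>) {
    let mut fkv_indices = Vec::new();
    let mut trie_indices = Vec::new();
    for (i, path) in paths.iter().enumerate() {
        if cursor >= path.as_slice() {
            fkv_indices.push(i);
        } else {
            trie_indices.push(i);
        }
    }
    (fkv_indices, trie_indices)
}

/// Looks up every path and returns results in input order.
/// `trie_lookup` is a placeholder for the per-address trie walk.
fn batch_lookup(
    fkv: &Fkv,
    cursor: &[u8],
    paths: &[Vec<u8>],
    trie_lookup: impl Fn(&[u8]) -> Option<Vec<u8>>,
) -> Vec<Option<Vec<u8>>> {
    let (fkv_indices, trie_indices) = partition_by_cursor(cursor, paths);
    let mut results = vec![None; paths.len()];
    for i in fkv_indices {
        // A missing or empty row at or below the cursor means "no such account".
        results[i] = fkv.get(&paths[i]).filter(|v| !v.is_empty()).cloned();
    }
    for i in trie_indices {
        results[i] = trie_lookup(&paths[i]);
    }
    results
}
```

In the real change the FKV-covered indices are served by one batched read rather than a loop over a map; the sketch only shows how indices are partitioned and how results are written back in input order.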
🤖 Codex Code Review
No correctness or security bugs stood out in the diff. I do see two performance concerns:
Open question: this code manually reproduces the trie/FKV selection logic instead of going through
I didn't run tests in this environment.
Automated review by OpenAI Codex · gpt-5.4 · custom prompt
Greptile Summary
This PR replaces the BAL prewarmer's per-address
Confidence Score: 4/5
The batch account-lookup path is logically correct; the two findings are minor cleanup items that do not affect correctness or data integrity. The FKV cursor comparison matches the trie's own semantics exactly, the cache split/reassembly in `StoreVmDatabase` is correct, and the RocksDB `batched_multi_get_cf` override propagates errors faithfully. Only two non-blocking style issues exist: a per-iteration clone of `last_written` that can be hoisted outside the loop, and a `diff_hits` counter that is computed but never surfaced.
crates/storage/store.rs: the new `get_account_states_batch_by_root` function contains both findings; all other files are straightforward adapters or trait additions.
| Filename | Overview |
|---|---|
| crates/storage/store.rs | Adds get_account_states_batch_by_root: FKV path uses multi_get for addresses below the FKV cursor, falls back to trie for the rest. Cursor comparison correctly mirrors BackendTrieDB::flatkeyvalue_computed semantics. Minor: fkv_cursor cloned per iteration; diff_hits is dead code. |
| crates/storage/backend/rocksdb.rs | Overrides multi_get on RocksDBReadTx using batched_multi_get_cf. Handles missing CF gracefully, propagates per-entry errors correctly, and maps PinnableSlice to Vec<u8> correctly. |
| crates/storage/api/mod.rs | Adds multi_get with a loop-based default impl; RocksDB overrides it with the batched path. Trait contract (same-order results) is clearly documented. |
| crates/blockchain/vm.rs | Adds get_account_states_batch to StoreVmDatabase. Cache-split logic, index-to-result reassembly, and cache population are all correct. AccountStateCacheEntry is Copy so the double-use after cache.insert compiles and is safe. |
| crates/vm/db.rs | Adds get_account_states_batch default to VmDatabase trait; default impl loops get_account_state, and the RocksDB-backed StoreVmDatabase overrides it with the batch path. |
| crates/vm/backends/levm/db.rs | Implements get_account_states_batch for DynVmDatabase, bridging VmDatabase's Option<AccountState> to LEVM's non-optional AccountState via unwrap_or_default, consistent with the existing get_account_state adapter. |
| crates/vm/levm/src/db/mod.rs | Updates CachingDatabase::prefetch_accounts (rayon path) to pre-filter cached addresses, then dispatches to inner.get_account_states_batch rather than parallel point-gets. Cache write-back with or_insert correctly handles concurrent population by other threads. |
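The `multi_get` contract summarized in the table (a loop-based default, results in key order, per-entry errors) can be sketched as follows. The trait and type names here (`ReadView`, `MemView`, this `StoreError`) are illustrative assumptions, not the crate's actual definitions.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
struct StoreError(String);

/// Illustrative read-view trait; the real `StorageReadView` may differ.
trait ReadView {
    fn get(&self, table: &'static str, key: &[u8]) -> Result<Option<Vec<u8>>, StoreError>;

    /// Default: one `get` per key, results in the same order as `keys`.
    /// A backend with a batched read primitive would override this method.
    fn multi_get(
        &self,
        table: &'static str,
        keys: &[&[u8]],
    ) -> Vec<Result<Option<Vec<u8>>, StoreError>> {
        keys.iter().map(|key| self.get(table, key)).collect()
    }
}

/// Toy in-memory backend that keeps the default `multi_get`.
struct MemView {
    tables: HashMap<&'static str, HashMap<Vec<u8>, Vec<u8>>>,
}

impl ReadView for MemView {
    fn get(&self, table: &'static str, key: &[u8]) -> Result<Option<Vec<u8>>, StoreError> {
        let rows = self
            .tables
            .get(table)
            .ok_or_else(|| StoreError(format!("unknown table {}", table)))?;
        Ok(rows.get(key).cloned())
    }
}
```

Returning one `Result` per key lets a caller decide whether a single failed entry should abort the whole batch, which is how the store-level code uses it with `res?`.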
Sequence Diagram
sequenceDiagram
participant W as warm_block_from_bal
participant CD as CachingDatabase
participant DVM as DynVmDatabase
participant SVM as StoreVmDatabase
participant S as Store
participant RDB as RocksDB
W->>CD: prefetch_accounts([addr1..addrN])
CD->>CD: read_accounts() — filter cached addresses
CD->>DVM: get_account_states_batch(missing[])
DVM->>SVM: get_account_states_batch(addresses[])
SVM->>SVM: check account_state_cache (per address)
SVM->>S: get_account_states_batch_by_root(state_root, misses[])
S->>S: last_written() — FKV cursor
S->>S: trie_cache.get() — diff-layer hits
alt FKV-covered addresses
S->>RDB: multi_get(ACCOUNT_FLATKEYVALUE, keys[])
RDB-->>S: "Vec<Result<Option<Vec<u8>>>>"
end
alt Trie-only addresses
S->>S: open_state_trie(state_root)
loop per address
S->>RDB: trie walk
RDB-->>S: encoded AccountState
end
end
S-->>SVM: "Vec<Option<AccountState>>"
SVM->>SVM: populate account_state_cache
SVM-->>DVM: "Vec<Option<AccountState>>"
DVM-->>CD: "Vec<AccountState> (None→default)"
CD->>CD: write_accounts() — or_insert each
CD-->>W: Ok(())
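The outermost hop in the diagram (filter cached addresses, issue one batch call, write back with `or_insert`) might look roughly like this. It is a simplified sketch: `CachingDb`, `BatchSource`, and the one-field `AccountState` are hypothetical stand-ins, and the `None` to default mapping that the diagram places in `DynVmDatabase` is folded into the write-back here.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

type Address = [u8; 20];

#[derive(Clone, Debug, Default, PartialEq)]
struct AccountState {
    nonce: u64,
}

/// Placeholder for the inner database's batch read.
trait BatchSource {
    fn get_account_states_batch(&self, addresses: &[Address]) -> Vec<Option<AccountState>>;
}

struct CachingDb<S> {
    inner: S,
    accounts: RwLock<HashMap<Address, AccountState>>,
}

impl<S: BatchSource> CachingDb<S> {
    fn prefetch_accounts(&self, addresses: &[Address]) {
        // 1. Keep only addresses that are not cached yet.
        let missing: Vec<Address> = {
            let cache = self.accounts.read().unwrap();
            addresses
                .iter()
                .filter(|a| !cache.contains_key(*a))
                .copied()
                .collect()
        };
        if missing.is_empty() {
            return;
        }
        // 2. One batched call instead of one lookup per address.
        let fetched = self.inner.get_account_states_batch(&missing);
        // 3. Write back; `or_insert` keeps any entry another thread added meanwhile.
        let mut cache = self.accounts.write().unwrap();
        for (address, state) in missing.into_iter().zip(fetched) {
            cache.entry(address).or_insert(state.unwrap_or_default());
        }
    }
}
```

The read lock is released before the batch call, so other threads can keep populating the cache while the lookup runs; that is why the write-back uses `or_insert` rather than overwriting.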
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 2
crates/storage/store.rs:2501-2514
`fkv_cursor` is reconstructed on every non-cache-hit loop iteration by cloning `last_written`, which is invariant across all iterations. For a BAL with thousands of addresses this causes thousands of unnecessary `Vec<u8>` clones and `Nibbles::from_hex` wraps. Hoist both allocations outside the loop.
```suggestion
// `last_computed_flatkeyvalue >= path` ⇒ FKV row is authoritative
// (either present with value, or absent meaning the account does not exist).
let fkv_cursor = Nibbles::from_hex(last_written.clone());
for (i, path) in leaf_paths.iter().enumerate() {
if let Some(value) = trie_cache.get(state_root, path.as_slice()) {
if !value.is_empty() {
results[i] = Some(AccountState::decode(&value)?);
}
diff_hits += 1;
continue;
}
// Reuse the trie's FKV-cursor check semantics via the leaf path.
let path_nibbles = Nibbles::from_hex(path.clone());
if fkv_cursor >= path_nibbles {
```
### Issue 2 of 2
crates/storage/store.rs:2499-2546
`diff_hits` counter is dead code — it is incremented on every trie-cache hit, but immediately discarded via `let _ = diff_hits`. If this is meant to be surfaced as a tracing span field (as the comment suggests), it should be added to the `#[instrument]` span or emitted via `tracing::debug!`. As written it adds per-iteration bookkeeping with no observable effect.
Reviews (1): Last reviewed commit: "perf(l1): batch account-state prefetch v..."
| for (i, path) in leaf_paths.iter().enumerate() { | ||
| if let Some(value) = trie_cache.get(state_root, path.as_slice()) { | ||
| if !value.is_empty() { | ||
| results[i] = Some(AccountState::decode(&value)?); | ||
| } | ||
| diff_hits += 1; | ||
| continue; | ||
| } | ||
| // Reuse the trie's FKV-cursor check semantics via the leaf path. | ||
| let path_nibbles = Nibbles::from_hex(path.clone()); | ||
| // `last_computed_flatkeyvalue >= path` ⇒ FKV row is authoritative | ||
| // (either present with value, or absent meaning the account does not exist). | ||
| let fkv_cursor = Nibbles::from_hex(last_written.clone()); | ||
| if fkv_cursor >= path_nibbles { |
fkv_cursor is reconstructed on every non-cache-hit loop iteration by cloning last_written, which is invariant across all iterations. For a BAL with thousands of addresses this causes thousands of unnecessary Vec<u8> clones and Nibbles::from_hex wraps. Hoist both allocations outside the loop.
| let mut diff_hits: usize = 0; | ||
|
|
||
| for (i, path) in leaf_paths.iter().enumerate() { | ||
| if let Some(value) = trie_cache.get(state_root, path.as_slice()) { | ||
| if !value.is_empty() { | ||
| results[i] = Some(AccountState::decode(&value)?); | ||
| } | ||
| diff_hits += 1; | ||
| continue; | ||
| } | ||
| // Reuse the trie's FKV-cursor check semantics via the leaf path. | ||
| let path_nibbles = Nibbles::from_hex(path.clone()); | ||
| // `last_computed_flatkeyvalue >= path` ⇒ FKV row is authoritative | ||
| // (either present with value, or absent meaning the account does not exist). | ||
| let fkv_cursor = Nibbles::from_hex(last_written.clone()); | ||
| if fkv_cursor >= path_nibbles { | ||
| fkv_indices.push(i); | ||
| } else { | ||
| trie_indices.push(i); | ||
| } | ||
| } | ||
|
|
||
| if !fkv_indices.is_empty() { | ||
| let read_view = self.backend.begin_read()?; | ||
| let keys: Vec<&[u8]> = fkv_indices | ||
| .iter() | ||
| .map(|&i| leaf_paths[i].as_slice()) | ||
| .collect(); | ||
| let raw = read_view.multi_get(ACCOUNT_FLATKEYVALUE, &keys); | ||
| for (slot, res) in fkv_indices.iter().zip(raw.into_iter()) { | ||
| let Some(encoded) = res? else { continue }; | ||
| if encoded.is_empty() { | ||
| continue; | ||
| } | ||
| results[*slot] = Some(AccountState::decode(&encoded)?); | ||
| } | ||
| } | ||
|
|
||
| if !trie_indices.is_empty() { | ||
| // Fall back to the regular trie path for any addresses whose path | ||
| // hasn't been swept by the FKV generator yet. | ||
| let state_trie = self.open_state_trie(state_root)?; | ||
| for &i in &trie_indices { | ||
| results[i] = self.get_account_state_from_trie(&state_trie, addresses[i])?; | ||
| } | ||
| } | ||
|
|
||
| let _ = diff_hits; // surface via tracing if needed. |
diff_hits counter is dead code — it is incremented on every trie-cache hit, but immediately discarded via let _ = diff_hits. If this is meant to be surfaced as a tracing span field (as the comment suggests), it should be added to the #[instrument] span or emitted via tracing::debug!. As written it adds per-iteration bookkeeping with no observable effect.
Benchmark Results Comparison
No significant difference was registered for any benchmark run.
Detailed Results
Benchmark Results: BubbleSort
Benchmark Results: ERC20Approval
Benchmark Results: ERC20Mint
Benchmark Results: ERC20Transfer
Benchmark Results: Factorial
Benchmark Results: FactorialRecursive
Benchmark Results: Fibonacci
Benchmark Results: FibonacciRecursive
Benchmark Results: ManyHashes
Benchmark Results: MstoreBench
Benchmark Results: Push
Benchmark Results: SstoreBench_no_opt
Address review feedback on PR #6712:
- store.rs: hoist `fkv_cursor` out of the per-address loop and document why the comparison matches `BackendTrieDB::flatkeyvalue_computed`'s semantics (not the more-conservative `flatkeyvalue_computed_with_last_written`). Drop the dead `diff_hits` counter.
- vm.rs: comment on why `cache.insert` (vs `or_insert`) is intentional.
- rocksdb.rs: clone a `String` instead of formatting the error per key.
| // hasn't been swept by the FKV generator yet. | ||
| let state_trie = self.open_state_trie(state_root)?; | ||
| for &i in &trie_indices { | ||
| results[i] = self.get_account_state_from_trie(&state_trie, addresses[i])?; |
Sequential trie fallback (was parallel in `CachingDatabase::prefetch_accounts`). The old `par_iter` got free parallelism on per-address trie walks for every address; the new path serializes them when `trie_indices` is non-empty.
For the bal-devnet-7 benchmark this is invisible (almost every address hits FKV), but during initial sync, when the FKV cursor is small and `trie_indices.len() >> fkv_indices.len()`, this is a regression vs main for the prefetch hot path.
If `Trie` (from `open_state_trie`) is `Send + Sync` (or clonable), the cheap version is:
    use rayon::prelude::*;
    let results_pairs: Result<Vec<_>, _> = trie_indices
        .par_iter()
        .map(|&i| self.get_account_state_from_trie(&state_trie, addresses[i]).map(|s| (i, s)))
        .collect();
    for (i, s) in results_pairs? { results[i] = s; }
If `Trie` isn't `Send`, this needs a small refactor (each parallel branch opens its own trie tx). Worth a quick check on the worst-case workload (low FKV cursor) before merge.
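For readers who want to try the shape of this suggestion without pulling in rayon, the same collect-pairs-then-scatter pattern can be sketched with `std::thread::scope`. This swaps in scoped threads for `par_iter`; `parallel_fallback` and its `lookup` callback are hypothetical names, with `lookup` standing in for the per-address trie walk.

```rust
use std::thread;

/// Runs `lookup` for every index in `indices` across up to `workers` scoped
/// threads and returns `(index, value)` pairs; any error fails the whole batch.
fn parallel_fallback<T, E, F>(
    indices: &[usize],
    workers: usize,
    lookup: F,
) -> Result<Vec<(usize, T)>, E>
where
    T: Send,
    E: Send,
    F: Fn(usize) -> Result<T, E> + Sync,
{
    if indices.is_empty() {
        return Ok(Vec::new());
    }
    let workers = workers.max(1);
    let chunk_len = (indices.len() + workers - 1) / workers;
    thread::scope(|scope| {
        // Spawn every worker first, then join, so the chunks run concurrently.
        let handles: Vec<_> = indices
            .chunks(chunk_len)
            .map(|chunk| {
                let lookup = &lookup;
                scope.spawn(move || {
                    chunk
                        .iter()
                        .map(|&i| lookup(i).map(|v| (i, v)))
                        .collect::<Result<Vec<_>, E>>()
                })
            })
            .collect();
        let mut out = Vec::with_capacity(indices.len());
        for handle in handles {
            out.extend(handle.join().expect("worker panicked")?);
        }
        Ok(out)
    })
}
```

As in the rayon version, the closure must be `Sync`, which is the same constraint the comment raises about `Trie`: if the trie handle cannot be shared across threads, each worker would need to open its own.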
| } | ||
| continue; | ||
| } | ||
| let path_nibbles = Nibbles::from_hex(path.clone()); |
nit: `Nibbles::from_hex(path.clone())` allocates a fresh 65-byte `Vec<u8>` per address inside the loop, just to wrap it in `Nibbles` for the `>=` comparison. Since `Nibbles::from_hex` is a const wrapper (no normalization) and `PartialOrd<Nibbles>` is `Vec<u8>` lex compare under the hood, comparing slices avoids the alloc:
    if fkv_cursor.as_ref() >= path.as_slice() {
        fkv_indices.push(i);
    }
or expose `Nibbles::cmp_slice(&self, &[u8])`. For a BAL with ~5k addresses this is 5k tiny allocations; not material on the benchmark you posted, but free to fix and tightens the hot loop.
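The equivalence this nit relies on can be checked in isolation. The sketch below uses a stand-in `Nibbles` newtype with derived ordering, which is an assumption about the real type; `covered_alloc` and `covered_slice` are hypothetical helper names.

```rust
/// Minimal stand-in for a nibble-path wrapper; the real `Nibbles` may differ.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct Nibbles(Vec<u8>);

impl AsRef<[u8]> for Nibbles {
    fn as_ref(&self) -> &[u8] {
        &self.0
    }
}

/// Allocating check: copies `path` into a wrapper just to compare it.
fn covered_alloc(cursor: &Nibbles, path: &[u8]) -> bool {
    *cursor >= Nibbles(path.to_vec())
}

/// Allocation-free check: compares the underlying byte slices directly.
fn covered_slice(cursor: &Nibbles, path: &[u8]) -> bool {
    cursor.as_ref() >= path
}
```

Both functions use lexicographic byte ordering, so they agree whenever the wrapper's `PartialOrd` is the derived one; if the real `Nibbles` ordering is customized, the slice shortcut would need to be rechecked.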
| &self, | ||
| table: &'static str, | ||
| keys: &[&[u8]], | ||
| ) -> Vec<Result<Option<Vec<u8>>, StoreError>> { |
Small doc-callout: the default impl is a serial `get` loop, so for non-RocksDB backends (in-memory, etc.) callers see no speedup from `multi_get` vs N `get`s; only the alloc/dispatch shape changes. The current rustdoc says "Backends that support batched reads ... should override this for better throughput" which implies the converse correctly. Worth one extra line for callers:
Callers should not assume `multi_get` is asymptotically faster than `get`; it is only an optimization for backends that have a batched read primitive. In particular, the in-memory backend's `multi_get` has the same cost as N independent `get`s.
Non-blocking; just a hint to future readers picking between `get` and `multi_get` for new code paths.
Summary
The BAL prewarmer's `CachingDatabase::prefetch_accounts` issues one rocksdb point-get per address, parallelized via rayon. For a BAL with thousands of addresses, that is thousands of independent column-family lookups, each going through bloom + block-cache + (cache miss) SST read.
This change collapses the prefetch into a single `batched_multi_get_cf` call on `ACCOUNT_FLATKEYVALUE` for addresses the FKV generator's cursor has already covered (path at or below the cursor), falling back to per-address trie walks for paths not yet swept by the generator. The diff-layer cache is checked first per address, matching the existing `Trie::get` semantics.
Plumbing:
- `StorageReadView::multi_get` (default loops `get`; RocksDB override uses `batched_multi_get_cf`)
- `Store::get_account_states_batch_by_root`: FKV multi_get + trie fallback
- `VmDatabase::get_account_states_batch`: trait method, overridden on `StoreVmDatabase`
- `LevmDatabase::get_account_states_batch`: trait method, overridden on `DynVmDatabase`
- `CachingDatabase::prefetch_accounts`: dispatches to the batch path; default impl loops for non-rocksdb backends
Benchmark
Fixture: `bal-devnet-7-mainnet-mix-460` (460 blocks, ~30 Ggas, transfer/EVM mix). `release-with-debug`, `import-bench --with-bal`. Baseline = `b0b9a11c5d` (no perf change).
Compare dashboard: https://edgl.dev/share/compare-bal-devnet-7-baseline-vs-fkv-multi_get.html
The warmer phase collapsing to a third of baseline is the smoking gun: `warm_block_from_bal` → `prefetch_accounts` was the dominant cost there, and is now a single `multi_get_cf` for the FKV-covered subset.
Notes
warm_block_from_bal); non-BAL imports go through `warm_block` and don't touch this method.
Test plan
- `cargo check --workspace`
- `make -C tooling/ef_tests/blockchain test`: 8819 passed, 0 failed