refactor(storage): dedupe PayloadBuffer proofs by subsumption #313
MegaRedHand wants to merge 1 commit into main
Conversation
Maintain each `PayloadBuffer` entry as an antichain under the subset relation on participants. On push:

- skip if the incoming participants are a subset (including equal) of any existing proof (it adds no coverage);
- otherwise remove existing proofs whose participants are a strict subset of the incoming one, then insert.

This generalises the prior equality-only dedup and eliminates redundant proofs that otherwise accumulate after aggregation produces a superset of its child proofs, or when a block embeds a proof subsumed by one already in `known_payloads`. Validator coverage is monotonic across pushes, so `update_head`'s reads of `known_payloads` are never temporarily degraded. Every insert path (`apply_aggregated_group`, `on_gossip_aggregated_attestation`, `on_block_core`, and `promote_new_aggregated_payloads`) is affected transparently, without touching call sites.
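The push rule above can be sketched as follows; `Proof` and the per-entry `Vec` are simplified stand-ins for the crate's actual types:

```rust
use std::collections::HashSet;

// Simplified stand-in for an aggregated proof; only the participant set
// matters for the dedup rule.
#[derive(Clone)]
struct Proof {
    participants: HashSet<u64>,
}

// Push `incoming` into one data_root's proof list while keeping it an
// antichain under the subset relation. Returns the net change in proof count.
fn push_dedup(entry: &mut Vec<Proof>, incoming: Proof) -> isize {
    // Skip if the incoming participants are a subset (including equal) of any
    // existing proof: it adds no validator coverage.
    if entry
        .iter()
        .any(|p| incoming.participants.is_subset(&p.participants))
    {
        return 0;
    }
    // Remove existing proofs whose participants are a subset of the incoming
    // one. (Equal sets were already caught by the skip above, so every
    // removal here is a strict subset.)
    let before = entry.len();
    entry.retain(|p| !p.participants.is_subset(&incoming.participants));
    let removed = before - entry.len();
    entry.push(incoming);
    1 - removed as isize
}
```

Repeated pushes can only keep coverage the same or grow it, which is the monotonicity property `update_head` relies on.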
🤖 Kimi Code Review

Review: PR #313 - Antichain invariant for PayloadBuffer

The implementation correctly maintains an antichain of participant sets per data_root.

Sections: Correctness & Logic · Performance · Consensus Safety · Tests · Nits

Verdict: LGTM. The logic is consensus-safe and the test suite validates the antichain behavior thoroughly.

Automated review by Kimi (Moonshot AI) · kimi-k2.5 · custom prompt
🤖 Codex Code Review

No correctness or safety regression stood out. The subset/superset pruning is consistent with how these proofs are later used for fork-choice weight extraction, block building, and recursive aggregation. I could not run the Rust tests here.

Automated review by OpenAI Codex · gpt-5.4 · custom prompt
🤖 Claude Code Review
Greptile Summary

This PR replaces the equality-only duplicate check in `PayloadBuffer::push` with a subset-based dedup that keeps each data_root's proof list an antichain.

Confidence Score: 5/5

Safe to merge; the implementation is correct and well-tested. The antichain invariant is correctly maintained: the early return when `new_set` ⊆ `existing_set` is sound (the antichain property guarantees no strict-subset peers exist simultaneously), and `swap_remove` in reverse index order provably keeps all lower indices at their original positions. `total_proofs` accounting is accurate across all branches. The only finding is a P2 style note about computing `new_set` before the branch where it is sometimes unused. No files require special attention.
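The `swap_remove`-in-reverse-order argument above can be checked with a small sketch: removing marked indices from highest to lowest means each `swap_remove` only swaps in an element from the tail, so lower marked indices are never disturbed. The helper below is illustrative, not the PR's actual code.

```rust
// Remove the elements at `marked` indices using Vec::swap_remove.
// Processing indices in descending order is what makes this safe: each
// swap_remove only affects positions >= the removed index.
fn remove_marked<T>(v: &mut Vec<T>, mut marked: Vec<usize>) {
    marked.sort_unstable();
    for idx in marked.into_iter().rev() {
        v.swap_remove(idx);
    }
}
```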
| Filename | Overview |
|---|---|
| crates/storage/src/store.rs | Replaces equality-only dedup in PayloadBuffer::push with a full antichain/subsumption dedup; adds 8 targeted unit tests covering supersets, subsets, equality, incomparable pairs, multi-absorb, mixed keep/remove, cross-data-root independence, and FIFO eviction. Logic and accounting are correct. |
Flowchart
```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[push called] --> B{data_root already\nin buffer?}
    B -- No --> C[Insert new entry\nadd to order\ntotal_proofs += 1]
    B -- Yes --> D[Build new_set from\nproof.participant_indices]
    D --> E{For each existing proof\nbuild existing_set}
    E --> F{new_set ⊆ existing_set?}
    F -- Yes --> G[Return: incoming is\nsubsumed, skip]
    F -- No --> H{existing_set ⊆ new_set?}
    H -- Yes --> I[Mark index for removal]
    H -- No --> E
    I --> E
    E -- loop done --> J[swap_remove marked indices\nin reverse order\ntotal_proofs -= removed count]
    J --> K[Push new proof\ntotal_proofs += 1]
    C --> L{total_proofs > capacity?}
    K --> L
    L -- Yes --> M[Pop oldest data_root\nfrom order, remove entry\ntotal_proofs -= entry.len]
    M --> L
    L -- No --> N[Done]
```
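The capacity loop at the bottom of the flowchart (nodes L/M) can be sketched like this; the field names mirror the flowchart, but the types are illustrative stand-ins, not the crate's actual definitions:

```rust
use std::collections::{HashMap, VecDeque};

// Illustrative buffer: proofs are stored per data_root, with `order` tracking
// data_root insertion order for FIFO eviction of whole entries.
struct PayloadBufferSketch {
    data: HashMap<u64, Vec<&'static str>>, // data_root -> proofs (placeholders)
    order: VecDeque<u64>,                  // oldest data_root at the front
    total_proofs: usize,
    capacity: usize,
}

impl PayloadBufferSketch {
    // Evict whole oldest entries until the total proof count fits capacity.
    fn evict_to_capacity(&mut self) {
        while self.total_proofs > self.capacity {
            let Some(oldest) = self.order.pop_front() else { break };
            if let Some(entry) = self.data.remove(&oldest) {
                self.total_proofs -= entry.len();
            }
        }
    }
}
```

Note that eviction drops an entire data_root entry at once, decrementing `total_proofs` by that entry's length, which is why the antichain dedup matters: redundant proofs inflate `total_proofs` and accelerate eviction of other data_roots.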
Prompt To Fix All With AI
This is a comment left during a code review.
Path: crates/storage/src/store.rs
Line: 144-145
Comment:
**Unconditional `new_set` allocation for new data_roots**
`new_set` is built before the `if let` branch, so it's allocated and populated even when `data_root` isn't in `self.data` (the `else` path), where it's never read. Moving the allocation inside the `if let Some(entry)` arm would avoid this work for fresh insertions.
```suggestion
let (data_root, att_data) = hashed.into_parts();
```
Then compute `new_set` at the start of the `if let Some(entry)` block only.
How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "refactor(storage): dedupe PayloadBuffer ..."
The code under review (crates/storage/src/store.rs, lines 144-145):

```rust
let (data_root, att_data) = hashed.into_parts();
let new_set: HashSet<u64> = proof.participant_indices().collect();
```
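Greptile's suggestion amounts to the following shape. `ProofStub`, the `u64` data_root key, and `push_sketch` are hypothetical stand-ins for the real `store.rs` types, kept only to make the allocation-placement point concrete:

```rust
use std::collections::{HashMap, HashSet};

struct ProofStub {
    participants: Vec<u64>,
}

// Compute `new_set` only in the arm that actually compares participant sets;
// the fresh-insertion path never reads it, so it allocates nothing.
fn push_sketch(data: &mut HashMap<u64, Vec<ProofStub>>, data_root: u64, proof: ProofStub) {
    if let Some(entry) = data.get_mut(&data_root) {
        // Allocation happens only when existing proofs must be compared.
        let new_set: HashSet<u64> = proof.participants.iter().copied().collect();
        if entry.iter().any(|p| {
            new_set.is_subset(&p.participants.iter().copied().collect::<HashSet<u64>>())
        }) {
            return; // incoming proof is subsumed: skip
        }
        entry.retain(|p| {
            !p.participants
                .iter()
                .copied()
                .collect::<HashSet<u64>>()
                .is_subset(&new_set)
        });
        entry.push(proof);
    } else {
        // Fresh data_root: nothing to compare against, so no `new_set`.
        data.insert(data_root, vec![proof]);
    }
}
```

The behavior is unchanged; the only difference from the reviewed code is that the fresh-insertion branch skips one `HashSet` allocation per push.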
Summary

Replace the equality-only dedup in `PayloadBuffer::push` with a subset-based dedup that maintains each data_root's proof list as an antichain. On push, if the incoming proof's participants are a subset (including equal) of any existing proof, the push is skipped; otherwise every existing proof whose participants are a strict subset of the incoming proof's is removed before the insert.

Why

After an aggregator produces a new proof for a data_root, the child proofs merged into it are still in `new_payloads` with participant sets that are strict subsets of the new proof's. They add no coverage and just consume `PayloadBuffer` capacity, accelerating FIFO eviction of other data_roots. The same kind of redundancy accumulates in `known_payloads` through `drain_new_to_known` and block ingestion.

The leanSpec models this at `leanSpec/src/lean_spec/subspecs/forkchoice/store.py:1048-1062` by wholesale-replacing `latest_new_aggregated_payloads[D]` with `{new_proof}`. We can't mimic that synchronously because (a) aggregation runs on a `spawn_blocking` worker while the actor keeps writing `new_payloads` via gossip ingestion, and (b) `update_head` reads `known_payloads` only, so any known-side pruning has to be carefully timed to avoid a fork-choice coverage gap.

Dedupe-on-push sidesteps both constraints:

- `update_head` reading `known_payloads` between pushes sees monotonically growing coverage, so there's no fork-choice timing risk.
- `apply_aggregated_group`, `on_gossip_aggregated_attestation`, `on_block_core`, and `promote_new_aggregated_payloads` all benefit transparently.

Cost
O(n·v) per push (n = proofs currently in the entry, v = participants per proof) via `HashSet<u64>` membership. `n` is bounded by the max-antichain size per data_root (typically 1 in steady state, small when concurrent aggregates overlap). If profiling later shows push in a hot path, a bitwise-native subset op on `AggregationBits` is a natural follow-up.

Tests
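The bitwise-native follow-up mentioned above boils down to the identity A ⊆ B ⟺ A & ¬B = 0, sketched here over plain `u64` words as a stand-in for `AggregationBits`:

```rust
// A ⊆ B over bitfields: no bit of `a` may be set where `b` is clear.
// Assumes both slices cover the same validator-set size (equal length).
fn bits_is_subset(a: &[u64], b: &[u64]) -> bool {
    a.len() == b.len() && a.iter().zip(b).all(|(&wa, &wb)| wa & !wb == 0)
}
```

This turns the per-participant `HashSet` membership test into one AND/NOT per 64 validators, with no allocation.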
New unit tests for `PayloadBuffer::push`: superset removes subset, subset skipped, equal-participants skipped, incomparable proofs coexist, superset absorbs multiple singletons in one push, mixed kept/removed under same data_root, cross-data-root independence, FIFO eviction still uses `total_proofs`. Existing `PayloadBuffer` tests keep passing unchanged (they use pairwise-incomparable participant sets).
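A few of the listed cases can be sketched as standalone tests against a simplified antichain push; the `push_dedup` helper here is an illustrative stand-in, not the crate's actual test code:

```rust
use std::collections::HashSet;

// Simplified antichain push used to illustrate the test cases; returns
// whether the incoming set was inserted.
fn push_dedup(entry: &mut Vec<HashSet<u64>>, incoming: HashSet<u64>) -> bool {
    if entry.iter().any(|e| incoming.is_subset(e)) {
        return false; // subset or equal of an existing proof: skipped
    }
    entry.retain(|e| !e.is_subset(&incoming));
    entry.push(incoming);
    true
}

fn set(ids: &[u64]) -> HashSet<u64> {
    ids.iter().copied().collect()
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn superset_absorbs_multiple_singletons() {
        let mut entry = Vec::new();
        assert!(push_dedup(&mut entry, set(&[1])));
        assert!(push_dedup(&mut entry, set(&[2])));
        assert!(push_dedup(&mut entry, set(&[1, 2, 3])));
        assert_eq!(entry, vec![set(&[1, 2, 3])]);
    }

    #[test]
    fn incomparable_proofs_coexist() {
        let mut entry = Vec::new();
        assert!(push_dedup(&mut entry, set(&[1, 2])));
        assert!(push_dedup(&mut entry, set(&[2, 3])));
        assert_eq!(entry.len(), 2);
    }

    #[test]
    fn equal_participants_skipped() {
        let mut entry = vec![set(&[1, 2])];
        assert!(!push_dedup(&mut entry, set(&[1, 2])));
        assert_eq!(entry.len(), 1);
    }
}
```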
Test plan

- `cargo test -p ethlambda-storage` — 30 tests pass
- `cargo test -p ethlambda-blockchain` — 20 unit tests pass
- `cargo test -p ethlambda-blockchain --test forkchoice_spectests --release` — 70 spec tests pass
- `cargo fmt --all` clean
- `cargo clippy --workspace --all-targets -- -D warnings` clean
- Run a local node with `--is-aggregator`; verify `lean_latest_new_aggregated_payloads` and `lean_latest_known_aggregated_payloads` stay bounded across interval 2 / interval 4 instead of growing monotonically