fix(storage): protect head root from state pruning#396
Conversation
## Problem
`prune_old_data` only protects the latest finalized and justified block
roots:
```rust
let protected_roots = [self.latest_finalized().root, self.latest_justified().root];
```
`prune_old_states` then keeps the top `STATES_TO_KEEP = 3_000` entries
sorted by **slot** (descending) and deletes the rest of the `States`
table, ignoring fork-choice membership.
When finalization stalls and a competing branch keeps producing blocks
on top of an unfinalized region, that branch's high-slot headers can
fill the retention window even though it isn't the fork-choice head.
The actual head can fall outside the top 3000 by slot, and its state
row is deleted. The very next call into:
```rust
pub fn head_state(&self) -> State {
self.get_state(&self.head())
.expect("head state is always available")
}
```
panics, taking down the blockchain actor (the container stays up, P2P
keeps running, but every gossip message logs `err=Actor stopped`).
Observed in a stalled 8-node devnet: finalization froze with head at
slot ~2891 while a minority fork advanced past slot 15_000. On
receiving the next gossip block, six nodes pruned 2666 states and
panicked simultaneously at `store.rs:1314`.
## Fix
Add `self.head()` to `protected_roots` so the pruner never deletes
the state currently in use by fork choice, regardless of its slot
position relative to other branches' tips.
🤖 Kimi Code ReviewThis is a critical bug fix for chain availability. The change correctly prevents pruning of the current LMD GHOST head, which is essential for maintaining the canonical chain tip. AnalysisFile: The IssueThe original code only protected
VerificationThe fix is correct because:
Suggestions1. Add explanatory comment (line 777) let protected_roots = [
self.latest_finalized().root,
self.latest_justified().root,
self.head(), // Protect current head to preserve chain tip for fork choice
];2. Verify ancestor completeness 3. Database consistency check debug_assert!(self.blocks.contains_key(&self.head()), "Head block must exist");ConclusionApprove. This fixes a potential consensus-critical bug where the node could delete its own chain tip. The change is minimal, safe, and follows the existing pattern. Automated review by Kimi (Moonshot AI) · kimi-k2.5 · custom prompt |
🤖 Codex Code ReviewNo correctness, security, or performance regressions stood out in the diff. Protecting One test gap remains:
Aside from that missing regression coverage, the patch looks good. Automated review by OpenAI Codex · gpt-5.4 · custom prompt |
🤖 Claude Code ReviewNow I have all the context needed for a thorough review. Here is the review: Review: fix(storage): protect head root from state pruningOverviewThe PR fixes a production panic ( CorrectnessThe fix is sound. The The fix also correctly protects the head's block entry via Issues1. Stale doc comments (minor)
These should now read "(finalized, justified, head)" to match what 2. Regression test is missing (notable) The PR's own test plan includes an unchecked box for the critical regression case:
The existing A minimal regression test would look like:
3.
SummaryThe one-line fix is correct and properly addresses the panic. The main gap before merging is the missing regression test, which the PR itself acknowledges. The doc comment updates are low-stakes but should be made to keep the inline documentation accurate. Automated review by Claude (Anthropic) · sonnet · custom prompt |
Greptile SummaryAdds
Confidence Score: 4/5Safe to merge — the one-line fix correctly addresses the panic, and the only gaps are stale doc comments and an unwritten regression test. The change is small and targeted. The core logic is sound: adding the head root to The doc comment updates and regression test are both in
|
| Filename | Overview |
|---|---|
| crates/storage/src/store.rs | Adds self.head() to protected_roots in prune_old_data so the fork-choice head state is never evicted; fix is correct but the targeted regression test (head at low slot, competing branch fills window) is not yet added, and two doc comments still describe protected_roots as "(finalized, justified)" only. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[prune_old_data called] --> B[Build protected_roots]
B --> C[latest_finalized root]
B --> D[latest_justified root]
B --> E[self.head - NEW]
C & D & E --> F[prune_old_states\nkeep top STATES_TO_KEEP by slot\n+ protected_roots]
C & D & E --> G[prune_old_blocks\nkeep top BLOCKS_TO_KEEP by slot\n+ protected_roots]
F --> H{head slot inside\ntop 3000 window?}
H -- Yes\nnormal case --> I[head state safe\nvia retention window]
H -- No\nstalled finalization --> J[head state safe\nvia protected_roots]
J --> K[head_state succeeds\nno panic]
I --> K
Comments Outside Diff (4)
-
crates/storage/src/store.rs, line 879-882 (link)The doc comments on
prune_old_statesandprune_old_blocksstill describeprotected_rootsas "(finalized, justified)" only. Since the head root is now also passed in, the parenthetical is stale and could mislead future readers.Prompt To Fix With AI
This is a comment left during a code review. Path: crates/storage/src/store.rs Line: 879-882 Comment: The doc comments on `prune_old_states` and `prune_old_blocks` still describe `protected_roots` as "(finalized, justified)" only. Since the head root is now also passed in, the parenthetical is stale and could mislead future readers. How can I resolve this? If you propose a fix, please make it concise.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
-
crates/storage/src/store.rs, line 930-931 (link)Same stale parenthetical in
prune_old_blocks— the head root is now a third protected root alongside finalized and justified.Prompt To Fix With AI
This is a comment left during a code review. Path: crates/storage/src/store.rs Line: 930-931 Comment: Same stale parenthetical in `prune_old_blocks` — the head root is now a third protected root alongside finalized and justified. How can I resolve this? If you propose a fix, please make it concise.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
-
crates/storage/src/store.rs, line 1704-1706 (link)The inline count comment is now slightly misleading:
protected_rootsnow carries three entries (finalized, justified, head), but the head in this test sits at slot 3004—inside the top-3000 window—so it never becomes a pruning candidate and the arithmetic stays correct. Clarifying this avoids confusion for the next person reading the test.Prompt To Fix With AI
This is a comment left during a code review. Path: crates/storage/src/store.rs Line: 1704-1706 Comment: The inline count comment is now slightly misleading: `protected_roots` now carries three entries (finalized, justified, head), but the head in this test sits at slot 3004—inside the top-3000 window—so it never becomes a pruning candidate and the arithmetic stays correct. Clarifying this avoids confusion for the next person reading the test. How can I resolve this? If you propose a fix, please make it concise.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
-
crates/storage/src/store.rs, line 1722-1759 (link)Regression test for the exact panic scenario is missing
The PR description marks this as an outstanding checkbox, but neither of the existing periodic-pruning tests covers the actual failure: a head at a low slot while a competing branch's higher-slot headers fill the retention window. Both
fallback_pruning_removes_old_states_and_blocksandfallback_pruning_no_op_within_retentionplace the head at the top of the slot range (slot 3004 / slot 2999), so the head was never in the pruning-candidate set even before this fix — those tests would pass unchanged with or withoutself.head()inprotected_roots. A test that insertsSTATES_TO_KEEP + Nheaders on a high-slot branch, sets the head to a low-slot root, callsprune_old_data, and then assertshead_state()succeeds (and the head's state row still exists) would directly validate the fix.Prompt To Fix With AI
This is a comment left during a code review. Path: crates/storage/src/store.rs Line: 1722-1759 Comment: **Regression test for the exact panic scenario is missing** The PR description marks this as an outstanding checkbox, but neither of the existing periodic-pruning tests covers the actual failure: a head at a *low* slot while a competing branch's higher-slot headers fill the retention window. Both `fallback_pruning_removes_old_states_and_blocks` and `fallback_pruning_no_op_within_retention` place the head at the *top* of the slot range (slot 3004 / slot 2999), so the head was never in the pruning-candidate set even before this fix — those tests would pass unchanged with or without `self.head()` in `protected_roots`. A test that inserts `STATES_TO_KEEP + N` headers on a high-slot branch, sets the head to a low-slot root, calls `prune_old_data`, and then asserts `head_state()` succeeds (and the head's state row still exists) would directly validate the fix. How can I resolve this? If you propose a fix, please make it concise.
Prompt To Fix All With AI
Fix the following 4 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 4
crates/storage/src/store.rs:879-882
The doc comments on `prune_old_states` and `prune_old_blocks` still describe `protected_roots` as "(finalized, justified)" only. Since the head root is now also passed in, the parenthetical is stale and could mislead future readers.
```suggestion
/// Prune old states beyond the retention window.
///
/// Keeps the most recent `STATES_TO_KEEP` states (by slot), plus any
/// states whose roots appear in `protected_roots` (finalized, justified, head).
```
### Issue 2 of 4
crates/storage/src/store.rs:930-931
Same stale parenthetical in `prune_old_blocks` — the head root is now a third protected root alongside finalized and justified.
```suggestion
/// Keeps the most recent `BLOCKS_TO_KEEP` blocks (by slot), plus any
/// blocks whose roots appear in `protected_roots` (finalized, justified, head).
```
### Issue 3 of 4
crates/storage/src/store.rs:1704-1706
The inline count comment is now slightly misleading: `protected_roots` now carries three entries (finalized, justified, head), but the head in this test sits at slot 3004—inside the top-3000 window—so it never becomes a pruning candidate and the arithmetic stays correct. Clarifying this avoids confusion for the next person reading the test.
```suggestion
// 3005 headers total. Top 3000 by slot are kept in the retention window,
// leaving 5 candidates. 2 of those candidates are protected (finalized=slot0,
// justified=slot1); head=slot3004 is also protected but already inside the
// retention window, so it is not a candidate and doesn't change the count.
// 5 - 2 = 3 pruned → 3005 - 3 = 3002 states remaining.
```
### Issue 4 of 4
crates/storage/src/store.rs:1722-1759
**Regression test for the exact panic scenario is missing**
The PR description marks this as an outstanding checkbox, but neither of the existing periodic-pruning tests covers the actual failure: a head at a *low* slot while a competing branch's higher-slot headers fill the retention window. Both `fallback_pruning_removes_old_states_and_blocks` and `fallback_pruning_no_op_within_retention` place the head at the *top* of the slot range (slot 3004 / slot 2999), so the head was never in the pruning-candidate set even before this fix — those tests would pass unchanged with or without `self.head()` in `protected_roots`. A test that inserts `STATES_TO_KEEP + N` headers on a high-slot branch, sets the head to a low-slot root, calls `prune_old_data`, and then asserts `head_state()` succeeds (and the head's state row still exists) would directly validate the fix.
Reviews (1): Last reviewed commit: "fix(storage): protect head root from sta..." | Re-trigger Greptile
Summary
prune_old_data'sprotected_rootsso the pruner never evicts the state currently in use by fork choice.Problem
prune_old_dataonly protected the latest finalized and justified block roots:prune_old_statesthen keeps the topSTATES_TO_KEEP = 3_000entries by slot (descending) and deletes the rest of theStatestable, ignoring fork-choice membership.When finalization stalls and a competing branch keeps producing blocks on top of an unfinalized region, that branch's high-slot headers fill the retention window even though it isn't the fork-choice head. The actual head can fall outside the top 3000 by slot, and its state row is deleted. The very next call into:
panics, taking down the blockchain actor (the container stays up, P2P keeps running, but every gossip message logs
err=Actor stopped).Observed
In a stalled 8-node devnet, finalization froze around slot 2866 while a minority fork from 2 surviving aggregators kept proposing one empty block per slot, reaching slot ~15000+. On the next gossip block (slot 15010, proposer 2), 6 of 8 nodes pruned 2666 states and panicked simultaneously at
store.rs:1314withhead state is always available. Logs on every affected node:Fix
Add
self.head()toprotected_roots. The head is now retained regardless of its slot position relative to other branches' tips.Test plan
prune_old_states_*tests still pass.prune_old_data.