
fix: make Store's public methods return Result<T,Error>, propagate and update every callers accordingly#340

Open
d4m014 wants to merge 1 commit into lambdaclass:main from d4m014:fix/propagate-result-types-through-storage-layer

Conversation


@d4m014 d4m014 commented May 1, 2026

🗒️ Description / Motivation

All public Store methods that perform disk I/O previously panicked on failure via .expect(). This PR makes every such method return Result<T, Error> and propagates errors through every caller: the actor layer, RPC handlers, p2p handlers, fork-choice logic, and the binary entrypoint. Panicking on a recoverable storage error is unsafe in a long-running consensus node; this change ensures errors are surfaced and handled deliberately.

What Changed

  • crates/storage/src/store.rs: from_anchor_state/get_forkchoice_store/init_store → Result<Self, Error>; set_time, set_safe_target, insert_state, write_signed_block, prune_live_chain, prune_old_states, prune_old_blocks all return Result; head_slot/safe_target_slot use .ok_or_else() instead of .unwrap()

  • crates/storage/src/lib.rs : re-exports Error

  • crates/blockchain/src/store.rs : added StoreError::Storage(#[source] ethlambda_storage::Error) variant + From impl; all storage calls use ? or two-step ?.ok_or(); affected functions updated to return Result

  • crates/blockchain/src/lib.rs : actor layer uses let Ok(...) else { return } for critical paths; non-critical reads use if let Ok(...); errors logged via inspect_err

  • crates/net/rpc/src/lib.rs : get_latest_finalized_state/get_latest_justified_state map storage errors to 500, missing resources to 404

  • crates/net/rpc/src/fork_choice.rs : local try_store! macro short-circuits on storage error with 500; non-critical header reads use .ok().flatten()

  • crates/net/p2p/src/req_resp/handlers.rs : block lookup uses if let Ok(Some(b)); build_status uses .unwrap_or_default() for graceful degradation

  • bin/ethlambda/src/checkpoint_sync.rs : added Storage(#[from] ethlambda_storage::Error) to CheckpointSyncError

  • bin/ethlambda/src/main.rs : removed Ok(...) wrapper around from_anchor_state; uses map_err for error conversion

  • crates/blockchain/tests/forkchoice_spectests.rs / signature_spectests.rs : all store calls updated to handle Result (.unwrap() in test context).
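The actor-layer handling described above can be sketched as follows. This is a minimal, self-contained illustration of the let Ok(...) else / if let Ok(...) / inspect_err split between critical and non-critical paths; Store, Error, and the method bodies here are simplified stand-ins, not the actual ethlambda API.

```rust
// Sketch of the error-handling patterns: bail out on critical reads,
// degrade gracefully on non-critical ones. Types are stand-ins.
#[derive(Debug)]
struct Error(String);

struct Store;

impl Store {
    fn head_slot(&self) -> Result<u64, Error> {
        Ok(42)
    }
    fn get_block_header(&self) -> Result<Option<String>, Error> {
        Ok(None)
    }
}

fn on_tick(store: &Store) -> Option<u64> {
    // Critical path: log and bail out of the handler if storage fails.
    let Ok(slot) = store
        .head_slot()
        .inspect_err(|e| eprintln!("storage error: {e:?}"))
    else {
        return None;
    };

    // Non-critical read: proceed without the value on failure.
    if let Ok(Some(header)) = store.get_block_header() {
        println!("head header: {header}");
    }

    Some(slot)
}

fn main() {
    assert_eq!(on_tick(&Store), Some(42));
}
```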

Correctness / Behavior Guarantees

  • No behavior changes on the happy path; all existing logic is preserved. Storage errors that previously caused a panic now either propagate as Err, return a 500 HTTP response, or are logged and skipped, depending on context
  • Corrupt or missing entries in prune scans are skipped via .filter_map rather than aborting the entire prune
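The prune-scan behavior can be illustrated with a short sketch. The entry shapes below are assumptions for illustration; the real prune walks the database rather than a Vec.

```rust
// Sketch: skip corrupt (Err) and missing (Ok(None)) entries during a
// prune scan instead of aborting the whole prune.
fn prune_candidates(raw: Vec<Result<Option<u64>, String>>, cutoff: u64) -> Vec<u64> {
    raw.into_iter()
        // Drop Err(_) and Ok(None) alike, keeping the scan alive.
        .filter_map(|entry| entry.ok().flatten())
        .filter(|slot| *slot < cutoff)
        .collect()
}

fn main() {
    let scanned = vec![Ok(Some(3)), Err("corrupt".into()), Ok(None), Ok(Some(10))];
    // Only the valid entry below the cutoff survives.
    assert_eq!(prune_candidates(scanned, 5), vec![3]);
}
```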

Tests Added / Run

  • cargo test --workspace: All existing unit tests pass
  • cargo check --workspace passes clean

Related Issues / PRs


greptile-apps Bot commented May 1, 2026

Greptile Summary

This PR replaces all panicking .expect() calls in Store's public methods with Result<T, Error> returns, then propagates errors through the actor layer, RPC handlers, P2P handlers, fork-choice logic, and the binary entrypoint. The happy path is unchanged; storage failures now surface as Err, HTTP 500, or a logged-and-skipped event depending on context.

  • P1 — crates/blockchain/src/lib.rs: The walk-up loop while let Ok(Some(header)) = self.store.get_block_header(...) silently treats a DB read error (Err(...)) identically to Ok(None) (block not found), falling through to request_missing_block. A transient I/O error will trigger a spurious P2P network request for a block the node already holds, and the cycle will repeat on re-delivery.

Confidence Score: 3/5

Safe to merge after fixing the walk-up loop DB-error handling; all other changes are mechanical and correct

One P1 logic bug: a storage read error in the pending-block walk-up loop is silently coerced to "block not found", causing spurious P2P requests that restart the cycle on re-delivery. The rest of the error-propagation work is solid and the happy path is unaffected.

crates/blockchain/src/lib.rs — the while let Ok(Some(...)) walk-up loop in process_or_pend_block

Important Files Changed

Filename Overview
crates/storage/src/store.rs Core store: all public methods now return Result; .expect() replaced with ?; error propagation is clean and consistent throughout
crates/blockchain/src/lib.rs Actor layer updated to propagate storage errors; P1 issue: while let Ok(Some(...)) in walk-up loop silently converts DB errors to missing-block, triggering spurious network requests
crates/blockchain/src/store.rs Fork-choice logic updated to propagate errors; redundant store.time() DB read inside hot tick loop (P2)
crates/net/p2p/src/req_resp/handlers.rs Block-by-root handler and build_status updated; DB errors in block lookup silently swallowed without logging (P2)
crates/net/rpc/src/fork_choice.rs Introduced local try_store! macro for 500 responses; macro discards error details without logging (P2)
crates/net/rpc/src/lib.rs RPC handlers correctly map storage errors to 500 and missing resources to 404
bin/ethlambda/src/main.rs from_anchor_state now propagates via map_err; BlockChain::spawn uses .expect() at startup which is acceptable for initialization failures
bin/ethlambda/src/checkpoint_sync.rs Adds Storage(#[from] ethlambda_storage::Error) variant to CheckpointSyncError; straightforward and correct
crates/storage/src/lib.rs Re-exports Error from the api module; trivial one-line change
crates/blockchain/tests/forkchoice_spectests.rs Test file updated to unwrap Results; correct use of .unwrap() in test context
crates/blockchain/tests/signature_spectests.rs Test file updated to unwrap get_forkchoice_store Result; minimal change, correct

Comments Outside Diff (2)

  1. crates/net/p2p/src/req_resp/handlers.rs, line 901-906 (link)

    P2 DB errors silently swallowed with no log

    The comment "DB errors are silently skipped (per spec)" conflates two distinct cases: the spec says missing blocks may be skipped, but a storage error is not a missing block — it signals a potential disk fault. Swallowing it without a log makes these failures invisible in production.
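One concise way to resolve this (a sketch with simplified types; the handler's real signatures differ) is to log the error before discarding it, so missing-block and disk-fault cases stay distinguishable in production:

```rust
// Sketch: surface DB read errors in a log before falling back to None.
// A missing block stays silent; a storage error leaves a trace.
fn lookup(result: Result<Option<u64>, String>) -> Option<u64> {
    result
        .inspect_err(|e| eprintln!("storage error during block lookup: {e}"))
        .ok()
        .flatten()
}

fn main() {
    assert_eq!(lookup(Ok(Some(7))), Some(7));
    assert_eq!(lookup(Ok(None)), None);
    assert_eq!(lookup(Err("io fault".into())), None);
}
```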

  2. crates/net/rpc/src/fork_choice.rs, line 940-947 (link)

    P2 try_store! macro discards error context with no log

    Err(_) silently drops the actual storage error before returning a 500. Storage errors at this layer indicate a potential disk or backend issue and should be logged so operators can diagnose them.
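A logging variant of such a macro can be sketched as below. This is a hedged illustration, not the PR's actual try_store!; eprintln! stands in for whatever logger the RPC layer uses, and the u16 status code stands in for the real response type.

```rust
// Sketch of a try_store!-style macro that logs the storage error
// before short-circuiting with a 500-style status.
macro_rules! try_store {
    ($expr:expr) => {
        match $expr {
            Ok(v) => v,
            Err(e) => {
                eprintln!("storage error in RPC handler: {e:?}");
                return Err(500);
            }
        }
    };
}

fn handler(read: Result<u64, String>) -> Result<u64, u16> {
    let value = try_store!(read);
    Ok(value)
}

fn main() {
    assert_eq!(handler(Ok(9)), Ok(9));
    assert_eq!(handler(Err("backend down".into())), Err(500));
}
```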


Reviews (1): Last reviewed commit: "make Store's public methods return Resul..."

@@ -236,21 +241,25 @@ impl BlockChainServer {
}


P1 DB error silently treated as missing block during walk-up

In process_or_pend_block, the walk-up loop uses while let Ok(Some(header)) = self.store.get_block_header(&missing_root), which breaks silently on both Ok(None) and Err(...). When a storage error occurs mid-walk, the code falls through to self.request_missing_block(missing_root) — treating a read failure as if the block is genuinely absent from the DB. This issues a spurious P2P request for a block the node already holds, and on re-delivery the cycle restarts.
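One way to address this (a sketch with simplified stand-in types, not the actual process_or_pend_block signature) is to match explicitly so that Err(...) is handled separately from Ok(None):

```rust
// Sketch: distinguish a genuinely missing block (Ok(None)) from a
// storage failure (Err) in the walk-up loop, so a read error never
// triggers a spurious P2P request.
enum Step {
    RequestMissing(u64),
    StorageError(String),
    Done,
}

fn walk_up(mut root: u64, get_header: impl Fn(u64) -> Result<Option<u64>, String>) -> Step {
    loop {
        match get_header(root) {
            Ok(Some(parent)) => {
                if parent == 0 {
                    return Step::Done; // reached a known ancestor
                }
                root = parent;
            }
            // Genuinely absent: requesting it over P2P is correct.
            Ok(None) => return Step::RequestMissing(root),
            // Read failure: surface the error, do NOT hit the network.
            Err(e) => return Step::StorageError(e),
        }
    }
}

fn main() {
    let res = walk_up(5, |r| if r == 5 { Err("io".into()) } else { Ok(None) });
    assert!(matches!(res, Step::StorageError(_)));
}
```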


Comment on lines 453 to 461
@@ -446,7 +460,7 @@ fn on_block_core(
// This check ensures the state has been computed for the parent block.
let parent_state =

P2 Redundant store.time() read inside hot tick loop

After store.set_time(t + 1) succeeds, new_time is necessarily t + 1 — there is no need to call store.time() again. Each extra call does a full DB round-trip and the loop runs once per 800 ms interval on every tick. new_time can simply be replaced with t + 1.

Suggested change:

```rust
let new_time = t + 1;
```



Development

Successfully merging this pull request may close these issues.

Store swallows DB errors with .expect(), propagate Result through storage layer

1 participant