fix: make Store's public methods return Result<T, Error>, propagate and update every caller accordingly #340
Conversation
Greptile Summary

This PR replaces all panicking storage accesses with `Result<T, Error>` propagation: every public `Store` method that performs disk I/O now returns `Result`, and every caller handles or forwards the error.
Confidence Score: 3/5

Safe to merge after fixing the walk-up loop DB-error handling; all other changes are mechanical and correct. One P1 logic bug: a storage read error in the pending-block walk-up loop is silently coerced to "block not found", causing spurious P2P requests that restart the cycle on re-delivery. The rest of the error-propagation work is solid and the happy path is unaffected. Key file: crates/blockchain/src/lib.rs (the walk-up loop DB-error handling).
| Filename | Overview |
|---|---|
| `crates/storage/src/store.rs` | Core store: all public methods now return `Result`; `.expect()` replaced with `?`; error propagation is clean and consistent throughout |
| `crates/blockchain/src/lib.rs` | Actor layer updated to propagate storage errors; P1 issue: `while let Ok(Some(...))` in walk-up loop silently converts DB errors to missing-block, triggering spurious network requests |
| `crates/blockchain/src/store.rs` | Fork-choice logic updated to propagate errors; redundant `store.time()` DB read inside hot tick loop (P2) |
| `crates/net/p2p/src/req_resp/handlers.rs` | Block-by-root handler and `build_status` updated; DB errors in block lookup silently swallowed without logging (P2) |
| `crates/net/rpc/src/fork_choice.rs` | Introduced local `try_store!` macro for 500 responses; macro discards error details without logging (P2) |
| `crates/net/rpc/src/lib.rs` | RPC handlers correctly map storage errors to 500 and missing resources to 404 |
| `bin/ethlambda/src/main.rs` | `from_anchor_state` now propagates via `map_err`; `BlockChain::spawn` uses `.expect()` at startup, which is acceptable for initialization failures |
| `bin/ethlambda/src/checkpoint_sync.rs` | Adds `Storage(#[from] ethlambda_storage::Error)` variant to `CheckpointSyncError`; straightforward and correct |
| `crates/storage/src/lib.rs` | Re-exports `Error` from the api module; trivial one-line change |
| `crates/blockchain/tests/forkchoice_spectests.rs` | Test file updated to unwrap `Result`s; correct use of `.unwrap()` in test context |
| `crates/blockchain/tests/signature_spectests.rs` | Test file updated to unwrap the `get_forkchoice_store` `Result`; minimal change, correct |
Comments Outside Diff (2)
**DB errors silently swallowed with no log** (crates/net/p2p/src/req_resp/handlers.rs, lines 901-906)

The comment "DB errors are silently skipped (per spec)" conflates two distinct cases: the spec says missing blocks may be skipped, but a storage error is not a missing block — it signals a potential disk fault. Swallowing it without a log makes these failures invisible in production.
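A minimal sketch of the kind of fix this comment asks for: keep the spec-permitted skip on a genuinely missing block, but log read failures. All names here (`get_block`, `serve_blocks`, `StorageError`) are stand-ins rather than the handler's actual identifiers, and `eprintln!` stands in for the project's logger.

```rust
type Root = [u8; 32];
struct Block;

#[derive(Debug)]
struct StorageError(String);

// Stand-in for the real store call; only the Result<Option<_>, _> shape matters.
fn get_block(_root: &Root) -> Result<Option<Block>, StorageError> {
    Ok(None)
}

fn serve_blocks(roots: &[Root]) {
    for root in roots {
        match get_block(root) {
            Ok(Some(_block)) => { /* send the block to the requesting peer */ }
            // Missing block: skipping is allowed by spec.
            Ok(None) => continue,
            // Read failure: not a missing block, so make it visible.
            Err(e) => {
                eprintln!("storage error serving block-by-root {root:?}: {e:?}");
                continue;
            }
        }
    }
}

fn main() {
    serve_blocks(&[[0u8; 32]]);
}
```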
**`try_store!` macro discards error context with no log** (crates/net/rpc/src/fork_choice.rs, lines 940-947)

`Err(_)` silently drops the actual storage error before returning a 500. Storage errors at this layer indicate a potential disk or backend issue and should be logged so operators can diagnose them.
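A sketch of one way to keep the macro's control flow while logging the error first. This is hypothetical, assuming only that `try_store!` unwraps a `Result` or returns a 500; `Response` and `internal_server_error` are stand-ins for whatever the RPC layer actually returns, and `eprintln!` for its logger.

```rust
struct Response {
    status: u16,
}

fn internal_server_error() -> Response {
    Response { status: 500 }
}

// Same short-circuit as before, but the storage error is logged
// before being collapsed into an opaque 500.
macro_rules! try_store {
    ($expr:expr) => {
        match $expr {
            Ok(v) => v,
            Err(e) => {
                eprintln!("storage error in RPC handler: {e:?}");
                return internal_server_error();
            }
        }
    };
}

fn handler(result: Result<u64, std::io::Error>) -> Response {
    let _value = try_store!(result);
    Response { status: 200 }
}

fn main() {
    assert_eq!(handler(Ok(7)).status, 200);
    let err = std::io::Error::new(std::io::ErrorKind::Other, "disk fault");
    assert_eq!(handler(Err(err)).status, 500);
}
```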
Reviews (1): Last reviewed commit: "make Store's public methods return Resul..."
crates/blockchain/src/lib.rs:

```
@@ -236,21 +241,25 @@ impl BlockChainServer {
    }
```
**DB error silently treated as missing block during walk-up** (crates/blockchain/src/lib.rs, line 242)

In `process_or_pend_block`, the walk-up loop uses `while let Ok(Some(header)) = self.store.get_block_header(&missing_root)`, which breaks silently on both `Ok(None)` and `Err(...)`. When a storage error occurs mid-walk, the code falls through to `self.request_missing_block(missing_root)` — treating a read failure as if the block is genuinely absent from the DB. This issues a spurious P2P request for a block the node already holds, and on re-delivery the cycle restarts.
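A minimal sketch of one way to restructure the loop so a read error stops the walk without being mistaken for an absent block. The store call and the P2P request are stubbed; the names mirror the comment but the bodies are placeholders.

```rust
type Root = [u8; 32];

struct Header {
    parent_root: Root,
}

#[derive(Debug)]
struct StorageError;

// Stubs standing in for the real store and network layers.
fn get_block_header(_root: &Root) -> Result<Option<Header>, StorageError> {
    Ok(None)
}
fn request_missing_block(_root: Root) { /* issue a P2P block request */ }

fn walk_up(mut missing_root: Root) {
    loop {
        match get_block_header(&missing_root) {
            // Parent is known locally: keep walking up the chain.
            Ok(Some(header)) => missing_root = header.parent_root,
            // Genuinely absent: only now is a network request warranted.
            Ok(None) => {
                request_missing_block(missing_root);
                break;
            }
            // Read failure: bail out instead of requesting a block
            // the node may already hold.
            Err(e) => {
                eprintln!("storage error during walk-up: {e:?}");
                break;
            }
        }
    }
}

fn main() {
    walk_up([0u8; 32]);
}
```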
crates/blockchain/src/store.rs:

```
@@ -446,7 +460,7 @@ fn on_block_core(
    // This check ensures the state has been computed for the parent block.
    let parent_state =
```
**Redundant `store.time()` read inside hot tick loop** (crates/blockchain/src/store.rs, lines 453-461)

After `store.set_time(t + 1)` succeeds, `new_time` is necessarily `t + 1` — there is no need to call `store.time()` again. Each extra call does a full DB round-trip and the loop runs once per 800 ms interval on every tick. `new_time` can simply be replaced with `t + 1`.
```suggestion
let new_time = t + 1;
```
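A runnable sketch of the suggested change with a stubbed in-memory store: once `set_time(t + 1)` returns `Ok`, the new value is known locally, so the follow-up `store.time()` read can be dropped.

```rust
use std::cell::Cell;

struct Store {
    time: Cell<u64>,
}

impl Store {
    fn time(&self) -> Result<u64, String> {
        Ok(self.time.get())
    }
    fn set_time(&self, t: u64) -> Result<(), String> {
        self.time.set(t);
        Ok(())
    }
}

fn tick(store: &Store) -> Result<u64, String> {
    let t = store.time()?;
    store.set_time(t + 1)?;
    // The write succeeded, so the stored time is exactly t + 1;
    // re-reading it from the store would be a redundant round-trip.
    let new_time = t + 1;
    Ok(new_time)
}

fn main() {
    let store = Store { time: Cell::new(0) };
    assert_eq!(tick(&store), Ok(1));
}
```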
🗒️ Description / Motivation
All public `Store` methods that perform disk I/O previously panicked on failure via `.expect()`. This PR makes every such method return `Result<T, Error>` and propagates errors through every caller: the actor layer, RPC handlers, p2p handlers, fork-choice logic, and the binary entrypoint. Panicking on a recoverable storage error is unsafe in a long-running consensus node; this change ensures errors are surfaced and handled deliberately.

What Changed

- `crates/storage/src/store.rs`: `from_anchor_state` / `get_forkchoice_store` / `init_store` → `Result<Self, Error>`; `set_time`, `set_safe_target`, `insert_state`, `write_signed_block`, `prune_live_chain`, `prune_old_states`, `prune_old_blocks` all return `Result`; `head_slot` / `safe_target_slot` use `.ok_or_else()` instead of `.unwrap()`
- `crates/storage/src/lib.rs`: re-exports `Error`
- `crates/blockchain/src/store.rs`: added `StoreError::Storage(#[source] ethlambda_storage::Error)` variant + `From` impl; all storage calls use `?` or two-step `?.ok_or()`; affected functions updated to return `Result` (see the sketch after this list)
- `crates/blockchain/src/lib.rs`: actor layer uses `let Ok(...) else { return }` for critical paths; non-critical reads use `if let Ok(...)`; errors logged via `inspect_err`
- `crates/net/rpc/src/lib.rs`: `get_latest_finalized_state` / `get_latest_justified_state` map storage errors to 500, missing resources to 404
- `crates/net/rpc/src/fork_choice.rs`: local `try_store!` macro short-circuits on storage error with 500; non-critical header reads use `.ok().flatten()`
- `crates/net/p2p/src/req_resp/handlers.rs`: block lookup uses `if let Ok(Some(b))`; `build_status` uses `.unwrap_or_default()` for graceful degradation
- `bin/ethlambda/src/checkpoint_sync.rs`: added `Storage(#[from] ethlambda_storage::Error)` to `CheckpointSyncError`
- `bin/ethlambda/src/main.rs`: removed `Ok(...)` wrapper around `from_anchor_state`; uses `map_err` for error conversion
- `crates/blockchain/tests/forkchoice_spectests.rs` / `signature_spectests.rs`: all store calls updated to handle `Result` (`.unwrap()` in test context)
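To make the wrapping pattern concrete, here is a self-contained sketch of the `StoreError::Storage` variant plus `From` impl and the two-step `?.ok_or()` described above. Types are stand-ins; the real code declares the variant with thiserror's `#[source]` attribute.

```rust
#[derive(Debug)]
struct StorageBackendError(String);

#[derive(Debug)]
enum StoreError {
    // The real variant carries the source error via thiserror's #[source].
    Storage(StorageBackendError),
    MissingState,
}

impl From<StorageBackendError> for StoreError {
    fn from(e: StorageBackendError) -> Self {
        StoreError::Storage(e)
    }
}

// Stand-in for a storage read: Err is an I/O fault, Ok(None) is "not there".
fn read_state(found: bool) -> Result<Option<u64>, StorageBackendError> {
    if found { Ok(Some(42)) } else { Ok(None) }
}

// `?` converts the backend error via From; `.ok_or()` turns absence
// into a distinct, non-I/O error.
fn get_state(found: bool) -> Result<u64, StoreError> {
    read_state(found)?.ok_or(StoreError::MissingState)
}

fn main() {
    assert!(matches!(get_state(true), Ok(42)));
    assert!(matches!(get_state(false), Err(StoreError::MissingState)));
}
```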
Correctness / Behavior Guarantees

- pruning uses `.filter_map` rather than aborting the entire prune
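A small illustration of that skip-on-error behavior, with hypothetical entry loading; any entry that fails to load is dropped from the pass instead of failing the whole prune with `?`.

```rust
#[derive(Debug)]
struct DecodeError;

// Stand-in for loading one stored entry; parsing failure models a bad record.
fn load_entry(raw: &str) -> Result<u64, DecodeError> {
    raw.parse::<u64>().map_err(|_| DecodeError)
}

// Bad entries are skipped (and could be logged here) rather than
// aborting the entire prune pass.
fn prunable_slots(raw_entries: &[&str]) -> Vec<u64> {
    raw_entries
        .iter()
        .filter_map(|raw| load_entry(raw).ok())
        .collect()
}

fn main() {
    assert_eq!(prunable_slots(&["1", "oops", "3"]), vec![1, 3]);
}
```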
Tests Added / Run

Related Issues / PRs
- `.expect()`, propagate `Result` through storage layer #306