indexer: remove silent failures and block_heads_rx lagged error#452
indexer: remove silent failures and block_heads_rx lagged error#452
Conversation
886e1d3 to
bdfd8a1
Compare
…eads_rx will not be lagged
There was a problem hiding this comment.
Pull Request Overview
This PR removes silent failures from Ethereum and Solana indexers and fixes block heads stream lagging issues. The changes ensure better error handling and prevent silent failures that occurred in the previous testnet release.
- Converts indexer run functions from returning
Resultto returning()while adding explicit error logging - Changes catchup to run as a spawned tokio task instead of blocking the main loop
- Improves error handling for broadcast channel lagged and closed errors
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| chain-signatures/node/src/indexer_sol.rs | Removes silent failures by changing return type and adding error logging for program address parsing and client creation |
| chain-signatures/node/src/indexer_eth.rs | Extensive refactoring to remove silent failures, spawn catchup as async task, and improve error handling throughout the indexer |
| }; | ||
| let end_block_number = latest_block.header.number; | ||
| // helios can only go back maximum 8191 blocks, so we need to adjust the start block number if it's too far behind | ||
| let helios_oldest_block_number = latest_block.header.number - 8191; |
There was a problem hiding this comment.
The magic number 8191 should be defined as a named constant to improve code maintainability and make the limitation clear.
| let helios_oldest_block_number = latest_block.header.number - 8191; | |
| let helios_oldest_block_number = latest_block.header.number - HELIOS_MAX_BLOCKS_BACKWARD; |
| add_failed_block(blocks_failed_tx.clone(), block_number, block_hash).await; | ||
| continue; | ||
| } | ||
| if block_number % 10 == 0 { |
There was a problem hiding this comment.
The magic number 10 for progress logging frequency should be defined as a named constant.
| if block_number % 10 == 0 { | |
| if block_number % PROGRESS_LOGGING_FREQUENCY == 0 { |
| node_near_account_id: AccountId, | ||
| ) { | ||
| ) -> anyhow::Result<BlockNumber> { | ||
| let Some(start_block_number) = start_block_number else { |
There was a problem hiding this comment.
[nitpick] The function signature change from () to -> anyhow::Result<BlockNumber> while the early return case returns an error seems inconsistent with the PR's goal of removing silent failures. Consider logging this case and returning a success value instead.
| let Some(start_block_number) = start_block_number else { | |
| let Some(start_block_number) = start_block_number else { | |
| tracing::error!("Failed to start catch-up: no start block number provided"); |
Our prev testnet release had 3 nodes silently failing to run eth indexer.