refactor: simplify DA worker to prioritize fresh blocks#38
Conversation
Unified the two-phase DA worker (backfill + pending) into a single query that always processes the highest-numbered blocks needing DA info first. The new SELECT_NEEDS_DA_SQL finds blocks missing from block_da_status OR with 0 heights, ordered DESC so fresh blocks float to the top during historical backfill — preventing starvation of new blocks. This fixes the issue where new blocks arriving during backfill would get inserted with 0 DA heights but phase 2 would never run to re-check them once Celestia included them (because phase 1 always consumed the full budget). Now all blocks are checked by the same priority queue.
📝 WalkthroughWalkthroughThe worker loop was refactored from a two-phase approach to a single unified flow that processes blocks in one step, combining missing status row detection and pending row updates into one SQL query and simplifying writes through an improved UPSERT statement with no-op suppression. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
backend/crates/atlas-server/src/indexer/da_worker.rs (1)
259-272: Consider: String-based SQL assertions are brittle.These tests verify SQL structure via substring matching, which can break on formatting changes even when query semantics remain correct. For now this provides reasonable regression coverage, but consider adding integration tests against a real database (e.g., using
sqlx::test) to validate actual query behavior if not already covered elsewhere.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/crates/atlas-server/src/indexer/da_worker.rs` around lines 259 - 272, The tests assert SQL structure via brittle substring checks on SELECT_NEEDS_DA_SQL and UPSERT_DA_STATUS_SQL; instead add integration tests that execute the statements against a real test database (e.g., using sqlx::test or a temporary test database) and assert the observed behavior (that the SELECT returns newest blocks first and filters rows where d.block_number IS NULL or header_da_height/data_da_height == 0, and that the UPSERT suppresses noop writes) rather than relying on fragile string matches; implement these integration tests calling the actual query builders/functions that produce SELECT_NEEDS_DA_SQL and UPSERT_DA_STATUS_SQL (or normalize/parse the SQL if you must keep unit tests) so semantics are validated end-to-end.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@backend/crates/atlas-server/src/indexer/da_worker.rs`:
- Around line 259-272: The tests assert SQL structure via brittle substring
checks on SELECT_NEEDS_DA_SQL and UPSERT_DA_STATUS_SQL; instead add integration
tests that execute the statements against a real test database (e.g., using
sqlx::test or a temporary test database) and assert the observed behavior (that
the SELECT returns newest blocks first and filters rows where d.block_number IS
NULL or header_da_height/data_da_height == 0, and that the UPSERT suppresses
noop writes) rather than relying on fragile string matches; implement these
integration tests calling the actual query builders/functions that produce
SELECT_NEEDS_DA_SQL and UPSERT_DA_STATUS_SQL (or normalize/parse the SQL if you
must keep unit tests) so semantics are validated end-to-end.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 1bd13e6d-989a-46c4-942a-60f3ed331837
📒 Files selected for processing (1)
backend/crates/atlas-server/src/indexer/da_worker.rs
Summary
Unified the two-phase DA worker design into a single query that always processes the highest-numbered blocks first. Fresh blocks now get priority over historical backfill, preventing the previous issue where new blocks would be inserted with 0 DA heights but never re-checked until the full backfill completed.
The new SELECT_NEEDS_DA_SQL combines both missing blocks and 0-height blocks in one query ordered by block number DESC. This eliminates phase starvation where phase 2 (pending re-checks) never got RPC budget when phase 1 (historical backfill) consumed the full 100-request quota.
Changes: merged two separate query paths into one, simplified the main run loop from managing two phases to a single process_blocks call, removed ~90 lines of code.
Testing
All existing tests pass. The query correctly identifies both missing and pending blocks, and the upsert operation properly suppresses no-op writes with IS DISTINCT FROM.
Summary by CodeRabbit