
feat: add timeout for http and db requests #15

Open
pthmas wants to merge 1 commit into main from pierrick/fix-last-page-query

Conversation


@pthmas pthmas commented Feb 24, 2026

Overview

  • Replace expensive OFFSET scans on the blocks table with a keyset cursor (WHERE number <= cursor) derived from the block number, cutting last-page query time from 30s+ to <15ms.
  • Add a 10s Postgres statement_timeout on every pool connection and an HTTP TimeoutLayer to prevent any slow query from exhausting the connection pool and crashing the API.

Summary by CodeRabbit

  • New Features
  • Implemented request timeout handling: requests exceeding 10 seconds now return HTTP 408
    • Improved block listing pagination with keyset-based approach for better performance with large datasets
    • Added database query timeout (10 seconds) to prevent queries from running indefinitely


coderabbitai bot commented Feb 24, 2026

📝 Walkthrough

This change implements a multi-layered timeout strategy for the backend. The tower-http dependency is updated to enable its timeout feature, a 10-second HTTP request timeout is configured at the router level, database connections receive a 10-second statement_timeout via a pool-initialization hook, and block pagination switches from an OFFSET-based to a keyset-based approach.

Changes

Cohort / File(s) | Summary

Timeout Infrastructure
backend/Cargo.toml, backend/crates/atlas-api/src/main.rs, backend/crates/atlas-common/src/db.rs
Added the timeout feature to tower-http; configured TimeoutLayer middleware returning HTTP 408 on a 10-second request timeout; implemented a database-level statement timeout via an after_connect hook on PgPoolOptions.

Query Optimization
backend/crates/atlas-api/src/handlers/blocks.rs
Replaced OFFSET-based pagination with keyset pagination using a block-number cursor, improving performance for block listing queries.
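The cursor arithmetic behind the keyset approach can be sketched in plain Rust. This is a minimal model, not the project's handler code: it assumes block numbers are dense and sequential from 0 to total_count - 1 and that pages are listed newest-first, which is what the PR description implies.

```rust
// Minimal model of the keyset-cursor math used for block pagination.
// Assumption: block numbers are dense and sequential (0..total_count-1),
// listed in descending order (newest first).

/// Highest block number shown on `page` (1-based) with `limit` rows per page.
fn keyset_cursor(total_count: i64, page: i64, limit: i64) -> i64 {
    // Page 1 starts at the newest block (total_count - 1); each later page
    // moves the upper bound down by `limit` blocks.
    (total_count - 1) - (page - 1) * limit
}

/// Block numbers a `WHERE number <= cursor ORDER BY number DESC LIMIT limit`
/// query would return under the density assumption above.
fn page_rows(total_count: i64, page: i64, limit: i64) -> Vec<i64> {
    let cursor = keyset_cursor(total_count, page, limit);
    if cursor < 0 {
        return Vec::new();
    }
    (0..=cursor).rev().take(limit as usize).collect()
}

fn main() {
    // 95 blocks (0..=94), 10 per page: page 1 is 94..=85, page 10 is 4..=0.
    assert_eq!(page_rows(95, 1, 10), (85..=94).rev().collect::<Vec<_>>());
    assert_eq!(page_rows(95, 10, 10), vec![4, 3, 2, 1, 0]);
    println!("keyset pages consistent");
}
```

Because the WHERE clause hits the primary-key index, the database seeks directly to the cursor instead of scanning and discarding offset rows, which is the source of the 30s-to-15ms improvement claimed above.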

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐰 Timeouts tick and cursors hop,
Keyset bounds make queries stop,
Ten seconds, measured, clean and tight,
Database and requests work just right!
No more waiting, code runs true,
Optimization in every queue!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check (Passed): Check skipped; CodeRabbit’s high-level summary is enabled.
  • Title Check (Passed): The title accurately summarizes the main changes: adding timeout mechanisms for HTTP requests and database operations across multiple files.
  • Docstring Coverage (Passed): Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.




@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
backend/crates/atlas-api/src/main.rs (1)

8-11: Align timeout responses with the standard API error envelope.

All handlers return errors through ApiResult<Json<T>>, which serializes via ApiError::into_response() to produce a consistent JSON envelope: {"error": "message"}. However, TimeoutLayer at lines 214-217 returns a bare 408 status code without a response body, creating inconsistency for clients expecting the standard error format.

Consider wrapping with a custom error handler layer to serialize timeouts into the same envelope as other API errors.
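For reference, the envelope the nitpick refers to is the {"error": "message"} shape; a wrapped timeout handler would need to emit the same body. A minimal sketch of that body construction follows, without serde_json (which a real implementation would likely use) and with hand-rolled escaping limited to quotes and backslashes:

```rust
// Sketch of the JSON envelope a timeout handler could produce so clients
// always receive {"error": "..."} rather than a bare 408 status.
// Escaping here is deliberately minimal; use serde_json in real code.

fn error_envelope(message: &str) -> String {
    let escaped = message.replace('\\', "\\\\").replace('"', "\\\"");
    format!("{{\"error\":\"{}\"}}", escaped)
}

fn main() {
    assert_eq!(
        error_envelope("request timed out"),
        r#"{"error":"request timed out"}"#
    );
    println!("{}", error_envelope("request timed out"));
}
```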

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/crates/atlas-api/src/main.rs` around lines 8 - 11, TimeoutLayer
currently returns a bare 408 response that bypasses our ApiResult/ApiError JSON
envelope; modify the timeout handling so timeouts are converted into our
standard ApiError and serialized via ApiError::into_response (or wrap
TimeoutLayer with a custom layer/handler) so controllers still return
ApiResult<Json<T>> style errors. Locate where TimeoutLayer is added (the tower
layer setup using TimeoutLayer) and replace or wrap it with a layer that
intercepts timeout errors, maps them to an ApiError variant (e.g.,
ApiError::timeout or ApiError::new with message "request timed out"), and
produces the same JSON envelope by calling ApiError::into_response before
returning the response. Ensure the handler uses the existing ApiError type and
serialization path so clients always receive {"error": "..."} for timeouts.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between af3daf4 and 4fb1182.

📒 Files selected for processing (4)
  • backend/Cargo.toml
  • backend/crates/atlas-api/src/handlers/blocks.rs
  • backend/crates/atlas-api/src/main.rs
  • backend/crates/atlas-common/src/db.rs

Comment on lines +22 to +36
// Convert page-based navigation to a keyset cursor using block numbers.
// Blocks are sequential so: cursor = max_block - (page - 1) * limit
// WHERE number <= cursor is O(log N) via primary key; OFFSET was O(N).
let limit = pagination.limit();
let cursor = (total_count - 1) - pagination.offset();

let blocks: Vec<Block> = sqlx::query_as(
    "SELECT number, hash, parent_hash, timestamp, gas_used, gas_limit, transaction_count, indexed_at
     FROM blocks
     WHERE number <= $2
     ORDER BY number DESC
     LIMIT $1"
)
.bind(limit)
.bind(cursor)


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

rg -n "struct Pagination|impl Pagination|fn offset" backend/crates/atlas-common/src/types.rs

Repository: evstack/atlas

Length of output: 144


🏁 Script executed:

sed -n '358,400p' backend/crates/atlas-common/src/types.rs

Repository: evstack/atlas

Length of output: 974


Pagination keyset cursor calculation has a critical bug: offset() uses unclamped limit.

The offset() method in Pagination returns (page - 1) * self.limit using the raw client-provided limit, while limit() applies a cap of 100. This mismatch breaks the cursor calculation in the keyset query.

Example: A client requesting page=2, limit=200 will get:

  • offset() = (2-1) × 200 = 200 (using raw limit)
  • limit() = min(200, 100) = 100 (clamped)
  • cursor = total_count - 1 - 200
  • Query fetches only 100 rows but skips 200, causing duplicate/skipped pages.

Fix: either apply the same clamping in offset() by computing (self.page.saturating_sub(1)) * self.limit.min(100), or recompute the cursor using the clamped limit directly: cursor = (total_count - 1) - pagination.page.saturating_sub(1) * pagination.limit().
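The first variant of the fix can be sketched as follows. The Pagination struct here is a stand-in mirroring the fields described in the review, not the actual type from atlas-common; the 100-row cap matches the clamp the review quotes.

```rust
// Sketch of the suggested fix: offset() and limit() must clamp consistently.
// `Pagination` is a stand-in for the struct described in the review.

const MAX_LIMIT: i64 = 100;

struct Pagination {
    page: i64,
    limit: i64,
}

impl Pagination {
    fn limit(&self) -> i64 {
        self.limit.min(MAX_LIMIT)
    }

    // Buggy version used the raw limit: (self.page - 1) * self.limit.
    // Fixed version uses the clamped limit so the cursor skip matches
    // the LIMIT the query binds.
    fn offset(&self) -> i64 {
        self.page.saturating_sub(1) * self.limit()
    }
}

fn main() {
    let p = Pagination { page: 2, limit: 200 };
    // With the fix, page 2 skips exactly one clamped page (100 rows),
    // matching the 100-row LIMIT bound into the SQL.
    assert_eq!(p.limit(), 100);
    assert_eq!(p.offset(), 100);
    println!("offset/limit consistent");
}
```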

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/crates/atlas-api/src/handlers/blocks.rs` around lines 22 - 36, The
cursor calculation for keyset pagination uses Pagination::offset() which
multiplies by the raw, unclamped limit causing skipped/duplicated pages; fix by
using the clamped limit when computing the cursor (e.g. replace cursor =
(total_count - 1) - pagination.offset() with a calculation that uses
pagination.limit(), or change Pagination::offset() to compute
(page.saturating_sub(1)) * self.limit.min(100)); update the cursor assignment in
the handler that computes `cursor` (and/or the Pagination::offset method) so
both use the same clamped limit to keep pages consistent with the SQL LIMIT
binding.

Comment on lines +4 to +15
/// Create a database connection pool.
/// Sets statement_timeout = 10s on every connection to prevent slow queries
/// from exhausting the pool.
pub async fn create_pool(database_url: &str, max_connections: u32) -> Result<PgPool, sqlx::Error> {
    PgPoolOptions::new()
        .max_connections(max_connections)
        .after_connect(|conn, _meta| {
            Box::pin(async move {
                conn.execute("SET statement_timeout = '10s'").await?;
                Ok(())
            })
        })


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Step 1: Find migration files
find . -path "*/migrations/*" -name "*.sql" -o -path "*/.migrations/*" -name "*.sql" | head -20

Repository: evstack/atlas

Length of output: 443


🏁 Script executed:

#!/bin/bash
# Step 2: Check directory structure for migrations
ls -la backend/ 2>/dev/null || echo "backend/ not found"
find . -type d -name migrations | head -10

Repository: evstack/atlas

Length of output: 544


🏁 Script executed:

#!/bin/bash
# Step 3: Search for run_migrations function to see how the pool is used
rg -A 10 "fn run_migrations|async fn run_migrations" --type rust

Repository: evstack/atlas

Length of output: 315


🏁 Script executed:

#!/bin/bash
# Step 4: Check the current file to see the full pool creation and migration usage
cat -n backend/crates/atlas-common/src/db.rs | head -50

Repository: evstack/atlas

Length of output: 1010


🏁 Script executed:

#!/bin/bash
# Read all migration files to check for long-running operations
for file in ./backend/migrations/*.sql; do
  echo "=== $(basename $file) ==="
  cat "$file"
  echo ""
done

Repository: evstack/atlas

Length of output: 15018


Statement timeout will block migrations on startup.

run_migrations receives the pool created with after_connect setting statement_timeout = 10s on every connection. The migration in 20240105 includes INSERT INTO tx_hash_lookup SELECT ... FROM transactions and creates GIN trigram indexes, both of which can exceed 10s on large tables. Other migrations also create indexes and ALTER TABLE operations that may timeout. Consider either running migrations with a separate connection/pool without the timeout, or making the timeout configurable for migration execution.
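The configurable-timeout option can be sketched as a small helper that only produces the SET statement when a timeout is requested, so a migration pool can pass None and skip the after_connect hook entirely. The function name and Option-based API are illustrative assumptions, not the project's code:

```rust
use std::time::Duration;

// Sketch: build the per-connection SET statement only when a timeout is
// configured. An API pool passes Some(10s); a migration pool passes None
// and no statement_timeout is installed.
fn statement_timeout_sql(timeout: Option<Duration>) -> Option<String> {
    timeout.map(|t| format!("SET statement_timeout = '{}ms'", t.as_millis()))
}

fn main() {
    // API pool: 10s cap on every statement.
    assert_eq!(
        statement_timeout_sql(Some(Duration::from_secs(10))).as_deref(),
        Some("SET statement_timeout = '10000ms'")
    );
    // Migration pool: no timeout installed at all.
    assert_eq!(statement_timeout_sql(None), None);
    println!("ok");
}
```

In create_pool, the after_connect hook would then only execute the statement when the helper returns Some, leaving migration connections free to run long index builds and backfills.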

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/crates/atlas-common/src/db.rs` around lines 4 - 15, The pool created
by create_pool uses after_connect to set a 10s statement_timeout which can cause
run_migrations to fail on long index/insert operations; change the
implementation so migrations run with a pool/connection that does not set the
10s timeout or make the timeout configurable: add an optional parameter (e.g.,
statement_timeout: Option<Duration> or a bool like set_statement_timeout) to
create_pool and only install the after_connect timeout when provided, or add a
new helper create_migration_pool that omits the after_connect handler, and
update run_migrations to use the migration-specific pool/connection instead of
the timeout-bearing pool.
