sdk: add retry with exponential backoff to RPC read methods #3099

Merged
snormore merged 4 commits into main from snor/sdk-rpc-retry
Feb 26, 2026

snormore commented Feb 25, 2026

Summary of Changes

  • Add retry with exponential backoff (3 retries, 500ms–5s) to all read-only RPC calls in DZClient using the existing backon dependency
  • Only retry transient network errors (IO, Reqwest, Middleware) — permanent errors like AccountNotFound fail immediately via a .when() filter, avoiding wasted backoff delays in the activator and CLI
  • Covers both the DZClient inherent methods and the DoubleZeroClient trait impl: get_balance, get_epoch, get_account, get_all, gets, get, get_program_accounts, get_transactions, and get_logs
  • Write methods (execute_transaction) and subscription loops (subscribe, gets_and_subscribe) are intentionally not retried
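
The retry policy described in the bullets above can be sketched without any external crate. This is a minimal, std-only illustration and not the PR's code, which uses backon: the function `retry_with_backoff`, its parameters, and the injected `sleep` callback are hypothetical names introduced here, and the doubling factor between delays is an assumption.

```rust
use std::time::Duration;

/// Hypothetical helper: run `op`, retrying up to `max_retries` times.
/// Only errors accepted by `is_retryable` are retried; any other error
/// is returned immediately without sleeping.
pub fn retry_with_backoff<T, E>(
    mut op: impl FnMut() -> Result<T, E>,
    is_retryable: impl Fn(&E) -> bool,
    max_retries: u32,
    min_delay: Duration,
    max_delay: Duration,
    mut sleep: impl FnMut(Duration),
) -> Result<T, E> {
    let mut delay = min_delay;
    let mut retries = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Transient error with retries left: wait, then try again.
            Err(err) if retries < max_retries && is_retryable(&err) => {
                sleep(delay);
                retries += 1;
                // Assumed growth factor of 2, capped at `max_delay`.
                delay = (delay * 2).min(max_delay);
            }
            // Permanent error, or retries exhausted: give up.
            Err(err) => return Err(err),
        }
    }
}
```

In real code `std::thread::sleep` could be passed as the `sleep` argument; taking it as a parameter keeps the sketch testable without actually waiting.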

Diff Breakdown

Category    Files  Lines (+/-)  Net
Core logic  1      +103 / -37   +66
Docs        1      +2 / -0      +2

Core logic change in a single file — retry wrappers with transient-error filtering on all RPC read calls.

Key files
  • smartcontract/sdk/rs/src/client.rs — added rpc_retry_builder(), is_retryable_rpc_error() helpers and wrapped all read-only RPC calls with filtered retry
  • CHANGELOG.md — added unreleased entry
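
As a rough illustration of what a transient-error filter like the one listed above might look like, here is a self-contained sketch. The enum `RpcErrorKind` is a hypothetical stand-in for the RPC client's real error type, and `is_transient` is an illustrative name, not the PR's `is_retryable_rpc_error()`. Only the classification rule (IO, Reqwest, and Middleware errors are retryable; everything else is permanent) comes from the PR description.

```rust
/// Hypothetical stand-in for the RPC client's error kinds.
/// The real SDK matches on the Solana client's error type instead.
#[derive(Debug)]
pub enum RpcErrorKind {
    Io(String),
    Reqwest(String),
    Middleware(String),
    /// User-facing RPC errors such as "AccountNotFound".
    ForUser(String),
    Other(String),
}

/// Returns true only for transient network-level failures.
pub fn is_transient(kind: &RpcErrorKind) -> bool {
    matches!(
        kind,
        RpcErrorKind::Io(_) | RpcErrorKind::Reqwest(_) | RpcErrorKind::Middleware(_)
    )
}
```

A predicate of this shape is what a retry filter would consult before sleeping, so a missing account fails on the first attempt.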

Testing Verification

  • All 99 existing SDK unit tests pass (cargo test -p doublezero_sdk)
  • Clippy clean with -Dclippy::all -Dwarnings
  • Verified that RpcError::ForUser("AccountNotFound") — returned by Solana's get_account() for non-existent accounts — is correctly classified as non-retryable, preventing the activator from wasting ~3.5s of backoff per missing account lookup
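
The ~3.5s figure above is consistent with three backoff delays that start at 500ms and double each time (500ms + 1s + 2s). The sketch below reproduces that arithmetic; the doubling factor and the absence of jitter are assumptions, and `backoff_schedule` and `total_backoff` are hypothetical helpers, not part of the SDK.

```rust
use std::time::Duration;

/// Delays slept before each retry, assuming the delay doubles
/// after every attempt and is capped at `max_delay`.
pub fn backoff_schedule(min_delay: Duration, max_delay: Duration, retries: u32) -> Vec<Duration> {
    let mut delays = Vec::with_capacity(retries as usize);
    let mut delay = min_delay;
    for _ in 0..retries {
        delays.push(delay);
        delay = (delay * 2).min(max_delay);
    }
    delays
}

/// Total time spent sleeping if every retry is used.
pub fn total_backoff(min_delay: Duration, max_delay: Duration, retries: u32) -> Duration {
    backoff_schedule(min_delay, max_delay, retries).iter().sum()
}
```

With 3 retries and a 500ms to 5s range, the 5s cap is never reached; it only matters if the retry count is raised.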

Wrap all read-only RPC calls in DZClient with backon retry (3 retries,
500ms-5s exponential backoff) to handle transient RPC timeouts gracefully.
Write methods and subscription loops are intentionally not retried.

Add a .when() filter to all retry sites so that only transient network
errors (IO, Reqwest, Middleware) are retried. Permanent errors like
AccountNotFound (RpcError::ForUser) now fail immediately instead of
retrying 3 times with backoff, which was causing activation delays in
the activator and e2e test timeouts.

Replace "latest" with "^1" for @types/bun in all TypeScript SDK packages
to avoid forced npm registry lookups that can hang indefinitely when the
registry is unreachable. Also add timeout-minutes: 10 to the sdk-test CI
job, and remove redundant per-package bun lockfiles in favor of the
workspace-level lockfile.

snormore marked this pull request as ready for review February 25, 2026 21:00
snormore enabled auto-merge (squash) February 25, 2026 21:13
snormore merged commit d065346 into main Feb 26, 2026
30 checks passed
snormore deleted the snor/sdk-rpc-retry branch February 26, 2026 05:51