fix(eth): RPC failover on broadcast + pin networks on JsonRpcProvider#59
Merged
BitHighlander merged 3 commits intodevelopfrom Apr 30, 2026
Merged
fix(eth): RPC failover on broadcast + pin networks on JsonRpcProvider#59BitHighlander merged 3 commits intodevelopfrom
BitHighlander merged 3 commits intodevelopfrom
Conversation
Tenderly's public Optimism endpoint started rate-limiting broadcasts (HTTP 429 / -32005). getProvider() did failover for the pre-flight getBlockNumber but once a working URL was selected, every subsequent call (broadcast, getFeeData, etc.) hit that single URL forever. Result: signing succeeded, broadcast failed, dApp got SERVER_ERROR. - broadcastTransaction now iterates the full RPC list with a transient/definitive error classifier. Rate limits, 5xx, and network errors fail over to the next URL; nonce / insufficient-funds / gas errors stop immediately so we don't waste candidates on a doomed tx. 'already known' is treated as success (the tx is in mempool somewhere) using the locally-recovered hash from the signed bytes. - Last-resort RPC list (chains/lastResortRpcs.ts) appended AFTER Pioneer's URLs in the failover sequence. Pioneer always wins priority; the hardcoded list only fires when Pioneer's full set is exhausted, so a stale entry can never preempt a working live node — addresses the historical concern about hardcoded RPC tables blocking releases. - New makeStaticProvider helper (registry.ts) constructs JsonRpcProvider with a pinned network. ethers v6's auto-detect retried eth_chainId every 1s indefinitely on slow/dead/rate-limited URLs, generating background spam from any timed-out caller (Promise.race rejected the awaiting promise but left zombie retries running). All 6 sites in index.ts + ethereumHandler.ts now use the helper. Bad URLs fail fast on the actual call. - failedRpcs cooldown is honored across the broadcast path so a URL that just rate-limited isn't re-tried for 60s. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…h helper, preserve ProviderRpcError
- P1: makeStaticProvider gains optional { timeoutMs } that builds a FetchRequest with a per-HTTP-attempt timeout AND sets ThrottleParams.maxAttempts=1. Without this, ethers' default FetchRequest retries 429/5xx with exponential backoff for ~30s before surfacing the error — the exact stall the failover loop was meant to eliminate. Broadcast attempts now use timeoutMs:4000 so a hung URL fails over within ~4s.
- P2: handleEthSendRawTransaction now calls broadcastTransaction(params[0]) instead of inlining getProvider().broadcastTransaction. Raw signed tx submissions (dApps that sign externally) get the same Pioneer + last-resort iteration, transient/definitive classification, and already-known handling as eth_sendTransaction.
- P2: sendTransaction's catch passes ProviderRpcErrors through unchanged. Was wrapping every error as 'Error sending transaction', swallowing the classifier's output (insufficient funds, all-RPC-failed, etc.).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses two of the audit findings tagged P1: - P1#3 EVM signing preflight now uses a candidate-loop helper. New `withRpcFailover` shares the same Pioneer + last-resort priority and failedRpcs cooldown as broadcastTransaction (factored into `getCandidateRpcs`). signTransaction's nonce / estimateGas / getFeeData calls each fail over independently — a rate-limited URL on one read no longer breaks signing before broadcast can rescue. Each attempt gets a 5s transport timeout so a hung URL doesn't stall. - P1#4 getProvider's static provider now passes timeoutMs:5000 so ethers' transport itself aborts at 5s. Previously the awaited `Promise.race(getBlockNumber, timeout)` raced-out at the application layer while abandoned ethers requests kept retrying internally. The outer Promise.race becomes redundant once the transport-level timeout exists; dropped it. - broadcastTransaction's candidate-list/cooldown duplication is collapsed into `getCandidateRpcs` so future failover sites have one place to extend. Out of scope (separate PRs needed): - UTXO build-then-approval race + UTXO broadcast resilience (5 chains) - Solana send-time failover beyond health check - Tron raw-broadcast injected path - Background health polling overlap guard - Pioneer portfolio/custom-token endpoint timeouts - EVM read paths (balance/token validation) for non-broadcast resilience Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
9 tasks
BitHighlander
added a commit
that referenced
this pull request
Apr 30, 2026
… failover (#60) * fix(audit): UTXO build race + cross-chain broadcast resilience + read failover Addresses the eight remaining audit findings from PR #59's review. P1 — UTXO build-then-approval race (#1) and broadcast resilience (#2): - bitcoin/dogecoin/litecoin/bitcoincash/dash all awaited buildTx() fire-and-forget then created the approval event afterwards. Race produced phantom approvals (response.unsignedTx undefined) or null storedEvent depending on timing. Now buildTx is awaited BEFORE addEvent, with unsignedTx attached to the event from the start. Also applied to thorchain/cosmos/maya/osmosis/ripple — they already awaited the build but used raw fetch() without timeout/retry. - Both build and broadcast now go through fetchJsonWithTimeout with 15s budget + 1 retry on 5xx. P2 — cross-cutting resilience: - New chrome-extension/src/background/fetchUtils.ts with fetchJsonWithTimeout + fetchWithTimeout. Applied to all six Pioneer endpoints in index.ts (portfolio, insight, nodes, tokens/metadata, tokens/custom GET/POST/DELETE, tokens/balances) and the UTXO/Tendermint chain handlers above. - New chrome-extension/src/background/chains/rpcFailover.ts with withRpcFailoverByNetworkId. Iterates user-override → Pioneer → last-resort with transient/definitive classification and per-attempt transport timeout. Applied to GET_ASSET_BALANCE, GET_EVM_BALANCE, VALIDATE_ERC20_TOKEN. - Drop check binds to the URL that accepted the broadcast — querying a different RPC could produce false-positive drop warnings, especially after a last-resort fallback succeeded. dropCheckUrlByHash maps hash → success URL; performDropCheck queries that exact URL with a 4s timeout, falling back to getProvider only on service-worker restart. - Solana broadcastTransaction iterates SOLANA_RPC_URLS on transient failures (rate limit, 5xx, timeout). Definitive errors (insufficient funds, blockhash expired, signature verification, account-in-use) skip the loop. Cached health-checked URL is invalidated on failure so the next caller re-tests. - Tron tronGridPost (injected bundle, can't import fetchUtils) gets inline timeout + one retry on 5xx. Broadcast paths get a longer 12s budget; reads stay at 8s. - Health poll (checkKeepKey) gets AbortSignal.timeout(3000) + a singleflight guard so a stalled localhost:1646 can't stack overlapping probes every 5s. Out of scope: fetchBalances per-chain RPC enrichment (line 500) still uses a single makeStaticProvider per chain since it's already inside a chain-iteration loop; further failover there would be over-engineering. GET_GAS_ESTIMATE (line 909) still uses the active provider directly — could be migrated to withRpcFailover later. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(review): Solana classifier + Bitcoin fall-through Two findings from PR #60's review. P1: Solana broadcast classifier was over-eager about definitive errors. - 'already processed' is treated as success — the tx is in mempool / processed somewhere; the dApp still needs the signature. Recover it from the signed bytes (first 64-byte sig in the signed-tx layout IS the canonical Solana transaction signature). Inline base58 encoder added; reuses the existing BASE58_ALPHABET const that base58Decode already declared. - 'blockhash not found' demoted from definitive to transient — can be RPC freshness for dApp-supplied transactions; trying the next URL is worth it. 'block height exceeded' stays definitive (the blockhash window has truly closed). - 'account in use' demoted to transient — usually a parallel-write race that can resolve on the next URL. P2: bitcoinHandler's broadcast catch logged + sent transaction_error but didn't re-throw, so the case 'transfer' fell through to default and returned 'Method transfer not supported' to the dApp instead of the real broadcast error. Other UTXO/Tendermint handlers don't have this catch — only bitcoin. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Optimism broadcast hit rate-limit (Tenderly public, HTTP 429 /
-32005).getProvider()did failover for its pre-flightgetBlockNumberbut every subsequent call hit the same single URL — signing succeeded, broadcast failed.This PR makes broadcast iterate the full RPC list and adds a hardcoded last-resort table that runs after Pioneer's list (so it can never preempt a live node).
What changed
Broadcast failover (
broadcastTransactioninethereumHandler.ts)classifyBroadcastErrordistinguishes:definitive—insufficient funds,nonce too low,replacement transaction underpriced,gas required exceeds,intrinsic gas too low. Same outcome on any RPC; throw immediately.already-known— tx is in mempool somewhere; treat as success and return the locally-recovered hash from the signed bytes.transient— rate limit, 5xx, network, ethersSERVER_ERROR. Mark URL failed for 60s, try next.Last-resort RPC list (
chains/lastResortRpcs.ts, new)feedback_no_hardcoded_rpcs.md.Pinned-network
JsonRpcProvider(makeStaticProviderinchains/registry.ts)eth_chainIdto auto-detect the network on the first RPC call, and retries every 1s indefinitely if the URL is slow / dead / rate-limited.Promise.race(getBalance, timeout)rejects the awaiting promise but leaves the abandoned provider retrying — which is the source of theJsonRpcProvider failed to detect network and cannot start up; retry in 1sspam (50+ lines per balance refresh).staticNetwork: trueand the chainId at construction time so ethers skips detection. Bad URLs fail fast on the actual call.index.ts:485(fetchBalances),:891(GET_GAS_ESTIMATE),:1352(GET_ASSET_BALANCE),:1494(GET_EVM_BALANCE),:1701(VALIDATE_ERC20_TOKEN),ethereumHandler.ts:150(getProvider).Out of scope
signTransactionpre-flight calls (getTransactionCount,estimateGas,getFeeData) still hit a single URL viagetProvider(). If those rate-limit you'll see different failure modes — fee data unavailable, nonce 0, etc. Worth following up but a separate PR.Files
chrome-extension/src/background/chains/lastResortRpcs.ts(new)chrome-extension/src/background/chains/registry.ts—makeStaticProviderexportchrome-extension/src/background/chains/ethereumHandler.ts— broadcast loop rewrite, getProvider uses pinned helperchrome-extension/src/background/index.ts— 5 JsonRpcProvider sites use pinned helperTest plan
[HANDOFF] BEX → RPC (broadcast attempt) url=...for each attempt.nonce: 0) — should fail definitively after the first URL, not walk the whole list.fetchBalanceswith multiple chains in storage: console no longer shows the 50×JsonRpcProvider failed to detect networkspam after a chain switch. Bad URLs fail within 5s instead of looping.already knownpath: difficult to trigger naturally; skip if not reproducible.🤖 Generated with Claude Code