Skip to content

fix(audit): UTXO build race + cross-chain broadcast resilience + read failover#60

Merged
BitHighlander merged 2 commits intodevelopfrom
fix/audit-followup
Apr 30, 2026
Merged

fix(audit): UTXO build race + cross-chain broadcast resilience + read failover#60
BitHighlander merged 2 commits intodevelopfrom
fix/audit-followup

Conversation

@BitHighlander
Copy link
Copy Markdown
Collaborator

Summary

Closes the eight remaining audit findings from #59's review.

P1 — release blockers

#1 UTXO build-then-approval race (5 chains)

bitcoin/dogecoin/litecoin/bitcoincash/dash all started buildTx() fire-and-forget then created the approval event. Two race outcomes:

  • Build resolved before addEventgetEventById returned null and silently dropped the unsignedTx update.
  • User approved before build resolved → response.unsignedTx was undefined, signing crashed or used stale data.

Now await fetchJsonWithTimeout(...) for the build BEFORE creating the event, with unsignedTx attached to the event from creation. Same pattern applied to thorchain/cosmos/maya/osmosis/ripple for consistency (they already awaited but used raw fetch).

#2 UTXO/Tendermint broadcast no timeout/retry/failover

All build and broadcast calls now go through fetchJsonWithTimeout with 15s budget + 1 retry on 5xx. Explicit response.ok check before .json() so a transient Pioneer hiccup surfaces as a clean error instead of malformed-JSON parse failure.

P2 — cross-cutting resilience

#5 EVM read failover

New chrome-extension/src/background/chains/rpcFailover.ts with withRpcFailoverByNetworkId(networkId, op, options). Iterates user-override → Pioneer URLs → last-resort with transient/definitive classification and a 5s transport timeout per attempt. Applied to GET_ASSET_BALANCE, GET_EVM_BALANCE, VALIDATE_ERC20_TOKEN. Distinct failedRpcs cooldown from ethereumHandler's active-provider path so cross-chain rate limits stay independent.

#6 Drop check binds to success URL

scheduleDropCheck(hash, delayMs, successUrl?) now records the URL that accepted the broadcast in a hash→URL map. performDropCheck queries that exact URL with a 4s transport timeout instead of running getProvider() again — which could pick a different RPC that never saw the tx and produce false-positive drop warnings (especially after last-resort fallback succeeded). Map cleared after the latest scheduled check (45s). Service-worker restart loses the URL hint and falls back to getProvider — best effort.

#7 Solana send-time failover

broadcastTransaction iterates SOLANA_RPC_URLS (cached health-pick first, then any others). Transient errors (rate limit, 5xx, timeout) fail over; definitive errors (insufficient funds, blockhash expired, signature verification, account in use, already processed) skip the loop. Cached health pick invalidated on failure so the next caller re-tests.

#8 Tron raw broadcast resilience

tronGridPost (injected bundle — can't share fetchUtils) gets inline timeout + one retry on 5xx. Broadcast paths use 12s budget; reads stay at 8s.

#9 Health poll singleflight

checkKeepKey gets AbortSignal.timeout(3000) + a healthPollInflight guard. A stalled localhost:1646 request can't stack overlapping probes every 5s; worst case is one stalled request hung for 3s before the next tick can run.

#10 Pioneer endpoint timeouts

New chrome-extension/src/background/fetchUtils.ts with fetchJsonWithTimeout + fetchWithTimeout. Applied to all Pioneer endpoints (portfolio, insight, nodes, tokens/metadata, tokens/custom GET/POST/DELETE, tokens/balances) plus all UTXO/Tendermint chain handlers.

Out of scope

  • fetchBalances per-chain RPC enrichment at index.ts:500 — already inside an outer chain-iteration loop; nested failover is over-engineering.
  • GET_GAS_ESTIMATE at index.ts:909 — still uses the active provider directly. Migrate to withRpcFailover (the active-provider variant in ethereumHandler) in a follow-up if rate-limit complaints surface there.

Files

  • New: chrome-extension/src/background/fetchUtils.ts
  • New: chrome-extension/src/background/chains/rpcFailover.ts
  • Modified: 5 UTXO handlers, 5 Tendermint handlers, ethereumHandler, solanaHandler, tron-provider, index.ts (Pioneer timeouts + 3 read failover sites + health poll guard), injected.js bundle output

Test plan

  • BTC/LTC/DOGE/BCH/DASH transfer from side panel — popup opens with a populated unsignedTx (no flicker, no "loading transaction" hang)
  • Pioneer down test (block api.keepkey.info via /etc/hosts): build/broadcast surface a clean timeout error within ~15s instead of hanging forever
  • BSC/ETH balance refresh with one of Pioneer's primary RPCs rate-limiting — failover catches it; balance still loads
  • ERC-20 validation against a token on a Pioneer-rate-limited chain — fails over to the next URL
  • Solana broadcast with a flaky helius/solana endpoint — second URL accepts the tx
  • Tron transfer with TronGrid 5xx — retried once, succeeds on second attempt
  • Disconnect KeepKey for 30+ seconds, watch console for log volume — overlapping probes are gone (one log every 5s, not stacked)
  • Drop check after a broadcast that succeeded via last-resort RPC — [DROP-CHECK] line shows the right URL was queried
  • Existing flows from feat: Pioneer-sourced EVM chain registry + Solana sign-message UX #57/feat(ux): risk-tiered approval colors, hex dump + copy, deep-link Chainlist, typed timeouts #58/fix(eth): RPC failover on broadcast + pin networks on JsonRpcProvider #59: BSC switch, signMessage popup, broadcast failover — no regressions

🤖 Generated with Claude Code

BitHighlander and others added 2 commits April 30, 2026 17:23
… failover

Addresses the eight remaining audit findings from PR #59's review.

P1 — UTXO build-then-approval race (#1) and broadcast resilience (#2):
- bitcoin/dogecoin/litecoin/bitcoincash/dash all awaited buildTx() fire-and-forget then created the approval event afterwards. Race produced phantom approvals (response.unsignedTx undefined) or null storedEvent depending on timing. Now buildTx is awaited BEFORE addEvent, with unsignedTx attached to the event from the start. Also applied to thorchain/cosmos/maya/osmosis/ripple — they already awaited the build but used raw fetch() without timeout/retry.
- Both build and broadcast now go through fetchJsonWithTimeout with 15s budget + 1 retry on 5xx.

P2 — cross-cutting resilience:
- New chrome-extension/src/background/fetchUtils.ts with fetchJsonWithTimeout + fetchWithTimeout. Applied to all six Pioneer endpoints in index.ts (portfolio, insight, nodes, tokens/metadata, tokens/custom GET/POST/DELETE, tokens/balances) and the UTXO/Tendermint chain handlers above.
- New chrome-extension/src/background/chains/rpcFailover.ts with withRpcFailoverByNetworkId. Iterates user-override → Pioneer → last-resort with transient/definitive classification and per-attempt transport timeout. Applied to GET_ASSET_BALANCE, GET_EVM_BALANCE, VALIDATE_ERC20_TOKEN.
- Drop check binds to the URL that accepted the broadcast — querying a different RPC could produce false-positive drop warnings, especially after a last-resort fallback succeeded. dropCheckUrlByHash maps hash → success URL; performDropCheck queries that exact URL with a 4s timeout, falling back to getProvider only on service-worker restart.
- Solana broadcastTransaction iterates SOLANA_RPC_URLS on transient failures (rate limit, 5xx, timeout). Definitive errors (insufficient funds, blockhash expired, signature verification, account-in-use) skip the loop. Cached health-checked URL is invalidated on failure so the next caller re-tests.
- Tron tronGridPost (injected bundle, can't import fetchUtils) gets inline timeout + one retry on 5xx. Broadcast paths get a longer 12s budget; reads stay at 8s.
- Health poll (checkKeepKey) gets AbortSignal.timeout(3000) + a singleflight guard so a stalled localhost:1646 can't stack overlapping probes every 5s.

Out of scope: fetchBalances per-chain RPC enrichment (line 500) still uses a single makeStaticProvider per chain since it's already inside a chain-iteration loop; further failover there would be over-engineering. GET_GAS_ESTIMATE (line 909) still uses the active provider directly — could be migrated to withRpcFailover later.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two findings from PR #60's review.

P1: Solana broadcast classifier was over-eager about definitive errors.
- 'already processed' is treated as success — the tx is in mempool / processed somewhere; the dApp still needs the signature. Recover it from the signed bytes (first 64-byte sig in the signed-tx layout IS the canonical Solana transaction signature). Inline base58 encoder added; reuses the existing BASE58_ALPHABET const that base58Decode already declared.
- 'blockhash not found' demoted from definitive to transient — can be RPC freshness for dApp-supplied transactions; trying the next URL is worth it. 'block height exceeded' stays definitive (the blockhash window has truly closed).
- 'account in use' demoted to transient — usually a parallel-write race that can resolve on the next URL.

P2: bitcoinHandler's broadcast catch logged + sent transaction_error but didn't re-throw, so the case 'transfer' fell through to default and returned 'Method transfer not supported' to the dApp instead of the real broadcast error. Other UTXO/Tendermint handlers don't have this catch — only bitcoin.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@BitHighlander BitHighlander merged commit 1df4d20 into develop Apr 30, 2026
3 of 4 checks passed
@BitHighlander BitHighlander deleted the fix/audit-followup branch April 30, 2026 22:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant