Accelerate Reth RPC: send_raw_transaction throughput ceiling ~15k tx/s #8

@andy-thomason

Background

A load test (rbft-megatx, 100k txs, 4-node local testnet, 0.5 s block interval) consistently caps out at ~12,500–13,200 TPS, regardless of batch size (1k vs 10k) and regardless of whether ingress is spread across all 4 nodes. Timing instrumentation added to send_raw_transaction identifies the bottleneck.


Call chain for send_raw_transaction

rpc-eth-api/helpers/transaction.rs   send_raw_transaction
  rpc-eth-types/utils.rs               recover_raw_transaction
    alloy-consensus                      T::decode_2718_exact         ← RLP decode
    alloy-consensus                      try_into_recovered
      ethereum/primitives/transaction.rs   recover_signer
        alloy-consensus/crypto.rs            signature_hash()          ← keccak256 of signing payload
        alloy-consensus/crypto.rs (secp256k1) SECP256K1.recover_ecdsa ← *** hot path ***
        alloy-consensus/crypto.rs            keccak256(pubkey)[12..]   ← address derivation
  rpc/rpc/eth/helpers/transaction.rs   add_pool_transaction           ← pool insertion (lock)

The critical line is in alloy-consensus-1.7.3/src/crypto.rs:

let public = SECP256K1.recover_ecdsa(&Message::from_digest(*msg), &sig)?;

This calls into libsecp256k1 (C library). It accounts for roughly 20–25 µs of the 30 µs decode budget.


Timing data (100k txs, node0, single run)

Metric (all values in µs)           min  p50  p90  p99    max  avg
decode_us (RLP + ECDSA recovery)     23   30   42   52   4291   32
pool_us (txpool insertion)            7   13   29  220  30515   26
total_us                             35   49   76  294  30562   65
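
For anyone reproducing these numbers, the following is a minimal sketch of how such a summary could be computed from raw per-transaction samples. It is not taken from the instrumentation itself: the names LatencySummary, percentile and summarize are hypothetical, and the nearest-rank percentile definition is an assumption, since the issue does not say which definition was used.

```rust
/// Summary of a latency sample set, all values in microseconds.
/// (Hypothetical type, not part of reth.)
#[derive(Debug, PartialEq)]
struct LatencySummary {
    min: u64,
    p50: u64,
    p90: u64,
    p99: u64,
    max: u64,
    avg: u64,
}

/// Nearest-rank percentile over an already sorted, non-empty slice.
/// `pct` is expected to be in 1..=100.
fn percentile(sorted: &[u64], pct: usize) -> u64 {
    // Nearest-rank: rank = ceil(pct / 100 * n), 1-based.
    let rank = (pct * sorted.len() + 99) / 100;
    sorted[rank.max(1) - 1]
}

/// Returns `None` for an empty sample set.
fn summarize(samples: &[u64]) -> Option<LatencySummary> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    let sum: u64 = sorted.iter().sum();
    Some(LatencySummary {
        min: sorted[0],
        p50: percentile(&sorted, 50),
        p90: percentile(&sorted, 90),
        p99: percentile(&sorted, 99),
        max: sorted[sorted.len() - 1],
        // Integer average, truncated toward zero.
        avg: sum / sorted.len() as u64,
    })
}
```

With a long tail like the max values in the table, the average is pulled well above p50, which is why both are worth reporting.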

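Theoretical ceiling at 65 µs avg per tx, assuming transactions are handled one at a time: 1 / 65e-6 ≈ 15,400 tx/s per node. This is consistent with the observed ~12,500–13,200 TPS, which sits just below it.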
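
The ceiling arithmetic can be written down as a small helper. This is only a back-of-the-envelope sketch: ceiling_tps is a hypothetical name, and the workers parameter assumes ideal linear scaling with no lock contention, which the pool_us figures suggest would not hold in practice.

```rust
/// Upper bound on transactions per second when each transaction costs
/// `avg_us` microseconds and `workers` transactions are processed fully
/// in parallel (idealised: no contention, no shared locks).
fn ceiling_tps(avg_us: f64, workers: u32) -> f64 {
    workers as f64 * 1_000_000.0 / avg_us
}
```

ceiling_tps(65.0, 1) gives about 15,385 tx/s, the single-worker figure quoted above. Plugging in the 32 µs decode average alone gives 31,250 tx/s, which is the bound that would apply if pool insertion and other overhead were free.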

pool_us p99 (220–380 µs) spikes coincide with block commits, indicating txpool lock contention as a secondary bottleneck.

What does NOT help

  • 10k batch size vs 1k: near-identical per-call latency (71 µs/call vs 69 µs/call)
  • Round-robin across 4 nodes: each node still processes all txs via P2P gossip; sending to node 1 doesn't reduce node 0's ECDSA work

Opportunities

  1. Parallel ECDSA recovery within a batch — batches are processed serially inside MeteredBatchRequestsFuture::poll. Each batch of 1k txs takes ~50ms end-to-end. Spawning ECDSA recovery onto a rayon threadpool would allow all cores to be used.

  2. Reduce txpool lock contention at block commit time: pool_us p99 is 4–14× p50. The pool is locked during block execution/commit; consider separating the block-processing lock from the insertion lock.

  3. Sender caching — if the same signed tx is resubmitted (e.g. retry on pool-full), skip re-recovery by caching (tx_hash → sender).

  4. Hardware-accelerated secp256k1 — build libsecp256k1 with --with-asm or use an AVX-accelerated build, or use the pluggable crypto-backend trait in alloy-consensus to swap in a faster implementation.
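
As a rough illustration of opportunity 1, the sketch below fans a batch out across threads and returns results in the original order. It makes several assumptions. It uses std::thread::scope rather than rayon so that it runs without external crates. recover_sender is a hypothetical placeholder that hashes bytes instead of calling SECP256K1.recover_ecdsa. The u64 result stands in for a recovered address. None of this is reth or alloy code.

```rust
use std::thread;

/// Hypothetical stand-in for ECDSA sender recovery. It only does
/// deterministic CPU work (an FNV-1a hash) so the sketch is self-contained.
fn recover_sender(raw_tx: &[u8]) -> u64 {
    raw_tx
        .iter()
        .fold(0xcbf29ce484222325u64, |h, b| (h ^ *b as u64).wrapping_mul(0x100000001b3))
}

/// Recovers senders for a whole batch using up to `workers` threads.
/// Results are returned in the same order as `batch`.
fn recover_batch_parallel(batch: &[Vec<u8>], workers: usize) -> Vec<u64> {
    if batch.is_empty() {
        return Vec::new();
    }
    let workers = workers.max(1);
    // Ceiling division so every transaction lands in exactly one chunk.
    let chunk_len = (batch.len() + workers - 1) / workers;
    thread::scope(|s| {
        // Spawn one scoped thread per chunk; `collect` forces all spawns
        // to happen before any join.
        let handles: Vec<_> = batch
            .chunks(chunk_len)
            .map(|chunk| {
                s.spawn(move || chunk.iter().map(|tx| recover_sender(tx)).collect::<Vec<u64>>())
            })
            .collect();
        // Joining in spawn order preserves the original batch order.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("recovery worker panicked"))
            .collect()
    })
}
```

Spawning threads per batch has its own cost, so a long-lived pool (as the rayon suggestion implies) would likely be the better fit in a real server; the chunking and order-preserving join are the parts this sketch is meant to show.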


Instrumentation added (in ~/play/reth)

  • crates/rpc/rpc-eth-api/src/helpers/transaction.rs — logs decode_us and total_us per tx
  • crates/rpc/rpc/src/eth/helpers/transaction.rs — logs pool_us per tx
  • crates/rpc/rpc-builder/src/metrics.rs — logs batch_us and calls per HTTP batch

Enable with: RUST_LOG=warn,rpc::eth::timing=info
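
For readers who want to reproduce the measurement without the patched tree, here is a minimal sketch of the timing pattern, using only std::time::Instant. It is not the actual instrumentation: timed_us, TxTiming and time_submission are hypothetical names, the decode and insert closures are placeholders for recover_raw_transaction and add_pool_transaction, and the real patch presumably emits log lines rather than returning a struct.

```rust
use std::time::Instant;

/// Runs `stage` and returns its result together with the elapsed
/// wall-clock time in whole microseconds.
fn timed_us<T>(stage: impl FnOnce() -> T) -> (T, u128) {
    let start = Instant::now();
    let out = stage();
    (out, start.elapsed().as_micros())
}

/// Per-transaction timings, using the same field names as the log output.
#[derive(Debug)]
struct TxTiming {
    decode_us: u128,
    pool_us: u128,
    total_us: u128,
}

/// Times the two stages of one submission. `decode` and `insert` are
/// hypothetical placeholders for the decode/recover and pool-insert steps.
fn time_submission<T>(decode: impl FnOnce() -> T, insert: impl FnOnce(T)) -> TxTiming {
    let start = Instant::now();
    let (tx, decode_us) = timed_us(decode);
    let ((), pool_us) = timed_us(|| insert(tx));
    // total_us covers both stages plus any overhead between them.
    TxTiming { decode_us, pool_us, total_us: start.elapsed().as_micros() }
}
```

Because total_us is measured around both stages, it can exceed decode_us + pool_us, as it does in the averages reported above.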
