Accelerate Reth RPC: send_raw_transaction throughput ceiling ~15k tx/s

## Background

A load test (`rbft-megatx`, 100k txs, 4-node local testnet, 0.5 s block interval) consistently caps out at **~12,500–13,200 TPS**, regardless of batch size (1k vs 10k) and regardless of whether ingress is spread across all 4 nodes. Timing instrumentation added to `send_raw_transaction` identifies the bottleneck.

---

## Call chain for `send_raw_transaction`

```
rpc-eth-api/helpers/transaction.rs   send_raw_transaction
  rpc-eth-types/utils.rs               recover_raw_transaction
    alloy-consensus                      T::decode_2718_exact         ← RLP decode
    alloy-consensus                      try_into_recovered
      ethereum/primitives/transaction.rs   recover_signer
        alloy-consensus/crypto.rs            signature_hash()          ← keccak256 of signing payload
        alloy-consensus/crypto.rs (secp256k1) SECP256K1.recover_ecdsa ← *** hot path ***
        alloy-consensus/crypto.rs            keccak256(pubkey)[12..]   ← address derivation
  rpc/rpc/eth/helpers/transaction.rs   add_pool_transaction           ← pool insertion (lock)
```

The critical line is in `alloy-consensus-1.7.3/src/crypto.rs`:
```rust
let public = SECP256K1.recover_ecdsa(&Message::from_digest(*msg), &sig)?;
```
This calls into **libsecp256k1** (a C library). It accounts for roughly 20–25 µs of the ~30 µs p50 `decode_us`.

---

## Timing data (100k txs, node0, single run)

| Metric | min | p50 | p90 | p99 | max | avg |
|---|---|---|---|---|---|---|
| `decode_us` (RLP + ECDSA recovery) | 23 | 30 | 42 | 52 | 4291 | **32 µs** |
| `pool_us` (txpool insertion) | 7 | 13 | 29 | 220 | 30515 | **26 µs** |
| `total_us` | 35 | 49 | 76 | 294 | 30562 | **65 µs** |

Theoretical serial ceiling at 65 µs avg per tx: **1 / 65e-6 ≈ 15,400 tx/s** per node, which is in line with (and slightly above) the observed 12,500–13,200 TPS.
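
The ceiling arithmetic can be checked with a short sketch. It assumes every transaction is handled strictly serially at the average cost from the table; `serial_ceiling_tps` is an illustrative name, not a Reth function. The second figure is only a hypothetical upper bound for the case where the decode cost is moved entirely off the serial path.

```rust
/// Serial throughput ceiling in tx/s for a given average per-transaction cost (µs).
fn serial_ceiling_tps(avg_us: f64) -> f64 {
    1_000_000.0 / avg_us
}

fn main() {
    // Averages taken from the timing table above.
    let total_us = 65.0;
    let decode_us = 32.0;

    // Ceiling with the full per-tx cost on one serial path: about 15,385 tx/s.
    println!("serial ceiling: {:.0} tx/s", serial_ceiling_tps(total_us));

    // Assumption: if decode/recovery no longer sat on the serial path,
    // only the remaining 33 µs would bound throughput.
    println!(
        "ceiling without decode on the serial path: {:.0} tx/s",
        serial_ceiling_tps(total_us - decode_us)
    );
}
```

Note that the `decode_us` and `pool_us` averages sum to 58 µs, so about 7 µs of the 65 µs total is not attributed to either stage in this data.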

`pool_us` p99 (220–380 µs) spikes coincide with block commits, indicating **txpool lock contention** as a secondary bottleneck.

## What does NOT help

- **10k batch size** vs 1k: near-identical per-call latency (71 µs/call vs 69 µs/call)
- **Round-robin across 4 nodes**: each node still processes all txs via P2P gossip; sending to node 1 doesn't reduce node 0's ECDSA work

---

## Opportunities

1. **Parallel ECDSA recovery within a batch**: batches are processed serially inside `MeteredBatchRequestsFuture::poll`. Each batch of 1k txs takes ~50 ms end-to-end. Spawning ECDSA recovery onto a rayon threadpool would allow all cores to be used.

2. **Reduce txpool lock contention at block commit time**: in the run above, `pool_us` p99 is ~17× p50 (220 µs vs 13 µs). The pool is locked during block execution/commit; consider separating the block-processing lock from the insertion lock.

3. **Sender caching**: if the same signed tx is resubmitted (e.g. retry on pool-full), skip re-recovery by caching `(tx_hash → sender)`.

4. **Hardware-accelerated secp256k1**: build libsecp256k1 with `--with-asm` or use an AVX-accelerated build, or use the `crypto-backend` pluggable trait in alloy-consensus to swap in a faster implementation.
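
A minimal sketch of opportunity 1, under stated assumptions: it uses `std::thread::scope` instead of rayon so that it runs with the standard library alone, and `recover_sender_stub` is a hypothetical stand-in for the real decode + ECDSA recovery step (it only hashes bytes). It shows the shape of the change, splitting a batch into chunks and recovering them concurrently while preserving order, not how Reth would actually wire it in.

```rust
use std::thread;

/// Hypothetical stand-in for the expensive per-tx step (RLP decode + ECDSA recovery).
/// It folds the bytes with FNV-1a so the example needs no external crates.
fn recover_sender_stub(raw_tx: &[u8]) -> u64 {
    raw_tx.iter().fold(0xcbf2_9ce4_8422_2325u64, |h, &b| {
        (h ^ u64::from(b)).wrapping_mul(0x0000_0100_0000_01b3)
    })
}

/// Recover senders for a whole batch on up to `workers` scoped threads.
/// Results are returned in the same order as the input.
fn recover_batch_parallel(batch: &[Vec<u8>], workers: usize) -> Vec<u64> {
    if batch.is_empty() {
        return Vec::new();
    }
    let workers = workers.clamp(1, batch.len());
    let chunk_len = batch.len().div_ceil(workers);

    thread::scope(|scope| {
        // Spawn one thread per chunk; collecting first ensures all threads start
        // before any of them is joined.
        let handles: Vec<_> = batch
            .chunks(chunk_len)
            .map(|chunk| {
                scope.spawn(move || {
                    chunk
                        .iter()
                        .map(|tx| recover_sender_stub(tx))
                        .collect::<Vec<u64>>()
                })
            })
            .collect();

        // Join in spawn order so output order matches input order.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("recovery worker panicked"))
            .collect::<Vec<u64>>()
    })
}

fn main() {
    let batch: Vec<Vec<u8>> = (0u32..1_000).map(|i| i.to_be_bytes().to_vec()).collect();
    let workers = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);

    let parallel = recover_batch_parallel(&batch, workers);
    let serial: Vec<u64> = batch.iter().map(|tx| recover_sender_stub(tx)).collect();

    assert_eq!(parallel, serial);
    println!("recovered {} senders on {} workers", parallel.len(), workers);
}
```

Spawning threads per batch has its own overhead; a long-lived pool (such as the rayon pool suggested above) would avoid paying it on every batch.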

---

## Instrumentation added (in `~/play/reth`)

- `crates/rpc/rpc-eth-api/src/helpers/transaction.rs` — logs `decode_us` and `total_us` per tx
- `crates/rpc/rpc/src/eth/helpers/transaction.rs` — logs `pool_us` per tx
- `crates/rpc/rpc-builder/src/metrics.rs` — logs `batch_us` and `calls` per HTTP batch

Enable with: `RUST_LOG=warn,rpc::eth::timing=info`
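
For readers who want to reproduce the table without the patch, here is a minimal sketch of how per-call timings such as `decode_us` could be captured and summarised. It is not the instrumentation listed above: `timed_us` and `percentile` are hypothetical helpers, the measured closure is a placeholder workload, and the percentile uses the nearest-rank method, which may differ from how the figures in the table were computed.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Run `f` and return its result together with the elapsed wall-clock time in µs.
fn timed_us<T>(f: impl FnOnce() -> T) -> (T, u128) {
    let start = Instant::now();
    let out = f();
    (out, start.elapsed().as_micros())
}

/// Nearest-rank percentile of an ascending-sorted, non-empty slice (`p` in 0..=100).
fn percentile(sorted: &[u128], p: usize) -> u128 {
    let rank = (p * sorted.len()).div_ceil(100);
    sorted[rank.clamp(1, sorted.len()) - 1]
}

fn main() {
    let mut decode_us = Vec::with_capacity(1_000);
    for i in 0..1_000u64 {
        // Placeholder workload; in the node this would wrap the decode + recovery call.
        let (_, us) = timed_us(|| black_box(i).wrapping_mul(31));
        decode_us.push(us);
    }
    decode_us.sort_unstable();

    println!(
        "decode_us min={} p50={} p90={} p99={} max={}",
        decode_us[0],
        percentile(&decode_us, 50),
        percentile(&decode_us, 90),
        percentile(&decode_us, 99),
        decode_us[decode_us.len() - 1],
    );
}
```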
