Background
A load test (rbft-megatx, 100k txs, 4-node local testnet, 0.5s block interval) consistently caps out at ~12,500–13,200 TPS regardless of batch size (1k vs 10k) or spreading ingress across all 4 nodes. Timing instrumentation added to send_raw_transaction identifies the bottleneck.
Call chain for send_raw_transaction
rpc-eth-api/helpers/transaction.rs send_raw_transaction
rpc-eth-types/utils.rs recover_raw_transaction
alloy-consensus T::decode_2718_exact ← RLP decode
alloy-consensus try_into_recovered
ethereum/primitives/transaction.rs recover_signer
alloy-consensus/crypto.rs signature_hash() ← keccak256 of signing payload
alloy-consensus/crypto.rs (secp256k1) SECP256K1.recover_ecdsa ← *** hot path ***
alloy-consensus/crypto.rs keccak256(pubkey)[12..] ← address derivation
rpc/rpc/eth/helpers/transaction.rs add_pool_transaction ← pool insertion (lock)
The critical line is in alloy-consensus-1.7.3/src/crypto.rs:

    let public = SECP256K1.recover_ecdsa(&Message::from_digest(*msg), &sig)?;

This calls into libsecp256k1 (C library). It accounts for roughly 20–25 µs of the 30 µs decode budget.
Timing data (100k txs, node0, single run)
| Metric (µs) | min | p50 | p90 | p99 | max | avg |
|---|---|---|---|---|---|---|
| decode_us (RLP + ECDSA recovery) | 23 | 30 | 42 | 52 | 4291 | 32 |
| pool_us (txpool insertion) | 7 | 13 | 29 | 220 | 30515 | 26 |
| total_us | 35 | 49 | 76 | 294 | 30562 | 65 |
Theoretical ceiling at 65 µs avg per tx: 1 / 65e-6 ≈ 15,400 tx/s per node, which is in the same range as the observed ~12,500–13,200 TPS.
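The ceiling is the reciprocal of the average per-transaction latency, which assumes a node handles transactions strictly one at a time. A minimal sketch of that arithmetic follows; the function name is illustrative and not part of reth.

```rust
/// Upper bound on throughput (tx/s) when transactions are processed
/// strictly one at a time and each takes `avg_latency_us` microseconds
/// on average.
fn serial_tps_ceiling(avg_latency_us: f64) -> f64 {
    // One second is 1_000_000 microseconds.
    1_000_000.0 / avg_latency_us
}
```

With the table's averages, `serial_tps_ceiling(65.0)` gives roughly 15,385 tx/s for the whole call. `serial_tps_ceiling(32.0)` gives 31,250 tx/s, the bound if only the decode stage remained on the serial path.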
pool_us p99 (220–380 µs) spikes coincide with block commits, indicating txpool lock contention as a secondary bottleneck.
What does NOT help
- 10k batch size vs 1k: nearly identical per-call latency (71 µs/call vs 69 µs/call)
- Round-robin across 4 nodes: each node still processes all txs via P2P gossip; sending to node 1 doesn't reduce node 0's ECDSA work
Opportunities
- Parallel ECDSA recovery within a batch: batches are processed serially inside `MeteredBatchRequestsFuture::poll`. Each batch of 1k txs takes ~50 ms end-to-end. Spawning ECDSA recovery onto a rayon threadpool would allow all cores to be used.
- Reduce txpool lock contention at block commit time: `pool_us` p99 is 4–14× p50. The pool is locked during block execution/commit; consider separating the block-processing lock from the insertion lock.
- Sender caching: if the same signed tx is resubmitted (e.g. retry on pool-full), skip re-recovery by caching `(tx_hash → sender)`.
- Hardware-accelerated secp256k1: libsecp256k1 with `--with-asm` or using an AVX-accelerated build; or the `crypto-backend` pluggable trait in alloy-consensus to swap in a faster implementation.
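The first opportunity, recovering senders for a whole batch in parallel, could look roughly like the sketch below. It uses `std::thread::scope` rather than rayon so that it has no dependencies. `RawTx`, `Address`, `recover_sender`, and `recover_batch_parallel` are hypothetical names, and `recover_sender` is a cheap placeholder rather than real ECDSA recovery. In reth the per-item work would be the `recover_raw_transaction` path shown in the call chain.

```rust
use std::thread;

/// Stand-in for a raw signed transaction (hypothetical type).
type RawTx = Vec<u8>;
/// Stand-in for a 20-byte sender address (hypothetical type).
type Address = [u8; 20];

/// Placeholder for the expensive step (RLP decode + ECDSA recovery).
/// NOT real cryptography: it only folds the bytes into a fake address.
fn recover_sender(raw: &RawTx) -> Address {
    let mut addr = [0u8; 20];
    for (i, b) in raw.iter().enumerate() {
        addr[i % 20] ^= *b;
    }
    addr
}

/// Recover senders for a whole batch using up to `workers` threads.
/// Results come back in the same order as the input.
fn recover_batch_parallel(batch: &[RawTx], workers: usize) -> Vec<Address> {
    if batch.is_empty() {
        return Vec::new();
    }
    let workers = workers.max(1);
    // Ceiling division so that at most `workers` chunks are produced.
    let chunk_len = (batch.len() + workers - 1) / workers;

    thread::scope(|s| {
        // One scoped thread per chunk; scoped threads may borrow `batch`.
        let handles: Vec<_> = batch
            .chunks(chunk_len)
            .map(|chunk| s.spawn(move || chunk.iter().map(recover_sender).collect::<Vec<_>>()))
            .collect();

        // Joining in spawn order preserves the input order.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("recovery worker panicked"))
            .collect()
    })
}
```

Spawning threads per batch has its own overhead, so a long-lived pool (rayon, as suggested above) is likely the better fit in practice. The sketch only illustrates the chunk, recover, and reassemble-in-order shape.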
Instrumentation added (in ~/play/reth)
- `crates/rpc/rpc-eth-api/src/helpers/transaction.rs`: logs `decode_us` and `total_us` per tx
- `crates/rpc/rpc/src/eth/helpers/transaction.rs`: logs `pool_us` per tx
- `crates/rpc/rpc-builder/src/metrics.rs`: logs `batch_us` and `calls` per HTTP batch
Enable with: RUST_LOG=warn,rpc::eth::timing=info
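For readers who want to reproduce the measurement, the per-phase timing can be approximated as below. This is a sketch of the general approach, not the actual patch. `TxTiming` and `timed_submit` are hypothetical names, and the `decode` and `insert` closures stand in for `recover_raw_transaction` and `add_pool_transaction`.

```rust
use std::time::Instant;

/// Per-transaction timings in microseconds, mirroring the logged fields.
#[derive(Debug, Clone, Copy)]
struct TxTiming {
    decode_us: u128,
    pool_us: u128,
    total_us: u128,
}

/// Run the decode and pool-insertion phases, timing each one.
/// `decode` and `insert` are hypothetical stand-ins for the real calls.
fn timed_submit<T, R>(
    raw: &[u8],
    decode: impl FnOnce(&[u8]) -> T,
    insert: impl FnOnce(T) -> R,
) -> (R, TxTiming) {
    let start = Instant::now();
    let tx = decode(raw);
    let decode_us = start.elapsed().as_micros();

    let pool_start = Instant::now();
    let result = insert(tx);
    let pool_us = pool_start.elapsed().as_micros();

    // Total covers both phases plus anything that happens between them.
    let total_us = start.elapsed().as_micros();
    (result, TxTiming { decode_us, pool_us, total_us })
}
```

The returned `TxTiming` can then be emitted through whatever logging the node uses, for example under the `rpc::eth::timing` target enabled above.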