
Tune RPS of eth_getTransactionReceipt #11090

@tholcman


System information

Erigon version (./erigon --version):
erigon version 2.60.3-1f73ed55

The same behavior was present in older versions (2.59.x, 2.60.x) as well.

OS & Version: Linux, docker image thorax/erigon:v2.60.3
erigon, prysm, and rpcdaemon run in one Kubernetes pod as the only workload on the node (plus some system services)

Erigon Command (with flags/config):

erigon --datadir /data/erigon --authrpc.jwtsecret /data/shared/jwt_secret --private.api.addr 127.0.0.1:7458 --port 31782 --nat extip:x.x.x.x --p2p.allowed-ports yyyyy,zzzzz --batchSize 512M --chain mainnet --metrics --metrics.addr 0.0.0.0 --metrics.port 9005 --authrpc.port 8551 --db.size.limit 7TB --db.pagesize 16KB --bodies.cache 6GB --maxpeers 300 --prune=

RPCDaemon flags:

rpcdaemon \
  --datadir /data/erigon \
  --db.read.concurrency=48 \
  --http.addr 0.0.0.0 --http.api eth,erigon,web3,net,debug,trace,txpool --http.port 8335 --http.vhosts * --http.timeouts.read 65s --rpc.batch.limit 400 --rpc.batch.concurrency 4 --rpc.returndata.limit 100000000 --ws \
  --txpool.api.addr localhost:7458 \
  --verbosity=debug --pprof --pprof.port=6062 --pprof.addr=0.0.0.0 --metrics --metrics.addr 0.0.0.0 --metrics.port 9007

# for --db.read.concurrency we have tried many different values: 1, 2, 16, 24, 256, 1000, 10000
# we have also tried --private.api.addr localhost:7458

We have also tried omitting --db.read.concurrency and setting GOMAXPROCS to many different values; no change.

Consensus Layer: prysm

Chain/Network: ethereum mainnet

HW Specs:
GCP n4-standard-16 = 16 vCPUs, 64 GB RAM
disk: 4 TB (Hyperdisk Balanced), 36,000 IOPS, 500 MB/s

Actual behaviour

We are running performance tests that call eth_getTransactionReceipt with many different transaction hashes, and it looks like rpcdaemon is doing something synchronously, or in one thread only:

# transactions.txt contains 32k transaction hashes, some from the latest blocks and some from blocks in the last 1/3 of the chain

$ cat transactions.txt | shuf | jq --raw-input -rcM '{"id": 1,"jsonrpc": "2.0","method": "eth_getTransactionReceipt","params": [ . ]}' | jq -rscM 'map({method: "POST", url: "http://127.0.0.1:8335", body: . | @base64 , header: {"Content-Type": ["application/json"]}}) | .[]' | vegeta attack -format json -rate=0 -max-workers 1 -duration 30s -timeout 1s| vegeta report
Requests      [total, rate, throughput]         723, 24.05, 24.04
Duration      [total, attack, wait]             30.076s, 30.068s, 8.641ms
Latencies     [min, mean, 50, 90, 95, 99, max]  586.332µs, 41.594ms, 35.886ms, 83.867ms, 93.202ms, 116.626ms, 148.232ms
Bytes In      [total, mean]                     1490000, 2060.86
Bytes Out     [total, mean]                     101943, 141.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:723
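For reference, the JSON-RPC body that the jq step in the pipeline above produces for each hash looks like this (the hash value below is a placeholder for illustration, not one of our real transaction ids):

```shell
# Build one eth_getTransactionReceipt request body from a tx hash
# (the hash is a placeholder)
jq --null-input -cM --arg h "0xdeadbeef" \
  '{id: 1, jsonrpc: "2.0", method: "eth_getTransactionReceipt", params: [$h]}'
# → {"id":1,"jsonrpc":"2.0","method":"eth_getTransactionReceipt","params":["0xdeadbeef"]}
```

The same body can be POSTed directly with curl (Content-Type: application/json) against http://127.0.0.1:8335 to sanity-check a single call outside vegeta.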

to emphasize:

$ cat ... | vegeta ... -max-workers 1 ...
Requests      [total, rate, throughput]         723, 24.05, 24.04
Latencies     [min, mean, 50, 90, 95, 99, max]  586.332µs, 41.594ms, 35.886ms, 83.867ms, 93.202ms, 116.626ms, 148.232ms

= 24 RPS

If we increase the concurrency of the test, throughput goes up very slightly, but responses start to slow down significantly:

$ cat ... | vegeta attack ... -max-workers 5 ... 
Requests      [total, rate, throughput]         1029, 34.27, 34.09
Latencies     [min, mean, 50, 90, 95, 99, max]  500.26µs, 146.413ms, 142.153ms, 274.241ms, 312.796ms, 373.658ms, 483.553ms

= 34 RPS

With more workers (10, 20, ...) throughput stays at roughly the same ~35 RPS.
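The measured numbers are consistent with requests being handled one at a time: with N in-flight workers, throughput caps at roughly N divided by the mean latency, which is exactly what the two runs above show. A back-of-envelope check (plain arithmetic, not Erigon-specific):

```shell
# 1 worker at 41.6 ms mean latency -> ~24 RPS (matches the report)
awk 'BEGIN { printf "%.0f\n", 1 / 0.0416 }'   # → 24
# 5 workers at 146 ms mean latency -> ~34 RPS (matches the report)
awk 'BEGIN { printf "%.0f\n", 5 / 0.146 }'    # → 34
```

Latency growing roughly in proportion to the worker count while throughput stays flat is the signature of a serialized bottleneck.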

If we try to hold a fixed request rate, the API slows down and eventually starts failing:

$ cat ... | vegeta attack ... -rate=200 -timeout 1s
Requests      [total, rate, throughput]         3000, 100.03, 6.61
Latencies     [min, mean, 50, 90, 95, 99, max]  848.307µs, 990.396ms, 1s, 1.001s, 1.001s, 1.001s, 1.002s
Success       [ratio]                           6.83%
Status Codes  [code:count]                      0:2795  200:205

The daemon is running 22 threads:

ps -T -p $RPCDAEMON_PID | wc -l
23

CPU utilization goes from ~5% (idle) to ~10-14% during the tests.
Disk:
  peak write IOPS ~15k, unchanged during the test
  peak read IOPS ~1.5k (idle) -> ~3k (during the tests)
Both are far below the provisioned limit of 36k IOPS (we actually ran an additional fio workload during the tests and reached ~70k peak IOPS).

The daemon is nowhere near using all the available hardware.

Expected behaviour

With increasing client concurrency I would expect throughput to increase approximately linearly until the hardware is saturated, i.e. the daemon should take advantage of parallelism.
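Concretely, if requests were served in parallel, the ~41.6 ms single-worker service time measured above would predict roughly linear scaling until CPU or disk saturates (illustrative arithmetic only):

```shell
# Ideal linear scaling from the ~41.6 ms single-worker service time
for n in 1 2 5 10 20; do
  awk -v n="$n" 'BEGIN { printf "%2d workers -> ~%.0f RPS expected\n", n, n / 0.0416 }'
done
```

Instead, observed throughput flattens at ~35 RPS from 5 workers upward.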

pprof CPU profile captured during the -max-workers 10 test (attached).
