Description
System information
Erigon version: ./erigon --version
erigon version 2.60.3-1f73ed55
The same behavior was present in older versions (erigon 2.59.x, 2.60.x).
OS & Version: Linux, docker image thorax/erigon:v2.60.3
Erigon, Prysm, and rpcdaemon run in a single Kubernetes pod as the only workload on the node (plus some system components).
Erigon Command (with flags/config):
erigon --datadir /data/erigon --authrpc.jwtsecret /data/shared/jwt_secret --private.api.addr 127.0.0.1:7458 --port 31782 --nat extip:x.x.x.x --p2p.allowed-ports yyyyy,zzzzz --batchSize 512M --chain mainnet --metrics --metrics.addr 0.0.0.0 --metrics.port 9005 --authrpc.port 8551 --db.size.limit 7TB --db.pagesize 16KB --bodies.cache 6GB --maxpeers 300 --prune=
RPCDaemon flags:
rpcdaemon \
--datadir /data/erigon \
--db.read.concurrency=48 \
--http.addr 0.0.0.0 --http.api eth,erigon,web3,net,debug,trace,txpool --http.port 8335 --http.vhosts '*' --http.timeouts.read 65s --rpc.batch.limit 400 --rpc.batch.concurrency 4 --rpc.returndata.limit 100000000 --ws \
--txpool.api.addr localhost:7458 \
--verbosity=debug --pprof --pprof.port=6062 --pprof.addr=0.0.0.0 --metrics --metrics.addr 0.0.0.0 --metrics.port 9007
# for --db.read.concurrency we have tried many different values: 1, 2, 16, 24, 256, 1000, 10000
# we have also tried --private.api.addr localhost:7458
We have also tried omitting --db.read.concurrency entirely and setting GOMAXPROCS to many different values; no change.
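For reference, one way to set GOMAXPROCS for the prebuilt rpcdaemon binary is via the environment variable; a minimal sketch (the value shown is just one of the values we tried):
# cap the Go scheduler's parallelism for rpcdaemon via the standard GOMAXPROCS env var
$ GOMAXPROCS=16 rpcdaemon --datadir /data/erigon ...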
Consensus Layer: prysm
Chain/Network: ethereum mainnet
HW Specs:
GCP n4-standard-16 = 16 vCPUs, 64 GB RAM
disk: 4 TB (Hyperdisk Balanced), 36,000 IOPS, 500 MB/s
Actual behaviour
We are running some performance tests, calling eth_getTransactionReceipt with many different transaction hashes, and it looks like rpcdaemon is processing the requests synchronously, or in a single thread only:
# transactions.txt contains ~32k transaction hashes, some from the latest blocks and some from blocks in the last 1/3 of the chain (a sketch of how such a list can be built follows the report below)
$ cat transactions.txt | shuf | jq --raw-input -rcM '{"id": 1,"jsonrpc": "2.0","method": "eth_getTransactionReceipt","params": [ . ]}' | jq -rscM 'map({method: "POST", url: "http://127.0.0.1:8335", body: . | @base64 , header: {"Content-Type": ["application/json"]}}) | .[]' | vegeta attack -format json -rate=0 -max-workers 1 -duration 30s -timeout 1s| vegeta report
Requests [total, rate, throughput] 723, 24.05, 24.04
Duration [total, attack, wait] 30.076s, 30.068s, 8.641ms
Latencies [min, mean, 50, 90, 95, 99, max] 586.332µs, 41.594ms, 35.886ms, 83.867ms, 93.202ms, 116.626ms, 148.232ms
Bytes In [total, mean] 1490000, 2060.86
Bytes Out [total, mean] 101943, 141.00
Success [ratio] 100.00%
Status Codes [code:count] 200:723
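For reference, the transactions.txt used above can be produced from the node itself; a rough sketch using eth_getBlockByNumber (the block range here is illustrative, the real list mixes recent blocks and blocks from the last third of the chain):
# collect tx hashes from a range of blocks (full=false returns hashes only)
$ for n in $(seq 19000000 19000100); do
    printf '{"id":1,"jsonrpc":"2.0","method":"eth_getBlockByNumber","params":["0x%x",false]}' "$n" |
    curl -s -H 'Content-Type: application/json' -d @- http://127.0.0.1:8335 |
    jq -r '.result.transactions[]'
  done > transactions.txt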
To emphasize:
$ cat ... | vegeta ... -max-workers 1 ...
Requests [total, rate, throughput] 723, 24.05, 24.04
Latencies [min, mean, 50, 90, 95, 99, max] 586.332µs, 41.594ms, 35.886ms, 83.867ms, 93.202ms, 116.626ms, 148.232ms
= 24 RPS
If we increase the concurrency of the test, throughput goes up only very slightly, but responses start to slow down significantly:
$ cat ... | vegeta attack ... -max-workers 5 ...
Requests [total, rate, throughput] 1029, 34.27, 34.09
Latencies [min, mean, 50, 90, 95, 99, max] 500.26µs, 146.413ms, 142.153ms, 274.241ms, 312.796ms, 373.658ms, 483.553ms
= 34 RPS
With more workers (10, 20, ...) it stays at roughly the same ~35 RPS.
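To rule out the load generator itself, the plateau can also be probed without vegeta; a rough sketch with curl and xargs (worker count and sample size are arbitrary):
# 32 parallel curl workers over the same hash list; prints per-request total time
$ head -n 1000 transactions.txt | xargs -P 32 -I {} curl -s -o /dev/null -w '%{time_total}\n' \
    -H 'Content-Type: application/json' \
    -d '{"id":1,"jsonrpc":"2.0","method":"eth_getTransactionReceipt","params":["{}"]}' \
    http://127.0.0.1:8335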
If we try to hold a fixed request rate, the API slows down and eventually starts failing:
$ cat ... | vegeta attack ... -rate=200 -timeout 1s
Requests [total, rate, throughput] 3000, 100.03, 6.61
Latencies [min, mean, 50, 90, 95, 99, max] 848.307µs, 990.396ms, 1s, 1.001s, 1.001s, 1.001s, 1.002s
Success [ratio] 6.83%
Status Codes [code:count] 0:2795 200:205
The daemon is running 22 threads:
ps -T -p $RPCDAEMON_PID | wc -l
23   # one header line + 22 threads
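Since --pprof is enabled, the goroutine count can also be checked directly (assuming the standard Go pprof endpoints on the configured --pprof.addr/--pprof.port):
# the first line reports the total number of goroutines in rpcdaemon
$ curl -s 'http://127.0.0.1:6062/debug/pprof/goroutine?debug=1' | head -n 1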
CPU utilization goes from 5% (idle) to ~10-14% during tests
Disk:
peak write IOPS ~15k, unchanged during the test
peak read IOPS ~1.5k (idle) -> ~3k (during tests)
^ far below the 36k limit (we also ran an additional fio workload during the tests and reached ~70k peak IOPS, so the disk has plenty of headroom; a sketch of that fio run is below)
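The extra fio workload was a plain random-read test along these lines (a sketch; the exact parameters may have differed):
# 4k random reads against the data disk to check the available IOPS headroom
$ fio --name=randread --directory=/data --rw=randread --bs=4k --direct=1 \
      --ioengine=libaio --iodepth=64 --numjobs=4 --size=2G --runtime=60 \
      --time_based --group_reporting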
rpcdaemon is nowhere near using all of the available hardware.
Expected behaviour
With increasing client concurrency I would expect throughput to increase approximately linearly until the hardware is saturated, i.e. the requests should be handled in parallel.
pprof captured during the -max-workers 10 test:
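A comparable CPU profile can be captured with the standard Go pprof tooling against the endpoint exposed by --pprof above (assuming the default /debug/pprof routes):
# 30-second CPU profile of rpcdaemon while the vegeta run is in progress
$ go tool pprof -seconds 30 http://127.0.0.1:6062/debug/pprof/profile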
