
protect execution TIP under RPC load #19905

Merged
lupin012 merged 24 commits into main from lupin012/protect-tip-under-load
Mar 27, 2026

Conversation

@lupin012 (Contributor) commented Mar 15, 2026

This PR introduces a two-level admission control system to protect the Staged Sync pipeline from being starved or delayed by high RPC load.
Root Cause Analysis:
Under heavy RPC traffic, the node accumulates a large number of goroutines blocked on roTxsLimiter.Acquire. When DB slots become available, the backlog drains in a way that starves the staged sync pipeline. The goroutine pile-up also causes a significant spike in virtual memory and overall system instability.

Solution:
Two gates work in tandem:

  1. HTTP admission handler (rpcAdmissionHandler) — outer gate installed at the top of every HTTP RPC stack, before CORS, Gzip, or JSON decoding. If the number of inflight requests exceeds the configured limit, the request is rejected immediately with HTTP 503. This prevents goroutine accumulation at the source. On every admitted request the handler tags the context with WithRPCContext (limit value) so the DB layer can identify the caller.
  2. BeginRo inner gate — if the context carries a positive RPC limit, BeginRo uses TryAcquire on roTxsLimiter and returns ErrServerOverloaded immediately if the semaphore is full. Internal callers (staged sync, background workers) always use blocking Acquire and are never rejected.

This two-level approach means most overload is shed at the HTTP layer (goroutines never enter the system), while any RPC requests that slip through under transient concurrency spikes still fail fast at the DB layer rather than piling up behind the semaphore.
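For illustration, a minimal sketch of the outer gate, assuming an atomic inflight counter as described above. Names and details are illustrative, not the exact PR code; the sketch uses a plain boolean context tag (the PR later renames the tagging helper to WithNonBlockingAcquire), and the counter can briefly overshoot the limit between Add and the check, which the review below notes is acceptable in practice:

```go
package rpcstack

import (
	"context"
	"net/http"
	"sync/atomic"
)

type nonBlockingKey struct{}

// withNonBlockingAcquire tags the context so BeginRo knows the caller is an
// admitted RPC request and should use TryAcquire instead of queuing.
func withNonBlockingAcquire(ctx context.Context) context.Context {
	return context.WithValue(ctx, nonBlockingKey{}, true)
}

// admissionHandler is the outer gate: it sits above CORS/Gzip/JSON decoding
// and sheds load before any per-request goroutine does real work.
type admissionHandler struct {
	next     http.Handler
	limit    int64 // <= 0 disables admission control
	inflight atomic.Int64
}

func (h *admissionHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	if h.limit <= 0 {
		h.next.ServeHTTP(w, r) // unlimited: old behaviour
		return
	}
	// The counter can briefly exceed the limit between Add and this check.
	if h.inflight.Add(1) > h.limit {
		h.inflight.Add(-1)
		w.Header().Set("Retry-After", "1") // added during this PR's review cycle
		http.Error(w, "server overloaded", http.StatusServiceUnavailable)
		return
	}
	defer h.inflight.Add(-1)
	h.next.ServeHTTP(w, r.WithContext(withNonBlockingAcquire(r.Context())))
}
```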

Configuration:

  • --rpc.max.concurrency: HTTP admission limit (see the sketch below).
    • 0 (default): uses --db.read.concurrency (auto-tuned to GOMAXPROCS × 64, capped at 9000)
    • >0: explicit limit
    • -1: unlimited (admission control disabled; BeginRo falls back to blocking Acquire, as before)
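A sketch of how those flag values map onto behaviour, with hypothetical helper names; Erigon's real BeginRo lives in the DB layer, and roTxsLimiter is assumed here to be a golang.org/x/sync/semaphore weighted semaphore:

```go
package kv

import (
	"context"
	"errors"

	"golang.org/x/sync/semaphore"
)

var ErrServerOverloaded = errors.New("server is overloaded")

// effectiveLimit resolves --rpc.max.concurrency per the semantics above.
func effectiveLimit(rpcMaxConcurrency, dbReadConcurrency int64) int64 {
	switch {
	case rpcMaxConcurrency > 0:
		return rpcMaxConcurrency // explicit limit
	case rpcMaxConcurrency == 0:
		return dbReadConcurrency // default: follow --db.read.concurrency
	default:
		return 0 // -1: admission control disabled, old blocking behaviour
	}
}

// acquireRoSlot models the BeginRo inner gate: tagged RPC callers fail fast
// with ErrServerOverloaded when roTxsLimiter is full; internal callers
// (staged sync, background workers) always block and are never rejected.
func acquireRoSlot(ctx context.Context, roTxsLimiter *semaphore.Weighted, nonBlocking bool) error {
	if nonBlocking {
		if !roTxsLimiter.TryAcquire(1) {
			return ErrServerOverloaded // surfaced to the client as 503 / JSON-RPC error
		}
		return nil
	}
	return roTxsLimiter.Acquire(ctx, 1) // blocking; never rejected
}
```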
Summary of Resource Management Improvements

| Resource | Result |
| :--- | :--- |
| Goroutine pile-up | ✅ Requests rejected at HTTP layer before CORS, Gzip, or JSON decoding |
| Staged sync starvation | ✅ Internal callers (staged sync, workers) use blocking Acquire and are never rejected; RPC uses TryAcquire fail-fast |
| Transient overload spikes | ✅ BeginRo inner gate catches RPC requests that pass the HTTP layer during concurrency spikes |
| Scalability | ✅ Default limit auto-tuned to GOMAXPROCS × 64 (capped at 9000) via --db.read.concurrency |
| Configuration | ✅ Zero required config, one optional flag (--rpc.max.concurrency) |

Benchmark & Stress Test Results
Setup: 32 Cores, 64GB RAM, 70GB Swap. Minimal Node in Sync. Parallel eth_call stress tests (28k QPS).

Click to expand: Benchmark Data (Before vs After on local node)

Current SW (main release)

CPU
03:23:56 PM all 29.55 0.00 22.30 34.33 0.00 13.83
03:24:06 PM all 56.41 0.00 15.44 10.83 0.00 17.32
03:24:16 PM all 75.60 0.00 13.36 2.86 0.00 8.18
03:24:26 PM all 73.19 0.00 14.35 2.82 0.00 9.63
03:24:36 PM all 73.35 0.00 14.56 2.75 0.00 9.34

Memory
15:23:30 rss=31.89GB vsz=7.65TB proc_swap=11.81GB sys_swap=27.21/72.00GB MemAvail=1.15GB SwapAvail=44.79GB
15:23:40 rss=32.74GB vsz=7.65TB proc_swap=11.00GB sys_swap=27.02/72.00GB MemAvail=1.50GB SwapAvail=44.98GB
15:23:50 rss=33.83GB vsz=7.65TB proc_swap=9.89GB sys_swap=25.65/72.00GB MemAvail=1.44GB SwapAvail=46.35GB
15:24:00 rss=36.33GB vsz=7.65TB proc_swap=7.60GB sys_swap=23.55/72.00GB MemAvail=1.67GB SwapAvail=48.45GB
15:24:10 rss=37.85GB vsz=7.65TB proc_swap=6.91GB sys_swap=21.83/72.00GB MemAvail=5.10GB SwapAvail=50.17GB
15:24:20 rss=39.30GB vsz=7.65TB proc_swap=6.69GB sys_swap=20.23/72.00GB MemAvail=7.28GB SwapAvail=51.77GB
15:24:30 rss=40.40GB vsz=7.65TB proc_swap=6.20GB sys_swap=17.94/72.00GB MemAvail=10.20GB SwapAvail=54.06GB
15:24:40 rss=41.44GB vsz=7.65TB proc_swap=5.23GB sys_swap=14.95/72.00GB MemAvail=20.01GB SwapAvail=57.05GB
15:24:50 rss=41.68GB vsz=7.65TB proc_swap=5.20GB sys_swap=14.92/72.00GB MemAvail=16.14GB SwapAvail=57.08GB
15:25:00 rss=42.77GB vsz=7.65TB proc_swap=4.95GB sys_swap=14.87/72.00GB MemAvail=11.41GB SwapAvail=57.13GB
15:25:11 rss=42.78GB vsz=7.65TB proc_swap=5.26GB sys_swap=15.55/72.00GB MemAvail=8.58GB SwapAvail=56.45GB
15:25:21 rss=40.79GB vsz=7.65TB proc_swap=6.88GB sys_swap=17.46/72.00GB MemAvail=5.65GB SwapAvail=54.54GB

TIP Tracking
[15:21:44] block #24,656,279 ts=2026-03-14 15:19:47 lag=+117.8s ALERT: lag=117.8s — node is behind the tip!
[15:21:44] block #24,656,280 ts=2026-03-14 15:19:59 lag=+105.8s ALERT: lag=105.8s — node is behind the tip!
[15:21:44] block #24,656,281 ts=2026-03-14 15:20:11 lag=+93.8s ALERT: lag=93.8s — node is behind the tip!
[15:21:44] block #24,656,282 ts=2026-03-14 15:20:23 lag=+81.8s ALERT: lag=81.8s — node is behind the tip!
[15:21:44] block #24,656,283 ts=2026-03-14 15:20:47 lag=+57.8s ALERT: lag=57.8s — node is behind the tip!
[15:21:57] block #24,656,284 ts=2026-03-14 15:20:59 lag=+58.0s ALERT: lag=58.0s — node is behind the tip!
[15:21:57] block #24,656,285 ts=2026-03-14 15:21:11 lag=+46.0s ALERT: lag=46.0s — node is behind the tip!
[15:21:57] block #24,656,286 ts=2026-03-14 15:21:23 lag=+34.0s ALERT: lag=34.0s — node is behind the tip!
[15:21:57] block #24,656,287 ts=2026-03-14 15:21:35 lag=+22.0s ALERT: lag=22.0s — node is behind the tip!
[15:21:57] block #24,656,288 ts=2026-03-14 15:21:47 lag=+10.0s OK
[15:22:07] block #24,656,289 ts=2026-03-14 15:21:59 lag=+8.0s OK
[15:22:19] block #24,656,290 ts=2026-03-14 15:22:11 lag=+8.3s OK
[15:22:32] block #24,656,291 ts=2026-03-14 15:22:23 lag=+9.3s OK
[15:23:02] ALERT: no new block for 30s (last block #24656291) — node may be losing the tip!
[15:23:32] ALERT: no new block for 60s (last block #24656291) — node may be losing the tip!
[15:24:02] ALERT: no new block for 90s (last block #24656291) — node may be losing the tip!
[15:24:24] block #24,656,292 ts=2026-03-14 15:22:35 lag=+109.5s ALERT: lag=109.5s — node is behind the tip!
[15:24:24] block #24,656,293 ts=2026-03-14 15:22:47 lag=+97.5s ALERT: lag=97.5s — node is behind the tip!
[15:24:24] block #24,656,294 ts=2026-03-14 15:22:59 lag=+85.5s ALERT: lag=85.5s — node is behind the tip!
[15:24:24] block #24,656,295 ts=2026-03-14 15:23:11 lag=+73.5s ALERT: lag=73.5s — node is behind the tip!
[15:24:54] ALERT: no new block for 30s (last block #24656295) — node may be losing the tip!
[15:25:17] block #24,656,296 ts=2026-03-14 15:23:23 lag=+114.2s ALERT: lag=114.2s — node is behind the tip!
[15:25:17] block #24,656,297 ts=2026-03-14 15:23:35 lag=+102.2s ALERT: lag=102.2s — node is behind the tip!
[15:25:17] block #24,656,298 ts=2026-03-14 15:23:47 lag=+90.2s ALERT: lag=90.2s — node is behind the tip!
[15:25:17] block #24,656,299 ts=2026-03-14 15:23:59 lag=+78.2s ALERT: lag=78.2s — node is behind the tip!
[15:25:17] block #24,656,300 ts=2026-03-14 15:24:11 lag=+66.2s ALERT: lag=66.2s — node is behind the tip!
[15:25:17] block #24,656,301 ts=2026-03-14 15:24:23 lag=+54.2s ALERT: lag=54.2s — node is behind the tip!
[15:25:17] block #24,656,302 ts=2026-03-14 15:24:35 lag=+42.2s ALERT: lag=42.2s — node is behind the tip!
[15:25:17] block #24,656,303 ts=2026-03-14 15:24:47 lag=+30.2s ALERT: lag=30.2s — node is behind the tip!

./run_perf_tests.py -p pattern/mainnet/stress_test_eth_call_001_latest.tar -t 28000:60 -y eth_call -m 2 -r 100 -Z
Performance Test started
Test repetitions: 100 on sequence: 28000:60 for pattern: pattern/mainnet/stress_test_eth_call_001_latest.tar
Test on port: http://localhost:8545
[1. 1] daemon: executes test qps: 28000 time: 60 -> [R=100.00% max=1m39s]
[1. 2] daemon: executes test qps: 28000 time: 60 -> [R=100.00% max=1m46s]
[1. 3] daemon: executes test qps: 28000 time: 60 -> [R=100.00% max=1m38s]

./run_perf_tests.py -p pattern/mainnet/stress_test_eth_call_001_latest.tar -t 28000:60 -y eth_call -m 2 -r 100 -Z
Performance Test started
Test repetitions: 100 on sequence: 28000:60 for pattern: pattern/mainnet/stress_test_eth_call_001_latest.tar
Test on port: http://localhost:8545
[1. 1] daemon: executes test qps: 28000 time: 60 -> [R=100.00% max=1m39s]
[1. 2] daemon: executes test qps: 28000 time: 60 -> [R=100.00% max=1m45s]
[1. 3] daemon: executes test qps: 28000 time: 60 -> [R=100.00% max=1m40s]

NEW Software (with PR)

CPU
7:58:51 AM all 51.09 0.00 6.16 0.35 0.00 42.40
07:58:56 AM all 49.26 0.00 5.82 0.03 0.00 44.89
07:59:01 AM all 50.34 0.00 5.95 0.20 0.00 43.51
07:59:06 AM all 51.60 0.00 5.88 0.04 0.00 42.47
07:59:11 AM all 48.97 0.00 5.90 0.06 0.00 45.07
07:59:16 AM all 49.59 0.00 6.11 0.36 0.00 43.93
07:59:21 AM all 48.69 0.00 5.78 0.03 0.00 45.51
07:59:26 AM all 53.50 0.00 6.66 0.26 0.00 39.59
07:59:31 AM all 50.45 0.00 6.37 0.02 0.00 43.16
07:59:36 AM all 48.71 0.00 6.18 0.03 0.00 45.08
07:59:41 AM all 53.58 0.00 6.45 0.15 0.00 39.81
07:59:46 AM all 53.74 0.00 6.13 0.05 0.00 40.07
07:59:51 AM all 31.76 0.00 3.95 0.23 0.00 64.06
07:59:56 AM all 37.20 0.00 5.05 0.03 0.00 57.71
08:00:01 AM all 77.10 0.00 12.95 0.01 0.00 9.94
08:00:06 AM all 78.22 0.00 12.58 0.08 0.00 9.11
08:00:11 AM all 77.64 0.00 12.50 0.00 0.00 9.86
08:00:16 AM all 77.48 0.00 12.61 0.08 0.00 9.83
08:00:21 AM all 77.61 0.00 12.47 0.01 0.00 9.90
08:00:26 AM all 77.35 0.00 12.89 0.06 0.00 9.70
08:00:31 AM all 77.85 0.00 12.92 0.04 0.00 9.19
08:00:36 AM all 77.73 0.00 12.80 0.02 0.00 9.44
08:00:41 AM all 78.42 0.00 12.95 0.05 0.00 8.59
08:00:46 AM all 78.52 0.00 12.55 0.01 0.00 8.93
08:00:51 AM all 78.42 0.00 12.77 0.19 0.00 8.62
08:00:56 AM all 56.98 0.00 8.64 0.11 0.00 34.28

Memory
2026-03-20 08:00:36 pid=1117840 rss=30.04GB vsz=7.49TB proc_swap=0.00GB sys_swap=0.98/72.00GB MemAvail=39.93GB SwapAvail=71.02GB
2026-03-20 08:00:41 pid=1117840 rss=30.20GB vsz=7.49TB proc_swap=0.00GB sys_swap=0.98/72.00GB MemAvail=39.86GB SwapAvail=71.02GB
2026-03-20 08:00:41 pid=1117840 rss=30.20GB vsz=7.49TB proc_swap=0.00GB sys_swap=0.98/72.00GB MemAvail=39.86GB SwapAvail=71.02GB
2026-03-20 08:00:46 pid=1117840 rss=30.20GB vsz=7.49TB proc_swap=0.00GB sys_swap=0.98/72.00GB MemAvail=39.90GB SwapAvail=71.02GB
2026-03-20 08:00:46 pid=1117840 rss=30.20GB vsz=7.49TB proc_swap=0.00GB sys_swap=0.98/72.00GB MemAvail=39.90GB SwapAvail=71.02GB
2026-03-20 08:00:51 pid=1117840 rss=30.28GB vsz=7.49TB proc_swap=0.00GB sys_swap=0.98/72.00GB MemAvail=39.88GB SwapAvail=71.02GB
2026-03-20 08:00:51 pid=1117840 rss=30.28GB vsz=7.49TB proc_swap=0.00GB sys_swap=0.98/72.00GB MemAvail=39.88GB SwapAvail=71.02GB
2026-03-20 08:00:56 pid=1117840 rss=30.54GB vsz=7.49TB proc_swap=0.00GB sys_swap=0.98/72.00GB MemAvail=40.39GB SwapAvail=71.02GB
2026-03-20 08:00:56 pid=1117840 rss=30.54GB vsz=7.49TB proc_swap=0.00GB sys_swap=0.98/72.00GB MemAvail=40.39GB SwapAvail=71.02GB
2026-03-20 08:01:02 pid=1117840 rss=30.61GB vsz=7.49TB proc_swap=0.00GB sys_swap=0.98/72.00GB MemAvail=40.25GB SwapAvail=71.02GB
2026-03-20 08:01:02 pid=1117840 rss=30.61GB vsz=7.49TB proc_swap=0.00GB sys_swap=0.98/72.00GB MemAvail=40.25GB SwapAvail=71.02GB
2026-03-20 08:01:07 pid=1117840 rss=30.61GB vsz=7.49TB proc_swap=0.00GB sys_swap=0.98/72.00GB MemAvail=39.97GB SwapAvail=71.02GB
2026-03-20 08:01:07 pid=1117840 rss=30.61GB vsz=7.49TB proc_swap=0.00GB sys_swap=0.98/72.00GB MemAvail=39.97GB SwapAvail=71.02GB
2026-03-20 08:01:12 pid=1117840 rss=30.62GB vsz=7.49TB proc_swap=0.00GB sys_swap=0.98/72.00GB MemAvail=39.48GB SwapAvail=71.02GB
2026-03-20 08:01:12 pid=1117840 rss=30.62GB vsz=7.49TB proc_swap=0.00GB sys_swap=0.98/72.00GB MemAvail=39.48GB SwapAvail=71.02GB
2026-03-20 08:01:17 pid=1117840 rss=30.71GB vsz=7.49TB proc_swap=0.00GB sys_swap=0.98/72.00GB MemAvail=39.57GB SwapAvail=71.02GB
2026-03-20 08:01:17 pid=1117840 rss=30.71GB vsz=7.49TB proc_swap=0.00GB sys_swap=0.98/72.00GB MemAvail=39.57GB SwapAvail=71.02GB

TIP Tracking
[07:56:10] block #24,697,055 ts=2026-03-20 07:55:59 lag=+12.0s OK
[07:56:15] block #24,697,056 ts=2026-03-20 07:56:11 lag=+4.5s OK
[07:56:25] block #24,697,057 ts=2026-03-20 07:56:23 lag=+2.5s OK
[07:56:38] block #24,697,058 ts=2026-03-20 07:56:35 lag=+3.4s OK
[07:56:50] block #24,697,059 ts=2026-03-20 07:56:47 lag=+3.5s OK
[07:57:02] block #24,697,060 ts=2026-03-20 07:56:59 lag=+3.6s OK
[07:57:16] block #24,697,061 ts=2026-03-20 07:57:11 lag=+5.6s OK
[07:57:27] block #24,697,062 ts=2026-03-20 07:57:23 lag=+4.7s OK
[07:57:39] block #24,697,063 ts=2026-03-20 07:57:35 lag=+4.3s OK
[07:57:49] block #24,697,064 ts=2026-03-20 07:57:47 lag=+2.4s OK
[07:58:01] block #24,697,065 ts=2026-03-20 07:57:59 lag=+2.9s OK
[07:58:13] block #24,697,066 ts=2026-03-20 07:58:11 lag=+2.8s OK
[07:58:25] block #24,697,067 ts=2026-03-20 07:58:23 lag=+2.4s OK
[07:58:37] block #24,697,068 ts=2026-03-20 07:58:35 lag=+2.7s OK
[07:58:49] block #24,697,069 ts=2026-03-20 07:58:47 lag=+2.3s OK
[07:59:01] block #24,697,070 ts=2026-03-20 07:58:59 lag=+2.1s OK
[07:59:15] block #24,697,071 ts=2026-03-20 07:59:11 lag=+4.3s OK
[07:59:25] block #24,697,072 ts=2026-03-20 07:59:23 lag=+2.6s OK
[07:59:40] block #24,697,073 ts=2026-03-20 07:59:35 lag=+5.3s OK
[08:00:02] block #24,697,074 ts=2026-03-20 07:59:59 lag=+3.9s OK
[08:00:13] block #24,697,075 ts=2026-03-20 08:00:11 lag=+2.8s OK

./run_perf_tests.py -p pattern/mainnet/stress_test_eth_call_001_latest.tar -t 28000:60 -y eth_call -m 2 -r 100 -Z
Performance Test started
Test repetitions: 100 on sequence: 28000:60 for pattern: pattern/mainnet/stress_test_eth_call_001_latest.tar
Test on port: http://localhost:8545
[1. 1] daemon: executes test qps: 28000 time: 60 -> [R=51.39% max=605.449ms error=503 Service Unavailable]
[1. 2] daemon: executes test qps: 28000 time: 60 -> [R=51.55% max=442.974ms error=503 Service Unavailable]
[1. 3] daemon: executes test qps: 28000 time: 60 -> [R=49.52% max=440.405ms error=503 Service Unavailable]
[1. 4] daemon: executes test qps: 28000 time: 60 -> [R=51.01% max=440.004ms error=503 Service Unavailable]
[1. 5] daemon: executes test qps: 28000 time: 60 -> [R=49.66% max=597.333ms error=503 Service Unavailable]

./run_perf_tests.py -p pattern/mainnet/stress_test_eth_call_001_latest.tar -t 28000:60 -y eth_call -m 2 -r 100 -Z
Performance Test started
Test repetitions: 100 on sequence: 28000:60 for pattern: pattern/mainnet/stress_test_eth_call_001_latest.tar
Test on port: http://localhost:8545
[1. 1] daemon: executes test qps: 28000 time: 60 -> [R=51.51% max=581.793ms error=503 Service Unavailable]
[1. 2] daemon: executes test qps: 28000 time: 60 -> [R=51.61% max=431.222ms error=503 Service Unavailable]
[1. 3] daemon: executes test qps: 28000 time: 60 -> [R=49.48% max=495.57ms error=503 Service Unavailable]
[1. 4] daemon: executes test qps: 28000 time: 60 -> [R=50.91% max=433.208ms error=503 Service Unavailable]
[1. 5] daemon: executes test qps: 28000 time: 60 -> [R=49.57% max=538.283ms error=503 Service Unavailable]

Verified on the CI TIP-tracking infrastructure. Previous software versions experienced "TIP lost" at 3,000 QPS. With these changes, the system now successfully handles up to 6,000 QPS without any TIP loss or degradation.

Stress Test Observations (main release)

  • Chain Tip Loss: Under heavy load, the node fails to stay synced and the Chain Tip is lost, as the staged sync pipeline is starved of DB read slots by queued RPC goroutines.
  • Virtual Memory Pressure: The system experiences severe VM pressure, with process swap usage reaching 11.81 GB. The massive accumulation of goroutines blocked on roTxsLimiter.Acquire causes excessive paging and swapping. This state is highly unstable and frequently leads to the process being terminated by the OOM Killer, causing total node downtime.
  • Request Satisfaction (100%): Despite the performance degradation, all requests are eventually satisfied. However, this is achieved at the cost of system stability and synchronization.
  • Increased Latency: Request latency increases dramatically due to deep queuing, with response times reaching up to 1m 40s.

Stress Test Observations (with PR)

  • Chain Tip Stability: The two-level admission control prevents goroutine accumulation entirely. The HTTP outer gate rejects excess requests before any processing; the BeginRo inner gate ensures that any RPC request that does enter the system uses TryAcquire (fail-fast) rather than blocking. Internal callers (staged sync, background workers) always use blocking Acquire and are never rejected, so the pipeline makes continuous progress.
  • Virtual Memory Pressure: Significantly lower memory footprint. By eliminating request queuing at the HTTP layer, the system avoids excessive paging and swapping (0.00 GB swap), keeping the OS stable.
  • Request Satisfaction (~50%): Approximately 50% of requests are satisfied; the remainder are immediately rejected with 503 Service Unavailable. This is the intended fail-fast behavior — goroutines never accumulate, DB slots are never exhausted.
  • Latency Consistency: Response latency remains consistently low. By refusing to queue requests beyond the system's capacity, the node avoids the massive latency spikes (previously up to 1m 40s) seen before the fix.

This behavior is aligned with Nethermind, which returns 503 Service Unavailable under high load, prioritizing node health over request queuing.


Final Observation

By adopting a fail-fast strategy at two levels — HTTP admission before any expensive processing, and TryAcquire inside BeginRo for RPC callers — we enforce resource isolation at the core level. Internal execution paths retain guaranteed access to DB read slots via blocking Acquire, while external RPC pressure is shed immediately. This approach shifts congestion-management responsibility to the external infrastructure (load balancers, proxies), which is better equipped to handle buffering, ensuring that the Erigon node remains stable and synchronized regardless of external RPC load.

🚀 RPC Concurrency & Resource Management Comparison

| Feature | Erigon (main) | Erigon (with PR) |
| :--- | :--- | :--- |
| Admission control | ❌ None | ✅ HTTP outer gate (rpcAdmissionHandler) |
| Overload response | Unlimited queuing | ✅ Immediate HTTP 503 |
| Rejection point | ❌ None | ✅ Pre-CORS, Gzip, JSON decode |
| Goroutine accumulation | ⚠️ Yes, unlimited | ✅ Eliminated — goroutines don't enter the system |
| Internal pipeline protection | ❌ RPC and staged sync compete for slots | ✅ Internal callers use blocking Acquire |
| DB slots protection | ❌ None — RPC exhausts slots | ✅ TryAcquire in BeginRo for RPC |
| Memory under load | ❌ Critical — swap up to 11.81 GB, OOM | ✅ Stable (0.00 GB swap in test) |
| Latency under overload | High (~1m 40s) | ✅ Consistently low (fail-fast) |
| Configuration required | ❌ No concurrency flags | ✅ Zero config; --rpc.max.concurrency optional |
| Execution isolation | ❌ Chain tip lost under load | ✅ Guaranteed by design |

📊 Performance Comparison: Main (18/03) vs. PR

This benchmark compares the current main branch against this PR using the same set of APIs under heavy load.

| API | main (18/03) post_exec p50 | PR post_exec p50 | Improvement |
| :--- | :---: | :---: | :---: |
| eth_call @ 3000 QPS | 6.82s ✅ | 5.89s ✅ | −14% |
| eth_getBlockByNumber @ 3000 QPS | 13.73s ⚠️ | 5.23s ✅ | −62% |
| eth_getProof @ 1000–3000 QPS | 49.12s (tip lost) | 2.84s ✅ | −94% |

🔍 Key Observations

  • eth_call: Neither main nor the PR caused a chain tip loss. Since eth_call is read-only and light on DB slots, it is inherently more stable, but the PR still delivers a 14% reduction in p50 latency.
  • eth_getBlockByNumber: Remains stable up to 6000 QPS with no actual tip loss. Any observed sync=0 periods during testing were identified as monitoring false negatives rather than actual node desync.
  • eth_getProof: This is the most impactful result. While main lost the chain tip at only 1000 QPS (p50=49s), the PR successfully holds up to 3000 QPS with a p50 of 2.84s—a 94% performance gain.

🏆 Overall Conclusion

The final PR successfully eliminates chain tip loss across all tested APIs and QPS levels. No real tip loss was observed in any production-level test run, ensuring much higher node reliability under stress.

@lupin012 lupin012 changed the title protect execution TIP under RPC load WIP protect execution TIP under RPC load Mar 15, 2026
@lupin012 lupin012 requested a review from canepat March 15, 2026 20:03
@lupin012 lupin012 changed the title WIP protect execution TIP under RPC load [Wip] protect execution TIP under RPC load Mar 15, 2026
@lupin012 lupin012 changed the title [Wip] protect execution TIP under RPC load [wip] protect execution TIP under RPC load Mar 16, 2026
@yperbasis yperbasis added this to the 3.5.0 milestone Mar 17, 2026
@lupin012 lupin012 force-pushed the lupin012/protect-tip-under-load branch from 3a062c5 to 115af7b Compare March 17, 2026 16:31
lupin012 and others added 4 commits March 17, 2026 18:33
Add a two-layer mechanism to guarantee staged sync keeps DB access even
under heavy RPC load:

1. executionLimiter — dedicated semaphore (GOMAXPROCS×2) for staged sync.
   BeginRo with TxPriorityExecution uses blocking Acquire on this limiter,
   never shared with RPC callers.

2. rpcAdmissionHandler — outermost HTTP middleware that tags every request
   context with TxPriorityRPC. BeginRo for RPC uses TryAcquire (fail-fast)
   on the shared roTxsLimiter; when saturated it returns ErrServerOverloaded
   (HTTP 503 + Retry-After: 1) instead of queuing.

Optional flag --rpc.max.concurrent (default 0 = unlimited) enables an
additional HTTP-layer gate that rejects excess requests before CORS/Gzip/
JSON decode, further reducing memory pressure under extreme load.

roTxsLimiter default changed from hardcoded 32 to max(10, GOMAXPROCS×4)
so it scales automatically with hardware (128 slots on 32-core machines).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The client-side roTxsLimiter in remotedb controls concurrent gRPC streams,
not MDBX DB slots. Applying TryAcquire here caused integration tests to
return "server overloaded" whenever more than GOMAXPROCS-1 concurrent RPC
requests were in flight.

The fail-fast TryAcquire is only appropriate at the MDBX layer on the
server side, which already happens via TxPriorityRPC in remotedbserver.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
NewHTTPHandlerStack gains a tagAsRPC bool parameter. The JSON-RPC port
passes true (TxPriorityRPC + fail-fast TryAcquire, as before). The engine
API (auth) port passes false so CL↔EL protocol calls (ForkchoiceUpdated,
NewPayload, GetForkChoice …) reach ExecModule.BeginRo with the default
priority and use blocking Acquire instead of fail-fast TryAcquire.

This fixes spurious "server overloaded" errors in execution tests that use
ExecutionClientDirect, where context values propagate in-process (unlike
real gRPC where context values do not cross the wire).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@lupin012 lupin012 force-pushed the lupin012/protect-tip-under-load branch from 74758c7 to 0a03c25 Compare March 17, 2026 17:34
lupin012 and others added 5 commits March 17, 2026 18:40
RPC callers connecting via gRPC (remotedb) now get fail-fast behaviour:
when the client-side roTxsLimiter is saturated, BeginRo returns
ErrServerOverloaded (HTTP 503 + Retry-After: 1) instead of blocking.

Non-RPC callers (execution, default) still use blocking Acquire.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Start the 10-second RPC stats background goroutine only for the
chaindata DB instance, not for txpool and other MDBX instances.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…eject

- Remove 6 transient debug counters (DebugRPCCancelledReqs, DebugHTTP503,
  DebugRPCServerEntry, DebugHTTPAfterNext, DebugGzipEntry, DebugVHostEntry)
  and the statusCapture wrapper that existed only to capture 503s
- Keep DebugHTTPTotal and add DebugHTTPRejected (with % in periodic log)
- Restore HTTP 503 response on admission reject (was temporarily HTTP 200)
- Remove Retry-After header from admission response

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@lupin012 lupin012 force-pushed the lupin012/protect-tip-under-load branch 2 times, most recently from f621c53 to e6f4205 Compare March 18, 2026 20:10
…rectly

The HTTP admission limit now always equals DBReadConcurrency, removing
the need for a separate --rpc.max.concurrent flag. This simplifies
configuration and ensures the HTTP gate always matches the DB semaphore
size, preventing goroutine pile-up under load.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@lupin012 lupin012 force-pushed the lupin012/protect-tip-under-load branch from e6f4205 to 83c6b2c Compare March 18, 2026 20:13
lupin012 and others added 8 commits March 18, 2026 21:57
- Remove TxPriority (Default/RPC/Execution) and all WithTxPriority/TxPriorityFrom usage
- Remove executionLimiter; single roTxsLimiter with blocking Acquire for all callers
- Remove dead counters: rpcInflight, rpcPeak, rpcTotal, rpc503, execInflight, execPeak, execTotal
- Remove debugSysInfo (vmRSS/cpuIdle) from periodic log; keep only http_total/http_rejected/%
- Remove engineAdmissionHandler and statusCapture from rpcstack
- HTTP admission handler returns 503 on overload (reverted from 200)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Remove the periodic debug logging goroutine (stopDebugCh, http_total/http_rejected stats),
DebugHTTPTotal/DebugHTTPRejected counters, and now-unused kv import in rpcstack.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…trol

Adds --rpc.max.concurrentRequests flag to control the HTTP admission handler limit:
  0 (default) = use db.read.concurrency
  >0 = explicit limit
  -1 = unlimited (admission control disabled)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ests

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
lupin012 and others added 2 commits March 19, 2026 23:48
Add two-level protection to prevent RPC load from starving internal
callers (staged sync, background workers) of DB read slots:

1. HTTP admission handler (rpcAdmissionHandler) acts as outer gate:
   rejects with HTTP 503 if inflight requests exceed --rpc.max.concurrency.
   Propagates the concurrency limit into the request context via
   WithRPCContext so BeginRo knows the caller is an RPC handler.

2. BeginRo inner gate: if the context carries a positive limit (RPC caller),
   uses TryAcquire on roTxsLimiter and returns ErrServerOverloaded
   immediately if the semaphore is full. Internal callers always use
   blocking Acquire and are never rejected.

When --rpc.max.concurrency=-1 (unlimited), limit=0 is propagated and
RPC falls back to blocking Acquire with no restriction.

Rename flag to --rpc.max.concurrency (was --rpc.max.concurrent-requests).
Default: 0 = use db.read.concurrency value.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@lupin012 lupin012 marked this pull request as ready for review March 20, 2026 11:31
@lupin012 lupin012 changed the title [wip] protect execution TIP under RPC load protect execution TIP under RPC load Mar 20, 2026
@yperbasis yperbasis requested a review from mh0lt March 24, 2026 06:40
@yperbasis (Member) left a comment

From Claude:

High

  1. remotedbserver.go hardcodes WithRPCContext(ctx, 1) — wrong semantics

db/kv/remotedbserver/remotedbserver.go:138,160: Both begin() and renew() pass a hardcoded limit of 1. Since BeginRo treats any limit > 0 as "use fail-fast TryAcquire", this forces ALL remote DB server transactions into fail-fast mode — including calls from a standalone rpcdaemon connected via gRPC. The remote DB server is a gRPC service, not HTTP, so it was never behind the HTTP admission handler. Making it fail-fast means transient DB semaphore fullness causes immediate errors for remote rpcdaemon users, rather than waiting briefly.

If the intent is to tag these as RPC, the limit should come from configuration, not be hardcoded. Or better, since these calls aren't gated by an HTTP admission handler, they should arguably remain blocking (no WithRPCContext at all) — the goroutine pile-up problem doesn't apply here because gRPC has its own flow control.

  2. No metrics or logging for shed load

There are no counters for HTTP 503 rejections or ErrServerOverloaded returns. Operators need to know how often admission control is firing. Without metrics, it's impossible to tune --rpc.max.concurrency or detect that the node is under sustained overload. At minimum, add Prometheus counters for both rejection points.

  3. No tests

No unit tests for rpcAdmissionHandler or the BeginRo TryAcquire path. The admission handler has subtle concurrency behavior (the atomic counter can briefly exceed the limit before decrementing) that should be covered. The enableRPC path in rpcstack.go calls NewHTTPHandlerStack with the new signature, and the existing tests in rpcstack_test.go (lines 249, 265) will fail to compile since they use enableRPC → NewHTTPHandlerStack, which now requires 6 args instead of 4.

Medium

  4. WithRPCContext API stores a limit but BeginRo only uses it as a boolean

kv_interface.go: The function signature WithRPCContext(ctx, limit int64) suggests the limit is meaningful, but BeginRo only checks limit > 0 as a boolean flag. This is confusing. Either:

  • Simplify to WithRPCContext(ctx context.Context) context.Context (boolean tag), or
  • Actually use the limit in BeginRo for something (e.g., pass it to a separate per-RPC semaphore)
  5. roTxLimit default change in node.go is unrelated and undocumented

Line 322: Changes the fallback default from 32 to max(10, GOMAXPROCS*4). This is effectively dead code when DBReadConcurrency is set (always true via flag defaults), but if OpenDatabase is called without HTTP config, the behavior changes. This should be called out in the PR description or split into its own change.

  6. CI workflow version bumps are unrelated

The rpc-tests tag bumps (v1.115.0/v1.78.0 → v1.124.0) have nothing to do with admission control. These should be in a separate PR.

  7. enableRPC path in rpcstack.go passes config.RpcConcurrencyLimit but nobody sets it

The diff adds config.RpcConcurrencyLimit to httpConfig and uses it in enableRPC, but no caller in the codebase populates this field. The embedded daemon path (node/cli/flags.go:setEmbeddedRpcDaemon) doesn't set RpcConcurrencyLimit on the httpConfig struct — it sets it on HttpCfg, which flows through startRegularRpcServer (where the limit is computed separately). So the enableRPC path always gets RpcConcurrencyLimit = 0, meaning admission control is disabled there.

Low

  8. No Retry-After header on 503

Standard HTTP practice is to include Retry-After with a 503. This helps well-behaved clients implement proper backoff.

  9. Error format inconsistency

Clients may see two different error shapes for the same overload:

  • HTTP 503 with plain text body (from admission handler)
  • JSON-RPC internal error wrapping ErrServerOverloaded (from BeginRo, if the inner gate fires)

The JSON-RPC error is a standard -32000 internal error, which clients might not associate with overload. Consider returning a JSON-RPC response with error code -32005 (or similar) and the 503 status code for both paths.

  10. WebSocket connections bypass admission control

WebSocket RPC calls go through srv.WebsocketHandler which doesn't wrap with rpcAdmissionHandler. Under heavy WS load, the same goroutine pile-up problem could occur. Worth documenting as a known limitation.

Minor nits

  • The PR description repeats its first sentence ("This PR introduces...")
  • rpcAdmissionHandler could use semaphore.NewWeighted instead of atomic.Int64 to avoid the brief over-counting on concurrent arrivals — though the current approach is fine in practice (see the sketch below)
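For comparison, the semaphore-based variant that nit suggests might look roughly like this (a sketch, not the PR's code): TryAcquire never admits more than the limit, at the cost of pulling in x/sync.

```go
package rpcstack

import (
	"net/http"

	"golang.org/x/sync/semaphore"
)

// semAdmissionHandler swaps the atomic counter for a weighted semaphore,
// removing the brief over-counting window on concurrent arrivals.
type semAdmissionHandler struct {
	next http.Handler
	sem  *semaphore.Weighted // nil disables admission control
}

func newSemAdmissionHandler(next http.Handler, limit int64) *semAdmissionHandler {
	h := &semAdmissionHandler{next: next}
	if limit > 0 {
		h.sem = semaphore.NewWeighted(limit)
	}
	return h
}

func (h *semAdmissionHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	if h.sem == nil {
		h.next.ServeHTTP(w, r)
		return
	}
	if !h.sem.TryAcquire(1) { // exact: inflight never exceeds the limit
		w.Header().Set("Retry-After", "1")
		http.Error(w, "server overloaded", http.StatusServiceUnavailable)
		return
	}
	defer h.sem.Release(1)
	h.next.ServeHTTP(w, r)
}
```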

- db/kv: rename WithRPCContext/IsRPCContext to WithNonBlockingAcquire/IsNonBlockingAcquire
  — pure boolean, removes spurious limit parameter; DB layer no longer
  references RPC concepts
- db/kv/mdbx: add db_rotx_overloaded_total Prometheus counter on TryAcquire failure
- db/kv/remotedbserver: remove WithNonBlockingAcquire from begin/renew — gRPC
  transactions use blocking Acquire; they are not behind the HTTP admission handler
- node: restore roTxLimit default to 32 (as on main); remove unrelated runtime import
- node/rpcstack: add rpc_admission_rejected_total counter, Retry-After: 1 header,
  WebSocket limitation comment, and TODO for unified 503+JSON-RPC error format
- node/rpcstack_test: add TestRPCAdmissionHandler covering limit=0, under-limit,
  and over-limit (503) cases

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
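The over-limit case of that test could take roughly this shape, written here as a hypothetical sketch against the admissionHandler type sketched near the top of this description; the PR's actual TestRPCAdmissionHandler may differ:

```go
package rpcstack

import (
	"net/http"
	"net/http/httptest"
	"testing"
)

// Hypothetical over-limit case: with limit=1 and one request parked inside
// the handler, a second concurrent request must be rejected with HTTP 503.
func TestAdmissionOverLimit(t *testing.T) {
	entered := make(chan struct{})
	release := make(chan struct{})
	slow := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		entered <- struct{}{} // signal that the only slot is now held
		<-release             // park until the test is done asserting
	})
	srv := httptest.NewServer(&admissionHandler{next: slow, limit: 1})
	defer srv.Close()

	go func() {
		resp, err := http.Get(srv.URL) // occupies the single slot
		if err == nil {
			resp.Body.Close()
		}
	}()
	<-entered // first request is inside the handler, holding the slot

	resp, err := http.Get(srv.URL) // second request must be shed
	if err != nil {
		t.Fatal(err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusServiceUnavailable {
		t.Fatalf("want 503, got %d", resp.StatusCode)
	}
	close(release) // unblock the parked request
}
```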
@lupin012 (Contributor, Author) commented Mar 24, 2026

@yperbasis

If the review fixes are OK, I will perform the following regression tests to ensure no regressions have been introduced:

  • Load Stress Testing: Verify that the system correctly returns HTTP 503 under overload conditions.
  • Concurrency Limit Validation: Run tests with --rpc.max.concurrency=-1 to verify it maintains the expected legacy behavior.
  • TIP-tracking Load Test: Perform a full validation using the tip-tracking-with-load test suite to ensure stability.

📋 Final Status of Review Comments

🔴 High Priority

| # | Issue | Status | PR Comment |
| :--- | :--- | :--- | :--- |
| 1 | remotedbserver.go hardcodes WithRPCContext(ctx, 1) | ✅ Fixed | Removed WithRPCContext from begin() and renew() — gRPC transactions now use blocking Acquire, consistent with the fact that they are not behind the HTTP admission handler. |
| 2 | No metrics for shed load | ✅ Fixed | Added Prometheus counters rpc_admission_rejected_total (HTTP admission handler) and db_rotx_overloaded_total (inner BeginRo gate); see the counter sketch after these tables. |
| 3 | No tests for rpcAdmissionHandler | ✅ Fixed | Added TestRPCAdmissionHandler with 3 sub-tests: limit=0 disabled, requests under the limit pass, requests over the limit → 503. |

🟡 Medium Priority

| # | Issue | Status | PR Comment |
| :--- | :--- | :--- | :--- |
| 4 | WithRPCContext API with spurious limit | ✅ Fixed | Renamed to WithNonBlockingAcquire(ctx) — pure boolean, spurious limit parameter removed. The DB layer no longer has any knowledge of RPC. |
| 5 | roTxLimit default changed without documentation | ✅ Fixed | Restored to 32 as on main — the change was unrelated to this PR. |
| 6 | CI rpc-tests version bumps unrelated to PR | ✅ False positive | The bump was necessary: the new version correctly handles the 503 responses that admission control can now return. |
| 7 | RpcConcurrencyLimit never populated in embedded daemon path | ✅ False positive | The field is populated at flags.go:450 via ctx.Int(utils.RpcMaxConcurrentRequestsFlag.Name). |

🟢 Low Priority

| # | Issue | Status | PR Comment |
| :--- | :--- | :--- | :--- |
| 8 | No Retry-After header on 503 | ✅ Fixed | Added Retry-After: 1 to the 503 response from the admission handler. |
| 9 | Two different error formats for overload | 📝 Separate PR | Path 1 (admission) returns HTTP 503 + plain text. Path 2 (BeginRo) returns HTTP 200 + JSON-RPC -32000. Unifying both requires buffering in rpc/http.go — deferred to a separate PR. |
| 10 | WebSocket bypasses admission control | 📝 Separate PR | WebSocket connections are long-lived: the relevant limit is concurrent connections, not inflight requests. This requires a dedicated connection limiter — deferred to a separate PR. |
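For reference, the two counters from issue #2, sketched here with prometheus/client_golang. Erigon has its own metrics wrapper, so the real registration differs; the counter names match the fix, everything else is illustrative:

```go
package metricsx

import "github.com/prometheus/client_golang/prometheus"

var (
	// Incremented by the HTTP admission handler on every 503 rejection.
	rpcAdmissionRejected = prometheus.NewCounter(prometheus.CounterOpts{
		Name: "rpc_admission_rejected_total",
		Help: "RPC requests rejected by the HTTP admission handler",
	})
	// Incremented when the BeginRo inner gate's TryAcquire fails.
	dbRoTxOverloaded = prometheus.NewCounter(prometheus.CounterOpts{
		Name: "db_rotx_overloaded_total",
		Help: "read-only tx opens rejected because roTxsLimiter was full",
	})
)

func init() {
	// Register once at package load; the two rejection points then just
	// call .Inc() whenever they shed a request.
	prometheus.MustRegister(rpcAdmissionRejected, dbRoTxOverloaded)
}
```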

@yperbasis (Member) left a comment

enableRPC path has dead admission control (still open from review point #7)

The httpConfig.RpcConcurrencyLimit field is added but never populated by callers of enableRPC. The embedded daemon path (node/cli/flags.go:setEmbeddedRpcDaemon) populates HttpCfg.RpcMaxConcurrentRequests, which flows through startRegularRpcServer (where the limit is computed separately and passed to NewHTTPHandlerStack directly). But the enableRPC → httpConfig.RpcConcurrencyLimit path always gets the zero value, meaning admission control is always disabled for the embedded HTTP server's enableRPC codepath.

lupin012 and others added 2 commits March 24, 2026 11:53
Ensures admission control wiring (enableRPC → NewHTTPHandlerStack →
rpcAdmissionHandler) is exercised by all existing tests. Previously
RpcConcurrencyLimit was always zero, silently disabling admission control
in the test HTTP server.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@yperbasis (Member) left a comment

Issues

Medium

  1. node/rpcstack.go now imports db/kv: This creates a dependency from the HTTP server layer down to the DB layer. The coupling is minimal (just kv.WithNonBlockingAcquire), but the context key + helper could live in a thin, dependency-free package (e.g., common/ctxflags or similar) to keep the layering clean. Not a blocker, but worth considering to avoid the import growing over time.
  2. No log on 503 rejection: The admission handler only increments a Prometheus counter. Operators diagnosing client-reported 503s won't see anything in logs unless they check metrics. A debug.Trace-level log (or even just logging on the first rejection after a quiet period) would help.
  3. Test spin-wait without runtime.Gosched() (rpcstack_test.go:432):

     for admission.inflight.Load() < limit {
         // spin
     }

     On a single-P runtime (e.g., GOMAXPROCS=1 in CI), this could livelock because the goroutines blocked in the handler never get scheduled. Adding runtime.Gosched() inside the loop would be safer; see the sketch below.
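A self-contained sketch of that fix (hypothetical helper name; the test's actual loop is inline):

```go
package rpcstack

import (
	"runtime"
	"sync/atomic"
)

// waitInflight spins until at least want requests are inside the handler.
// The runtime.Gosched() call yields the processor so the blocked handler
// goroutines get scheduled even with GOMAXPROCS=1, avoiding the livelock
// the review describes.
func waitInflight(counter *atomic.Int64, want int64) {
	for counter.Load() < want {
		runtime.Gosched()
	}
}
```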

Low

  1. Static Retry-After: 1: Under sustained overload, all rejected clients retry at the same second, causing a thundering herd. A jittered value (e.g., 1 + rand(0,2)) or documenting that clients should add their own jitter would help, but fine for v1.
  2. tagAsRPC parameter naming: The boolean controls whether admission control is applied and whether the context is tagged for non-blocking DB acquire. withAdmissionControl would be a more descriptive name — tagAsRPC sounds like it's labeling the request type.
  3. CI rpc-tests version bump from v1.78.0 → v1.124.0 in qa-tip-tracking-with-load.yml is a large jump. The bump is necessary (503 handling), but worth a quick sanity check that no unrelated test behavior changed between those versions.

Questions for the author

  • For the default case where admission limit = db.read.concurrency, the HTTP RPC gate can admit enough requests to fill all DB read slots. If Engine API or staged sync internal reads need a slot concurrently, they'll have to wait for an HTTP RPC handler to finish. Has this been observed in practice? Would reserving e.g. 10% of slots for non-HTTP callers be worth considering, or is the current design sufficient because staged sync primarily uses write transactions?

@yperbasis (Member) commented Mar 26, 2026

@lupin012 should we merge this PR?

I am still running the CI Tip Tracking & Load tests. Once they finish successfully (as expected), I will proceed with the merge.

@lupin012 lupin012 added this pull request to the merge queue Mar 27, 2026
Merged via the queue into main with commit b6d67fa Mar 27, 2026
39 checks passed
@lupin012 lupin012 deleted the lupin012/protect-tip-under-load branch March 27, 2026 07:04
lupin012 added a commit that referenced this pull request Apr 2, 2026
This PR introduces an HTTP admission control layer to protect the Staged
Sync pipeline from being starved or delayed by high RPC load.
This PR introduces a two-level admission control system to protect the
Staged Sync pipeline from being starved or delayed by high RPC load.
Root Cause Analysis:
Under heavy RPC traffic, the node accumulates a large number of
goroutines blocked on roTxsLimiter.Acquire. When DB slots become
available, the backlog drains in a way that starves the staged sync
pipeline. The goroutine pile-up also causes a significant spike in
virtual memory and overall system instability.
Solution:
Two gates work in tandem:
1. HTTP admission handler (rpcAdmissionHandler) — outer gate installed
at the top of every HTTP RPC stack, before CORS, Gzip, or JSON decoding.
If the number of inflight requests exceeds the configured limit, the
request is rejected immediately with HTTP 503. This prevents goroutine
accumulation at the source. On every admitted request the handler tags
the context with
WithRPCContext (limit value) so the DB layer can identify the caller.
2. BeginRo inner gate — if the context carries a positive RPC limit,
BeginRo uses TryAcquire on roTxsLimiter and returns ErrServerOverloaded
immediately if the semaphore is full. Internal callers (staged sync,
background workers) always use blocking Acquire and are never rejected.
This two-level approach means most overload is shed at the HTTP layer
(goroutines never enter the system), while any RPC requests that slip
through under transient concurrency spikes are still fail-fast at the DB
layer rather than piling up behind the semaphore.
Configuration:
                  
- --rpc.max.concurrency: HTTP admission limit.
- 0 (default): uses --db.read.concurrency (auto-tuned to GOMAXPROCS ×
64, capped at 9000)
- > 0: explicit limit
- -1: unlimited (admission control disabled, BeginRo falls back to
blocking Acquire) (as old behaviour)
  | Resource | Result |
| :--- | :--- |
### Summary of Resource Management Improvements

| Resource | Result |
| :--- | :--- |
| **Goroutine pile-up** | ✅ Requests rejected at HTTP layer before CORS,
Gzip, or JSON decoding |
| **Staged sync starvation** | ✅ Internal callers (staged sync, workers)
use blocking `Acquire` and are never rejected; RPC uses `TryAcquire`
fail-fast |
| **Transient overload spikes** | ✅ `BeginRo` inner gate catches RPC
requests that pass the HTTP layer during concurrency spikes |
| **Scalability** | ✅ Default limit auto-tuned to `GOMAXPROCS × 64`
(capped at 9000) via `--db.read.concurrency` |
| **Configuration** | ✅ Zero required config, one optional flag
(`--rpc.max.concurrency`) |

Benchmark & Stress Test Results
Setup: 32 Cores, 64GB RAM, 70GB Swap. Minimal Node in Sync. Parallel
eth_call stress tests (28k QPS).

<details>
<summary><b>Click to expand: Benchmark Data (Before vs After on local
node)</b></summary>

### Current SW (main release)
CPU
03:23:56 PM all 29.55 0.00 22.30 34.33 0.00 13.83
03:24:06 PM all 56.41 0.00 15.44 10.83 0.00 17.32
03:24:16 PM all 75.60 0.00 13.36 2.86 0.00 8.18
03:24:26 PM all 73.19 0.00 14.35 2.82 0.00 9.63
03:24:36 PM all 73.35 0.00 14.56 2.75 0.00 9.34

Memory
15:23:30 rss=31.89GB vsz=7.65TB proc_swap=11.81GB sys_swap=27.21/72.00GB
MemAvail=1.15GB SwapAvail=44.79GB
15:23:40 rss=32.74GB vsz=7.65TB proc_swap=11.00GB sys_swap=27.02/72.00GB
MemAvail=1.50GB SwapAvail=44.98GB
15:23:50 rss=33.83GB vsz=7.65TB proc_swap=9.89GB sys_swap=25.65/72.00GB
MemAvail=1.44GB SwapAvail=46.35GB
15:24:00 rss=36.33GB vsz=7.65TB proc_swap=7.60GB sys_swap=23.55/72.00GB
MemAvail=1.67GB SwapAvail=48.45GB
15:24:10 rss=37.85GB vsz=7.65TB proc_swap=6.91GB sys_swap=21.83/72.00GB
MemAvail=5.10GB SwapAvail=50.17GB
15:24:20 rss=39.30GB vsz=7.65TB proc_swap=6.69GB sys_swap=20.23/72.00GB
MemAvail=7.28GB SwapAvail=51.77GB
15:24:30 rss=40.40GB vsz=7.65TB proc_swap=6.20GB sys_swap=17.94/72.00GB
MemAvail=10.20GB SwapAvail=54.06GB
15:24:40 rss=41.44GB vsz=7.65TB proc_swap=5.23GB sys_swap=14.95/72.00GB
MemAvail=20.01GB SwapAvail=57.05GB
15:24:50 rss=41.68GB vsz=7.65TB proc_swap=5.20GB sys_swap=14.92/72.00GB
MemAvail=16.14GB SwapAvail=57.08GB
15:25:00 rss=42.77GB vsz=7.65TB proc_swap=4.95GB sys_swap=14.87/72.00GB
MemAvail=11.41GB SwapAvail=57.13GB
15:25:11 rss=42.78GB vsz=7.65TB proc_swap=5.26GB sys_swap=15.55/72.00GB
MemAvail=8.58GB SwapAvail=56.45GB
15:25:21 rss=40.79GB vsz=7.65TB proc_swap=6.88GB sys_swap=17.46/72.00GB
MemAvail=5.65GB SwapAvail=54.54GB

TIP Trucking
[15:21:44] block #24,656,279 ts=2026-03-14 15:19:47 lag=+117.8s ALERT:
lag=117.8s — node is behind the tip!
[15:21:44] block #24,656,280 ts=2026-03-14 15:19:59 lag=+105.8s ALERT:
lag=105.8s — node is behind the tip!
[15:21:44] block #24,656,281 ts=2026-03-14 15:20:11 lag=+93.8s ALERT:
lag=93.8s — node is behind the tip!
[15:21:44] block #24,656,282 ts=2026-03-14 15:20:23 lag=+81.8s ALERT:
lag=81.8s — node is behind the tip!
[15:21:44] block #24,656,283 ts=2026-03-14 15:20:47 lag=+57.8s ALERT:
lag=57.8s — node is behind the tip!
[15:21:57] block #24,656,284 ts=2026-03-14 15:20:59 lag=+58.0s ALERT:
lag=58.0s — node is behind the tip!
[15:21:57] block #24,656,285 ts=2026-03-14 15:21:11 lag=+46.0s ALERT:
lag=46.0s — node is behind the tip!
[15:21:57] block #24,656,286 ts=2026-03-14 15:21:23 lag=+34.0s ALERT:
lag=34.0s — node is behind the tip!
[15:21:57] block #24,656,287 ts=2026-03-14 15:21:35 lag=+22.0s ALERT:
lag=22.0s — node is behind the tip!
[15:21:57] block #24,656,288  ts=2026-03-14 15:21:47  lag=+10.0s  OK
[15:22:07] block #24,656,289  ts=2026-03-14 15:21:59  lag=+8.0s  OK
[15:22:19] block #24,656,290  ts=2026-03-14 15:22:11  lag=+8.3s  OK
[15:22:32] block #24,656,291  ts=2026-03-14 15:22:23  lag=+9.3s  OK
[15:23:02] ALERT: no new block for 30s (last block #24656291) — node may
be losing the tip!
[15:23:32] ALERT: no new block for 60s (last block #24656291) — node may
be losing the tip!
[15:24:02] ALERT: no new block for 90s (last block #24656291) — node may
be losing the tip!
[15:24:24] block #24,656,292 ts=2026-03-14 15:22:35 lag=+109.5s ALERT:
lag=109.5s — node is behind the tip!
[15:24:24] block #24,656,293 ts=2026-03-14 15:22:47 lag=+97.5s ALERT:
lag=97.5s — node is behind the tip!
[15:24:24] block #24,656,294 ts=2026-03-14 15:22:59 lag=+85.5s ALERT:
lag=85.5s — node is behind the tip!
[15:24:24] block #24,656,295 ts=2026-03-14 15:23:11 lag=+73.5s ALERT:
lag=73.5s — node is behind the tip!
[15:24:54] ALERT: no new block for 30s (last block #24656295) — node may
be losing the tip!
[15:25:17] block #24,656,296 ts=2026-03-14 15:23:23 lag=+114.2s ALERT:
lag=114.2s — node is behind the tip!
[15:25:17] block #24,656,297 ts=2026-03-14 15:23:35 lag=+102.2s ALERT:
lag=102.2s — node is behind the tip!
[15:25:17] block #24,656,298 ts=2026-03-14 15:23:47 lag=+90.2s ALERT:
lag=90.2s — node is behind the tip!
[15:25:17] block #24,656,299 ts=2026-03-14 15:23:59 lag=+78.2s ALERT:
lag=78.2s — node is behind the tip!
[15:25:17] block #24,656,300 ts=2026-03-14 15:24:11 lag=+66.2s ALERT:
lag=66.2s — node is behind the tip!
[15:25:17] block #24,656,301 ts=2026-03-14 15:24:23 lag=+54.2s ALERT:
lag=54.2s — node is behind the tip!
[15:25:17] block #24,656,302 ts=2026-03-14 15:24:35 lag=+42.2s ALERT:
lag=42.2s — node is behind the tip!
[15:25:17] block #24,656,303 ts=2026-03-14 15:24:47 lag=+30.2s ALERT:
lag=30.2s — node is behind the tip!

> ./run_perf_tests.py -p
pattern/mainnet/stress_test_eth_call_001_latest.tar -t 28000:60 -y
eth_call -m 2 -r 100 -Z
Performance Test started
Test repetitions: 100 on sequence: 28000:60 for pattern:
pattern/mainnet/stress_test_eth_call_001_latest.tar
Test on port:  http://localhost:8545
[1. 1] daemon: executes test qps: 28000 time: 60 -> [R=100.00%
max=1m39s]
[1. 2] daemon: executes test qps: 28000 time: 60 -> [R=100.00%
max=1m46s]
[1. 3] daemon: executes test qps: 28000 time: 60 -> [R=100.00%
max=1m38s]

> ./run_perf_tests.py -p
pattern/mainnet/stress_test_eth_call_001_latest.tar -t 28000:60 -y
eth_call -m 2 -r 100 -Z
Performance Test started
Test repetitions: 100 on sequence: 28000:60 for pattern:
pattern/mainnet/stress_test_eth_call_001_latest.tar
Test on port:  http://localhost:8545
[1. 1] daemon: executes test qps: 28000 time: 60 -> [R=100.00%
max=1m39s]
[1. 2] daemon: executes test qps: 28000 time: 60 -> [R=100.00%
max=1m45s]
[1. 3] daemon: executes test qps: 28000 time: 60 -> [R=100.00%
max=1m40s]


### NEW Software (with PR)
CPU
7:58:51 AM all 51.09 0.00 6.16 0.35 0.00 42.40
07:58:56 AM all 49.26 0.00 5.82 0.03 0.00 44.89
07:59:01 AM all 50.34 0.00 5.95 0.20 0.00 43.51
07:59:06 AM all 51.60 0.00 5.88 0.04 0.00 42.47
07:59:11 AM all 48.97 0.00 5.90 0.06 0.00 45.07
07:59:16 AM all 49.59 0.00 6.11 0.36 0.00 43.93
07:59:21 AM all 48.69 0.00 5.78 0.03 0.00 45.51
07:59:26 AM all 53.50 0.00 6.66 0.26 0.00 39.59
07:59:31 AM all 50.45 0.00 6.37 0.02 0.00 43.16
07:59:36 AM all 48.71 0.00 6.18 0.03 0.00 45.08
07:59:41 AM all 53.58 0.00 6.45 0.15 0.00 39.81
07:59:46 AM all 53.74 0.00 6.13 0.05 0.00 40.07
07:59:51 AM all 31.76 0.00 3.95 0.23 0.00 64.06
07:59:56 AM all 37.20 0.00 5.05 0.03 0.00 57.71
08:00:01 AM all 77.10 0.00 12.95 0.01 0.00 9.94
08:00:06 AM all 78.22 0.00 12.58 0.08 0.00 9.11
08:00:11 AM all 77.64 0.00 12.50 0.00 0.00 9.86
08:00:16 AM all 77.48 0.00 12.61 0.08 0.00 9.83
08:00:21 AM all 77.61 0.00 12.47 0.01 0.00 9.90
08:00:26 AM all 77.35 0.00 12.89 0.06 0.00 9.70
08:00:31 AM all 77.85 0.00 12.92 0.04 0.00 9.19
08:00:36 AM all 77.73 0.00 12.80 0.02 0.00 9.44
08:00:41 AM all 78.42 0.00 12.95 0.05 0.00 8.59
08:00:46 AM all 78.52 0.00 12.55 0.01 0.00 8.93
08:00:51 AM all 78.42 0.00 12.77 0.19 0.00 8.62
08:00:56 AM all 56.98 0.00 8.64 0.11 0.00 34.28

Memory
2026-03-20 08:00:36 pid=1117840 rss=30.04GB vsz=7.49TB proc_swap=0.00GB
sys_swap=0.98/72.00GB MemAvail=39.93GB SwapAvail=71.02GB
2026-03-20 08:00:41 pid=1117840 rss=30.20GB vsz=7.49TB proc_swap=0.00GB
sys_swap=0.98/72.00GB MemAvail=39.86GB SwapAvail=71.02GB
2026-03-20 08:00:41 pid=1117840 rss=30.20GB vsz=7.49TB proc_swap=0.00GB
sys_swap=0.98/72.00GB MemAvail=39.86GB SwapAvail=71.02GB
2026-03-20 08:00:46 pid=1117840 rss=30.20GB vsz=7.49TB proc_swap=0.00GB
sys_swap=0.98/72.00GB MemAvail=39.90GB SwapAvail=71.02GB
2026-03-20 08:00:46 pid=1117840 rss=30.20GB vsz=7.49TB proc_swap=0.00GB
sys_swap=0.98/72.00GB MemAvail=39.90GB SwapAvail=71.02GB
2026-03-20 08:00:51 pid=1117840 rss=30.28GB vsz=7.49TB proc_swap=0.00GB
sys_swap=0.98/72.00GB MemAvail=39.88GB SwapAvail=71.02GB
2026-03-20 08:00:51 pid=1117840 rss=30.28GB vsz=7.49TB proc_swap=0.00GB
sys_swap=0.98/72.00GB MemAvail=39.88GB SwapAvail=71.02GB
2026-03-20 08:00:56 pid=1117840 rss=30.54GB vsz=7.49TB proc_swap=0.00GB
sys_swap=0.98/72.00GB MemAvail=40.39GB SwapAvail=71.02GB
2026-03-20 08:00:56 pid=1117840 rss=30.54GB vsz=7.49TB proc_swap=0.00GB
sys_swap=0.98/72.00GB MemAvail=40.39GB SwapAvail=71.02GB
2026-03-20 08:01:02 pid=1117840 rss=30.61GB vsz=7.49TB proc_swap=0.00GB
sys_swap=0.98/72.00GB MemAvail=40.25GB SwapAvail=71.02GB
2026-03-20 08:01:02 pid=1117840 rss=30.61GB vsz=7.49TB proc_swap=0.00GB
sys_swap=0.98/72.00GB MemAvail=40.25GB SwapAvail=71.02GB
2026-03-20 08:01:07 pid=1117840 rss=30.61GB vsz=7.49TB proc_swap=0.00GB
sys_swap=0.98/72.00GB MemAvail=39.97GB SwapAvail=71.02GB
2026-03-20 08:01:07 pid=1117840 rss=30.61GB vsz=7.49TB proc_swap=0.00GB
sys_swap=0.98/72.00GB MemAvail=39.97GB SwapAvail=71.02GB
2026-03-20 08:01:12 pid=1117840 rss=30.62GB vsz=7.49TB proc_swap=0.00GB
sys_swap=0.98/72.00GB MemAvail=39.48GB SwapAvail=71.02GB
2026-03-20 08:01:12 pid=1117840 rss=30.62GB vsz=7.49TB proc_swap=0.00GB
sys_swap=0.98/72.00GB MemAvail=39.48GB SwapAvail=71.02GB
2026-03-20 08:01:17 pid=1117840 rss=30.71GB vsz=7.49TB proc_swap=0.00GB
sys_swap=0.98/72.00GB MemAvail=39.57GB SwapAvail=71.02GB
2026-03-20 08:01:17 pid=1117840 rss=30.71GB vsz=7.49TB proc_swap=0.00GB
sys_swap=0.98/72.00GB MemAvail=39.57GB SwapAvail=71.02GB

TIP Trucking
07:56:10] block #24,697,055  ts=2026-03-20 07:55:59  lag=+12.0s  OK
[07:56:15] block #24,697,056  ts=2026-03-20 07:56:11  lag=+4.5s  OK
[07:56:25] block #24,697,057  ts=2026-03-20 07:56:23  lag=+2.5s  OK
[07:56:38] block #24,697,058  ts=2026-03-20 07:56:35  lag=+3.4s  OK
[07:56:50] block #24,697,059  ts=2026-03-20 07:56:47  lag=+3.5s  OK
[07:57:02] block #24,697,060  ts=2026-03-20 07:56:59  lag=+3.6s  OK
[07:57:16] block #24,697,061  ts=2026-03-20 07:57:11  lag=+5.6s  OK
[07:57:27] block #24,697,062  ts=2026-03-20 07:57:23  lag=+4.7s  OK
[07:57:39] block #24,697,063  ts=2026-03-20 07:57:35  lag=+4.3s  OK
[07:57:49] block #24,697,064  ts=2026-03-20 07:57:47  lag=+2.4s  OK
[07:58:01] block #24,697,065  ts=2026-03-20 07:57:59  lag=+2.9s  OK
[07:58:13] block #24,697,066  ts=2026-03-20 07:58:11  lag=+2.8s  OK
[07:58:25] block #24,697,067  ts=2026-03-20 07:58:23  lag=+2.4s  OK
[07:58:37] block #24,697,068  ts=2026-03-20 07:58:35  lag=+2.7s  OK
[07:58:49] block #24,697,069  ts=2026-03-20 07:58:47  lag=+2.3s  OK
[07:59:01] block #24,697,070  ts=2026-03-20 07:58:59  lag=+2.1s  OK
[07:59:15] block #24,697,071  ts=2026-03-20 07:59:11  lag=+4.3s  OK
[07:59:25] block #24,697,072  ts=2026-03-20 07:59:23  lag=+2.6s  OK
[07:59:40] block #24,697,073  ts=2026-03-20 07:59:35  lag=+5.3s  OK
[08:00:02] block #24,697,074  ts=2026-03-20 07:59:59  lag=+3.9s  OK
[08:00:13] block #24,697,075  ts=2026-03-20 08:00:11  lag=+2.8s  OK

./run_perf_tests.py -p
pattern/mainnet/stress_test_eth_call_001_latest.tar -t 28000:60 -y
eth_call -m 2 -r 100 -Z
Performance Test started
Test repetitions: 100 on sequence: 28000:60 for pattern:
pattern/mainnet/stress_test_eth_call_001_latest.tar
Test on port:  http://localhost:8545
[1. 1] daemon: executes test qps: 28000 time: 60 -> [R=51.39%
max=605.449ms error=503 Service Unavailable]
[1. 2] daemon: executes test qps: 28000 time: 60 -> [R=51.55%
max=442.974ms error=503 Service Unavailable]
[1. 3] daemon: executes test qps: 28000 time: 60 -> [R=49.52%
max=440.405ms error=503 Service Unavailable]
[1. 4] daemon: executes test qps: 28000 time: 60 -> [R=51.01%
max=440.004ms error=503 Service Unavailable]
[1. 5] daemon: executes test qps: 28000 time: 60 -> [R=49.66%
max=597.333ms error=503 Service Unavailable]


./run_perf_tests.py -p
pattern/mainnet/stress_test_eth_call_001_latest.tar -t 28000:60 -y
eth_call -m 2 -r 100 -Z
Performance Test started
Test repetitions: 100 on sequence: 28000:60 for pattern:
pattern/mainnet/stress_test_eth_call_001_latest.tar
Test on port:  http://localhost:8545
[1. 1] daemon: executes test qps: 28000 time: 60 -> [R=51.51%
max=581.793ms error=503 Service Unavailable]
[1. 2] daemon: executes test qps: 28000 time: 60 -> [R=51.61%
max=431.222ms error=503 Service Unavailable]
[1. 3] daemon: executes test qps: 28000 time: 60 -> [R=49.48%
max=495.57ms error=503 Service Unavailable]
[1. 4] daemon: executes test qps: 28000 time: 60 -> [R=50.91%
max=433.208ms error=503 Service Unavailable]
[1. 5] daemon: executes test qps: 28000 time: 60 -> [R=49.57%
max=538.283ms error=503 Service Unavailable]

Verified on CI TIPtrucking infrastructure. Previous software versions
experienced "TIP lost" at 3,000 QPS. With these changes, the system now
successfully handles up to 6,000 QPS without any TIP loss or
degradation.

</details>

Stress Test Observations (main release)

- Chain Tip Loss: Under heavy load, the node fails to stay synced and
the Chain Tip is lost, as the staged sync pipeline is starved of DB read
slots by queued RPC goroutines.
- Virtual Memory Pressure: The system experiences severe VM pressure,
with process swap usage reaching 11.81 GB. The massive accumulation of
goroutines blocked on roTxsLimiter.Acquire causes excessive paging and
swapping. This state is highly unstable and frequently leads to the
process being terminated by the OOM Killer, causing total node downtime.
- Request Satisfaction (100%): Despite the performance degradation, all
requests are eventually satisfied. However, this is achieved at the cost
of system stability and synchronization.
- Increased Latency: Request latency increases dramatically due to deep
queuing, with response times reaching up to 1m 40s.
---
Stress Test Observations (with PR)
                                    
- Chain Tip Stability: The two-level admission control prevents
goroutine accumulation entirely. The HTTP outer gate rejects excess
requests before any processing; the `BeginRo` inner gate ensures that
any RPC request that does enter the system uses `TryAcquire` (fail-fast)
rather than blocking (see the sketch after this list). Internal callers
(staged sync, background workers) always use blocking `Acquire` and are
never rejected, so the pipeline makes continuous progress.
- Virtual Memory Pressure: Significantly lower memory footprint. By
eliminating request queuing at the HTTP layer, the system avoids
excessive paging and swapping (0.00 GB process swap), keeping the OS
stable.
- Request Satisfaction (~50%): Approximately 50% of requests are
satisfied; the remainder are immediately rejected with 503 Service
Unavailable. This is the intended fail-fast behavior: goroutines never
accumulate, and DB slots are never exhausted.
- Latency Consistency: Response latency remains consistently low. By
refusing to queue requests beyond the system's capacity, the node avoids
the massive latency spikes (previously up to 1m 40s) seen before the
fix.

This behavior aligns with Nethermind, which returns 503 Service
Unavailable under high load, prioritizing node health over request
queuing.
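
A minimal sketch of the inner gate described above; `WithRPCContext` and
`ErrServerOverloaded` are named in this PR, while the context key and
the `rpcLimitFromContext` helper are assumptions for illustration:

```go
package db

import (
	"context"
	"errors"

	"golang.org/x/sync/semaphore"
)

var ErrServerOverloaded = errors.New("server overloaded")

type rpcLimitKey struct{}

// WithRPCContext tags an admitted HTTP request's context with the
// configured limit so the DB layer can identify the caller as RPC.
func WithRPCContext(ctx context.Context, limit int) context.Context {
	return context.WithValue(ctx, rpcLimitKey{}, limit)
}

func rpcLimitFromContext(ctx context.Context) int {
	limit, _ := ctx.Value(rpcLimitKey{}).(int)
	return limit
}

// BeginRo: RPC callers (positive limit in the context) fail fast with
// TryAcquire; internal callers (staged sync, workers) block on Acquire
// and are never rejected.
func BeginRo(ctx context.Context, roTxsLimiter *semaphore.Weighted) error {
	if rpcLimitFromContext(ctx) > 0 {
		if !roTxsLimiter.TryAcquire(1) {
			return ErrServerOverloaded // semaphore full: shed load now
		}
		return nil
	}
	return roTxsLimiter.Acquire(ctx, 1)
}
```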
---
### Final Observation
By adopting a fail-fast strategy at two levels (HTTP admission before
any expensive processing, and `TryAcquire` inside `BeginRo` for RPC
callers) we enforce resource isolation at the core level. Internal
execution paths retain guaranteed access to DB read slots via blocking
`Acquire`, while external RPC pressure is shed immediately, as sketched
below. This approach shifts congestion-management responsibility to the
external infrastructure (load balancers, proxies), which is better
equipped to handle buffering, and ensures that the Erigon node remains
stable and synchronized regardless of external RPC load.
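
A minimal sketch of the outer gate: `rpcAdmissionHandler` is the handler
named in this PR, but the atomic-counter implementation below is an
assumption, and it reuses the `WithRPCContext` helper from the previous
sketch:

```go
package rpc

import (
	"net/http"
	"sync/atomic"
)

// rpcAdmissionHandler (sketch): installed before CORS, Gzip, and JSON
// decoding, it rejects requests over the inflight limit with HTTP 503
// so excess goroutines never enter the system.
func rpcAdmissionHandler(limit int64, next http.Handler) http.Handler {
	var inflight atomic.Int64 // shared across requests; created once
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if inflight.Add(1) > limit {
			inflight.Add(-1)
			http.Error(w, "server overloaded", http.StatusServiceUnavailable)
			return
		}
		defer inflight.Add(-1)
		// Tag the context so BeginRo can apply the fail-fast path.
		ctx := WithRPCContext(r.Context(), int(limit))
		next.ServeHTTP(w, r.WithContext(ctx))
	})
}
```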

## 🚀 RPC Concurrency & Resource Management Comparison

| Feature | Erigon (main) | **Erigon (with PR)** |
| :--- | :--- | :--- |
| **Admission control** | ❌ None | ✅ **HTTP outer gate** (`rpcAdmissionHandler`) |
| **Overload response** | ❌ Unlimited queuing | ✅ **Immediate HTTP 503** |
| **Rejection point** | ❌ None | ✅ Before CORS, Gzip, and JSON decoding |
| **Goroutine accumulation** | ⚠️ Yes, unbounded | ✅ **Eliminated**: goroutines never enter the system |
| **Internal pipeline protection** | ❌ RPC and staged sync compete for slots | ✅ **Internal callers** use blocking `Acquire` |
| **DB slot protection** | ❌ None: RPC exhausts slots | ✅ `TryAcquire` in `BeginRo` for RPC |
| **Memory under load** | ❌ Critical: swap up to 11.81 GB, OOM risk | ✅ **Stable** (0.00 GB swap in test) |
| **Latency under overload** | ❌ High (~1m 40s) | ✅ **Consistently low** (fail-fast) |
| **Configuration required** | ❌ No concurrency flags | ✅ **Zero config**; `--rpc.max.concurrency` optional |
| **Execution isolation** | ❌ Chain tip lost under load | ✅ **Guaranteed by design** |

### 📊 Performance Comparison: Main (18/03) vs. PR

This benchmark compares the current `main` branch against this PR using
the same set of APIs under heavy load.

| API | main (18/03) post_exec p50 | PR post_exec p50 | Improvement |
| :--- | :---: | :---: | :---: |
| **eth_call** @ 3000 QPS | 6.82s ✅ | 5.89s ✅ | **−14%** |
| **eth_getBlockByNumber** @ 3000 QPS | 13.73s ⚠️ | 5.23s ✅ | **−62%** |
| **eth_getProof** @ 1000–3000 QPS | 49.12s (tip lost) | 2.84s ✅ | **−94%** |

---

### 🔍 Key Observations

* **eth_call**: Neither `main` nor the PR caused a chain tip loss. Since
`eth_call` is read-only and light on DB slots, it is inherently more
stable, but the PR still delivers a **14% reduction** in p50 latency.
* **eth_getBlockByNumber**: Remains stable up to **6000 QPS** with no
actual tip loss. Any observed `sync=0` periods during testing were
identified as monitoring false negatives rather than actual node desync.
* **eth_getProof**: This is the most impactful result. While `main` lost
the chain tip at only 1000 QPS (p50 = 49s), the **PR successfully holds
up to 3000 QPS** with a p50 of 2.84s, a **94% performance gain**.

### 🏆 Overall Conclusion
The final PR successfully **eliminates chain tip loss** across all
tested APIs and QPS levels. No real tip loss was observed in any
production-level test run, ensuring much higher node reliability under
stress.

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>