
protect execution TIP under RPC load #19905

Merged
lupin012 merged 24 commits into main from lupin012/protect-tip-under-load
Mar 27, 2026

Conversation

@lupin012 (Contributor) commented Mar 15, 2026

This PR introduces a two-level admission control system to protect the Staged Sync pipeline from being starved or delayed by high RPC load.
Root Cause Analysis:
Under heavy RPC traffic, the node accumulates a large number of goroutines blocked on roTxsLimiter.Acquire. When DB slots become available, the backlog drains in a way that starves the staged sync pipeline. The goroutine pile-up also causes a significant spike in virtual memory and overall system instability.

Solution:
Two gates work in tandem:

  1. HTTP admission handler (rpcAdmissionHandler) — outer gate installed at the top of every HTTP RPC stack, before CORS, Gzip, or JSON decoding. If the number of inflight requests exceeds the configured limit, the request is rejected immediately with HTTP 503. This prevents goroutine accumulation at the source. On every admitted request the handler tags the context with WithRPCContext (limit value) so the DB layer can identify the caller.
  2. BeginRo inner gate — if the context carries a positive RPC limit, BeginRo uses TryAcquire on roTxsLimiter and returns ErrServerOverloaded immediately if the semaphore is full. Internal callers (staged sync, background workers) always use blocking Acquire and are never rejected.

This two-level approach means most overload is shed at the HTTP layer (goroutines never enter the system), while any RPC requests that slip through under transient concurrency spikes still fail fast at the DB layer rather than piling up behind the semaphore.
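For illustration, a minimal sketch of the outer gate, assuming an atomic inflight counter as described above. Names and details are illustrative, not the exact PR code; the sketch uses a plain boolean context tag (the PR later renames the tagging helper to WithNonBlockingAcquire), and the counter can briefly overshoot the limit between Add and the check, which the review below notes is acceptable in practice:

```go
package rpcstack

import (
	"context"
	"net/http"
	"sync/atomic"
)

type nonBlockingKey struct{}

// withNonBlockingAcquire tags the context so BeginRo knows the caller is an
// admitted RPC request and should use TryAcquire instead of queuing.
func withNonBlockingAcquire(ctx context.Context) context.Context {
	return context.WithValue(ctx, nonBlockingKey{}, true)
}

// admissionHandler is the outer gate: it sits above CORS/Gzip/JSON decoding
// and sheds load before any per-request goroutine does real work.
type admissionHandler struct {
	next     http.Handler
	limit    int64 // <= 0 disables admission control
	inflight atomic.Int64
}

func (h *admissionHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	if h.limit <= 0 {
		h.next.ServeHTTP(w, r) // unlimited: old behaviour
		return
	}
	// The counter can briefly exceed the limit between Add and this check.
	if h.inflight.Add(1) > h.limit {
		h.inflight.Add(-1)
		w.Header().Set("Retry-After", "1") // added during this PR's review cycle
		http.Error(w, "server overloaded", http.StatusServiceUnavailable)
		return
	}
	defer h.inflight.Add(-1)
	h.next.ServeHTTP(w, r.WithContext(withNonBlockingAcquire(r.Context())))
}
```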

Configuration:

  • --rpc.max.concurrency: HTTP admission limit (see the sketch below).
    • 0 (default): uses --db.read.concurrency (auto-tuned to GOMAXPROCS × 64, capped at 9000)
    • >0: explicit limit
    • -1: unlimited (admission control disabled; BeginRo falls back to blocking Acquire, as before)
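A sketch of how those flag values map onto behaviour, with hypothetical helper names; Erigon's real BeginRo lives in the DB layer, and roTxsLimiter is assumed here to be a golang.org/x/sync/semaphore weighted semaphore:

```go
package kv

import (
	"context"
	"errors"

	"golang.org/x/sync/semaphore"
)

var ErrServerOverloaded = errors.New("server is overloaded")

// effectiveLimit resolves --rpc.max.concurrency per the semantics above.
func effectiveLimit(rpcMaxConcurrency, dbReadConcurrency int64) int64 {
	switch {
	case rpcMaxConcurrency > 0:
		return rpcMaxConcurrency // explicit limit
	case rpcMaxConcurrency == 0:
		return dbReadConcurrency // default: follow --db.read.concurrency
	default:
		return 0 // -1: admission control disabled, old blocking behaviour
	}
}

// acquireRoSlot models the BeginRo inner gate: tagged RPC callers fail fast
// with ErrServerOverloaded when roTxsLimiter is full; internal callers
// (staged sync, background workers) always block and are never rejected.
func acquireRoSlot(ctx context.Context, roTxsLimiter *semaphore.Weighted, nonBlocking bool) error {
	if nonBlocking {
		if !roTxsLimiter.TryAcquire(1) {
			return ErrServerOverloaded // surfaced to the client as 503 / JSON-RPC error
		}
		return nil
	}
	return roTxsLimiter.Acquire(ctx, 1) // blocking; never rejected
}
```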
Summary of Resource Management Improvements

| Resource | Result |
| :--- | :--- |
| Goroutine pile-up | ✅ Requests rejected at HTTP layer before CORS, Gzip, or JSON decoding |
| Staged sync starvation | ✅ Internal callers (staged sync, workers) use blocking Acquire and are never rejected; RPC uses TryAcquire fail-fast |
| Transient overload spikes | ✅ BeginRo inner gate catches RPC requests that pass the HTTP layer during concurrency spikes |
| Scalability | ✅ Default limit auto-tuned to GOMAXPROCS × 64 (capped at 9000) via --db.read.concurrency |
| Configuration | ✅ Zero required config, one optional flag (--rpc.max.concurrency) |

Benchmark & Stress Test Results
Setup: 32 Cores, 64GB RAM, 70GB Swap. Minimal Node in Sync. Parallel eth_call stress tests (28k QPS).

Click to expand: Benchmark Data (Before vs After on local node)

Current SW (main release)

CPU
03:23:56 PM all 29.55 0.00 22.30 34.33 0.00 13.83
03:24:06 PM all 56.41 0.00 15.44 10.83 0.00 17.32
03:24:16 PM all 75.60 0.00 13.36 2.86 0.00 8.18
03:24:26 PM all 73.19 0.00 14.35 2.82 0.00 9.63
03:24:36 PM all 73.35 0.00 14.56 2.75 0.00 9.34

Memory
15:23:30 rss=31.89GB vsz=7.65TB proc_swap=11.81GB sys_swap=27.21/72.00GB MemAvail=1.15GB SwapAvail=44.79GB
15:23:40 rss=32.74GB vsz=7.65TB proc_swap=11.00GB sys_swap=27.02/72.00GB MemAvail=1.50GB SwapAvail=44.98GB
15:23:50 rss=33.83GB vsz=7.65TB proc_swap=9.89GB sys_swap=25.65/72.00GB MemAvail=1.44GB SwapAvail=46.35GB
15:24:00 rss=36.33GB vsz=7.65TB proc_swap=7.60GB sys_swap=23.55/72.00GB MemAvail=1.67GB SwapAvail=48.45GB
15:24:10 rss=37.85GB vsz=7.65TB proc_swap=6.91GB sys_swap=21.83/72.00GB MemAvail=5.10GB SwapAvail=50.17GB
15:24:20 rss=39.30GB vsz=7.65TB proc_swap=6.69GB sys_swap=20.23/72.00GB MemAvail=7.28GB SwapAvail=51.77GB
15:24:30 rss=40.40GB vsz=7.65TB proc_swap=6.20GB sys_swap=17.94/72.00GB MemAvail=10.20GB SwapAvail=54.06GB
15:24:40 rss=41.44GB vsz=7.65TB proc_swap=5.23GB sys_swap=14.95/72.00GB MemAvail=20.01GB SwapAvail=57.05GB
15:24:50 rss=41.68GB vsz=7.65TB proc_swap=5.20GB sys_swap=14.92/72.00GB MemAvail=16.14GB SwapAvail=57.08GB
15:25:00 rss=42.77GB vsz=7.65TB proc_swap=4.95GB sys_swap=14.87/72.00GB MemAvail=11.41GB SwapAvail=57.13GB
15:25:11 rss=42.78GB vsz=7.65TB proc_swap=5.26GB sys_swap=15.55/72.00GB MemAvail=8.58GB SwapAvail=56.45GB
15:25:21 rss=40.79GB vsz=7.65TB proc_swap=6.88GB sys_swap=17.46/72.00GB MemAvail=5.65GB SwapAvail=54.54GB

TIP Tracking
[15:21:44] block #24,656,279 ts=2026-03-14 15:19:47 lag=+117.8s ALERT: lag=117.8s — node is behind the tip!
[15:21:44] block #24,656,280 ts=2026-03-14 15:19:59 lag=+105.8s ALERT: lag=105.8s — node is behind the tip!
[15:21:44] block #24,656,281 ts=2026-03-14 15:20:11 lag=+93.8s ALERT: lag=93.8s — node is behind the tip!
[15:21:44] block #24,656,282 ts=2026-03-14 15:20:23 lag=+81.8s ALERT: lag=81.8s — node is behind the tip!
[15:21:44] block #24,656,283 ts=2026-03-14 15:20:47 lag=+57.8s ALERT: lag=57.8s — node is behind the tip!
[15:21:57] block #24,656,284 ts=2026-03-14 15:20:59 lag=+58.0s ALERT: lag=58.0s — node is behind the tip!
[15:21:57] block #24,656,285 ts=2026-03-14 15:21:11 lag=+46.0s ALERT: lag=46.0s — node is behind the tip!
[15:21:57] block #24,656,286 ts=2026-03-14 15:21:23 lag=+34.0s ALERT: lag=34.0s — node is behind the tip!
[15:21:57] block #24,656,287 ts=2026-03-14 15:21:35 lag=+22.0s ALERT: lag=22.0s — node is behind the tip!
[15:21:57] block #24,656,288 ts=2026-03-14 15:21:47 lag=+10.0s OK
[15:22:07] block #24,656,289 ts=2026-03-14 15:21:59 lag=+8.0s OK
[15:22:19] block #24,656,290 ts=2026-03-14 15:22:11 lag=+8.3s OK
[15:22:32] block #24,656,291 ts=2026-03-14 15:22:23 lag=+9.3s OK
[15:23:02] ALERT: no new block for 30s (last block #24656291) — node may be losing the tip!
[15:23:32] ALERT: no new block for 60s (last block #24656291) — node may be losing the tip!
[15:24:02] ALERT: no new block for 90s (last block #24656291) — node may be losing the tip!
[15:24:24] block #24,656,292 ts=2026-03-14 15:22:35 lag=+109.5s ALERT: lag=109.5s — node is behind the tip!
[15:24:24] block #24,656,293 ts=2026-03-14 15:22:47 lag=+97.5s ALERT: lag=97.5s — node is behind the tip!
[15:24:24] block #24,656,294 ts=2026-03-14 15:22:59 lag=+85.5s ALERT: lag=85.5s — node is behind the tip!
[15:24:24] block #24,656,295 ts=2026-03-14 15:23:11 lag=+73.5s ALERT: lag=73.5s — node is behind the tip!
[15:24:54] ALERT: no new block for 30s (last block #24656295) — node may be losing the tip!
[15:25:17] block #24,656,296 ts=2026-03-14 15:23:23 lag=+114.2s ALERT: lag=114.2s — node is behind the tip!
[15:25:17] block #24,656,297 ts=2026-03-14 15:23:35 lag=+102.2s ALERT: lag=102.2s — node is behind the tip!
[15:25:17] block #24,656,298 ts=2026-03-14 15:23:47 lag=+90.2s ALERT: lag=90.2s — node is behind the tip!
[15:25:17] block #24,656,299 ts=2026-03-14 15:23:59 lag=+78.2s ALERT: lag=78.2s — node is behind the tip!
[15:25:17] block #24,656,300 ts=2026-03-14 15:24:11 lag=+66.2s ALERT: lag=66.2s — node is behind the tip!
[15:25:17] block #24,656,301 ts=2026-03-14 15:24:23 lag=+54.2s ALERT: lag=54.2s — node is behind the tip!
[15:25:17] block #24,656,302 ts=2026-03-14 15:24:35 lag=+42.2s ALERT: lag=42.2s — node is behind the tip!
[15:25:17] block #24,656,303 ts=2026-03-14 15:24:47 lag=+30.2s ALERT: lag=30.2s — node is behind the tip!

./run_perf_tests.py -p pattern/mainnet/stress_test_eth_call_001_latest.tar -t 28000:60 -y eth_call -m 2 -r 100 -Z
Performance Test started
Test repetitions: 100 on sequence: 28000:60 for pattern: pattern/mainnet/stress_test_eth_call_001_latest.tar
Test on port: http://localhost:8545
[1. 1] daemon: executes test qps: 28000 time: 60 -> [R=100.00% max=1m39s]
[1. 2] daemon: executes test qps: 28000 time: 60 -> [R=100.00% max=1m46s]
[1. 3] daemon: executes test qps: 28000 time: 60 -> [R=100.00% max=1m38s]

./run_perf_tests.py -p pattern/mainnet/stress_test_eth_call_001_latest.tar -t 28000:60 -y eth_call -m 2 -r 100 -Z
Performance Test started
Test repetitions: 100 on sequence: 28000:60 for pattern: pattern/mainnet/stress_test_eth_call_001_latest.tar
Test on port: http://localhost:8545
[1. 1] daemon: executes test qps: 28000 time: 60 -> [R=100.00% max=1m39s]
[1. 2] daemon: executes test qps: 28000 time: 60 -> [R=100.00% max=1m45s]
[1. 3] daemon: executes test qps: 28000 time: 60 -> [R=100.00% max=1m40s]

NEW Software (with PR)

CPU
7:58:51 AM all 51.09 0.00 6.16 0.35 0.00 42.40
07:58:56 AM all 49.26 0.00 5.82 0.03 0.00 44.89
07:59:01 AM all 50.34 0.00 5.95 0.20 0.00 43.51
07:59:06 AM all 51.60 0.00 5.88 0.04 0.00 42.47
07:59:11 AM all 48.97 0.00 5.90 0.06 0.00 45.07
07:59:16 AM all 49.59 0.00 6.11 0.36 0.00 43.93
07:59:21 AM all 48.69 0.00 5.78 0.03 0.00 45.51
07:59:26 AM all 53.50 0.00 6.66 0.26 0.00 39.59
07:59:31 AM all 50.45 0.00 6.37 0.02 0.00 43.16
07:59:36 AM all 48.71 0.00 6.18 0.03 0.00 45.08
07:59:41 AM all 53.58 0.00 6.45 0.15 0.00 39.81
07:59:46 AM all 53.74 0.00 6.13 0.05 0.00 40.07
07:59:51 AM all 31.76 0.00 3.95 0.23 0.00 64.06
07:59:56 AM all 37.20 0.00 5.05 0.03 0.00 57.71
08:00:01 AM all 77.10 0.00 12.95 0.01 0.00 9.94
08:00:06 AM all 78.22 0.00 12.58 0.08 0.00 9.11
08:00:11 AM all 77.64 0.00 12.50 0.00 0.00 9.86
08:00:16 AM all 77.48 0.00 12.61 0.08 0.00 9.83
08:00:21 AM all 77.61 0.00 12.47 0.01 0.00 9.90
08:00:26 AM all 77.35 0.00 12.89 0.06 0.00 9.70
08:00:31 AM all 77.85 0.00 12.92 0.04 0.00 9.19
08:00:36 AM all 77.73 0.00 12.80 0.02 0.00 9.44
08:00:41 AM all 78.42 0.00 12.95 0.05 0.00 8.59
08:00:46 AM all 78.52 0.00 12.55 0.01 0.00 8.93
08:00:51 AM all 78.42 0.00 12.77 0.19 0.00 8.62
08:00:56 AM all 56.98 0.00 8.64 0.11 0.00 34.28

Memory
2026-03-20 08:00:36 pid=1117840 rss=30.04GB vsz=7.49TB proc_swap=0.00GB sys_swap=0.98/72.00GB MemAvail=39.93GB SwapAvail=71.02GB
2026-03-20 08:00:41 pid=1117840 rss=30.20GB vsz=7.49TB proc_swap=0.00GB sys_swap=0.98/72.00GB MemAvail=39.86GB SwapAvail=71.02GB
2026-03-20 08:00:41 pid=1117840 rss=30.20GB vsz=7.49TB proc_swap=0.00GB sys_swap=0.98/72.00GB MemAvail=39.86GB SwapAvail=71.02GB
2026-03-20 08:00:46 pid=1117840 rss=30.20GB vsz=7.49TB proc_swap=0.00GB sys_swap=0.98/72.00GB MemAvail=39.90GB SwapAvail=71.02GB
2026-03-20 08:00:46 pid=1117840 rss=30.20GB vsz=7.49TB proc_swap=0.00GB sys_swap=0.98/72.00GB MemAvail=39.90GB SwapAvail=71.02GB
2026-03-20 08:00:51 pid=1117840 rss=30.28GB vsz=7.49TB proc_swap=0.00GB sys_swap=0.98/72.00GB MemAvail=39.88GB SwapAvail=71.02GB
2026-03-20 08:00:51 pid=1117840 rss=30.28GB vsz=7.49TB proc_swap=0.00GB sys_swap=0.98/72.00GB MemAvail=39.88GB SwapAvail=71.02GB
2026-03-20 08:00:56 pid=1117840 rss=30.54GB vsz=7.49TB proc_swap=0.00GB sys_swap=0.98/72.00GB MemAvail=40.39GB SwapAvail=71.02GB
2026-03-20 08:00:56 pid=1117840 rss=30.54GB vsz=7.49TB proc_swap=0.00GB sys_swap=0.98/72.00GB MemAvail=40.39GB SwapAvail=71.02GB
2026-03-20 08:01:02 pid=1117840 rss=30.61GB vsz=7.49TB proc_swap=0.00GB sys_swap=0.98/72.00GB MemAvail=40.25GB SwapAvail=71.02GB
2026-03-20 08:01:02 pid=1117840 rss=30.61GB vsz=7.49TB proc_swap=0.00GB sys_swap=0.98/72.00GB MemAvail=40.25GB SwapAvail=71.02GB
2026-03-20 08:01:07 pid=1117840 rss=30.61GB vsz=7.49TB proc_swap=0.00GB sys_swap=0.98/72.00GB MemAvail=39.97GB SwapAvail=71.02GB
2026-03-20 08:01:07 pid=1117840 rss=30.61GB vsz=7.49TB proc_swap=0.00GB sys_swap=0.98/72.00GB MemAvail=39.97GB SwapAvail=71.02GB
2026-03-20 08:01:12 pid=1117840 rss=30.62GB vsz=7.49TB proc_swap=0.00GB sys_swap=0.98/72.00GB MemAvail=39.48GB SwapAvail=71.02GB
2026-03-20 08:01:12 pid=1117840 rss=30.62GB vsz=7.49TB proc_swap=0.00GB sys_swap=0.98/72.00GB MemAvail=39.48GB SwapAvail=71.02GB
2026-03-20 08:01:17 pid=1117840 rss=30.71GB vsz=7.49TB proc_swap=0.00GB sys_swap=0.98/72.00GB MemAvail=39.57GB SwapAvail=71.02GB
2026-03-20 08:01:17 pid=1117840 rss=30.71GB vsz=7.49TB proc_swap=0.00GB sys_swap=0.98/72.00GB MemAvail=39.57GB SwapAvail=71.02GB

TIP Tracking
[07:56:10] block #24,697,055 ts=2026-03-20 07:55:59 lag=+12.0s OK
[07:56:15] block #24,697,056 ts=2026-03-20 07:56:11 lag=+4.5s OK
[07:56:25] block #24,697,057 ts=2026-03-20 07:56:23 lag=+2.5s OK
[07:56:38] block #24,697,058 ts=2026-03-20 07:56:35 lag=+3.4s OK
[07:56:50] block #24,697,059 ts=2026-03-20 07:56:47 lag=+3.5s OK
[07:57:02] block #24,697,060 ts=2026-03-20 07:56:59 lag=+3.6s OK
[07:57:16] block #24,697,061 ts=2026-03-20 07:57:11 lag=+5.6s OK
[07:57:27] block #24,697,062 ts=2026-03-20 07:57:23 lag=+4.7s OK
[07:57:39] block #24,697,063 ts=2026-03-20 07:57:35 lag=+4.3s OK
[07:57:49] block #24,697,064 ts=2026-03-20 07:57:47 lag=+2.4s OK
[07:58:01] block #24,697,065 ts=2026-03-20 07:57:59 lag=+2.9s OK
[07:58:13] block #24,697,066 ts=2026-03-20 07:58:11 lag=+2.8s OK
[07:58:25] block #24,697,067 ts=2026-03-20 07:58:23 lag=+2.4s OK
[07:58:37] block #24,697,068 ts=2026-03-20 07:58:35 lag=+2.7s OK
[07:58:49] block #24,697,069 ts=2026-03-20 07:58:47 lag=+2.3s OK
[07:59:01] block #24,697,070 ts=2026-03-20 07:58:59 lag=+2.1s OK
[07:59:15] block #24,697,071 ts=2026-03-20 07:59:11 lag=+4.3s OK
[07:59:25] block #24,697,072 ts=2026-03-20 07:59:23 lag=+2.6s OK
[07:59:40] block #24,697,073 ts=2026-03-20 07:59:35 lag=+5.3s OK
[08:00:02] block #24,697,074 ts=2026-03-20 07:59:59 lag=+3.9s OK
[08:00:13] block #24,697,075 ts=2026-03-20 08:00:11 lag=+2.8s OK

./run_perf_tests.py -p pattern/mainnet/stress_test_eth_call_001_latest.tar -t 28000:60 -y eth_call -m 2 -r 100 -Z
Performance Test started
Test repetitions: 100 on sequence: 28000:60 for pattern: pattern/mainnet/stress_test_eth_call_001_latest.tar
Test on port: http://localhost:8545
[1. 1] daemon: executes test qps: 28000 time: 60 -> [R=51.39% max=605.449ms error=503 Service Unavailable]
[1. 2] daemon: executes test qps: 28000 time: 60 -> [R=51.55% max=442.974ms error=503 Service Unavailable]
[1. 3] daemon: executes test qps: 28000 time: 60 -> [R=49.52% max=440.405ms error=503 Service Unavailable]
[1. 4] daemon: executes test qps: 28000 time: 60 -> [R=51.01% max=440.004ms error=503 Service Unavailable]
[1. 5] daemon: executes test qps: 28000 time: 60 -> [R=49.66% max=597.333ms error=503 Service Unavailable]

./run_perf_tests.py -p pattern/mainnet/stress_test_eth_call_001_latest.tar -t 28000:60 -y eth_call -m 2 -r 100 -Z
Performance Test started
Test repetitions: 100 on sequence: 28000:60 for pattern: pattern/mainnet/stress_test_eth_call_001_latest.tar
Test on port: http://localhost:8545
[1. 1] daemon: executes test qps: 28000 time: 60 -> [R=51.51% max=581.793ms error=503 Service Unavailable]
[1. 2] daemon: executes test qps: 28000 time: 60 -> [R=51.61% max=431.222ms error=503 Service Unavailable]
[1. 3] daemon: executes test qps: 28000 time: 60 -> [R=49.48% max=495.57ms error=503 Service Unavailable]
[1. 4] daemon: executes test qps: 28000 time: 60 -> [R=50.91% max=433.208ms error=503 Service Unavailable]
[1. 5] daemon: executes test qps: 28000 time: 60 -> [R=49.57% max=538.283ms error=503 Service Unavailable]

Verified on the CI TIP-tracking infrastructure. Previous software versions experienced "TIP lost" at 3,000 QPS. With these changes, the system now successfully handles up to 6,000 QPS without any TIP loss or degradation.

Stress Test Observations (main release)

  • Chain Tip Loss: Under heavy load, the node fails to stay synced and the Chain Tip is lost, as the staged sync pipeline is starved of DB read slots by queued RPC goroutines.
  • Virtual Memory Pressure: The system experiences severe VM pressure, with process swap usage reaching 11.81 GB. The massive accumulation of goroutines blocked on roTxsLimiter.Acquire causes excessive paging and swapping. This state is highly unstable and frequently leads to the process being terminated by the OOM Killer, causing total node downtime.
  • Request Satisfaction (100%): Despite the performance degradation, all requests are eventually satisfied. However, this is achieved at the cost of system stability and synchronization.
  • Increased Latency: Request latency increases dramatically due to deep queuing, with response times reaching up to 1m 40s.

Stress Test Observations (with PR)

  • Chain Tip Stability: The two-level admission control prevents goroutine accumulation entirely. The HTTP outer gate rejects excess requests before any processing; the BeginRo inner gate ensures that any RPC request that does enter the system uses TryAcquire (fail-fast) rather than blocking. Internal callers (staged sync, background workers) always use blocking Acquire and are never rejected, so the pipeline makes continuous progress.
  • Virtual Memory Pressure: Significantly lower memory footprint. By eliminating request queuing at the HTTP layer, the system avoids excessive paging and swapping (0.00 GB swap), keeping the OS stable.
  • Request Satisfaction (~50%): Approximately 50% of requests are satisfied; the remainder are immediately rejected with 503 Service Unavailable. This is the intended fail-fast behavior — goroutines never accumulate, DB slots are never exhausted.
  • Latency Consistency: Response latency remains consistently low. By refusing to queue requests beyond the system's capacity, the node avoids the massive latency spikes (previously up to 1m 40s) seen before the fix.

This behavior is aligned with Nethermind, which returns 503 Service Unavailable under high load, prioritizing node health over request queuing.


Final Observation

By adopting a fail-fast strategy at two levels — HTTP admission before any expensive processing, and TryAcquire inside BeginRo for RPC callers — we enforce resource isolation at the core level. Internal execution paths retain guaranteed access to DB read slots via blocking Acquire, while external RPC pressure is shed immediately. This approach shifts congestion-management responsibility to the external infrastructure (load balancers, proxies), which is better equipped to handle buffering, ensuring that the Erigon node remains stable and synchronized regardless of external RPC load.

🚀 RPC Concurrency & Resource Management Comparison

| Feature | Erigon (main) | Erigon (with PR) |
| :--- | :--- | :--- |
| Admission control | ❌ None | ✅ HTTP outer gate (rpcAdmissionHandler) |
| Overload response | Unlimited queuing | ✅ Immediate HTTP 503 |
| Rejection point | ❌ None | ✅ Pre-CORS, Gzip, JSON decode |
| Goroutine accumulation | ⚠️ Yes, unlimited | ✅ Eliminated — goroutines don't enter the system |
| Internal pipeline protection | ❌ RPC and staged sync compete for slots | ✅ Internal callers use blocking Acquire |
| DB slots protection | ❌ None — RPC exhausts slots | ✅ TryAcquire in BeginRo for RPC |
| Memory under load | ❌ Critical — swap up to 11.81 GB, OOM | ✅ Stable (0.00 GB swap in test) |
| Latency under overload | High (~1m 40s) | ✅ Consistently low (fail-fast) |
| Configuration required | ❌ No concurrency flags | ✅ Zero config; --rpc.max.concurrency optional |
| Execution isolation | ❌ Chain tip lost under load | ✅ Guaranteed by design |

📊 Performance Comparison: Main (18/03) vs. PR

This benchmark compares the current main branch against this PR using the same set of APIs under heavy load.

| API | main (18/03) post_exec p50 | PR post_exec p50 | Improvement |
| :--- | :---: | :---: | :---: |
| eth_call @ 3000 QPS | 6.82s ✅ | 5.89s ✅ | −14% |
| eth_getBlockByNumber @ 3000 QPS | 13.73s ⚠️ | 5.23s ✅ | −62% |
| eth_getProof @ 1000–3000 QPS | 49.12s (tip lost) | 2.84s ✅ | −94% |

🔍 Key Observations

  • eth_call: Neither main nor the PR caused a chain tip loss. Since eth_call is read-only and light on DB slots, it is inherently more stable, but the PR still delivers a 14% reduction in p50 latency.
  • eth_getBlockByNumber: Remains stable up to 6000 QPS with no actual tip loss. Any observed sync=0 periods during testing were identified as monitoring false negatives rather than actual node desync.
  • eth_getProof: This is the most impactful result. While main lost the chain tip at only 1000 QPS (p50=49s), the PR successfully holds up to 3000 QPS with a p50 of 2.84s—a 94% performance gain.

🏆 Overall Conclusion

The final PR successfully eliminates chain tip loss across all tested APIs and QPS levels. No real tip loss was observed in any production-level test run, ensuring much higher node reliability under stress.

@lupin012 lupin012 changed the title protect execution TIP under RPC load WIP protect execution TIP under RPC load Mar 15, 2026
@lupin012 lupin012 requested a review from canepat March 15, 2026 20:03
@lupin012 lupin012 changed the title WIP protect execution TIP under RPC load [Wip] protect execution TIP under RPC load Mar 15, 2026
@lupin012 lupin012 changed the title [Wip] protect execution TIP under RPC load [wip] protect execution TIP under RPC load Mar 16, 2026
@yperbasis yperbasis added this to the 3.5.0 milestone Mar 17, 2026
@lupin012 lupin012 force-pushed the lupin012/protect-tip-under-load branch from 3a062c5 to 115af7b Compare March 17, 2026 16:31
lupin012 and others added 4 commits March 17, 2026 18:33
Add a two-layer mechanism to guarantee staged sync keeps DB access even
under heavy RPC load:

1. executionLimiter — dedicated semaphore (GOMAXPROCS×2) for staged sync.
   BeginRo with TxPriorityExecution uses blocking Acquire on this limiter,
   never shared with RPC callers.

2. rpcAdmissionHandler — outermost HTTP middleware that tags every request
   context with TxPriorityRPC. BeginRo for RPC uses TryAcquire (fail-fast)
   on the shared roTxsLimiter; when saturated it returns ErrServerOverloaded
   (HTTP 503 + Retry-After: 1) instead of queuing.

Optional flag --rpc.max.concurrent (default 0 = unlimited) enables an
additional HTTP-layer gate that rejects excess requests before CORS/Gzip/
JSON decode, further reducing memory pressure under extreme load.

roTxsLimiter default changed from hardcoded 32 to max(10, GOMAXPROCS×4)
so it scales automatically with hardware (128 slots on 32-core machines).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The client-side roTxsLimiter in remotedb controls concurrent gRPC streams,
not MDBX DB slots. Applying TryAcquire here caused integration tests to
return "server overloaded" whenever more than GOMAXPROCS-1 concurrent RPC
requests were in flight.

The fail-fast TryAcquire is only appropriate at the MDBX layer on the
server side, which already happens via TxPriorityRPC in remotedbserver.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
NewHTTPHandlerStack gains a tagAsRPC bool parameter. The JSON-RPC port
passes true (TxPriorityRPC + fail-fast TryAcquire, as before). The engine
API (auth) port passes false so CL↔EL protocol calls (ForkchoiceUpdated,
NewPayload, GetForkChoice …) reach ExecModule.BeginRo with the default
priority and use blocking Acquire instead of fail-fast TryAcquire.

This fixes spurious "server overloaded" errors in execution tests that use
ExecutionClientDirect, where context values propagate in-process (unlike
real gRPC where context values do not cross the wire).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@lupin012 lupin012 force-pushed the lupin012/protect-tip-under-load branch from 74758c7 to 0a03c25 Compare March 17, 2026 17:34
lupin012 and others added 5 commits March 17, 2026 18:40
RPC callers connecting via gRPC (remotedb) now get fail-fast behaviour:
when the client-side roTxsLimiter is saturated, BeginRo returns
ErrServerOverloaded (HTTP 503 + Retry-After: 1) instead of blocking.

Non-RPC callers (execution, default) still use blocking Acquire.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Start the 10-second RPC stats background goroutine only for the
chaindata DB instance, not for txpool and other MDBX instances.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…eject

- Remove 6 transient debug counters (DebugRPCCancelledReqs, DebugHTTP503,
  DebugRPCServerEntry, DebugHTTPAfterNext, DebugGzipEntry, DebugVHostEntry)
  and the statusCapture wrapper that existed only to capture 503s
- Keep DebugHTTPTotal and add DebugHTTPRejected (with % in periodic log)
- Restore HTTP 503 response on admission reject (was temporarily HTTP 200)
- Remove Retry-After header from admission response

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@lupin012 lupin012 force-pushed the lupin012/protect-tip-under-load branch 2 times, most recently from f621c53 to e6f4205 Compare March 18, 2026 20:10
…rectly

The HTTP admission limit now always equals DBReadConcurrency, removing
the need for a separate --rpc.max.concurrent flag. This simplifies
configuration and ensures the HTTP gate always matches the DB semaphore
size, preventing goroutine pile-up under load.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@lupin012 lupin012 force-pushed the lupin012/protect-tip-under-load branch from e6f4205 to 83c6b2c Compare March 18, 2026 20:13
lupin012 and others added 8 commits March 18, 2026 21:57
- Remove TxPriority (Default/RPC/Execution) and all WithTxPriority/TxPriorityFrom usage
- Remove executionLimiter; single roTxsLimiter with blocking Acquire for all callers
- Remove dead counters: rpcInflight, rpcPeak, rpcTotal, rpc503, execInflight, execPeak, execTotal
- Remove debugSysInfo (vmRSS/cpuIdle) from periodic log; keep only http_total/http_rejected/%
- Remove engineAdmissionHandler and statusCapture from rpcstack
- HTTP admission handler returns 503 on overload (reverted from 200)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Remove the periodic debug logging goroutine (stopDebugCh, http_total/http_rejected stats),
DebugHTTPTotal/DebugHTTPRejected counters, and now-unused kv import in rpcstack.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…trol

Adds --rpc.max.concurrentRequests flag to control the HTTP admission handler limit:
  0 (default) = use db.read.concurrency
  >0 = explicit limit
  -1 = unlimited (admission control disabled)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ests

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
lupin012 and others added 2 commits March 19, 2026 23:48
Add two-level protection to prevent RPC load from starving internal
callers (staged sync, background workers) of DB read slots:

1. HTTP admission handler (rpcAdmissionHandler) acts as outer gate:
   rejects with HTTP 503 if inflight requests exceed --rpc.max.concurrency.
   Propagates the concurrency limit into the request context via
   WithRPCContext so BeginRo knows the caller is an RPC handler.

2. BeginRo inner gate: if the context carries a positive limit (RPC caller),
   uses TryAcquire on roTxsLimiter and returns ErrServerOverloaded
   immediately if the semaphore is full. Internal callers always use
   blocking Acquire and are never rejected.

When --rpc.max.concurrency=-1 (unlimited), limit=0 is propagated and
RPC falls back to blocking Acquire with no restriction.

Rename flag to --rpc.max.concurrency (was --rpc.max.concurrent-requests).
Default: 0 = use db.read.concurrency value.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@lupin012 lupin012 marked this pull request as ready for review March 20, 2026 11:31
@lupin012 lupin012 changed the title [wip] protect execution TIP under RPC load protect execution TIP under RPC load Mar 20, 2026
@yperbasis yperbasis requested a review from mh0lt March 24, 2026 06:40
@yperbasis (Member) left a comment

From Claude:

High

  1. remotedbserver.go hardcodes WithRPCContext(ctx, 1) — wrong semantics

db/kv/remotedbserver/remotedbserver.go:138,160: Both begin() and renew() pass a hardcoded limit of 1. Since BeginRo treats any limit > 0 as "use fail-fast TryAcquire", this forces ALL remote DB server transactions into fail-fast mode — including calls from a standalone rpcdaemon connected via gRPC. The remote DB server is a gRPC service, not HTTP, so it was never behind the HTTP admission handler. Making it fail-fast means transient DB semaphore fullness causes immediate errors for remote rpcdaemon users, rather than waiting briefly.

If the intent is to tag these as RPC, the limit should come from configuration, not be hardcoded. Or better, since these calls aren't gated by an HTTP admission handler, they should arguably remain blocking (no WithRPCContext at all) — the goroutine pile-up problem doesn't apply here because gRPC has its own flow control.

  2. No metrics or logging for shed load

There are no counters for HTTP 503 rejections or ErrServerOverloaded returns. Operators need to know how often admission control is firing. Without metrics, it's impossible to tune --rpc.max.concurrency or detect that the node is under sustained overload. At minimum, add Prometheus counters for both rejection points.

  3. No tests

No unit tests for rpcAdmissionHandler or the BeginRo TryAcquire path. The admission handler has subtle concurrency behavior (the atomic counter can briefly exceed the limit before decrementing) that should be covered. The enableRPC path in rpcstack.go calls NewHTTPHandlerStack with the new signature, and the existing tests in rpcstack_test.go (lines 249, 265) will fail to compile since they use enableRPC → NewHTTPHandlerStack, which now requires 6 args instead of 4.

Medium

  4. WithRPCContext API stores a limit but BeginRo only uses it as a boolean

kv_interface.go: The function signature WithRPCContext(ctx, limit int64) suggests the limit is meaningful, but BeginRo only checks limit > 0 as a boolean flag. This is confusing. Either:

  • Simplify to WithRPCContext(ctx context.Context) context.Context (boolean tag), or
  • Actually use the limit in BeginRo for something (e.g., pass it to a separate per-RPC semaphore)
  5. roTxLimit default change in node.go is unrelated and undocumented

Line 322: Changes the fallback default from 32 to max(10, GOMAXPROCS*4). This is effectively dead code when DBReadConcurrency is set (always true via flag defaults), but if OpenDatabase is called without HTTP config, the behavior changes. This should be called out in the PR description or split into its own change.

  6. CI workflow version bumps are unrelated

The rpc-tests tag bumps (v1.115.0/v1.78.0 → v1.124.0) have nothing to do with admission control. These should be in a separate PR.

  7. enableRPC path in rpcstack.go passes config.RpcConcurrencyLimit but nobody sets it

The diff adds config.RpcConcurrencyLimit to httpConfig and uses it in enableRPC, but no caller in the codebase populates this field. The embedded daemon path (node/cli/flags.go:setEmbeddedRpcDaemon) doesn't set RpcConcurrencyLimit on the httpConfig struct — it sets it on HttpCfg, which flows through startRegularRpcServer (where the limit is computed separately). So the enableRPC path always gets RpcConcurrencyLimit = 0, meaning admission control is disabled there.

Low

  8. No Retry-After header on 503

Standard HTTP practice is to include Retry-After with a 503. This helps well-behaved clients implement proper backoff.

  9. Error format inconsistency

Clients may see two different error shapes for the same overload:

  • HTTP 503 with plain text body (from admission handler)
  • JSON-RPC internal error wrapping ErrServerOverloaded (from BeginRo, if the inner gate fires)

The JSON-RPC error is a standard -32000 internal error, which clients might not associate with overload. Consider returning a JSON-RPC response with error code -32005 (or similar) and the 503 status code for both paths.

  10. WebSocket connections bypass admission control

WebSocket RPC calls go through srv.WebsocketHandler which doesn't wrap with rpcAdmissionHandler. Under heavy WS load, the same goroutine pile-up problem could occur. Worth documenting as a known limitation.

Minor nits

  • The PR description repeats its first sentence ("This PR introduces...")
  • rpcAdmissionHandler could use semaphore.NewWeighted instead of atomic.Int64 to avoid the brief over-counting on concurrent arrivals — though the current approach is fine in practice (see the sketch below)
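For comparison, the semaphore-based variant that nit suggests might look roughly like this (a sketch, not the PR's code): TryAcquire never admits more than the limit, at the cost of pulling in x/sync.

```go
package rpcstack

import (
	"net/http"

	"golang.org/x/sync/semaphore"
)

// semAdmissionHandler swaps the atomic counter for a weighted semaphore,
// removing the brief over-counting window on concurrent arrivals.
type semAdmissionHandler struct {
	next http.Handler
	sem  *semaphore.Weighted // nil disables admission control
}

func newSemAdmissionHandler(next http.Handler, limit int64) *semAdmissionHandler {
	h := &semAdmissionHandler{next: next}
	if limit > 0 {
		h.sem = semaphore.NewWeighted(limit)
	}
	return h
}

func (h *semAdmissionHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	if h.sem == nil {
		h.next.ServeHTTP(w, r)
		return
	}
	if !h.sem.TryAcquire(1) { // exact: inflight never exceeds the limit
		w.Header().Set("Retry-After", "1")
		http.Error(w, "server overloaded", http.StatusServiceUnavailable)
		return
	}
	defer h.sem.Release(1)
	h.next.ServeHTTP(w, r)
}
```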

- db/kv: rename WithRPCContext/IsRPCContext to WithNonBlockingAcquire/IsNonBlockingAcquire
  — pure boolean, removes spurious limit parameter; DB layer no longer
  references RPC concepts
- db/kv/mdbx: add db_rotx_overloaded_total Prometheus counter on TryAcquire failure
- db/kv/remotedbserver: remove WithNonBlockingAcquire from begin/renew — gRPC
  transactions use blocking Acquire; they are not behind the HTTP admission handler
- node: restore roTxLimit default to 32 (as on main); remove unrelated runtime import
- node/rpcstack: add rpc_admission_rejected_total counter, Retry-After: 1 header,
  WebSocket limitation comment, and TODO for unified 503+JSON-RPC error format
- node/rpcstack_test: add TestRPCAdmissionHandler covering limit=0, under-limit,
  and over-limit (503) cases

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
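The over-limit case of that test could take roughly this shape, written here as a hypothetical sketch against the admissionHandler type sketched near the top of this description; the PR's actual TestRPCAdmissionHandler may differ:

```go
package rpcstack

import (
	"net/http"
	"net/http/httptest"
	"testing"
)

// Hypothetical over-limit case: with limit=1 and one request parked inside
// the handler, a second concurrent request must be rejected with HTTP 503.
func TestAdmissionOverLimit(t *testing.T) {
	entered := make(chan struct{})
	release := make(chan struct{})
	slow := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		entered <- struct{}{} // signal that the only slot is now held
		<-release             // park until the test is done asserting
	})
	srv := httptest.NewServer(&admissionHandler{next: slow, limit: 1})
	defer srv.Close()

	go func() {
		resp, err := http.Get(srv.URL) // occupies the single slot
		if err == nil {
			resp.Body.Close()
		}
	}()
	<-entered // first request is inside the handler, holding the slot

	resp, err := http.Get(srv.URL) // second request must be shed
	if err != nil {
		t.Fatal(err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusServiceUnavailable {
		t.Fatalf("want 503, got %d", resp.StatusCode)
	}
	close(release) // unblock the parked request
}
```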
@lupin012 (Contributor, Author) commented Mar 24, 2026

@yperbasis

If the review fixes are OK, I will perform the following regression tests to ensure no regressions have been introduced:

  • Load Stress Testing: Verify that the system correctly returns HTTP 503 under overload conditions.
  • Concurrency Limit Validation: Run tests with --rpc.max.concurrency=-1 to verify it maintains the expected legacy behavior.
  • TIP-tracking Load Test: Perform a full validation using the tip-tracking-with-load test suite to ensure stability.

📋 Final Status of Review Comments

🔴 High Priority

| # | Issue | Status | PR Comment |
| :--- | :--- | :--- | :--- |
| 1 | remotedbserver.go hardcodes WithRPCContext(ctx, 1) | ✅ Fixed | Removed WithRPCContext from begin() and renew() — gRPC transactions now use blocking Acquire, consistent with the fact that they are not behind the HTTP admission handler. |
| 2 | No metrics for shed load | ✅ Fixed | Added Prometheus counters rpc_admission_rejected_total (HTTP admission handler) and db_rotx_overloaded_total (inner BeginRo gate); see the counter sketch after these tables. |
| 3 | No tests for rpcAdmissionHandler | ✅ Fixed | Added TestRPCAdmissionHandler with 3 sub-tests: limit=0 disabled, requests under the limit pass, requests over the limit → 503. |

🟡 Medium Priority

| # | Issue | Status | PR Comment |
| :--- | :--- | :--- | :--- |
| 4 | WithRPCContext API with spurious limit | ✅ Fixed | Renamed to WithNonBlockingAcquire(ctx) — pure boolean, spurious limit parameter removed. The DB layer no longer has any knowledge of RPC. |
| 5 | roTxLimit default changed without documentation | ✅ Fixed | Restored to 32 as on main — the change was unrelated to this PR. |
| 6 | CI rpc-tests version bumps unrelated to PR | ✅ False positive | The bump was necessary: the new version correctly handles the 503 responses that admission control can now return. |
| 7 | RpcConcurrencyLimit never populated in embedded daemon path | ✅ False positive | The field is populated at flags.go:450 via ctx.Int(utils.RpcMaxConcurrentRequestsFlag.Name). |

🟢 Low Priority

| # | Issue | Status | PR Comment |
| :--- | :--- | :--- | :--- |
| 8 | No Retry-After header on 503 | ✅ Fixed | Added Retry-After: 1 to the 503 response from the admission handler. |
| 9 | Two different error formats for overload | 📝 Separate PR | Path 1 (admission) returns HTTP 503 + plain text. Path 2 (BeginRo) returns HTTP 200 + JSON-RPC -32000. Unifying both requires buffering in rpc/http.go — deferred to a separate PR. |
| 10 | WebSocket bypasses admission control | 📝 Separate PR | WebSocket connections are long-lived: the relevant limit is concurrent connections, not inflight requests. This requires a dedicated connection limiter — deferred to a separate PR. |
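For reference, the two counters from issue #2, sketched here with prometheus/client_golang. Erigon has its own metrics wrapper, so the real registration differs; the counter names match the fix, everything else is illustrative:

```go
package metricsx

import "github.com/prometheus/client_golang/prometheus"

var (
	// Incremented by the HTTP admission handler on every 503 rejection.
	rpcAdmissionRejected = prometheus.NewCounter(prometheus.CounterOpts{
		Name: "rpc_admission_rejected_total",
		Help: "RPC requests rejected by the HTTP admission handler",
	})
	// Incremented when the BeginRo inner gate's TryAcquire fails.
	dbRoTxOverloaded = prometheus.NewCounter(prometheus.CounterOpts{
		Name: "db_rotx_overloaded_total",
		Help: "read-only tx opens rejected because roTxsLimiter was full",
	})
)

func init() {
	// Register once at package load; the two rejection points then just
	// call .Inc() whenever they shed a request.
	prometheus.MustRegister(rpcAdmissionRejected, dbRoTxOverloaded)
}
```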

@yperbasis (Member) left a comment

enableRPC path has dead admission control (still open from review point #7)

The httpConfig.RpcConcurrencyLimit field is added but never populated by callers of enableRPC. The embedded daemon path (node/cli/flags.go:setEmbeddedRpcDaemon) populates HttpCfg.RpcMaxConcurrentRequests, which flows through startRegularRpcServer (where the limit is computed separately and passed to NewHTTPHandlerStack directly). But the enableRPC → httpConfig.RpcConcurrencyLimit path always gets the zero value, meaning admission control is always disabled for the embedded HTTP server's enableRPC codepath.

lupin012 and others added 2 commits March 24, 2026 11:53
Ensures admission control wiring (enableRPC → NewHTTPHandlerStack →
rpcAdmissionHandler) is exercised by all existing tests. Previously
RpcConcurrencyLimit was always zero, silently disabling admission control
in the test HTTP server.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@yperbasis (Member) left a comment

Issues

Medium

  1. node/rpcstack.go now imports db/kv: This creates a dependency from the HTTP server layer down to the DB layer. The coupling is minimal (just kv.WithNonBlockingAcquire), but the context key + helper could live in a thin, dependency-free package (e.g., common/ctxflags or similar) to keep the layering clean. Not a blocker, but worth considering to avoid the import growing over time.
  2. No log on 503 rejection: The admission handler only increments a Prometheus counter. Operators diagnosing client-reported 503s won't see anything in logs unless they check metrics. A debug.Trace-level log (or even just logging on the first rejection after a quiet period) would help.
  3. Test spin-wait without runtime.Gosched() (rpcstack_test.go:432):

     for admission.inflight.Load() < limit {
         // spin
     }

     On a single-P runtime (e.g., GOMAXPROCS=1 in CI), this could livelock because the goroutines blocked in the handler never get scheduled. Adding runtime.Gosched() inside the loop would be safer; see the sketch below.
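A self-contained sketch of that fix (hypothetical helper name; the test's actual loop is inline):

```go
package rpcstack

import (
	"runtime"
	"sync/atomic"
)

// waitInflight spins until at least want requests are inside the handler.
// The runtime.Gosched() call yields the processor so the blocked handler
// goroutines get scheduled even with GOMAXPROCS=1, avoiding the livelock
// the review describes.
func waitInflight(counter *atomic.Int64, want int64) {
	for counter.Load() < want {
		runtime.Gosched()
	}
}
```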

Low

  1. Static Retry-After: 1: Under sustained overload, all rejected clients retry at the same second, causing a thundering herd. A jittered value (e.g., 1 + rand(0,2)) or documenting that clients should add their own jitter would help, but fine for v1.
  2. tagAsRPC parameter naming: The boolean controls whether admission control is applied and whether the context is tagged for non-blocking DB acquire. withAdmissionControl would be a more descriptive name — tagAsRPC sounds like it's labeling the request type.
  3. CI rpc-tests version bump from v1.78.0 → v1.124.0 in qa-tip-tracking-with-load.yml is a large jump. The bump is necessary (503 handling), but worth a quick sanity check that no unrelated test behavior changed between those versions.

Questions for the author

  • For the default case where admission limit = db.read.concurrency, the HTTP RPC gate can admit enough requests to fill all DB read slots. If Engine API or staged sync internal reads need a slot concurrently, they'll have to wait for an HTTP RPC handler to finish. Has this been observed in practice? Would reserving e.g. 10% of slots for non-HTTP callers be worth considering, or is the current design sufficient because staged sync primarily uses write transactions?

@yperbasis (Member) commented Mar 26, 2026

@lupin012 should we merge this PR?

I am still running the CI Tip Tracking & Load tests. Once they finish successfully (as expected), I will proceed with the merge.

@lupin012 lupin012 added this pull request to the merge queue Mar 27, 2026
Merged via the queue into main with commit b6d67fa Mar 27, 2026
39 checks passed
@lupin012 lupin012 deleted the lupin012/protect-tip-under-load branch March 27, 2026 07:04
lupin012 added a commit that referenced this pull request Apr 2, 2026
This PR introduces an HTTP admission control layer to protect the Staged
Sync pipeline from being starved or delayed by high RPC load.
This PR introduces a two-level admission control system to protect the
Staged Sync pipeline from being starved or delayed by high RPC load.
Root Cause Analysis:
Under heavy RPC traffic, the node accumulates a large number of
goroutines blocked on roTxsLimiter.Acquire. When DB slots become
available, the backlog drains in a way that starves the staged sync
pipeline. The goroutine pile-up also causes a significant spike in
virtual memory and overall system instability.
Solution:
Two gates work in tandem:
1. HTTP admission handler (rpcAdmissionHandler) — outer gate installed
at the top of every HTTP RPC stack, before CORS, Gzip, or JSON decoding.
If the number of inflight requests exceeds the configured limit, the
request is rejected immediately with HTTP 503. This prevents goroutine
accumulation at the source. On every admitted request the handler tags
the context with
WithRPCContext (limit value) so the DB layer can identify the caller.
2. BeginRo inner gate — if the context carries a positive RPC limit,
BeginRo uses TryAcquire on roTxsLimiter and returns ErrServerOverloaded
immediately if the semaphore is full. Internal callers (staged sync,
background workers) always use blocking Acquire and are never rejected.
This two-level approach means most overload is shed at the HTTP layer
(goroutines never enter the system), while any RPC requests that slip
through under transient concurrency spikes are still fail-fast at the DB
layer rather than piling up behind the semaphore.
Configuration:
                  
- --rpc.max.concurrency: HTTP admission limit.
- 0 (default): uses --db.read.concurrency (auto-tuned to GOMAXPROCS ×
64, capped at 9000)
- > 0: explicit limit
- -1: unlimited (admission control disabled, BeginRo falls back to
blocking Acquire) (as old behaviour)
  | Resource | Result |
| :--- | :--- |
### Summary of Resource Management Improvements

| Resource | Result |
| :--- | :--- |
| **Goroutine pile-up** | ✅ Requests rejected at HTTP layer before CORS,
Gzip, or JSON decoding |
| **Staged sync starvation** | ✅ Internal callers (staged sync, workers)
use blocking `Acquire` and are never rejected; RPC uses `TryAcquire`
fail-fast |
| **Transient overload spikes** | ✅ `BeginRo` inner gate catches RPC
requests that pass the HTTP layer during concurrency spikes |
| **Scalability** | ✅ Default limit auto-tuned to `GOMAXPROCS × 64`
(capped at 9000) via `--db.read.concurrency` |
| **Configuration** | ✅ Zero required config, one optional flag
(`--rpc.max.concurrency`) |

Benchmark & Stress Test Results
Setup: 32 Cores, 64GB RAM, 70GB Swap. Minimal Node in Sync. Parallel
eth_call stress tests (28k QPS).

<details>
<summary><b>Click to expand: Benchmark Data (Before vs After on local
node)</b></summary>

### Current SW (main release)
CPU
03:23:56 PM all 29.55 0.00 22.30 34.33 0.00 13.83
03:24:06 PM all 56.41 0.00 15.44 10.83 0.00 17.32
03:24:16 PM all 75.60 0.00 13.36 2.86 0.00 8.18
03:24:26 PM all 73.19 0.00 14.35 2.82 0.00 9.63
03:24:36 PM all 73.35 0.00 14.56 2.75 0.00 9.34

Memory
15:23:30 rss=31.89GB vsz=7.65TB proc_swap=11.81GB sys_swap=27.21/72.00GB
MemAvail=1.15GB SwapAvail=44.79GB
15:23:40 rss=32.74GB vsz=7.65TB proc_swap=11.00GB sys_swap=27.02/72.00GB
MemAvail=1.50GB SwapAvail=44.98GB
15:23:50 rss=33.83GB vsz=7.65TB proc_swap=9.89GB sys_swap=25.65/72.00GB
MemAvail=1.44GB SwapAvail=46.35GB
15:24:00 rss=36.33GB vsz=7.65TB proc_swap=7.60GB sys_swap=23.55/72.00GB
MemAvail=1.67GB SwapAvail=48.45GB
15:24:10 rss=37.85GB vsz=7.65TB proc_swap=6.91GB sys_swap=21.83/72.00GB
MemAvail=5.10GB SwapAvail=50.17GB
15:24:20 rss=39.30GB vsz=7.65TB proc_swap=6.69GB sys_swap=20.23/72.00GB
MemAvail=7.28GB SwapAvail=51.77GB
15:24:30 rss=40.40GB vsz=7.65TB proc_swap=6.20GB sys_swap=17.94/72.00GB
MemAvail=10.20GB SwapAvail=54.06GB
15:24:40 rss=41.44GB vsz=7.65TB proc_swap=5.23GB sys_swap=14.95/72.00GB
MemAvail=20.01GB SwapAvail=57.05GB
15:24:50 rss=41.68GB vsz=7.65TB proc_swap=5.20GB sys_swap=14.92/72.00GB
MemAvail=16.14GB SwapAvail=57.08GB
15:25:00 rss=42.77GB vsz=7.65TB proc_swap=4.95GB sys_swap=14.87/72.00GB
MemAvail=11.41GB SwapAvail=57.13GB
15:25:11 rss=42.78GB vsz=7.65TB proc_swap=5.26GB sys_swap=15.55/72.00GB
MemAvail=8.58GB SwapAvail=56.45GB
15:25:21 rss=40.79GB vsz=7.65TB proc_swap=6.88GB sys_swap=17.46/72.00GB
MemAvail=5.65GB SwapAvail=54.54GB

TIP Trucking
[15:21:44] block #24,656,279 ts=2026-03-14 15:19:47 lag=+117.8s ALERT:
lag=117.8s — node is behind the tip!
[15:21:44] block #24,656,280 ts=2026-03-14 15:19:59 lag=+105.8s ALERT:
lag=105.8s — node is behind the tip!
[15:21:44] block #24,656,281 ts=2026-03-14 15:20:11 lag=+93.8s ALERT:
lag=93.8s — node is behind the tip!
[15:21:44] block #24,656,282 ts=2026-03-14 15:20:23 lag=+81.8s ALERT:
lag=81.8s — node is behind the tip!
[15:21:44] block #24,656,283 ts=2026-03-14 15:20:47 lag=+57.8s ALERT:
lag=57.8s — node is behind the tip!
[15:21:57] block #24,656,284 ts=2026-03-14 15:20:59 lag=+58.0s ALERT:
lag=58.0s — node is behind the tip!
[15:21:57] block #24,656,285 ts=2026-03-14 15:21:11 lag=+46.0s ALERT:
lag=46.0s — node is behind the tip!
[15:21:57] block #24,656,286 ts=2026-03-14 15:21:23 lag=+34.0s ALERT:
lag=34.0s — node is behind the tip!
[15:21:57] block #24,656,287 ts=2026-03-14 15:21:35 lag=+22.0s ALERT:
lag=22.0s — node is behind the tip!
[15:21:57] block #24,656,288  ts=2026-03-14 15:21:47  lag=+10.0s  OK
[15:22:07] block #24,656,289  ts=2026-03-14 15:21:59  lag=+8.0s  OK
[15:22:19] block #24,656,290  ts=2026-03-14 15:22:11  lag=+8.3s  OK
[15:22:32] block #24,656,291  ts=2026-03-14 15:22:23  lag=+9.3s  OK
[15:23:02] ALERT: no new block for 30s (last block #24656291) — node may
be losing the tip!
[15:23:32] ALERT: no new block for 60s (last block #24656291) — node may
be losing the tip!
[15:24:02] ALERT: no new block for 90s (last block #24656291) — node may
be losing the tip!
[15:24:24] block #24,656,292 ts=2026-03-14 15:22:35 lag=+109.5s ALERT:
lag=109.5s — node is behind the tip!
[15:24:24] block #24,656,293 ts=2026-03-14 15:22:47 lag=+97.5s ALERT:
lag=97.5s — node is behind the tip!
[15:24:24] block #24,656,294 ts=2026-03-14 15:22:59 lag=+85.5s ALERT:
lag=85.5s — node is behind the tip!
[15:24:24] block #24,656,295 ts=2026-03-14 15:23:11 lag=+73.5s ALERT:
lag=73.5s — node is behind the tip!
[15:24:54] ALERT: no new block for 30s (last block #24656295) — node may
be losing the tip!
[15:25:17] block #24,656,296 ts=2026-03-14 15:23:23 lag=+114.2s ALERT:
lag=114.2s — node is behind the tip!
[15:25:17] block #24,656,297 ts=2026-03-14 15:23:35 lag=+102.2s ALERT:
lag=102.2s — node is behind the tip!
[15:25:17] block #24,656,298 ts=2026-03-14 15:23:47 lag=+90.2s ALERT:
lag=90.2s — node is behind the tip!
[15:25:17] block #24,656,299 ts=2026-03-14 15:23:59 lag=+78.2s ALERT:
lag=78.2s — node is behind the tip!
[15:25:17] block #24,656,300 ts=2026-03-14 15:24:11 lag=+66.2s ALERT:
lag=66.2s — node is behind the tip!
[15:25:17] block #24,656,301 ts=2026-03-14 15:24:23 lag=+54.2s ALERT:
lag=54.2s — node is behind the tip!
[15:25:17] block #24,656,302 ts=2026-03-14 15:24:35 lag=+42.2s ALERT:
lag=42.2s — node is behind the tip!
[15:25:17] block #24,656,303 ts=2026-03-14 15:24:47 lag=+30.2s ALERT:
lag=30.2s — node is behind the tip!

> ./run_perf_tests.py -p
pattern/mainnet/stress_test_eth_call_001_latest.tar -t 28000:60 -y
eth_call -m 2 -r 100 -Z
Performance Test started
Test repetitions: 100 on sequence: 28000:60 for pattern:
pattern/mainnet/stress_test_eth_call_001_latest.tar
Test on port:  http://localhost:8545
[1. 1] daemon: executes test qps: 28000 time: 60 -> [R=100.00%
max=1m39s]
[1. 2] daemon: executes test qps: 28000 time: 60 -> [R=100.00%
max=1m46s]
[1. 3] daemon: executes test qps: 28000 time: 60 -> [R=100.00%
max=1m38s]

> ./run_perf_tests.py -p
pattern/mainnet/stress_test_eth_call_001_latest.tar -t 28000:60 -y
eth_call -m 2 -r 100 -Z
Performance Test started
Test repetitions: 100 on sequence: 28000:60 for pattern:
pattern/mainnet/stress_test_eth_call_001_latest.tar
Test on port:  http://localhost:8545
[1. 1] daemon: executes test qps: 28000 time: 60 -> [R=100.00%
max=1m39s]
[1. 2] daemon: executes test qps: 28000 time: 60 -> [R=100.00%
max=1m45s]
[1. 3] daemon: executes test qps: 28000 time: 60 -> [R=100.00%
max=1m40s]


### NEW Software (with PR)
CPU
7:58:51 AM all 51.09 0.00 6.16 0.35 0.00 42.40
07:58:56 AM all 49.26 0.00 5.82 0.03 0.00 44.89
07:59:01 AM all 50.34 0.00 5.95 0.20 0.00 43.51
07:59:06 AM all 51.60 0.00 5.88 0.04 0.00 42.47
07:59:11 AM all 48.97 0.00 5.90 0.06 0.00 45.07
07:59:16 AM all 49.59 0.00 6.11 0.36 0.00 43.93
07:59:21 AM all 48.69 0.00 5.78 0.03 0.00 45.51
07:59:26 AM all 53.50 0.00 6.66 0.26 0.00 39.59
07:59:31 AM all 50.45 0.00 6.37 0.02 0.00 43.16
07:59:36 AM all 48.71 0.00 6.18 0.03 0.00 45.08
07:59:41 AM all 53.58 0.00 6.45 0.15 0.00 39.81
07:59:46 AM all 53.74 0.00 6.13 0.05 0.00 40.07
07:59:51 AM all 31.76 0.00 3.95 0.23 0.00 64.06
07:59:56 AM all 37.20 0.00 5.05 0.03 0.00 57.71
08:00:01 AM all 77.10 0.00 12.95 0.01 0.00 9.94
08:00:06 AM all 78.22 0.00 12.58 0.08 0.00 9.11
08:00:11 AM all 77.64 0.00 12.50 0.00 0.00 9.86
08:00:16 AM all 77.48 0.00 12.61 0.08 0.00 9.83
08:00:21 AM all 77.61 0.00 12.47 0.01 0.00 9.90
08:00:26 AM all 77.35 0.00 12.89 0.06 0.00 9.70
08:00:31 AM all 77.85 0.00 12.92 0.04 0.00 9.19
08:00:36 AM all 77.73 0.00 12.80 0.02 0.00 9.44
08:00:41 AM all 78.42 0.00 12.95 0.05 0.00 8.59
08:00:46 AM all 78.52 0.00 12.55 0.01 0.00 8.93
08:00:51 AM all 78.42 0.00 12.77 0.19 0.00 8.62
08:00:56 AM all 56.98 0.00 8.64 0.11 0.00 34.28

Memory
2026-03-20 08:00:36 pid=1117840 rss=30.04GB vsz=7.49TB proc_swap=0.00GB
sys_swap=0.98/72.00GB MemAvail=39.93GB SwapAvail=71.02GB
2026-03-20 08:00:41 pid=1117840 rss=30.20GB vsz=7.49TB proc_swap=0.00GB
sys_swap=0.98/72.00GB MemAvail=39.86GB SwapAvail=71.02GB
2026-03-20 08:00:41 pid=1117840 rss=30.20GB vsz=7.49TB proc_swap=0.00GB
sys_swap=0.98/72.00GB MemAvail=39.86GB SwapAvail=71.02GB
2026-03-20 08:00:46 pid=1117840 rss=30.20GB vsz=7.49TB proc_swap=0.00GB
sys_swap=0.98/72.00GB MemAvail=39.90GB SwapAvail=71.02GB
2026-03-20 08:00:46 pid=1117840 rss=30.20GB vsz=7.49TB proc_swap=0.00GB
sys_swap=0.98/72.00GB MemAvail=39.90GB SwapAvail=71.02GB
2026-03-20 08:00:51 pid=1117840 rss=30.28GB vsz=7.49TB proc_swap=0.00GB
sys_swap=0.98/72.00GB MemAvail=39.88GB SwapAvail=71.02GB
2026-03-20 08:00:51 pid=1117840 rss=30.28GB vsz=7.49TB proc_swap=0.00GB
sys_swap=0.98/72.00GB MemAvail=39.88GB SwapAvail=71.02GB
2026-03-20 08:00:56 pid=1117840 rss=30.54GB vsz=7.49TB proc_swap=0.00GB
sys_swap=0.98/72.00GB MemAvail=40.39GB SwapAvail=71.02GB
2026-03-20 08:00:56 pid=1117840 rss=30.54GB vsz=7.49TB proc_swap=0.00GB
sys_swap=0.98/72.00GB MemAvail=40.39GB SwapAvail=71.02GB
2026-03-20 08:01:02 pid=1117840 rss=30.61GB vsz=7.49TB proc_swap=0.00GB
sys_swap=0.98/72.00GB MemAvail=40.25GB SwapAvail=71.02GB
2026-03-20 08:01:02 pid=1117840 rss=30.61GB vsz=7.49TB proc_swap=0.00GB
sys_swap=0.98/72.00GB MemAvail=40.25GB SwapAvail=71.02GB
2026-03-20 08:01:07 pid=1117840 rss=30.61GB vsz=7.49TB proc_swap=0.00GB
sys_swap=0.98/72.00GB MemAvail=39.97GB SwapAvail=71.02GB
2026-03-20 08:01:07 pid=1117840 rss=30.61GB vsz=7.49TB proc_swap=0.00GB
sys_swap=0.98/72.00GB MemAvail=39.97GB SwapAvail=71.02GB
2026-03-20 08:01:12 pid=1117840 rss=30.62GB vsz=7.49TB proc_swap=0.00GB
sys_swap=0.98/72.00GB MemAvail=39.48GB SwapAvail=71.02GB
2026-03-20 08:01:12 pid=1117840 rss=30.62GB vsz=7.49TB proc_swap=0.00GB
sys_swap=0.98/72.00GB MemAvail=39.48GB SwapAvail=71.02GB
2026-03-20 08:01:17 pid=1117840 rss=30.71GB vsz=7.49TB proc_swap=0.00GB
sys_swap=0.98/72.00GB MemAvail=39.57GB SwapAvail=71.02GB
2026-03-20 08:01:17 pid=1117840 rss=30.71GB vsz=7.49TB proc_swap=0.00GB
sys_swap=0.98/72.00GB MemAvail=39.57GB SwapAvail=71.02GB

TIP Trucking
07:56:10] block #24,697,055  ts=2026-03-20 07:55:59  lag=+12.0s  OK
[07:56:15] block #24,697,056  ts=2026-03-20 07:56:11  lag=+4.5s  OK
[07:56:25] block #24,697,057  ts=2026-03-20 07:56:23  lag=+2.5s  OK
[07:56:38] block #24,697,058  ts=2026-03-20 07:56:35  lag=+3.4s  OK
[07:56:50] block #24,697,059  ts=2026-03-20 07:56:47  lag=+3.5s  OK
[07:57:02] block #24,697,060  ts=2026-03-20 07:56:59  lag=+3.6s  OK
[07:57:16] block #24,697,061  ts=2026-03-20 07:57:11  lag=+5.6s  OK
[07:57:27] block #24,697,062  ts=2026-03-20 07:57:23  lag=+4.7s  OK
[07:57:39] block #24,697,063  ts=2026-03-20 07:57:35  lag=+4.3s  OK
[07:57:49] block #24,697,064  ts=2026-03-20 07:57:47  lag=+2.4s  OK
[07:58:01] block #24,697,065  ts=2026-03-20 07:57:59  lag=+2.9s  OK
[07:58:13] block #24,697,066  ts=2026-03-20 07:58:11  lag=+2.8s  OK
[07:58:25] block #24,697,067  ts=2026-03-20 07:58:23  lag=+2.4s  OK
[07:58:37] block #24,697,068  ts=2026-03-20 07:58:35  lag=+2.7s  OK
[07:58:49] block #24,697,069  ts=2026-03-20 07:58:47  lag=+2.3s  OK
[07:59:01] block #24,697,070  ts=2026-03-20 07:58:59  lag=+2.1s  OK
[07:59:15] block #24,697,071  ts=2026-03-20 07:59:11  lag=+4.3s  OK
[07:59:25] block #24,697,072  ts=2026-03-20 07:59:23  lag=+2.6s  OK
[07:59:40] block #24,697,073  ts=2026-03-20 07:59:35  lag=+5.3s  OK
[08:00:02] block #24,697,074  ts=2026-03-20 07:59:59  lag=+3.9s  OK
[08:00:13] block #24,697,075  ts=2026-03-20 08:00:11  lag=+2.8s  OK

./run_perf_tests.py -p
pattern/mainnet/stress_test_eth_call_001_latest.tar -t 28000:60 -y
eth_call -m 2 -r 100 -Z
Performance Test started
Test repetitions: 100 on sequence: 28000:60 for pattern:
pattern/mainnet/stress_test_eth_call_001_latest.tar
Test on port:  http://localhost:8545
[1. 1] daemon: executes test qps: 28000 time: 60 -> [R=51.39%
max=605.449ms error=503 Service Unavailable]
[1. 2] daemon: executes test qps: 28000 time: 60 -> [R=51.55%
max=442.974ms error=503 Service Unavailable]
[1. 3] daemon: executes test qps: 28000 time: 60 -> [R=49.52%
max=440.405ms error=503 Service Unavailable]
[1. 4] daemon: executes test qps: 28000 time: 60 -> [R=51.01%
max=440.004ms error=503 Service Unavailable]
[1. 5] daemon: executes test qps: 28000 time: 60 -> [R=49.66%
max=597.333ms error=503 Service Unavailable]


./run_perf_tests.py -p
pattern/mainnet/stress_test_eth_call_001_latest.tar -t 28000:60 -y
eth_call -m 2 -r 100 -Z
Performance Test started
Test repetitions: 100 on sequence: 28000:60 for pattern:
pattern/mainnet/stress_test_eth_call_001_latest.tar
Test on port:  http://localhost:8545
[1. 1] daemon: executes test qps: 28000 time: 60 -> [R=51.51%
max=581.793ms error=503 Service Unavailable]
[1. 2] daemon: executes test qps: 28000 time: 60 -> [R=51.61%
max=431.222ms error=503 Service Unavailable]
[1. 3] daemon: executes test qps: 28000 time: 60 -> [R=49.48%
max=495.57ms error=503 Service Unavailable]
[1. 4] daemon: executes test qps: 28000 time: 60 -> [R=50.91%
max=433.208ms error=503 Service Unavailable]
[1. 5] daemon: executes test qps: 28000 time: 60 -> [R=49.57%
max=538.283ms error=503 Service Unavailable]

Verified on CI TIPtrucking infrastructure. Previous software versions
experienced "TIP lost" at 3,000 QPS. With these changes, the system now
successfully handles up to 6,000 QPS without any TIP loss or
degradation.

</details>

Stress Test Observations (main release)

- Chain Tip Loss: Under heavy load, the node fails to stay synced and
the Chain Tip is lost, as the staged sync pipeline is starved of DB read
slots by queued RPC goroutines.
- Virtual Memory Pressure: The system experiences severe VM pressure,
with process swap usage reaching 11.81 GB. The massive accumulation of
goroutines blocked on roTxsLimiter.Acquire causes excessive paging and
swapping. This state is highly unstable and frequently leads to the
process being terminated by the OOM Killer, causing total node downtime.
- Request Satisfaction (100%): Despite the performance degradation, all
requests are eventually satisfied. However, this is achieved at the cost
of system stability and synchronization.
- Increased Latency: Request latency increases dramatically due to deep
queuing, with response times reaching up to 1m 40s.
---
Stress Test Observations (with PR)
                                    
- Chain Tip Stability: The two-level admission control prevents
goroutine accumulation entirely. The HTTP outer gate rejects excess
requests before any processing; the `BeginRo` inner gate ensures that
any RPC request that does enter the system uses `TryAcquire` (fail-fast)
rather than blocking (see the sketch after this list). Internal callers
(staged sync, background workers) always use blocking `Acquire` and are
never rejected, so the pipeline makes continuous progress.
- Virtual Memory Pressure: Significantly lower memory footprint. By
eliminating request queuing at the HTTP layer, the system avoids
excessive paging and swapping (0.00 GB process swap), keeping the OS
stable.
- Request Satisfaction (~50%): Approximately 50% of requests are
satisfied; the remainder are immediately rejected with 503 Service
Unavailable. This is the intended fail-fast behavior: goroutines never
accumulate, and DB slots are never exhausted.
- Latency Consistency: Response latency remains consistently low. By
refusing to queue requests beyond the system's capacity, the node avoids
the massive latency spikes (previously up to 1m 40s) seen before the
fix.

This behavior aligns with Nethermind, which returns 503 Service
Unavailable under high load, prioritizing node health over request
queuing.
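
A minimal sketch of the inner gate described above; `WithRPCContext` and
`ErrServerOverloaded` are named in this PR, while the context key and
the `rpcLimitFromContext` helper are assumptions for illustration:

```go
package db

import (
	"context"
	"errors"

	"golang.org/x/sync/semaphore"
)

var ErrServerOverloaded = errors.New("server overloaded")

type rpcLimitKey struct{}

// WithRPCContext tags an admitted HTTP request's context with the
// configured limit so the DB layer can identify the caller as RPC.
func WithRPCContext(ctx context.Context, limit int) context.Context {
	return context.WithValue(ctx, rpcLimitKey{}, limit)
}

func rpcLimitFromContext(ctx context.Context) int {
	limit, _ := ctx.Value(rpcLimitKey{}).(int)
	return limit
}

// BeginRo: RPC callers (positive limit in the context) fail fast with
// TryAcquire; internal callers (staged sync, workers) block on Acquire
// and are never rejected.
func BeginRo(ctx context.Context, roTxsLimiter *semaphore.Weighted) error {
	if rpcLimitFromContext(ctx) > 0 {
		if !roTxsLimiter.TryAcquire(1) {
			return ErrServerOverloaded // semaphore full: shed load now
		}
		return nil
	}
	return roTxsLimiter.Acquire(ctx, 1)
}
```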
---
### Final Observation
By adopting a fail-fast strategy at two levels (HTTP admission before
any expensive processing, and `TryAcquire` inside `BeginRo` for RPC
callers) we enforce resource isolation at the core level. Internal
execution paths retain guaranteed access to DB read slots via blocking
`Acquire`, while external RPC pressure is shed immediately, as sketched
below. This approach shifts congestion-management responsibility to the
external infrastructure (load balancers, proxies), which is better
equipped to handle buffering, and ensures that the Erigon node remains
stable and synchronized regardless of external RPC load.
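
A minimal sketch of the outer gate: `rpcAdmissionHandler` is the handler
named in this PR, but the atomic-counter implementation below is an
assumption, and it reuses the `WithRPCContext` helper from the previous
sketch:

```go
package rpc

import (
	"net/http"
	"sync/atomic"
)

// rpcAdmissionHandler (sketch): installed before CORS, Gzip, and JSON
// decoding, it rejects requests over the inflight limit with HTTP 503
// so excess goroutines never enter the system.
func rpcAdmissionHandler(limit int64, next http.Handler) http.Handler {
	var inflight atomic.Int64 // shared across requests; created once
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if inflight.Add(1) > limit {
			inflight.Add(-1)
			http.Error(w, "server overloaded", http.StatusServiceUnavailable)
			return
		}
		defer inflight.Add(-1)
		// Tag the context so BeginRo can apply the fail-fast path.
		ctx := WithRPCContext(r.Context(), int(limit))
		next.ServeHTTP(w, r.WithContext(ctx))
	})
}
```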

## 🚀 RPC Concurrency & Resource Management Comparison

| Feature | Erigon (main) | **Erigon (with PR)** |
| :--- | :--- | :--- |
| **Admission control** | ❌ None | ✅ **HTTP outer gate** (`rpcAdmissionHandler`) |
| **Overload response** | ❌ Unlimited queuing | ✅ **Immediate HTTP 503** |
| **Rejection point** | ❌ None | ✅ Before CORS, Gzip, and JSON decoding |
| **Goroutine accumulation** | ⚠️ Yes, unbounded | ✅ **Eliminated**: goroutines never enter the system |
| **Internal pipeline protection** | ❌ RPC and staged sync compete for slots | ✅ **Internal callers** use blocking `Acquire` |
| **DB slot protection** | ❌ None: RPC exhausts slots | ✅ `TryAcquire` in `BeginRo` for RPC |
| **Memory under load** | ❌ Critical: swap up to 11.81 GB, OOM risk | ✅ **Stable** (0.00 GB swap in test) |
| **Latency under overload** | ❌ High (~1m 40s) | ✅ **Consistently low** (fail-fast) |
| **Configuration required** | ❌ No concurrency flags | ✅ **Zero config**; `--rpc.max.concurrency` optional |
| **Execution isolation** | ❌ Chain tip lost under load | ✅ **Guaranteed by design** |

### 📊 Performance Comparison: Main (18/03) vs. PR

This benchmark compares the current `main` branch against this PR using
the same set of APIs under heavy load.

| API | main (18/03) post_exec p50 | PR post_exec p50 | Improvement |
| :--- | :---: | :---: | :---: |
| **eth_call** @ 3000 QPS | 6.82s ✅ | 5.89s ✅ | **−14%** |
| **eth_getBlockByNumber** @ 3000 QPS | 13.73s ⚠️ | 5.23s ✅ | **−62%** |
| **eth_getProof** @ 1000–3000 QPS | 49.12s (tip lost) | 2.84s ✅ | **−94%** |

---

### 🔍 Key Observations

* **eth_call**: Neither `main` nor the PR caused a chain tip loss. Since
`eth_call` is read-only and light on DB slots, it is inherently more
stable, but the PR still delivers a **14% reduction** in p50 latency.
* **eth_getBlockByNumber**: Remains stable up to **6000 QPS** with no
actual tip loss. Any observed `sync=0` periods during testing were
identified as monitoring false negatives rather than actual node desync.
* **eth_getProof**: This is the most impactful result. While `main` lost
the chain tip at only 1000 QPS (p50 = 49s), the **PR successfully holds
up to 3000 QPS** with a p50 of 2.84s, a **94% performance gain**.

### 🏆 Overall Conclusion
The final PR successfully **eliminates chain tip loss** across all
tested APIs and QPS levels. No real tip loss was observed in any
production-level test run, ensuring much higher node reliability under
stress.

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>