From the 2026-05 scaling dive, setting `socketTransportProtocols: ["websocket"]` (i.e. dropping the polling fallback) consistently worsens client p95 latency under high concurrency:
| Authors | Baseline p95 | WS-only p95 | Baseline apply_mean | WS-only apply_mean |
|---------|--------------|-------------|---------------------|--------------------|
| 100     | 11 ms        | 18 ms       | 4.16 ms             | 5.13 ms             |
| 140     | 14 ms        | 25 ms       | 4.02 ms             | 6.09 ms             |
| 180     | 16 ms        | 68 ms       | 4.48 ms             | 9.81 ms             |
| 200     | 22 ms        | 82 ms       | 4.95 ms             | 13.33 ms            |
Same harness, same runner, same sweep; the only difference is the transport setting.
## Hypotheses worth checking
- WS-only forces clients that can't establish WebSocket within socket.io's handshake timeout to retry-loop rather than fall back to polling, producing reconnect storms that drive up server CPU.
- The WS-only path changes socket.io's handshake protocol routing in a way that interacts badly with load balancers / proxies.
- Per-message WebSocket framing overhead becomes significant when emits/sec is high (66k emits per dwell at 200 authors).
- The polling fallback acts as a natural coalescer (multiple events per HTTP poll) that we lose when forcing pure-WS.
Hypothesis 4 is particularly interesting because it would mean polling-as-batching is doing real work for us today.
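To make hypothesis 4 concrete, here is a minimal sketch (illustrative model only, not socket.io internals; `PollingCoalescer` and `WebSocketSender` are hypothetical names) of why long-polling acts as a batcher: events queued between polls ship together in one HTTP response, while pure WS pays one frame per emit.

```typescript
type Ev = { pad: string; rev: number };

// Model of the polling transport: emits queue up between client polls,
// and each poll drains the whole queue in a single HTTP response.
class PollingCoalescer {
  private queue: Ev[] = [];
  flushes = 0; // number of "HTTP responses" sent

  emit(ev: Ev): void {
    this.queue.push(ev); // buffered until the client's next poll
  }

  poll(): Ev[] {
    const batch = this.queue;
    this.queue = [];
    if (batch.length > 0) this.flushes++;
    return batch;
  }
}

// Model of the pure-WS transport: every emit is its own frame.
class WebSocketSender {
  frames = 0;
  emit(_ev: Ev): void {
    this.frames++;
  }
}

// Rough shape of the 200-author case: ~66k emits over a 10 s dwell,
// with the client polling every 50 ms (200 polls).
const polling = new PollingCoalescer();
const ws = new WebSocketSender();
const EMITS = 66_000;
const POLLS = 200;
for (let i = 0; i < EMITS; i++) {
  polling.emit({ pad: "p", rev: i });
  ws.emit({ pad: "p", rev: i });
  if (i % (EMITS / POLLS) === 0) polling.poll();
}
polling.poll();
// ws pays ~66k frames; polling pays at most ~POLLS+1 responses,
// each carrying a batch of events.
```

Under this model the WS path does ~330x more transport-level sends for the same event stream, which is the kind of gap that could plausibly show up as p95 degradation at high concurrency.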
## Reproducing
```shell
gh workflow run "Scaling dive" --repo ether/etherpad-load-test --ref main \
-f core_ref=develop \
-f sweep='authors=20..200:step=20:dwell=10s:warmup=2s'
```
Compare `scaling-dive-baseline` vs `scaling-dive-websocket-only` artifacts. Run 25940112728 is the reference.
## Why this matters
If we know why the polling fallback helps so much, we can either preserve that property explicitly (via batching) or stop carrying socket.io's polling-fallback code path.
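If the batching explanation pans out, the "preserve that property explicitly" option could look roughly like this sketch: a micro-batching emitter that opens a short window on the first message and ships everything emitted inside it as one WS frame. Names here (`BatchingEmitter`, `send`, `windowMs`) are illustrative assumptions, not Etherpad or socket.io APIs.

```typescript
type Send = (frame: string) => void;

// Buffers messages for a few milliseconds, then sends them as a single
// frame, recovering the coalescing that long-polling gave us for free.
class BatchingEmitter {
  private buf: unknown[] = [];
  private timer: ReturnType<typeof setTimeout> | null = null;

  constructor(private send: Send, private windowMs = 5) {}

  emit(msg: unknown): void {
    this.buf.push(msg);
    if (this.timer === null) {
      // First message opens the window; everything emitted before it
      // closes ships together.
      this.timer = setTimeout(() => this.flush(), this.windowMs);
    }
  }

  flush(): void {
    if (this.timer !== null) {
      clearTimeout(this.timer);
      this.timer = null;
    }
    if (this.buf.length === 0) return;
    this.send(JSON.stringify(this.buf)); // one frame, many messages
    this.buf = [];
  }
}
```

The trade-off is a bounded latency hit (up to `windowMs`) in exchange for far fewer frames, which is the property the scaling-dive numbers suggest we are currently getting from the polling path.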