
Investigate why dropping socket.io polling fallback degrades p95 under load (#7756 follow-up) #7767

@JohnMcLear

Description

From the 2026-05 scaling dive, setting `socketTransportProtocols: ["websocket"]` (i.e. dropping the polling fallback) consistently worsens client p95 latency under high concurrency:
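For context, the setting under test lives in Etherpad's `settings.json`. A minimal fragment of the websocket-only variant (surrounding keys elided; the baseline keeps the polling fallback, i.e. `["websocket", "polling"]` — check `settings.json.template` for the exact default on your ref):

```json
{
  "socketTransportProtocols": ["websocket"]
}
```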

| Authors | baseline p95 | websocket-only p95 | apply_mean (baseline) | apply_mean (ws-only) |
| ------- | ------------ | ------------------ | --------------------- | -------------------- |
| 100     | 11 ms        | 18 ms              | 4.16 ms               | 5.13 ms              |
| 140     | 14 ms        | 25 ms              | 4.02 ms               | 6.09 ms              |
| 180     | 16 ms        | 68 ms              | 4.48 ms               | 9.81 ms              |
| 200     | 22 ms        | 82 ms              | 4.95 ms               | 13.33 ms             |

Same harness, same runner, same sweep; the only difference is the transport setting.

Hypotheses worth checking

  1. WS-only forces clients that can't establish WebSocket within socket.io's handshake timeout to retry-loop rather than fall back to polling, producing reconnect storms that drive up server CPU.
  2. The WS-only path changes socket.io's handshake protocol routing in a way that interacts badly with load balancers / proxies.
  3. Per-message WebSocket framing overhead becomes significant when emits/sec is high (66k emits per dwell at 200 authors).
  4. The polling fallback acts as a natural coalescer (multiple events per HTTP poll) that we lose when forcing pure-WS.

Hypothesis 4 is particularly interesting because it would mean polling-as-batching is doing real work for us today.
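If hypothesis 4 holds, the property we'd want to preserve is roughly this: every event queued since the last poll ships in one HTTP response, so N emits inside a window cost one transport write. A minimal sketch of an explicit coalescer that reproduces that behavior over a single-frame transport — names here are illustrative, not existing Etherpad or socket.io APIs:

```javascript
// Sketch of hypothesis 4: the polling transport implicitly batches events
// (everything queued since the last poll ships in one response). An explicit
// coalescer would preserve that property over pure WebSocket.
class Coalescer {
  constructor(flushMs, send) {
    this.flushMs = flushMs; // batching window, analogous to the poll interval
    this.send = send;       // transport write, e.g. one WS frame per batch
    this.queue = [];
    this.timer = null;
  }

  emit(event) {
    this.queue.push(event);
    if (this.timer === null) {
      // First event in a window arms the flush timer; later events piggyback.
      this.timer = setTimeout(() => this.flush(), this.flushMs);
    }
  }

  flush() {
    if (this.timer !== null) {
      clearTimeout(this.timer);
      this.timer = null;
    }
    if (this.queue.length > 0) {
      this.send(this.queue); // one frame carrying N events
      this.queue = [];
    }
  }
}

// Three emits inside one window collapse into a single frame.
const frames = [];
const c = new Coalescer(25, (batch) => frames.push(batch));
c.emit("a"); c.emit("b"); c.emit("c");
c.flush(); // flush immediately for the example; frames = [["a","b","c"]]
```

The trade-off is added tail latency (up to `flushMs`) on the first event of each window, which is exactly the latency-vs-throughput knob polling was tuning implicitly.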

Reproducing

```
gh workflow run "Scaling dive" --repo ether/etherpad-load-test --ref main \
-f core_ref=develop \
-f sweep='authors=20..200:step=20:dwell=10s:warmup=2s'
```

Compare `scaling-dive-baseline` vs `scaling-dive-websocket-only` artifacts. Run 25940112728 is the reference.
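Artifacts can be pulled locally with `gh run download 25940112728 --repo ether/etherpad-load-test`. When comparing, note that percentile definitions differ between tools (nearest-rank vs. interpolated), so it's worth recomputing p95 the same way for both runs. A small helper, assuming the artifacts include raw latency samples in milliseconds (that format is an assumption, not confirmed here):

```javascript
// Recompute p95 with one explicit percentile definition (nearest-rank) so
// baseline and websocket-only artifacts are compared like-for-like.
function p95(samples) {
  const sorted = [...samples].sort((a, b) => a - b);
  const idx = Math.ceil(0.95 * sorted.length) - 1; // nearest-rank index
  return sorted[idx];
}

// Example: for samples 1..100 ms, nearest-rank p95 is 95 ms.
const demo = Array.from({ length: 100 }, (_, i) => i + 1);
console.log(p95(demo)); // 95
```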

Why this matters

If we know why the polling fallback helps so much, we can either preserve that property explicitly (via batching) or stop carrying socket.io's polling-fallback code path.
