feat(scaling): serialize per-socket fan-out + NEW_CHANGES_BATCH (#7756 lever 3) #7768

JohnMcLear wants to merge 2 commits into
Conversation
Identified by the #7756 scaling dive (PR #7765) and confirmed by the engine.io transport investigation in #7767: socket.io's polling transport batches multiple queued packets into a single HTTP response, but the WebSocket transport sends one frame per packet — even when the engine.io socket has several packets buffered. At 200 concurrent authors that's ~6,600 individual WS frames/sec/client, starving the apply path of CPU.

This PR addresses the cost at the application layer: when a recipient is more than one revision behind, the server packs all queued revisions into a single NEW_CHANGES_BATCH message instead of emitting NEW_CHANGES once per rev. The wire payload is the same information, just consolidated.

Feature-flagged:

- settings.newChangesBatch defaults to false. Production behaviour is unchanged.
- When enabled, the server emits NEW_CHANGES_BATCH iff a recipient has >1 rev pending; single-rev fan-outs stay as NEW_CHANGES (no framing overhead for the steady-state case).

Clients are forward-compatible: both collab_client.ts (live editor) and broadcast.ts (timeslider) now accept either message type and normalise to a list. Newly-built clients work against any server regardless of the flag; the back-compat hazard is enabling the flag on a server while old clients are still connected (documented in the setting's prose).

Tests: src/tests/backend-new/specs/new-changes-batch.test.ts pins the server's wire-format decision. 4/4 new + 5/5 existing prom-instruments stay green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
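The server-side decision described above can be sketched as follows. The helper name `buildNewChangesEmits` matches the module the review later extracted, but this simplified signature and the `Rev`/`Emit` shapes are assumptions, not the actual Etherpad code:

```typescript
// Sketch of the wire-format decision: consolidate multi-rev catch-up
// into one NEW_CHANGES_BATCH, leave the steady-state single-rev case
// (and everything when the flag is off) as per-rev NEW_CHANGES.
type Rev = {changeset: string; author: string; currentTime: number};
type Emit =
  | {type: 'NEW_CHANGES'; data: Rev & {newRev: number}}
  | {type: 'NEW_CHANGES_BATCH'; data: {headRev: number; changesets: Rev[]}};

export const buildNewChangesEmits = (
    startRev: number, pending: Rev[], batchEnabled: boolean): Emit[] => {
  if (!batchEnabled || pending.length <= 1) {
    // One emit per pending revision, numbered consecutively after startRev.
    return pending.map((rev, i): Emit => ({
      type: 'NEW_CHANGES',
      data: {...rev, newRev: startRev + i + 1},
    }));
  }
  // Recipient is >1 rev behind: one consolidated emit carrying the range.
  return [{
    type: 'NEW_CHANGES_BATCH',
    data: {headRev: startRev + pending.length, changesets: pending},
  }];
};
```

The payload is the same information either way; only the framing changes, which is exactly what makes the single-rev passthrough cheap to keep.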
Review Summary by Qodo

Pack multi-revision fan-out into single NEW_CHANGES_BATCH emit

Walkthrough

Description

• Pack multiple queued revisions into single NEW_CHANGES_BATCH emit
  - Reduces engine.io packet count under high concurrency
  - WebSocket transport sends one frame per packet (polling already batches)
• Feature-flagged with settings.newChangesBatch (defaults to false)
  - Single-rev fan-outs remain as NEW_CHANGES (no overhead)
  - Multi-rev fan-outs use NEW_CHANGES_BATCH when enabled
• Clients forward-compatible: accept both message types
  - collab_client.ts and broadcast.ts normalize to list
  - Old clients connecting to enabled server will miss batched revisions
• Comprehensive test coverage with new test suite
  - 4 new tests pin wire-format decision
  - Existing prom-instruments tests remain green

Diagram

```mermaid
flowchart LR
  A["Multiple queued revisions"] --> B{"newChangesBatch enabled?"}
  B -->|No or single rev| C["Emit per-rev NEW_CHANGES"]
  B -->|Yes and multi-rev| D["Emit single NEW_CHANGES_BATCH"]
  C --> E["Client receives"]
  D --> E
  E --> F["Normalize to list"]
  F --> G["Apply revisions in order"]
```
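The normalize-to-list step on the client side can be sketched as follows; the message shapes here are assumptions for illustration, not the exact wire schema used by collab_client.ts and broadcast.ts:

```typescript
// Hedged sketch of client-side normalization: both message types are
// reduced to an ordered list of single-rev payloads before applying,
// so the apply loop never needs to know which framing was used.
type NewChanges = {type: 'NEW_CHANGES'; newRev: number; changeset: string};
type NewChangesBatch =
    {type: 'NEW_CHANGES_BATCH'; headRev: number; changesets: string[]};

export const normalizeToList =
    (msg: NewChanges | NewChangesBatch): NewChanges[] => {
  if (msg.type === 'NEW_CHANGES') return [msg];
  // Unpack the batch: the last changeset lands the client exactly on
  // headRev, so the first one targets headRev - (length - 1).
  const firstRev = msg.headRev - msg.changesets.length + 1;
  return msg.changesets.map((changeset, i) => ({
    type: 'NEW_CHANGES' as const,
    newRev: firstRev + i,
    changeset,
  }));
};
```

Because the normalized list preserves consecutive `newRev` numbering, the existing strict `newRev === rev + 1` check downstream keeps working unchanged.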
File Changes

1. src/node/handler/PadMessageHandler.ts

Code Review by Qodo
Two issues raised on the first push:

1. **Rev advanced before send (Bug · Correctness).** The previous diff advanced `sessioninfo.rev`/`time` inside the collect loop, before any emit ran. A concurrent `updatePadClients()` could then see the bumped rev and skip those revisions, and if the emit threw later, the skipped revs were lost forever. The client enforces strict `newRev === rev + 1` and silently stops applying on mismatch — net effect was a possible pad desync under concurrent fan-outs.

   Fix: snapshot `startRev`/`startTime` once, claim the (startRev, headRev] range by setting `sessioninfo.rev = headRev` immediately (so a concurrent run skips it), build the pending list against the local `startTime`, then emit. If the emit throws, roll `sessioninfo.rev` back to `startRev` so the next fan-out retries. Time is only committed after a successful send.

2. **Test re-implemented the decision (Rule violation · Reliability).** The original test re-implemented the NEW_CHANGES vs NEW_CHANGES_BATCH switch locally instead of exercising the production code. Removing the production logic would have left the test green.

   Fix: extract the pure wire-format decision into src/node/handler/NewChangesPacker.ts (no DB / pad dependency, so the test can import it directly under vitest), and rewrite the test to assert against the exported `buildNewChangesEmits` function from that module. PadMessageHandler now calls the same function; deleting it would fail the test.

9/9 tests across new-changes-batch + prom-instruments.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
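The claim-then-rollback fix can be sketched as follows; `fanOutOnce` and the session/emit shapes are illustrative names, not the actual `updatePadClients` code:

```typescript
// Sketch of the race fix: claim the pending range synchronously so a
// concurrent fan-out on the same socket skips it, and roll back on emit
// failure so the range is retried rather than silently lost.
type SessionInfo = {rev: number; time: number};

export const fanOutOnce = async (
    sessioninfo: SessionInfo,
    headRev: number,
    headTime: number,
    emit: (fromRev: number, toRev: number) => Promise<void>,
): Promise<boolean> => {
  const startRev = sessioninfo.rev;
  if (startRev >= headRev) return false;  // nothing pending for this socket
  // Claim (startRev, headRev] before any await: a concurrent run now
  // sees the bumped rev and skips the same range.
  sessioninfo.rev = headRev;
  try {
    await emit(startRev, headRev);
  } catch (err) {
    // Emit failed: release the claim so the next fan-out retries.
    sessioninfo.rev = startRev;
    throw err;
  }
  // Time is committed only after a successful send.
  sessioninfo.time = headTime;
  return true;
};
```

The key property is that the claim happens before the first `await`, so there is no window in which two overlapping fan-outs can both collect the same range.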
Captures everything learned since the first draft:

- The "250-author cliff" was a measurement artefact from per-IP commitRateLimiting + colocated harness. Fixed via the etherpad-load-test#105 workflow patch. Real ceiling is ~350-400 authors on a 4-vCPU GitHub runner.
- apply_mean ballooning at the cliff isn't slow code — it's OS preemption (7+ cores of work on 4 vCPU). Application-level JS rearrangement can't reach it.
- Two changes hold up under the dive: fan-out serialization + NEW_CHANGES_BATCH (#7768, 70% p95 drop at 200 authors) and historicalAuthorData cache (#7769, neutral on dive but real production thundering-herd fix at join time).
- Four directions didn't pan out: WebSocket-only transport, heap bump, message-level batching alone (#7766 closed), and rebase-loop prefetch (#7770 closed). Each has a one-line cause documented for the record.
- Engine.io transport-level packing (#7767) is the meatiest untouched lever — sending multiple packets per WebSocket frame the way polling already does via encodePayload.

Qodo-flagged corrections incorporated:

1. The new instruments are Histogram + Counter + Gauge, not "three counters" — labelled correctly.
2. The lever-3 line reference now points at updatePadClients (lines 985-999) where NEW_CHANGES actually emits, not the wrong line 627 (handleSaveRevisionMessage).
3. Lever 3's results are written up against measured data, not "deferred".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three identical sweeps against develop quantify the runner-noise envelope. Same workload, same code, same workflow → p95 at step 350 ranged 39-122ms on baseline (3.1x spread). At step 300, 1.9x spread.

What this means for prior conclusions in this doc:

- websocket-only-is-worst HOLDS at the cliff: its envelope min (2463) equals baseline's max (2463), envelopes don't overlap. Single contradicting run was an outlier.
- lever-3 (#7768) "70% p95 drop at 200" was a single-run outlier comparison. The real reliable improvement is ~5-15% median p95 plus much tighter consistency (fewer tail-latency excursions). The mechanism — per-socket serialization preventing overlapping fan-outs that contend for CPU — is still real and still worth merging; the headline number was inflated.
- below the cliff, all four levers' noise envelopes overlap. No clear winner.

Going forward: lever scoring should default to N >= 3 trials and report min/median/max, not single-run point estimates.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
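The min/median/max reporting rule proposed above is simple enough to pin down in code; the function name here is illustrative, not part of the harness:

```typescript
// Minimal sketch of N >= 3 scoring: report the envelope of a metric
// across trials instead of a single-run point estimate.
export const envelope =
    (trials: number[]): {min: number; median: number; max: number} => {
  if (trials.length === 0) throw new Error('need at least one trial');
  const sorted = [...trials].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  // Even trial counts average the two middle values.
  const median = sorted.length % 2 === 1
      ? sorted[mid]
      : (sorted[mid - 1] + sorted[mid]) / 2;
  return {min: sorted[0], median, max: sorted[sorted.length - 1]};
};
```

For the step-350 baseline runs quoted later in this thread, `envelope([39, 122, 39])` yields the 39/39/122 triple: a 3.1x spread that no single run can reveal.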
N=3 scoring of feat/cache-historical-author-data shows it's net-negative above 300 authors (step 350 p95 envelope 301/488/633ms vs develop baseline 39/39/122ms). Two compounding issues:

- The motivating hypothesis (250-cliff is a join thundering herd) was falsified — that cliff was the per-IP rate-limit artefact.
- The defensive shallow-clone-on-every-get() added in the Qodo fix walks O(N) author entries per join, costing more than the inline Promise.all it replaced.

Updated recommendations: lever 3 (#7768) is now the only PR worth merging. Lever 6 (#7769) added to the do-not-merge list with honest data.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two findings from rigorous N=3 scoring:

1. Lever 3 (#7768) is NOT a perf win. When you compare like-for-like matrix entries (develop-baseline vs PR-baseline), the per-socket serialization is slightly net-negative across the curve. My earlier "70% drop" was a single-run outlier; the subsequent "tighter envelope" was a cross-matrix-entry comparison confounded by noise. The serialization is still a real correctness fix (race on concurrent fan-outs + lost revisions on emit error), so the PR stays open, but the recommendation is now correctness-only.

2. Lever 8b (#7774) — engine.io flush deferral. The follow-up to the closed lever 8 that actually patches Socket.sendPacket instead of just transport.send. queueMicrotask-coalesced flush gives the transport multi-packet batches to work with at last. N=3 shows tighter tail at step 300-350 (122 → 110 max at 350, 71 → 58 max at 300). Not a cliff-mover. The only PR in this program with N=3-confirmed perf benefit.

Final disposition:

- Merge: #7774 (modest perf), #7768 (correctness), #7762 (already merged, instruments).
- The cliff at 350-400 authors is hardware-bound on a 4-vCPU runner, not code-bound. Production with more cores per host scales proportionally with no code changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
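The queueMicrotask-coalesced flush behind lever 8b can be sketched as follows; the class and method names are illustrative, not engine.io's actual internals:

```typescript
// Hedged sketch of flush deferral: instead of flushing the transport
// once per sendPacket call, queue the packet and coalesce the flush
// into one microtask, so every packet sent synchronously in the same
// turn reaches the transport as a single multi-packet batch.
export class CoalescingSender {
  private queue: string[] = [];
  private flushScheduled = false;
  public flushes: string[][] = [];  // each entry is one transport flush

  sendPacket(packet: string): void {
    this.queue.push(packet);
    if (this.flushScheduled) return;  // a flush is already pending
    this.flushScheduled = true;
    queueMicrotask(() => {
      this.flushScheduled = false;
      const batch = this.queue;
      this.queue = [];
      // In the real lever this would hand the whole batch to the
      // transport; here we just record it for inspection.
      this.flushes.push(batch);
    });
  }
}
```

A fan-out that emits to N sockets in one synchronous loop then produces one flush per socket rather than one per packet, which is what finally gives the transport multi-packet batches to pack.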
Summary
Honest re-evaluation after N=3 scoring: this PR's perf claim was wrong twice. The original "70% p95 drop at step 200" was a single-run outlier. The subsequent "tighter envelope" claim was a cross-matrix-entry comparison confounded by the runner-noise envelope (see scaling-dive doc for the methodology trap). Like-for-like comparison (develop-baseline vs this-PR-baseline, N=3 each):
The performance benefit isn't there. Recommendation now stands on the correctness fix only, not perf.
What this PR is for
Two bundled changes; only change A is justified by what we measured:
Change A — per-socket fan-out serialization (correctness fix).
`updatePadClients` previously advanced `sessioninfo.rev` inside the collect phase, before the emit. Under concurrency that allowed overlapping fan-outs on the same socket; if the emit on one branch threw after the other branch had already advanced rev, revisions could be silently lost (the client enforces `newRev === rev + 1` and stops applying on mismatch).

The fix snapshots `startRev` and `headRev` once and writes `sessioninfo.rev = headRev` immediately. A concurrent second run sees the bumped rev and skips the range; on emit failure `sessioninfo.rev` rolls back to `startRev`. At most one fan-out per socket per pad at a time, with retry semantics preserved. This is a real race the previous code exhibited.

Change B — NEW_CHANGES_BATCH wire format. When a recipient is more than one rev behind, the server packs queued revs into one `NEW_CHANGES_BATCH` emit. Single-rev fan-outs stay as `NEW_CHANGES`. Feature-flagged behind `settings.newChangesBatch` (`false` default); clients are forward-compatible.

In the dive, change B was dormant — steady-state catch-up is 1 rev per recipient per fan-out, so the batching branch never fired. It would fire under server slowness (GC pauses, disk hiccups, sustained delays inside `updatePadClients`). Useful forward-compat groundwork; not contributing to current measured numbers.

Tests

- `src/tests/backend-new/specs/new-changes-batch.test.ts` (4/4) — pins the wire-format decision via the exported `buildNewChangesEmits` function in `src/node/handler/NewChangesPacker.ts` (extracted so the test exercises the production code path).
- `src/tests/backend-new/specs/prom-instruments.test.ts` (5/5) — no regression on feat(metrics): 3 Prometheus counters for scaling dive (#7756) #7762.

Recommendation
Merge for the correctness fix. The previous race was real and could lose revisions under concurrent commits. The perf benefit I claimed wasn't real — runner noise dominated my earlier measurements.
For the perf direction, the actually-measured win is #7774 (engine.io flush deferral), which tightens the tail at mid-load with N=3 confirmation. That PR doesn't depend on this one.
🤖 Generated with Claude Code