fix: send author-attributed inserts and preserve trailing newline#131
Merged
Conversation
Etherpad's USER_CHANGES handler now (post-2.7.x, see ether/etherpad#7773) rejects two kinds of malformed inserts: 1. `+` ops with no `author` attribute — they grow pad.atext.text without contributing matching markers to pad.atext.attribs, leaving the two iterables out of sync and breaking setDocAText reconciliation on every later client load. 2. USER_CHANGES whose application would leave the pad text not ending with '\n' — the browser's line assembler asserts on docs that don't end with '\n' and the session dies. Both were silently produced by `ee.append()`: it called `makeSplice(text, text.length, 0, ins)` (insert at very end, no attribs, no pool). When the host pad starts as `'\n'`, appending `'1'` produced `'\n1'` (no trailing newline) and the changeset's insert carried no author attribute. Fix: * Capture `userId` from CLIENT_VARS and use it as the author attribute on subsequent inserts. * Splice at `text.length - 1` so the insert lands before the trailing '\n' and the invariant holds. * Build the changeset against `padState.apool`, then `moveOpsToNewPool` into a fresh wire apool so the server can resolve the `*N` slot numbers we send. Backward-compatible: pre-hardening Etherpad servers accept the new shape too (it's the same shape the standard JS web client sends). Refs ether/etherpad#7773
Qodo reviews are paused for this user.Troubleshooting steps vary by plan Learn more → On a Teams plan? Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center? |
etherpad core's `bin/installDeps.sh` aborts with `ERROR: Your nodejs version "20.20" is too old. nodejs 24.0.x or higher is required.` on the cli-client CI workflows, so they have been failing for unrelated reasons regardless of patch contents. Bump `node-version: 20` -> `24` in both backend-tests and npmpublish workflows.
Old setup left pnpm's global bin dir outside PATH so etherpad core's \`pnpm link --global\` failed with \`The configured global bin directory ... is not in PATH\`. Switch to pnpm/action-setup@v6 and set PNPM_HOME, mirroring how the main etherpad rate-limit workflow wires this up. Also moves \`pnpm build\` ordering — \`actions/setup-node@v6\` runs after pnpm install so its \`cache: pnpm\` step finds a real store.
The test job checked out etherpad-lite and ran \`installDeps.sh\` followed by an unused \`pnpm link --global\` solely as scaffolding. cli-client itself has no tests beyond lint, and no runtime dep on a locally-linked etherpad core. Drop the scaffolding so the job stops inheriting etherpad-lite's lockfile/pnpm-config quirks (pnpm 11 \`autoInstallPeers\` mismatch).
The lockfile was captured with `autoInstallPeers: false` (pnpm 10 default + .npmrc `auto-install-peers=false`). pnpm 11 stops honoring `auto-install-peers` from .npmrc, so CI re-resolves with the new default (true) and `pnpm install --frozen-lockfile` then errors with `ERR_PNPM_LOCKFILE_CONFIG_MISMATCH`. Regenerate the lockfile against pnpm 11 defaults and drop the obsolete .npmrc.
…gin) The workflow was a copy-paste from an etherpad plugin template: it checks out the cli-client into ./plugin/ and tries to run plugin-style backend tests against an etherpad core install. cli-client has no backend tests (and isn't a plugin), so this job has effectively been dead weight that breaks on every etherpad lockfile/build-script churn. Drop it. The npmpublish workflow's test job (install + build + lint) already covers the actual surface of this repo.
2 tasks
JohnMcLear
added a commit
to JohnMcLear/etherpad
that referenced
this pull request
May 16, 2026
4.0.3 sends author-attributed inserts and preserves the trailing newline, complying with this PR's tightened USER_CHANGES validation. The rate-limit CI workflow drives the test pad via this client, so without the bump the new server-side rejects fire on the very first \`pad.append()\` and the rate-limit disconnect never gets a chance to arrive — testlimits.sh exits 0 instead of 1 and the rate-limit job fails with "ratelimit was not triggered when sending every 99 ms". Refs ether/etherpad-cli-client#131
JohnMcLear
added a commit
to ether/etherpad
that referenced
this pull request
May 16, 2026
* harden: reject USER_CHANGES inserts without an author attribute Insert ops MUST carry the author attribute reference so that pad.atext.text and pad.atext.attribs stay in lock-step. An accepted insert with empty attribs would grow text without contributing matching attribute markers, leaving the stored AText in a state where the two iterables disagree on length when reconstructed. Downstream clients then fail reconciliation in ace2_inner.ts:setDocAText with 'mismatch error setting raw text in setDocAText' on every subsequent pad load — making the affected pad effectively unloadable until manually repaired. This commit adds a single defensive check inside the existing per-op validation loop in handleUserChanges: when an op is a '+' (insert) and its attribs string doesn't yield an 'author' entry via AttributeMap.fromString, reject with badChangeset. The check piggybacks on the wireApool that was already constructed for the prior author-match validation, so no extra parsing. Test fixtures in messages.ts were updated to send proper author-attributed inserts plus the matching apool (mirroring what the JS web client always does). A new regression test 'insert without author attribute is rejected' locks in the new behaviour. * harden: also close the HTTP API / plugin path via stable system author The first commit closed the socket.io USER_CHANGES hole. This commit closes the parallel path through Pad.spliceText (used by API.setText, API.appendText, the import flow, and plugins like ep_post_data) where an unattributed insert would otherwise produce a malformed AText. Approach: instead of REJECTING (which would break ep_post_data and many existing tests that call setText/appendText without an authorId), substitute a stable system author when none is provided. The resulting changeset is properly attributed, the AText stays well-formed, and existing callers continue to work unchanged. Plugins that want named author attribution should still pass an explicit authorId (e.g., one allocated via authorManager.createAuthor). Pad.SYSTEM_AUTHOR_ID = 'a.etherpad-system' — a stable identifier that appears in the pad's attribute pool when internal callers (HTTP API, plugins, server-side imports) write text without naming an author. The existing 'attribute changes by another author' protections still apply to socket.io USER_CHANGES paths — a remote client can't impersonate the system author for inserts (their session author check fires first). Test: - Pad.ts spec adds 'spliceText with empty authorId attributes to the system author' — verifies pad text lands AND the pool contains the system-author binding. Existing tests that pass an authorId are unaffected. * harden: reject USER_CHANGES that would strand the trailing newline Etherpad's pad text always ends with '\n'. _handleUserChanges previously appended a separate `nlChangeset` correction revision whenever the applied USER_CHANGES left the pad without a trailing '\n'. The stored pad ended up well-formed, but the FIRST NEW_CHANGES broadcast (the malformed user revision itself) reached browsers BEFORE the correction did, and applyToAttribution's MergingOpAssembler aborts with "line assembler not finished" on a non-'\n'-terminated doc — the watching browser session then dropped the changeset and any subsequent edits silently no-op'd until the user reloaded. Replace the silent auto-correction with an explicit reject. Compute `applyToText(rebasedChangeset, prevText)` before appendRevision; if the result doesn't end with '\n', throw -> badChangeset disconnect. Clients must emit USER_CHANGES whose application preserves the invariant — this matches what the JS web client already does and forces non-JS clients (etherpad-pad, third-party integrations) to surface their bugs in their own logs instead of stranding the trailing newline in pad revision history. Also fixes a latent retransmission-detection bug surfaced by this PR's author-attrib changes: moveOpsToNewPool renumbers `*N` references to whatever slot the pad pool assigns, which can differ from the wire form's slot. Comparing the raw client wire against the stored revision form (`changeset === c`) then misses legitimate retransmissions and the same edit gets duplicated. Snapshot the post-pool-mapping form (`canonicalCs`) and compare that against `c` instead. Backend test additions: - 'changeset that would strand the trailing \\n is rejected' covers the new rejection path with wire `Z:6>1|1=6*0+1$X` against `hello\n`. - handleMessageSecurity test now captures roSocket's own authorId and uses it in the apool sent through roSocket, because the prior PR commit made `*0` referencing the wrong author a hard reject. All 1130 backend tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: bump etherpad-cli-client to ^4.0.3 4.0.3 sends author-attributed inserts and preserves the trailing newline, complying with this PR's tightened USER_CHANGES validation. The rate-limit CI workflow drives the test pad via this client, so without the bump the new server-side rejects fire on the very first \`pad.append()\` and the rate-limit disconnect never gets a chance to arrive — testlimits.sh exits 0 instead of 1 and the rate-limit job fails with "ratelimit was not triggered when sending every 99 ms". Refs ether/etherpad-cli-client#131 * harden: reject USER_CHANGES that name the reserved system author The session-author equality check already rejects wire `*N` that names a different real user, but `a.etherpad-system` is server- internal — it's only used when spliceText / setText is called with an empty authorId from HTTP API or plugin paths. A wire op that names it is either a confused client or an attempt to launder edits through a reserved attribution slot. Refuse. Backend test 'insert claiming the reserved system author is rejected' locks in the new behavior with wire `Z:1>5*0+5$hello` plus an apool that maps slot 0 to `a.etherpad-system`. All 1131 backend tests pass. Inline literal `'a.etherpad-system'` rather than importing the constant from `Pad.SYSTEM_AUTHOR_ID` — `require('../db/Pad')` at PadMessageHandler module scope returned a partially-initialized class via the padManager circular path, leaving the static-field access undefined at runtime. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
pull Bot
pushed a commit
to fgreinacher/etherpad-cli-client
that referenced
this pull request
May 16, 2026
The previous \`pnpm version patch\` step ran \`pnpm i\` first, which under pnpm 11 dirties pnpm-workspace.yaml (the build-script approval wizard re-emits its prompt every install). \`pnpm version\` then aborts with \`ERR_PNPM_UNCLEAN_WORKING_TREE\` and the publish never runs — that's how 4.0.3 failed to ship after PR ether#131 merged. Switch to a simpler model: bump the version manually in the PR that needs to ship, then publish whatever version is in package.json on merge to main. Tag the commit only if no \`vX.Y.Z\` tag exists yet, and use \`pnpm publish --no-git-checks\` so transient working-tree state from pnpm itself doesn't block the publish.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Patches
ee.append()so it produces USER_CHANGES that comply with Etherpad core's hardened changeset validation (introduced in ether/etherpad#7773):userIdfromCLIENT_VARSand tags every insert with anauthorattribute.text.length - 1instead oftext.length, so the inserted content lands before the trailing'\n'and the pad's "doc always ends with\n" invariant is preserved.padState.apoolandmoveOpsToNewPools it into a fresh wire apool, so the server can resolve the*Nslot numbers we transmit.Why
Etherpad core 2.7.x's
handleUserChangesnow rejects two kinds of malformed inserts withdisconnect: badChangeset:+ops with noauthorattribute — they growpad.atext.textwithout contributing matching markers topad.atext.attribs, leaving the two iterables out of sync and breakingsetDocATextreconciliation on every later client load.'\n'— the browser's line assembler asserts on docs that don't end with'\n'and the session dies.Both were silently produced by the previous
makeSplice(text, text.length, 0, ins)call (insert at very end, no attribs, no pool). With a fresh pad whose initial text is'\n', appending'1'produced'\n1'(no trailing newline) and the changeset's insert carried no author attribute.Symptom in CI: ether/etherpad's
rate-limitworkflow runsnode send_changesets.mjs ... 99which expects adisconnect: 'rateLimited'; instead it now getsdisconnect: 'badChangeset'on the very first append, never reaches the rate limit, and the test exits 0 (false-negative).This patch produces the same shape the standard JS web client sends, so it's backward-compatible with both pre- and post-hardening servers.
Test plan
pnpm run buildpassespnpm run lintpassesappend('1')against initial atext'\n'produces local atext'1\n'(ends with\n) and a changeset whose+op carries an author attribute resolvable via the wire apool; a second append produces'12\n'etherpad-cli-clientin ether/etherpad PR #7773 and confirm the rate-limit workflow goes greenRefs ether/etherpad#7773
🤖 Generated with Claude Code