Skip to content

fix: send author-attributed inserts and preserve trailing newline#131

Merged
JohnMcLear merged 12 commits into
mainfrom
fix/insert-author-and-trailing-newline
May 16, 2026
Merged

fix: send author-attributed inserts and preserve trailing newline#131
JohnMcLear merged 12 commits into
mainfrom
fix/insert-author-and-trailing-newline

Conversation

@JohnMcLear
Copy link
Copy Markdown
Member

Summary

Patches ee.append() so it produces USER_CHANGES that comply with Etherpad core's hardened changeset validation (introduced in ether/etherpad#7773):

  • Captures userId from CLIENT_VARS and tags every insert with an author attribute.
  • Splices at text.length - 1 instead of text.length, so the inserted content lands before the trailing '\n' and the pad's "doc always ends with \n" invariant is preserved.
  • Builds the changeset against padState.apool and moveOpsToNewPools it into a fresh wire apool, so the server can resolve the *N slot numbers we transmit.

Why

Etherpad core 2.7.x's handleUserChanges now rejects two kinds of malformed inserts with disconnect: badChangeset:

  1. + ops with no author attribute — they grow pad.atext.text without contributing matching markers to pad.atext.attribs, leaving the two iterables out of sync and breaking setDocAText reconciliation on every later client load.
  2. USER_CHANGES whose application would leave the pad text not ending with '\n' — the browser's line assembler asserts on docs that don't end with '\n' and the session dies.

Both were silently produced by the previous makeSplice(text, text.length, 0, ins) call (insert at very end, no attribs, no pool). With a fresh pad whose initial text is '\n', appending '1' produced '\n1' (no trailing newline) and the changeset's insert carried no author attribute.

Symptom in CI: ether/etherpad's rate-limit workflow runs node send_changesets.mjs ... 99 which expects a disconnect: 'rateLimited'; instead it now gets disconnect: 'badChangeset' on the very first append, never reaches the rate limit, and the test exits 0 (false-negative).

This patch produces the same shape the standard JS web client sends, so it's backward-compatible with both pre- and post-hardening servers.

Test plan

  • pnpm run build passes
  • pnpm run lint passes
  • In-process simulation: a single append('1') against initial atext '\n' produces local atext '1\n' (ends with \n) and a changeset whose + op carries an author attribute resolvable via the wire apool; a second append produces '12\n'
  • Once published, bump etherpad-cli-client in ether/etherpad PR #7773 and confirm the rate-limit workflow goes green

Refs ether/etherpad#7773

🤖 Generated with Claude Code

Etherpad's USER_CHANGES handler now (post-2.7.x, see ether/etherpad#7773)
rejects two kinds of malformed inserts:

  1. `+` ops with no `author` attribute — they grow pad.atext.text without
     contributing matching markers to pad.atext.attribs, leaving the two
     iterables out of sync and breaking setDocAText reconciliation on
     every later client load.
  2. USER_CHANGES whose application would leave the pad text not ending
     with '\n' — the browser's line assembler asserts on docs that don't
     end with '\n' and the session dies.

Both were silently produced by `ee.append()`: it called
`makeSplice(text, text.length, 0, ins)` (insert at very end, no attribs,
no pool). When the host pad starts as `'\n'`, appending `'1'` produced
`'\n1'` (no trailing newline) and the changeset's insert carried no
author attribute.

Fix:

* Capture `userId` from CLIENT_VARS and use it as the author attribute
  on subsequent inserts.
* Splice at `text.length - 1` so the insert lands before the trailing
  '\n' and the invariant holds.
* Build the changeset against `padState.apool`, then `moveOpsToNewPool`
  into a fresh wire apool so the server can resolve the `*N` slot
  numbers we send.

Backward-compatible: pre-hardening Etherpad servers accept the new
shape too (it's the same shape the standard JS web client sends).

Refs ether/etherpad#7773
@qodo-code-review
Copy link
Copy Markdown

Qodo reviews are paused for this user.

Troubleshooting steps vary by plan Learn more →

On a Teams plan?
Reviews resume once this user has a paid seat and their Git account is linked in Qodo.
Link Git account →

Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center?
These require an Enterprise plan - Contact us
Contact us →

JohnMcLear added 11 commits May 16, 2026 15:44
etherpad core's `bin/installDeps.sh` aborts with `ERROR: Your nodejs
version "20.20" is too old. nodejs 24.0.x or higher is required.` on
the cli-client CI workflows, so they have been failing for unrelated
reasons regardless of patch contents. Bump `node-version: 20` -> `24`
in both backend-tests and npmpublish workflows.
Old setup left pnpm's global bin dir outside PATH so etherpad core's
\`pnpm link --global\` failed with
\`The configured global bin directory ... is not in PATH\`. Switch to
pnpm/action-setup@v6 and set PNPM_HOME, mirroring how the main
etherpad rate-limit workflow wires this up.

Also moves \`pnpm build\` ordering — \`actions/setup-node@v6\` runs
after pnpm install so its \`cache: pnpm\` step finds a real store.
The test job checked out etherpad-lite and ran \`installDeps.sh\`
followed by an unused \`pnpm link --global\` solely as scaffolding.
cli-client itself has no tests beyond lint, and no runtime dep on a
locally-linked etherpad core. Drop the scaffolding so the job stops
inheriting etherpad-lite's lockfile/pnpm-config quirks (pnpm 11
\`autoInstallPeers\` mismatch).
The lockfile was captured with `autoInstallPeers: false` (pnpm 10
default + .npmrc `auto-install-peers=false`). pnpm 11 stops honoring
`auto-install-peers` from .npmrc, so CI re-resolves with the new
default (true) and `pnpm install --frozen-lockfile` then errors with
`ERR_PNPM_LOCKFILE_CONFIG_MISMATCH`. Regenerate the lockfile against
pnpm 11 defaults and drop the obsolete .npmrc.
…gin)

The workflow was a copy-paste from an etherpad plugin template: it
checks out the cli-client into ./plugin/ and tries to run plugin-style
backend tests against an etherpad core install. cli-client has no
backend tests (and isn't a plugin), so this job has effectively been
dead weight that breaks on every etherpad lockfile/build-script churn.
Drop it. The npmpublish workflow's test job (install + build + lint)
already covers the actual surface of this repo.
@JohnMcLear JohnMcLear merged commit feaf55e into main May 16, 2026
2 checks passed
@JohnMcLear JohnMcLear deleted the fix/insert-author-and-trailing-newline branch May 16, 2026 15:36
JohnMcLear added a commit to JohnMcLear/etherpad that referenced this pull request May 16, 2026
4.0.3 sends author-attributed inserts and preserves the trailing
newline, complying with this PR's tightened USER_CHANGES validation.
The rate-limit CI workflow drives the test pad via this client, so
without the bump the new server-side rejects fire on the very first
\`pad.append()\` and the rate-limit disconnect never gets a chance to
arrive — testlimits.sh exits 0 instead of 1 and the rate-limit job
fails with "ratelimit was not triggered when sending every 99 ms".

Refs ether/etherpad-cli-client#131
JohnMcLear added a commit to ether/etherpad that referenced this pull request May 16, 2026
* harden: reject USER_CHANGES inserts without an author attribute

Insert ops MUST carry the author attribute reference so that pad.atext.text
and pad.atext.attribs stay in lock-step. An accepted insert with empty
attribs would grow text without contributing matching attribute markers,
leaving the stored AText in a state where the two iterables disagree on
length when reconstructed. Downstream clients then fail reconciliation in
ace2_inner.ts:setDocAText with 'mismatch error setting raw text in
setDocAText' on every subsequent pad load — making the affected pad
effectively unloadable until manually repaired.

This commit adds a single defensive check inside the existing per-op
validation loop in handleUserChanges: when an op is a '+' (insert) and
its attribs string doesn't yield an 'author' entry via
AttributeMap.fromString, reject with badChangeset. The check piggybacks
on the wireApool that was already constructed for the prior author-match
validation, so no extra parsing.

Test fixtures in messages.ts were updated to send proper author-attributed
inserts plus the matching apool (mirroring what the JS web client always
does). A new regression test 'insert without author attribute is rejected'
locks in the new behaviour.

* harden: also close the HTTP API / plugin path via stable system author

The first commit closed the socket.io USER_CHANGES hole. This commit closes
the parallel path through Pad.spliceText (used by API.setText, API.appendText,
the import flow, and plugins like ep_post_data) where an unattributed insert
would otherwise produce a malformed AText.

Approach: instead of REJECTING (which would break ep_post_data and many
existing tests that call setText/appendText without an authorId), substitute
a stable system author when none is provided. The resulting changeset is
properly attributed, the AText stays well-formed, and existing callers
continue to work unchanged. Plugins that want named author attribution
should still pass an explicit authorId (e.g., one allocated via
authorManager.createAuthor).

Pad.SYSTEM_AUTHOR_ID = 'a.etherpad-system' — a stable identifier that
appears in the pad's attribute pool when internal callers (HTTP API,
plugins, server-side imports) write text without naming an author. The
existing 'attribute changes by another author' protections still apply
to socket.io USER_CHANGES paths — a remote client can't impersonate the
system author for inserts (their session author check fires first).

Test:
- Pad.ts spec adds 'spliceText with empty authorId attributes to the
  system author' — verifies pad text lands AND the pool contains the
  system-author binding. Existing tests that pass an authorId are
  unaffected.

* harden: reject USER_CHANGES that would strand the trailing newline

Etherpad's pad text always ends with '\n'. _handleUserChanges previously
appended a separate `nlChangeset` correction revision whenever the
applied USER_CHANGES left the pad without a trailing '\n'. The stored
pad ended up well-formed, but the FIRST NEW_CHANGES broadcast (the
malformed user revision itself) reached browsers BEFORE the correction
did, and applyToAttribution's MergingOpAssembler aborts with
"line assembler not finished" on a non-'\n'-terminated doc — the
watching browser session then dropped the changeset and any subsequent
edits silently no-op'd until the user reloaded.

Replace the silent auto-correction with an explicit reject. Compute
`applyToText(rebasedChangeset, prevText)` before appendRevision; if the
result doesn't end with '\n', throw -> badChangeset disconnect. Clients
must emit USER_CHANGES whose application preserves the invariant —
this matches what the JS web client already does and forces non-JS
clients (etherpad-pad, third-party integrations) to surface their bugs
in their own logs instead of stranding the trailing newline in pad
revision history.

Also fixes a latent retransmission-detection bug surfaced by this PR's
author-attrib changes: moveOpsToNewPool renumbers `*N` references to
whatever slot the pad pool assigns, which can differ from the wire
form's slot. Comparing the raw client wire against the stored revision
form (`changeset === c`) then misses legitimate retransmissions and
the same edit gets duplicated. Snapshot the post-pool-mapping form
(`canonicalCs`) and compare that against `c` instead.

Backend test additions:
- 'changeset that would strand the trailing \\n is rejected' covers
  the new rejection path with wire `Z:6>1|1=6*0+1$X` against
  `hello\n`.
- handleMessageSecurity test now captures roSocket's own authorId and
  uses it in the apool sent through roSocket, because the prior PR
  commit made `*0` referencing the wrong author a hard reject.

All 1130 backend tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump etherpad-cli-client to ^4.0.3

4.0.3 sends author-attributed inserts and preserves the trailing
newline, complying with this PR's tightened USER_CHANGES validation.
The rate-limit CI workflow drives the test pad via this client, so
without the bump the new server-side rejects fire on the very first
\`pad.append()\` and the rate-limit disconnect never gets a chance to
arrive — testlimits.sh exits 0 instead of 1 and the rate-limit job
fails with "ratelimit was not triggered when sending every 99 ms".

Refs ether/etherpad-cli-client#131

* harden: reject USER_CHANGES that name the reserved system author

The session-author equality check already rejects wire `*N` that
names a different real user, but `a.etherpad-system` is server-
internal — it's only used when spliceText / setText is called with
an empty authorId from HTTP API or plugin paths. A wire op that
names it is either a confused client or an attempt to launder
edits through a reserved attribution slot. Refuse.

Backend test 'insert claiming the reserved system author is
rejected' locks in the new behavior with wire `Z:1>5*0+5$hello`
plus an apool that maps slot 0 to `a.etherpad-system`. All 1131
backend tests pass.

Inline literal `'a.etherpad-system'` rather than importing the
constant from `Pad.SYSTEM_AUTHOR_ID` — `require('../db/Pad')` at
PadMessageHandler module scope returned a partially-initialized
class via the padManager circular path, leaving the static-field
access undefined at runtime.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
pull Bot pushed a commit to fgreinacher/etherpad-cli-client that referenced this pull request May 16, 2026
The previous \`pnpm version patch\` step ran \`pnpm i\` first, which
under pnpm 11 dirties pnpm-workspace.yaml (the build-script approval
wizard re-emits its prompt every install). \`pnpm version\` then
aborts with \`ERR_PNPM_UNCLEAN_WORKING_TREE\` and the publish never
runs — that's how 4.0.3 failed to ship after PR ether#131 merged.

Switch to a simpler model: bump the version manually in the PR that
needs to ship, then publish whatever version is in package.json on
merge to main. Tag the commit only if no \`vX.Y.Z\` tag exists yet,
and use \`pnpm publish --no-git-checks\` so transient working-tree
state from pnpm itself doesn't block the publish.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant