Skip to content

Replace YJS with custom server-authoritative document sync#655

Merged
softmarshmallow merged 10 commits intomainfrom
canary
Apr 9, 2026
Merged

Replace YJS with custom server-authoritative document sync#655
softmarshmallow merged 10 commits intomainfrom
canary

Conversation

@softmarshmallow
Copy link
Copy Markdown
Member

@softmarshmallow softmarshmallow commented Apr 8, 2026

Summary

Replaces the YJS-based PoC sync layer with a custom server-authoritative document sync system. The old system bridged YJS ↔ Immer ↔ Editor state, bypassing our FlatBuffers format entirely. The new system operates directly on typed property-level diffs aligned with the document model.

  • New package @grida/canvas-sync — standalone sync engine with protocol types, diff/patch/compose logic, server-side validation, and SyncClient (optimistic rebase model inspired by TLDraw's tlsync)
  • Rewrite grida-canvas-document-worker-cfSyncRoom Durable Object with SQLite persistence, hibernatable WebSockets, atomic transactions, 1MB message guard, and presence relay
  • New editor plugin plugins/sync/ — replaces plugins/yjs/, bridges SyncClientEditorDocumentStore via typed serialization layer
  • Delete all YJS code — removed yjs, y-websocket, y-protocols, y-webrtc dependencies and the entire plugins/yjs/ directory

Test coverage

  • 81 tests in @grida/canvas-sync — unit tests for diff/validate/client + 17 multi-client integration tests through a MockServer that implements SyncRoom semantics (concurrent edits, LWW conflicts, reconnection, rapid bursts, 3-client scenarios)
  • 9 end-to-end tests with headless Editor instances — two real editors wired through MockServer, testing the full pipeline: editor dispatch → Zustand → serialize → diff → sync → broadcast → deserialize → applyDocumentPatches → editor state

Key design decisions

Decision Choice
Sync model Server-authoritative optimistic rebase (TLDraw-inspired)
Conflict resolution Property-level LWW — concurrent edits to different fields of the same node both survive
Wire format Custom DocumentDiff with typed NodeOp (put/patch/remove) and FieldOp (put/delete)
Persistence CF Durable Object embedded SQLite with atomic transactions
Transport ISyncTransport interface — WebSocketTransport for production, MockTransport for tests

Architecture

Client                          Server (CF Durable Object)
┌──────────┐  ┌────────────┐   ┌────────────────────────┐
│ Editor   │↔ │ SyncClient │ ↔ │ SyncRoom (G1DO)        │
│ (Zustand)│  │ (rebase)   │ws │ ├─ canonical state      │
└──────────┘  └────────────┘   │ ├─ validate + apply     │
                               │ ├─ broadcast to peers   │
                               │ └─ SQLite persistence   │
                               └────────────────────────┘

See crdt-impl.plan.md at repo root for the full design document.

Summary by CodeRabbit

  • New Features

    • CanvasPlayground now mounts the canvas surface by default and uses a new built-in sync plugin for multiplayer cursors/presence.
  • Chores

    • Replaced the prior Yjs-based sync with a new hosted sync service and package plus a server-backed room/storage implementation and websocket transport.
  • Tests

    • Added extensive unit, integration, and end-to-end tests for diffs, validation, client/server sync, transport, presence, and multi-client convergence.
  • Documentation

    • Added design documentation describing the synchronization architecture and protocol.

… utilities

- Add DocumentClock class for managing monotonic document clocks.
- Introduce diff utilities for computing, applying, and composing document diffs.
- Create protocol types for defining the wire protocol between client and server.
- Implement transport layer with WebSocketTransport for communication.
- Add presence management for tracking user presence in the document.
- Include validation logic for incoming diffs to ensure structural integrity.
- Set up TypeScript configuration for the project.
- Deleted y-document.ts and y-patches.ts files, removing Yjs dependencies.
- Updated package.json to remove Yjs and related packages.
- Introduced @grida/canvas-sync as a new dependency for document synchronization.
- Refactored integration tests to utilize a new makeNode helper function for creating nodes.
- Enhanced document worker to include CORS support and improved WebSocket message handling.
- Implemented atomic write operations in SyncStorage to ensure data integrity during diff applications.
@vercel
Copy link
Copy Markdown

vercel Bot commented Apr 8, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
blog Ready Ready Preview, Comment Apr 9, 2026 11:08am
docs Ready Ready Preview, Comment Apr 9, 2026 11:08am
grida Ready Ready Preview, Comment Apr 9, 2026 11:08am
viewer Ready Ready Preview, Comment Apr 9, 2026 11:08am
3 Skipped Deployments
Project Deployment Actions Updated (UTC)
backgrounds Ignored Ignored Preview Apr 9, 2026 11:08am
code Ignored Ignored Apr 9, 2026 11:08am
legacy Ignored Ignored Apr 9, 2026 11:08am

Request Review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 8, 2026

Walkthrough

Replaces the Yjs-based multiplayer sync with a new workspace package @grida/canvas-sync, adding a SyncClient, protocol/diff/validate/transport layers, EditorSyncPlugin with document/presence adapters and serialization, deterministic tests, and a Durable Object backed by SQLite; removes prior Yjs plugins and related worker/websocket/storage code. (49 words)

Changes

Cohort / File(s) Summary
Yjs removal
editor/grida-canvas/plugins/yjs/*, editor/grida-canvas/plugins/yjs/__tests__/y-patches.test.ts
Deleted the entire Yjs-based sync implementation (document & awareness managers, Y-patch utilities) and its tests.
New sync plugin & adapters
editor/grida-canvas/plugins/sync/index.ts, editor/grida-canvas/plugins/sync/document-sync.ts, editor/grida-canvas/plugins/sync/presence-sync.ts, editor/grida-canvas/plugins/sync/serialize.ts
Added EditorSyncPlugin, DocumentSyncAdapter, PresenceSyncAdapter, and document↔state serialization bridging the editor to the new SyncClient.
Playground integration
editor/grida-canvas-hosted/playground/playground.tsx
Switched imports/usages from EditorYSyncPlugin to EditorSyncPlugin and changed CanvasPlayground default backend prop from \dom`to`canvas``.
Editor headless tests
editor/grida-canvas/__tests__/headless/sync-integration.test.ts
Added headless end-to-end sync integration tests with in-file MockTransport/MockServer covering serialization, two-editor sync, concurrent edits, deletes, and convergence.
New @grida/canvas-sync package
packages/grida-canvas-sync/src/*, packages/grida-canvas-sync/package.json, packages/grida-canvas-sync/tsconfig.json
Implemented protocol types, diff/compose/apply, validation, DocumentClock, presence helpers, transport abstraction (WebSocketTransport), SyncClient state machine, and package manifest/tsconfig.
Canvas-sync tests & helpers
packages/grida-canvas-sync/__tests__/*, packages/grida-canvas-sync/__tests__/helpers.ts
Added comprehensive unit and integration tests and a deterministic MockTransport/MockServer test harness/helpers.
Durable Object refactor (removed)
services/grida-canvas-document-worker-cf/src/do.ts, services/grida-canvas-document-worker-cf/src/lib/*, services/grida-canvas-document-worker-cf/src/lib/websocket.ts, .../storage.ts
Removed prior Durable Object and Yjs-centric websocket/storage utilities.
New Durable Object + storage
services/grida-canvas-document-worker-cf/src/room.ts, services/grida-canvas-document-worker-cf/src/storage.ts, services/grida-canvas-document-worker-cf/src/index.ts
Added G1DO Durable Object implementing SyncRoom protocol, WebSocket handling, SQLite-backed SyncStorage with apply/getDelta, hibernation/session management, and presence broadcasting.
Dependency & package updates
editor/package.json, services/grida-canvas-document-worker-cf/package.json
Removed Yjs-related deps (yjs, y-websocket, y-webrtc, y-protocols, lib0) and added workspace dependency @grida/canvas-sync; updated scripts/devDeps.
TypeScript config changes
services/grida-canvas-document-worker-cf/tsconfig.json, packages/grida-canvas-sync/tsconfig.json
Changed moduleResolution to bundler and added package-specific tsconfig for the new sync package.

Sequence Diagram(s)

sequenceDiagram
    participant Editor as Editor
    participant Adapter as DocumentSyncAdapter
    participant Client as SyncClient
    participant Transport as WebSocketTransport
    participant Server as G1DO

    Editor->>Adapter: document change (throttled)
    Adapter->>Client: pushDiff(diff)
    Client->>Transport: send(push message)
    Transport->>Server: websocket push
    Server->>Server: validate & applyDiff (tick clock)
    Server->>Transport: push_ok / patch broadcast
    Transport->>Client: deliver(push_ok / patch)
    Client->>Adapter: emit stateChange
    Adapter->>Editor: applyDocumentPatches
Loading
sequenceDiagram
    participant ClientA as Client A
    participant Server as G1DO
    participant ClientB as Client B

    ClientA->>Server: push(diff: add node X)
    Server->>Server: validate & apply, broadcast patch
    Server->>ClientA: push_ok(commit)
    Server->>ClientB: patch(diff: node X added)
    ClientB->>ClientB: apply patch, recompute state
    Note over ClientA,ClientB: Both clients converge to same state
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

Suggested labels

org, cg, enhancement

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 70.37% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely describes the main change: replacing YJS with a custom server-authoritative document sync system. It directly matches the primary objective of the PR.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch canary

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@softmarshmallow softmarshmallow changed the title Canary Replace YJS with custom server-authoritative document sync Apr 8, 2026
@softmarshmallow softmarshmallow marked this pull request as ready for review April 8, 2026 16:26
Comment thread editor/grida-canvas/plugins/sync/presence-sync.ts Fixed
Comment thread packages/grida-canvas-sync/__tests__/helpers.ts Fixed
Comment thread packages/grida-canvas-sync/__tests__/client.test.ts Fixed
Comment thread packages/grida-canvas-sync/__tests__/client.test.ts Fixed
Comment thread packages/grida-canvas-sync/__tests__/integration.test.ts Fixed
Comment thread packages/grida-canvas-sync/__tests__/integration.test.ts Fixed
Comment thread packages/grida-canvas-sync/__tests__/integration.test.ts Fixed
Comment thread packages/grida-canvas-sync/__tests__/integration.test.ts Fixed
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 618259e6c2

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +274 to +277
// If we have unsent changes from before reconnect, push them
if (!isDiffEmpty(this._unsent)) {
this._schedulePush();
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Requeue unacked speculative diffs after reconnect

When the socket drops after _flush() has moved a local change from _unsent into _speculative, reconnect only schedules a resend if _unsent is non-empty. That leaves the in-flight diff stranded forever, so the server never commits it. In that state, a later push_ok can acknowledge a newer push while _handlePushOk still applies _speculative[0], causing client/server divergence and lost edits after transient disconnects.

Useful? React with 👍 / 👎.

Comment thread packages/grida-canvas-sync/src/diff.ts Outdated
Comment on lines +161 to +163
// If the set is the same but order changed, emit a reorder
if (ops.length === 0) {
ops.push({ op: "reorder", ids: after });
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve scene ordering when membership also changes

computeSceneDiff emits reorder only when there are no add/remove ops, so mixed changes lose ordering information. For example, before=[s1,s2,s3] and after=[s3,s2,s4] produces only remove s1 + add s4; applying that yields [s2,s3,s4], not [s3,s2,s4]. Any coalesced edit that both changes scene membership and reorders surviving scenes will sync to the wrong order.

Useful? React with 👍 / 👎.

Comment on lines +77 to +81
case "add":
if (!projectedNodes.has(sceneOp.id)) {
errors.push({
target: sceneOp.id,
code: "SCENE_ADD_MISSING_NODE",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Reject non-scene nodes in scene-add validation

scene add currently validates only node existence, not node type, so a client can add a non-scene node ID into scenes. Downstream editor logic assumes scenes_ref entries are actual SceneNodes (e.g. getScene asserts type === "scene"), so accepting this payload can poison canonical state and trigger runtime failures when scene actions execute.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 12

🧹 Nitpick comments (7)
packages/grida-canvas-sync/tsconfig.json (1)

6-6: Move Vitest globals out of the src-only tsconfig.

This config injects vitest/globals into src, but Lines 15-16 exclude __tests__. That makes test-only globals available in production code while tsc --noEmit skips the files that actually use them. Please confirm there is another tsconfig covering the tests; otherwise split this into a library tsconfig plus a test tsconfig.

Also applies to: 15-16

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/grida-canvas-sync/tsconfig.json` at line 6, The tsconfig currently
injects "vitest/globals" into the src-only build (the "types":
["vitest/globals"] entry) while simultaneously excluding test folders
("__tests__"), which exposes test-only globals to production code; fix this by
removing "vitest/globals" from this tsconfig and either create a separate
tsconfig.test.json that extends the base and adds "types": ["vitest/globals"]
for tests, or split into a library tsconfig (no vitest types) plus a
test-specific tsconfig that includes the __tests__ patterns; ensure the test
tsconfig is referenced by your test runner or npm scripts so tests still pick up
vitest globals.
editor/grida-canvas/plugins/sync/presence-sync.ts (1)

140-160: Unused peerId variable in presence iteration.

The peerId from Object.entries(peers) (line 140) is not used in the loop body. The cursor ID is taken from presence.cursor.cursor_id instead. This is fine if the cursor_id is the canonical identifier, but the unused variable could be removed for clarity.

♻️ Optional: Use array iteration if peerId is unused
-      for (const [peerId, presence] of Object.entries(peers)) {
+      for (const presence of Object.values(peers)) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@editor/grida-canvas/plugins/sync/presence-sync.ts` around lines 140 - 160,
The loop is unpacking [peerId, presence] from Object.entries(peers) but never
uses peerId; update the iteration to avoid the unused variable by iterating
values only (e.g., use Object.values(peers) or for (const presence of
Object.values(peers))) and keep the existing logic that reads
presence.cursor.cursor_id into the cursors map; ensure symbols referenced remain
presence, cursor, cursors and this._colorToPalette.
editor/grida-canvas/plugins/sync/index.ts (2)

54-60: Track TODOs for schema version and clock persistence.

The hardcoded schema version (line 55) and lastClock: 0 (line 58) are flagged with TODOs. These are important for:

  1. Schema version: Ensures clients and server agree on document format
  2. lastClock: Enables proper catch-up after page refresh without full state resync

Consider creating issues to track these before the sync feature is considered production-ready.

Do you want me to open issues to track these TODOs?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@editor/grida-canvas/plugins/sync/index.ts` around lines 54 - 60, The code
creates a SyncClient with hardcoded schema and lastClock (see SyncClient
instantiation) which risks version drift and lost progress; replace the literal
schema string with a constant imported from the canonical schema package (e.g.,
import SCHEMA_VERSION from `@grida/schema` and use it in the SyncClient options)
and load persisted clock state instead of 0 by reading the stored sync-state
(e.g., from OPFS/localStorage) before constructing SyncClient and passing that
value as lastClock; also add TODO-to-issue entries for "sync: derive schema
version from `@grida/schema`" and "sync: persist/load lastClock from storage" so
these tasks are tracked before production rollout.

40-40: Consider using structured logging or removing debug console.log.

The console.log statements on lines 40 and 75 are useful for debugging but may clutter production logs. Consider using a configurable logger or removing them.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@editor/grida-canvas/plugins/sync/index.ts` at line 40, Replace the two
console.log debug statements (the one logging "sync::constructor" for the
constructor/initialization and the other at line ~75) with either removal or
calls to the project's structured logger; locate the logs in the Sync plugin
initializer/constructor in editor/grida-canvas/plugins/sync/index.ts (search for
"sync::constructor") and either remove them or change them to use the shared
logger instance (or a debug-level logger call) controlled by configuration so
debug output can be enabled in dev and suppressed in production.
packages/grida-canvas-sync/src/client.ts (2)

192-200: Presence updates are silently dropped when not connected.

This is acceptable for ephemeral data like cursor position, but worth noting that presence updates during reconnection are lost. Consider documenting this behavior or returning a boolean to indicate success.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/grida-canvas-sync/src/client.ts` around lines 192 - 200, The
setPresence method currently drops updates when this._status !== "ready" without
feedback; update setPresence(presence: PresenceState) to return a boolean
indicating success (true when this._transport.send was invoked and false when
dropped) and ensure callers can act on the result, or alternatively add a clear
comment/docs on the method explaining that presence updates are ephemeral and
are not queued during reconnection; reference the setPresence method, the
this._status check, and this._transport.send call when making the change.

308-311: Unnecessary scheduling when only speculative diffs remain.

The condition this._speculative.length > 0 triggers scheduling even when those diffs are already in-flight. Since _flush() guards with _pushInFlight, this won't cause duplicate sends, but it's misleading and creates unnecessary timer churn.

Only schedule when there are unsent changes
     // If there's more to push, schedule it
-    if (!isDiffEmpty(this._unsent) || this._speculative.length > 0) {
+    if (!isDiffEmpty(this._unsent)) {
       this._schedulePush();
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/grida-canvas-sync/src/client.ts` around lines 308 - 311, The
scheduling condition is too broad: change the block that calls
this._schedulePush() so it only schedules when there are actual unsent diffs
(use !isDiffEmpty(this._unsent)) and remove the this._speculative.length > 0
check; _flush() already prevents duplicate sends via _pushInFlight, so only
unsent changes should trigger _schedulePush() to avoid unnecessary timer churn
while speculative diffs are in-flight.
packages/grida-canvas-sync/__tests__/helpers.ts (1)

161-177: Minor: Redundant condition in connect handler.

The condition msg.lastClock === 0 is redundant since msg.lastClock < this.clock.value already covers the lastClock === 0 case (assuming server clock starts at 0 and only increases).

Simplify condition
 private _handleConnect(session: MockSession, msg: ConnectMessage): void {
-  if (msg.lastClock === 0 || msg.lastClock < this.clock.value) {
+  if (msg.lastClock < this.clock.value) {
     // Send full state
     session.transport.deliver({
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/grida-canvas-sync/__tests__/helpers.ts` around lines 161 - 177, The
if condition in _handleConnect is redundant: remove the explicit msg.lastClock
=== 0 check and use a single comparison msg.lastClock < this.clock.value to
decide to send the full state; update the branch in the private method
_handleConnect (involving MockSession, ConnectMessage, and this.clock.value) so
when msg.lastClock < this.clock.value it sends the full state (type "connect_ok"
with clock, state, scenes) and otherwise sends the up-to-date response (type
"connect_ok" with only clock).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@editor/grida-canvas/plugins/sync/document-sync.ts`:
- Around line 59-84: The throttled editor callback (created via editor.throttle
inside the subscribeWithSelector block) can delay local edits so they get
overwritten by incoming remote state; fix it by assigning the result of
editor.throttle(...) to a named throttled function (e.g., flushPendingEdits or
throttledPush) instead of inlining it, and then ensure any remote state-apply
path (the code handling incoming stateChange / document replacement — also
around the other subscribe block at lines 92-117) calls throttledPush.flush() or
invokes throttledPush synchronously before applying the remote document; keep
the same 30ms throttle and trailing behavior but guarantee pending local edits
are flushed via the named throttled function (and update usages of _mutex,
_lastTid, lastSyncedState, and _client.pushDiff accordingly).

In `@editor/grida-canvas/plugins/sync/serialize.ts`:
- Around line 49-60: Add server-side validation to block remote edits to the
pseudo-node "__doc_meta__": in validateNodeOp (and mirror checks in applyDiff if
it applies ops directly), detect operations whose target node id ===
"__doc_meta__" and reject patch/put/remove operations from remote peers by
returning/throwing a validation error (or marking op invalid) instead of
applying them; ensure the check covers op types that mutate (e.g., "patch",
"put", "remove") and leaves reads/no-ops unaffected so the server’s canonical
doc-level fields (links, images, bitmaps, properties, entry_scene_id) cannot be
mutated by incoming sync ops.

In `@packages/grida-canvas-sync/__tests__/helpers.ts`:
- Around line 323-336: The current comparison only iterates Object.keys(sNode)
so extra properties on clientNodes[id] are missed; update the check inside the
serverIds loop (the block using serverNodes, clientNodes, sNode, cNode) to also
iterate Object.keys(cNode) and assert each client key exists on sNode and has
the same JSON value (or perform a symmetric key-set comparison between sNode and
cNode) and throw a clear error mentioning the node id and offending key when a
client-only field or value mismatch is found.

In `@packages/grida-canvas-sync/src/client.ts`:
- Around line 255-278: On connect_ok in _handleConnectOk, clear the stale
speculative diff state (_speculative) before calling _recomputeLocalState so
previously in-flight speculative patches from the prior connection are not
applied to the new canonical state; locate _handleConnectOk and after updating
this._canonical and this._serverClock (and before this._recomputeLocalState())
set this._speculative to an empty/initial diff value (matching how _speculative
is initialized elsewhere) and then proceed to _recomputeLocalState and the
existing _schedulePush logic for any truly unsent changes in _unsent.
- Around line 392-408: The ref-check in _recomputeLocalState is false-positive
because applyDiff always returns new objects; update the function to skip
applying logically-empty diffs by checking each speculative diff with
isDiffEmpty(spec) before calling applyDiff on _speculative, and also use
isDiffEmpty to guard applying _unsent (or perform a deep/content equality check
between merged and _localState instead of a reference check) so stateChange only
emits when content actually differs; reference symbols: _recomputeLocalState,
_speculative, applyDiff, isDiffEmpty, _unsent, and _localState.

In `@packages/grida-canvas-sync/src/clock.ts`:
- Around line 16-18: DocumentClock must enforce monotonic, non-NaN, non-negative
values: update the constructor (DocumentClock.constructor) and the reset method
(DocumentClock.reset) to validate their incoming number argument: ensure it's a
finite number (not NaN/Infinity), >= 0, and does not roll back the clock (for
reset require newValue >= this._value or else throw or ignore). If invalid,
throw a clear error (or reject) rather than accepting the value; keep the
internal field (_value) unchanged on invalid input. This guarantees the
monotonic behavior used by clock comparisons.

In `@packages/grida-canvas-sync/src/diff.ts`:
- Around line 143-164: computeDiff currently emits separate remove/add when
membership changes but length stays the same (e.g. [a,b]→[c,b]) which loses
ordering; update the logic after building ops to detect the case where ops
include both "remove" and "add" and before.length === after.length and, in that
case, replace ops with a single { op: "reorder", ids: after } so
applyDiff(before, ops) reconstructs after; reference the local variables ops,
before, after and the "reorder" op when making the change.

In `@packages/grida-canvas-sync/src/transport.ts`:
- Around line 92-109: connect() must cancel any pending reconnect timers before
opening a new socket to avoid a stale timer creating a second WebSocket; modify
connect() to clear and null out _reconnectTimer (same logic as in disconnect())
before calling _openSocket(), and also apply the same guard where reconnection
is scheduled (the other _openSocket() invocations around the reconnect logic) so
_openSocket() cannot be invoked by an outstanding timer after a manual connect;
reference connect(), disconnect(), _reconnectTimer, _openSocket(), _ws and
_intentionalClose when making the change.

In `@packages/grida-canvas-sync/src/validate.ts`:
- Around line 32-33: ValidationErrorCode includes "SCENE_ADD_NOT_SCENE" but
there's no code that emits it; add a check in the scene validation block (the
code handling scene additions/removals—look for the logic that currently emits
SCENE_ADD and SCENE_REMOVE) to verify the referenced node's type === "scene"
and, if not, throw/return a ValidationError using
ValidationErrorCode.SCENE_ADD_NOT_SCENE (include the offending node id and
context). Alternatively, if that validation is not needed, remove the unused
enum value ValidationErrorCode.SCENE_ADD_NOT_SCENE to keep the API consistent.

In `@services/grida-canvas-document-worker-cf/src/room.ts`:
- Around line 270-273: The code advances the in-memory clock and canonical
before persisting, so if this.storage.applyDiff(msg.diff, newClock) throws the
DO will be left in a non-persistent state; instead, compute the next clock value
(e.g. call the tick/next function but do not assign it to this.clock), call
this.storage.applyDiff(msg.diff, nextClock) first, and only after that succeeds
assign this.clock = nextClock and set this.canonical = applyDiff(this.canonical,
msg.diff); keep references to this.clock.tick()/tick result, applyDiff,
this.canonical, and this.storage.applyDiff to locate the changes.
- Around line 137-145: The current oversized-message guard uses message.length
(UTF-16 code units) which undercounts wire bytes; change the check to compute
UTF-8 byte length (e.g., use Buffer.byteLength(message, 'utf8') or new
TextEncoder().encode(message).length) when comparing against MAX_MESSAGE_SIZE in
the same block, so the if(...) uses the UTF-8 byte count and the error path that
calls this._send(ws, {...}) remains unchanged.

---

Nitpick comments:
In `@editor/grida-canvas/plugins/sync/index.ts`:
- Around line 54-60: The code creates a SyncClient with hardcoded schema and
lastClock (see SyncClient instantiation) which risks version drift and lost
progress; replace the literal schema string with a constant imported from the
canonical schema package (e.g., import SCHEMA_VERSION from `@grida/schema` and use
it in the SyncClient options) and load persisted clock state instead of 0 by
reading the stored sync-state (e.g., from OPFS/localStorage) before constructing
SyncClient and passing that value as lastClock; also add TODO-to-issue entries
for "sync: derive schema version from `@grida/schema`" and "sync: persist/load
lastClock from storage" so these tasks are tracked before production rollout.
- Line 40: Replace the two console.log debug statements (the one logging
"sync::constructor" for the constructor/initialization and the other at line
~75) with either removal or calls to the project's structured logger; locate the
logs in the Sync plugin initializer/constructor in
editor/grida-canvas/plugins/sync/index.ts (search for "sync::constructor") and
either remove them or change them to use the shared logger instance (or a
debug-level logger call) controlled by configuration so debug output can be
enabled in dev and suppressed in production.

In `@editor/grida-canvas/plugins/sync/presence-sync.ts`:
- Around line 140-160: The loop is unpacking [peerId, presence] from
Object.entries(peers) but never uses peerId; update the iteration to avoid the
unused variable by iterating values only (e.g., use Object.values(peers) or for
(const presence of Object.values(peers))) and keep the existing logic that reads
presence.cursor.cursor_id into the cursors map; ensure symbols referenced remain
presence, cursor, cursors and this._colorToPalette.

In `@packages/grida-canvas-sync/__tests__/helpers.ts`:
- Around line 161-177: The if condition in _handleConnect is redundant: remove
the explicit msg.lastClock === 0 check and use a single comparison msg.lastClock
< this.clock.value to decide to send the full state; update the branch in the
private method _handleConnect (involving MockSession, ConnectMessage, and
this.clock.value) so when msg.lastClock < this.clock.value it sends the full
state (type "connect_ok" with clock, state, scenes) and otherwise sends the
up-to-date response (type "connect_ok" with only clock).

In `@packages/grida-canvas-sync/src/client.ts`:
- Around line 192-200: The setPresence method currently drops updates when
this._status !== "ready" without feedback; update setPresence(presence:
PresenceState) to return a boolean indicating success (true when
this._transport.send was invoked and false when dropped) and ensure callers can
act on the result, or alternatively add a clear comment/docs on the method
explaining that presence updates are ephemeral and are not queued during
reconnection; reference the setPresence method, the this._status check, and
this._transport.send call when making the change.
- Around line 308-311: The scheduling condition is too broad: change the block
that calls this._schedulePush() so it only schedules when there are actual
unsent diffs (use !isDiffEmpty(this._unsent)) and remove the
this._speculative.length > 0 check; _flush() already prevents duplicate sends
via _pushInFlight, so only unsent changes should trigger _schedulePush() to
avoid unnecessary timer churn while speculative diffs are in-flight.

In `@packages/grida-canvas-sync/tsconfig.json`:
- Line 6: The tsconfig currently injects "vitest/globals" into the src-only
build (the "types": ["vitest/globals"] entry) while simultaneously excluding
test folders ("__tests__"), which exposes test-only globals to production code;
fix this by removing "vitest/globals" from this tsconfig and either create a
separate tsconfig.test.json that extends the base and adds "types":
["vitest/globals"] for tests, or split into a library tsconfig (no vitest types)
plus a test-specific tsconfig that includes the __tests__ patterns; ensure the
test tsconfig is referenced by your test runner or npm scripts so tests still
pick up vitest globals.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 56920275-a013-4c3f-b6bf-2a1d506138c9

📥 Commits

Reviewing files that changed from the base of the PR and between 2572258 and 618259e.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (36)
  • editor/grida-canvas-hosted/playground/playground.tsx
  • editor/grida-canvas/__tests__/headless/sync-integration.test.ts
  • editor/grida-canvas/plugins/sync/document-sync.ts
  • editor/grida-canvas/plugins/sync/index.ts
  • editor/grida-canvas/plugins/sync/presence-sync.ts
  • editor/grida-canvas/plugins/sync/serialize.ts
  • editor/grida-canvas/plugins/yjs/__tests__/y-patches.test.ts
  • editor/grida-canvas/plugins/yjs/index.ts
  • editor/grida-canvas/plugins/yjs/y-awareness.ts
  • editor/grida-canvas/plugins/yjs/y-document.ts
  • editor/grida-canvas/plugins/yjs/y-patches.ts
  • editor/package.json
  • packages/grida-canvas-sync/__tests__/client.test.ts
  • packages/grida-canvas-sync/__tests__/diff.test.ts
  • packages/grida-canvas-sync/__tests__/helpers.ts
  • packages/grida-canvas-sync/__tests__/integration.test.ts
  • packages/grida-canvas-sync/__tests__/validate.test.ts
  • packages/grida-canvas-sync/package.json
  • packages/grida-canvas-sync/src/client.ts
  • packages/grida-canvas-sync/src/clock.ts
  • packages/grida-canvas-sync/src/diff.ts
  • packages/grida-canvas-sync/src/index.ts
  • packages/grida-canvas-sync/src/presence.ts
  • packages/grida-canvas-sync/src/protocol.ts
  • packages/grida-canvas-sync/src/transport.ts
  • packages/grida-canvas-sync/src/validate.ts
  • packages/grida-canvas-sync/tsconfig.json
  • services/grida-canvas-document-worker-cf/package.json
  • services/grida-canvas-document-worker-cf/src/do.ts
  • services/grida-canvas-document-worker-cf/src/index.ts
  • services/grida-canvas-document-worker-cf/src/lib/index.ts
  • services/grida-canvas-document-worker-cf/src/lib/storage.ts
  • services/grida-canvas-document-worker-cf/src/lib/websocket.ts
  • services/grida-canvas-document-worker-cf/src/room.ts
  • services/grida-canvas-document-worker-cf/src/storage.ts
  • services/grida-canvas-document-worker-cf/tsconfig.json
💤 Files with no reviewable changes (9)
  • services/grida-canvas-document-worker-cf/src/lib/index.ts
  • services/grida-canvas-document-worker-cf/src/lib/storage.ts
  • editor/grida-canvas/plugins/yjs/tests/y-patches.test.ts
  • editor/grida-canvas/plugins/yjs/y-awareness.ts
  • editor/grida-canvas/plugins/yjs/y-patches.ts
  • editor/grida-canvas/plugins/yjs/index.ts
  • services/grida-canvas-document-worker-cf/src/lib/websocket.ts
  • services/grida-canvas-document-worker-cf/src/do.ts
  • editor/grida-canvas/plugins/yjs/y-document.ts

Comment on lines +59 to +84
this._unsubscribeEditor = this._editor.doc.subscribeWithSelector(
(state) => state.document,
editor.throttle(
(
store: EditorDocumentStore,
_next: grida.program.document.Document,
_prev: grida.program.document.Document
) => {
if (store.locked) return;
if (this._lastTid === store.tid) return;
this._lastTid = store.tid;

this._mutex(() => {
const currentState = documentToState(
this._editor.doc.state.document
);
const diff = computeDiff(this.lastSyncedState, currentState);
if (diff) {
this.lastSyncedState = currentState;
this._client.pushDiff(diff);
}
});
},
30, // 30ms throttle, same as old YJS plugin
{ trailing: true }
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Do not let the editor outrun SyncClient during the throttle window.

Local edits only reach SyncClient when this throttled callback runs, but remote stateChange replaces the whole document immediately. That means an edit made inside the throttle window can be overwritten by an incoming patch before pushDiff() ever sees it, so the change is silently lost instead of being rebased. The manual flushEditorToServer() helper in the headless tests currently hides this race.

Also applies to: 92-117

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@editor/grida-canvas/plugins/sync/document-sync.ts` around lines 59 - 84, The
throttled editor callback (created via editor.throttle inside the
subscribeWithSelector block) can delay local edits so they get overwritten by
incoming remote state; fix it by assigning the result of editor.throttle(...) to
a named throttled function (e.g., flushPendingEdits or throttledPush) instead of
inlining it, and then ensure any remote state-apply path (the code handling
incoming stateChange / document replacement — also around the other subscribe
block at lines 92-117) calls throttledPush.flush() or invokes throttledPush
synchronously before applying the remote document; keep the same 30ms throttle
and trailing behavior but guarantee pending local edits are flushed via the
named throttled function (and update usages of _mutex, _lastTid,
lastSyncedState, and _client.pushDiff accordingly).

Comment on lines +49 to +60
// Store document-level data as a special meta record
const meta: DocMetaRecord = {
type: "__doc_meta__",
id: "__doc_meta__",
links: doc.links ?? {},
images: doc.images ?? {},
bitmaps: doc.bitmaps ?? {},
properties: doc.properties ?? {},
metadata: doc.metadata,
entry_scene_id: doc.entry_scene_id ?? null,
};
nodes["__doc_meta__"] = meta as unknown as SerializedNode;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if there's any special handling for __doc_meta__ in the sync layer
echo "=== Checking for __doc_meta__ handling in diff/validate logic ==="
rg -n "__doc_meta__" --type ts

Repository: gridaco/grida

Length of output: 1612


🏁 Script executed:

# Examine the test that checks __doc_meta__ exclusion
sed -n '300,320p' editor/grida-canvas/__tests__/headless/sync-integration.test.ts

Repository: gridaco/grida

Length of output: 878


🏁 Script executed:

# Examine the server-side filtering mentioned at line 448-450
sed -n '445,455p' editor/grida-canvas/__tests__/headless/sync-integration.test.ts

Repository: gridaco/grida

Length of output: 530


🏁 Script executed:

# Check if validateNodeOp has special handling for __doc_meta__
sed -n '106,167p' packages/grida-canvas-sync/src/validate.ts | head -80

Repository: gridaco/grida

Length of output: 1654


🏁 Script executed:

# Check applyDiff for __doc_meta__ handling
sed -n '177,224p' packages/grida-canvas-sync/src/diff.ts

Repository: gridaco/grida

Length of output: 1200


🏁 Script executed:

# Examine stateToDocument to see where __doc_meta__ is filtered
sed -n '73,105p' editor/grida-canvas/plugins/sync/serialize.ts

Repository: gridaco/grida

Length of output: 1144


🏁 Script executed:

# Search for any other places that might validate/protect __doc_meta__
rg -n "validateNodeOp|applyDiff|__doc_meta__" editor/grida-canvas plugins/sync packages/grida-canvas-sync/src --type ts -B 2 -A 2 | grep -E "(validateNodeOp|applyDiff|__doc_meta__|server|Server)" | head -40

Repository: gridaco/grida

Length of output: 3377


🏁 Script executed:

# Check if there's any server-side middleware or validation that blocks __doc_meta__
fd -e ts -path "*/server/*" -o -path "*/validate*" packages/grida-canvas-sync | head -10

Repository: gridaco/grida

Length of output: 225


🏁 Script executed:

# Search for any validation that explicitly blocks __doc_meta__ operations
rg -n "__doc_meta__" packages/grida-canvas-sync/src --type ts -B 3 -A 3

Repository: gridaco/grida

Length of output: 39


🏁 Script executed:

# Check if there are tests for __doc_meta__ patch operations or protection
rg -n "patch.*__doc_meta__|__doc_meta__.*patch" editor/grida-canvas/__tests__ --type ts -B 2 -A 5

Repository: gridaco/grida

Length of output: 39


🏁 Script executed:

# Look for any comments about __doc_meta__ security or design decisions
rg -n "__doc_meta__" editor/grida-canvas/plugins/sync --type ts -B 5 -A 5 | head -60

Repository: gridaco/grida

Length of output: 4918


Add server-side validation to prevent remote clients from patching __doc_meta__ metadata fields.

The __doc_meta__ pseudo-node stores critical document-level metadata (links, images, bitmaps, properties, entry_scene_id). While stateToDocument correctly filters it out before returning a Document to clients, the server's sync layer in applyDiff and validateNodeOp currently treats __doc_meta__ like any other node without special guards.

This allows a remote peer to send patch operations targeting __doc_meta__ that would modify the server's canonical document metadata. Add explicit validation in validateNodeOp to reject patch/put/remove operations on the __doc_meta__ node, or document this as an intentional design choice if server-side integrity is managed elsewhere.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@editor/grida-canvas/plugins/sync/serialize.ts` around lines 49 - 60, Add
server-side validation to block remote edits to the pseudo-node "__doc_meta__":
in validateNodeOp (and mirror checks in applyDiff if it applies ops directly),
detect operations whose target node id === "__doc_meta__" and reject
patch/put/remove operations from remote peers by returning/throwing a validation
error (or marking op invalid) instead of applying them; ensure the check covers
op types that mutate (e.g., "patch", "put", "remove") and leaves reads/no-ops
unaffected so the server’s canonical doc-level fields (links, images, bitmaps,
properties, entry_scene_id) cannot be mutated by incoming sync ops.

Comment thread packages/grida-canvas-sync/__tests__/helpers.ts Outdated
Comment thread packages/grida-canvas-sync/src/client.ts
Comment thread packages/grida-canvas-sync/src/client.ts
Comment thread packages/grida-canvas-sync/src/transport.ts
Comment thread packages/grida-canvas-sync/src/validate.ts
Comment thread services/grida-canvas-document-worker-cf/src/room.ts Outdated
Comment thread services/grida-canvas-document-worker-cf/src/room.ts Outdated
Comment thread services/grida-canvas-document-worker-cf/src/storage.ts
…se, persist-before-advance, and code quality

- Fix critical scene diff ordering bug: computeSceneDiff now always emits
  a single reorder op, preventing [a,b]→[c,b] from producing [b,c]
- Fix reconnect state corruption: clear stale speculative diffs and
  requeue them as unsent on connect_ok
- Fix persist-before-advance: storage.applyDiff runs before in-memory
  clock/canonical advance, so a storage throw leaves state consistent
- Fix spurious stateChange events: fast-path ref equality when no
  speculative/unsent changes exist, skip applyDiff for empty diffs
- Fix double-socket race: cancel pending reconnect timer in connect()
- Fix tombstone pruning gap: record tombstone_floor clock, getDelta
  returns null (forcing full state) for stale sinceClock values
- Fix message size guard: use UTF-8 byte length instead of UTF-16
- Implement SCENE_ADD_NOT_SCENE validation (was declared but unused)
- Add clock validation guards (non-negative safe integer)
- Fix assertConvergence to check keys from both server and client
- Remove all unused imports/variables flagged by code quality checks
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
editor/grida-canvas/plugins/sync/presence-sync.ts (1)

157-158: Consider tracking presence features for follow-up.

The marquee and ephemeral_chat fields are set to null with TODO comments. These features may be useful for collaborative editing (showing other users' selection boxes and chat bubbles).

Would you like me to open an issue to track implementing marquee and cursor chat in the presence protocol?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@editor/grida-canvas/plugins/sync/presence-sync.ts` around lines 157 - 158,
Add tracked follow-up work for the null presence fields by opening a repository
issue to implement marquee and ephemeral_chat in the presence protocol, and
update the TODO comments in presence-sync.ts (the marquee and ephemeral_chat
entries) to reference that issue number/URL; ensure the issue describes expected
behavior (selection marquee rendering, ephemeral cursor chat bubbles), data
shape, and backwards-compatible handling so future work can implement
PresenceSync methods using those fields.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@editor/grida-canvas/plugins/sync/presence-sync.ts`:
- Around line 157-158: Add tracked follow-up work for the null presence fields
by opening a repository issue to implement marquee and ephemeral_chat in the
presence protocol, and update the TODO comments in presence-sync.ts (the marquee
and ephemeral_chat entries) to reference that issue number/URL; ensure the issue
describes expected behavior (selection marquee rendering, ephemeral cursor chat
bubbles), data shape, and backwards-compatible handling so future work can
implement PresenceSync methods using those fields.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3c74d197-13de-4e2b-8c45-9e0bdec8279f

📥 Commits

Reviewing files that changed from the base of the PR and between 618259e and 68ce685.

📒 Files selected for processing (13)
  • editor/grida-canvas/plugins/sync/presence-sync.ts
  • packages/grida-canvas-sync/__tests__/client.test.ts
  • packages/grida-canvas-sync/__tests__/diff.test.ts
  • packages/grida-canvas-sync/__tests__/helpers.ts
  • packages/grida-canvas-sync/__tests__/integration.test.ts
  • packages/grida-canvas-sync/__tests__/validate.test.ts
  • packages/grida-canvas-sync/src/client.ts
  • packages/grida-canvas-sync/src/clock.ts
  • packages/grida-canvas-sync/src/diff.ts
  • packages/grida-canvas-sync/src/transport.ts
  • packages/grida-canvas-sync/src/validate.ts
  • services/grida-canvas-document-worker-cf/src/room.ts
  • services/grida-canvas-document-worker-cf/src/storage.ts
✅ Files skipped from review due to trivial changes (2)
  • packages/grida-canvas-sync/tests/integration.test.ts
  • packages/grida-canvas-sync/tests/client.test.ts
🚧 Files skipped from review as they are similar to previous changes (4)
  • packages/grida-canvas-sync/src/clock.ts
  • services/grida-canvas-document-worker-cf/src/storage.ts
  • packages/grida-canvas-sync/src/transport.ts
  • packages/grida-canvas-sync/tests/helpers.ts

Comment thread packages/grida-canvas-sync/__tests__/client.test.ts Fixed
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (1)
editor/grida-canvas/plugins/sync/document-sync.ts (1)

66-89: ⚠️ Potential issue | 🔴 Critical

Pending local edits can still be lost when remote state arrives.

At Line 66 the editor→client path is trailing-throttled, while Line 98 applies remote stateChange immediately. A local edit inside the 30ms window can be overwritten before pushDiff() runs, so it never gets rebased/pushed.

Suggested fix pattern
+ // 1) Extract the local diff push logic into a named immediate function.
+ // 2) Use a named throttled wrapper only for editor subscription events.
+ // 3) Before applying remote state, force a sync of pending local edits
+ //    by calling the immediate push function (or flushing the named throttle)
+ //    so local changes are not dropped.

- editor.throttle((store, _next, _prev) => { ...pushDiff... }, 30, { trailing: true })
+ const throttledPushLocal = editor.throttle(() => {
+   this._pushLocalDiffNow();
+ }, 30, { trailing: true });

  this._unsubscribeEditor = this._editor.doc.subscribeWithSelector(
    (state) => state.document,
-   /* inline throttled callback */
+   throttledPushLocal
  );

  this._unsubscribeClient = this._client.on("stateChange", (newState) => {
+   this._pushLocalDiffNow(); // flush pending local edits before remote apply
    this._mutex(() => {
      // apply remote state
    });
  });

Also applies to: 98-121

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@editor/grida-canvas/plugins/sync/document-sync.ts` around lines 66 - 89, The
trailing-only throttle on the editor→client path (editor.throttle callback that
checks this._lastTid and calls this._mutex -> this._client.pushDiff) can lose
local edits when remote stateChange is applied immediately; fix by making the
local-to-client flush serialize with remote application: either change the
throttle options to run leading (e.g., { leading: true, trailing: true }) so
local edits are pushed immediately, or ensure the remote stateChange handler
acquires the same this._mutex (or awaits a flush of the throttled handler)
before applying incoming changes so pushDiff and remote state application cannot
interleave and overwrite pending local edits (reference editor.throttle,
this._lastTid, this._mutex, this.lastSyncedState, this._client.pushDiff, and the
remote stateChange handler).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/grida-canvas-sync/__tests__/client.test.ts`:
- Around line 23-25: MockTransport.send currently ignores connection state and
always pushes messages, violating the ISyncTransport contract that send() must
throw when not connected; update MockTransport.send to check the transport's
connection status (e.g., this.connected or an isConnected() flag) and throw an
error if not connected, otherwise push the message into this.sent—ensure the
method signature remains send(message: ClientMessage): void and that the thrown
error is descriptive so tests can detect connection-state bugs in SyncClient.
- Around line 452-471: The test must explicitly assert that listeners are not
invoked after teardown: after calling client.destroy(), clear the handler's
recorded calls (handler.mockClear() or equivalent) then call
transport.deliver(...) and assert the handler was not called
(expect(handler).not.toHaveBeenCalled()); keep the existing
expect(client.status).toBe("disconnected") as well. This uses the existing
handler mock, client.destroy(), transport.deliver(), and client.status to verify
listener teardown.
- Around line 160-165: The test calls makeNode with an invalid third argument
("scene") causing a TS type error; update the call in the test case ("preserves
local state when server responds with no state and no diff") to pass only the
valid parameters by removing the third argument so the makeNode invocation
becomes makeNode("scene", { name: "Main" }), leaving the other makeNode calls
unchanged.

---

Duplicate comments:
In `@editor/grida-canvas/plugins/sync/document-sync.ts`:
- Around line 66-89: The trailing-only throttle on the editor→client path
(editor.throttle callback that checks this._lastTid and calls this._mutex ->
this._client.pushDiff) can lose local edits when remote stateChange is applied
immediately; fix by making the local-to-client flush serialize with remote
application: either change the throttle options to run leading (e.g., { leading:
true, trailing: true }) so local edits are pushed immediately, or ensure the
remote stateChange handler acquires the same this._mutex (or awaits a flush of
the throttled handler) before applying incoming changes so pushDiff and remote
state application cannot interleave and overwrite pending local edits (reference
editor.throttle, this._lastTid, this._mutex, this.lastSyncedState,
this._client.pushDiff, and the remote stateChange handler).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d6d23f11-f015-4da1-9008-ea12f8aada45

📥 Commits

Reviewing files that changed from the base of the PR and between 68ce685 and 8545841.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (5)
  • editor/grida-canvas/plugins/sync/document-sync.ts
  • packages/grida-canvas-sync/__tests__/client.test.ts
  • packages/grida-canvas-sync/src/client.ts
  • services/grida-canvas-document-worker-cf/package.json
  • services/grida-canvas-document-worker-cf/worker-configuration.d.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • services/grida-canvas-document-worker-cf/package.json
  • packages/grida-canvas-sync/src/client.ts

Comment thread packages/grida-canvas-sync/__tests__/client.test.ts
Comment on lines +160 to +165
it("preserves local state when server responds with no state and no diff", () => {
const initialState: DocumentState = {
nodes: {
n1: makeNode("n1", { width: 100 }),
scene: makeNode("scene", { name: "Main" }, "scene"),
},
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify there are no 3-arg makeNode() calls in this test file.
python - <<'PY'
from pathlib import Path
import re

p = Path("packages/grida-canvas-sync/__tests__/client.test.ts")
src = p.read_text()

for m in re.finditer(r'\bmakeNode\s*\(([^)]*)\)', src):
    args = [a.strip() for a in m.group(1).split(",")]
    if len(args) > 2:
        line = src.count("\n", 0, m.start()) + 1
        print(f"Line {line}: {m.group(0)}")
PY

Repository: gridaco/grida

Length of output: 291


🏁 Script executed:

cat -n "packages/grida-canvas-sync/__tests__/client.test.ts" | sed -n '160,170p'

Repository: gridaco/grida

Length of output: 551


Remove the invalid third argument to makeNode at line 164.

The function makeNode accepts only an id (string) and optional props object, but the call passes three arguments: makeNode("scene", { name: "Main" }, "scene"). The third argument "scene" does not correspond to any parameter and causes a TypeScript type error.

Proposed fix
-          scene: makeNode("scene", { name: "Main" }, "scene"),
+          scene: makeNode("scene", { name: "Main" }),
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
it("preserves local state when server responds with no state and no diff", () => {
const initialState: DocumentState = {
nodes: {
n1: makeNode("n1", { width: 100 }),
scene: makeNode("scene", { name: "Main" }, "scene"),
},
it("preserves local state when server responds with no state and no diff", () => {
const initialState: DocumentState = {
nodes: {
n1: makeNode("n1", { width: 100 }),
scene: makeNode("scene", { name: "Main" }),
},
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/grida-canvas-sync/__tests__/client.test.ts` around lines 160 - 165,
The test calls makeNode with an invalid third argument ("scene") causing a TS
type error; update the call in the test case ("preserves local state when server
responds with no state and no diff") to pass only the valid parameters by
removing the third argument so the makeNode invocation becomes makeNode("scene",
{ name: "Main" }), leaving the other makeNode calls unchanged.

Comment thread packages/grida-canvas-sync/__tests__/client.test.ts
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
packages/grida-canvas-sync/__tests__/client.test.ts (1)

139-150: Test name overstates what is asserted.

The test title includes connecting, but assertions only check syncing and ready. Consider renaming the test or explicitly asserting connecting.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/grida-canvas-sync/__tests__/client.test.ts` around lines 139 - 150,
The test title claims "connecting → syncing → ready" but only asserts "syncing"
and "ready"; update the spec to either rename the test to reflect only "syncing
→ ready" or add an explicit assertion for the "connecting" state: after calling
transport.simulateConnected() assert that statuses contains "connecting" (in
addition to or instead of the current "syncing" check), or adjust the test name
string passed to it(...) to remove "connecting". Reference the existing test
block using the it(...) description, the createClientAndTransport() helper,
transport.simulateConnected(), client.on("statusChange", ...),
transport.deliver({ type: "connect_ok", ... }) and client.status when making the
change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/wg/research/crdt/tldraw.md`:
- Around line 250-252: Update the conflicting phrasing around TLDraw's
conflict-resolution: replace the blanket “not last-write-wins” statement near
the line containing “TLDraw uses optimistic concurrency with
server-authoritative rebase” and adjust the later sentence that claims
same-field concurrent edits are effectively LWW via server order so both
describe the same behavior — e.g., say TLDraw is optimistic with
server-authoritative rebase which resolves concurrent same-field edits by
adopting server-ordered application (a deterministic resolution that can appear
LWW for that class of conflict) — and ensure both the earlier “not
last-write-wins” claim and the later “server order = LWW-like” phrasing are
aligned to this clarified wording.
- Around line 19-24: Several fenced code blocks in the tldraw CRDT doc are
missing language tags (MD040); update every unlabeled triple-backtick block
shown (the lists and ASCII diagrams such as the lines starting with
"@tldraw/store …", the WebSocket diagram, the "Client Server" diagrams, the
numbered steps, and the state transition "AwaitingConnectMessage → Connected …")
to include an explicit language like "text" (e.g., replace ``` with ```text) so
markdownlint no longer flags MD040 and the diagrams render as plain text.
- Around line 1-11: The markdown is missing required YAML frontmatter; add a
top-of-file frontmatter block containing title (use "TLDraw Sync: Real-Time
Collaboration Architecture" or similar), description (short SEO summary),
keywords (array of relevant terms), and format: md so the renderer picks up the
page; ensure the frontmatter is a valid YAML block (--- at top and bottom)
before the existing content/header so tooling and guidelines for
docs/{wg,reference} are satisfied.

---

Nitpick comments:
In `@packages/grida-canvas-sync/__tests__/client.test.ts`:
- Around line 139-150: The test title claims "connecting → syncing → ready" but
only asserts "syncing" and "ready"; update the spec to either rename the test to
reflect only "syncing → ready" or add an explicit assertion for the "connecting"
state: after calling transport.simulateConnected() assert that statuses contains
"connecting" (in addition to or instead of the current "syncing" check), or
adjust the test name string passed to it(...) to remove "connecting". Reference
the existing test block using the it(...) description, the
createClientAndTransport() helper, transport.simulateConnected(),
client.on("statusChange", ...), transport.deliver({ type: "connect_ok", ... })
and client.status when making the change.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b2fd69e7-9612-46ae-ab9b-91da7222d5c8

📥 Commits

Reviewing files that changed from the base of the PR and between 8545841 and d523849.

📒 Files selected for processing (2)
  • docs/wg/research/crdt/tldraw.md
  • packages/grida-canvas-sync/__tests__/client.test.ts

Comment on lines +1 to +11
# TLDraw Sync: Real-Time Collaboration Architecture

> Research document covering tldraw's sync engine — architecture, data model,
> protocol, and conflict resolution strategy.
>
> **Source repo:** `tldraw/tldraw` (main branch, ~2024-2026)
> **Key packages:** `@tldraw/store`, `@tldraw/sync-core`, `@tldraw/sync`

---

## 1. Architecture Overview
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Add required docs frontmatter (title, description, keywords, format).

This new docs/wg/**.md page is missing required SEO frontmatter and format: md.

🔧 Proposed fix
+---
+title: TLDraw Sync: Real-Time Collaboration Architecture
+description: Research notes on TLDraw’s server-authoritative sync architecture, protocol, diff model, and optimistic rebase behavior.
+keywords:
+  - tldraw
+  - realtime collaboration
+  - sync protocol
+  - optimistic rebase
+  - conflict resolution
+format: md
+---
+
 # TLDraw Sync: Real-Time Collaboration Architecture

As per coding guidelines, docs/{wg,reference}/**/*.md: “Include SEO frontmatter with title, description, and keywords … Add format: md frontmatter…”.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# TLDraw Sync: Real-Time Collaboration Architecture
> Research document covering tldraw's sync engine — architecture, data model,
> protocol, and conflict resolution strategy.
>
> **Source repo:** `tldraw/tldraw` (main branch, ~2024-2026)
> **Key packages:** `@tldraw/store`, `@tldraw/sync-core`, `@tldraw/sync`
---
## 1. Architecture Overview
---
title: TLDraw Sync: Real-Time Collaboration Architecture
description: Research notes on TLDraw’s server-authoritative sync architecture, protocol, diff model, and optimistic rebase behavior.
keywords:
- tldraw
- realtime collaboration
- sync protocol
- optimistic rebase
- conflict resolution
format: md
---
# TLDraw Sync: Real-Time Collaboration Architecture
> Research document covering tldraw's sync engine — architecture, data model,
> protocol, and conflict resolution strategy.
>
> **Source repo:** `tldraw/tldraw` (main branch, ~2024-2026)
> **Key packages:** `@tldraw/store`, `@tldraw/sync-core`, `@tldraw/sync`
---
## 1. Architecture Overview
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/wg/research/crdt/tldraw.md` around lines 1 - 11, The markdown is missing
required YAML frontmatter; add a top-of-file frontmatter block containing title
(use "TLDraw Sync: Real-Time Collaboration Architecture" or similar),
description (short SEO summary), keywords (array of relevant terms), and format:
md so the renderer picks up the page; ensure the frontmatter is a valid YAML
block (--- at top and bottom) before the existing content/header so tooling and
guidelines for docs/{wg,reference} are satisfied.

Comment on lines +19 to +24
```
@tldraw/store — Generic record store with typed IDs, diffs, and history
@tldraw/sync-core — Protocol types, TLSyncRoom (server), TLSyncClient, storage interfaces
@tldraw/sync — React hook (useSync) that wires TLSyncClient to a TLStore
@tldraw/tlschema — Schema definitions, migrations, record types for tldraw shapes
```
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Specify languages for unlabeled fenced code blocks (MD040).

markdownlint warnings are valid here; add explicit fence languages (e.g., text) to these blocks.

🔧 Proposed fix
-```
+```text
 `@tldraw/store`          — Generic record store with typed IDs, diffs, and history
 ...
-```
+```

-```
+```text
   ┌──────────┐    WebSocket    ┌─────────────┐    WebSocket    ┌──────────┐
 ...
-```
+```

-```
+```text
 Client                          Server
 ...
-```
+```

-```
+```text
 Client                          Server
 ...
-```
+```

-```
+```text
 1. Flush store history
 ...
-```
+```

-```
+```text
 AwaitingConnectMessage → Connected → AwaitingRemoval → (removed)
-```
+```

Also applies to: 28-37, 173-195, 204-222, 263-274, 382-384

🧰 Tools
🪛 markdownlint-cli2 (0.22.0)

[warning] 19-19: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/wg/research/crdt/tldraw.md` around lines 19 - 24, Several fenced code
blocks in the tldraw CRDT doc are missing language tags (MD040); update every
unlabeled triple-backtick block shown (the lists and ASCII diagrams such as the
lines starting with "@tldraw/store …", the WebSocket diagram, the "Client
Server" diagrams, the numbered steps, and the state transition
"AwaitingConnectMessage → Connected …") to include an explicit language like
"text" (e.g., replace ``` with ```text) so markdownlint no longer flags MD040
and the diagrams render as plain text.

Comment on lines +250 to +252
TLDraw uses **optimistic concurrency with server-authoritative rebase** — not
CRDTs, not OT, not last-write-wins.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Resolve contradiction in conflict-resolution wording.

Section 5 says “not last-write-wins,” but later the doc says same-field concurrent edits are effectively LWW via server order. Align these statements to avoid design ambiguity.

🔧 Proposed wording adjustment
-TLDraw uses **optimistic concurrency with server-authoritative rebase** — not
-CRDTs, not OT, not last-write-wins.
+TLDraw uses **optimistic concurrency with server-authoritative rebase** — not
+CRDTs and not OT. For same-field concurrent edits, final resolution is
+effectively server-order LWW after validation/rebase.

Also applies to: 429-430

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/wg/research/crdt/tldraw.md` around lines 250 - 252, Update the
conflicting phrasing around TLDraw's conflict-resolution: replace the blanket
“not last-write-wins” statement near the line containing “TLDraw uses optimistic
concurrency with server-authoritative rebase” and adjust the later sentence that
claims same-field concurrent edits are effectively LWW via server order so both
describe the same behavior — e.g., say TLDraw is optimistic with
server-authoritative rebase which resolves concurrent same-field edits by
adopting server-ordered application (a deterministic resolution that can appear
LWW for that class of conflict) — and ensure both the earlier “not
last-write-wins” claim and the later “server order = LWW-like” phrasing are
aligned to this clarified wording.

@softmarshmallow softmarshmallow merged commit 4644025 into main Apr 9, 2026
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant