Skip to content

perf(cell): use DoublyLinkedList for pendingMessageIds with list node on metadata#27415

Draft
anthony-murphy wants to merge 3 commits into
microsoft:mainfrom
anthony-murphy:prep-cell
Draft

perf(cell): use DoublyLinkedList for pendingMessageIds with list node on metadata#27415
anthony-murphy wants to merge 3 commits into
microsoft:mainfrom
anthony-murphy:prep-cell

Conversation

@anthony-murphy
Copy link
Copy Markdown
Contributor

Description

Replaces SharedCell's pendingMessageIds: number[] with a DoublyLinkedList<number> from @fluidframework/core-utils/internal, and extends ICellLocalOpMetadata to carry the inserted list node.

  • Submit: push() returns a ListNodeRange; the metadata records the inserted node as pendingNode.
  • ACK: pendingMessageIds.first?.data head check, then shift() — both O(1) instead of Array.shift()'s O(n) re-index.
  • Rollback: pop() returns the trailing node; compare node.data to the rolled-back message id.

Why

Two wins, both independent of squash:

  1. Array.shift() is O(n) — every ACK in SharedCell re-indexes the entire pending tail today. DoublyLinkedList.shift() is O(1).
  2. Carrying the node reference on localOpMetadata enables any future per-op operation (rollback, arbitrary-position drop, ...) to be O(1) via list.remove(node) instead of an indexOf + splice scan.

Notes

Pure structural prep — no reSubmitSquashed, no squash logic, no tests touched, no api-report changes.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 27, 2026

Hi! Thank you for opening this PR. Want me to review it?

Based on the diff (104 lines, 2 files), I've queued these reviewers:

  • Correctness — logic errors, race conditions, lifecycle issues
  • Security — vulnerabilities, secret exposure, injection
  • API Compatibility — breaking changes, release tags, type design
  • Performance — algorithmic regressions, memory leaks
  • Testing — coverage gaps, hollow tests

How this works

  • Adjust the reviewer set by ticking/unticking boxes above. Reviewer toggles alone don't trigger anything.

  • Tick Start review below to dispatch the review fleet.

  • After review finishes, tick Start review again to request another run — it auto-resets after each dispatch.

  • This comment updates as new commits land; your reviewer selections are preserved.

  • Start review

Comment thread packages/dds/cell/src/interfaces.ts Outdated
/**
* The node in the pending message list corresponding to this operation (op).
* Holding a direct reference to the node enables O(1) removal from arbitrary
* positions in the pending list, which is required for future squash support.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deep Review: ListNode<number> leaks onto the exported ICellLocalOpMetadata here, and the new import type { ListNode } from "@fluidframework/core-utils/internal" at the top of this file couples the cell package's interfaces module to a linked-list node type that is purely a runtime implementation detail.

Containment check: ICellLocalOpMetadata is not re-exported from packages/dds/cell/src/index.ts and the only literal construction is updated in cell.ts:306-317, so this is not a public-API break today. The cost is encapsulation and maintainability — embedding ListNode<number> on a package-shaped interface makes the metadata shape harder to mock or re-implement out of tree, and your own PR self-summary flagged this same risk. PR #12273's reviewer (scarlettjlee) historically pushed back on widening this exact interface with implementation details.

Suggested fix: declare a private ICellPendingLocalOpMetadata extends ICellLocalOpMetadata inside cell.ts and use that as the concrete metadata type at the three creation/consumption sites — interfaces.ts then doesn't need the core-utils/internal import. If pendingNode must remain on the exported type for a reason not visible in the diff, mark it optional and tag it @internal. The O(1) ACK and the future O(1) arbitrary-removal capability are preserved either way.

Comment thread packages/dds/cell/src/cell.ts Outdated
): ICellLocalOpMetadata {
const pendingMessageId = ++this.messageId;
this.pendingMessageIds.push(pendingMessageId);
const { first: pendingNode } = this.pendingMessageIds.push(pendingMessageId);
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deep Review: DoublyLinkedList.push(...items) returns a ListNodeRange spanning all newly appended nodes (packages/common/core-utils/src/list.ts:196-205). For a single-item push, range.first === range.last, so this is correct today — but push appends at the tail, so last reads correctly as "the node I just appended." If a future caller ever batches multiple ids in one push, first would silently grab the head of the inserted range instead of the most recently appended node.

Suggested change
const { first: pendingNode } = this.pendingMessageIds.push(pendingMessageId);
const { last: pendingNode } = this.pendingMessageIds.push(pendingMessageId);

…tests

Address deep-review findings on PR microsoft#27415:

- Stop leaking `ListNode<number>` onto `ICellLocalOpMetadata`. Move
  `pendingNode` to a private `ICellPendingLocalOpMetadata` subtype declared
  inside cell.ts. The public/internal interfaces module no longer imports
  from `@fluidframework/core-utils/internal`, decoupling it from a runtime
  implementation detail.
- Use `last` instead of `first` when destructuring the single-node range
  returned by `DoublyLinkedList.push(...)`. They are equivalent today
  (single-item push), but `first` would be incorrect if a future change
  batched multiple pending ids into one push call.
- Add two regression tests in cell.spec.ts:
  - drain-many-ACKs: queue several local sets and ACK them incrementally
    via processSomeMessages(1); asserts no assert-tag fires (0x471) and
    final value converges across clients in order.
  - rollback-with-multiple-pending-ops: enqueue three pending sets under
    TurnBased flush mode and run containerRuntime.rollback(), confirming
    the LIFO pop against the expected pending id does not throw.
@anthony-murphy
Copy link
Copy Markdown
Contributor Author

Deep Review

Reviewed commit 382f2d7 on 2026-05-27.

Readiness: 8/10 — ALMOST READY

The number[]DoublyLinkedList<number> swap is correct and preserves the FIFO-ACK / LIFO-rollback invariant from PR #12273. Both prior inline threads (the ListNode leak on the exported ICellLocalOpMetadata and the push().first vs .last choice) were addressed in d12ef64. Four small polish items remain — flagged inline — plus the PR description still says "no tests touched."

Path to Ready

  • Resolve inline threads
  • Update the PR description: drop "no tests touched", add a short bullet describing the two new Pending op bookkeeping tests in cell.spec.ts (drain-via-ACKs and LIFO-rollback), and note that pendingNode is intentionally unread until the squash follow-up.

Context for Reviewers

  • This is a data-structure swap around the invariant scarlettjlee enforced in PR Local Rollback for Cell #12273: every push paired with exactly one shift (on ACK) or pop (on rollback). The diff preserves it 1:1 (cell.ts:281-289 ACK; cell.ts:370-375 rollback).
  • First structural change to pendingMessageIds since Local Rollback for Cell #12273 introduced it in 2022. cell.ts is low-churn (~5 PRs over its lifetime); deserves close reading rather than pattern-matching to a trivial perf refactor. Historical authorities: scarlettjlee / clarenceli-msft (Local Rollback for Cell #12273, rollback contract); wes-carlson / dannimad (cell.ts.applyStashedOp implementation #10776, applyStashedOp path).
  • pendingNode is dead-on-read in this PR — neither the ACK path nor the rollback path consults cellOpMetadata.pendingNode; both still operate on the list head/tail directly. The field is prep for the squash follow-up where O(1) arbitrary-position removal becomes necessary. An inline thread proposes strengthening this in-PR via identity asserts.
  • The .last destructure at cell.ts:323-326 depends on DoublyLinkedList.push(...items) returning a range whose .last is the final element of items — verified against packages/common/core-utils/src/list.ts:38-47, 87-112, 196-205. Today first === last (single-id push). A core-utils owner is the right reviewer for whether that contract should be formally documented.
  • Attribution (per Add SharedCell Attribution #13560) still runs after the local-ACK branch's pending-id validation/removal. Worth confirming the new shift-after-head-check ordering still precedes the attribution update.
For human reviewer
  • Needs human judgment — whether pendingNode carries its weight in this PR or should land together with the squash PR that first reads it. scarlettjlee and clarenceli-msft are the historical authorities (PR Local Rollback for Cell #12273).
  • Needs human judgment — whether DoublyLinkedList.push(...items) should formally document that the returned range's .last is the final element of items. A core-utils owner is the right reviewer.
  • Cannot be assessed by the pipeline — whether a DDS-wide perf-regression harness exists or is planned where a burst-drain test for pendingMessageIds would be in-convention. If yes, characterizing the O(n)→O(1) ACK claim with a stress test may be worth doing here rather than deferring.
Review history (2 prior reviews)
  • d12ef64 2026-05-27 · 9/10 — both prior inline threads resolved; only stale PR description remained
  • e4f82d7 2026-05-27 · 7/10 — encapsulation + .last threads opened; rollback/drain test gap flagged

const { last: pendingNode } = this.pendingMessageIds.push(pendingMessageId);
const localMetadata: ICellPendingLocalOpMetadata<T> = {
pendingMessageId,
pendingNode,
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deep Review: PR #12273 wired applyStashedOp through createLocalOpMetadata, so stashed ops push onto pendingMessageIds and emit ICellPendingLocalOpMetadata carrying a pendingNode. The new Pending op bookkeeping describe covers direct cell.set(...) submits and LIFO rollback but doesn't exercise stash → rehydrate → ACK.

This is narrower than it might look: applyStashedOp isn't an independent producer — it delegates to set()/delete()submitCellMessage()createLocalOpMetadata(), the same path the new tests already cover, so the "different list instance" failure mode is ruled out by construction. The residual risk is a future regression in stash-rehydrate sequencing through processCore's ICellPendingLocalOpMetadata cast surfacing only in production.

One applyStashedOp round-trip test in the new Pending op bookkeeping describe (drive a stashed set/delete op, then ACK, assert no 0x471 and ordering preserved) closes that gap cheaply. If you prefer to defer to the squash follow-up that first reads pendingNode, note that in the PR description so the gap is intentional.

Is there an existing stash → rehydrate → ACK regression elsewhere in the DDS test corpus that exercises applyStashedOp end-to-end through the pending-id queue? If yes, this is fully addressed by reference.

cell.get(),
undefined,
"cell should be empty after rolling back all pending sets",
);
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deep Review: MockContainerRuntime.rollback is declared optional (public rollback?(): void in packages/runtime/test-runtime-utils/src/mocks.ts:449-454), so containerRuntime.rollback?.() short-circuits to undefined if the method is ever removed/renamed. assert.doesNotThrow then succeeds vacuously, and the test's whole purpose — exercising the new pop()?.data rollback path — is silently lost; the subsequent cell.get() === undefined check would fail for the wrong reason.

Guard the invocation so the test fails loudly if the mock loses the method.

Suggested change
);
assert(
typeof containerRuntime.rollback === "function",
"MockContainerRuntime.rollback must exist for this test",
);
assert.doesNotThrow(
() => containerRuntime.rollback!(),
"rollback should pop pending ids in LIFO order",
);

attribution?: AttributionKey;
}

/**
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deep Review: ICellPendingLocalOpMetadata is documented as "Kept private to this module," and its only producer is createLocalOpMetadata on SharedCell<T> (cell.ts:319-329), which threads a concrete T. The <T = any> default has no caller and forces the // eslint-disable-next-line @typescript-eslint/no-explicit-any directly above it. The pattern on SharedCell<T = any> / ISharedCell<T = any> / ICellLocalOpMetadata<T = any> exists because those are public-shaped types with external callers; this new private extension has none, and wes-carlson explicitly pushed back on any in this file during PR #10776.

Drop the = any default and the eslint suppression:

interface ICellPendingLocalOpMetadata<T> extends ICellLocalOpMetadata<T> {
	pendingNode: ListNode<number>;
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant