Skip to content

perf(tree): binary-search anchor removeChild + O(1) revision lookup in editManager#27428

Draft
anthony-murphy wants to merge 5 commits into
microsoft:mainfrom
anthony-murphy:prep-tree
Draft

perf(tree): binary-search anchor removeChild + O(1) revision lookup in editManager#27428
anthony-murphy wants to merge 5 commits into
microsoft:mainfrom
anthony-murphy:prep-tree

Conversation

@anthony-murphy
Copy link
Copy Markdown
Contributor

Description

Bundles two tree structural changes:

  1. Binary-search by parentIndex in removeChild (anchorSet.ts and sparseTree.ts) — replaces the linear indexOf with binary search by child.parentIndex(). The field arrays are already kept sorted by parentIndex (the existing binaryFind helpers on the same structs rely on it). The existing TODOs at both sites are removed.

  2. Map<RevisionTag, number> side-map on SharedBranch (editManager.ts + sharedTreeCore.ts) — so sharedTreeCore.ts:getEnrichedCommit's localCommits.findIndex(c => c.revision === revision) becomes O(1). The localCommits array and getLocalCommits() shape are unchanged. Maintenance uses a head-offset counter on shift so other map entries don't need to be touched (avoids O(m) decrement-all per ACK).

A local binaryFindIndex helper is added in each removeChild file. The existing benchmarked binaryFind is left untouched — it returns the element (not the index) and has a tuned dense-array fast-path that doesn't apply to the index-by-parentIndex case.

Why

  • The TODO at both removeChild sites called this out: "should do more optimized search (ex: binary search or better) using child.parentIndex()." This is invoked from anchor lifecycle (considerDispose) — every anchor free in workloads with many siblings paid the scan.
  • Burst submit / resubmit storms accumulate large local-commit batches. findIndex per resubmit was O(n) → O(n²) per full reSubmit. The side-map keeps the lookup O(1) without amortizing the shift cost over every other entry.

Notes

getLocalCommits() return shape unchanged (readonly GraphCommit[]). The sort-by-parentIndex invariant on anchor fields is already maintained by every field mutator (getOrCreateChild sorts after push; coupleNodes / decoupleNodes / offsetChildren preserve order). Pure perf prep — no squash logic, no tests touched, no api-report changes.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 27, 2026

Hi! Thank you for opening this PR. Want me to review it?

Based on the diff (233 lines, 6 files), I've queued these reviewers:

  • Correctness — logic errors, race conditions, lifecycle issues
  • Security — vulnerabilities, secret exposure, injection
  • API Compatibility — breaking changes, release tags, type design
  • Performance — algorithmic regressions, memory leaks
  • Testing — coverage gaps, hollow tests

How this works

  • Adjust the reviewer set by ticking/unticking boxes above. Reviewer toggles alone don't trigger anything.

  • Tick Start review below to dispatch the review fleet.

  • After review finishes, tick Start review again to request another run — it auto-resets after each dispatch.

  • This comment updates as new commits land; your reviewer selections are preserved.

  • Start review

}
}
return -1;
}
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deep Review: binaryFindIndex here is a textbook [min, max) search with no fast-path. The sibling binaryFind immediately above (anchorSet.ts:1284-1310) opens with const guess = sorted[index]; if (guess?.parentIndex === index) return guess;, and its privateRemarks (anchorSet.ts:1273-1283) explicitly warns "current usages tends to fully populate the anchor tree leading the correct array index to be the requested parent index … care and benchmarking should be used when messing with this function."

The PR description says the fast-path "doesn't apply to the index-by-parentIndex case", but when the field is dense sorted[index].parentIndex === index, so the array index of the hit equals index — one read instead of log n iterations. removeChild runs from considerDispose on every anchor free, which is exactly the path CraigMacomber tuned in #22717 ("I started with a more generic binary search which returned the index, but this was so dominated by small data sizes (0 and 1 are the common array sizes here) that it had a lot of overhead…", 2024-10-03).

Suggest adding the guess fast-path before the binary loop and updating binaryFind's privateRemarks to also cover the new index variant. Re-run the BubbleBench-style anchor benchmarks #22717 used to confirm parity.

Suggested change
}
function binaryFindIndex(sorted: readonly PathNode[], index: number): number {
const guess = sorted[index];
if (guess !== undefined && guess.parentIndex === index) {
return index;
}
let min = 0;
let max = sorted.length;
while (min < max) {
const mid = (min + max) >>> 1;
const current = sorted[mid].parentIndex;
if (current === index) {
return mid;
} else if (current < index) {
min = mid + 1;
} else {
max = mid;
}
}
return -1;
}

}
}
return -1;
}
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deep Review: Same missing fast-path as the anchorSet.ts copy. When the field is dense sorted[index].parentIndex === index, so the answer is index — one read instead of log n iterations. The sibling binaryFind (and binaryFind in anchorSet.ts) opens with exactly this guess to avoid the loop on the common dense-array shape that #22717's benchmarks were tuned against. Suggest mirroring the fast-path here and re-running the relevant benchmark to confirm parity.

Suggested change
}
function binaryFindIndex<TData>(sorted: readonly SparseNode<TData>[], index: number): number {
const guess = sorted[index];
if (guess !== undefined && guess.parentIndex === index) {
return index;
}
let min = 0;
let max = sorted.length;
while (min < max) {
const mid = (min + max) >>> 1;
const current = sorted[mid].parentIndex;
if (current === index) {
return mid;
} else if (current < index) {
min = mid + 1;
} else {
max = mid;
}
}
return -1;
}

const childIndex = field === undefined ? -1 : binaryFindIndex(field, child.parentIndex);
assert(childIndex !== -1, 0x35c /* child must be parented to be removed */);
field?.splice(childIndex, 1);
if (field?.length === 0) {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deep Review: The previous field?.indexOf(child) implicitly cross-checked object identity (it returns -1 if the exact child reference is absent). The replacement binaryFindIndex(field, child.parentIndex) returns a hit whenever some node with that parentIndex exists — if the sort/uniqueness invariant ever drifted (duplicate parentIndex, stale child reference), the PR would silently splice the wrong node.

Extend the assert to also verify identity. Zero runtime cost in non-debug builds; restores the implicit invariant indexOf provided.

Suggested change
if (field?.length === 0) {
assert(childIndex !== -1 && field![childIndex] === child, 0x35c /* child must be in the field */);

// Note that this is the index in the list of child paths, not the index within the field.
const childIndex = field === undefined ? -1 : binaryFindIndex(field, child.parentIndex);
assert(childIndex !== -1, 0x4a5 /* child must be parented to be removed */);
field?.splice(childIndex, 1);
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deep Review: Same identity-loss concern as the anchorSet.ts assert. The prior field?.indexOf(child) returned -1 unless the exact child reference was present; binaryFindIndex(field, child.parentIndex) matches any node with that parentIndex. If the sorted-by-parentIndex invariant ever drifted (duplicate, stale reference), this would splice the wrong node silently.

Suggested change
field?.splice(childIndex, 1);
assert(childIndex !== -1 && field![childIndex] === child, 0x4a5 /* child must be in the field */);

public getLocalCommitIndexByRevision(
branchId: BranchId,
revision: RevisionTag,
): number | undefined {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deep Review: getLocalCommitIndexByRevision(branchId, revision): number | undefined leaks the storage shape — callers need to know the result is an index into localCommits and must reason about the new headOffset index space the side-map maintains. The only call site (sharedTreeCore.ts:558-566) immediately does localCommits.slice(revisionIndex), so the index is a means to a suffix slice.

Both solution-space reviewers independently designed a getLocalCommitsFrom(branchId, revision): readonly GraphCommit<TChangeset>[] returning the suffix slice (or undefined/empty). That matches the symmetry of the existing getLocalCommits(branchId) accessor at editManager.ts:422-425, removes the undefined-vs--1 sentinel split the PR's own "Risks and tradeoffs" calls out, and keeps EditManager's storage substitutable.

The fair counter is that getLocalCommits() already exposes the array shape — but it does not expose headOffset, which this accessor effectively does. If you'd rather keep the index API for callers we haven't seen yet, a one-line // returns an index into getLocalCommits(branchId); undefined if not present comment here would be enough to ward off the next reader.

* @param index - parentIndex being looked for.
* @returns array index of the child with the requested parentIndex, or -1 if not found.
*/
function binaryFindIndex(sorted: readonly PathNode[], index: number): number {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deep Review: This binaryFindIndex and the copy in sparseTree.ts:150-172 are byte-identical apart from the element type parameter (PathNode vs SparseNode<TData>), and both rely only on .parentIndex — a property on both via UpPath. Unlike binaryFind, this helper has no tuned fast-path or benchmark surface, so the "don't extract benchmarked code" rationale in the PR description does not apply here. Drift risk is real: a fast-path or bug-fix added to one copy (see the adjacent thread asking for the dense-array guess) will silently miss the other.

Suggest extracting function binaryFindIndexByParentIndex<T extends { parentIndex: number }>(sorted: readonly T[], index: number): number to a shared module alongside UpPath (or core/tree/) and calling it from both removeChild sites. If the fast-path lands, the shared helper carries it once — one code change, two beneficiaries.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@anthony-murphy
Copy link
Copy Markdown
Contributor Author

Deep Review

Reviewed commit 87be3cb on 2026-05-27.

Readiness: 7/10 — ALMOST READY

Two contained perf-prep changes — binary-search removeChild and an O(1) revision→index side-map with head-offset — both correct and well-aligned with #22717, #19272, #20270. No correctness blockers. Polish accumulates on a 127-line diff: the most material items are a missing dense-array guess fast-path on the new binaryFindIndex (which #22717 specifically tuned for the adjacent binaryFind) and an un-preserved reference-identity check in removeChild's assert.

Path to Ready

  • Resolve inline threads
  • Migrate SparseNode.getOrCreateChild (sparseTree.ts:50-109) to lower-bound splice insertion and convert the descendant-walk find (sparseTree.ts:173-184) to binaryFindIndex — or file a tracked follow-up issue and drop the orphaned TODO comment
  • Acknowledge in the PR description that localCommits.shift() (editManager.ts:957-973) retains a symmetric O(n) per-ack cost so the perf claim is honestly scoped, and file a follow-up issue for the logical-head/compaction refactor (the existing localCommitsHeadOffset becomes the logical head)

Context for Reviewers

For human reviewer
Review history (2 prior reviews)
  • 20bb41a 2026-05-27 · 7/10 — same two perf changes; dense-array fast-path and identity-assert flagged
  • 9b64f11 2026-05-27 · 7/10 — same two perf changes; five Tier 3 follow-ups flagged

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant