fix(tree): Optionally use UUIDs for non-finalized identifiers in tree attach summaries#27278
fix(tree): Optionally use UUIDs for non-finalized identifiers in tree attach summaries#27278Abe27342 wants to merge 2 commits into
Conversation
…summaries When a SharedTree is attached to an already-attached container, tree identifier values generated locally before the IdCompressor has been allocated a cluster were serialized as negative op-space IDs. Other clients could decode the attach op because it carries originatorSessionId, but once the summarizer reuses the blob as a handle in a later summary with no intervening ops to finalize, the persisted IdCompressor state (serialized with withSession=false) has no record of those local IDs and new clients fail to load. Thread an `idsMustBeFinalized` flag from ForestSummarizer (set when incrementalSummaryContext === undefined) through FieldBatchEncodingContext into the schema-based encoder. When the flag is set and an identifier would normalize to a negative op-space ID, emit the stable UUID instead. The identifier-field decode path already accepts either a number or a string, so no format-version bump is required. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Hi! Thank you for opening this PR. Want me to review it? Based on the diff (323 lines, 20 files), I've queued these reviewers:
How this works
|
|
|
There was a problem hiding this comment.
Pull request overview
This PR introduces an idsMustBeFinalized encoding knob that forces attach/first-summary paths to serialize non-finalized (negative) compressed IDs as stable UUID strings, preventing load-time failures when session-local IdCompressor state is unavailable.
Changes:
- Add
idsMustBeFinalizedto the change/summary encoding contexts and thread it through edit manager, forest summary, and chunked-forest encoding pipelines. - Update
RevisionTagCodecand identifier-field encoding to emit stable UUIDs instead of negative op-space numbers whenidsMustBeFinalizedis enabled. - Update and add tests, including a regression repro covering identifier encoding under attach-summary conditions.
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| packages/dds/tree/src/test/rebase/revisionTagCodec.spec.ts | Update tests for new RevisionTagCodec.encode(tag, context) signature. |
| packages/dds/tree/src/test/feature-libraries/sequence-field/sequenceFieldCodecs.test.ts | Pass encoding context through revision tag encoding in sequence-field codec tests. |
| packages/dds/tree/src/test/feature-libraries/chunked-forest/codec/nodeEncoder.spec.ts | Add regression coverage for identifier encoding with idsMustBeFinalized. |
| packages/dds/tree/src/shared-tree/treeCheckout.ts | Provide encoding context when encoding revision tags for consistency comparisons. |
| packages/dds/tree/src/shared-tree-core/editManagerSummarizer.ts | Enable idsMustBeFinalized when incrementalSummaryContext is absent. |
| packages/dds/tree/src/shared-tree-core/editManagerCodecsVSharedBranches.ts | Thread encoding context into shared-branches edit manager summary encoding. |
| packages/dds/tree/src/shared-tree-core/editManagerCodecsV1toV4.ts | Extend summary encoding context type with idsMustBeFinalized. |
| packages/dds/tree/src/shared-tree-core/editManagerCodecsCommons.ts | Propagate idsMustBeFinalized into revision tag encoding throughout shared-branch summaries. |
| packages/dds/tree/src/shared-tree-core/editManagerCodecs.ts | Extend summary encoding context type with idsMustBeFinalized. |
| packages/dds/tree/src/feature-libraries/modular-schema/modularChangeCodecV1.ts | Propagate idsMustBeFinalized into detached-node encoding contexts. |
| packages/dds/tree/src/feature-libraries/forest-summary/forestSummarizer.ts | Set idsMustBeFinalized for forest encoding when incrementalSummaryContext is absent. |
| packages/dds/tree/src/feature-libraries/chunked-forest/codec/schemaBasedEncode.ts | Add idsMustBeFinalized plumbed parameter into schema-based encoding entrypoints. |
| packages/dds/tree/src/feature-libraries/chunked-forest/codec/nodeEncoder.ts | Emit stable UUID for identifier fields when idsMustBeFinalized and op-space would be negative. |
| packages/dds/tree/src/feature-libraries/chunked-forest/codec/compressedEncode.ts | Add idsMustBeFinalized to EncoderContext. |
| packages/dds/tree/src/feature-libraries/chunked-forest/codec/codecs.ts | Add idsMustBeFinalized to FieldBatchEncodingContext and pass to encoders. |
| packages/dds/tree/src/core/tree/detachedFieldIndexCodecV2.ts | Update revision tag encode/decode to handle stable UUIDs explicitly and refine type checks. |
| packages/dds/tree/src/core/tree/detachedFieldIndexCodecV1.ts | Update revision tag encode/decode callsites for new signature and refine type checks. |
| packages/dds/tree/src/core/rebase/types.ts | Broaden EncodedRevisionTag/schema to include string stable-UUID fallback and document behavior. |
| packages/dds/tree/src/core/rebase/revisionTagCodec.ts | Add stable-UUID fallback encode/decode behavior guarded by idsMustBeFinalized. |
| packages/dds/tree/src/core/change-family/changeFamily.ts | Document and add idsMustBeFinalized to ChangeEncodingContext. |
| // vSharedBranches stores the summary's originator at the top level, | ||
| // and the decode path threads that originator down to each child codec. | ||
| // Stable-UUID fallbacks are therefore unnecessary here even when this is | ||
| // an attach summary, so suppress `idsMustBeFinalized` for the inner codec | ||
| // calls regardless of what the caller passed. | ||
| const innerContext = { ...context, idsMustBeFinalized: false }; | ||
| const mainBranch = encodeSharedBranch( | ||
| changeCodec, | ||
| revisionTagCodec, | ||
| data.main, | ||
| context, | ||
| innerContext, | ||
| data.originator, |
| export type EncodedRevisionTag = Brand<OpSpaceCompressedId, "EncodedRevisionTag"> | string; | ||
| export const RevisionTagSchema = Type.Union([ | ||
| Type.Literal("root"), | ||
| brandedNumberType<Exclude<EncodedRevisionTag, string>>(), | ||
| Type.String(), | ||
| brandedNumberType<Brand<OpSpaceCompressedId, "EncodedRevisionTag">>(), | ||
| ]); |
| // An undefined `incrementalSummaryContext` indicates this is an attach summary (or a GC | ||
| // pass). Attach summary blobs can be reused as handles in later summaries with no | ||
| // intervening ops to finalize local IDs, after which the originating session's local | ||
| // ID state is no longer available to readers — so any non-finalized identifier must | ||
| // be encoded as its stable UUID instead. |
| // An undefined `incrementalSummaryContext` indicates this is an attach summary (or a GC | ||
| // pass). Attach summary blobs can be reused as handles in later summaries with no | ||
| // intervening ops to finalize local IDs, after which the originating session's local | ||
| // ID state is no longer available to readers — so any compressed ID we persist must be | ||
| // in a form (stable UUID) that does not depend on session state. |
When a SharedTree is attached to an already-attached container, tree identifier values generated locally before the IdCompressor has been allocated a cluster can sometimes be serialized as negative op-space IDs. This is a bug which will be fixed in a separate PR (near-term, by using the long id in these cases similarly to what we do in some other codec paths).
When other clients attempt to load SharedTrees with such summaries, they currently crash as the decode path attempts to use the local client's session ID as the originating ID, which is incorrect. In the general case, the information about which client generated the ID has been lost at this point.
This PR adds a feature flag to
configuredSharedTreewhich can potentially be enabled (with some risk, see notes) to allow documents that have reached this state to continue to operate anyway. It does so by generating a v5 uuid based on the SharedObject ID + op-space unfinalized ID. This provides similar collision-resistant properties as the original intent of the ID provided unfinalized ids are only currently written at attach summary generation time (the known bug). Depending on how the application used the ID, this may break stability guarantees (e.g. if application assumed the id would remain stable and wrote it to an external store or a different data store within Fluid, but did so using the previous long form). Thus this flag should only be enabled with those risks in mind.