-
Notifications
You must be signed in to change notification settings - Fork 522
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Snapshot name collisions #4485
Snapshot name collisions #4485
Conversation
trees: { | ||
[protocolTreeName]: ISnapshotTree; | ||
[blobsTreeName]: ISnapshotTree; | ||
".dataStores": ISnapshotTree; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Open to suggestions for the names.
".channels" - refers to only data store trees for now, but we could store blobs, etc. that the DataStores
class can access.
".channels" - refers to only the channel trees for now, but we could store blobs, etc. that the DataStoreRuntime
can access. Was considering naming this something else like ".runtime" in case components wanted to store additional info here they could.
|
||
export const dataStoreAttributesBlobName = ".component"; | ||
|
||
export interface IRuntimeSnapshot { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking to more strongly type the other layers as well (and actually wrote out the types as seen in the first commit), but it requires more work across the boundaries with not much gain right now. Will look more into it after fixing SummarizerNode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think id (at that level) should be always null., no?
Based on latest discussion with SPO, I think we should start asserting in our layers that we provide either value (content but id === null) or reference (id !== null, but no other fields are provided) for trees, similar how we do for blobs - we either reuse existing one, or write out new one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding was that storage always returns ID + all tree contents (i.e. this is the full skeleton). I think it could make sense for them to sometimes return partial trees, but I don't know exactly how that would be specified right now.
If there's a better way to type it that you know of, I can change it, but I know just from debugging that currently it will be id !== null and other fields are provided so that type would be misleading or at least limiting (maybe intentionally?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I though this interface is for writing snapshots (summaries), not for reading.
Runtime never reads "snapshots", it operates in individual (shallow) trees that always have content and ID.
BTW, we should be try to eliminate "snapshot" word from our code base. It is used as in "snapshot tree but I think naming is from the past. I think they are always shallow trees (i.e. only contain one level of data), right?
■ @fluidframework/base-host: No change
⯅ @fluid-example/bundle-size-tests: +894 Bytes
Baseline commit: fa35947 |
I already violate it with mixinSummaryHandler() (and using it in Bohemia repo to write out /DataStoreId/_search/01 blobs |
Can we please call it channels? |
export type PropertyValues<T> = T[keyof T]; | ||
|
||
export const containerSnapshotFormatVersions = { | ||
missing: undefined, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you please add some comments on what these fields mean?
Also should be defined interface properly (with all of the comments that go to documentation) then derive it from this value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added comments.
Not sure what exactly you mean by interface.. If I add an interface, I need to list the versions/types twice I think?
I have changed to more simple union type.
The alternative I could think (without listing all versions twice) was the same as it was before but change current
to v1
and next
to v2
etc. so that we don't have to update all references every time.
const blobId = context.baseSnapshot?.blobs[blobName]; | ||
if (context.baseSnapshot && blobId) { | ||
return context.storage ? | ||
readAndParse<T>(context.storage, blobId) : |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious, should it be reverse? I.e. if we have it in snapshot, why do we read it from storage?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be an unrelated change I think, but it makes sense to me. I don't know as much about what it means when the blobs are stored directly in the snapshot though, maybe we prioritize the ones in storage for some reason? i.e. out of date or something?
@jatgarg would know I think.
let dataStoresSnapshot = context.baseSnapshot; | ||
let dataStoresSnapshotType: BaseSnapshotType = "legacy"; | ||
|
||
if (!!dataStoresSnapshot && metadata.snapshotFormatVersion !== containerSnapshotFormatVersions.missing) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit at a loss in terms of using "missing" & "next". What if we need to change format again?
Would approach of just saying that existing format is 1.0 (and that's default value if attributes are missing) and new format is 2.0 work? Then we can clearly articulate what is 1.0 format, what is 2.0 format, write documentation, and someone willing to write code on the side to parse our files can actually do it.
Maybe that's what I'm missing - formal description of format, i.e. what is the instruction to me as a side developer of how to read and write these formats, without looking in the code to understand formats and expectations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see - you essentially have 3 versions. I'd rather call them 1.0, 2.0 and 3.0 and make 1.0 default - a bit cleaner IMHO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense to me, I was trying to not diverge as far from the pre-existing "current" logic, but I agree it probably makes more sense not to use sliding version names when reading.
|
||
export const containerSnapshotFormatVersions = { | ||
missing: undefined, | ||
next: "0.1", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not see containerSnapshotFormatVersions.next used anywhere in he code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It isn't used, but we do check for undefined and treat that else scenario specially.
// However the feature of loading a detached container from snapshot, is added when the | ||
// snapshotFormatVersion is "0.1", so we don't expect it to be anything else. | ||
if (snapshotFormatVersion === currentSnapshotFormatVersion) { | ||
if (snapshotFormatVersion === dataStoreSnapshotFormatVersions.current |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit confused here. Existing files will take the else branch, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's intentional based on the comment:
// However the feature of loading a detached container from snapshot, is added when the
// snapshotFormatVersion is "0.1", so we don't expect it to be anything else.
} from "./dataStoreContext"; | ||
} from "./dataStoreContext"; | ||
|
||
export type BaseSnapshotType = "legacy" | "next"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe put some comment in here describing what legacy and next mean in practice?
Should be given them more descriptive names? While this is only runtime data, I image that we might have "next next" format some day :)
/** Tree to use to find children subtrees */ | ||
childrenTree: ISnapshotTree, | ||
/** Additional path part where children are isolated */ | ||
childrenPathPart: string | undefined, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I probably did not stair too long at the code to better understand it, but it is not clear to me from glancing what the actual meaning of childrenPathPart and how callers of API below use it. Maybe it's just me (and not having enough focus), but would it be helpful to have a bit more in comment on when it's undefined, and when it's not, what it means and how this data is expected to be used?
this needs tests, specifically for forward/back comapt |
why do we need this hear? seems like it could be defined in container-runtime Refers to: packages/runtime/runtime-definitions/src/summary.ts:184 in 597decc. [](commit_id = 597decc, deletion_comment = False) |
*/ | ||
export function parseSummaryForSubtrees(baseSummary: ISnapshotTree): ISubtreeInfo { | ||
// New versions of snapshots have child nodes isolated in .channels subtree | ||
const channelsSubtree = baseSummary.trees[channelsTreeName]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
channelsTreeName [](start = 46, length = 16)
this seem coupled to our specific container runtime. i feel like this should live in container runtime
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, but will need to revisit later to extract this functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't want to block this going in. I'd like BaseSnapshotType removed, but the file moves, and testing can come in a follow up
@arinwt, it would be great to address this area sooner than later. Do you need any help with moving this PR forward? |
@vladsud no sorry, I just want to add support for DDS's and address a few of Tony's comments first (version as number if possible and move interface definition). |
First step towards fixing #3542
Adds a subtree at each layer of the stack (except DDS for now) to isolate dynamic tree keys, such as data store IDs and channel IDs.
ITreeEntry
s from internal snapshotOld tree structure:
New tree structure:
With this change, ContainerRuntime only gives upper
.channels
tree to DataStores, restricting access.Each DataStoreContext only gives the lower
.channels
tree to its DataStoreRuntime.Changes in this PR are back/forwards compatible reading of new format only, it does not write the new snapshotFormatVersion. So after a few versions with this released, we can start writing in the new snapshotFormatVersion. This is because an old client runtime will not be able to understand the new format.
SummarizerNode handles this by calling
parseSummaryForSubtrees
which gives a tree to use for children nodes as well as the additional path part for the child nodes.