Proposal: Get rid of object attachment flow #9575

vladsud · 2022-03-24T14:22:23Z

Background

Objects (DDSs and DataStores) are created initially detached. They (a swarm of detached, but connected objects forming DAG) is tracked and attached together to attached container whenever single object's handle is stored in attached DDS.
There is a ton of code doing this logic and tracking states, and it's spread out across DDSs, Data stores, handles, etc.

Correctness issues with existing code

a. Our tracking logic only deals with addition of edges to graph, and does not track removal of edges. As result, when swarm of objects is attached, we may attach objects that are not reachable from the root and should not have been attached.
b. Users may use combination of handles & URIs to attach objects, but may remove handles after that, resulting in objects that will be removed by GC, even though clients may continue to access such objects.
c. If undo is invoked, object creation is not undone.
d. Attachment is not atomic (does not use orderSequentially), If there are cycles between objects, remote clients may find itself loading one object of this cycle and have reference to another object that is not yet available on this client
e. Detached container flow is broken: Adding handle to data store in detached container doesn't bind the data store #9127

Proposal

Proposal looks into actual changes of mechanics, as well as risks / side-effects:

Simplicity. The goal of the proposal is to simplify code base by removing a ton of code that deals with tracking of detached object swarms and their attachment. There is a lot of code involved here, and a lot of state tracking, logic is complicated and we continue to find issues with it.
Creation flow. IContainerRuntimeBase.createDetachedDataStore() flow allows range of options for channel (data store + DDSs) initialization. It's possible for a channel to be created without any initial state (and initialization to be done via ops), or it allows initialization to be fully done locally before channel is attached with full initial state. Later is used in DataObject flow. We are not going to change that.
- Need to inspect other ways to create objects and make sure they follow similar flow (of attach after init). Two APIs relying on _createFluidDataStoreContext() would need to be changed - both call realize() immediately, so they follow same pattern of initialization before attach, the flow needs to be changed slightly.
Performance considerations. It's possible that future scenarios will be heavy-intensive when it comes to initial state of DDSs. Current creation flow allows single op with full initial content of data store (and all of its DDSs) to be used for such creation, thus reducing overhead of sending a lot of ops (property by property updates). There are number of considerations in this space:
- a. DataObject (leveraging atomic create-initialize flow) will not generate ops through creation, unless a swarm of dependent objects is created at once. In case of Tablero, each cell (represented by DataObject) creation will have one (attach) op, which is equivalent to what we have today (different timing though, which is considered in Correctness section).
- b. In general, sending a lot of data through op stream is not advised. That does not play nice with virtualization (this data might not be needed by all clients in the moment), we have troubles with big ops, it hurts responsiveness. It's recommended to upload such data through blob APIs and reference in op stream
- c. DDSs could provide mechanisms to batch changes and send them in smaller number of ops. We could provide an equivalent to orderSequentially() mechanism at the level of DDS to group a bunch of changes into single update.
Correctness considerations. There is concern that attaching objects sooner to a file opens possibility for remote clients to access such objects sooner than they become visible in the graph via supported flows (i.e. handles, not request route). Please note that there are plenty of places where these issues already exists (see above).
- a. Once an object is fully initialized and attached (like it happens with DataObjects), they are fully functionable and can be loaded on remote client and perform their duties. There are no correctness issues here when it comes to creation of group of objects as long as they do not form cycles.
- b. Cycles is an issue, even though users control visibility of created swarm of objects (via controlling when their handles / aliases show up in a file). That's because summarizer can load such objects, and objects have ability to run custom code as part of a summary handler (and thus reach out to other objects that might not be available on remote machine yet).
  - I believe we do not have scenarios like that today, and this area will have a solution once unified channel are implemented (as all such objects can be created under one parent, and that parent's initialization is over, and attachment happens once full swarm is fully initialized). The other way to describe it - in order to create an object (data store or DDS), one needs to have access to a parent object, and we should move away from everybody having access to root (container runtime)

agarwal-navin · 2022-03-24T16:32:56Z

It's worth also mentioning that the existing state tracking and attach flow is not entirely correct and it somehow just works. This PR (#9548) attempts to fix a bunch of the issues and simplify the flow but there are still issues. For instance, the state tracking in handles cannot be made consistent with the underlying object and with other handles to the same object. It will fall apart if someone creates additional handles to data stores / DDS and starts using those instead of the ones FF creates - SharedObject has its own handle and data store handles are created by PureDataObject in aqueduct.

We have to fix these issues one way or another and the proposal in this work item seems like the right path forward.

curtisman · 2022-03-24T16:51:00Z

For 4. one additional consideration is that DataObject associated with the Datastore, can potentially have access to it after it is attached, even without the handle. We need to figure out how communicate that "initialization" is completed, and the DataObject can safely load and rematerialize on the remote client.

vladsud · 2022-03-25T03:10:44Z

Re # 2: PureDataObject.initializeInternal() is async, so orderSequentially() around creation will not work.
Looking deeper here, that's not the problem, as current process looks like this:

A data store context is created (IContainerRuntimeBase.createDetachedDataStore)
An async creation of FluidDataStore, DataObject is done
IFluidDataStoreContextDetached.attachRuntime() runtime is called to finish process of binding data store implementation with context.
- This should be used to "attach" data store to container.

Stepping back, it's basically two angles of concern:

Partially initialized data store invokes (or have to invoke) user code (like DataObject) in one of the workflows (like summarization).
- summarization is the only flow where it happens (due to search contract). And it's not a problem due to above mentioned init sequence.
Remote client learns about fully initialized data store through URI, not handle. I.e., existing design would result in violation (handle was not used - attachment did not happen, so URI will not point to anything), but proposed design will work.
- I touched a bit on this in # 4 (initial post). We can catch such cases, it's more work, but it's possible.
Remote client learns about partially initialized data store before it is fully initialized (through URI or handle being is stored in a file prior to data store fully initialized). Given that user of such data store fully controls that (it controls when it hands out handle / uri / id to the rest of the world), and the fact that we will continue to do atomic init (see start of this comment), I believe it's not a runtime issue. But it's worth looking at some interesting cases:
- Partially initialized data store (while it initializes) can reach out to other parts of the system and let them know the identity of such partially initialized data store (and thus reference can leak into a file before full initialization is done). This problem exists even today and does not change (with atomic creation).
- Cycles are possible (i.e., initializing object creates other objects that have reference back to partially initialized object). It's not the greatest pattern to support, but I believe there is no issue, as long as whole swarm does not violate previous two conditions (we can look at whole swarm as single object).

@curtisman, I think it's other way around (see # 1 above in this post).
DataObject even today has access to data store runtime before it is attached - it leverages that to create DDSs - no problem with that. Remote clients will always see fully initialized data store.

curtisman · 2022-03-25T15:42:47Z

I might be misunderstanding something here, but I thought you want things attached immediately when it is created. So the sequence of how data object are initialized you described above will need to change (i.e. instead of createDetachedDataStore as the first step, it will just be createDataStore. I believe your # 1 is the problem, where a remote client (the summarizer) will rematerialize a partially initialized data store and create a DataObject that doesn't expect things to be partially initialized.

vladsud · 2022-03-25T18:34:35Z

I'm practical :) I want to remove any tracking (binding) and logic around it - that's the main goal.
The implications on how that impacts other parts of the system can be different depending on goals, timing / staging.
Looking deeper into it, it's clear that data store has atomic creation, i.e. not changing anything will result in same flow with attach op that has full data store content (including all channels). That's fine with me. It does not remove other parts of the design I do not like (like having both sync & async summarization paths in runtime), but we can inspect that separately when the tide settles.

In my view, this is still "attached immediately", in a sense that all the binding code is gone, and the object is attached right away, before its visibility (handle or alias) changes.

I'm happy to look at the next level (of breaking it even further down), but we need to clearly spell out goals. I'd say the reason we will look into next round (if we do) is to simplify summarization paths in code and have just one path (async), if that's possible. That would require those considerations of creating objects (in storage) right away and using ops through creation flow - I think it's doable, but I'd rather not think about it now (and assess if that's the right direction), unless we believe the state after first phase (if it becomes permanent) is worse than what we have today.

curtisman · 2022-03-26T21:28:02Z

Originally, I understood the proposal as completely removing delay attach, which has implications to existing patterns that are in use, even though the requirements not explicit spelled out at the moment. That is why I provide above list of concerns. This is a fundamental change. While we want to take steps in the right direction, but I am not yet clear that this is the right direction that will address our long-term needs. That's why I am asking questions. I agree with you that we need to spell out the requirements for our scenarios and goals.

My interpretation of your comment above is that basically you still have "delay attach" for data stores and DDSes, but only within the duration of creation and initialization for the corresponding DataObject. So, that changes my mental model of your proposal to:

Instead of using "graph" based approach to trace references to detect all the objects that needs to be attached when it has the first "visible" reference, you are proposing to switching to a "channel tree" based approach, where the attach will be triggered at the end of DataObject (with pairs with a data store) creation and initialization that is under a "visible" channel, and allow a "tree" of data stores and DDSes to delay attach.*

I am more at ease of this solution because it is just reducing flexibility, instead of total removal of capability, the scope of consideration is smaller:

Creation: the question becomes: What initialization pattern do we continue to support? What pattern we will be stop supporting? Does that meet our goal (which we need state)?
- Your proposal as you describe in the previous comment will be able support single DataObject initialization
- Probably multiple directly dependent DataObject initialization (tree like dependency graph) by flattening the dependency tree without nested channels support (for now)
- Multiple mutually dependent DataObject initialization will be a gap, as we don't have nested channels yet to delay attach multiple data store. We should see if this will create problem with existing partners.
- Does it meet our future "encapsulation" need for all the scenarios, if we require a tree like grouping on DataObject? (Include being physically nested in memory and summary tree)
Performance: the question becomes whether the smaller window for detach initialization is sufficient for developer scenarios to fit all the work during that time.
Correctness: DataObject re-materialization of a partially initialized data store won't be a problem (as long as the initialization pattern would allow developer to fully initialize state for remote client with the restricted scope). The only question I have at the moment is that whether keeping the "delay attach" capability for DataObject would fix the woes with the broken request/handle patterns that our partner is using, (I don't have the detail say either way)

The caveat is that all these reasoning assumes nested channels will be here, but we don't have a plan or design for it yet.

Nevertheless, can you amend your proposal to put more detail and specifics of the change will be? It might be easier to talk about specifics and avoid misunderstanding.

vladsud · 2022-03-27T16:05:16Z

I've updated description.

I'd use this analogy in assessing correctness.
Interrupting any synchronous flow and doing rendering half-way through results in bugs and partial state on screen. Because of that, we have to ensure remote clients cannot schedule callbacks (including rendering) on a partial data model state from another client - thus tools like orderSequentially() and default batching at the level of JS turn, which ensure interruptions are symmetrical on local & remote clients.

But the moment there is any kind of asynchrony in the system, continuation / timer callbacks can be scheduled and inspect data model (starting with roots). Data model should be consistent (in some definition of consistency) to support correctness of these processes. Some objects might be in partially created state at that moment, but they should not be globally accessible (from roots).
It's same for local & remote clients (latter observe replication of state from local client).

Basically, I think there should be no unique affordances for remote clients.
The only exception to this rule is where "visibility" might have different interpretation. The only case I can think of - internal runtime processes like summarization, where partially created objects are "visible" to remote client processes (see further comment about cycles).

Coming back to you comment. I'd phrase it slightly differently. From runtime perspective, attachment happens immediately when data store runtime is attached to a context. Runtime does not specify how this flow is used and leaves it up to users to decide what part of initialization is done at channel (data store runtime) creation, and what part happens after object is attached. Correctness of the system is controlled by visibility of this object in the system (ability to reach from roots) and is same for remote & local clients.
Aqueduct will continue to support existing pattern of initialization as part of creation.

I believe the only substantial / meaningful change in behavior is creation of swarm of objects with cycles. While it has some issues in current implementation (as described in description), they are minor and easy to fix. With new flow, if such objects depend on each other when performing runtime activities (like summarization, i.e., not directly invoked by developer actions), then we will run into issues. I believe current scenarios do not have such dependencies (as current flow does not address correctness in this flow either). Channel unification work should address such scenarios (a bit more on that in description).

Yes, I believe it solves #9127, there should be no difference between attached or detached container (other than existing notification telling DDSs when to start sending ops).

I'll point out that it's possible that perf for some future scenarios will be worse with the proposal. I'd prefer us collectively move away from init props to objects exposing proper object model and users using it to initialize object after they are created. In current model clients have more control when object is attached. I'm not sure this is runtime problem to solve though. I.e. we expose detached data store runtime creation (that we use today, like in DataObject), users can leverage that to build alternative solutions that manually control attachment process. I'd personally not go that route (at least not for complex interactions involving many objects created at once), and rely more on future unification of channels, that would allow all objects under parent channel to be attached in one go (similar to cycles solution).

markfields · 2022-04-04T20:11:59Z

Is there an issue open where I can learn more about the "future unification of channels" which is mentioned several times in this discussion?

vladsud · 2022-04-04T20:43:58Z

@markfields, I found #3469, but it's very sparse :(

ghost · 2022-12-18T03:02:30Z

This issue has been automatically marked as stale because it has had no activity for 180 days. It will be closed if no further activity occurs within 8 days of this comment. Thank you for your contributions to Fluid Framework!

vladsud self-assigned this Mar 24, 2022

vladsud mentioned this issue May 20, 2022

Make it easier for hosts to learn when locally saved container state is invalidated by user changes #10127

Closed

curtisman added the ado Issue is tracked in the internal Azure DevOps backlog label Jun 20, 2022

ghost added the status: stale label Dec 18, 2022

ghost closed this as completed Dec 26, 2022

This issue was closed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Proposal: Get rid of object attachment flow #9575

Proposal: Get rid of object attachment flow #9575

vladsud commented Mar 24, 2022 •

edited

agarwal-navin commented Mar 24, 2022

curtisman commented Mar 24, 2022 •

edited

vladsud commented Mar 25, 2022

curtisman commented Mar 25, 2022 •

edited

vladsud commented Mar 25, 2022 •

edited

curtisman commented Mar 26, 2022 •

edited

vladsud commented Mar 27, 2022

markfields commented Apr 4, 2022

vladsud commented Apr 4, 2022

ghost commented Dec 18, 2022

Proposal: Get rid of object attachment flow #9575

Proposal: Get rid of object attachment flow #9575

Comments

vladsud commented Mar 24, 2022 • edited

Background

Proposal

agarwal-navin commented Mar 24, 2022

curtisman commented Mar 24, 2022 • edited

vladsud commented Mar 25, 2022

curtisman commented Mar 25, 2022 • edited

vladsud commented Mar 25, 2022 • edited

curtisman commented Mar 26, 2022 • edited

vladsud commented Mar 27, 2022

markfields commented Apr 4, 2022

vladsud commented Apr 4, 2022

ghost commented Dec 18, 2022

vladsud commented Mar 24, 2022 •

edited

curtisman commented Mar 24, 2022 •

edited

curtisman commented Mar 25, 2022 •

edited

vladsud commented Mar 25, 2022 •

edited

curtisman commented Mar 26, 2022 •

edited