Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improvements in pending state transitions related to presence of clientId #20768

Closed
wants to merge 3 commits into from

Conversation

vladsud
Copy link
Contributor

@vladsud vladsud commented Apr 21, 2024

I might be wrong with the direction of this PR - it's based on my understanding / learning of how code works, based on tracing code and running through tests. If I'm wrong, consider that my learning exercise :)

I started with making one line change in serializedStateManager.ts, in order to bring more consistency (as part of #20767) to states we present to the user, including through reloads.

But I run into a bunch of problems that do look like bugs, or at lease very hard to follow implicit dependencies that are not documented anywhere.

  1. Dropping clientId when there is no pending state makes sense (as being correct), but nothing tells that is has to happen, and I do not see any explicit dependency in runtime that it relies on it.
  2. If container runtime can't rely on it, then it has to preserve this.savedOps always, as these will be counted as local ops and things will explode if pendingStateManger can't find them.
  3. The check in ContainerRuntime.getPendingLocalState() looks fishy (and wrong with # 2 change above). It should not depend on details of pending state manager, and rather rely on if it has any pending state to offer.
    • Overall, I do not see any benefits or improvements from not returning "empty" pending state (i.e. undefined, as opposed to defined but empty). It helps remove another assumption / implicit dependency (logic in Container.ts returning {} when in fact it does not know anything about type of that field.
  4. I changed logic in applyStashedOpsAt() to make it on-par with previous logic (as it would otherwise fail "snapshot was refreshed" test after all above changes), but I believe the current logic has serious bug. I do not see any communication between container runtime and loader layer communicating that potential snapshot is safe to use. I.e. that its sequence number is not higher than lowest reference sequence number across pending ops. If it refreshes too far, and there is no reconnect to socket (that forces op rebasing), then next serialization could produce broken state as it will make it impossible to apply stashed ops.
  5. Mostly unrelated to all above: I do not see us propagating savedOp metadata property from grouped ops (as visible in Container) to ungrouped ops. OpGroupingManager.ungroupOp() does not do it as far as I can see. I think we are getting lucky with ID compression ops as they are usually not grouped (there is usually a single ID compression op, as it's part of its own batch). When/if any of those assumptions change, this code would stop working.
    • Propagate properly this property.

@github-actions github-actions bot added base: main PRs targeted against main branch area: loader Loader related issues area: runtime Runtime related issues labels Apr 21, 2024
@msfluid-bot
Copy link
Collaborator

@fluid-example/bundle-size-tests: +435 Bytes
Metric NameBaseline SizeCompare SizeSize Diff
aqueduct.js 452.76 KB 452.89 KB +132 Bytes
azureClient.js 549.21 KB 549.31 KB +101 Bytes
connectionState.js 680 Bytes 680 Bytes No change
containerRuntime.js 256.01 KB 256.14 KB +132 Bytes
fluidFramework.js 339.57 KB 339.57 KB No change
loader.js 133.01 KB 132.98 KB -31 Bytes
map.js 41.42 KB 41.42 KB No change
matrix.js 143.41 KB 143.41 KB No change
odspClient.js 517.09 KB 517.19 KB +101 Bytes
odspDriver.js 97.06 KB 97.06 KB No change
odspPrefetchSnapshot.js 41.93 KB 41.93 KB No change
sharedString.js 161.19 KB 161.19 KB No change
sharedTree.js 339.57 KB 339.57 KB No change
Total Size 3.15 MB 3.15 MB +435 Bytes

Baseline commit: fb1c1da

Generated by 🚫 dangerJS against edbffb4

@vladsud vladsud changed the title Improvements in pending state transitions related to present of clientId Improvements in pending state transitions related to presence of clientId Apr 23, 2024
@vladsud
Copy link
Contributor Author

vladsud commented Apr 24, 2024

FYI - Looks like this PR conflicts with #20427, and possibly some of the issues raised in this PR are addressed by it.

@dannimad
Copy link
Contributor

Hi Vlad. I think most of your concerns were solved in the PR you mentioned above. Once we stopped dropping the clientID, all sort of bugs and fishy behavior surfaced, which were solved in that PR. Let me know if you still have concerns about points 1 through 4, since 5 most likely hasn't been fixed.

@vladsud
Copy link
Contributor Author

vladsud commented Apr 25, 2024

Forked savedOp changed into #20837.

vladsud added a commit that referenced this pull request Apr 25, 2024
… using "read" connection. (#20817)

First step to enable Audience to have same behavior for "read" and "write" clients
- "connected" event raised when "self" is present in Audience.

Next step:
- client is caught up with latest ops.

For deeper description of work in this space, please see first PR below.

This is continuation of this work:
- #20703
- #20752
- #20704
- #20745
- #20766

And active PRs (do not block this PR, but make whole story more consistent for serialized / rehydrated containers):
- #20768
- #20767

Changeset added in previous PR in the same version (part of it should have been included in this PR, but made it earlier by accident):
[.changeset/cruel-trees-dance.md](https://github.com/microsoft/FluidFramework/pull/20215/files#diff-8f706b3e769f698985bea8fbf9bb3fb0a0a8798de53ad3f51cbace1dd3d1d2b2)
@vladsud
Copy link
Contributor Author

vladsud commented Apr 25, 2024

Closing in favor of

Note - I made a mistake on Line 206 in pendingStatemanager.ts in this PR. Comparison should have been (for "initial ops") based on sequence number! Same issue as in first PR - thinking in sequence numbers is hard!

@vladsud vladsud closed this Apr 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: loader Loader related issues area: runtime Runtime related issues base: main PRs targeted against main branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants