Skip to content

Switch default connection mode to "read".#7393

Closed
vladsud wants to merge 6 commits intomicrosoft:mainfrom
vladsud:DefaultConnectionMode
Closed

Switch default connection mode to "read".#7393
vladsud wants to merge 6 commits intomicrosoft:mainfrom
vladsud:DefaultConnectionMode

Conversation

@vladsud
Copy link
Copy Markdown
Contributor

@vladsud vladsud commented Sep 7, 2021

This is big contributor to issue #7312 (at least in stress tests)

@github-actions github-actions Bot added the area: loader Loader related issues label Sep 7, 2021
@vladsud
Copy link
Copy Markdown
Contributor Author

vladsud commented Sep 7, 2021

@DLehenbauer - FYI, another set of property-DDS failures that do not really look related to the change, plus I can't repro locally :(
I'll look a bit closure into it, as it's possible I can find relationship between change and failing tests, but inability to repro locally makes it really hard to diagnose these issues.
Note - this change is extracted from PR #7387 that also hit it, so it's not 1-off failure.

@andre4i
Copy link
Copy Markdown
Contributor

andre4i commented Sep 7, 2021

I also ran into those errors back when I was looking at #6033 (comment)

@vladsud
Copy link
Copy Markdown
Contributor Author

vladsud commented Sep 7, 2021

@andre4i, I'm not sure it's directly related. In a sense that I do not think property-DDS test fail because they rely on old flow of creation files on socket connection - I believe you killed that flow completely (leaving only snapshot tests rely on opening files without snapshots). Here it feels it's something different, though I have not had time to drill into it (and it's hard as there is no local repro for me).

@andre4i
Copy link
Copy Markdown
Contributor

andre4i commented Sep 7, 2021

No, I mean in one of my experiments, I tried to do exactly what's in this change (based on the comment in the original issue) along with my changes related to the 'existing' field and ran into these failures also. This is just to put another timestamp in the past for these failures occurring.

vladsud added a commit that referenced this pull request Oct 8, 2021
…DDS tests (#7784)

Slight changes in reconnect logic (see PRs #7753, #7393) result in failures in PropertyDDS UTs.

Looking a bit deeper, I can't easily follow intentions in OpProcessingController.
For example, OpProcessingController.process(dm1) will leave all but dm1 paused. UTs do rely on that (it's pretty clearly from some of them), in other places it feels like it's unintended result. And I do not think it correctly waits for all pending activity to be flushed (as observed in above mentioned PRs).

Given that it's deprecated and repo uses LoaderContainerTracker (explicitly or implicitly through), it's time to do the move.

Some random tests fail with this change, but are addressed by increasing timeout.
We saw this problem before with these tests - see pending #7350
@vladsud
Copy link
Copy Markdown
Contributor Author

vladsud commented Oct 9, 2021

@DLehenbauer, @evaliyev, @karlbom, reaching out to you as you seems to be main contributors to PropertyDDS tests in question.
I believe this PR exposes some eventual consistency issues in PropertyDDS.
It's easiest to explain it with "inserting properties into both trees" that fails.

Before this PR, 2 containers involved here were connected as "write" connection (when test starts), and thus they would both send concurrently ops as result of insertProperties() calls.
Ops from both clients would have same base (as they were sent when neither container knew anything about ops of other container).

With this PR, second container is now connected as "read" connection initially.
That causes it to reconnect (upgrade connection to "write) as result of generating ops, which is async operation.
Because of that, it becomes behind, and needs to ingest all recent ops in the document BEFORE it can send its own local changes.
As result, it has to rebase its pending changes (while injecting ops from first client).

That's where we hit this error (it's not in the logs - something worth looking into to fix, as this error closes container and causes downstream failures that are visible in the logs):

PR-036: ArrayProperty: insert range - Start offset is invalid: 4
at ArrayProperty._insertRangeWithoutDirtying (E:\Git\FluidFramework3\experimental\PropertyDDS\packages\property-properties\dist\properties\arrayProperty.js:593:19)
at ArrayProperty._applyChangeset (E:\Git\FluidFramework3\experimental\PropertyDDS\packages\property-properties\dist\properties\arrayProperty.js:917:30)
at NodeProperty._applyChangeset (E:\Git\FluidFramework3\experimental\PropertyDDS\packages\property-properties\dist\properties\indexedCollectionBaseProperty.js:600:56)
at NodeProperty.applyChangeSet (E:\Git\FluidFramework3\experimental\PropertyDDS\packages\property-properties\dist\properties\baseProperty.js:322:14)
at SharedPropertyTree._applyRemoteChangeSet (e:\Git\FluidFramework3\experimental\PropertyDDS\packages\property-dds\dist\propertyTree.js:438:24)
at SharedPropertyTree.processCore (e:\Git\FluidFramework3\experimental\PropertyDDS\packages\property-dds\dist\propertyTree.js:195:26)
at SharedPropertyTree.process (E:\Git\FluidFramework3\packages\dds\shared-object-base\dist\sharedObject.js:323:14)

This basically means that propretyDDS breaks eventual consistency guarantee.

It's easy to demonstrate that my PR does not change order of operations sent on the wire (though even if it did, it should not matter).
One can insert await opProcessingController.processOutgoing(container1); after first two inserts and it would not change anything - test would continue to succeed without my change and would continue to fail with my change.

I'd love to get your help here in investigating and addressing this issue.

@ghost
Copy link
Copy Markdown

ghost commented Apr 14, 2022

This issue has been automatically marked as stale because it has had no activity for 180 days. It will be closed if no further activity occurs within 8 days of this comment. Thank you for your contributions to Fluid Framework!

@ghost ghost closed this Apr 22, 2022
@vladsud vladsud deleted the DefaultConnectionMode branch February 8, 2024 16:15
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants