Skip to content

Simplified model for Audience; Leverage join signal's referenceSequenceNumber for catch up logic#12164

Merged
vladsud merged 5 commits intomicrosoft:mainfrom
vladsud:JoinSignals_final
Nov 13, 2022
Merged

Simplified model for Audience; Leverage join signal's referenceSequenceNumber for catch up logic#12164
vladsud merged 5 commits intomicrosoft:mainfrom
vladsud:JoinSignals_final

Conversation

@vladsud
Copy link
Copy Markdown
Contributor

@vladsud vladsud commented Sep 28, 2022

Final break out from #12042

Most important changes:

  1. "self" will show up at the same time in Quorum and Audience for "write" connection. This is done by ignoring signals for such connections and duplicating Quorum changes into Audience. This gives clients simpler picture to work with.
    • I'd think it makes sense to merge these two data structures in the future, and instead of having 2 of them, provide various adapters / filters that keep usage simple for scenarios where someone needs only "write" clients (as an example).
  2. "connected" transition will happen after "self" shows up in Audience. This significantly simplifies programming model for users
    • In case of failure (currently defined as 5 seconds of not receiving join signal), we force reconnect. It's hard to say how often it happens and if such recovery is too expensive / has good success rates. If tracking performance of join op any indicator, likely such recovery will not be very successful, but tracking it in telemetry will help us find real bugs in relay service.

Two main pieces feedback I'm looking for:

  • What to do with NoDeltaConnection
  • What to do in rare cases where we do not get join signal at all.
    • I'm planning to use stress test feedback to make a final call here.

Back-compat considerations:

  • There should be no back-compat concerns here, as behavior (where changed) was never defined. I.e., order in which clients show up in Audience and Quorum are undefined.
  • Similar, order of "connected" transition and presence of "self" in Audience is undefined.

Note: depending on results, I may check in this code dark (i.e. flip default feature gate value) and keep working on enabling it.

@msfluid-bot
Copy link
Copy Markdown
Collaborator

msfluid-bot commented Sep 28, 2022

Could not find a usable baseline build with search starting at CI e7783bf

Generated by 🚫 dangerJS against d1c6949

Comment thread packages/loader/container-loader/src/connectionManager.ts Outdated
Comment thread packages/loader/container-loader/src/connectionStateHandler.ts
Comment thread packages/loader/container-loader/src/connectionStateHandler.ts Outdated
@vladsud
Copy link
Copy Markdown
Contributor Author

vladsud commented Sep 30, 2022

Ping - would love to keep moving in making Audience & connectivity transitions better :)

@markfields
Copy link
Copy Markdown
Member

Sorry, wasn't able to look today, will set another reminder for Monday!

Comment thread packages/loader/container-loader/src/connectionStateHandler.ts Outdated
Comment thread packages/loader/container-loader/src/connectionStateHandler.ts Outdated
Comment thread packages/loader/container-loader/src/connectionStateHandler.ts Outdated
Comment thread packages/loader/container-loader/src/connectionStateHandler.ts Outdated
Comment thread packages/loader/container-loader/src/connectionStateHandler.ts Outdated
Comment thread packages/loader/container-loader/src/connectionStateHandler.ts Outdated
Comment thread packages/loader/container-loader/src/connectionStateHandler.ts
Comment thread packages/loader/container-loader/src/connectionStateHandler.ts Outdated
Comment thread packages/loader/container-loader/src/connectionStateHandler.ts Outdated
Comment thread packages/loader/container-loader/src/connectionStateHandler.ts
connectionStateHandler.receivedConnectEvent(connectionDetails);
assert.strictEqual(connectionStateHandler.connectionState, ConnectionState.CatchingUp,
"Client should be in connecting state");

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Next few lines: In practice how could receivedAddMemberEvent be called before initProtocol?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It took me a while to remember, but I think here is what I'm trying to do here:
First of all, this does not result in a call to ConnectionStateHandler, it only sets up quorum state.
There is a tiny tiny race possible, where these events happen in this order:
1.a connection is established (no "cached" mode is used, so it happens in parallel / faster than other steps)
2. some other client produces a summary
3. we get "lucky" and load from that summary as our initial snapshot
4. ConnectionStateHandler.initProtocol is called, "self" is already in the quorum.

We can prevent that case by reordering # 1 & # 3 in Container.load().

Thoughts? Meanwhile, I'll add comments for that sequence here. and in the code

Comment thread packages/loader/container-loader/src/test/connectionStateHandler.spec.ts Outdated
Copy link
Copy Markdown
Member

@markfields markfields left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense! Please take a look at my comments, but I will sign off now to unblock you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: loader Loader related issues base: main PRs targeted against main branch

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants