Address issues from earlier downgraded connection mode change #8468
vladsud merged 1 commit into microsoft:main
It looks like all of these solutions have the same problem: the summarizer will transition to the "read" state in the absence of activity. Do we ever want this to happen? Should there be some configuration that tells the server not to disconnect the summarizer client?
Today, if the client gets a "leave" op, is the connection still maintained as "write"? That is, will connection.mode return "write"?
The main thing to look at (and the main difference between the solutions) is how the state of the system is exposed to the various layers, and whether it is consistent and matches expectations. We should strive to make sure that layers are, for the most part, not concerned with this complexity, and that the message is clear: for as long as a connection is alive, all of its properties are locked and do not change. So, to answer your question: yes, some options will transition to "read". The more important question, though, is whether that transition is visible. In solutions 4-5 the summarizer does not know about it and acts as if it is working with a "write" connection, and the connection is consistently exposed to the runtime as "write" (unlike the current state, where DM.active changes on the fly). There is a reconnection whenever an op is sent, so the runtime is essentially being lied to, but it has no way to discover that, because a disconnection can happen at any moment for any reason.
After PR #7753 (the current state of main), the connection mode, as observed by most layers, changes to "read" on a leave op. With this change, it stays "write". Long term I want to move to solution 2, but I want to take the time to do it properly (this change is mostly a back-out of parts of the previous change).
Note - opened #8483 to consider next steps, if any.
Addressing Issue #8398 - New Prod error: assert(this.runtime.deltaManager.active, 0x25d /* "We should never connect as 'read'" */);
Partial undo of #7753 that causes assert above.
The core of the problem: the relay service sends a leave op for a "write" connection after 5 minutes of inactivity.
This leaves the connection in a weird state: most of the system believes it is a "write" connection, but you can't send ops on it. The above-mentioned PR attempted (among other things) to remove this discrepancy for the loader layer, but I missed that the runtime layer does not expect the properties of a connection to change.
This change constrains the visible changes to the DM layer, so that other layers believe we are still working with a "write" connection (see items below for more details).
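To make the idea concrete, here is a minimal sketch of "keep the downgrade inside the DM layer". It is an illustration under assumptions, not the code in this PR: `SketchDeltaManager`, `onOwnLeaveOp`, `submit`, and `reconnectCount` are hypothetical names, and the reconnect-on-send behavior is only one of the options discussed in this thread.

```typescript
// Hypothetical sketch; none of these names are the actual FluidFramework API.
type ConnectionMode = "read" | "write";

class SketchDeltaManager {
    // Mode negotiated when the connection was established.
    // It is never changed while the connection is alive.
    private readonly negotiatedMode: ConnectionMode = "write";

    // Internal-only flag: the service sent a leave op for this client,
    // so ops can no longer be sent on the current connection.
    private downgraded = false;

    public reconnectCount = 0;
    public readonly sent: string[] = [];

    // What the runtime (and other layers) observe. Unaffected by the leave op.
    public get active(): boolean {
        return this.negotiatedMode === "write";
    }

    // Assumed hook: invoked when this client's own leave op arrives,
    // e.g. after a period of inactivity.
    public onOwnLeaveOp(): void {
        this.downgraded = true;
    }

    // Sending an op on a downgraded connection first re-establishes
    // a real "write" connection, then sends.
    public submit(op: string): void {
        if (this.downgraded) {
            this.reconnect();
        }
        this.sent.push(op);
    }

    private reconnect(): void {
        this.reconnectCount++;
        this.downgraded = false;
    }
}

const dm = new SketchDeltaManager();
dm.onOwnLeaveOp();
console.log(dm.active); // still reported as a "write" connection
dm.submit("op1");       // triggers one reconnect before sending
console.log(dm.reconnectCount, dm.sent);
```

The point of the sketch is that `active` is derived only from the negotiated mode, so an assert on `deltaManager.active` in the runtime cannot observe the downgrade; the cost is a reconnect on the first op after a leave.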
It's worth noting that there are several ways we could go about this, roughly in priority order: