-
Notifications
You must be signed in to change notification settings - Fork 532
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve Client Side Only Op Processing Controller #4718
Improve Client Side Only Op Processing Controller #4718
Conversation
⯆ @fluidframework/base-host: -23 Bytes
■ @fluid-example/bundle-size-tests: No change
Baseline commit: 6eae286 |
private readonly lastInboundPerClient = new Map<string, ISequencedDocumentMessage>(); | ||
private pendingWriteConnection = false; | ||
|
||
public expectingInboundFrom(outbound: DeltaManagerMonitor): boolean { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be great to have a comment above it describing what it tries to achieve.
Glancing through code, I'm not sure I'm getting all the details...
// check if we are waiting to see a message from outbound | ||
const lastInboundForOutbound = this.lastInboundPerClient.get(outbound.clientId); | ||
if (lastInboundForOutbound !== undefined) { | ||
return outbound.lastOutbound.clientSequenceNumber > lastInboundForOutbound.clientSequenceNumber; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this check can lie due to noop coalescing on the server. I.e. it can result in false positive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm not sure how that would happen. no-op are excluded from lastOutbound and server NoOp should have a null client id so won't be tracked in lastInboundPerClient.
|
||
// has pending work will be true for outbound until it receives it's own seq | ||
// this check ensures the other client has seen the same ops as the outbound | ||
return outbound.latestSequenceNumber > this.latestSequenceNumber; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to answer slightly different question - "is there any work", not "is there work from given client".
I.e. should it return false once we got here, similar to outbound.lastOutbound === undefined?
} | ||
|
||
// check if we are waiting to see a message from outbound | ||
const lastInboundForOutbound = this.lastInboundPerClient.get(outbound.clientId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This assumes there are no disconnects, i.e. clientId does not change through tests, right? Is this important condition? I.e. should this class assert if disconnects happened (as part of UT mimicking disconnection logic) as this code would start lying in some cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we clear lastOutbound on disconnect. i think this can lead to missing outbound work if the sending client disconnects with ops in flight, but that was already the case for this class. i don't think we can assert on disconnects, as this should work against real servers, so disconnects can happen
// If we are not connected, we assume that we are trying to, and classify as pending work | ||
return this.clientId === undefined || this.pendingCount !== 0; | ||
return !this.deltaManager.disposed | ||
&& (this.pendingWriteConnection || this.pendingCount !== 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RE pendingWriteConnection: There is really nothing requiring IRuntime to resubmit an op on a nack.
I.e. runtime may chose to not submit anything on reconnection. Great example of that would be potential improvement in Consensus Registry DDS, where it would not resent an op where it knows for sure it would be ignored by all clients (today we resend everything, i.e. we do not do any optimizations).
Can we make this logic more generic and not rely on any particular behavior of IRuntime around nacks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i poked around a bit here when making this change, but this transition is hard to track reliably. open to other options if you have ideas about how to make this better
const systemLeaveMessage = message as ISequencedDocumentSystemMessage; | ||
const clientId = JSON.parse(systemLeaveMessage.data) as string; | ||
this.lastInboundPerClient.delete(clientId); | ||
} | ||
|
||
if (this.clientId === undefined) { | ||
// Ignore message when we are not connected. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not know if it matters, but logic below comparing of message.clientId !== this.clientId does not correctly account how disconnect/reconnect work, because we register for wrong event in constructor:
deltaManager.on("connect", (details) => this.connect(details.clientId));
clientId should be switched only when Container raises "connected" event, which happens only after inbouding all ops up to own join op. Delta Manager's "connect" fires too early and is not propagated to IRuntime, it should only be used by hosts (exposed from Container as "connect" event) for connected/disconnected banner logic.
I've opened an issue somewhere (assigned to Matt) to reevaluate slit of logic between Container & DeltaManager, as it's should be in one place (today, while DeltaManager sort of does not know about this, it actually does - there are couple places where it does unnatural things to account for this discrepancy).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah. the op processing controller is in a bit of a weird place. in general, i'd like to move away from it, as it's not reliable for a lot of the reasons you call out. in general i'd rather see tests await more deterministic side effects but refactoring for that is not something i think we have time for currently.
…to improve-opProcessingController
These tests weren't registering deltamangers with the op processing controller before the ops were sent, so the controller didn't know about them. related to #4718
This change improves the op processing controller to run more reliably when only used client side. This will allow it to run our tests against other servers like r11s, and odsp, as we no longer need to leverage server data to orchestrate op processing.
related #4655 #4660