-
Notifications
You must be signed in to change notification settings - Fork 519
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor SummaryManager [0.10] #362
Conversation
} else if (this.clientId !== this.summarizer) { | ||
return "parentShouldNotSummarize"; | ||
} else { | ||
return undefined; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return undefined [](start = 12, length = 16)
When do we get here? Worth drilling into sub-cases and providing proper reason for them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We never get here in our code. If we get here, we do not have a reason to stop.
if (!force && clientId === this._summarizer) { | ||
private start(attempt: number = 1) { | ||
if (attempt > this.maxRestarts) { | ||
this.logger.sendErrorEvent({ eventName: "MaxRestarts", maxRestarts: this.maxRestarts }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this.logger.sendErrorEvent({ eventName: "MaxRestarts", maxRestarts: this.maxRestarts }); [](start = 12, length = 88)
Likely not for this PR:
Do we expect these errors to be specific to this client, or this document?
If some of these errors client-specific, should we force another client to be summarizer somehow?
Like maybe disconnecting this client (of there are more clients in this doc), maybe with 5 minute delay , to not make things much worse if all clients can't summarize?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Issue #213 tracks this. I do want to address at some point, but I'm not sure how right now. I suspect it can be both. I think typically client-specific when they are on a different version, and document-specific when there is some corruption or a bug that hasn't been fixed yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Restart summarizer when it completes and do not run new summarizer if already running (#348) * Container handle errors during reload context [0.10] (#369) * Handle errors during reloadContext and await pause/resume calls * Always emit errors and reject promise on error * Change reloadContext to reject throw error on catch * Refactor SummaryManager [0.10] (#362) * Change SummaryManager to state machine * Remove shouldNotStop telemetry event * Clean up switch statement to prevent fallthrough * Small changes for PR feedback * Use finally for run completion * Summarize more frequently [0.10] (#371) * Quorum ops count towards summarizer heuristics; summarizer handle ops while summarizing or pending * Ignore summarizer leave messages, just in case * Refactor summarizer code * Add telemetry and small refactors * Add comments; small refactor of summary op handling; track seq instead of flag
Because of the multiple points of entry and async nature of SummaryManager, there were loopholes causing overlap in states. In order to make it easier to prevent bad summary states (multiple summarizers, no summarizers) and easier to understand, I've refactored it as a simpler state machine. If we want to optimize states/transitions, we can do so from this (hopefully) stable starting point.
The states are based off the SummaryManagerState enum:
Transitions between states only occur by promise fulfillment or
refreshSummarizer()
calls (previouslycomputeSummarizer()
). This will refresh the calculated summarizer client ID, and then try to switch between states.If elected the summarizer, the client will perpetually get stuck in the Starting state if summaries are disabled or the client itself is a spawned summarizer client.