Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor SummaryManager [0.10] #362

Merged
merged 5 commits into from
Oct 21, 2019

Conversation

arinwt
Copy link
Contributor

@arinwt arinwt commented Oct 18, 2019

Because of the multiple points of entry and async nature of SummaryManager, there were loopholes causing overlap in states. In order to make it easier to prevent bad summary states (multiple summarizers, no summarizers) and easier to understand, I've refactored it as a simpler state machine. If we want to optimize states/transitions, we can do so from this (hopefully) stable starting point.

The states are based off the SummaryManagerState enum:

  • Off: When the SummaryManager is not running a summarizer. In this state, the client is either not the elected summarizer or disconnected.
  • Starting: When SummaryManager should summarize it enters this state. This state is spent creating the summarizer (async).
  • Running: After creating the summarizer, if the client should still summarize, it enters this state. It will remain in this state until the summarizer client it spawned has closed.

Transitions between states only occur by promise fulfillment or refreshSummarizer() calls (previously computeSummarizer()). This will refresh the calculated summarizer client ID, and then try to switch between states.

If elected the summarizer, the client will perpetually get stuck in the Starting state if summaries are disabled or the client itself is a spawned summarizer client.

@arinwt arinwt requested a review from vladsud October 18, 2019 00:37
@arinwt arinwt changed the title Refactor SummaryManager Refactor SummaryManager [0.10] Oct 19, 2019
} else if (this.clientId !== this.summarizer) {
return "parentShouldNotSummarize";
} else {
return undefined;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return undefined [](start = 12, length = 16)

When do we get here? Worth drilling into sub-cases and providing proper reason for them?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We never get here in our code. If we get here, we do not have a reason to stop.

if (!force && clientId === this._summarizer) {
private start(attempt: number = 1) {
if (attempt > this.maxRestarts) {
this.logger.sendErrorEvent({ eventName: "MaxRestarts", maxRestarts: this.maxRestarts });
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this.logger.sendErrorEvent({ eventName: "MaxRestarts", maxRestarts: this.maxRestarts }); [](start = 12, length = 88)

Likely not for this PR:
Do we expect these errors to be specific to this client, or this document?
If some of these errors client-specific, should we force another client to be summarizer somehow?
Like maybe disconnecting this client (of there are more clients in this doc), maybe with 5 minute delay , to not make things much worse if all clients can't summarize?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue #213 tracks this. I do want to address at some point, but I'm not sure how right now. I suspect it can be both. I think typically client-specific when they are on a different version, and document-specific when there is some corruption or a bug that hasn't been fixed yet.

Copy link
Contributor

@vladsud vladsud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@arinwt arinwt merged commit 18d4898 into microsoft:release/0.10 Oct 21, 2019
@arinwt arinwt deleted the summary-manager-states branch October 21, 2019 23:49
@arinwt arinwt mentioned this pull request Oct 31, 2019
arinwt pushed a commit that referenced this pull request Oct 31, 2019
* Restart summarizer when it completes and do not run new summarizer if already running (#348)

* Container handle errors during reload context [0.10] (#369)

* Handle errors during reloadContext and await pause/resume calls

* Always emit errors and reject promise on error

* Change reloadContext to reject throw error on catch

* Refactor SummaryManager [0.10] (#362)

* Change SummaryManager to state machine

* Remove shouldNotStop telemetry event

* Clean up switch statement to prevent fallthrough

* Small changes for PR feedback

* Use finally for run completion

* Summarize more frequently [0.10] (#371)

* Quorum ops count towards summarizer heuristics; summarizer handle ops while summarizing or pending

* Ignore summarizer leave messages, just in case

* Refactor summarizer code

* Add telemetry and small refactors

* Add comments; small refactor of summary op handling; track seq instead of flag
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants