[FEATURE REQUEST] Cluster upgrade monitor notifications support #195

Closed
craftyhouse opened this issue Jan 28, 2022 · 7 comments · Fixed by #197
@craftyhouse

Is your feature request related to a problem? Please describe.
As an operator, I would like to know when cluster upgrades are occurring and what their status is. Today, SFRP only sends a notification if there is a failure. See https://docs.microsoft.com/en-gb/azure/service-fabric/service-fabric-cluster-upgrade-version-azure#register-for-notifications

Describe the solution you'd like

I would like to be able to easily configure an email notification to stay informed.
https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-diagnostics-event-generation-operational#cluster-events

Describe alternatives you've considered
Alternatives are described in https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-diagnostics-event-generation-operational#cluster-events , but they are not prescriptive. It is up to the customer to work out a solution, which may incur additional charges.


@craftyhouse craftyhouse added the enhancement New feature or request label Jan 28, 2022
@GitTorre
Member

This would make sense in ClusterObserver, which runs as a single-instance service and already reads and transmits cluster-, node-, and app-level health information via ETW events and telemetry (AppInsights, LogAnalytics). FO runs on each node, so this type of monitoring would be redundant across instances. I will modify ClusterObserver to support this.

@andyrdean

+1 please on behalf of my customer.

@GitTorre
Member

GitTorre commented Feb 2, 2022

You can track progress in the develop branch. The initial implementation is in place there.

@GitTorre GitTorre modified the milestones: 2.1.13, 3.1.24 Feb 2, 2022
@GitTorre
Member

GitTorre commented Feb 8, 2022

This will ship in the next release, which should ship this week along with FO 3.1.24.

Explanation of this feature:

ClusterObserver (CO) will query upgrade status at both cluster and application scope. Each time it detects a change in upgrade state, it will emit telemetry and an ETW event containing the details. If an upgrade fails, you will see a FailureReason field, which contains the name of the upgrade step where the failure happened (e.g., it failed in the HealthCheck step). Here is an example of CO LogAnalytics telemetry where a cluster upgrade was moving forward across UDs, but then rolled back due to a failure in the HealthCheck step (SF decided it was not safe to continue the upgrade and began rolling back to the previous version).

[Image: COUpgradeDetails]
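
To make the "emit only when the upgrade state changes" behavior concrete, here is a minimal Python sketch. It is not ClusterObserver's actual code (CO is a .NET service). The `UpgradeStatus` type, the `UpgradeChangeDetector` class, the `to_event` helper, the event field names other than FailureReason, and the sample state strings are all hypothetical and used only for illustration. It also assumes that FailureReason is included only when a failure has been reported.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class UpgradeStatus:
    """Hypothetical snapshot of one upgrade, as returned by some status query."""
    scope: str                            # "Cluster" or "Application"
    target: str                           # illustrative identifier for what is being upgraded
    state: str                            # illustrative state string
    failure_reason: Optional[str] = None  # step where the upgrade failed, if any


def to_event(status: UpgradeStatus) -> Dict[str, str]:
    """Build a telemetry-style payload; FailureReason is added only on failure (assumption)."""
    event = {"Scope": status.scope, "Target": status.target, "UpgradeState": status.state}
    if status.failure_reason is not None:
        event["FailureReason"] = status.failure_reason
    return event


class UpgradeChangeDetector:
    """Remembers the last status seen per (scope, target) and emits only on change."""

    def __init__(self, emit: Callable[[Dict[str, str]], None]) -> None:
        self._emit = emit
        self._last: Dict[Tuple[str, str], UpgradeStatus] = {}

    def observe(self, statuses: Iterable[UpgradeStatus]) -> int:
        """Process one polling pass and return how many events were emitted."""
        emitted = 0
        for status in statuses:
            key = (status.scope, status.target)
            if self._last.get(key) != status:
                self._last[key] = status
                self._emit(to_event(status))
                emitted += 1
        return emitted


if __name__ == "__main__":
    detector = UpgradeChangeDetector(print)
    forward = UpgradeStatus("Cluster", "cluster", "RollingForwardInProgress")
    rollback = UpgradeStatus("Cluster", "cluster", "RollingBackInProgress", "HealthCheck")
    detector.observe([forward])   # first sighting: one event is emitted
    detector.observe([forward])   # unchanged: nothing is emitted
    detector.observe([rollback])  # state changed: event carries FailureReason
```

In this sketch, a consumer that only cares about failures would filter the emitted events on the presence of FailureReason, while a consumer that wants to follow the whole upgrade would read every event.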

@GitTorre
Member

GitTorre commented Feb 8, 2022

More details will be in the CO Readme upon shipping.

@GitTorre GitTorre linked a pull request Feb 9, 2022 that will close this issue
@GitTorre
Member

GitTorre commented Feb 9, 2022

This is now shipped.

@GitTorre GitTorre closed this as completed Feb 9, 2022
@andyrdean

Awesome! Thanks Charles 👍.
