upgrader monitoring and alerts #28951

Merged
merged 6 commits into from Jul 17, 2023

Conversation

fspmarshall
Contributor

This PR contains a collection of minor improvements to visibility and feedback when migrating to automatic upgrades:

  • Adds a new cluster alert that prompts users when some of their agents aren't using automatic upgrades and those agents are falling behind (the alert does not trigger if the median unenrolled agent version is newer than or equivalent to the median enrolled agent version).
  • Adds new metrics:
    • teleport_enrolled_in_upgrades: total number of instances enrolled in upgrades.
    • teleport_upgrader_counts: instances enrolled per upgrader.
    • teleport_total_instances: total number of instances (for easy comparison).
  • Updates cloud-specific docs to include instructions for how to discover agents that aren't yet enrolled in upgrades.
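
The alert condition and the three counts described above can be illustrated with a small standalone sketch. This is not the code from this PR: the `Instance` and `Summary` types, the `Summarize` function, the simplified `major.minor.patch` parsing, the upper-middle choice of median, and the decision not to alert when no agents are enrolled are all assumptions made for illustration. The upgrader names in the example are placeholders.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// Instance is a hypothetical stand-in for an inventory entry; it is not
// Teleport's actual type.
type Instance struct {
	Version  string // e.g. "13.2.1"
	Upgrader string // empty means not enrolled in automatic upgrades
}

// Summary holds the values the three metrics would report, plus the alert
// decision. Field names are illustrative.
type Summary struct {
	Total       int            // cf. teleport_total_instances
	Enrolled    int            // cf. teleport_enrolled_in_upgrades
	PerUpgrader map[string]int // cf. teleport_upgrader_counts
	ShouldAlert bool
}

// parseVersion splits "major.minor.patch" into integers. Pre-release and
// build suffixes are ignored in this sketch; unparsable parts become 0.
func parseVersion(v string) [3]int {
	var out [3]int
	parts := strings.SplitN(strings.TrimPrefix(v, "v"), ".", 3)
	for i, p := range parts {
		if j := strings.IndexAny(p, "-+"); j >= 0 {
			p = p[:j]
		}
		n, _ := strconv.Atoi(p)
		out[i] = n
	}
	return out
}

// less reports whether version a is strictly older than version b.
func less(a, b [3]int) bool {
	for i := range a {
		if a[i] != b[i] {
			return a[i] < b[i]
		}
	}
	return false
}

// median returns the middle element of the sorted versions (the upper
// middle for even-length input). The slice must be non-empty.
func median(vs [][3]int) [3]int {
	sorted := append([][3]int(nil), vs...)
	sort.Slice(sorted, func(i, j int) bool { return less(sorted[i], sorted[j]) })
	return sorted[len(sorted)/2]
}

// Summarize tallies instances and decides whether the alert should fire.
func Summarize(instances []Instance) Summary {
	s := Summary{PerUpgrader: map[string]int{}}
	var enrolled, unenrolled [][3]int
	for _, inst := range instances {
		s.Total++
		v := parseVersion(inst.Version)
		if inst.Upgrader == "" {
			unenrolled = append(unenrolled, v)
			continue
		}
		s.Enrolled++
		s.PerUpgrader[inst.Upgrader]++
		enrolled = append(enrolled, v)
	}
	// Alert only when the median unenrolled version is strictly older than
	// the median enrolled version. Assumption: with nothing enrolled there
	// is nothing to compare against, so this sketch stays silent.
	if len(unenrolled) > 0 && len(enrolled) > 0 {
		s.ShouldAlert = less(median(unenrolled), median(enrolled))
	}
	return s
}

func main() {
	s := Summarize([]Instance{
		{Version: "13.2.0", Upgrader: "unit"},
		{Version: "13.2.0", Upgrader: "kube"},
		{Version: "13.0.1"},
		{Version: "13.1.0"},
	})
	fmt.Printf("%+v\n", s)
}
```

Comparing medians rather than minimums means a single stale agent does not raise the alert on its own; the unenrolled group has to lag as a whole.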

Note: this PR should not be merged until after https://github.com/gravitational/cloud/pull/5219, and should not be backported to v13 until after #28847 goes live in cloud.

@@ -53,6 +53,35 @@ updates.

## Enroll instructions

<Details
scope={["cloud"]}
Contributor

Why is this only visible for the cloud scope?

It's also worth noting that most users tend not to use the scope switcher when they navigate between pages, so there's a good chance that users will be viewing the default (oss) scope and not see this Details box.

Finally, if this is a flow we expect most users to use, I would remove it from the Details box and make it part of the main body text.

Contributor Author

This feature is currently only available on cloud. It can eventually move into the main docs, but we need to make some performance improvements first so that we can enable the feature by default. Until then, we have to scope this to cloud to avoid confusion.

We also suggest these commands in the alert so that users will be informed of this strategy even if they miss the docs.

@fspmarshall fspmarshall force-pushed the fspmarshall/upgrader-visibility branch from 0fd002b to d729f23 Compare July 12, 2023 15:29
@fspmarshall fspmarshall requested review from zmb3 and ptgott July 12, 2023 15:36
@fspmarshall fspmarshall force-pushed the fspmarshall/upgrader-visibility branch from d729f23 to bcdacde Compare July 12, 2023 16:35
Contributor

@ptgott ptgott left a comment

Approved with a minor suggestion to ensure that users see these instructions.

@fspmarshall fspmarshall force-pushed the fspmarshall/upgrader-visibility branch from f8c8c44 to 0426fa1 Compare July 17, 2023 14:06
@fspmarshall fspmarshall added this pull request to the merge queue Jul 17, 2023
Merged via the queue into master with commit a84681a Jul 17, 2023
30 checks passed
@fspmarshall fspmarshall deleted the fspmarshall/upgrader-visibility branch July 17, 2023 15:00
fspmarshall added a commit that referenced this pull request Jul 17, 2023
* add rate limit stream helper

* upgrader metrics & alert

* add docs for discovering upgrade enroll prospects

* update prehod protos

* Update docs/pages/management/operations/enroll-agent-into-automatic-updates.mdx

Co-authored-by: Paul Gottschling <paul.gottschling@goteleport.com>

---------

Co-authored-by: Paul Gottschling <paul.gottschling@goteleport.com>
github-merge-queue bot pushed a commit that referenced this pull request Jul 17, 2023
greedy52 pushed a commit that referenced this pull request Jul 19, 2023
3 participants