Bug 1985073: use 1m resolution for control plane cpu alerts #1201
Merged
While we're refactoring this, I think it's easier to read if we rephrase from `100 - (avg idle) > 90` to `avg idle < 10`.

Also, this still doesn't account for holes in the `node_cpu_seconds_total` metric. My understanding of the `rate` call is that if the covered minute has any `node_cpu_seconds_total` data, but `node_cpu_seconds_total` is ticking up at 20% for 10s, while `node_cpu_seconds_total` is missing for the other 50s, it will look like 0.2 * 0.1 = 0.02 = 2% idle, so the `expr` would match despite 20% being > 10%.
`rate` does extrapolation based on the slope of the first and the last sample under the window. `rate` also avoids extrapolating too far: extrapolation extends to half the sample interval when the first or the last sample is too far away from the window boundary, so Prometheus does not handle the missing data/gap as above. Since `rate` already extrapolates for us, I don't see that it's necessary for the alert to take any gap into account in its calculation.

(Slack thread where we discussed it: https://coreos.slack.com/archives/C01CQA76KMX/p1626752724386400)
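To make the extrapolation discussion concrete, here is a minimal Python sketch of how I understand the `rate` extrapolation to behave. It is an assumption-laden simplification, not Prometheus's actual code: the function name `extrapolated_rate`, the 1.1 threshold factor, the 5s scrape interval, and the sample values are all illustrative, and counter resets are ignored.

```python
def extrapolated_rate(samples, range_start, range_end):
    """Approximate per-second rate of a counter over [range_start, range_end].

    samples: list of (timestamp_seconds, counter_value), sorted by time,
    all inside the window. Hypothetical helper; counter resets are ignored.
    """
    if len(samples) < 2:
        return None  # a rate needs at least two samples

    first_t, first_v = samples[0]
    last_t, last_v = samples[-1]
    increase = last_v - first_v
    sampled_interval = last_t - first_t
    avg_gap = sampled_interval / (len(samples) - 1)

    to_start = first_t - range_start
    to_end = range_end - last_t

    # A counter cannot be extrapolated back below zero.
    if increase > 0 and first_v >= 0:
        to_zero = sampled_interval * (first_v / increase)
        to_start = min(to_start, to_zero)

    # Assumed rule: extend to the window edge only when the edge is close
    # to the nearest sample; otherwise extend by half a sample interval.
    threshold = avg_gap * 1.1
    extrapolate_to = sampled_interval
    extrapolate_to += to_start if to_start < threshold else avg_gap / 2
    extrapolate_to += to_end if to_end < threshold else avg_gap / 2

    scaled_increase = increase * (extrapolate_to / sampled_interval)
    # The result is divided by the full window length, not the covered part.
    return scaled_increase / (range_end - range_start)


# Idle counter growing at 0.2/s, scraped every 5s (illustrative values).
full_minute = [(t, 1000 + t / 5) for t in range(0, 61, 5)]
last_ten_seconds = [(t, 1000 + t / 5) for t in (50, 55, 60)]

rate_full = extrapolated_rate(full_minute, 0, 60)
rate_gap = extrapolated_rate(last_ten_seconds, 0, 60)
```

Under these assumptions the sketch returns 0.2 for the fully covered minute and roughly 0.042 for the case where only the last 10s have data. That is worth checking against a real Prometheus instance before relying on either reading of the gap behavior.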