Will changing the values here cause problems for monitoring that is already set up to track or alarm on these?
I can think of two groups of people who have monitoring set up against this metric, and the fix affects them differently.
The first group knows that the metric emits microseconds even though its label says seconds. If they have set their alerts accordingly (this would be weird but not impossible), then this fix would break their monitoring.
The second group uses the metric as if it worked correctly, i.e. as if it emitted latency in seconds. Their current alert thresholds are off by orders of magnitude, and this fix would actually make those alerts start working as intended.
Personally, I think we should just fix it.
I'm also for changing this, as it is an actual bug, but can you add a changelog item noting the change?
This is part of the metrics overhaul planned for 1.14, where a number of metrics are changing and we're documenting every single case, including what to change. As a middle ground, let's add an already-deprecated metric called `admission_latencies_milliseconds_summary`, so people who are affected by the break would only have to change the metric name and not do a unit conversion. I think this would work well, as 1.14 is the "metric migration" release: it carries the deprecated metrics alongside the new metrics that follow best practice, and the deprecated ones will be removed in 1.15. This one is an interesting case, as it not only fails to follow best practice but also mislabels its unit. Either way, it will need a separate, additional changelog notice.
Agree with @brancz's proposal.
Added `admission_latencies_milliseconds` and `admission_latencies_milliseconds_summary` for backward compatibility. PTAL.