pkg/util/workqueue/prometheus: fix double registration #77553
/cc @brancz as you had some thoughts on the [...]
/sig instrumentation
/priority backlog
Interesting solution. I do wonder, though: shouldn't these metrics be vector metrics? Then we wouldn't have this problem in the first place.
@brancz this is indeed a great idea 👍 Let me play with that, now that this has a unit test.
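For readers unfamiliar with the idea, here is a minimal, dependency-free sketch of why a vector metric avoids the problem. It is a toy model and does not use client_golang: `gaugeVec`, `withLabelValue`, `newDepthMetric`, and the metric name are hypothetical stand-ins. The point is that the metric is created once, and each queue only looks up a child by its name, so a second queue with the same name has nothing to register.

```go
package main

import (
	"fmt"
	"sync"
)

// gauge is a hypothetical minimal gauge value.
type gauge struct {
	mu    sync.Mutex
	value float64
}

func (g *gauge) Inc() {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.value++
}

func (g *gauge) Value() float64 {
	g.mu.Lock()
	defer g.mu.Unlock()
	return g.value
}

// gaugeVec is a hypothetical stand-in for a labelled ("vector") gauge:
// one metric name, one child per label value.
type gaugeVec struct {
	mu       sync.Mutex
	fqName   string
	children map[string]*gauge
}

func newGaugeVec(fqName string) *gaugeVec {
	return &gaugeVec{fqName: fqName, children: map[string]*gauge{}}
}

// withLabelValue returns the child for the given queue name, creating it on
// first use. Asking twice for the same name yields the same child.
func (v *gaugeVec) withLabelValue(queue string) *gauge {
	v.mu.Lock()
	defer v.mu.Unlock()
	g, ok := v.children[queue]
	if !ok {
		g = &gauge{}
		v.children[queue] = g
	}
	return g
}

// depth is created exactly once at package level (illustrative name).
var depth = newGaugeVec("workqueue_depth")

// newDepthMetric is what a metrics provider would hand to each queue.
func newDepthMetric(queue string) *gauge {
	return depth.withLabelValue(queue)
}

func main() {
	a := newDepthMetric("deployment")
	b := newDepthMetric("deployment") // second queue with the same name
	a.Inc()
	b.Inc()
	fmt.Println(a == b, a.Value()) // true 2
}
```

In this model both queues update the same child, so no update is lost, which is the property the double-registration bug breaks.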
// an invalid or duplicate metric descriptor,
// a previously registered descriptor with the same fqdn but different labels,
// or inconsistent label names or help strings for the same fqdn.
klog.Fatalf("failed to register metric %v name %v: %v", metric, name, err)
/assign @logicalhan
As per #75737, klog.Fatalf() can be an anti-pattern in certain cases, but it's up to the maintainers.
I guess it's OK to check for AlreadyRegisteredError, but it is an open question whether the process should hard-fail on metric registration (or log "a lot" of errors).
Have you verified that double registration renders the existing metrics defunct? I do not believe that is actually the case (of course I could be wrong).
If you look at the documentation for prometheus.Desc, you can see that descriptors are considered equal if the fully-qualified name and the labels are the same. In the case of the workqueue metrics, even if the function is invoked multiple times, each metric that is returned is therefore considered identical by the registry.
@logicalhan yes, I verified this by looking at the registration logic in the registry. It first checks whether the collector is already registered (kubernetes/vendor/github.com/prometheus/client_golang/prometheus/registry.go, lines 327 to 332 in e9af72c)
and only stores it in the registry later if that check did not fail (kubernetes/vendor/github.com/prometheus/client_golang/prometheus/registry.go, lines 339 to 347 in e9af72c).
I also used a small verification snippet: https://play.golang.org/p/WsnIqXbddzP
Currently, if workqueue metrics are registered twice, these metrics will be ignored. This fixes it.
@brancz added vector metrics, where possible. The deprecated ones are not label based, so I think we have to use the [...] Once we have consensus on this PR, I'll address the bazel failures as they seem mechanical.
	klog.Fatalf("failed to register metric %v name %v: %v", metric, name, err)
	return nil
}

func (prometheusMetricsProvider) NewDeprecatedDepthMetric(name string) workqueue.GaugeMetric {
	depth := prometheus.NewGauge(prometheus.GaugeOpts{
Never mind: as discussed with @logicalhan, we can go with simple singletons here and omit mustRegister altogether 👍 I will adapt the PR accordingly.
Thanks!
@logicalhan @brancz unfortunately, declaring those deprecated metrics as singletons doesn't seem to work, as the subsystem is dynamic (sorry, I overlooked this when we discussed it).
To get rid of the ugly mustRegister method, I added another commit which removes the deprecated metrics for good. @brancz the question is whether we can do this at this point.
Thanks for taking the time on this!
6a26efd to d019b06
This deletes deprecated metrics and simplifies registration.
/test pull-kubernetes-dependencies
/retest
Thanks for cleaning this up! /lgtm
/lgtm
Looks good to me as well, thanks so much for fixing this!
ping @logicalhan @brancz Not sure what [...]
I think that was an outdated status. This now just needs an approver.
ping @smarterclayton for review and approval :-)
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: s-urbaniak, smarterclayton
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing [...]
/retest
Review the full test history for this PR. Silence the bot with an [...]
What type of PR is this?
/kind bug
What this PR does / why we need it:
Currently, if workqueue metrics are registered twice, these metrics will be ignored.
This fixes it.
Which issue(s) this PR fixes:
Fixes #76956
Does this PR introduce a user-facing change?: