Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pkg/util/workqueue/prometheus: fix double registration #77553

Merged
merged 2 commits into from
Jul 23, 2019

Conversation

s-urbaniak
Copy link
Contributor

What type of PR is this?

/kind bug

What this PR does / why we need it:

Currently, if workqueue metrics are registered twice, these metrics will be ignored.
This fixes it.

Which issue(s) this PR fixes:

Fixes #76956

Does this PR introduce a user-facing change?:

NONE

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/bug Categorizes issue or PR as related to a bug. do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels May 7, 2019
@s-urbaniak
Copy link
Contributor Author

/cc @brancz as you had some thoughts on the prometheus.AlreadyRegisteredError approach
/cc @krasi-georgiev :-)

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. label May 7, 2019
Copy link
Member

@neolit123 neolit123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/sig instrumentation
/priority backlog

@k8s-ci-robot k8s-ci-robot added sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. priority/backlog Higher priority than priority/awaiting-more-evidence. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels May 7, 2019
@brancz
Copy link
Member

brancz commented May 7, 2019

Interesting solution, I do wonder, shouldn't these metrics be vector metrics? Then we wouldn't have this problem in the first place.

@s-urbaniak
Copy link
Contributor Author

@brancz this is indeed a great idea 👍 let me play with that, now that this has a unit test.

// an invalid or duplicate metric descriptor,
// a previously registered descriptor with the same fqdn but different labels,
// or inconsistent label names or help strings for the same fqdn.
klog.Fatalf("failed to register metric %v name %v: %v", metric, name, err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/assign @logicalhan
as per:
#75737

klog.Fatalf() can be an anti-pattern in certain cases, but it's up to the maintainers.
i guess it's OK to check for AlreadyRegisteredError, but it is a question whether the process should hard-fail on metric registration (or log "a lot" of errors).

Copy link
Member

@logicalhan logicalhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you verified that double registration renders the existing metrics defunct? I do not believe that that is actually the case (of course I could be wrong).

If you look in the documentation for prometheus.Desc, then you can see that descriptors are considered equal if the fully-qualified name and the labels are the same. In the case of the workqueue metrics, even if the function is invoked multiple times, each metric which is returned is therefore considered identical to the registry.

@s-urbaniak
Copy link
Contributor Author

@logicalhan yes I verified this by looking the registration logic in the registry, it first checks if the collector is already registered in

if existing, exists := r.collectorsByID[collectorID]; exists {
return AlreadyRegisteredError{
ExistingCollector: existing,
NewCollector: c,
}
}

and only sets it in the registry later if the above detection did not fail:

// Only after all tests have passed, actually register.
r.collectorsByID[collectorID] = c
for hash := range newDescIDs {
r.descIDs[hash] = struct{}{}
}
for name, dimHash := range newDimHashesByName {
r.dimHashesByName[name] = dimHash
}
return nil

and used a small verification snippet https://play.golang.org/p/WsnIqXbddzP

Currently, if workqueue metrics are registered twice, these metrics will be ignored. This fixes it.
@s-urbaniak
Copy link
Contributor Author

@brancz added vector metrics, where possible. The deprecated ones are not label based so I thing we have to use the AlreadyRegisteredError approach on these.

Once we have consensus on this PR, I'll address the bazel failures as they seem mechanical.

klog.Fatalf("failed to register metric %v name %v: %v", metric, name, err)
return nil
}

func (prometheusMetricsProvider) NewDeprecatedDepthMetric(name string) workqueue.GaugeMetric {
depth := prometheus.NewGauge(prometheus.GaugeOpts{
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nevermind, as discussed with @logicalhan we can go with simple singletons here and can omit mustRegister at all 👍 I will adapt the PR accordingly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@logicalhan @brancz unfortunately declaring those deprecated metrics as singletons doesn't seem to work, as the subsystem is dynamic (sorry when I discussed this i was blind).

To omit the ugly mustRegister method, I added another commit which completely removes the deprecated metrics for good. @brancz the question is, if we can do this at this point.

Copy link
Member

@logicalhan logicalhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking the time on this!

klog.Fatalf("failed to register metric %v name %v: %v", metric, name, err)
return nil
}

func (prometheusMetricsProvider) NewDeprecatedDepthMetric(name string) workqueue.GaugeMetric {
depth := prometheus.NewGauge(prometheus.GaugeOpts{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@k8s-ci-robot k8s-ci-robot added the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label May 13, 2019
@s-urbaniak s-urbaniak force-pushed the fix-76956 branch 2 times, most recently from 6a26efd to d019b06 Compare May 14, 2019 10:54
This deletes deprecated metrics and simplifies registration.
@s-urbaniak
Copy link
Contributor Author

/test pull-kubernetes-dependencies

@brancz
Copy link
Member

brancz commented May 16, 2019

/retest

@brancz
Copy link
Member

brancz commented May 16, 2019

Thanks for cleaning this up!

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 16, 2019
Copy link
Member

@logicalhan logicalhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Looks good to me as well, thanks so much for fixing this!

@s-urbaniak
Copy link
Contributor Author

ping @logicalhan @brancz Not sure what Not mergeable. Must be in milestone v1.15. means. I guess this is just missing some label?

@brancz
Copy link
Member

brancz commented Jul 15, 2019

I think that was an outdated status. This now just needs an approver.

cc @smarterclayton

@s-urbaniak
Copy link
Contributor Author

ping @smarterclayton for review and approval :-)

@smarterclayton
Copy link
Contributor

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: s-urbaniak, smarterclayton

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 22, 2019
@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

1 similar comment
@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/backlog Higher priority than priority/awaiting-more-evidence. release-note-none Denotes a PR that doesn't merit a release note. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

duplicate metrics collector registration attempted
7 participants