Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Potential for false AlreadyRegistered errors #633

Closed
awilliams opened this issue Aug 13, 2019 · 3 comments
Closed

Potential for false AlreadyRegistered errors #633

awilliams opened this issue Aug 13, 2019 · 3 comments
Assignees
Labels

Comments

@awilliams
Copy link

awilliams commented Aug 13, 2019

This issue describes the potential for false AlreadyRegisteredError errors when registering collectors.

The below test case is a simplified version taken from a real-world application where registration of collectors incorrectly failed with duplicate metrics collector registration attempted errors, even though they were all unique. The test case attempts to shows the likelihood of these false errors and also the underlying uniqueness algorithm.

A potential fix would be to change how collectorID is calculated. Instead of summing the descriptor hashes, the fnv hashing function (same as used by the descriptors) could be used to create a hash of the descriptor hashes. I'm not sure why summing the hashes leads to collisions, but experimentally have seen that hashing the hashes does not have the same weakness.

Go Playground Test Cases

Related issues: #222

Version of Prometheus: master at the time of writing (4efc3ccc7a660996803fff4721a2643b33dc3f77)

@beorn7
Copy link
Member

beorn7 commented Aug 14, 2019

Yeah, I was naively hoping this would never became a problem in practice. Thanks for proving me wrong. (Meant seriously, not sarcastically.)

In v2 of this library, I'm pretty sure we'll simply drop hashing here at all. It's probably not really worth it as this code path is unlikely to be performance critical. As hinted at in #222.

Hashing the hashes would be an easy improvement, but the fnv hash function is not very strong anyway (and our original sin was to ever use it without hash collision detection in place).

I'm more inclined to use a stronger hash function (like the one used in prometheus/prometheus now, which is even faster than fnv if the hardware supports it). Or somehow kick the hashing out in v1 already.

@beorn7 beorn7 self-assigned this Aug 14, 2019
@beorn7 beorn7 added the bug label Aug 14, 2019
@awilliams
Copy link
Author

awilliams commented Aug 15, 2019

Thanks @beorn7 for the insight.

We have come up with a workaround for our particular case. I'll mention it below in case it's useful for anyone.

Using names from the included test case, instead of having myCollector implement prometheus.Collector, we instead added the following methods:

type myCollector struct {
    a, b, c, d prometheus.Collector
}

func (m *myCollector) Register(r prometheus.Registerer) error {
// registers each sub-collector using r
}

func (m *myCollector) Unregister(r prometheus.Registerer) {
// unregisters each sub-collector using r
}

By inverting how registration happens, it seems to avoid the collisions described above.

beorn7 added a commit that referenced this issue Oct 14, 2019
Signed-off-by: beorn7 <beorn@grafana.com>
@beorn7
Copy link
Member

beorn7 commented Oct 15, 2019

“Fixed”… :o)

@beorn7 beorn7 closed this as completed Oct 15, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants