Hash collision between labels will lead to incorrect results #12519
You're correct: there are a number of places in Prometheus where Labels hash collisions are not handled. They seem to be quite rare in the wild; #5724 is the only example I can see reported here. By "birthday problem" math, with 6 million series there is roughly a 1-in-a-million chance of a collision. Working on #12993 gave me the idea that we could produce a 256-bit hash for every series, then use that everywhere the 64-bit hash is currently used. Collisions would be astronomically unlikely. A (likely incomplete) list of places
There are also hashes of subsets of labels, which would not be helped by pre-computing a 256-bit hash.
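The 1-in-a-million figure quoted above follows from the standard birthday-problem approximation p ≈ n² / 2^(bits+1). A quick sketch in Go (the function name here is mine, not Prometheus code):

```go
package main

import (
	"fmt"
	"math"
)

// collisionProb approximates the probability that at least two of n
// items collide when hashed uniformly into a space of size 2^bits,
// using the birthday-problem approximation p ≈ n^2 / 2^(bits+1).
func collisionProb(n, bits float64) float64 {
	return n * n / (2 * math.Pow(2, bits))
}

func main() {
	// 6 million series into a 64-bit hash space.
	p := collisionProb(6e6, 64)
	fmt.Printf("p ≈ %.2e (about 1 in %.0f)\n", p, 1/p)
}
```

With a 256-bit hash the same formula gives a probability on the order of 10^-64, which is what makes that option attractive.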
I must admit that the untreated hash collisions are still giving me the creeps. My main concern is that we often use fast non-cryptographic hashes, which are probably much further from a perfect pseudo-random distribution of hash values than cryptographic hashes. Pardon me if I use the wrong terminology here (I'm not a proper statistician), but the gist is that the probability of a collision is much higher than the birthday-problem math suggests if the hash values are correlated with the original values rather than randomly and equally distributed.

In prometheus/prometheus we use xxHash, which is probably better than fnv64a, which we use in prometheus/client_golang. (See prometheus/client_golang#220 for the hash collision handling eventually added in client_golang, but I'm not so sure anymore that it deals with all cases properly.)

While properly dealing with all variations of hash collisions in a performance-preserving way would be cool, a sufficiently low chance of hash collisions is also "good enough". Note that the only real-world example reported by @bboreham above was fixed by improving the hash calculation (when combining different hash values).
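For readers unfamiliar with what "handling" a collision means here: the usual pattern is to key a map by the 64-bit hash but keep a bucket of entries per hash, re-checking full label-set equality on lookup so that a collision can never merge two distinct series. The sketch below is my own minimal Go illustration of that pattern (using fnv64a and plain maps as stand-ins), not the actual client_golang or Prometheus implementation:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// entry pairs a full label set with its value, so equality can be
// re-checked when two label sets land in the same hash bucket.
type entry struct {
	labels map[string]string
	value  float64
}

// hashLabels computes fnv64a over the sorted label pairs.
// (Hypothetical helper; Prometheus itself hashes its own Labels
// representation with xxHash.)
func hashLabels(ls map[string]string) uint64 {
	keys := make([]string, 0, len(ls))
	for k := range ls {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	h := fnv.New64a()
	for _, k := range keys {
		h.Write([]byte(k))
		h.Write([]byte{0xff}) // separator to keep name/value pairs unambiguous
		h.Write([]byte(ls[k]))
		h.Write([]byte{0xff})
	}
	return h.Sum64()
}

func equalLabels(a, b map[string]string) bool {
	if len(a) != len(b) {
		return false
	}
	for k, v := range a {
		if b[k] != v {
			return false
		}
	}
	return true
}

// store keeps a slice of entries per hash value; lookups compare the
// full label set, so a hash collision cannot conflate two series.
type store map[uint64][]*entry

func (s store) get(ls map[string]string) (*entry, bool) {
	for _, e := range s[hashLabels(ls)] {
		if equalLabels(e.labels, ls) {
			return e, true
		}
	}
	return nil, false
}

func (s store) put(ls map[string]string, v float64) {
	if e, ok := s.get(ls); ok {
		e.value = v
		return
	}
	h := hashLabels(ls)
	s[h] = append(s[h], &entry{labels: ls, value: v})
}

func main() {
	s := store{}
	s.put(map[string]string{"job": "api"}, 1)
	s.put(map[string]string{"job": "db"}, 2)
	if e, ok := s.get(map[string]string{"job": "api"}); ok {
		fmt.Println(e.value)
	}
}
```

The cost of this safety is an equality comparison on every lookup, which is exactly the performance trade-off being discussed.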
What did you do?
After reading the code related to stringlabels, I found that there are many hash function calls, and it seems that errors caused by hash collisions are not considered.
For example, in funcLabelReplace:
prometheus/promql/functions.go, lines 1149 to 1201 at commit 031d22d
There are also many ContainsSameLabelset checks during eval, which likewise compare series by hash; could the same problem occur there?
prometheus/promql/value.go, lines 232 to 249 at commit 031d22d
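To make the concern concrete, here is a minimal sketch (my own Go, not the Prometheus implementation) of the hash-set pattern such a duplicate-labelset check relies on. It reports a duplicate as soon as two series share a hash, without ever comparing the label sets themselves, so two distinct series whose hashes collided would be flagged as the same labelset:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// hashOf is a stand-in for the per-series label hash; the real code
// hashes the full label set (with xxHash under the stringlabels build).
func hashOf(series string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(series))
	return h.Sum64()
}

// containsSameLabelset mirrors the hash-only duplicate check: it uses a
// set of uint64 hashes, so a collision between two distinct series
// would produce a false positive here.
func containsSameLabelset(series []string) bool {
	seen := make(map[uint64]struct{}, len(series))
	for _, s := range series {
		h := hashOf(s)
		if _, ok := seen[h]; ok {
			return true
		}
		seen[h] = struct{}{}
	}
	return false
}

func main() {
	fmt.Println(containsSameLabelset([]string{`{job="a"}`, `{job="b"}`}))
	fmt.Println(containsSameLabelset([]string{`{job="a"}`, `{job="a"}`}))
}
```

The trade-off is the same as elsewhere in this thread: storing only the hash keeps the check O(1) per series with minimal memory, at the cost of a wrong answer in the (rare) collision case.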
What did you expect to see?
No response
What did you see instead? Under which circumstances?
I'm not sure whether this is really a problem in practice, but I don't see how collisions are handled.
System information
No response
Prometheus version
No response
Prometheus configuration file
No response
Alertmanager version
No response
Alertmanager configuration file
No response
Logs
No response