Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hash collision between labels will lead to incorrect results #12519

Open
wzharies opened this issue Jul 4, 2023 · 2 comments
Open

Hash collision between labels will lead to incorrect results #12519

wzharies opened this issue Jul 4, 2023 · 2 comments

Comments

@wzharies
Copy link

wzharies commented Jul 4, 2023

What did you do?

After reading the code related to stringlabel, I found that there are a lot of hash function calls, and it seems that the errors caused by hash collisions are not considered.
For example in funcLabelReplace:

prometheus/promql/functions.go

Lines 1149 to 1201 in 031d22d

func funcLabelReplace(vals []parser.Value, args parser.Expressions, enh *EvalNodeHelper) Vector {
var (
vector = vals[0].(Vector)
dst = stringFromArg(args[1])
repl = stringFromArg(args[2])
src = stringFromArg(args[3])
regexStr = stringFromArg(args[4])
)
if enh.regex == nil {
var err error
enh.regex, err = regexp.Compile("^(?:" + regexStr + ")$")
if err != nil {
panic(fmt.Errorf("invalid regular expression in label_replace(): %s", regexStr))
}
if !model.LabelNameRE.MatchString(dst) {
panic(fmt.Errorf("invalid destination label name in label_replace(): %s", dst))
}
enh.Dmn = make(map[uint64]labels.Labels, len(enh.Out))
}
for _, el := range vector {
h := el.Metric.Hash()
var outMetric labels.Labels
if l, ok := enh.Dmn[h]; ok {
outMetric = l
} else {
srcVal := el.Metric.Get(src)
indexes := enh.regex.FindStringSubmatchIndex(srcVal)
if indexes == nil {
// If there is no match, no replacement should take place.
outMetric = el.Metric
enh.Dmn[h] = outMetric
} else {
res := enh.regex.ExpandString([]byte{}, repl, srcVal, indexes)
lb := labels.NewBuilder(el.Metric).Del(dst)
if len(res) > 0 {
lb.Set(dst, string(res))
}
outMetric = lb.Labels()
enh.Dmn[h] = outMetric
}
}
enh.Out = append(enh.Out, Sample{
Metric: outMetric,
F: el.F,
H: el.H,
})
}
return enh.Out
}

And there are many ContainsSameLabelset checks in the calculation of eval, here is also judged by hash, will there be the same problem?

prometheus/promql/value.go

Lines 232 to 249 in 031d22d

func (vec Vector) ContainsSameLabelset() bool {
switch len(vec) {
case 0, 1:
return false
case 2:
return vec[0].Metric.Hash() == vec[1].Metric.Hash()
default:
l := make(map[uint64]struct{}, len(vec))
for _, ss := range vec {
hash := ss.Metric.Hash()
if _, ok := l[hash]; ok {
return true
}
l[hash] = struct{}{}
}
return false
}
}

What did you expect to see?

No response

What did you see instead? Under which circumstances?

I'm not sure if it's really a problem, but I really don't see how collisions are handled

System information

No response

Prometheus version

No response

Prometheus configuration file

No response

Alertmanager version

No response

Alertmanager configuration file

No response

Logs

No response

@bboreham
Copy link
Member

bboreham commented Oct 24, 2023

You're correct: there are a number of places in Prometheus where Labels hash collisions are not handled.

They seem to be quite rare in the wild; #5724 is the only example I can see reported here. By "birthday problem" math with 6 million series there is a 1 in a million chance of a collision.

Working on #12993 gave me the idea that we could produce a 256-bit hash for every series, then use that everywhere that the 64-bit hash is currently used. Collisions would be astronomically unlikely.

A (likely incomplete) list of places Labels.Hash() is used without any protection against collisions:

  • TSDB, to find a series in the head block. - this one does handle collisions.
  • In the PromQL engine, collecting the results of a range query.
  • ContainsSameLabelset, to check output of a PromQL subexpression.
  • DropMetricName, used by most PromQL functions.

There are also hashes of subsets of labels, which would not be helped by pre-computing a 256-bit hash.

  • Grouping in PromQL aggregations like sum by (foo, bar)

@beorn7
Copy link
Member

beorn7 commented Oct 24, 2023

I must admit that the untreated hash collisions are still giving me the creeps.

My main concern is that we often use fast non-cryptographic hashes, which are probably much farther away from perfect pseudo-random distribution of hash values than cryptographic hashes. Pardon me if I use the wrong terminology here (I'm not a proper statistician), but the gist is that the probability of a collision is much higher than calculated via the birthday problem math if the hash values are correlated with the original values rather than randomly and equally distributed.

In prometheus/prometheus, we use xxHash, which is probably better than fnv64a, which we use in prometheus/client_golang. (See prometheus/client_golang#220 for the hash collision handling added in client_golang eventually, but I'm not so sure anymore it deals with all cases properly.)

While properly dealing with all variations of hash collisions in a performance-preserving way would be cool, a sufficiently low chance of hash collisions is also "good enough". Note that the only real-world example reported by @bboreham above was fixed by improving the hash calculation (when combining different hash values).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants