metricvec: handle hash collision for labeled metrics #221
Conversation
prometheus/vec.go
Outdated
```go
if ok && len(metrics) == 1 {
	return metrics[0].metric, nil
}
m.mtx.RUnlock()
```
How does the lock get released in case of the happy path above?
Fixing.
Force-pushed from e8e3d37 to 1ac1158.
Take a peek at 9b464a4. It provides some further fixes around collision handling and greatly increases the unit test coverage. Zero-alloc behavior is preserved at the cost of some performance.
```go
// deleteByHash removes the metric from the hash bucket h. If there are
// multiple matches in the bucket, use lvs to select a metric and remove only
// that metric.
func (m *MetricVec) deleteByHash(h uint64, values interface{}) bool {
```
I'm not really a fan of interface type arguments. We're losing the advantages of the type system. Is this really necessary?
To maintain correctness and zero-alloc behavior without repeating code, it is necessary. Any errors are caught during test time.
What is the issue? This is private code. Are you just afraid?
We could, of course, repeat the code. Go isn't really afraid of that. But in this case, it would be a lot of code. I'm fine with the interface usage as it avoids the allocation in L117 of the old code.
It might be good, though, to document the requirements for `values` at the top of the call stack (even if it is an unexported method). I'll leave a comment where I think it should be documented.
You're right @stevvooe, I couldn't come up with a typed implementation either without causing more allocations. We'll need to wait until Go's escape analysis becomes better.
@grobie @beorn7 Just made a polymorphic implementation and confirmed we get allocations.
Up to you, but it might be better to allow allocations and wait for better escape analysis. It would avoid the needless complexity in this code. A simple string slice header allocation on each lookup would let us delete a massive swath of code.
I guess we're more interested in as little overhead as possible for the client than in a nice and short code base. @beorn7 is the gatekeeper of this repository, though.
Yes, avoiding allocations in this code path is essential. It is very likely to be called very frequently by users.
Current code is the best solution, I guess.
Note: I did not do a full review. Only commented on two things I saw while on mobile.
Spotting deadlocks while looking at code on his mobile phone — that's Super-Grobie's super power...
Force-pushed from 9b464a4 to db87834.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
While hash collisions are quite rare, the current state of the client library carries a risk of merging two separate label values into a single metric bucket. The effects are nearly impossible to detect and the result will be missing or incorrect counters.

This changeset handles hash collisions by falling back to collision resolution if multiple label values hash to the same value. This works similarly to separate chaining using a slice. Extra storage is minimized to only the label value key slice so that metrics can be differentiated within a bucket.

In general, the cost of handling collisions is completely minimized under normal operation. Performance does show slight increases in certain areas, but these are more likely statistical anomalies. More importantly, zero-allocation behavior for metrics is preserved on the fast path. Minimal allocations may be made during collision handling, but this has minimal effect.

Benchmark comparisons with and without collision resolution follow.

```
benchmark                                                      old ns/op  new ns/op  delta
BenchmarkCounterWithLabelValues-4                              99.0       107        +8.08%
BenchmarkCounterWithLabelValuesConcurrent-4                    79.6       91.0       +14.32%
BenchmarkCounterWithMappedLabels-4                             518        542        +4.63%
BenchmarkCounterWithPreparedMappedLabels-4                     127        137        +7.87%
BenchmarkCounterNoLabels-4                                     19.5       19.1       -2.05%
BenchmarkGaugeWithLabelValues-4                                97.4       110        +12.94%
BenchmarkGaugeNoLabels-4                                       12.4       10.3       -16.94%
BenchmarkSummaryWithLabelValues-4                              1204       915        -24.00%
BenchmarkSummaryNoLabels-4                                     936        847        -9.51%
BenchmarkHistogramWithLabelValues-4                            147        147        +0.00%
BenchmarkHistogramNoLabels-4                                   50.6       49.3       -2.57%
BenchmarkHistogramObserve1-4                                   37.9       37.5       -1.06%
BenchmarkHistogramObserve2-4                                   122        137        +12.30%
BenchmarkHistogramObserve4-4                                   310        352        +13.55%
BenchmarkHistogramObserve8-4                                   691        729        +5.50%
BenchmarkHistogramWrite1-4                                     3374       3097       -8.21%
BenchmarkHistogramWrite2-4                                     5310       5051       -4.88%
BenchmarkHistogramWrite4-4                                     12094      10690      -11.61%
BenchmarkHistogramWrite8-4                                     19416      17755      -8.55%
BenchmarkHandler-4                                             11934304   13765894   +15.35%
BenchmarkSummaryObserve1-4                                     1119       1105       -1.25%
BenchmarkSummaryObserve2-4                                     3679       3430       -6.77%
BenchmarkSummaryObserve4-4                                     10678      7982       -25.25%
BenchmarkSummaryObserve8-4                                     22974      16689      -27.36%
BenchmarkSummaryWrite1-4                                       25962      14680      -43.46%
BenchmarkSummaryWrite2-4                                       38019      35073      -7.75%
BenchmarkSummaryWrite4-4                                       78027      56816      -27.18%
BenchmarkSummaryWrite8-4                                       117220     132248     +12.82%
BenchmarkMetricVecWithLabelValuesBasic-4                       138        133        -3.62%
BenchmarkMetricVecWithLabelValues2Keys10ValueCardinality-4     150        144        -4.00%
BenchmarkMetricVecWithLabelValues4Keys10ValueCardinality-4     263        256        -2.66%
BenchmarkMetricVecWithLabelValues2Keys100ValueCardinality-4    145        155        +6.90%
BenchmarkMetricVecWithLabelValues10Keys100ValueCardinality-4   606        634        +4.62%
BenchmarkMetricVecWithLabelValues10Keys1000ValueCardinality-4  746        641        -14.08%

benchmark                                                      old allocs  new allocs  delta
BenchmarkCounterWithLabelValues-4                              0           0           +0.00%
BenchmarkCounterWithLabelValuesConcurrent-4                    0           0           +0.00%
BenchmarkCounterWithMappedLabels-4                             2           2           +0.00%
BenchmarkCounterWithPreparedMappedLabels-4                     0           0           +0.00%
BenchmarkCounterNoLabels-4                                     0           0           +0.00%
BenchmarkGaugeWithLabelValues-4                                0           0           +0.00%
BenchmarkGaugeNoLabels-4                                       0           0           +0.00%
BenchmarkSummaryWithLabelValues-4                              0           0           +0.00%
BenchmarkSummaryNoLabels-4                                     0           0           +0.00%
BenchmarkHistogramWithLabelValues-4                            0           0           +0.00%
BenchmarkHistogramNoLabels-4                                   0           0           +0.00%
BenchmarkMetricVecWithLabelValuesBasic-4                       0           0           +0.00%
BenchmarkMetricVecWithLabelValues2Keys10ValueCardinality-4     0           0           +0.00%
BenchmarkMetricVecWithLabelValues4Keys10ValueCardinality-4     0           0           +0.00%
BenchmarkMetricVecWithLabelValues2Keys100ValueCardinality-4    0           0           +0.00%
BenchmarkMetricVecWithLabelValues10Keys100ValueCardinality-4   0           0           +0.00%
BenchmarkMetricVecWithLabelValues10Keys1000ValueCardinality-4  0           0           +0.00%

benchmark                                                      old bytes  new bytes  delta
BenchmarkCounterWithLabelValues-4                              0          0          +0.00%
BenchmarkCounterWithLabelValuesConcurrent-4                    0          0          +0.00%
BenchmarkCounterWithMappedLabels-4                             336        336        +0.00%
BenchmarkCounterWithPreparedMappedLabels-4                     0          0          +0.00%
BenchmarkCounterNoLabels-4                                     0          0          +0.00%
BenchmarkGaugeWithLabelValues-4                                0          0          +0.00%
BenchmarkGaugeNoLabels-4                                       0          0          +0.00%
BenchmarkSummaryWithLabelValues-4                              0          0          +0.00%
BenchmarkSummaryNoLabels-4                                     0          0          +0.00%
BenchmarkHistogramWithLabelValues-4                            0          0          +0.00%
BenchmarkHistogramNoLabels-4                                   0          0          +0.00%
BenchmarkMetricVecWithLabelValuesBasic-4                       0          0          +0.00%
BenchmarkMetricVecWithLabelValues2Keys10ValueCardinality-4     0          0          +0.00%
BenchmarkMetricVecWithLabelValues4Keys10ValueCardinality-4     0          0          +0.00%
BenchmarkMetricVecWithLabelValues2Keys100ValueCardinality-4    0          0          +0.00%
BenchmarkMetricVecWithLabelValues10Keys100ValueCardinality-4   0          0          +0.00%
BenchmarkMetricVecWithLabelValues10Keys1000ValueCardinality-4  0          0          +0.00%
```

Signed-off-by: Stephen J Day <stephen.day@docker.com>
Force-pushed from db87834 to 0992f45.
prometheus/vec.go
Outdated
```go
// getOrCreateMetric retrieves the metric by hash and label value or creates it
// and returns the new one.
//
// This function takes a lock, so avoid
```
Unfinished sentence.
And here is the other entry point where the requirements for `interface{}` should be documented.
Unfinished sentence.
Whoops. I was trying to get this pushed for review.
Looks good. Just a few naming and comment nits. I'll try to dig out an fnv64a hash collision.
```diff
 h := hashNew()
 for _, val := range vals {
-	h = hashAdd(h, val)
+	h = m.hashAdd(h, val)
```
This should actually be followed by `hashAddByte(h, separatorByte)`. (This is a bug in the original code already: hash collisions could be created with label values like `{"foo", "bar"}` and `{"foob", "ar"}`.)
Good catch. I'll use `0x00` to keep the sort order of the tuple in case we decide to optimize this differently.
`separatorByte` (= 0xFF) has the advantage that it is not a valid byte anywhere in a UTF-8 string, while 0x00 is (I believe). It is also consistently used in other hashing in Prometheus code. Since the hashing is completely internal here, it doesn't really matter: even if somebody constructed a collision, the detection you have implemented here handles it. Still, I'd prefer consistency.
> separatorByte (= 0xFF) has the advantage that it is not a valid byte value anywhere in a UTF-8 string, while 0x00 is (I believe). It is also consistently used in other hashing in Prometheus code.
The main problem there is that if you're debugging and print in the wrong place, things explode. Also, if you ever want to use these keys for binary search, which may make sense in certain cases, you'll have to generate a key with a different separator that doesn't mess with tuple-oriented sort order. `0x00`-separated records are near-standard; I'm really surprised that one would go with anything else. Where else in the code base is this used?
This is about hashing, not about tuples. Only add this character while creating the hash, nowhere else.
https://github.com/prometheus/common/blob/master/model/signature.go
Good sir, we are hashing tuples.
For a certain definition of tuples, certainly. But we are not modifying the "tuple": the separator byte is only added during hashing, so there is no way it will ever be printed or used in sorting.
The null byte is a valid UTF-8 character, so I could legally put it into a label value. The 0xFF byte can never be part of a valid UTF-8 string, which qualifies it as a good separator character during hashing.
> The 0xFF character can never be part of a valid UTF-8 string. That qualifies it as a good separator character during hashing.
I don't see this link, but I've used the constant `model.SeparatorByte`.
To give you context: we had a sort error in the Docker registry API due to the use of a `/`, which sorts earlier than a `-`. Changing to `0x00` gives you the expected lexical sort order.
Thanks for hunting this down, but this won't test the 2+N search case. That extra test caught a number of bugs in both the new and original implementations.
More strings with hash value 0: =. hx"iX<; QvXtM>@Fp% _"kWk=-v$c
Force-pushed from 0992f45 to 27c4f00.
After increasing unit test coverage, it was found that the split-function-call approach to metric matching wasn't working well in many cases. By increasing test coverage, we've ensured that both the fast path and the fallback collision path work correctly. With these changes there is a further performance hit, but the results are now ensured to be correct.

Signed-off-by: Stephen J Day <stephen.day@docker.com>
Force-pushed from 27c4f00 to 4db77b0.
Most commentary is dealt with. I have not added the new collision tests yet, as I don't see a way to get the coverage needed. If you're fine with getting rid of the extra coverage, I'll re-tool to use the provided values. If we want to keep it, I'll add a separate test with the colliding values.
Tests are fine. I'm merging this now, as I want to have it in the release planned for today. Feel free to add a test with a true collision in a separate PR. I can also do it myself. Thanks for your contribution.
Only now do I get it: the benchmarks in the commit description are superseded by the following commit. We indeed have to rework to have no allocations in the … Thanks again.
@beorn7 Alright, thanks for taking the PR. Let me know if you need anything else.
While hash collisions are quite rare, the current state of the client
library carries a risk of merging two separate label values into a
single metric bucket. The effects are nearly impossible to detect and the
result will be missing or incorrect counters.
This changeset handles hash collisions by falling back to collision
resolution if multiple label values hash to the same value. This works
similarly to separate chaining using a slice. Extra storage is minimized
to only the label value key slice so that metrics can be differentiated
within a bucket.
In general, the cost of handling collisions is completely minimized
under normal operation. Performance does show slight increases in
certain areas, but these are more likely statistical anomalies. More
importantly, zero-allocation behavior for metrics is preserved on the
fast path. Minimal allocations may be made during collision handling,
but this has minimal effect.
Benchmark comparisons with and without collision resolution follow.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
Closes #220