Add matchers to LabelValues() call #8400

Merged on Feb 9, 2021 (29 commits)

@replay replay commented Jan 22, 2021

This change improves the performance of the label-values lookup when matchers are specified (API documented here). It does so by passing the matchers through to the index readers, which can implement the same functionality more efficiently.

This PR replaces (and is based on) #8361

I tested the performance difference by adding 1M test metrics to a Prometheus instance.
All metrics have a label ninetyTen: on 900k (90%) of them it has the value value1, and on the other 100k (10%) it has the value value2.
All metrics also have a label tenTen: on the first 100k (10%) it has the value value0, on the next 100k (10%) the value value1, and so on, so that each label value from value0 to value9 matches 100k (10%) of the metrics.
I then queried the label values of tenTen with the matcher ninetyTen="value1". That matcher matches 900k (90%) of the metrics, so the expected result set is the tenTen values value0 to value8. I ran that query 10 times on master and 10 times on this branch.

time curl -g 'http://localhost:9090/api/v1/label/tenTen/values?match[]={ninetyTen="value1"}'

On master the latency range was 3.1s - 3.3s; on this branch it was 1.5s - 1.6s.
Note that this change might only make a relatively small difference for Prometheus itself, but it makes a big difference for the Cortex project once it also implements the match parameter on the label values API call. Without this change, the Queriers would have to run a .Select() when matchers are specified and extract the label values from the returned metrics. Those metrics would first be transmitted over the network and then sorted and de-duplicated, which becomes very expensive and slow when a large number of metrics match the matchers. With this change the Queriers no longer need to fetch, sort and de-duplicate all the metrics, because they can call the .LabelValues() method of the Querier interface directly and pass the matchers parameter.
I believe that the Thanos project could benefit from this in a similar way.
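
To make the difference concrete, here is a toy model in Python (not the Prometheus or Cortex code). `label_values_via_select` mimics what a Querier has to do without this change: materialize every matching series, then extract, de-duplicate and sort the values. `label_values_via_index` mimics pushing the matcher down to an index that maps label pairs to series IDs. All function names and the in-memory data layout are assumptions made for illustration, and only a single equality matcher is modeled.

```python
from collections import defaultdict


def build_series(n):
    """Scaled-down version of the test data described above."""
    tenth = n // 10
    return [
        {
            "uniqueId": f"unique{i}",
            "ninetyTen": "value1" if i // tenth < 9 else "value2",
            "tenTen": f"value{i // tenth}",
        }
        for i in range(n)
    ]


def build_postings(series):
    """Hypothetical inverted index: (label name, label value) -> set of series IDs."""
    postings = defaultdict(set)
    for sid, labels in enumerate(series):
        for name, value in labels.items():
            postings[(name, value)].add(sid)
    return postings


def label_values_via_select(series, name, matcher):
    """Fetch all matching series first, then extract and de-duplicate the values."""
    m_name, m_value = matcher
    selected = [s for s in series if s.get(m_name) == m_value]  # whole label sets are handed over
    return sorted({s[name] for s in selected if name in s})


def label_values_via_index(postings, name, matcher):
    """Check each candidate value's postings against the matcher; no label set is materialized."""
    matched = postings.get(matcher, set())
    return sorted(
        value
        for (label, value), ids in postings.items()
        if label == name and not matched.isdisjoint(ids)
    )


series = build_series(1000)
postings = build_postings(series)
print(label_values_via_select(series, "tenTen", ("ninetyTen", "value1")))
print(label_values_via_index(postings, "tenTen", ("ninetyTen", "value1")))
```

Both calls return the same nine values. The second one never builds the list of 900 matching label sets, which is the part that becomes expensive once those sets have to cross the network.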

@replay replay force-pushed the add_matchers_to_label_values_call branch from 6012843 to b333db9 Compare January 22, 2021 21:13
@replay replay changed the title [WIP] Add matchers to LabelValues() call Add matchers to LabelValues() call Jan 22, 2021

replay commented Jan 22, 2021

This is related to cortexproject/cortex#3658

@roidelapluie (Member)

Hello, thanks for this pull request.

Do we have benchmarks covering this?


replay commented Jan 22, 2021

> Do we have benchmarks covering this?

Thanks for your comment.

A benchmark that effectively compares the performance of master against this branch would have to live in the api package, because on master the matchers parameter is implemented in the API. Such a benchmark would also need to generate a block and then instantiate a Querier for it.

Alternatively, I could just add a benchmark for the LabelValues() methods to show what their performance looks like. This would be relatively simple to do, but it would not be useful to compare the performance against master.

Which of these two options do you think would be preferable? Would you consider it critical to be able to compare the performance against master?


roidelapluie commented Jan 22, 2021

Wait, no, it is not needed. At first sight, it looked like this was a Cortex-only improvement. It turns out it is not; it is a different feature.

Edit: Okay, I have taken a look at the code now. We are moving the matcher handling from the API to the TSDB.

@brian-brazil (Contributor)

You could still add benchmarks for the new code so we can know if it regresses in future, and for the sake of this PR I think it would suffice to hack together something locally to give a rough idea of the improvement (even just using curl by hand) - no need to check it in or anything.

tsdb/head.go Outdated
values := h.head.postings.LabelValues(name)
return values, nil
if len(matchers) == 0 {
h.head.symMtx.RUnlock()
Member

I think that should still be deferred?

Contributor Author

@replay replay Jan 25, 2021

The reason I made it non-deferred is that this way we can release the lock earlier, at :1666, if matchers have been specified.
There is a trade-off between the safety of deferred unlocks (no one can forget to unlock) and reducing lock contention by unlocking early where possible.
I'm not sure which side of this trade-off is preferable. Since the TSDB is a pretty integral and optimized part of Prometheus, I chose the latter, but I'm happy to change this back if you disagree.

Member

Not deferring only excludes a small part, and it is not really required for symMtx since it's only used for creating series and these queries. We can defer this unlock.
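
The trade-off discussed in this thread can be sketched outside of Go as well. The Python below is only an analogy: `with lock:` plays the role of `defer RUnlock()`, the explicit `acquire()`/`release()` pair plays the role of the early unlock, and a plain `threading.Lock` stands in for the `RWMutex`. The class and method names are hypothetical and do not correspond to anything in the TSDB.

```python
import threading


class ToyHead:
    """Stand-in for the head block; names are hypothetical, not Prometheus types."""

    def __init__(self, values):
        self.lock = threading.Lock()  # the Go code uses an RWMutex; a Lock keeps the sketch short
        self.values = values

    def label_values_scoped(self, name, matchers=()):
        # Analogue of a deferred unlock: released on every exit path,
        # but held for the whole call, including the filtering step.
        with self.lock:
            values = list(self.values.get(name, []))
            if not matchers:
                return values
            return self._filter(values, matchers)

    def label_values_early_release(self, name, matchers=()):
        # Analogue of the manual unlock: shorter critical section, but the
        # lock leaks if anything between acquire() and release() raises.
        self.lock.acquire()
        values = list(self.values.get(name, []))
        self.lock.release()
        if not matchers:
            return values
        return self._filter(values, matchers)

    @staticmethod
    def _filter(values, matchers):
        # A matcher here is any predicate on a label value.
        return [v for v in values if all(m(v) for m in matchers)]
```

Both methods return the same result. They differ only in how long the lock is held and in how easy it is to leave it locked by mistake, which is the point the reviewers weigh above.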


replay commented Jan 25, 2021

> You could still add benchmarks for the new code so we can know if it regresses in future

I added benchmarks for the LabelValues() methods on the blockIndexReader and the headIndexReader.
The benchmarks are doing the LabelValues() lookup with a set of matchers matching 900k metrics.

blockIndexReader

replay@nb-ubuntu:~/go/src/github.com/prometheus/prometheus$ go test  -run=^$ -bench '^(BenchmarkLabelValuesWithMatchers)$' 'github.com/prometheus/prometheus/tsdb'
goos: linux
goarch: amd64
pkg: github.com/prometheus/prometheus/tsdb
BenchmarkLabelValuesWithMatchers-8          2    701878734 ns/op        20841696 B/op    1800062 allocs/op
PASS
ok      github.com/prometheus/prometheus/tsdb   32.349s

headIndexReader

replay@nb-ubuntu:~/go/src/github.com/prometheus/prometheus$ go test  -run=^$ -bench '^(BenchmarkHeadLabelValuesWithMatchers)$' 'github.com/prometheus/prometheus/tsdb'
goos: linux
goarch: amd64
pkg: github.com/prometheus/prometheus/tsdb
BenchmarkHeadLabelValuesWithMatchers-8                 3         337562575 ns/op            3178 B/op         38 allocs/op
PASS
ok      github.com/prometheus/prometheus/tsdb   12.972s

I'll also come up with a curl-based benchmark to compare master against this branch and I'll post the details here.
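
For readers who want the two results above in seconds, here is a small Python sketch that parses a `go test -bench` result line with memory statistics. The regular expression and the `parse_bench` helper are written for this note and only cover lines shaped like the two shown above.

```python
import re

# Matches lines such as:
# BenchmarkLabelValuesWithMatchers-8   2   701878734 ns/op   20841696 B/op   1800062 allocs/op
BENCH_LINE = re.compile(
    r"^(?P<name>Benchmark\S+?)(?:-(?P<procs>\d+))?\s+(?P<iters>\d+)\s+"
    r"(?P<ns>[\d.]+)\s+ns/op\s+(?P<bytes>\d+)\s+B/op\s+(?P<allocs>\d+)\s+allocs/op$"
)


def parse_bench(line):
    """Return the fields of one benchmark result line, or None if it does not match."""
    m = BENCH_LINE.match(line.strip())
    if m is None:
        return None
    return {
        "name": m["name"],
        "iterations": int(m["iters"]),
        "seconds_per_op": float(m["ns"]) / 1e9,
        "bytes_per_op": int(m["bytes"]),
        "allocs_per_op": int(m["allocs"]),
    }


block = parse_bench(
    "BenchmarkLabelValuesWithMatchers-8          2    701878734 ns/op"
    "        20841696 B/op    1800062 allocs/op"
)
head = parse_bench(
    "BenchmarkHeadLabelValuesWithMatchers-8                 3         337562575 ns/op"
    "            3178 B/op         38 allocs/op"
)
print(block["seconds_per_op"], head["seconds_per_op"])
```

Read this way, the block reader takes about 0.70 s per lookup with roughly 1.8 million allocations, while the head reader takes about 0.34 s with 38 allocations.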


replay commented Jan 26, 2021

Curl-based benchmark

I generated a metrics file using the following Python script. It generates 1'000'000 metrics, of which the first 900'000 have the label ninetyTen="value1" and the last 100'000 have ninetyTen="value2".
In addition, each metric has a label tenTen: in the first 10% of metrics this label has the value value0, in the next 10% the value value1, and so on up to value9.

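#!/usr/bin/env python3
# Prints 1'000'000 test_metrics series in the Prometheus text exposition format.

maxValue = 1000000
tenth = maxValue // 10  # number of series per 10% bracket

for uniqueId in range(maxValue):
    bracket = uniqueId // tenth  # 0..9
    labels = {
        'uniqueId': 'unique{uniqueId}'.format(uniqueId=uniqueId),
        # The first nine brackets (90%) get value1, the last bracket (10%) gets value2.
        'ninetyTen': 'value1' if bracket < 9 else 'value2',
        'tenTen': 'value{bracket}'.format(bracket=bracket),
    }
    print('test_metrics{' + ','.join('{key}="{value}"'.format(key=key, value=value) for key, value in labels.items()) + '} 123')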

Then I made a Prometheus instance scrape that resulting metrics file and I left it running until it had generated 10 blocks with these 1'000'000 metrics.
Then I queried each of the two versions of Prometheus (master and this branch) 10 times with the following query, which looks up the label values of the label tenTen among all the metrics where ninetyTen="value1". The expected output is value0-value8 because the matchers will match the first 900'000 metrics but not the last 100'000:

curl -g 'http://localhost:9090/api/v1/label/tenTen/values?match[]={ninetyTen="value1"}'
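
The same request can be issued from Python, which also makes the URL encoding of the `match[]` parameter explicit (curl needs `-g` only so that it does not interpret the brackets and braces itself). This is a sketch: `label_values_url` and `timed_label_values` are hypothetical helper names, the base URL is the one used in this comment, and the timing helper needs a running Prometheus to return anything. It relies on the API's JSON response carrying the values in its `data` field.

```python
import json
import time
import urllib.parse
import urllib.request


def label_values_url(base_url, label, matchers):
    """Build the label-values URL with one match[] parameter per series selector."""
    query = urllib.parse.urlencode([("match[]", m) for m in matchers])
    return f"{base_url}/api/v1/label/{urllib.parse.quote(label, safe='')}/values?{query}"


def timed_label_values(url, runs=10):
    """Query the URL `runs` times; return (values of the last response, latencies in seconds)."""
    latencies, values = [], []
    for _ in range(runs):
        start = time.perf_counter()
        with urllib.request.urlopen(url) as resp:
            body = json.load(resp)
        latencies.append(time.perf_counter() - start)
        values = body["data"]
    return values, latencies


url = label_values_url("http://localhost:9090", "tenTen", ['{ninetyTen="value1"}'])
print(url)
# With a Prometheus instance listening on localhost:9090:
# values, latencies = timed_label_values(url)
```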

master

$ for i in {0..9}; do (time curl -g 'http://localhost:9090/api/v1/label/tenTen/values?match[]={ninetyTen="value1"}') 2>&1 | grep -E '^real'; done
real    0m28.915s
real    0m27.260s
real    0m27.599s
real    0m28.259s
real    0m28.379s
real    0m27.413s
real    0m27.997s
real    0m27.612s
real    0m27.616s
real    0m28.042s

this branch

$ for i in {0..9}; do (time curl -g 'http://localhost:9090/api/v1/label/tenTen/values?match[]={ninetyTen="value1"}') 2>&1 | grep -E '^real'; done
real    0m9.268s
real    0m8.501s
real    0m9.185s
real    0m8.571s
real    0m8.463s
real    0m8.544s
real    0m9.403s
real    0m8.534s
real    0m8.562s
real    0m9.245s
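
As a quick summary of the two runs, the snippet below computes min, mean and max from the `real` times listed above and the ratio of the means. The numbers are copied from this comment; nothing else is measured.

```python
from statistics import mean

# "real" times in seconds, copied from the two runs above.
master = [28.915, 27.260, 27.599, 28.259, 28.379, 27.413, 27.997, 27.612, 27.616, 28.042]
branch = [9.268, 8.501, 9.185, 8.571, 8.463, 8.544, 9.403, 8.534, 8.562, 9.245]


def summarize(samples):
    """Return min, mean and max of a list of latencies."""
    return {"min": min(samples), "mean": mean(samples), "max": max(samples)}


speedup = mean(master) / mean(branch)
print("master:", summarize(master))
print("branch:", summarize(branch))
print(f"speedup: {speedup:.2f}x")
```

On these samples the mean drops from about 27.9 s to about 8.8 s, a bit more than a 3x improvement.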

@replay replay marked this pull request as ready for review January 26, 2021 19:31
@codesome codesome self-assigned this Jan 27, 2021

replay commented Feb 3, 2021

@roidelapluie @brian-brazil
Thanks for your comments above. This PR would be ready to review again, if you get a chance. I have posted the benchmarks which you asked for, please let me know if there's anything else I can do which would help to move this forward.

Member

@cstyan cstyan left a comment

This LGTM honestly, probably best if you have a quick look as tsdb maintainer @codesome

tsdb/block.go Outdated
@@ -453,6 +502,11 @@ func (r blockIndexReader) Close() error {
return nil
}

// LabelValueFor returns value of given label of metric referred to by id.
Member

Here and other comments

Suggested change
- // LabelValueFor returns value of given label of metric referred to by id.
+ // LabelValueFor returns value of given label value of series referred to by ID.

Member

note: there are similar comments like this :)

Contributor Author

Sorry about that, I updated the remaining ones in ed64646.

@replay replay force-pushed the add_matchers_to_label_values_call branch 3 times, most recently from ed64646 to cb090fb Compare February 8, 2021 12:34
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
@replay replay force-pushed the add_matchers_to_label_values_call branch from cb090fb to 125a8c2 Compare February 8, 2021 12:40

replay commented Feb 8, 2021

FYI I re-based onto the latest master and force pushed

replay and others added 2 commits February 8, 2021 18:26
Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
@replay replay force-pushed the add_matchers_to_label_values_call branch from 3556e80 to 700df8f Compare February 8, 2021 18:31
replay and others added 4 commits February 8, 2021 18:36
Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
@replay replay force-pushed the add_matchers_to_label_values_call branch from fbfc8ad to dc2ebce Compare February 8, 2021 19:18
@replay replay force-pushed the add_matchers_to_label_values_call branch from 4110d36 to aebe167 Compare February 8, 2021 21:52
Member

@codesome codesome left a comment

Almost there!

replay and others added 2 commits February 9, 2021 09:44
Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
@replay replay force-pushed the add_matchers_to_label_values_call branch from 2e10b00 to ad8047f Compare February 9, 2021 15:03
Member

@codesome codesome left a comment

LGTM, Thanks! I will merge on green.

5 participants