Add matchers to LabelValues() call #8400
Conversation
This is related to cortexproject/cortex#3658.
Hello, thanks for this pull request. Do we have benchmarks covering this?
Thanks for your comment. A benchmark to effectively compare the performance of the old and new implementations is not straightforward. Alternatively, I could just add a benchmark for the new `LabelValues()` code. Which of these two options do you think would be preferable? Would you consider it critical to be able to compare the performance against the previous implementation?
Edit: Okay, I have taken a look at the code now; we are moving the matchers from the API layer into the TSDB.
You could still add benchmarks for the new code so we can know if it regresses in the future. For the sake of this PR, I think it would suffice to hack together something locally to give a rough idea of the improvement (even just using curl by hand) - no need to check it in or anything.
tsdb/head.go (outdated)

```
values := h.head.postings.LabelValues(name)
return values, nil
if len(matchers) == 0 {
	h.head.symMtx.RUnlock()
```
I think that should still be deferred?
The reason why I made it non-deferred is that this way we can release the lock earlier (at line 1666) if matchers have been specified.
There is a trade-off between the safety of using deferred unlocks (because no one can forget to unlock) and reducing lock contention by unlocking early when possible.
I'm not sure which side of this trade-off would be preferable; since the TSDB is a pretty integral and optimized part of Prometheus in general, I chose the latter, but I'm happy to change this back if you disagree.
Not deferring only excludes a small part, and it is not really required for symMtx
since it's only used for creating series and these queries. We can defer this unlock.
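To illustrate the trade-off being discussed, here is a minimal sketch (with hypothetical types, not the actual `head.go` code): the shared data is copied while holding the read lock, and the lock is released early instead of deferred, so any potentially slow filtering runs lock-free.

```go
// Sketch of early unlock vs. defer. Types are illustrative only.
package main

import (
	"fmt"
	"sync"
)

type index struct {
	mu     sync.RWMutex
	values map[string][]string
}

// labelValues copies the shared slice under the read lock, then unlocks
// early so the (potentially slow) filtering below does not hold the lock.
func (ix *index) labelValues(name string, keep func(string) bool) []string {
	ix.mu.RLock()
	vals := append([]string(nil), ix.values[name]...) // copy while locked
	ix.mu.RUnlock()                                   // release early instead of defer

	if keep == nil {
		return vals
	}
	out := vals[:0] // safe: vals is our private copy
	for _, v := range vals {
		if keep(v) {
			out = append(out, v)
		}
	}
	return out
}

func main() {
	ix := &index{values: map[string][]string{"tenTen": {"value0", "value1", "value2"}}}
	fmt.Println(ix.labelValues("tenTen", nil))
	fmt.Println(ix.labelValues("tenTen", func(v string) bool { return v != "value1" }))
}
```

The counter-argument above is that the section excluded from the lock here is small, so a `defer` costs little and cannot be forgotten on an early return.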
I added benchmarks for the new `LabelValues()` implementations. I'll also come up with a curl-based benchmark to compare master against this branch.
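A micro-benchmark for such a lookup could be sketched roughly like this (hypothetical code, not the benchmark actually committed in the PR; `labelValues` here is a stand-in for the real lookup). `testing.Benchmark` lets the sketch run outside `go test`:

```go
// Rough sketch of benchmarking a LabelValues-style filtered lookup.
package main

import (
	"fmt"
	"testing"
)

// labelValues filters a slice of values; stands in for the real lookup.
func labelValues(vals []string, keep func(string) bool) []string {
	out := make([]string, 0, len(vals))
	for _, v := range vals {
		if keep == nil || keep(v) {
			out = append(out, v)
		}
	}
	return out
}

func main() {
	vals := make([]string, 100000)
	for i := range vals {
		vals[i] = fmt.Sprintf("value%d", i%10)
	}
	// Run the benchmark programmatically and print its result.
	res := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			labelValues(vals, func(v string) bool { return v == "value1" })
		}
	})
	fmt.Println(res.N > 0)
}
```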
Curl-based benchmark: I generated a set of test metrics, made a Prometheus instance scrape the resulting data, and then timed the same label-values query on master and on this branch.
@roidelapluie @brian-brazil
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM honestly; probably best if you have a quick look as TSDB maintainer, @codesome.
tsdb/block.go (outdated)

```
@@ -453,6 +502,11 @@ func (r blockIndexReader) Close() error {
	return nil
}

// LabelValueFor returns value of given label of metric referred to by id.
```
Here and in other comments:

```diff
- // LabelValueFor returns value of given label of metric referred to by id.
+ // LabelValueFor returns value of given label value of series referred to by ID.
```
note: there are similar comments like this :)
Sorry about that, I updated the remaining ones in ed64646.
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
FYI, I rebased onto the latest `master`.
Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com> Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Almost there!
LGTM, Thanks! I will merge on green.
This change improves the performance of the label-values lookup when matchers are specified (API documented here). It does so by passing the matchers through to the index readers, which can implement the same functionality more efficiently.
This PR replaces (and is based on) #8361
I tested the performance difference by adding 1M test metrics to a Prometheus instance. All metrics have a label `ninetyTen`; on 900k (90%) of them this label has the value `value1`, and on the other 100k (10%) it has the value `value2`. All metrics also have a label `tenTen`; on the first 100k (10%) this label has the value `value0`, on the next 100k (10%) it has the value `value1`, and so on, such that each label value from `value0` to `value9` matches 100k (10%) of the metrics.

Then I query the label values of `tenTen` with the matcher `ninetyTen="value1"`. That matcher matches 900k (90%) of the metrics, and the expected result set is the `tenTen` values `value0` to `value8`. I ran that query 10 times on `master` and 10 times on this branch. On `master` the latency range was 3.1s-3.3s; on this branch the latency range was 1.5s-1.6s.

Note that this change might only make a relatively small difference for Prometheus itself, but it makes a big difference for the Cortex project once it also implements the `match` parameter on the label-values API call. Without this change, the queriers would have to run a `.Select()` when matchers are specified and extract the label values from the returned metrics; those metrics would first get transmitted over the network and then get sorted and de-duplicated, which becomes very expensive and slow when a large number of metrics matches the matchers. With this change, the queriers won't need to fetch, sort, and de-duplicate all the metrics, because they can directly call the `.LabelValues()` method of the `Querier` interface and pass the matchers parameter.

I believe that the Thanos project could benefit from this in a similar way.