-
Notifications
You must be signed in to change notification settings - Fork 476
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add optional matcher to IndexReader.LabelValues #3391
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this looks fine to me.
pkg/storegateway/bucket.go
Outdated
@@ -1963,7 +1964,7 @@ func checkNilPosting(l labels.Label, p index.Postings) index.Postings { | |||
} | |||
|
|||
// NOTE: Derived from tsdb.postingsForMatcher. index.Merge is equivalent to map duplication. | |||
func toPostingGroup(lvalsFn func(name string) ([]string, error), m *labels.Matcher) (*postingGroup, error) { | |||
func toPostingGroup(lvalsFn func(name string, filter func(string) bool) ([]string, error), m *labels.Matcher) (*postingGroup, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I'd suggest to extract func(name string, filter func(string) bool) ([]string, error)
into a separate type, to make it easier to see what is the type of lvalsFn
, as this entire function signature is getting difficult to read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've extracted a labelValuesReader
interface that has only this function and now pass the indexHeader as an instance of that here: e13c20f
toRemove = append(toRemove, labels.Label{Name: m.Name, Value: val}) | ||
} | ||
vals, err := lvalsFn(m.Name, not(m.Matches)) | ||
toRemove := make([]labels.Label, len(vals)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: we could skip this allocation if err
was non-nil, same below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed offline and will leave as is. The error case is very rare here (even though that's not the contract, with current implementation it would only happen if the block is corrupted), so we'd just clutter the code with an extra err check (plus the cpu cost!) for a very rare case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me. I also agree with Peter's comments.
More often than not we retrieve label values to filter them using a matcher later. If we pass in the matcher as a filter function, we can avoid moving some extra values. This shows a -5% cpu improvement when this path is hit, and no change otherwise. Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Instead of asking for an ad-hoc function, ask for an interface that has just the function we need. Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
2372186
to
e13c20f
Compare
My comments are non-blocking nits, feel free to ignore them. |
* Add optional matcher to IndexReader.LabelValues More often than not we retrieve label values to filter them using a matcher later. If we pass in the matcher as a filter function, we can avoid moving some extra values. This shows a -5% cpu improvement when this path is hit, and no change otherwise. Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> * Update CHANGELOG.md Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> * Cleaner toPostingsGroup signature Instead of asking for an ad-hoc function, ask for an interface that has just the function we need. Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
What this PR does
More often than not we retrieve label values to filter them using a matcher later. If we pass in the matcher as a filter function, we can avoid moving some extra values.
This shows a -5% cpu improvement when this path is hit, and no change otherwise.
Before submitting this PR I tried to understand why is it really better?
I ran cpu profiling on the most improving case
BucketIndexReader_ExpandedPostings/n="1",i!="",j="foo"
on both versions, and zoomed in to thetoPostingsGroup
function:cpu on
main
cpu after this change
Since the extra time was clearly spent growing a slice, I validated two things:
First I thought that maybe inside of LabelValues we were not building enough space for the values:
https://github.com/grafana/mimir/blob/4d586af90d9c38cbe18a4b2cb0ede46b74a1e824/pkg/storegateway/indexheader/binary_reader.go#L884-L884
After double checking the logic itself, I just checked that this was not the case by adding a check at the end of that method and checking that it was not hit in the benchmarks:
Then I thought, okay, we're just growing the
toAdd
andtoRemove
intoPostingsGroup
, so maybe by preallocating those to their worst case would solve that?However, I benchmarked that comparing it to both this PR (the results below: old is this PR, new is
main
with just the preallocated slice) and to main (results not posted, not really interesting, but you can infer them), and while it's comparable and even improving in some cases, it is much worse in some other:I think this change makes sense and it's not making it much more complex, so we can go forward with it.
Which issue(s) this PR fixes or relates to
None, just was surfing the code.
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]