-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sharding optimizations I: AST mapping #1846
Conversation
9427ea3
to
3bf5df1
Compare
3bf5df1
to
5356361
Compare
Co-Authored-By: Cyril Tovena <cyril.tovena@gmail.com>
return nil, err | ||
} | ||
|
||
func rangeAggEvaluator( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I love those changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if evaluator is big enough to become a package. Just a thought, not action required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, it's a good idea, let's see where everything settles and then do some refactoring
Codecov Report
@@ Coverage Diff @@
## master #1846 +/- ##
==========================================
+ Coverage 64.86% 64.93% +0.06%
==========================================
Files 122 125 +3
Lines 9239 9391 +152
==========================================
+ Hits 5993 6098 +105
- Misses 2833 2867 +34
- Partials 413 426 +13
|
switch e := expr.(type) { | ||
case *literalExpr: | ||
return e, nil | ||
case *matchersExpr, *filterExpr: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd argue marchersExpr
requires shard mapping. Those will just increase the amount of data we process with their overhead, this is because the result is limited and it will always be super fast to answer unless there is a lot streams in the query.
However here it seems that the function is recursive so in case of metrics rate({app="foo"}[5m])
we probably want sharding mapping.
It's also difficult to imagine because I have yet to discover where will the parallelization take place. So I'll need more to judge on that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I'll think about how to more effectively short circuit this rather than relying on the frontend.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
What
This PR starts implementing shard-based query optimizations. In essence, this takes advantage of the v9 schema onwards which splits storage across
n
shards. For example,sum(rate({foo="bar"} |="id=123" [1m]))
maps intowhere
downstream<x>
indicates that it should be executed independently on a downstream querier andx ++ y
indicates that the resulting vectors should be concatenated.This PR hinges on the new interface
which maps one AST into a functionally equivalent AST. We use this to map incoming queries into versions which are more parallelizable.
Remaining Work
I'm splitting my work into a few different PRs to lighten the cognitive load while reviewing. These will be
Inspiration
cortexproject/cortex#1878
Misc
There is a bit of
Evaluator
refactoring in this PR which isn't technically part of the AST mapping, but it was burdensome to revert the changes here and they aren't too obtrusive.