Sharding optimizations I: AST mapping #1846

owen-d · 2020-03-24T14:57:43Z

What

This PR starts implementing shard-based query optimizations. In essence, it takes advantage of schema v9 and later, which split storage across n shards. For example,

sum(rate({foo="bar"} |="id=123" [1m])) maps into

sum(
  downstream<sum(rate(({foo="bar"}|="id=123")[1m])), shard=0_of_2> ++ 
  downstream<sum(rate(({foo="bar"}|="id=123")[1m])), shard=1_of_2>
)

where downstream<x> indicates that it should be executed independently on a downstream querier and x ++ y indicates that the resulting vectors should be concatenated.

This PR hinges on the new interface

// ASTMapper is the exported interface for mapping between multiple AST representations
type ASTMapper interface {
	Map(Expr) (Expr, error)
}

which maps one AST into a functionally equivalent AST. We use this to map incoming queries into versions which are more parallelizable.
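As a rough sketch of how an implementation of this interface might look, the example below rewrites a `sum` node into the sharded form shown earlier. Only the `ASTMapper` interface comes from this PR; `Expr` is reduced to a `String()` method, and `queryExpr`, `sumExpr`, `downstreamExpr`, `concatExpr` and `toyShardMapper` are hypothetical stand-ins rather than the real types in pkg/logql.

```go
package main

import (
	"fmt"
	"strings"
)

// Expr is a simplified stand-in for the LogQL AST node type.
type Expr interface {
	String() string
}

// ASTMapper mirrors the interface introduced in this PR.
type ASTMapper interface {
	Map(Expr) (Expr, error)
}

// queryExpr is a hypothetical leaf that holds an opaque query string.
type queryExpr struct{ query string }

func (q queryExpr) String() string { return q.query }

// sumExpr is a hypothetical sum aggregation over a single child.
type sumExpr struct{ inner Expr }

func (s sumExpr) String() string { return "sum(" + s.inner.String() + ")" }

// downstreamExpr marks a subtree that should run against one shard.
type downstreamExpr struct {
	inner     Expr
	shard, of int
}

func (d downstreamExpr) String() string {
	return fmt.Sprintf("downstream<%s, shard=%d_of_%d>", d.inner, d.shard, d.of)
}

// concatExpr concatenates the vectors produced by its children.
type concatExpr struct{ parts []Expr }

func (c concatExpr) String() string {
	strs := make([]string, len(c.parts))
	for i, p := range c.parts {
		strs[i] = p.String()
	}
	return strings.Join(strs, " ++ ")
}

// toyShardMapper rewrites sum(x) into
// sum(downstream<sum(x), shard=0_of_n> ++ ... ++ downstream<sum(x), shard=n-1_of_n>).
type toyShardMapper struct{ shards int }

func (m toyShardMapper) Map(e Expr) (Expr, error) {
	if m.shards < 1 {
		return nil, fmt.Errorf("invalid shard count %d", m.shards)
	}
	s, ok := e.(sumExpr)
	if !ok {
		return e, nil // anything that is not a sum passes through unchanged
	}
	parts := make([]Expr, 0, m.shards)
	for i := 0; i < m.shards; i++ {
		parts = append(parts, downstreamExpr{inner: s, shard: i, of: m.shards})
	}
	return sumExpr{inner: concatExpr{parts: parts}}, nil
}

func main() {
	var mapper ASTMapper = toyShardMapper{shards: 2}
	mapped, err := mapper.Map(sumExpr{inner: queryExpr{query: `rate({foo="bar"}[1m])`}})
	if err != nil {
		panic(err)
	}
	fmt.Println(mapped)
}
```

Because the mapper returns an ordinary `Expr`, mappers can in principle be chained, and a caller never needs to know whether the tree it evaluates was rewritten.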

Remaining Work

I'm splitting my work into a few different PRs to lighten the cognitive load while reviewing. These will be

1. AST mapping PR (current PR). This won’t expose options or anything, but will just add AST mapping and related testing for sharding.
2. Some internal pkg refactoring to make way for (3). This also won’t change anything but internal organization.
3. Code which wires together and downstreams sharded queries, plus options to enable sharding. This one will activate the code paths.

Inspiration

cortexproject/cortex#1878

Misc

There is a bit of Evaluator refactoring in this PR which isn't technically part of the AST mapping, but it was burdensome to revert the changes here and they aren't too obtrusive.

pkg/logql/evaluator.go

Co-Authored-By: Cyril Tovena <cyril.tovena@gmail.com>

cyriltovena · 2020-03-26T17:30:53Z

pkg/logql/evaluator.go

-		return nil, err
-	}
-
+func rangeAggEvaluator(


I love those changes.

I wonder if evaluator is big enough to become a package. Just a thought, no action required.

owen-d · 2020-03-26

Yeah, it's a good idea. Let's see where everything settles and then do some refactoring.

codecov-io · 2020-03-26T17:36:09Z

Codecov Report

Merging #1846 into master will increase coverage by 0.06%.
The diff coverage is 76.37%.

@@            Coverage Diff             @@
##           master    #1846      +/-   ##
==========================================
+ Coverage   64.86%   64.93%   +0.06%     
==========================================
  Files         122      125       +3     
  Lines        9239     9391     +152     
==========================================
+ Hits         5993     6098     +105     
- Misses       2833     2867      +34     
- Partials      413      426      +13

Impacted Files	Coverage Δ
pkg/logql/astmapper.go	`0.00% <0.00%> (ø)`
pkg/logql/ast.go	`88.40% <55.55%> (-0.54%)`	⬇️
pkg/logql/shardmapper.go	`77.41% <77.41%> (ø)`
pkg/logql/evaluator.go	`90.69% <83.33%> (-1.12%)`	⬇️
pkg/logql/engine.go	`90.06% <100.00%> (ø)`
pkg/logql/sharding.go	`100.00% <100.00%> (ø)`
pkg/promtail/targets/tailer.go	`73.86% <0.00%> (-4.55%)`	⬇️
pkg/promtail/targets/filetarget.go	`68.71% <0.00%> (-1.85%)`	⬇️
... and 3 more

pkg/logql/shardmapper.go

cyriltovena · 2020-03-26T17:54:06Z

pkg/logql/shardmapper.go

+	switch e := expr.(type) {
+	case *literalExpr:
+		return e, nil
+	case *matchersExpr, *filterExpr:


I'd question whether matchersExpr requires shard mapping. Sharding those will just add overhead and increase the amount of data we process: the result is limited, so the query will always be fast to answer unless it touches a lot of streams.

However, the function here seems to be recursive, so for metric queries such as rate({app="foo"}[5m]) we probably do want shard mapping.

It's also hard to judge because I have yet to see where the parallelization will take place, so I'll need more context before deciding.

owen-d · 2020-03-27

Good point. I'll think about how to short circuit this more effectively rather than relying on the frontend.

pkg/logql/shardmapper.go

cyriltovena

LGTM

pull-request-size bot added the size/XXL label Mar 24, 2020

owen-d requested a review from cyriltovena March 24, 2020 14:59

owen-d force-pushed the feature/querysharding branch from 9427ea3 to 3bf5df1 Compare March 25, 2020 21:48

owen-d added 20 commits March 25, 2020 17:54

[wip] sharding evaluator/ast

e4fe8a1

[wip] continues experimenting with ast mapping

1471d0a

refactoring in preparation for binops

8260bc3

evaluators can pass state to other evaluators

b71472b

compiler alignment

5425630

Evaluator method renamed to StepEvaluator

632aecf

chained evaluator impl

ec6ff83

tidying up sharding code

e15ed28

handling for ConcatSampleExpr

2565ca6

downstream iterator

52e1cfa

structure for downstreaming asts

ff55f63

outlines sharding optimizations

df5a69a

work on sharding mapper

ec050ac

ast sharding optimizations

ff1ca2b

test for different logrange positions

6a3e860

shard mapper tests

2657ab0

stronger ast sharding & tests

84d64b9

shardmapper tests for string->string

0f234cf

removes sharding evaluator code

3743f37

removes unused ctx arg

5356361

owen-d force-pushed the feature/querysharding branch from 3bf5df1 to 5356361 Compare March 25, 2020 21:55

cyriltovena reviewed Mar 26, 2020

pkg/logql/evaluator.go (outdated, resolved)

Update pkg/logql/evaluator.go

eee8798

Co-Authored-By: Cyril Tovena <cyril.tovena@gmail.com>

cyriltovena reviewed Mar 26, 2020

pkg/logql/shardmapper.go (resolved)

cyriltovena reviewed Mar 26, 2020

pkg/logql/shardmapper.go (resolved)

cyriltovena approved these changes Mar 26, 2020

owen-d merged commit 7effeec into grafana:master Mar 27, 2020

owen-d deleted the feature/querysharding branch March 27, 2020 14:28

owen-d mentioned this pull request Apr 16, 2020

Feature/querysharding ii #1927

Merged
