feat: multi-measurement query optimization #22301

williamhbaker · 2021-08-25T18:16:28Z

This replaces the existing single measurement query optimization with an expanded version that applies to certain queries containing multiple measurements.

Background

The optimization applies to queries where:

All measurements only use equality for comparison, for example r["_measurement"] == "measName".

One or more measurement expressions constituting a "group" of measurements are evaluated using only OR.
For example, a query like this applies the optimization:

```
from(bucket: "bucket")
  |> range(start: -10m)
  |> filter(fn: (r) => r["_measurement"] == "Meas-1" or r["_measurement"] == "Meas-2")
  |> filter(fn: (r) => r["_field"] == "Field-1" or r["_field"] == "Field-2")
```

...but this does not:

```
from(bucket: "bucket")
  |> range(start: -10m)
  |> filter(fn: (r) => r["_measurement"] == "Measurement-1" and r["_measurement"] == "Measurement-2")
  |> filter(fn: (r) => r["_field"] == "Field-1" or r["_field"] == "Field-2")
```

~~There are measurements at no other place in the query than in the single group of OR'd measurements.~~
The group of OR'd measurements is applied to the query by a "top-level" AND. This ensures that all results from the query must belong to series containing one of those measurements. Put another way, there are no operators other than AND separating the group of OR'd measurements from the other conditions of the query.
As an influxql.Expr string, such a query could look like this:
```
(tag1 != 'foo' OR tag2 = 'bar') AND (_measurement = 'm0' OR _measurement = 'm1' OR _measurement = 'm2') AND (_field = 'val1' OR _field = 'val2')
```
...but not this:
```
(tag1 != 'foo' OR tag2 = 'bar') OR (_measurement = 'm0' OR _measurement = 'm1' OR _measurement = 'm2') AND (_field = 'val1' OR _field = 'val2')
```
Note that a query consisting only of a group of OR'd measurements will apply the optimization as well:
```
(_measurement = 'm0' OR _measurement = 'm1' OR _measurement = 'm2')
```

The code for determining if the optimization applies and what measurement names are contained in the query uses traversal of the influxql.Expr AST, as did the previous single measurement optimization. It locates subtrees consisting entirely of binary expressions with OR operators (or parens) where the leaf nodes are EQ comparisons between _measurement and the string value of the measurement. These subtrees must be accessible from the head of the tree by exclusively traveling through binary expressions with AND operators (or parens). If there is more than one such subtree, the optimization cannot be applied.
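The traversal described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `Expr`, `Op`, `leaf`, `node`, `orGroup`, `heads`, and `MeasurementGroup` are hypothetical names, and the tiny `Expr` type is a stand-in for the real influxql.Expr AST (parens are omitted because the tree shape already encodes grouping).

```go
package main

import "fmt"

// Op is a hypothetical operator type standing in for influxql tokens.
type Op int

const (
	AND Op = iota
	OR
	EQ
	NEQ
)

// Expr is a simplified stand-in for an influxql.Expr node (assumption).
// EQ/NEQ leaves carry Key and Val; AND/OR nodes carry LHS and RHS.
type Expr struct {
	Op       Op
	LHS, RHS *Expr
	Key, Val string
}

func leaf(op Op, key, val string) *Expr { return &Expr{Op: op, Key: key, Val: val} }
func node(op Op, l, r *Expr) *Expr      { return &Expr{Op: op, LHS: l, RHS: r} }

// orGroup collects names when e consists only of OR nodes over
// `key = name` leaves; ok is false as soon as anything else appears.
func orGroup(e *Expr, key string) (names []string, ok bool) {
	switch e.Op {
	case EQ:
		if e.Key == key {
			return []string{e.Val}, true
		}
	case OR:
		l, lok := orGroup(e.LHS, key)
		r, rok := orGroup(e.RHS, key)
		if lok && rok {
			return append(l, r...), true
		}
	}
	return nil, false
}

// heads returns the subtrees reachable from the root by passing
// through AND nodes only.
func heads(e *Expr) []*Expr {
	if e.Op == AND {
		return append(heads(e.LHS), heads(e.RHS)...)
	}
	return []*Expr{e}
}

// MeasurementGroup reports the measurement names if exactly one head
// is a pure group of OR'd measurement comparisons.
func MeasurementGroup(root *Expr, key string) ([]string, bool) {
	var found []string
	count := 0
	for _, h := range heads(root) {
		if names, ok := orGroup(h, key); ok {
			found = names
			count++
		}
	}
	if count != 1 {
		return nil, false
	}
	return found, true
}

func main() {
	m := node(OR, node(OR, leaf(EQ, "_measurement", "m0"),
		leaf(EQ, "_measurement", "m1")), leaf(EQ, "_measurement", "m2"))
	tags := node(OR, leaf(NEQ, "tag1", "foo"), leaf(EQ, "tag2", "bar"))
	fields := node(OR, leaf(EQ, "_field", "val1"), leaf(EQ, "_field", "val2"))

	// tags AND measurements AND fields: the group sits under top-level ANDs.
	fmt.Println(MeasurementGroup(node(AND, node(AND, tags, m), fields), "_measurement"))
	// tags OR (measurements AND fields): an OR separates the group from the root.
	fmt.Println(MeasurementGroup(node(OR, tags, node(AND, m, fields)), "_measurement"))
}
```

The first call mirrors the accepted expression above and yields the three measurement names; the second mirrors the rejected one and reports that the optimization does not apply.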

Performance Improvement

For high-cardinality data spanning several shards, the performance increase from this optimization is substantial. I tested locally using a dataset spanning 1 year (~52 shards) with 60,000 unique measurements and 6,000,000 lines, each line having 20 fields (comparable to the data reported in #22156).

The following Flux query was used, with the time range encompassing the entire dataset:

```
from(bucket: "benchmark_db")
  |> range(start: 2019-01-01)
  |> filter(fn: (r) => r["_measurement"] == "Measurement-1" or r["_measurement"] == "Measurement-2" or r["_measurement"] == "Measurement-3" or r["_measurement"] == "Measurement-4" or r["_measurement"] == "Measurement-5")
  |> filter(fn: (r) => r["_field"] == "Field-1" or r["_field"] == "Field-2" or r["_field"] == "Field-3" or r["_field"] == "Field-4" or r["_field"] == "Field-5")
```

Pre-optimization, the average time for a single query was around 21 seconds:

```
{
   "count":7,
   "max":24907.185157,
   "maxRate":0.040149057137392204,
   "mean":21416.873211857142,
   "meanRate":0.04669215669850277,
   "min":15555.256859,
   "minRate":0.06428694871865247,
   "sum":149.91811248300002
}
```

With the optimization in place, the query time was reduced to an average of ~55 milliseconds:

```
{
   "count":533,
   "max":252.204301,
   "maxRate":3.9650394384035508,
   "mean":56.5385272007505,
   "meanRate":17.6870542886502,
   "min":35.950905,
   "minRate":27.815711454273544,
   "sum":30.135034998000023
}
```

The optimized query time is comparable to an equivalent InfluxQL query, which averages around 35 milliseconds:

```
{
   "count":862,
   "max":107.6885,
   "maxRate":9.286042613649554,
   "mean":34.82083440023203,
   "meanRate":28.718438751523326,
   "min":23.437095,
   "minRate":42.66740395940709,
   "sum":30.01555925300001
}
```
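
To put the three results side by side, the ratios of the reported means can be computed as below. This is only illustrative arithmetic on the `mean` values above, assuming they are all in milliseconds (consistent with the prose); `speedup` is a hypothetical helper. By this measure the optimized Flux query is several hundred times faster than before (roughly 380x), and within about a factor of two of the InfluxQL query.

```go
package main

import "fmt"

// speedup returns how many times larger `before` is than `after`
// (both in the same unit).
func speedup(before, after float64) float64 {
	return before / after
}

func main() {
	// "mean" values copied from the benchmark output above (assumed ms).
	const (
		fluxBefore = 21416.873211857142 // Flux, pre-optimization
		fluxAfter  = 56.5385272007505   // Flux, with the optimization
		influxQL   = 34.82083440023203  // equivalent InfluxQL query
	)

	fmt.Printf("Flux speedup from the optimization: %.0fx\n", speedup(fluxBefore, fluxAfter))
	fmt.Printf("Optimized Flux relative to InfluxQL: %.2fx the mean latency\n", speedup(fluxAfter, influxQL))
}
```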

v1/services/storage/predicate_influxql.go

lesam · 2021-08-25T19:50:45Z

I have quickly looked through this change. While I think it is good, I think it might be worth spending a bit more time understanding why the pre-optimization case was so bad - there might be better places to put this logic than right at the top level.

Discussing it with @wbaker85 offline.

tsdb/shard.go

v1/services/storage/series_cursor.go

tsdb/shard.go

lesam · 2021-08-30T19:31:44Z

tsdb/shard.go

+// subtree containing the OR'd measurements accessible from root of the tree
+// either directly (tree contains nothing but OR'd measurements) or by
+// traversing AND binary expression nodes.
+func (v measurementEvaluator) Visit(node influxql.Node) influxql.Visitor {


Would it make the logic better to have two visitors? You could have one visitor responsible for finding a potential head (the LHS or RHS of an AND node that was not itself an AND node) and a second visitor responsible for determining whether a given head is valid (composed of only measurement EQ conditions) or fatal (some measurement conditions and some non-measurement conditions).

Separating the complicated algorithm into two simple algorithms would probably make the whole thing more clear when we read this code in the future.

I split it up and I think it did make it easier to understand. Check it out and see what you think!

lesam

See comments.

tsdb/shard.go

lesam · 2021-08-31T18:00:37Z

tsdb/shard.go

+			return v
+		}
+
+		if name, ok := measurementNameFromBinary(n, v.measurementKey); ok {


Do we have a test for something this misses? e.g.

(r._name = "foo" || r._name = "bar") && (r._value = r._name)

I made some updates that we discussed which should alleviate this concern. Since an expression like (_name = 'm0' OR _name = 'm1' OR _name = 'm2') AND (tag1 != 'foo' OR _name = 'm1') should be able to apply the optimization (we know that the values must come from m0, m1, or m2, even though m1 occurs on the left and right side of the AND), I updated the code to only look for a single group consisting exclusively of OR'd measurements, and not invalidate the optimization if a measurement occurs elsewhere.

This simplifies looking for measurements anywhere else in the query, since it basically doesn't matter. I renamed this function to measurementNameFromEqBinary and added clarification that it only returns measurement names for EQ operations in the form of _measurement = name. For an EQ that has anything else, the tree will be invalidated.

lesam · 2021-08-31T21:33:51Z

tsdb/shard.go

+		// A BinaryExpr must have an operation of OR or EQ in a valid tree
+		if n.Op == influxql.OR {
+			// The children of ORs must be either BinaryExprs themselves, or Parens
+			if binaryOrParen(n.LHS) && binaryOrParen(n.RHS) {


I don't think you need this anymore. You can just return v now; if a child is not a binary or paren, the visitor will return false (hence invalid) anyway.

tsdb/shard.go

lesam

Two minor comments and I think this is good to go.

lesam · 2021-09-01T13:45:06Z

This looks good to me now, but a flux test failed...

* feat: multi-measurement query optimization (cherry picked from commit 3e275a1)

williamhbaker marked this pull request as ready for review August 25, 2021 18:27

lesam reviewed Aug 25, 2021


williamhbaker force-pushed the wb-multi-measurement-optimization branch from e5842f4 to 542ee7c Compare August 30, 2021 18:51