Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: field metaqueries take fast path if predicate is only on _measurement #21962

Merged
merged 3 commits into from
Jul 29, 2021

Conversation

williamhbaker
Copy link
Contributor

@williamhbaker williamhbaker commented Jul 28, 2021

Closes #21961

This PR updates the logic for detecting if a query that is attempting to get values for _field contains a predicate on something other than _measurement.

Since the influxql expression will have had references to _measurement or the bytes equivalent \x00 replaced with _name by reads.NodeToExpr:

case datatypes.NodeTypeTagRef:
ref := n.GetTagRefValue()
if v.remap != nil {
if nk, ok := v.remap[ref]; ok {
ref = nk
}
}

...we need to check for that equivalent remapped value when determining if an expression node contains something other than that.

This will allow a query like the one in #21961 to avoid a performance-intensive block scan. The performance increase can be dramatic when querying a large number of series, and this kind of query is very common when exploring data via the UI.

A relevant existing test is https://github.com/influxdata/flux/blob/master/stdlib/influxdata/influxdb/schema/show_fields_with_pred_test.flux, which ensures that the correct result is produced when a query for fields does include a non-measurement predicate. I've also verified locally that a query for fields with a predicate of only _measurement produces the correct result, and adding that test case would probably be a good idea as well.

Loading a database with ~1 million series and running the query listed in #21961 results in the following from query_benchmarker_influxdb when running locally on my machine, an average query time of ~10ms over 100 queries executed:

{
   "InfluxDB (Flux) field keys":{
      "count":100,
      "max":12.232133,
      "maxRate":81.75189069641411,
      "mean":9.265909370000003,
      "meanRate":107.92248877780678,
      "min":8.842354,
      "minRate":113.09205670797617,
      "sum":0.9265909370000003
   }
}

Running the same benchmark against the build from the commit just prior to this one results in a query time of ~4 seconds (note: the benchmark time was limited to 30 seconds, so only 9 queries ran):

{
   "InfluxDB (Flux) field keys":{
      "count":9,
      "max":4275.679762,
      "maxRate":0.23388093956134784,
      "mean":4110.049170444445,
      "meanRate":0.24330609161346453,
      "min":3619.381888,
      "minRate":0.27629027025732855,
      "sum":36.990442534
   }
}

@wolffcm
Copy link

wolffcm commented Jul 28, 2021

This seems right to me. Neither measurement or field (or whatever synonyms are actually used in any given layer) is a "tag key" in the proper sense in TSM's data model. Wish that I had thought of this before!

@williamhbaker williamhbaker force-pushed the wb-faster-field-metaquery-21961 branch from b420275 to 5978cab Compare July 28, 2021 20:53
@williamhbaker williamhbaker marked this pull request as ready for review July 28, 2021 20:54
@danxmoran
Copy link
Contributor

@ wbaker85 do you still plan on adding a test case for a query with a _measurement-only predicate?

@williamhbaker
Copy link
Contributor Author

@ wbaker85 do you still plan on adding a test case for a query with a _measurement-only predicate?

I've opened a PR in the flux repo with the additional test: influxdata/flux#3900

@williamhbaker
Copy link
Contributor Author

I duplicated the test that was merged to flux with this PR so that it will run in our CI and via make test-flux until we upgrade to the latest flux. Also created an issue to remove the duplicated test when we do upgrade: #21970

This will allow us to get this performance improvement merged to master and backported to 2.0 with test coverage even if we don't upgrade to the latest flux on 2.0.

@williamhbaker williamhbaker merged commit 8e80798 into master Jul 29, 2021
@williamhbaker williamhbaker deleted the wb-faster-field-metaquery-21961 branch July 29, 2021 19:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Metaqueries to list field values take slow path when the only predicate is _measurement
3 participants