The contains function may not be optimized #20931

Open
LieLieLiekey opened this issue Mar 12, 2021 · 2 comments
Labels
area/flux (Issues related to the Flux query engine), area/performance, area/2.x (OSS 2.0 related issues and PRs)

Comments

LieLieLiekey commented Mar 12, 2021

Environment info:

InfluxDB version: 2.0.3

System info: running in Docker
Debian, x86_64, 8-core Intel(R) Xeon(R) Platinum 8260 CPU @ 2.40GHz, 16GB RAM

Data description:

BucketName: 15day_profile_bucket
MeasurementName: function_info
Tags: Function, Pid, Tid, ProcessName, UUID, State
Fields: Internal, cumulative

There are roughly 24,000 records and 200 series per minute.

Problem:

Queries that filter with the contains() function are very slow. It seems that the group key is not used for filtering.

The following Flux query took 0.63s:

from(bucket: "15day_profile_bucket")
|> range(start: 2021-03-03T05:54:39.611Z, stop: 2021-03-03T06:54:39.611Z)
|> filter(fn: (r) => r["_measurement"] == "function_info" )
|> limit(n: 1)
|> filter(fn: (r) => contains(value: r["UUID"], set: ["7f0a1436-37ad-4b7a-9ab1-7acce9ee3060"])  )
|> yield()

But the following Flux query took 37.88s:

from(bucket: "15day_profile_bucket")
|> range(start: 2021-03-03T05:54:39.611Z, stop: 2021-03-03T06:54:39.611Z)
|> filter(fn: (r) => r["_measurement"] == "function_info" )
|> filter(fn: (r) => contains(value: r["UUID"], set: ["7f0a1436-37ad-4b7a-9ab1-7acce9ee3060"])  )
|> limit(n: 1)
|> yield()

Expected behavior:

The run times of the two queries differ far too much.

Because UUID is a tag, the first query (limit first, then filter) and the second query (filter first, then limit) should not differ much in run time.

So I guess the contains() function does not use the group key for filtering, and instead scans all the data.

Use Case:

Our team uses InfluxDB v2, and this is now the bottleneck of our project.

I have tried replacing the contains() function with multiple or operations, but when the number of conditions is large (70+), the or version is slower.
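
For reference, the or-based rewrite mentioned above might look like the sketch below. It is only an illustration: the second UUID is a made-up placeholder, not a value from this issue, and a real query would chain one comparison per UUID.

```flux
// Sketch of replacing contains() with chained equality comparisons.
// The second UUID is a hypothetical placeholder, not from the issue.
from(bucket: "15day_profile_bucket")
    |> range(start: 2021-03-03T05:54:39.611Z, stop: 2021-03-03T06:54:39.611Z)
    |> filter(fn: (r) => r["_measurement"] == "function_info")
    |> filter(fn: (r) => r["UUID"] == "7f0a1436-37ad-4b7a-9ab1-7acce9ee3060" or r["UUID"] == "00000000-0000-0000-0000-000000000000")
    |> limit(n: 1)
    |> yield()
```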

danxmoran added the area/2.x (OSS 2.0 related issues and PRs), area/flux (Issues related to the Flux query engine), and area/performance labels Mar 15, 2021
@MarcoPignati

Same here. Comparing the time taken by two otherwise identical simple scripts (one with a plain filter, the other with contains()), the one with contains() took, if I remember correctly, more than 30x longer.

@MarcoPignati

Rather than using contains(), I am now using the approach suggested here: https://community.grafana.com/t/grafana-influxdb-flux-query-for-displaying-multi-select-variable-inputs/35536
The filtering works perfectly and performance is not impacted. In my case, the variable $device from the example is obtained via another query.
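
Applied to the query from this issue, that kind of workaround could look roughly like the sketch below. It assumes the approach amounts to matching the tag against a regular expression instead of calling contains(); in Grafana the alternation would be produced by the dashboard variable, whereas here it is written out by hand. The second UUID is a made-up placeholder, and whether the regex predicate is evaluated more efficiently than contains() on a given InfluxDB version should be measured rather than taken for granted.

```flux
// Sketch: match the UUID tag against an anchored alternation instead of contains().
// The ^( ... )$ anchors keep the match exact for each listed value.
// The second UUID is a hypothetical placeholder, not from the issue.
from(bucket: "15day_profile_bucket")
    |> range(start: 2021-03-03T05:54:39.611Z, stop: 2021-03-03T06:54:39.611Z)
    |> filter(fn: (r) => r["_measurement"] == "function_info")
    |> filter(fn: (r) => r["UUID"] =~ /^(7f0a1436-37ad-4b7a-9ab1-7acce9ee3060|00000000-0000-0000-0000-000000000000)$/)
    |> limit(n: 1)
    |> yield()
```

With many UUIDs the alternation gets long, but it stays a single predicate on one tag rather than 70+ separate comparisons.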
