Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Subqueries run much slower for simple SELECT statement #9122

Open
sparrc opened this issue Nov 16, 2017 · 21 comments
Open

Subqueries run much slower for simple SELECT statement #9122

sparrc opened this issue Nov 16, 2017 · 21 comments

Comments

@sparrc
Copy link
Contributor

sparrc commented Nov 16, 2017

Bug report

System info: [Include InfluxDB version, operating system name, and other relevant details]

Amazon Linux
InfluxDB 1.3.6

Steps to reproduce:

  1. Run a query counting and averaging response times:
> SELECT mean(response_time),count(response_time) FROM fastly_logs WHERE time > now() - 1m GROUP BY account_id
name: fastly_logs
tags: account_id=3855502814001
time                         mean   count
----                         ----   -----
2017-11-16T14:03:13.0021277Z 0.0194 5

name: fastly_logs
tags: account_id=4603804221001
time                         mean  count
----                         ----  -----
2017-11-16T14:03:13.0021277Z 0.642 1

name: fastly_logs
tags: account_id=5102072605001
time                         mean  count
----                         ----  -----
2017-11-16T14:03:13.0021277Z 0.016 3

name: fastly_logs
tags: account_id=5130362111001
time                         mean                count
----                         ----                -----
2017-11-16T14:03:13.0021277Z 0.04241873600155608 10269

name: fastly_logs
tags: account_id=5219833342001
time                         mean                count
----                         ----                -----
2017-11-16T14:03:13.0021277Z 0.05869871794871796 156

name: fastly_logs
tags: account_id=5330942432001
time                         mean                count
----                         ----                -----
2017-11-16T14:03:13.0021277Z 0.35833333333333334 6

...
...
  1. Run the same query but within a subquery, selecting some or all elements of the query:
> SELECT mean,count FROM (SELECT mean(response_time),count(response_time) FROM fastly_logs WHERE time > now() - 1m GROUP BY account_id)
name: fastly_logs
time                           mean                 count
----                           ----                 -----
2017-11-16T14:03:35.169381598Z 0.015                8
2017-11-16T14:03:35.169381598Z 0.11424999999999999  5
2017-11-16T14:03:35.169381598Z 0.0625               9
2017-11-16T14:03:35.169381598Z 0.0315               8
2017-11-16T14:03:35.169381598Z 0.23199999999999998  6
2017-11-16T14:03:35.169381598Z                      2
2017-11-16T14:03:35.169381598Z 0.012                1
2017-11-16T14:03:35.169381598Z 0.042096663600633646 18626
2017-11-16T14:03:35.169381598Z 0.05981645569620251  228
2017-11-16T14:03:35.169381598Z 0.4050000000000001   7
2017-11-16T14:03:35.169381598Z 0.004                8
2017-11-16T14:03:35.169381598Z 0.016                4
2017-11-16T14:03:35.169381598Z 0.3765               3
2017-11-16T14:03:35.169381598Z 0.057                5

Expected behavior: [What you expected to happen]

I would expect these queries to take about the same amount of time

Actual behavior: [What actually happened]

the 2nd query takes around 30-60s whereas the first returns almost instantly

Additional info: [Include gist of relevant config, logs, etc.]

I will attach vars.txt and iostat.txt in a comment

@sparrc
Copy link
Contributor Author

sparrc commented Nov 16, 2017

👋 again BTW 😄

I can provide sample data to reproduce this if you want, just get in touch with me on Gopher Slack

@sparrc
Copy link
Contributor Author

sparrc commented Nov 16, 2017

vars.txt
iostat.txt

@tijsvd
Copy link

tijsvd commented Nov 29, 2017

I notice the same thing. Influxdb-1.3.6 on CentOS 7.2. Even something as simple as:

select mean(value) from table

returns instantly (for 5M values), where

select mean(value) from (select value from table)

takes over a minute.

@daveselan
Copy link

We have seen this problem too, but I found that by moving the time constraint from the inner to the outer select, the query would complete much faster, and as expected.
In this example I'm wanting to count the number of groups that contain the tested values.

Time in inner select ran in 37 seconds

> time influx -database testdb -execute "select count(sum) from (select sum(util_pc) from host_id where group = 'devel' and name =~ /testing/ and type = 'static' and util_pc > 50 and time > now() - 15m group by ident)"
name: host_id
time count
---- -----
0    15
0.008u 0.004s 0:37.24 0.0%	0+0k 0+0io 0pf+0w

Time in outer select ran in 2 seconds

% time influx -database testdb -execute "select count(sum) from (select sum(util_pc) from host_id where group = 'devel' and name =~ /testing/ and type = 'static' and util_pc > 50 group by ident) where time > now() - 15m"
name: host_id
time                count
----                -----
1516125703050401501 15
0.004u 0.008s 0:02.19 0.0%	0+0k 0+0io 0pf+0w

Original single query ran in 2 seconds

> time influx -database testdb -execute "select sum(util_pc) from host_id where group = 'devel' and name =~ /testing/ and type = 'static' and util_pc > 50 and time > now() - 15m group by ident" 
name: host_id
tags: ident=82459165
time                sum
----                ---
1516125638290016167 199.79999999999995

name: host_id
tags: ident=82459821
time                sum
----                ---
1516125638290016167 399.69999999999993

name: host_id
tags: ident=82459927
time                sum
----                ---
1516125638290016167 400.0999999999999

name: host_id
tags: ident=82460499
time                sum
----                ---
1516125638290016167 399.69999999999993

name: host_id
tags: ident=82460500
time                sum
----                ---
1516125638290016167 372.3999999999999

name: host_id
tags: ident=82460501
time                sum
----                ---
1516125638290016167 341.69999999999993

name: host_id
tags: ident=82460502
time                sum
----                ---
1516125638290016167 268.19999999999993

name: host_id
tags: ident=82460505
time                sum
----                ---
1516125638290016167 399.69999999999993

name: host_id
tags: ident=82460506
time                sum
----                ---
1516125638290016167 400.0999999999999

name: host_id
tags: ident=82460507
time                sum
----                ---
1516125638290016167 400.19999999999993

name: host_id
tags: ident=82460508
time                sum
----                ---
1516125638290016167 399.5999999999999

name: host_id
tags: ident=82527534
time                sum
----                ---
1516125638290016167 393.20000000000005

name: host_id
tags: ident=82527535
time                sum
----                ---
1516125638290016167 393.4
0.004u 0.008s 0:02.36 0.0%	0+0k 0+0io 0pf+0w

@cbrainerd
Copy link

I'm running into this in 1.5.3. I've found some detail using EXPLAIN ANALYZE that seems to explain why moving the WHERE clause to the outside speeds things up as noted by @daveselan.

The WHERE time ... clause in the subquery does not seem to be honored. In my case I'm limiting to a 3 minute time span in the subquery, but explain analyze shows that the planner is iterating through all 24 1 hour shards instead of the 1 shard containing the 3 minute time range.

By moving the WHERE clause to the outer query, the query runs much faster and EXPLAIN ANALYZE shows the only the expected shard is iterated.

To make this a bit more concrete, here's the query I want to run. This takes around 45 seconds and creates iterators on 25 shards:
EXPLAIN ANALYZE SELECT TOP("totals", "ip", 7) FROM (SELECT SUM(iops) as "totals" FROM "stats" WHERE "time" > 1529612445s - 3m AND "time" < 1529612445s AND "hostname" = 'myhost.local' GROUP BY "ip")

Moving the WHERE clause to the outside speeds things up to about 3 seconds and create an iterator on one shard only. The same data is returned as above:
EXPLAIN ANALYZE SELECT TOP("totals", "ip", 7) FROM (SELECT SUM(iops) as "totals" FROM "stats" WHERE "hostname" = 'myhost.local' GROUP BY "ip") WHERE "time" > 1529612445s - 3m AND "time" < 1529612445s

This seems like a bug to me, I'd expect the planner to use the WHERE clause to limit the scope of the subquery that has the WHERE clause.

@stale
Copy link

stale bot commented Jul 24, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Jul 24, 2019
@stale
Copy link

stale bot commented Jul 31, 2019

This issue has been automatically closed because it has not had recent activity. Please reopen if this issue is still important to you. Thank you for your contributions.

@stale stale bot closed this as completed Jul 31, 2019
@trollvottel
Copy link

This still is an issue :(

@russorat
Copy link
Contributor

russorat commented Dec 2, 2019

from @trollvottel , this is still an issue in 1.7.9.

@dgnorton dgnorton added this to the 1.8.1 milestone Mar 24, 2020
@dhoffend
Copy link

dhoffend commented Apr 7, 2020

I can confirm this issue (running 1.6.3). We're doing lots of queries like this

SELECT sum("mean")
FROM (
    SELECT mean("value") 
    FROM sometable
    WHERE ... $timeFilter
    GROUP BY time($__interval),"host", "interface" fill(null)
)
GROUP BY time($__interval), "host" fill(null)

The inner query itselfs runs pretty fast, but combined it takes ages. It doesn't make a difference if add a second timefilter on the outer query.

@3dGrabber
Copy link

Same problem on 1.7.4.
It's a real showstopper

@russorat
Copy link
Contributor

This is scheduled for the 1.8.1 release and should be backported to the 1.7 line as well. It will not be backported to 1.6 so you will need to upgrade.

@freebsdfrau
Copy link

I don't know about "1.8.1" or "1.7" or "1.6", but I am seeing this issue here:

$ grafana-server -v
Version 6.2.5 (commit: 6082d19, branch: HEAD)

@trollvottel
Copy link

trollvottel commented Jun 12, 2020 via email

@freebsdfrau
Copy link

Derp! Thanks!

So, I am still seeing this in Influxdb-1.7.7

Do you know what version in the 1.7 line will receive the backport?

@trollvottel
Copy link

trollvottel commented Jun 12, 2020 via email

@freebsdfrau
Copy link

Sorry, I guess that was addressed to @russorat

@dgnorton dgnorton modified the milestones: 1.8.1, 1.8.2 Jul 8, 2020
@dhoffend
Copy link

dhoffend commented Oct 9, 2020

@dgnorton since 1.8.3 has already been released. Did somebody had time to look into this problem yet?

@3dGrabber
Copy link

Waiting for this, too

@jeanse7en
Copy link

Hello all.
I already tested this case in 1.8.3 and read the EXPLAIN ANALYZE. And it has the same issue.

VictorRobellini added a commit to VictorRobellini/pfSense-Dashboard that referenced this issue May 19, 2021
influxdata/influxdb#9122

Found out that my subqueries were bogging down my DB.  I adjusted them as recommended in the above issue report.
There were also 2 widgets with hardcoded and incorrect datasources.
@mrabhishekchauhan
Copy link

mrabhishekchauhan commented Jun 22, 2021

Same issue happening in influx 1.8 with this query:
select sum(value) as _value from (select (non_negative_difference(bytes_sent)/(1024.0 * 1024.0)) as value from net where time >= 1623315960000000000 and time <= 1623920760000000000 and orgid='xxxxx' and interface='ens4') where time >= 1623315960000000000 and time <= 1623920760000000000 group by deviceid, interface, time(604800s,32760s) fill(0.0)

Tried removing time constraint in the inner query but still taking more than 20s to run.
Anybody has any workaround subqueries or any other solution?

@psteinbachs psteinbachs removed this from the 1.8.2 milestone Jul 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests