Continuous Query with derivative fails silently #3247

Closed
tylerhi opened this issue Jul 6, 2015 · 40 comments

@tylerhi

tylerhi commented Jul 6, 2015

I created this continuous query on my database running 0.9.1:

create continuous query deriv on increase_test begin select derivative(value) into derivative_value from increasing_value group by time(1m) end

The series increasing_value is a monotonically increasing value, incremented at 5-second intervals by a shell script. The result: the series derivative_value is never populated, and no errors are recorded in the log.

@pdf

pdf commented Jul 7, 2015

Possible dupe of #3221

@kruckenb

For me (running 0.9.1 installed via brew on MacOS 10.10.4 w/ go 1.4.2), continuous queries work except for derivative(). No errors in the log. Here's how to recreate:

> create database foo
> use foo
> create continuous query test_sum on foo begin select sum(value) into sum from temp group by time(30s) end
> create continuous query test_der on foo begin select derivative(value) into rate from temp group by time(30s) end
> create continuous query test_pct on foo begin select percentile(value,95) into pct from temp group by time(30s) end
> create continuous query test_count on foo begin select count(value) into count from temp group by time(30s) end

[wait around 15 sec between each insert]
> insert temp value=1
> insert temp value=2
> insert temp value=3
> insert temp value=4
> insert temp value=5

[wait a minute or so for continuous queries to run]
> show series

[note that every continuous query is represented below, except test_der -> rate]
name: count
-----------
count

name: pct
---------
pct

name: sum
---------
sum

name: temp
----------
temp

@beckettsean
Contributor

@kruckenb if you run the derivative query ad hoc (not as a CQ), does it return the expected results? E.g. select derivative(value) from temp where time > now() - 1h group by time(30s)

@beckettsean
Contributor

@kruckenb another thought, given #3000 is not fixed in the most recent builds, can you try your test with float data, e.g. > insert temp value=5.0? It's possible whatever is breaking operations on integers also affects derivative.

@kruckenb

@beckettsean Yes derivative() works fine ad-hoc:

> select derivative(value) from temp where time > now() - 1h group by time(30s)
name: temp
----------
time                value
2015-07-16T20:17:07.707069119Z  2.613148063820763
2015-07-16T20:17:18.776822622Z  2.7100874461088713
2015-07-16T20:17:30.041753995Z  2.6631320694864207
2015-07-16T20:17:40.700819424Z  2.8145056618546818

Re: #3000 I repeated the steps above but inserted 1.1, 2.2, 3.3, 4.4, 5.5. Results look the same, so hard to say whether float data is the cause:

> show measurements
name: measurements
------------------
name
count
pct
sum
temp

@beckettsean
Contributor

Thanks, @kruckenb, I think we can rule out #3000 being a contributor to the behavior you're seeing.

I wonder if the first point written had an integer (likely 0) for value, and the subsequent inserts are failing because of a type mismatch? It seems like that should put something in the logs, but perhaps certain CQ errors are being squelched or swallowed somewhere else.

What happens if you insert data directly into rate? E.g. > insert rate value=1 vs. > insert rate value=1.0

@kruckenb

@beckettsean I tried the following with rate value=1 and rate value=1.0, same results either way:

> create database foo
> use foo
> insert rate value=1.0 (or 1)
> create continuous query test_der on foo begin select derivative(value) into rate from temp group by time(30s) end
[ wait about 15 sec between each insert ]
> insert temp value=1
> insert temp value=2
> insert temp value=3
> insert temp value=4
> insert temp value=5
> show measurements
name: measurements
------------------
name
rate
temp

[ wait a few minutes for CQ to run ]
> select * from rate
name: rate
----------
time                value
2015-07-16T22:02:13.220541778Z  1

Doesn't look like int vs float is the issue.

@beckettsean
Contributor

@mjdesa related to #3355?

@kruckenb

@beckettsean This issue (at least the one I reported in my comment above) appears to be fixed in 0.9.2. I'm not sure which commit fixes it, but it works.

@kruckenb

@beckettsean I spoke too soon. In 0.9.2 the CQ with derivative() runs, but the value generated is always 0. Running the same CQ with sum() works fine. So derivative() is kind of fixed, but not all the way.

@kruckenb

@beckettsean After more fiddling, I think CQ derivative() is working with 0.9.2. Had to get the right combination of [continuous_queries] settings, GROUP BY time() and derivative() parameters, but it's working!

@gucki

gucki commented Jul 18, 2015

I can confirm that CQs with derivative() were not working at all in 0.9.1. I just installed 0.9.2-rc1 and they now execute. However, all values are 0, so they are still broken:

> CREATE CONTINUOUS QUERY "net.derivative.1m" ON netskin BEGIN SELECT derivative(value) AS value INTO "netskin"."1m".derivative FROM "netskin"."default".net GROUP BY time(1m), * END


> SELECT DERIVATIVE(value) AS value FROM "netskin"."default"."net" WHERE host = 'r-ch100' and interface = '1' AND time > NOW() - 5m GROUP BY time(1m),*
name: net
tags: host=r-ch100, interface=1, metric=rx.octets
time        value
----        -----
2015-07-18T12:48:00.565515905Z  7.774834790019962e+07
2015-07-18T12:49:01.658300073Z  3.2114732152404856e+07
2015-07-18T12:50:01.965279053Z  4.9003421991674766e+07
2015-07-18T12:51:03.116769176Z  7.287955389207712e+07


name: net
tags: host=r-ch100, interface=1, metric=rx.pkts
time        value
----        -----
2015-07-18T12:48:00.172157355Z  189142.62656269464
2015-07-18T12:49:00.887072845Z  143856.9078044516
2015-07-18T12:50:02.119077017Z  156420.16180126544
2015-07-18T12:51:03.278956625Z  138562.40486927808


name: net
tags: host=r-ch100, interface=1, metric=tx.octets
time        value
----        -----
2015-07-18T12:48:00.292337452Z  7.95230838825095e+07
2015-07-18T12:49:01.362391836Z  3.3170417816603858e+07
2015-07-18T12:50:02.291535176Z  4.952454760707292e+07
2015-07-18T12:51:02.777146816Z  7.290829769974035e+07


name: net
tags: host=r-ch100, interface=1, metric=tx.pkts
time        value
----        -----
2015-07-18T12:48:00.421732471Z  187677.76979262085
2015-07-18T12:49:01.533039315Z  145660.11528313375
2015-07-18T12:50:02.473005306Z  157583.28452986418
2015-07-18T12:51:02.911566842Z  140246.88517695962


> SELECT * FROM "netskin"."1m"."derivative" WHERE host = 'r-ch100' and interface = '1'
name: derivative
tags: host=r-ch100, interface=1, metric=rx.octets
time        value
----        -----
2015-07-18T12:46:59.555380666Z  0
2015-07-18T12:48:00.565515905Z  0
2015-07-18T12:49:01.658300073Z  0
2015-07-18T12:50:01.965279053Z  0


name: derivative
tags: host=r-ch100, interface=1, metric=rx.pkts
time        value
----        -----
2015-07-18T12:46:59.705724125Z  0
2015-07-18T12:48:00.172157355Z  0
2015-07-18T12:49:00.887072845Z  0
2015-07-18T12:50:02.119077017Z  0


name: derivative
tags: host=r-ch100, interface=1, metric=tx.octets
time        value
----        -----
2015-07-18T12:46:59.814376523Z  0
2015-07-18T12:48:00.292337452Z  0
2015-07-18T12:49:01.362391836Z  0
2015-07-18T12:50:02.291535176Z  0


name: derivative
tags: host=r-ch100, interface=1, metric=tx.pkts
time        value
----        -----
2015-07-18T12:46:59.918136267Z  0
2015-07-18T12:48:00.421732471Z  0
2015-07-18T12:49:01.533039315Z  0
2015-07-18T12:50:02.473005306Z  0

@kruckenb

@gucki @beckettsean Yes, I ran into the same thing. The problem is with the way the CQ service sets the time range for the CQ (here https://github.com/influxdb/influxdb/blob/master/services/continuous_querier/service.go#L233). If the GROUP BY time() interval matches your sampling interval, it'll never work. At least 2 data points need to be in each CQ window to compute derivative(), so to make that work you have to set the GROUP BY time() interval to 2x your sampling interval, plus a little in case your samples arrive a bit late.

The problem with that is that you'll end up with 0/nil measurements periodically, because the CQ setTimeRange window won't line up with every sampling interval, so you end up with empty buckets.
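A minimal Go sketch of the windowing problem described above. It assumes, as kruckenb describes, that each CQ run only sees the points inside one half-open window `[start, start+width)` and that derivative() needs at least two of them. Timestamps are plain Unix seconds, and `pointsInWindow` / `canComputeDerivative` are hypothetical helper names, not InfluxDB code.

```go
package main

import "fmt"

// pointsInWindow counts sample timestamps (Unix seconds) that fall in the
// half-open window [start, start+width), like a CQ's
// "WHERE time >= start AND time < end" condition.
func pointsInWindow(samples []int64, start, width int64) int {
	end := start + width
	n := 0
	for _, ts := range samples {
		if ts >= start && ts < end {
			n++
		}
	}
	return n
}

// canComputeDerivative reports whether a single window holds the two points
// that derivative() needs, under the assumption described above.
func canComputeDerivative(samples []int64, start, width int64) bool {
	return pointsInWindow(samples, start, width) >= 2
}

func main() {
	// Hypothetical samples roughly every 60s, each one a second later than the last:
	// 0, 61, 122, 183, 244, 305, 366, ...
	var samples []int64
	for i := int64(0); i < 10; i++ {
		samples = append(samples, i*60+i)
	}
	fmt.Println(pointsInWindow(samples, 300, 60))        // 1: a 1m window sees one sample
	fmt.Println(canComputeDerivative(samples, 300, 120)) // true: a 2m window sees two
}
```

With samples drifting a second later each minute, a 1m window catches a single point while a 2m window catches two, which lines up with the 2x-interval workaround discussed in this thread.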

I think the fix is to get rid of the recomputing intervals (RecomputePreviousN) and let the CQ query GROUP BY time() clause do the bucketing back to RecomputeNoOlderThan:

diff --git a/services/continuous_querier/service.go b/services/continuous_querier/service.go
index df5d640..704fd16 100644
--- a/services/continuous_querier/service.go
+++ b/services/continuous_querier/service.go
@@ -230,7 +230,7 @@ func (s *Service) ExecuteContinuousQuery(dbi *meta.DatabaseInfo, cqi *meta.Conti
                startTime = startTime.Add(-interval)
        }

-       if err := cq.q.SetTimeRange(startTime, startTime.Add(interval)); err != nil {
+       if err := cq.q.SetTimeRange(now.Add(-time.Duration(s.Config.RecomputeNoOlderThan)), startTime.Add(interval)); err != nil {
                s.Logger.Printf("error setting time range: %s\n", err)
        }

@@ -240,27 +240,6 @@ func (s *Service) ExecuteContinuousQuery(dbi *meta.DatabaseInfo, cqi *meta.Conti
                return err
        }

-       recomputeNoOlderThan := time.Duration(s.Config.RecomputeNoOlderThan)
-
-       for i := 0; i < s.Config.RecomputePreviousN; i++ {
-               // if we're already more time past the previous window than we're going to look back, stop
-               if now.Sub(startTime) > recomputeNoOlderThan {
-                       return nil
-               }
-               newStartTime := startTime.Add(-interval)
-
-               if err := cq.q.SetTimeRange(newStartTime, startTime); err != nil {
-                       s.Logger.Printf("error setting time range: %s\n", err)
-                       return err
-               }
-
-               if err := s.runContinuousQueryAndWriteResult(cq); err != nil {
-                       s.Logger.Printf("error during recompute previous: %s. running: %s\n", err, cq.q.String())
-                       return err
-               }
-
-               startTime = newStartTime
-       }
        return nil
 }

@@ -272,6 +251,7 @@ func (s *Service) runContinuousQueryAndWriteResult(cq *ContinuousQuery) error {
        }

@gucki

gucki commented Jul 18, 2015

I did all my previous tests with the default configuration:

[continuous_queries]
  enabled = true
  recompute-previous-n = 2
  recompute-no-older-than = "10m"
  compute-runs-per-interval = 10
  compute-no-more-than = "2m"

I then changed it as follows and restarted InfluxDB:

[continuous_queries]
  enabled = true
  recompute-previous-n = 10
  recompute-no-older-than = "1h"
  compute-runs-per-interval = 100
  compute-no-more-than = "1h"

However, it didn't change anything; there are still only 0 values:

> SELECT * FROM "netskin"."1m"."derivative" WHERE host = 'r-ch100' and interface = '1'
name: derivative
tags: host=r-ch100, interface=1, metric=rx.octets
time        value
----        -----
2015-07-18T12:46:59.555380666Z  0
2015-07-18T12:48:00.565515905Z  0
2015-07-18T12:49:01.658300073Z  0
2015-07-18T12:50:01.965279053Z  0
2015-07-18T12:51:03.116769176Z  0
2015-07-18T12:52:03.589781083Z  0
2015-07-18T12:53:04.235705243Z  0
2015-07-18T12:54:09.768824888Z  0
2015-07-18T12:55:10.18425067Z 0
2015-07-18T12:56:11.294839986Z  0
2015-07-18T12:57:11.542281645Z  0
2015-07-18T12:58:12.353385228Z  0
2015-07-18T12:59:13.193730843Z  0
2015-07-18T13:00:13.705337865Z  0
2015-07-18T13:01:14.712746976Z  0
2015-07-18T13:02:19.653485086Z  0
2015-07-18T13:29:53.958159213Z  0
2015-07-18T13:30:54.74183494Z 0
2015-07-18T13:30:54.74183494Z 0
2015-07-18T13:31:59.597056288Z  0
2015-07-18T13:31:59.597056288Z  0
2015-07-18T13:33:00.298712415Z  0
2015-07-18T13:33:00.298712415Z  0
2015-07-18T13:34:01.127098042Z  0
2015-07-18T13:34:01.127098042Z  0
2015-07-18T13:35:01.686031959Z  0
2015-07-18T13:36:02.889472195Z  0
2015-07-18T13:37:03.641268932Z  0
2015-07-18T13:38:04.352626147Z  0
2015-07-18T13:39:09.51332475Z 0


name: derivative
tags: host=r-ch100, interface=1, metric=rx.pkts
time        value
----        -----
2015-07-18T12:46:59.705724125Z  0
2015-07-18T12:48:00.172157355Z  0
2015-07-18T12:49:00.887072845Z  0
2015-07-18T12:50:02.119077017Z  0
2015-07-18T12:51:03.278956625Z  0
2015-07-18T12:52:03.747189507Z  0
2015-07-18T12:53:04.392573249Z  0
2015-07-18T12:54:09.851217295Z  0
2015-07-18T12:55:10.337764321Z  0
2015-07-18T12:56:10.939502879Z  0
2015-07-18T12:57:11.711365332Z  0
2015-07-18T12:58:12.462067452Z  0
2015-07-18T12:59:13.352978025Z  0
2015-07-18T13:00:13.864170543Z  0
2015-07-18T13:01:14.877354951Z  0
2015-07-18T13:02:19.781044985Z  0
2015-07-18T13:29:54.101752542Z  0
2015-07-18T13:30:54.854289586Z  0
2015-07-18T13:30:54.854289586Z  0
2015-07-18T13:31:59.772798123Z  0
2015-07-18T13:31:59.772798123Z  0
2015-07-18T13:33:00.429748475Z  0
2015-07-18T13:33:00.429748475Z  0
2015-07-18T13:34:01.283309551Z  0
2015-07-18T13:34:01.283309551Z  0
2015-07-18T13:35:01.854968564Z  0
2015-07-18T13:36:03.062232901Z  0
2015-07-18T13:37:03.807847731Z  0
2015-07-18T13:38:04.483323061Z  0
2015-07-18T13:39:09.596573746Z  0


name: derivative
tags: host=r-ch100, interface=1, metric=tx.octets
time        value
----        -----
2015-07-18T12:46:59.814376523Z  0
2015-07-18T12:48:00.292337452Z  0
2015-07-18T12:49:01.362391836Z  0
2015-07-18T12:50:02.291535176Z  0
2015-07-18T12:51:02.777146816Z  0
2015-07-18T12:52:03.84500159Z 0
2015-07-18T12:53:04.566373596Z  0
2015-07-18T12:54:09.544230161Z  0
2015-07-18T12:55:10.48974113Z 0
2015-07-18T12:56:11.032440534Z  0
2015-07-18T12:57:11.842232375Z  0
2015-07-18T12:58:12.638672165Z  0
2015-07-18T12:59:13.465461165Z  0
2015-07-18T13:00:14.020307875Z  0
2015-07-18T13:01:15.051124987Z  0
2015-07-18T13:02:19.909770168Z  0
2015-07-18T13:29:54.241441173Z  0
2015-07-18T13:30:55.02937143Z 0
2015-07-18T13:30:55.02937143Z 0
2015-07-18T13:31:59.915419978Z  0
2015-07-18T13:31:59.915419978Z  0
2015-07-18T13:33:00.56063724Z 0
2015-07-18T13:33:00.56063724Z 0
2015-07-18T13:34:01.382438398Z  0
2015-07-18T13:34:01.382438398Z  0
2015-07-18T13:35:02.008356495Z  0
2015-07-18T13:36:03.219923133Z  0
2015-07-18T13:37:03.93866763Z 0
2015-07-18T13:38:04.584081351Z  0
2015-07-18T13:39:09.722388757Z  0


name: derivative
tags: host=r-ch100, interface=1, metric=tx.pkts
time        value
----        -----
2015-07-18T12:46:59.918136267Z  0
2015-07-18T12:48:00.421732471Z  0
2015-07-18T12:49:01.533039315Z  0
2015-07-18T12:50:02.473005306Z  0
2015-07-18T12:51:02.911566842Z  0
2015-07-18T12:52:03.953835852Z  0
2015-07-18T12:53:04.657472913Z  0
2015-07-18T12:54:09.62677894Z 0
2015-07-18T12:55:10.64901478Z 0
2015-07-18T12:56:11.203314923Z  0
2015-07-18T12:57:11.996149625Z  0
2015-07-18T12:58:12.75333229Z 0
2015-07-18T12:59:13.024632628Z  0
2015-07-18T13:00:14.135801554Z  0
2015-07-18T13:01:15.166540296Z  0
2015-07-18T13:02:20.025662209Z  0
2015-07-18T13:29:53.766848037Z  0
2015-07-18T13:30:54.601013848Z  0
2015-07-18T13:30:54.601013848Z  0
2015-07-18T13:32:00.01959436Z 0
2015-07-18T13:32:00.01959436Z 0
2015-07-18T13:33:00.696970881Z  0
2015-07-18T13:33:00.696970881Z  0
2015-07-18T13:34:00.994986308Z  0
2015-07-18T13:34:00.994986308Z  0
2015-07-18T13:35:02.109055424Z  0
2015-07-18T13:36:02.464817405Z  0
2015-07-18T13:37:03.488690378Z  0
2015-07-18T13:38:04.706812357Z  0
2015-07-18T13:39:09.859520609Z  0

There are even some duplicate timestamps now; I have already reported that in other issues:

#3373
#3381

@kruckenb

@gucki I found that I had to set the GROUP BY time() interval to at least 2x the sampling interval, so in your case probably 65s since some of your samples are later than 1 minute. I also had to pass derivative()'s optional unit parameter, as in derivative(value, 1s), since I wanted per-second rates.

That fixed most of the problem, but I still had buckets that were empty because of how the intervals are computed. #3383 fixes that.

@kruckenb

@gucki I noticed that dups happen when influx is restarted. Not sure if that explains all of your dups.

@gucki

gucki commented Jul 18, 2015

I constantly get invalid/duplicate values (#3373) without restarting influx.

I just tried to use an aggregated table for the derivative CQ in order to avoid the sample-rate inaccuracies (the aggregated table only contains well-aligned, rounded timestamps).

CREATE CONTINUOUS QUERY "net.derivative.1m" ON netskin BEGIN SELECT derivative(value) AS value INTO "netskin"."1m".derivative FROM "netskin"."1m".net GROUP BY time(1m), * END

But now the CQ doesn't seem to run at all again. In the logs I can see:

[continuous_querier] 2015/07/18 16:20:45 timeout
[continuous_querier] 2015/07/18 16:20:45 error during recompute previous: timeout. running: SELECT derivative(value, 1m) AS value INTO "netskin"."1m".derivative FROM "netskin"."1m".net WHERE time >= '2015-07-18 14:19:00' AND time < '2015-07-18 14:20:00' GROUP BY time(1m), *

Your idea of using a larger GROUP BY interval does work:

CREATE CONTINUOUS QUERY "net.derivative.1m" ON "netskin" BEGIN SELECT DERIVATIVE(value, 1m) AS value INTO "netskin"."1m"."derivative" FROM "netskin"."default"."net" GROUP BY time(2m),* END

@pdf

pdf commented Jul 18, 2015

Besides fixing any actual bugs, generally what people want for fixing derivative calculation (CQ or not) is probably #3273.

@sharang

sharang commented Oct 9, 2015

Not fixed yet? We have this problem in 0.9.3 through 0.9.4.2.

continuous query definition:

switch_traffic_1m_mean  CREATE CONTINUOUS QUERY switch_traffic_1m_mean ON livecloud BEGIN SELECT derivative(last(counter), 1s) AS "rate" INTO "livecloud"."rp_1d".switch_traffic_1m_mean FROM "livecloud"."default".switch_traffic_raw GROUP BY time(1m), * END

The ad hoc CLI result is correct, but every 'rate' value in the CQ result is zero (all of the following queries use the same WHERE clause):

[raw data]

QL: SELECT * FROM switch_traffic_raw WHERE time >=  1444400400s AND time <= 1444400580s AND switch_id = '2' AND instance_id = '5' AND instance_type='rx_bytes'

name: switch_traffic_raw
------------------------
time            counter     instance_id instance_name   instance_type   switch_id   switch_mip
2015-10-09T14:20:29Z    568090710   5       MEth0/0/0   rx_bytes    2       69.28.56.252
2015-10-09T14:21:29Z    568101438   5       MEth0/0/0   rx_bytes    2       69.28.56.252
2015-10-09T14:22:28Z    568119778   5       MEth0/0/0   rx_bytes    2       69.28.56.252

[LAST result]

QL: SELECT last(counter) AS rate FROM switch_traffic_raw WHERE time >=  1444400400s AND time <= 1444400580s AND switch_id = '2' AND instance_id = '5' AND instance_type='rx_bytes' GROUP BY time(1m), *

name: switch_traffic_raw
tags: instance_id=5, instance_name=MEth0/0/0, instance_type=rx_bytes, switch_id=2, switch_mip=69.28.56.252
time            rate
----            ----
2015-10-09T14:20:00Z    568090710
2015-10-09T14:21:00Z    568101438
2015-10-09T14:22:00Z    568119778
2015-10-09T14:23:00Z    568130654

[DERIVATIVE result]

QL: SELECT derivative(last(counter),1s) AS rate FROM switch_traffic_raw WHERE time >=  1444400400s AND time <= 1444400580s AND switch_id = '2' AND instance_id = '5' AND instance_type='rx_bytes' GROUP BY time(1m), *

name: switch_traffic_raw
tags: instance_id=5, instance_name=MEth0/0/0, instance_type=rx_bytes, switch_id=2, switch_mip=69.28.56.252
time            rate
----            ----
2015-10-09T14:21:00Z    178.8
2015-10-09T14:22:00Z    305.6666666666667
2015-10-09T14:23:00Z    181.26666666666668

[CQ result]

QL: SELECT * FROM livecloud.rp_1d.switch_traffic_1m_mean WHERE time >=  1444400400s AND time <= 1444400580s AND switch_id = '2' AND instance_id = '5' AND instance_type='rx_bytes'

name: switch_traffic_1m_mean
----------------------------
time            instance_id instance_name   instance_type   rate    switch_id   switch_mip
2015-10-09T14:20:00Z    5       MEth0/0/0   rx_bytes    0   2       69.28.56.252
2015-10-09T14:21:00Z    5       MEth0/0/0   rx_bytes    0   2       69.28.56.252
2015-10-09T14:22:00Z    5       MEth0/0/0   rx_bytes    0   2       69.28.56.252
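The ad hoc DERIVATIVE values above can be checked by hand: each output is the difference between consecutive last(counter) buckets divided by the bucket spacing expressed in units of 1s, e.g. (568101438 - 568090710) / 60 = 178.8. The Go sketch below does that arithmetic and also shows why a CQ run that only covers one 1m bucket has nothing to difference against. `bucket` and `bucketDerivatives` are hypothetical names and timestamps are Unix seconds; this illustrates the math, not InfluxDB's implementation.

```go
package main

import "fmt"

// bucket is one GROUP BY time() interval: its start (Unix seconds) and the
// aggregated value for it, e.g. last(counter).
type bucket struct {
	start int64
	value float64
}

// bucketDerivatives returns the rate of change between consecutive buckets,
// expressed per unitSeconds (1 for derivative(..., 1s)). The first bucket has
// no predecessor, so it produces no output point.
func bucketDerivatives(buckets []bucket, unitSeconds float64) []bucket {
	var out []bucket
	for i := 1; i < len(buckets); i++ {
		prev, cur := buckets[i-1], buckets[i]
		units := float64(cur.start-prev.start) / unitSeconds
		out = append(out, bucket{start: cur.start, value: (cur.value - prev.value) / units})
	}
	return out
}

func main() {
	// last(counter) per 1m bucket from the [LAST result] above;
	// 1444400400 is 2015-10-09T14:20:00Z.
	last := []bucket{
		{1444400400, 568090710},
		{1444400460, 568101438},
		{1444400520, 568119778},
		{1444400580, 568130654},
	}
	for _, d := range bucketDerivatives(last, 1) {
		fmt.Println(d.start, d.value)
	}

	// A CQ run covering a single 1m bucket has nothing to difference against.
	fmt.Println(len(bucketDerivatives(last[:1], 1))) // 0
}
```

The loop prints 178.8, 305.6666666666667 and 181.26666666666668 for the 14:21, 14:22 and 14:23 buckets, matching the ad hoc result; the single-bucket case yields no points at all, which is consistent with the CQ writing zeros instead.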

@pdf

pdf commented Oct 9, 2015

@sharang please fix your formatting and/or post large chunks as gists - scrolling through pages of cruft makes the issue hard to read.

@sharang

sharang commented Oct 9, 2015

@pdf formatting fixed

@beckettsean
Contributor

@jwilder still an issue in 0.9.5 nightlies. Can you investigate?

@dswarbrick

Seems to still be broken as of 0.9.5-nightly-6ecb62e.

> show continuous queries
name: perfdb
------------
name    query
testcq1 CREATE CONTINUOUS QUERY testcq1 ON perfdb BEGIN SELECT derivative(mean("read"), 1s) INTO "perfdb"."default".read_sectors_delta FROM "perfdb"."default".disk_ops GROUP BY time(2m), * END
testcq2 CREATE CONTINUOUS QUERY testcq2 ON perfdb BEGIN SELECT mean("read") INTO "perfdb"."default".read_sectors_mean FROM "perfdb"."default".disk_ops GROUP BY time(2m), * END
> select * from read_sectors_mean where time > now() - 10m
name: read_sectors_mean
-----------------------
time            host        mean        plugin  plugin_instance type
1447271880000000000 fra-influxdb01  7103.5      disk    vda     disk_ops
1447271880000000000 fra-influxdb01  4.92721e+06 disk    vdb     disk_ops
1447272000000000000 fra-influxdb01  7122        disk    vda     disk_ops
1447272000000000000 fra-influxdb01  5.0821435e+06   disk    vdb     disk_ops
1447272120000000000 fra-influxdb01  7185.5      disk    vda     disk_ops
1447272120000000000 fra-influxdb01  5.252931e+06    disk    vdb     disk_ops
1447272240000000000 fra-influxdb01  7192        disk    vda     disk_ops
1447272240000000000 fra-influxdb01  5.348674e+06    disk    vdb     disk_ops
> select * from read_sectors_delta where time > now() - 10m
name: read_sectors_delta
------------------------
time            derivative  host        plugin  plugin_instance type
1447271880000000000 0       fra-influxdb01  disk    vda1        disk_ops
1447271880000000000 0       fra-influxdb01  disk    vdb     disk_ops
1447271880000000000 0       fra-influxdb01  disk    vda     disk_ops
1447271880000000000 0       fra-influxdb01  disk    vda2        disk_ops
1447271880000000000 0       fra-influxdb01  disk    vdb1        disk_ops
1447272000000000000 0       fra-influxdb01  disk    vda2        disk_ops
1447272000000000000 0       fra-influxdb01  disk    vdb1        disk_ops
1447272000000000000 0       fra-influxdb01  disk    vda1        disk_ops
1447272000000000000 0       fra-influxdb01  disk    vda     disk_ops
1447272000000000000 0       fra-influxdb01  disk    vdb     disk_ops
1447272120000000000 0       fra-influxdb01  disk    vda     disk_ops
1447272120000000000 0       fra-influxdb01  disk    vdb     disk_ops
1447272120000000000 0       fra-influxdb01  disk    vda2        disk_ops
1447272120000000000 0       fra-influxdb01  disk    vda1        disk_ops
1447272120000000000 0       fra-influxdb01  disk    vdb1        disk_ops
1447272240000000000 0       fra-influxdb01  disk    vda1        disk_ops
1447272240000000000 0       fra-influxdb01  disk    vdb1        disk_ops
1447272240000000000 0       fra-influxdb01  disk    vdb     disk_ops
1447272240000000000 0       fra-influxdb01  disk    vda     disk_ops
1447272240000000000 0       fra-influxdb01  disk    vda2        disk_ops

whereas:

> SELECT derivative(mean("read"), 1s) FROM "perfdb"."default".disk_ops WHERE time > now() - 10m GROUP BY time(2m), *
name: disk_ops
tags: host=fra-influxdb01, plugin=disk, plugin_instance=vda, type=disk_ops
time            derivative
----            ----------
1447272000000000000 0.15416666666666667
1447272120000000000 0.5291666666666667
1447272240000000000 0.775
1447272360000000000 0.7416666666666667
1447272480000000000 


name: disk_ops
tags: host=fra-influxdb01, plugin=disk, plugin_instance=vdb, type=disk_ops
time            derivative
----            ----------
1447272000000000000 1291.1125
1447272120000000000 1423.2291666666667
1447272240000000000 819.3125
1447272360000000000 80.9875
1447272480000000000 

The odd thing is that derivative points calculated by testcq1 include plugin_instance vda1 and vdb1, which are no longer being inserted into the source measurement - as if testcq1 has warped back in time.

@beckettsean beckettsean removed this from the 0.9.4 milestone Dec 4, 2015
@bbczeuz

bbczeuz commented Dec 23, 2015

Workaround

Here is a small hack to work around the problem: use cron instead of a CQ to calculate the derivatives, querying a time window larger than the GROUP BY interval:

-bash-4.2$ sudo cat /etc/cron.d/influx_regular 
# Manually run derivatives
* * * * * root /etc/influxdb/update_derivatives.sh

Using the script:

-bash-4.2$ sudo cat /etc/influxdb/update_derivatives.sh 
#!/bin/bash
nowsec=`date +%s`
nowrfc0=`echo $nowsec | awk '{ print "@"$0-120 }' | xargs date -u +%Y-%m-%dT%H:%M:00Z -d`
nowrfc1=`echo $nowsec | awk '{ print "@"$0    }' | xargs date -u +%Y-%m-%dT%H:%M:00Z -d`

/bin/influx -database 'collectd_db' -username 'cronjob' -password 'SECRET' -execute "SELECT non_negative_derivative(mean(value))/60 AS \"dvalue\" INTO collectd_db.store30dd_test2.:MEASUREMENT FROM collectd_db.store1h./^interface_[rt]x/ WHERE time >= '$nowrfc0' AND time < '$nowrfc1' GROUP BY time(1m), host, instance, type, type_instance"

--> The dates must be rounded down to the minute (date modulo 1m); otherwise the window is shifted somewhat and the values are incorrect.

Log

[query] 2015/12/23 14:04:05 SELECT non_negative_derivative(mean(value)) / 60.000 AS "dvalue" INTO collectd_db.store30dd_test2.:MEASUREMENT FROM collectd_db.store1h./^interface_[rt]x/ WHERE time >= '2015-12-23T13:02:00Z' AND time < '2015-12-23T13:04:00Z' GROUP BY time(1m), host, instance, type, type_instance

@beckettsean
Contributor

@bbczeuz I see nothing wrong with your setup. I suspect that NON_NEGATIVE_DERIVATIVE has some regression issues when used in CQs. The query engine is being reworked which may help.

@sharang

sharang commented Dec 31, 2015

I think the following change could fix this bug, but I'm new to Go and failed to build the entire project :(
Can anyone help fix it?

diff --git a/services/continuous_querier/service.go b/services/continuous_querier/service.go
index 9e218bd..8694bfc 100644
--- a/services/continuous_querier/service.go
+++ b/services/continuous_querier/service.go
@@ -274,7 +274,12 @@ func (s *Service) ExecuteContinuousQuery(dbi *meta.DatabaseInfo, cqi *meta.Conti
                startTime = startTime.Add(-interval)
        }

-       if err := cq.q.SetTimeRange(startTime, startTime.Add(interval)); err != nil {
+       rangeStart := startTime
+       if cq.q.HasDerivative() {
+               // derivative() needs the previous interval, so widen the window by one interval
+               rangeStart = startTime.Add(-interval)
+       }
+       if err := cq.q.SetTimeRange(rangeStart, startTime.Add(interval)); err != nil {
                s.Logger.Printf("error setting time range: %s\n", err)
        }

@@ -297,7 +302,12 @@ func (s *Service) ExecuteContinuousQuery(dbi *meta.DatabaseInfo, cqi *meta.Conti
                }
                newStartTime := startTime.Add(-interval)

-               if err := cq.q.SetTimeRange(newStartTime, startTime); err != nil {
+               rangeStart := newStartTime
+               if cq.q.HasDerivative() {
+                       // same widening for the recomputed previous intervals
+                       rangeStart = newStartTime.Add(-interval)
+               }
+               if err := cq.q.SetTimeRange(rangeStart, startTime); err != nil {
                        s.Logger.Printf("error setting time range: %s\n", err)
                        return err
                }

@dswarbrick

FWIW, issue is still present in 0.10.0-nightly-6ccc416. CQ appears to run, but all values are zero.

CREATE CONTINUOUS QUERY io_sectors_write_delta ON perfdb BEGIN SELECT derivative(mean("write"), 1s) AS write_delta INTO io_sectors_delta FROM disk_octets GROUP BY time(1m), * END

Raw data is sampled every minute, so I also tried a CQ with GROUP BY time(2m), to no avail. Running the query interactively yields the expected, non-zero values.

@sharang

sharang commented Jan 19, 2016

Not fixed yet? This bug is still present in almost all branches. @beckettsean @otoolep

@beckettsean
Contributor

@sharang it is not fixed yet, that's why the issue is still open.

sharang added a commit to sharang/influxdb that referenced this issue Feb 16, 2016
@sharang

sharang commented Feb 16, 2016

@beckettsean I fixed it in v0.10.0 and submitted pull request #5698.

sharang added a commit to sharang/influxdb that referenced this issue Feb 18, 2016
@kamsz

kamsz commented Mar 23, 2016

Sadly still not fixed, discussion in #5733

jsternberg added a commit that referenced this issue Mar 23, 2016
For aggregate queries, derivatives will now alter the start time to one
interval behind and will use that interval to find the derivative of the
first point instead of giving no value for that interval.

This does not apply to raw queries yet.

Fixes #3247. Contributes to #5943.
jsternberg added a commit that referenced this issue Apr 4, 2016
For aggregate queries, derivatives will now alter the start time to one
interval behind and will use that interval to find the derivative of the
first point instead of giving no value for that interval. Null values
will still be discarded so if the interval before the one you are
querying is null, then it will be discarded like if it were in the
middle of the query. You can use `fill(0)` to fill in these values.

This does not apply to raw queries yet.

Also modified the derivative and difference aggregates to use the stream
iterator instead of the reduce slice iterator for space efficiency.

Fixes #3247. Contributes to #5943.
@jsternberg jsternberg self-assigned this Apr 4, 2016
jsternberg added a commit that referenced this issue Apr 8, 2016
jsternberg added a commit that referenced this issue Apr 11, 2016
@jsternberg jsternberg added this to the 0.13.0 milestone Apr 11, 2016
jsternberg added a commit that referenced this issue Apr 13, 2016
jsternberg added a commit that referenced this issue Apr 15, 2016
@paulstuart

Is this in the nightly build? I don't see it resolved in 0.13.0~n201604180800.

A newly created CQ with a derivative still has a time range generated for the grouping:

[query] 2016/04/18 09:11:00 SELECT derivative(mean(value)) AS value INTO network."default".btest01 FROM network."default".vsvrTotalRequests WHERE time >= '2016-04-18T16:10:00Z' AND time < '2016-04-18T16:11:00Z' GROUP BY time(1m), *

@jsternberg
Contributor

@paulstuart the actual query won't be modified to change the times. The time change is done inside of the query engine itself and doesn't cause the condition to be rewritten.
