Mathematics across measurements #3552

Open
srfraser opened this issue Aug 4, 2015 · 88 comments
Labels
1.x area/influxql Issues related to InfluxQL query language flux/triaged kind/feature-request

Comments

@srfraser

srfraser commented Aug 4, 2015

Apologies if this is a duplicate, I had a look and couldn't see a relevant issue.

I can see from the documentation how to select from multiple measurements (although it calls them series, still, at https://influxdb.com/docs/v0.9/query_language/data_exploration.html )

For example, with data inserted by telegraf, you can do:
select * from disk_used,disk_total where host = 'myhostname' and path = '/'

How would you express that as a percentage? I've tried variations of the following, and none seem to work:

select disk_used.value/disk_total.value from disk_used, disk_total where host = 'myhostname' and path='/'

The "mydb"."retentionpolicy"."measurement" syntax doesn't work there, either.

Is it a good idea to add aggregation functions for cases like diff(value1, value2) from m1, m2 and divide(value, value) from m1, m2, or should the arithmetic operators be working?

Also, I noticed when experimenting that it's also not possible to divide one derivative by another. For example, if I have two counters, bytes transferred and api calls made - both of which are constantly going up - how would you calculate the mean bytes per api call?
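
Until a query like this is supported, both calculations can be done client-side after fetching each measurement separately. The following is a minimal sketch, assuming each result has already been reduced to a plain dictionary keyed by timestamp; the function names and sample values are hypothetical and not part of InfluxDB or telegraf.

```python
# Client-side sketch: each measurement is assumed to have been queried
# separately and reduced to {timestamp: value}. All names are hypothetical.

def percent_used(used, total):
    """Return {timestamp: used / total * 100} for timestamps present in both series."""
    result = {}
    for ts, used_value in used.items():
        total_value = total.get(ts)
        # Skip timestamps missing from the other series and avoid dividing by zero.
        if total_value:
            result[ts] = 100.0 * used_value / total_value
    return result


def mean_per_call(bytes_counter, calls_counter):
    """Mean bytes per call over a window, from the growth of two counters.

    Each argument is a (first_value, last_value) pair for the same window.
    Returns None when the call counter did not increase.
    """
    delta_bytes = bytes_counter[1] - bytes_counter[0]
    delta_calls = calls_counter[1] - calls_counter[0]
    if delta_calls <= 0:
        return None
    return delta_bytes / delta_calls


disk_used = {"2015-08-04T10:00:00Z": 40.0, "2015-08-04T10:01:00Z": 45.0}
disk_total = {"2015-08-04T10:00:00Z": 100.0, "2015-08-04T10:01:00Z": 100.0}
print(percent_used(disk_used, disk_total))
print(mean_per_call((1000, 5000), (10, 50)))
```

The sketch only matches points with identical timestamps; series sampled at different rates would need to be bucketed to a common interval first.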

@srfraser srfraser changed the title Mathematics across measurements Mathematics across measurements with 0.9.2 Aug 4, 2015
@srfraser srfraser changed the title Mathematics across measurements with 0.9.2 [0.9.2] Mathematics across measurements Aug 4, 2015
@beckettsean beckettsean changed the title [0.9.2] Mathematics across measurements [feature request] Mathematics across measurements Aug 4, 2015
@beckettsean beckettsean added this to the Longer term milestone Aug 25, 2015
@hexluthor

👍
I work with sensor networks and find this limitation frustrating. For example, I wish to compute weighted averages like this:
SELECT sum(oxygen_percentage.value * flow_rate.value) / sum(flow_rate.value) FROM oxygen_percentage, flow_rate WHERE site_id = '3'
But InfluxDB returns nothing. Even SELECT oxygen_percentage.value FROM oxygen_percentage doesn't work. Using 0.9.3-rc1 master (0163945).
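
For illustration, the weighted average described here can be computed outside the database. This is a rough sketch, assuming the two measurements were fetched separately into dictionaries keyed by timestamp; the helper name and the sample readings are made up.

```python
# Hypothetical client-side weighted average across two series.

def weighted_average(values, weights):
    """Compute sum(value * weight) / sum(weight) over shared timestamps."""
    shared = sorted(set(values) & set(weights))
    weight_sum = sum(weights[ts] for ts in shared)
    if weight_sum == 0:
        # No overlapping points, or all weights are zero.
        return None
    return sum(values[ts] * weights[ts] for ts in shared) / weight_sum


oxygen_percentage = {1: 20.0, 2: 22.0, 3: 21.0}
flow_rate = {1: 1.0, 2: 3.0}  # timestamp 3 has no flow reading and is ignored
print(weighted_average(oxygen_percentage, flow_rate))  # (20*1 + 22*3) / 4 = 21.5
```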

@ghost

ghost commented Sep 8, 2015

Same here. I'd also like to calculate values across different series like:

select * from mysql_value where type='mysql_commands' and type_instance='show_tables' +
select * from mysql_value where type='mysql_commands' and type_instance='show_databases'

Cheers,
Szop

@bbinet
Contributor

bbinet commented Sep 18, 2015

same as @hexluthor, I feel this is very limiting: if we need to correlate data coming from various sensors we currently have to write all data as fields in the same measurement... But would it be a good idea in terms of data structure to have a single measurement with more than 50 fields? Will it impact query performance?
Also, this sensor data is not always logged at the same sampling frequency, so it is not always possible to combine data in the same measurement if we want to keep the data with the high sampling frequency.

I'm not comfortable with distorting the data structure (dropping natural data organization) because of technical limitations.
In the sysadmin world, it would be like putting all the cpu, ram, disk, and apache response time metrics in the same measurement for the sole purpose of being able to correlate apache response time with cpu, ram, or disk metrics.

@bbinet
Contributor

bbinet commented Sep 18, 2015

Also, what are the actual technical issues that prevent InfluxDB from supporting queries with simple math operations across measurements?

@corylanou
Contributor

This was recently changed to a "feature request", which means we will evaluate in future releases whether or not to add it. There are a couple of workarounds right now; one is to save a calculated field when you write data, such as storing another field for oxygen_percentage.value * flow_rate.value. I understand this isn't ideal, but it might get you moving forward.

Otherwise, I think these requests are sane, but they will take some work. I believe sum() / sum() is supposed to work already, but I recall seeing a bug about math still not behaving properly.
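
As a rough illustration of the write-time workaround, the sketch below builds an InfluxDB line-protocol string that carries the product as an extra field. It assumes both readings are available to the writer at the same moment. The measurement name `site_readings`, the field name `weighted_oxygen`, and the helper itself are hypothetical, and escaping of special characters in tag values is omitted.

```python
# Hypothetical helper: store value * weight as its own field at write time.

def point_with_product(site_id, oxygen_percentage, flow_rate, timestamp_ns):
    """Return one line-protocol point that includes the precomputed product."""
    weighted = oxygen_percentage * flow_rate
    fields = "oxygen_percentage={},flow_rate={},weighted_oxygen={}".format(
        float(oxygen_percentage), float(flow_rate), float(weighted)
    )
    # Layout: measurement,tag_key=tag_value field_key=field_value,... timestamp
    return "site_readings,site_id={} {} {}".format(site_id, fields, timestamp_ns)


print(point_with_product("3", 21.0, 2.0, 1442837856000000000))
```

With the product stored as a field, a single-measurement query can aggregate it, at the cost of deciding up front which combinations to precompute.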

@bbinet
Contributor

bbinet commented Sep 21, 2015

@corylanou about the workaround you're talking about: should the oxygen_percentage.value * flow_rate.value field be created when new points are written, or is there a way to compute it afterwards in a continuous query?

@bbinet bbinet mentioned this issue Sep 21, 2015
@corylanou
Contributor

Yes, I believe you should be able to do that in a CQ and then you can select from that retention policy.

@srfraser
Author

How can we do it in a continuous query? I thought the syntax of normal queries and continuous ones was the same, so if it's possible in one, it should be possible in the other.

@corylanou
Contributor

Instead of sum(value * value), you do a CQ with select val * val as newval, and then you can select sum(newval) from the new data that the CQ calculated.

@srfraser
Author

And that works across measurements? Using @bbinet's example, this would work?

select oxygen_percentage.value * flow_rate.value as newmeasurement from oxygen_percentage, flow_rate 

@corylanou
Contributor

Hmm, it should, but I just tried this basic test and it crashed the server 😢

> create database math
> use math
Using database math
> insert mul a=1,b=2
> select * from mul
name: mul
---------
time                            a       b
2015-09-21T12:17:36.377625368Z  1       2

> select a*b as c from mul
ERR: Get http://localhost:8086/query?db=math&q=select+a%2Ab+as+c+from+mul: EOF

I logged another issue here: #4183

@srfraser
Author

and that was only from one measurement :)

@corylanou
Contributor

Hopefully this is a central bug in our post-processing that when fixed will fix all of it. I'll see if I can fix it today. It appears to be just a bad reference while putting the math together, so it might be a quick fix.

@bbinet
Contributor

bbinet commented Sep 21, 2015

Thanks @corylanou, but as @srfraser said in his previous comment, your example comes from the same measurement: is it supposed to work with multiple measurements?
I thought that continuous queries were the same as normal queries, so if math does not work across multiple measurements in a normal query, it won't work in a continuous query either.
Is that wrong?

@corylanou
Contributor

Ah, yes, I keep forgetting we don't calculate across values. Although in a simple query we should support this. The biggest problem is type checking and overflow: when you take an unsigned int and multiply it by a float, etc., we need to be able to convert to a common type for the math without overflowing.
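
The two hazards mentioned here can be shown in a few lines. This is only an illustrative sketch of the general problem, not InfluxDB's implementation: Python integers do not wrap, so the 64-bit limits are checked explicitly, and the helper names are hypothetical.

```python
# Illustration of overflow and precision loss when mixing numeric types.

INT64_MAX = 2**63 - 1
INT64_MIN = -(2**63)


def multiply_int64(a, b):
    """Multiply two integers, raising instead of exceeding the int64 range."""
    product = a * b  # Python ints are arbitrary precision, so this never wraps
    if not INT64_MIN <= product <= INT64_MAX:
        raise OverflowError("product does not fit in a signed 64-bit integer")
    return product


def loses_precision_as_float(value):
    """True if converting the integer to a 64-bit float changes its value."""
    return int(float(value)) != value


print(multiply_int64(3, 4))
print(loses_precision_as_float(2**53))      # exactly representable
print(loses_precision_as_float(2**53 + 1))  # rounds to a neighbouring float
```

Converting everything to a 64-bit float avoids integer overflow but silently drops precision for integers above 2**53, which is why choosing a common type is not trivial.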

@bbinet
Contributor

bbinet commented Sep 22, 2015

Ok, I see.
It would be great if cross-measurement calculation were possible at least for series that share the same type (since no type conversion would be needed).

@drmclean

drmclean commented Oct 7, 2015

+1 We REALLY want this for our use-case!

@thunderstumpges

@malnor

malnor commented Nov 26, 2015

+1, really missing this feature.

@Millnert

+1

@alintuhut

+1

@clongbottom

+1

@xaniasd

xaniasd commented Jan 27, 2016

👍

@inselbuch

Opening a feature request kicks off a discussion.
Requests may be closed if we're not actively planning to work on them.

Proposal: Implement SQL Joins in the Influx Query Language
Current behavior: Not supported.
Desired behavior: Supported.

Use case:

When requesting data from InfluxDB it would be very useful to combine metadata, configuration data, etc., from one "table" with the time-series data. For example:

assets (measurement)
time (actually not used in this table)
id (tag) = cryptic internal identifier
friendly (field) = character string
port (field) = number (TCP/IP port number used by the device)

i.e.,
insert assets,id=y268938rjnau3 friendly='Asset1',port=64200

rundata (measurement)
time
id (tag) = cryptic internal identifier
temperature (field)
pressure (field)

This works great:
select last(temperature) from rundata group by mac

But this is what I really want:

Timestamp Machine Temperature

10/05/2016 8:04:06pm C2475 1675.4
10/02/2016 9:02:11am C7524 850.5

That might be done like this:

select
c.time as "Timestamp", a.friendly as "Machine",last(c.temperature) as "Temperature"
from
rundata c,assets a
where
c.id = a.id
group by
c.mac
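
Without join support, the same table can be assembled client-side from two separate query results. The sketch below is a minimal example under assumed data shapes: the run data is a list of (time, id, temperature) tuples and the asset metadata is a dictionary from id to friendly name. The ids and readings are invented for the example.

```python
# Hypothetical client-side join of time-series rows with asset metadata.

def latest_with_names(rundata, assets):
    """Return (time, friendly_name, temperature) for the newest point per id."""
    latest = {}
    for time, asset_id, temperature in rundata:
        # Keep only the most recent point for each id.
        if asset_id not in latest or time > latest[asset_id][0]:
            latest[asset_id] = (time, temperature)
    rows = []
    for asset_id, (time, temperature) in sorted(latest.items()):
        # Fall back to the raw id when no metadata exists for it.
        rows.append((time, assets.get(asset_id, asset_id), temperature))
    return rows


rundata = [(1, "a1", 1600.0), (2, "a1", 1675.4), (1, "b2", 850.5)]
assets = {"a1": "C2475", "b2": "C7524"}
for row in latest_with_names(rundata, assets):
    print(row)
```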

@daviesalex
Contributor

daviesalex commented Nov 2, 2016

I don't know if this should be a separate issue, but we would really like to be able to do simple stuff like this:

SELECT non_negative_derivative(last(PortXmitData * 4), $interval) from ibstats WHERE "interface" =~ /ib.*/ AND "host" =~ /$hostname/ and $timeFilter group by time($granularity),interface,host fill(none)

Note the PortXmitData * 4; this blows up today. The reason for this specific case is that the metrics from IB equipment are returned as "octets divided by 4" (¼ of the actual number of bytes), which Grafana does not know how to deal with (because it's a stupid unit). However, there are other cases where this sort of simple arithmetic is very useful, and hopefully this is a fairly simple feature to implement.

One other feature that is commonly requested is the ability to plot a metric against the mean/95%ile for each item in a GROUP BY. Any method to achieve that is fine by us.

@inselbuch

I understand your issue… makes sense… the same issue in my opinion.
But I don’t see tx_bytes*4 in your query…

I dont know if this should be a separate issue, but we would really like to be able to do simple stuff like this:

SELECT non_negative_derivative(last(PortXmitData), $interval) from ibstats WHERE ("interface" =~ /ib./ OR "interface" =~ /p./) AND "host" =~ /$hostname/ and $timeFilter group by time($granularity),interface,host fill(none)

Note the tx_bytes * 4; this blows up today. The reason for this specific reason is that the metrics from IB equipment are returned as "octets divided by 4” (¼ of the actual number of Bytes)" which Grafana does not know how to deal with (because its a stupid unit). However, there are other cases where doing this sort of simple arithmetic is very useful, and hopefully this is a fairly simple feature to implement.

One other feature that is commonly requested is the ability to plot a metric against the mean/95%ile for each item in a GROUP BY. Any method to achieve that is fine by us.



@daviesalex
Contributor

@inselbuch my apologies, I fixed the example query.

The problem is that this works:
non_negative_derivative(last(PortXmitData), $interval)
And this does not:
non_negative_derivative(last(PortXmitData * 4), $interval)
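
Because a derivative is linear, multiplying a counter by a positive constant before differentiating gives the same result as multiplying the derivative afterwards, so the factor of 4 can be applied to the query result client-side. The sketch below demonstrates this with a simplified stand-in for non_negative_derivative written in Python; it is not InfluxDB's implementation, and whether a given InfluxQL version accepts the multiplication outside the function call is not confirmed by this thread.

```python
# Simplified stand-in for a non-negative derivative, to show that scaling
# by a positive constant commutes with it. Names are hypothetical.

def non_negative_derivative(points, unit_seconds=1.0):
    """points: list of (time_seconds, value) sorted by time.

    Returns (time, rate) pairs, dropping negative rates such as counter resets.
    """
    rates = []
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        rate = (v1 - v0) / (t1 - t0) * unit_seconds
        if rate >= 0:
            rates.append((t1, rate))
    return rates


def scale(points, factor):
    """Multiply every value in a list of (time, value) pairs by factor."""
    return [(t, v * factor) for t, v in points]


counter = [(0, 100), (10, 140), (20, 20), (30, 60)]  # drop at t=20 mimics a reset
scaled_first = non_negative_derivative(scale(counter, 4))
scaled_after = scale(non_negative_derivative(counter), 4)
print(scaled_first == scaled_after)
```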

@espiegel

espiegel commented Nov 3, 2016

+1

Edit: +1ed the OP

@cattt84

cattt84 commented Nov 7, 2016

+1

@joshzitting

joshzitting commented Dec 6, 2016

I am currently in the process of trying to switch from graphite to influx with grafana as our front end. I currently have queries like this for graphite, but I haven't found a way to convert them to influx yet.

asPercent(reporting1_dev.memory.used,reporting1_dev.memory.total)
asPercent(reporting1_dev.memory.buffers,reporting1_dev.memory.total)

Also like this

asPercent(nonNegativeDerivative(reporting1_dev.cpu.total.iowait),nonNegativeDerivative(sumSeries(reporting1_dev.cpu.total.*)))
asPercent(sumSeries(nonNegativeDerivative(reporting1_dev.cpu.total.{guest,iowait,nice,steal,irq,softirq})),nonNegativeDerivative(sumSeries(reporting1_dev.cpu.total.*)))

Any help would be great!!!
Thanks!
Josh

So I found that if I am doing math, all of the info has to be in the same table.

@ikkaro

ikkaro commented Dec 16, 2016

+1 we need to do some maths between measurements

@lpic10

lpic10 commented Jan 9, 2017

This is a huge feature missing from InfluxDB, making it much less powerful than Prometheus.

@actionjax

+1

@trondvh

trondvh commented Jan 18, 2017

+1

@andrewpuch

+1

@joshzitting

I think the point has been stated with all of the +1s... This thread should be locked until there is progress on it.

@tblok

tblok commented Feb 10, 2017

+1

@jsternberg
Contributor

jsternberg commented Feb 10, 2017

I'm locking this again. While that means you won't be able to add your 👍 reaction, I think there are enough of them that we're well aware people want this feature. We do want to hear any resources that may be useful in terms of implementing this. You can look in the commit log to find my email. If you want to wait for comments regarding this issue, please use the "Subscribe" button on the issue instead of responding to the issue.

Thank you.

@influxdata influxdata locked and limited conversation to collaborators Feb 10, 2017
@jsternberg jsternberg removed their assignment Apr 28, 2017
@rbetts rbetts removed this from the Longer term milestone Oct 27, 2017
@rbetts rbetts changed the title [feature request] Mathematics across measurements Mathematics across measurements Oct 27, 2017
@dgnorton dgnorton added the 1.x label Jan 7, 2019
@influxdata influxdata deleted a comment from nathaniel May 3, 2021
@influxdata influxdata deleted a comment from pauldix May 3, 2021