Allow DISTINCT function to operate on tags #3880

TechniclabErdmann · 2015-08-28T08:00:57Z

I would like to request the following feature:

Since release 0.9.3, tags are returned as their own columns when you run SELECT * FROM measurement.

Currently, it's not possible to use functions on these columns. An example:

SELECT * FROM measurements

returns:

time tagA tagB value
xxx M N 0.3
xxx M O 0.4
xxx P R 0.2

I want to do a query like:

SELECT count(distinct(tagA)) FROM measurements

The expected result would be:

2 (M+P)

Anyone else need this feature?


TechniclabErdmann · 2015-08-28T08:16:27Z

The idea also appears in #1815 under "Not currently implemented (might in the future, but no promises)".

beckettsean · 2015-08-28T19:14:58Z

@The-Nik are you asking for DISTINCT to support tags, or are you asking for the same functionality that SELECT * used to provide? To get behavior similar to the old SELECT *, just include a GROUP BY *.

TechniclabErdmann · 2015-08-31T12:33:49Z

I'm asking for DISTINCT to support tags, so that I can count the distinct tag values 👍

beckettsean · 2015-09-01T20:36:10Z

@The-Nik you can use SHOW TAG VALUES plus some shell to get what you want:

$ influx -execute 'show tag values with key = a' -database mydb

will print a list of all tag values associated with the key a on the database mydb. The output has two header lines, so if you pass it to wc -l, subtract 2 from the result to get the actual count:

$ influx -execute 'show tag values with key = a' -database mydb | wc -l
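The subtraction can be folded into the pipeline. The following is a minimal sketch, not part of the influx CLI: `count_data_rows` is a hypothetical helper name, and the header count is left as a parameter because the figure of two header lines comes from the comment above and may differ between CLI versions and output formats.

```shell
#!/bin/sh
# count_data_rows (hypothetical helper): read table output on stdin and print
# the number of data rows, skipping the first N header lines (default 2).
count_data_rows() {
    header_lines="${1:-2}"
    # tail -n +K starts printing at line K, so skipping N lines means K = N + 1.
    # tr strips the padding that some wc implementations add.
    tail -n +"$((header_lines + 1))" | wc -l | tr -d ' '
}

# Assumed usage against a live server (database and key names are placeholders):
#   influx -execute 'show tag values with key = a' -database mydb | count_data_rows 2
```

Any trailing blank line in the CLI output would still be counted, so it is worth checking the raw output once before relying on the number.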

TechniclabErdmann · 2015-09-03T13:51:52Z

Yeah, this is a good way. But in my case, I need the number in Grafana in a single stat panel. In Grafana, there are some aggregate functions but no "count". So either InfluxDB has to serve the exact value, or I build something like a count function in Grafana ;-)

yee379 · 2015-09-30T18:36:38Z

+1; being able to quickly summarise the distinct number of datasets / tags directly from the influx SQL would be very handy; e.g. a grafana panel of the number of sensors I have reporting data over time.

edennis-sge · 2015-10-02T02:59:44Z

+1: I have a similar use case to yee379 in mind.

jakefoster · 2015-10-02T19:22:07Z

+1 on being able to count distinct tags.

I also feel like this speaks to the deeper issue of providing guidance on what should be a tag vs. a value. For a schema-less DB there's sure a lot of subtlety around defining your schema! :)

morganda · 2015-10-07T21:02:28Z

I would also like to see this. We use the cpu and load plugins, which themselves don't explicitly provide the cpu count. They do provide each cpu as an "instance" tag, e.g. a box with 32 cpus will collect metrics on 32 individual cpus, tagging them with their instance number. If I could get the total count of cpus from the tags, then the load numbers would have a little more context in our Grafana and Chronograf graphs.

JulienChampseix · 2015-11-19T14:39:20Z

+1, any progress on it ?

RobertAtomic · 2015-11-20T21:12:38Z

+1. This would make things quite a bit simpler for some tasks at hand.

rafael84 · 2015-12-10T15:27:55Z

+1

ohmystack · 2016-01-27T03:33:29Z

+1 Really need this. Tags should also support a kind of normal SELECT search, which can be handled by Grafana.

thepolina · 2016-02-01T03:45:30Z

+1
Desperately need this

Guibod · 2016-02-06T10:49:20Z

+ 1

Does anybody have a solution to count my hosts in Grafana through the InfluxDB query language?

tomhallam · 2016-02-11T13:32:39Z

+1

selzoc · 2016-02-11T21:44:29Z

+1

mosoto · 2016-02-11T21:44:37Z

+1

davidgardner11 · 2016-02-11T22:16:41Z

++++++1 This would make pulling some of our metrics much, much easier

brumfb · 2016-02-11T22:22:18Z

+1

beckettsean · 2016-02-13T01:37:28Z

Perhaps a better way to accomplish the same goals: #5668

bbala-github · 2017-01-05T15:50:08Z

+1

orangle · 2017-01-06T05:30:51Z

+1

damarnez · 2017-01-20T15:54:03Z

+1

SimSimY · 2017-01-23T11:18:43Z

+1 (as described by @cnelissen)

lucadistefano · 2017-01-26T13:22:22Z

+1

JamesClonk · 2017-01-29T17:12:30Z

+1

jsternberg · 2017-01-29T17:53:37Z

Please limit +1 comments to adding a 👍 reaction to the top post. Leaving a message notifies everybody who is participating in this conversation and doesn't add anything to the discussion.

joriws · 2017-02-10T14:35:45Z

My need is to count the number of unique tag values, with tag filters, in Grafana. I can count fields, but that gives an incorrect answer. SHOW SERIES cannot be restricted enough, e.g. to return only one tag, on which I could then apply distinct + count.

SELECT count("Incoming_Answers_2xxx") FROM "Realm-day" WHERE "INSTANCE" =~ /IPXDEA/ AND "ANSWERHOST" =~ /dtag/ AND "REALM" =~ /mcc2/ AND time > '2017-02-09T00:00:00Z';
name: Realm-day

time count
1486598400000000001 204

Instead, I would like to be able to run:
SELECT count(distinct(REALM)) FROM "Realm-day" WHERE "INSTANCE" =~ /IPXDEA/ AND "ANSWERHOST" =~ /dtag/ AND "REALM" =~ /mcc2/ AND time > '2017-02-09T00:00:00Z';
name: Realm-day

time count
1486598400000000001 34

Or something like SHOW SERIES COUNT(DISTINCT(TAG("REALM"))) FROM "Realm-day" WHERE "INSTANCE" =~ /IPXDEA/ AND "ANSWERHOST" =~ /dtag/ AND "REALM" =~ /mcc2/ AND time > '2017-02-09T00:00:00Z';

lpic10 · 2017-02-10T15:03:46Z

I managed to achieve this by using subqueries in influxdb 1.2

Eg. getting number of hosts from telegraf in grafana:

select count(tot) from (SELECT mean("used") as tot FROM "mem" WHERE $timeFilter GROUP BY "host" fill(null))

I'm using a measurement and a field that I know will always be present; it could be anything.

biker73 · 2017-02-10T15:13:17Z

If you get no data for a host during the time period, won't it be missed? I don't think this can be 100% relied upon.


lpic10 · 2017-02-10T15:25:06Z

Yes, but that's what I expected. If there is no data for a particular host during the selected query period I don't want to consider it. You can remove or maybe increase this time restriction in the WHERE clause, but then I guess the query can be quite slow.
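The effect of the time window on the count can be illustrated outside InfluxDB. The sketch below is not InfluxQL and does not reproduce the subquery itself; it only mimics "count the hosts that have at least one point in the window" on made-up rows. The helper names and the sample data are hypothetical.

```shell
#!/bin/sh
# count_hosts_in_window (hypothetical helper): given "timestamp host value"
# rows on stdin, print how many distinct hosts have at least one row with
# FROM <= timestamp < TO.
count_hosts_in_window() {
    awk -v from="$1" -v to="$2" '
        $1 >= from && $1 < to && !seen[$2]++ { n++ }
        END { print n + 0 }
    '
}

# Made-up sample data: host "c" only reported early on.
sample_points() {
    printf '%s\n' \
        '100 a 0.5' \
        '100 b 0.7' \
        '100 c 0.1' \
        '200 a 0.6' \
        '200 b 0.8'
}

sample_points | count_hosts_in_window 150 250   # prints 2: host c is not counted
sample_points | count_hosts_in_window 0 250     # prints 3: the wider window includes c
```

Widening the window brings host c back into the count, which is the trade-off described above between an exact "currently reporting" count and an all-time count.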

biker73 · 2017-02-10T15:31:57Z

OK, so it's a slightly different use case. I think most people want a distinct list of tag key values regardless of time period, i.e. I'd want all time, across a year of data for example: potentially petabytes of data, where the series cardinality might be 2m.


juddgaddie · 2017-03-10T12:47:44Z

+1

samjetski · 2017-03-15T03:20:04Z

In my case I needed to display the number of sensors which reported within a time interval (to indicate confidence of the mean). I managed to work around it with a subquery, but it's a bit filthy:

SELECT count("first") FROM (
  SELECT first("value") FROM "temperature"
  WHERE "topic" =~ /hub0[1234567]\/sensors\/\d+\/temperature/ AND $timeFilter
  GROUP BY time($interval), topic
)
WHERE $timeFilter
GROUP BY time($interval)

lin-credible · 2017-03-21T12:54:32Z

+1

ampersand8 · 2017-03-27T07:28:49Z

+1

jsternberg · 2017-03-28T15:33:06Z

I'm locking this to prevent further 👍 messages. We will be discussing this to figure out the feasibility of the request and create a timeline. Please push the "Subscribe" button instead to get any updates about this feature.

rbetts · 2017-10-27T20:00:10Z

WIP: there's some work completed to allow distinct / count against a tag key and tag value.

> select distinct(_tagKey) from httpd
name: httpd
time distinct
---- --------
0    bind
0    hostname

> select count(distinct(_tagKey)) from httpd
name: httpd
time count
---- -----
0    2

But some queries still return wrong answers:

> select _tagKey,_tagValue from tsm1_wal
name: tsm1_wal
time _tagKey         _tagValue
---- -------         ---------
0    database        _internal
0    engine          tsm1
0    hostname        nuc
0    id              1
0    path            /home/rbetts/.influxdb/data/_internal/monitor/1
0    retentionPolicy monitor
0    walPath         /home/rbetts/.influxdb/wal/_internal/monitor/1
> select _tagKey,_tagValue from tsm1_wal^C
> select _tagKey,_tagValue from tsm1_wal where _tagKey=engine
name: tsm1_wal
time _tagKey         _tagValue
---- -------         ---------
0    database        
0    engine          
0    hostname        
0    id              
0    path            
0    retentionPolicy 
0    walPath

desa changed the title SELECT tag columns of new SELECT * [feature request] SELECT tag columns of new SELECT * Aug 28, 2015

beckettsean changed the title [feature request] SELECT tag columns of new SELECT * [feature request] allow DISTINCT function to operate on tags Aug 31, 2015

beckettsean added this to the Longer term milestone Sep 1, 2015

beckettsean added the area/queries label Sep 1, 2015

beckettsean mentioned this issue Sep 3, 2015

[feature request] SHOW TAG VALUES should respect fields in the WHERE clause #3040

Closed

beckettsean mentioned this issue Sep 28, 2015

feature request: treat tags like fields in select? #4253

Closed

beckettsean mentioned this issue Oct 30, 2015

query parser should throw error on nested functions with tag arguments #4618

Closed

jackzampolin added the kind/feature-request label Nov 3, 2015

beckettsean mentioned this issue Feb 13, 2016

[feature request] SHOW TAG KEYS/VALUES should accept a WHERE time clause #5668

Open

jsternberg mentioned this issue Mar 16, 2017

[bug] simple query becomes extremely slow when nested #8120

Closed

influxdata locked and limited conversation to collaborators Mar 28, 2017

rbetts added this to the 1.4.0 milestone Oct 27, 2017

rbetts assigned e-dard Oct 27, 2017

rbetts added backlog/storage review labels Oct 27, 2017

rbetts added in progress and removed review labels Oct 27, 2017

rbetts changed the title ~~[feature request] allow DISTINCT function to operate on tags~~ Allow DISTINCT function to operate on tags Oct 27, 2017

rbetts removed the backlog/storage label Jan 30, 2018

nathanielc added the flux/triaged label Jan 30, 2018

dgnorton added the 1.x label Jan 7, 2019
