Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow DISTINCT function to operate on tags #3880

Open
TechniclabErdmann opened this issue Aug 28, 2015 · 80 comments
Open

Allow DISTINCT function to operate on tags #3880

TechniclabErdmann opened this issue Aug 28, 2015 · 80 comments

Comments

@TechniclabErdmann
Copy link

TechniclabErdmann commented Aug 28, 2015

I would like to have following feature:

Since the new release 0.9.3 tags are resulting as own columns if you use SELECT * FROM measurement

Currently, it's not possible to use commands on this columns. An example:

SELECT * FROM measurements

returns:

time tagA tagB value
xxx M N 0.3
xxx M O 0.4
xxx P R 0.2

I want to do a query like:

SELECT count(distinct(tagA)) FROM measurements

The result is

2 (M+P)

Anyone else need this feature?

@TechniclabErdmann
Copy link
Author

Idea also exist in #1815 at "Not currently implemented (might in the future, but no promises)"

@desa desa changed the title SELECT tag columns of new SELECT * [feature request] SELECT tag columns of new SELECT * Aug 28, 2015
@beckettsean
Copy link
Contributor

@The-Nik are you asking for DISTINCT to support tags, or are you asking for the same functionality that SELECT * used to do? To get the similar SELECT * behavior, just include a GROUP BY *

@TechniclabErdmann
Copy link
Author

I ask for DISTINCT to support tags for counting the different tag values 👍

@beckettsean beckettsean changed the title [feature request] SELECT tag columns of new SELECT * [feature request] allow DISTINCT function to operate on tags Aug 31, 2015
@beckettsean beckettsean added this to the Longer term milestone Sep 1, 2015
@beckettsean
Copy link
Contributor

@The-Nik you can use SHOW TAG VALUES plus some shell to get what you want:

$ influx -execute 'show tag values with key = a' -database mydb will print a list of all tag values associated with the key a on the database mydb. The output has two header lines, so if you pass it to wc -l just subtract 2 for the actual count:

$ influx -execute 'show tag values with key = a' -database mydb | wc -l and then subtract 2 from the output.

@TechniclabErdmann
Copy link
Author

Yeah, this is a good way. But in my case, I need the number in Grafana in a single stat panel. In Grafana, there are some aggregate fuctions but no "count". So InfluxDB has to serve the exact value or I build something in my Grafana like a count-function ;-)

@yee379
Copy link

yee379 commented Sep 30, 2015

+1; being able to quickly summarise the distinct number of datasets / tags directly from the influx SQL would be very handy; e.g. a grafana panel of the number of sensors I have reporting data over time.

@edennis-sge
Copy link

+1: I have a similar use case to yee379 in mind.

@jakefoster
Copy link

+1 on being able to count distinct tags.

I also feel like this speaks to the deeper issue of providing guidance on what should be a tag vs. a value. For a schema-less DB there's sure a lot of subtlety around defining your schema! :)

@morganda
Copy link

morganda commented Oct 7, 2015

I would also like to see this. We use the cpu and load plugins which themselves don't explicitly provide the cpu count. They do provide each cpu as an "instance" tag e.g. a box with 32 cpus will collect metrics on 32 individual cpus tagging them with with their instance number. If I could get the total count of cpus from the tags, then the load numbers would have a little more context in our graphs Grafana and Chronograph.

@JulienChampseix
Copy link

+1, any progress on it ?

@RobertAtomic
Copy link

+1. this would makes things quite a bit simpler for some tasks at hand.

@rafael84
Copy link

+1

@ohmystack
Copy link

+1 Really need this. Tag should also support a kind of normal SELECT search, which can be handled by Grafana.

@thepolina
Copy link

+1
Desperately need this

@Guibod
Copy link

Guibod commented Feb 6, 2016

+ 1

Anybody as a solution to count my hosts in Grafana through Influxdb query language ?

@tomhallam
Copy link

+1

2 similar comments
@selzoc
Copy link

selzoc commented Feb 11, 2016

+1

@mosoto
Copy link

mosoto commented Feb 11, 2016

+1

@davidgardner11
Copy link

++++++1 This would really help pulling some of our metrics much much easier

@brumfb
Copy link

brumfb commented Feb 11, 2016

+1

@beckettsean
Copy link
Contributor

Perhaps a better way to accomplish the same goals: #5668

@bbala-github
Copy link

+1

2 similar comments
@orangle
Copy link

orangle commented Jan 6, 2017

+1

@damarnez
Copy link

+1

@SimSimY
Copy link

SimSimY commented Jan 23, 2017

+1 (as described by @cnelissen)

@lucadistefano
Copy link

+1

1 similar comment
@JamesClonk
Copy link

+1

@jsternberg
Copy link
Contributor

Please leave +1 comments to adding a 👍 to the top post using a reaction. Leaving a message notifies everybody who is participating in this conversation and doesn't add anything to the discussion.

@joriws
Copy link

joriws commented Feb 10, 2017

My need is to count number of unique tags with tag filters in Grafana. I can count fields but it gives incorrect answer. SHOW SERIES cannot be limited enough like return only one TAG which I could the distinct + count.

SELECT count("Incoming_Answers_2xxx") FROM "Realm-day" WHERE "INSTANCE" =~ /IPXDEA/ AND "ANSWERHOST" =~ /dtag/ AND "REALM" =~ /mcc2/ AND time > '2017-02-09T00:00:00Z';
name: Realm-day

time count
1486598400000000001 204

There I would like to have
SELECT count(distict(REALM)) FROM "Realm-day" WHERE "INSTANCE" =~ /IPXDEA/ AND "ANSWERHOST" =~ /dtag/ AND "REALM" =~ /mcc2/ AND time > '2017-02-09T00:00:00Z';
name: Realm-day

time count
1486598400000000001 34

Or someting like SHOW SERIES COUNT(DISTICT(TAG("REALM"))) FROM "Realm-day" WHERE "INSTANCE" =~ /IPXDEA/ AND "ANSWERHOST" =~ /dtag/ AND "REALM" =~ /mcc2/ AND time > '2017-02-09T00:00:00Z';

@lpic10
Copy link

lpic10 commented Feb 10, 2017

I managed to achieve this by using subqueries in influxdb 1.2

Eg. getting number of hosts from telegraf in grafana:

select count(tot) from (SELECT mean("used") as tot FROM "mem" WHERE $timeFilter GROUP BY "host" fill(null))

I'm using a measurement and a field I know it will always be present, it could be anything.

@biker73
Copy link

biker73 commented Feb 10, 2017 via email

@lpic10
Copy link

lpic10 commented Feb 10, 2017

Yes, but that's what I expected. If there is no data for a particular host during the selected query period I don't want to consider it. You can remove or maybe increase this time restriction in the WHERE clause, but then I guess the query can be quite slow.

@biker73
Copy link

biker73 commented Feb 10, 2017 via email

@juddgaddie
Copy link

+1

@samjetski
Copy link

samjetski commented Mar 15, 2017

In my case I needed to display the number of sensors which reported within a time interval (to indicate confidence of the mean). I managed to work around it with a subquery, but it's a bit filthy:

SELECT count("first") FROM (
  SELECT first("value") FROM "temperature"
  WHERE "topic" =~ /hub0[1234567]\/sensors\/\d+\/temperature/ AND $timeFilter
  GROUP BY time($interval), topic
)
WHERE $timeFilter
GROUP BY time($interval)

@lin-credible
Copy link

+1

1 similar comment
@ampersand8
Copy link

+1

@jsternberg
Copy link
Contributor

I'm locking this to prevent further 👍 messages. We will be discussing this to figure out the feasibility of the request and create a timeline. Please push the "Subscribe" button instead to get any updates about this feature.

@influxdata influxdata locked and limited conversation to collaborators Mar 28, 2017
@rbetts rbetts added this to the 1.4.0 milestone Oct 27, 2017
@rbetts
Copy link
Contributor

rbetts commented Oct 27, 2017

WIP: there's some work completed to allow distinct / count against a tag key and tag value.

> select distinct(_tagKey) from httpd
name: httpd
time distinct
---- --------
0    bind
0    hostname

> select count(distinct(_tagKey)) from httpd
name: httpd
time count
---- -----
0    2

But there are still wrong answers being resolved:

> select _tagKey,_tagValue from tsm1_wal
name: tsm1_wal
time _tagKey         _tagValue
---- -------         ---------
0    database        _internal
0    engine          tsm1
0    hostname        nuc
0    id              1
0    path            /home/rbetts/.influxdb/data/_internal/monitor/1
0    retentionPolicy monitor
0    walPath         /home/rbetts/.influxdb/wal/_internal/monitor/1
> select _tagKey,_tagValue from tsm1_wal^C
> select _tagKey,_tagValue from tsm1_wal where _tagKey=engine
name: tsm1_wal
time _tagKey         _tagValue
---- -------         ---------
0    database        
0    engine          
0    hostname        
0    id              
0    path            
0    retentionPolicy 
0    walPath    

@rbetts rbetts added in progress and removed review labels Oct 27, 2017
@rbetts rbetts changed the title [feature request] allow DISTINCT function to operate on tags Allow DISTINCT function to operate on tags Oct 27, 2017
@dgnorton dgnorton added the 1.x label Jan 7, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests