Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[feature request] SHOW TAG KEYS/VALUES should accept a WHERE time clause #5668

Open
beckettsean opened this issue Feb 13, 2016 · 76 comments

Comments

@beckettsean
Copy link
Contributor

commented Feb 13, 2016

Rather than update DISTINCT() to accept tags (#3880), we could accomplish much the same goal if the SHOW TAG VALUES query respected a WHERE time clause. SHOW TAG VALUES already returns the deduplicated set of values.

It is not trivial, since the query can no longer be answered directly from the metastore, but it seems easier than making functions accept tags, which is a totally new code path.

@furnacification

This comment has been minimized.

Copy link

commented Feb 15, 2016

+1 I tried to do this today. It would be a great help.

@groblus

This comment has been minimized.

Copy link

commented Feb 15, 2016

Perhaps it's not the best issue to comment on the case but one of the things that I find stange is the way that tags are treated. Quote from the docs:

You don’t need to have tags in your data structure, but it’s generally a good idea to make use of them because, unlike fields, tags are indexed. This means that queries on tags are faster and that tags are ideal for storing commonly-queried metadata.

So if you think of tags as fields on steroids - indexed fields, then having normal, field like query operators running on them seems natural. Creating a subset of SELECT that works on tags and is called SHOW is nowhere near a generic solution, it rather complicates stuff for the users. There will be cases when someone needs some sort of funcitonality that's already there but works on fields only so even more will be forced onto the SHOW command mirroring SELECT behaviour. I'm no pro to evaluate the difficultness of the task but from the client side I see a lot of sense in it.
From my point of view - the docs should state that tags are not ideal for storing commonly-queried metadata - the I ideal for using group by to group real data that is stored in fields. The way tags are used in InfluxQL makes them a second class data for me.

@beckettsean

This comment has been minimized.

Copy link
Contributor Author

commented Feb 16, 2016

@groblus it is not valid to think of tags as related to fields. They are completely different concepts. All tag keys and values are stored only as strings, and only once, in the index, for each series. Fields are stored as float, int, bool, or string, and are individually stored with each point, and are unindexed.

SHOW queries operate on metadata like databases, measurements, and tags. SELECT queries operate on the metrics themselves, which are the fields. Tag values may also be SELECTed, but only to differentiate the field results.

@AxelVoitier

This comment has been minimized.

Copy link

commented Feb 19, 2016

I would like to see this as well. Here is my use case.

With telegraf and procstat plugin a want to monitor a set of processes that all have the same name (but different pid, obvisouly). Then I plot the data on grafana dashboards, using template features.

With grafana template features I can create a variable based on a query. For instance, I have this template variable to filter the set of processes by the process name I want:

$pname = SHOW TAG VALUES FROM "procstat" WITH KEY = "process_name"

So far so good.
Now, I want to create a second template variable for pids that match pname. Unfortunately,
I cannot do something like this:

$pid = SHOW TAG VALUES FROM "procstat" WITH KEY = "pid" WHERE process_name=$pname

Although, and strangely, the query is considered valid if I try it in the web interface of influxdb. But it returns nothing. (of course in the web interface of influxdb I replaced $pname with a string)
Without the where clause I am left with the full list of pids for all monitored processes, and it is hard to know which ones are the ones corresponding to the particular set of processes I want to monitor.

Actually, same would happen if I wanted to pre-filter by the host key. Fortunately I only have one machine to monitor at the moment...

EDIT: Actually it is even worst. If the template variable contains all pids, then the dashboard gets repeat even if there is no data. So, it's kind of grafana issue, but the most natural way to solve it would be that the pid variable gets populated only with values valid for the current filter. Which is possible only with a where clause for SHOW TAG VALUES.

@prune998

This comment has been minimized.

Copy link

commented Mar 3, 2016

+1. same issue here...
The other solution would be to replicate all TAGS as VALUES. This is not efficient and will add LOTS of data inside series, almost for nothing. But doing it this way I could :

SELECT value_pid from procstat where process_name=$pname

OR, and this is also related to some other opened issues, be able to select tags directly in the result set :

SELECT tag_pid from procstat where process_name=$pname

I relaly think the later would be the better solution.
Of course, a where clause of the SHOW TAG VALUES is also good...

@ChrisGute

This comment has been minimized.

Copy link

commented Apr 1, 2016

+1

3 similar comments
@cnelissen

This comment has been minimized.

Copy link

commented Apr 11, 2016

+1

@alfredocambera

This comment has been minimized.

Copy link

commented Apr 14, 2016

+1

@zarrigoni

This comment has been minimized.

Copy link

commented Jun 27, 2016

+1

@daveselan

This comment has been minimized.

Copy link

commented Jul 7, 2016

+1 on being able to restrict tag values returned based on a time range.

This feature would be extremely useful when using Grafana templating variables. In some cases we can have 100s of values listed when only a few might pertain to time period being viewed.

@MatMeredith

This comment has been minimized.

Copy link

commented Aug 3, 2016

+1. Really need this for Grafana templates. Can't deploy without it.

@seb-koch

This comment has been minimized.

Copy link

commented Sep 10, 2016

+1 Same use case here, grafana dashboards and plenty of customer demand

@furnacification

This comment has been minimized.

Copy link

commented Sep 13, 2016

I've found a workaround that seems to work in my environment.

Create a short retention policy and then create a continuous query that groups by the tags you want to use in the grafana templating.

CREATE RETENTION POLICY "lookup" ON "prod" DURATION 2d REPLICATION 1

I have some extra info that i can group by and push it into a fake measurement called host_info.

CREATE CONTINUOUS QUERY "lookupquery" ON "prod" BEGIN SELECT mean(value) as value INTO "lookup"."host_info" FROM "cpuload" where time > now() - 1h GROUP BY time(1h), host, team, status, location END;

I dont care what the values are, its just about getting a distinct list of hosts that are up and pushing data in the last few days. I have this in my templates in grafana as something like this:

SHOW TAG VALUES FROM "lookup"."host_info" WITH KEY = "location"

I'm just going to reuse the templates in the users dashboards and a few variations of it for other host info like disk etc...

It works for me, hope others can find something similar!

@patrickvallet

This comment has been minimized.

Copy link

commented Sep 21, 2016

+1

@smeapng

This comment has been minimized.

Copy link

commented Nov 15, 2016

+1

2 similar comments
@martinschuberts

This comment has been minimized.

Copy link

commented Nov 16, 2016

+1

@misko321

This comment has been minimized.

Copy link

commented Dec 21, 2016

+1

@abadyan-vonage

This comment has been minimized.

Copy link

commented Jan 4, 2017

+1 Need this for grafana templates as well
@furnacification - Thanks a lot, that worked! Notice that you don't need the where clause for CQ.

@mvadu

This comment has been minimized.

Copy link
Contributor

commented Jan 10, 2017

+1 Main use case is for grafana template variables. Showing a drop down of 300 entries when the time period only has 3 unique values is no fun!!

@0111sandesh

This comment has been minimized.

Copy link

commented Jan 19, 2017

I am running into the same issue when I try to use grafana template variables. Any update on having a WHERE time clause for SHOW TAG VALUES?

@daveselan

This comment has been minimized.

Copy link

commented Jan 19, 2017

This restriction is definetly a big problem for us, especially where some of our tag values can increase over time, and managing the size with retention policies isn't always the answer.

One workaround we had thought about might be the idea of using output from SELECT value,tag WHERE $timeFilter and discarding everything other than the desired tags? Probably horribly expensive, especially seeing as it would be a disk operation instead of access from memory indexes, but a possibility if the selected time series wasn't too big?

@magnuspwhite

This comment has been minimized.

Copy link

commented Jan 24, 2017

I was shocked when I found out this wasn't a feature when using Grafana template variables.

For my usage we have a high number of hosts and a high turnover of hostnames. Even with a retention policy of 30 days, I have some query strings in Grafana using template variables with a filter which exceeds 250k characters. Instead only handful of hosts need to be selected for that period.

This is a feature with should be added.

@opavader

This comment has been minimized.

Copy link
Contributor

commented Feb 6, 2017

@beckettsean I dont think its that expensive to implement for the lower bound case i.e where time > now() - current_time_range . For each insert the metadata store is anyway needed to be checked if a new tag value has been added. Instead it can update a last_updated counter if the tag exist, otherwise add a new entry with the current timestamp. This it self will be enough for almost all my use cases of initialization of the live graphs. Similarly a first_updated can be used, which too will suffice for most cases as almost always a tagged resource send the metric without interruption (big gaps) .
I will be happy to give a pull request if agreed on this approach.

@zezuladp

This comment has been minimized.

Copy link

commented Feb 14, 2017

Here's an example of my workaround. Just wrote a script that updates the templated variables to be the ones that I wanted within a specified timeframe
https://gist.github.com/zezuladp/9159a20b409a9f5f5a9fff585991626b

@smaresca

This comment has been minimized.

Copy link

commented Feb 19, 2018

+1

1 similar comment
@kandafir

This comment has been minimized.

Copy link

commented Feb 21, 2018

+1

@lexmag

This comment has been minimized.

Copy link

commented Jul 31, 2018

Rather than +1 in comments it's way better to put 👍 under the initial comment.
There are many people who have subscribed to this issue to get a notification when it will be closed (hopefully implemented), and those +1 only create unnecessary noise for everyone.

@levonet

This comment has been minimized.

Copy link

commented Aug 16, 2018

+1. Really need this for Grafana variables. Or need another feature – remove old tags.

@leobenkel

This comment has been minimized.

Copy link

commented Aug 17, 2018

any update on this one ?
I have a variable of type Query with a query like:
SHOW TAG VALUES FROM "my-metric" WITH KEY = "my-tag"
But it returns all possible values instead of the one present in the time period selected.

@pawel-kontakt

This comment has been minimized.

Copy link

commented Aug 30, 2018

+1

4 similar comments
@leeweir

This comment has been minimized.

Copy link

commented Sep 5, 2018

+1

@ipandula

This comment has been minimized.

Copy link

commented Sep 5, 2018

+1

@miroslawjaworski

This comment has been minimized.

Copy link

commented Sep 10, 2018

+1

@adamantivm

This comment has been minimized.

Copy link

commented Nov 13, 2018

+1

@itshikanov

This comment has been minimized.

Copy link

commented Nov 19, 2018

For Grafana variables the answer is:
select host,value from nginx_value WHERE time > now() - 10m

@heshixiong

This comment has been minimized.

Copy link

commented Dec 19, 2018

  • 1

@dgnorton dgnorton added the 1.x label Jan 7, 2019

@joway

This comment has been minimized.

Copy link

commented Jan 15, 2019

@itshikanov It works, thanks!

@BushnevYuri

This comment has been minimized.

Copy link

commented Feb 28, 2019

@joway @itshikanov, in this case, query returns array and grafana UI fails with "templating" error.
I tried like this:
select "group","value" from "http_req_duration" WHERE time > now() - 90m
Error in response: {"error":"error parsing query: found GROUP, expected identifier, string, number, bool at line 1, char 8"}

Which version of grafana do you use?

@BushnevYuri

This comment has been minimized.

Copy link

commented Feb 28, 2019

Very much surprised that this issue is not solved yet... Completely kills our use case to use influxdb.

@BushnevYuri

This comment has been minimized.

Copy link

commented Feb 28, 2019

Found workaround for all of those who are looking to use tags from InfluxDB with respect of timeFilter for Grafana variables templating.

https://stackoverflow.com/questions/54931888/query-tags-from-influxdb-with-respect-of-timefilter-for-grafana-variables-templa/54933003#54933003

@n1nj4888

This comment has been minimized.

Copy link

commented May 10, 2019

Thanks @BushnevYuri,

Can I use this to convert a current Variable "$host" with the following configuration to add in the timeFilter i.e. to only return the $host variables which are present in the Dashboard's Time Filter timeframe?

SHOW TAG VALUES FROM "snmp.hosts" WITH KEY = "agent_host"

It's not clear to me how I'd need to construct your suggested SELECT suggestion based on the above SHOW query?

select DISTINCT("name") from (select "name","value" from "http_req_duration" WHERE $timeFilter AND "group" =~ /^$Group$/ AND "TestName" =~ /^$TestName$/)

@n1nj4888

This comment has been minimized.

Copy link

commented Jun 9, 2019

Hi @BushnevYuri,

Any chance you could provide an example as to how to use your workaround SELECT query to get the following SHOW TAGS query to respect the timeFilter?

SHOW TAG VALUES FROM "snmp.hosts" WITH KEY = "agent_host"

Ultimately, I don’t see how I use the TestName and Group parameters in your example query for my SHOW TAGS query above?

@ekle

This comment has been minimized.

Copy link

commented Jun 21, 2019

The following works for me in grafana as a variable:
SELECT "host" FROM ( SELECT "value", "host" FROM "services"."default"."test" WHERE time > now()-1h )
it does not do a distinct but this is done by grafana.
host is the tag I want to limit by time.
value just needs to contain any value.
Likely it is not very efficient, but at least it kind of works.

@n1nj4888

This comment has been minimized.

Copy link

commented Jun 21, 2019

Thanks for the clarifications. I’m now able to filter the tag keys by $timeFilter - The work around is good enough but surprised that InfluxDB cant handle the time filter in a standard show tags query.

@cstroe

This comment has been minimized.

Copy link

commented Jul 24, 2019

As an update to @ekle 's workaround, I was able to use distinct() with his query:

SELECT distinct("host") FROM ( SELECT "bytes.committed","host" FROM "jvm.memory" WHERE time > now()-1h )

To make the query respects the currently selected Grafana time window, you use the $timeFilter:
SELECT distinct("host") FROM ( SELECT "bytes.committed","host" FROM "jvm.memory" WHERE $timeFilter )

@brtknr

This comment has been minimized.

Copy link

commented Jul 26, 2019

I am using InfluxDB 1.7.1 and weirdly, I am able to use time filter with SHOW TAG KEYS/VALUES queries as this:

SHOW TAG KEYS ON monasca WHERE (_name = 'tempest-1100328991') AND (_tenant_id =
 '7f8b19504b7646c898025fbc0a0e89e6' AND _region = 'RegionOne' AND time >= 1564148465s AND time < 1565012465s)

SHOW TAG VALUES ON monasca WITH KEY = name_1 WHERE (_tenant_id = '7f8b19504b764
6c898025fbc0a0e89e6' AND _region = 'RegionOne' AND time >= 1564148465s AND time < 1565012465s) AND (_tagKey = 'name_1')

However, the peculiarity is that the finest granularity is 1 day... I can't filter tags on an hourly or a minutely basis... this makes sense from a performance perspective but I cannot find this behaviour documented anywhere. Would anyone be able to shed a light on this?

@davidnewhall

This comment has been minimized.

Copy link

commented Aug 16, 2019

@brtknr - You cannot filter SHOW TAG commands using any fields, including time. Reversing the > to < produces the same results for me; the time filter is effectively ignored. Using InfluxDB 1.7.7.

@brtknr

This comment has been minimized.

Copy link

commented Aug 16, 2019

Hi @davidnewhall, here is the answer: https://community.influxdata.com/t/undocumented-behaviour-with-show-tag-keys-values/10805/6
The short answer is that the accuracy of your search is a minimum of your shardGroupDuration.

@davidnewhall

This comment has been minimized.

Copy link

commented Aug 17, 2019

Interesting.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
You can’t perform that action at this time.