Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[feature request] SHOW TAG KEYS/VALUES should accept a WHERE time clause #5668

Open
beckettsean opened this issue Feb 13, 2016 · 84 comments
Open

Comments

@beckettsean
Copy link
Contributor

Rather than update DISTINCT() to accept tags (#3880), we could accomplish much the same goal if the SHOW TAG VALUES query respected a WHERE time clause. SHOW TAG VALUES already returns the deduplicated set of values.

It is not trivial, since the query can no longer be answered directly from the metastore, but it seems easier than making functions accept tags, which is a totally new code path.

@furnacification
Copy link

+1 I tried to do this today. It would be a great help.

@cmdrSpectral
Copy link

Perhaps it's not the best issue to comment on the case but one of the things that I find stange is the way that tags are treated. Quote from the docs:

You don’t need to have tags in your data structure, but it’s generally a good idea to make use of them because, unlike fields, tags are indexed. This means that queries on tags are faster and that tags are ideal for storing commonly-queried metadata.

So if you think of tags as fields on steroids - indexed fields, then having normal, field like query operators running on them seems natural. Creating a subset of SELECT that works on tags and is called SHOW is nowhere near a generic solution, it rather complicates stuff for the users. There will be cases when someone needs some sort of funcitonality that's already there but works on fields only so even more will be forced onto the SHOW command mirroring SELECT behaviour. I'm no pro to evaluate the difficultness of the task but from the client side I see a lot of sense in it.
From my point of view - the docs should state that tags are not ideal for storing commonly-queried metadata - the I ideal for using group by to group real data that is stored in fields. The way tags are used in InfluxQL makes them a second class data for me.

@beckettsean
Copy link
Contributor Author

@Groblus it is not valid to think of tags as related to fields. They are completely different concepts. All tag keys and values are stored only as strings, and only once, in the index, for each series. Fields are stored as float, int, bool, or string, and are individually stored with each point, and are unindexed.

SHOW queries operate on metadata like databases, measurements, and tags. SELECT queries operate on the metrics themselves, which are the fields. Tag values may also be SELECTed, but only to differentiate the field results.

@AxelVoitier
Copy link

I would like to see this as well. Here is my use case.

With telegraf and procstat plugin a want to monitor a set of processes that all have the same name (but different pid, obvisouly). Then I plot the data on grafana dashboards, using template features.

With grafana template features I can create a variable based on a query. For instance, I have this template variable to filter the set of processes by the process name I want:

$pname = SHOW TAG VALUES FROM "procstat" WITH KEY = "process_name"

So far so good.
Now, I want to create a second template variable for pids that match pname. Unfortunately,
I cannot do something like this:

$pid = SHOW TAG VALUES FROM "procstat" WITH KEY = "pid" WHERE process_name=$pname

Although, and strangely, the query is considered valid if I try it in the web interface of influxdb. But it returns nothing. (of course in the web interface of influxdb I replaced $pname with a string)
Without the where clause I am left with the full list of pids for all monitored processes, and it is hard to know which ones are the ones corresponding to the particular set of processes I want to monitor.

Actually, same would happen if I wanted to pre-filter by the host key. Fortunately I only have one machine to monitor at the moment...

EDIT: Actually it is even worst. If the template variable contains all pids, then the dashboard gets repeat even if there is no data. So, it's kind of grafana issue, but the most natural way to solve it would be that the pid variable gets populated only with values valid for the current filter. Which is possible only with a where clause for SHOW TAG VALUES.

@prune998
Copy link

prune998 commented Mar 3, 2016

+1. same issue here...
The other solution would be to replicate all TAGS as VALUES. This is not efficient and will add LOTS of data inside series, almost for nothing. But doing it this way I could :

SELECT value_pid from procstat where process_name=$pname

OR, and this is also related to some other opened issues, be able to select tags directly in the result set :

SELECT tag_pid from procstat where process_name=$pname

I relaly think the later would be the better solution.
Of course, a where clause of the SHOW TAG VALUES is also good...

@ChrisGute
Copy link

+1

3 similar comments
@cnelissen
Copy link

+1

@alfredocambera
Copy link

+1

@zarrigoni
Copy link

+1

@daveselan
Copy link

+1 on being able to restrict tag values returned based on a time range.

This feature would be extremely useful when using Grafana templating variables. In some cases we can have 100s of values listed when only a few might pertain to time period being viewed.

@MatMeredith
Copy link

+1. Really need this for Grafana templates. Can't deploy without it.

@seb-koch
Copy link

+1 Same use case here, grafana dashboards and plenty of customer demand

@furnacification
Copy link

I've found a workaround that seems to work in my environment.

Create a short retention policy and then create a continuous query that groups by the tags you want to use in the grafana templating.

CREATE RETENTION POLICY "lookup" ON "prod" DURATION 2d REPLICATION 1

I have some extra info that i can group by and push it into a fake measurement called host_info.

CREATE CONTINUOUS QUERY "lookupquery" ON "prod" BEGIN SELECT mean(value) as value INTO "lookup"."host_info" FROM "cpuload" where time > now() - 1h GROUP BY time(1h), host, team, status, location END;

I dont care what the values are, its just about getting a distinct list of hosts that are up and pushing data in the last few days. I have this in my templates in grafana as something like this:

SHOW TAG VALUES FROM "lookup"."host_info" WITH KEY = "location"

I'm just going to reuse the templates in the users dashboards and a few variations of it for other host info like disk etc...

It works for me, hope others can find something similar!

@patrickvallet
Copy link

+1

@smeapng
Copy link

smeapng commented Nov 15, 2016

+1

2 similar comments
@martinschuberts
Copy link

+1

@misko321
Copy link

+1

@abadyan-vonage
Copy link

abadyan-vonage commented Jan 4, 2017

+1 Need this for grafana templates as well
@furnacification - Thanks a lot, that worked! Notice that you don't need the where clause for CQ.

@mvadu
Copy link
Contributor

mvadu commented Jan 10, 2017

+1 Main use case is for grafana template variables. Showing a drop down of 300 entries when the time period only has 3 unique values is no fun!!

@0111sandesh
Copy link

I am running into the same issue when I try to use grafana template variables. Any update on having a WHERE time clause for SHOW TAG VALUES?

@daveselan
Copy link

This restriction is definetly a big problem for us, especially where some of our tag values can increase over time, and managing the size with retention policies isn't always the answer.

One workaround we had thought about might be the idea of using output from SELECT value,tag WHERE $timeFilter and discarding everything other than the desired tags? Probably horribly expensive, especially seeing as it would be a disk operation instead of access from memory indexes, but a possibility if the selected time series wasn't too big?

@magnuspwhite
Copy link

I was shocked when I found out this wasn't a feature when using Grafana template variables.

For my usage we have a high number of hosts and a high turnover of hostnames. Even with a retention policy of 30 days, I have some query strings in Grafana using template variables with a filter which exceeds 250k characters. Instead only handful of hosts need to be selected for that period.

This is a feature with should be added.

@opavader
Copy link
Contributor

opavader commented Feb 6, 2017

@beckettsean I dont think its that expensive to implement for the lower bound case i.e where time > now() - current_time_range . For each insert the metadata store is anyway needed to be checked if a new tag value has been added. Instead it can update a last_updated counter if the tag exist, otherwise add a new entry with the current timestamp. This it self will be enough for almost all my use cases of initialization of the live graphs. Similarly a first_updated can be used, which too will suffice for most cases as almost always a tagged resource send the metric without interruption (big gaps) .
I will be happy to give a pull request if agreed on this approach.

@zezuladp
Copy link

zezuladp commented Feb 14, 2017

Here's an example of my workaround. Just wrote a script that updates the templated variables to be the ones that I wanted within a specified timeframe
https://gist.github.com/zezuladp/9159a20b409a9f5f5a9fff585991626b

@itshikanov
Copy link

For Grafana variables the answer is:
select host,value from nginx_value WHERE time > now() - 10m

@heshixiong
Copy link

  • 1

@dgnorton dgnorton added the 1.x label Jan 7, 2019
@joway
Copy link

joway commented Jan 15, 2019

@itshikanov It works, thanks!

@BushnevYuri
Copy link

BushnevYuri commented Feb 28, 2019

@joway @itshikanov, in this case, query returns array and grafana UI fails with "templating" error.
I tried like this:
select "group","value" from "http_req_duration" WHERE time > now() - 90m
Error in response: {"error":"error parsing query: found GROUP, expected identifier, string, number, bool at line 1, char 8"}

Which version of grafana do you use?

@BushnevYuri
Copy link

Very much surprised that this issue is not solved yet... Completely kills our use case to use influxdb.

@BushnevYuri
Copy link

Found workaround for all of those who are looking to use tags from InfluxDB with respect of timeFilter for Grafana variables templating.

https://stackoverflow.com/questions/54931888/query-tags-from-influxdb-with-respect-of-timefilter-for-grafana-variables-templa/54933003#54933003

@n1nj4888
Copy link

Thanks @BushnevYuri,

Can I use this to convert a current Variable "$host" with the following configuration to add in the timeFilter i.e. to only return the $host variables which are present in the Dashboard's Time Filter timeframe?

SHOW TAG VALUES FROM "snmp.hosts" WITH KEY = "agent_host"

It's not clear to me how I'd need to construct your suggested SELECT suggestion based on the above SHOW query?

select DISTINCT("name") from (select "name","value" from "http_req_duration" WHERE $timeFilter AND "group" =~ /^$Group$/ AND "TestName" =~ /^$TestName$/)

@n1nj4888
Copy link

n1nj4888 commented Jun 9, 2019

Hi @BushnevYuri,

Any chance you could provide an example as to how to use your workaround SELECT query to get the following SHOW TAGS query to respect the timeFilter?

SHOW TAG VALUES FROM "snmp.hosts" WITH KEY = "agent_host"

Ultimately, I don’t see how I use the TestName and Group parameters in your example query for my SHOW TAGS query above?

@ekle
Copy link

ekle commented Jun 21, 2019

The following works for me in grafana as a variable:
SELECT "host" FROM ( SELECT "value", "host" FROM "services"."default"."test" WHERE time > now()-1h )
it does not do a distinct but this is done by grafana.
host is the tag I want to limit by time.
value just needs to contain any value.
Likely it is not very efficient, but at least it kind of works.

@n1nj4888
Copy link

Thanks for the clarifications. I’m now able to filter the tag keys by $timeFilter - The work around is good enough but surprised that InfluxDB cant handle the time filter in a standard show tags query.

@cstroe
Copy link

cstroe commented Jul 24, 2019

As an update to @ekle 's workaround, I was able to use distinct() with his query:

SELECT distinct("host") FROM ( SELECT "bytes.committed","host" FROM "jvm.memory" WHERE time > now()-1h )

To make the query respect the currently selected Grafana time window, you use the $timeFilter:
SELECT distinct("host") FROM ( SELECT "bytes.committed","host" FROM "jvm.memory" WHERE $timeFilter )

@brtkwr
Copy link

brtkwr commented Jul 26, 2019

I am using InfluxDB 1.7.1 and weirdly, I am able to use time filter with SHOW TAG KEYS/VALUES queries as this:

SHOW TAG KEYS ON monasca WHERE (_name = 'tempest-1100328991') AND (_tenant_id =
 '7f8b19504b7646c898025fbc0a0e89e6' AND _region = 'RegionOne' AND time >= 1564148465s AND time < 1565012465s)

SHOW TAG VALUES ON monasca WITH KEY = name_1 WHERE (_tenant_id = '7f8b19504b764
6c898025fbc0a0e89e6' AND _region = 'RegionOne' AND time >= 1564148465s AND time < 1565012465s) AND (_tagKey = 'name_1')

However, the peculiarity is that the finest granularity is 1 day... I can't filter tags on an hourly or a minutely basis... this makes sense from a performance perspective but I cannot find this behaviour documented anywhere. Would anyone be able to shed a light on this?

@davidnewhall
Copy link

@brtknr - You cannot filter SHOW TAG commands using any fields, including time. Reversing the > to < produces the same results for me; the time filter is effectively ignored. Using InfluxDB 1.7.7.

@brtkwr
Copy link

brtkwr commented Aug 16, 2019

Hi @davidnewhall, here is the answer: https://community.influxdata.com/t/undocumented-behaviour-with-show-tag-keys-values/10805/6
The short answer is that the accuracy of your search is a minimum of your shardGroupDuration.

@davidnewhall
Copy link

Interesting.

@stale
Copy link

stale bot commented Nov 18, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Nov 18, 2019
@dsvensson
Copy link

Still an issue.

@stale stale bot removed the wontfix label Nov 25, 2019
@R-Studio
Copy link

any news about this issue?

@Amerousful
Copy link

Amerousful commented Jun 10, 2020

I found this workaround:
SELECT "request" FROM(SELECT "min","request" FROM "gatling" WHERE time > now() - 1d)

Where "request" is tag key.

@ghost
Copy link

ghost commented Jun 30, 2020

It's not a fast query, but subselects can make it work for cases where you want to filter by other tag or field values. The sub-query / group by is to return unique values of the tag.

If you want "tag_a" values when "tag_b" = some other value, you can query a related field [1], using a simple aggregate (e.g. last()). I haven't checked if other aggregate functions are faster, because this met my needs.

select "tag_a" 
from (
    SELECT last(some_field) 
    from some_measurement 
    group by "tag_a", "tag_b"
) 
where "tag_b"='value_of_interest'

[1] Yes, tags and fields are separate concepts in the database. By "related" I mean in the practical sense of where the data comes from, not how it's stored.

@yayitazale
Copy link

+1

@AlexHeylin
Copy link

+1

1 similar comment
@czewail
Copy link

czewail commented Dec 13, 2023

+1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests