[0.9.3-nightly] regexs fails with empty tags #3773

Closed
gallir opened this issue Aug 20, 2015 · 7 comments · Fixed by #6283

@gallir

gallir commented Aug 20, 2015

If a tag is missing or not used on some points, those points are not selected by regexes such as the "All" pattern used by Grafana and similar tools.

Example, for

select * from serie where sparse_tag =~ /.*/

only those points that stored a value in sparse_tag will be retrieved. The same happens with an expression like "sparse_tag = ''", but it is more annoying with regexes because you cannot create filters (Grafana, for example, calls them "template variables") if the tag is not stored on every point in the measurement.

I don't know if this is a bug or a constraint. Either way, it would be great if it worked as expected.
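
The behavior reported above can be modeled with a small sketch. It is written in Python rather than InfluxQL, and it is an illustration only, not InfluxDB's query engine: the `points` list and the `match_regex_null` helper are hypothetical, and the sketch assumes a missing tag is treated as "no value", so it is never compared against the regex.

```python
import re

# Hypothetical points: the second one was written without sparse_tag.
points = [
    {"tags": {"host": "a", "sparse_tag": "x"}, "value": 1},
    {"tags": {"host": "b"}, "value": 2},
]

def match_regex_null(points, key, pattern):
    """Keep points whose tag `key` exists and matches `pattern`.

    Assumption: a missing tag is skipped instead of being compared as "".
    """
    rx = re.compile(pattern)
    return [p for p in points if key in p["tags"] and rx.search(p["tags"][key])]

# Analogue of: select * from serie where sparse_tag =~ /.*/
selected = match_regex_null(points, "sparse_tag", r".*")
```

Under that assumption, `/.*/` matches every stored value, but the point without `sparse_tag` is still dropped, which is the case the "All" filter runs into.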

@beckettsean
Contributor

@gallir the regex as written only matches actual values, not the absence of a value. Does a query like this address your needs? I haven't directly tested it but I think it should work.

select * from serie where sparse_tag =~ /.*/ OR sparse_tag !~ /.*/

@gallir
Author

gallir commented Aug 21, 2015

Nope, it does not work. I tested with actual values (using a nightly build from last week, because 0.9.2 stable died with "select * ..."):

> select count(value) from data where collection='confirmation' and service =~ /.*/  OR sparse_tag !~ /.*/
name: data
----------
time            count
1970-01-01T00:00:00Z    297166

> select count(value) from data where collection='booking' and service =~ /.*/  OR sparse_tag !~ /.*/
> 

Gosh! While testing different options as I was writing this, I found one that works: /^$/

> select count(value) from data where service !~ /^$/
name: data
----------
time            count
1970-01-01T00:00:00Z    516220

Unfortunately Grafana doesn't accept this regex :(

But it's strange: it also selects empty tags, when the regex requires the opposite.

@beckettsean beckettsean changed the title regexs fails with empty tags [0.9.3-nightly] regexs fails with empty tags Aug 21, 2015
@jsternberg jsternberg self-assigned this Apr 7, 2016
@jsternberg
Contributor

This may be really difficult for us to do. While we keep an index of what values correspond to which tag, we don't keep track of values that don't have a specific tag. I don't think there would be any practical way we could do that either as it would require us to know what tags are allowed to exist and will exist for every point ahead of time. The only way I can think of would be to iterate through all of the possible series, but I don't think that would be a good idea for performance.

Something we may be able to do is allow an empty string as a tag value. If we allowed that, then it would likely be possible to index the empty tag value and probably wouldn't even require the query engine to change to support this kind of functionality.

@pauldix what are your thoughts on this? This might also be something to consider if/when we redesign the line protocol like in #6037.
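
A rough sketch of the indexing constraint described in this comment, in Python. It assumes a simple inverted index from tag key to tag value to series keys. The `series` data and the `build_index`, `series_with_tag`, and `series_without_tag` helpers are hypothetical and do not reflect InfluxDB's actual index layout.

```python
from collections import defaultdict

# Hypothetical series keys mapped to their tag sets.
series = {
    "cpu,host=server01,region=uswest": {"host": "server01", "region": "uswest"},
    "cpu,host=server02": {"host": "server02"},
}

def build_index(series):
    """Build tag key -> tag value -> set of series keys."""
    index = defaultdict(lambda: defaultdict(set))
    for key, tags in series.items():
        for tag_key, tag_value in tags.items():
            index[tag_key][tag_value].add(key)
    return index

index = build_index(series)

def series_with_tag(index, tag_key):
    """Cheap: union of the sets already stored under tag_key."""
    result = set()
    for keys in index.get(tag_key, {}).values():
        result |= keys
    return result

def series_without_tag(index, series, tag_key):
    """Expensive: absent tags leave no index entry, so every series is visited."""
    present = series_with_tag(index, tag_key)
    return {key for key in series if key not in present}
```

In this model, finding series that have `region` touches only the index entries for `region`, while finding series that lack it has to walk the full series set.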

@pauldix
Member

pauldix commented Apr 10, 2016

If you just leave out the WHERE clause then you'll get everything. We wouldn't want to look at the regex to see if it should match against an empty tag.

One thing we may want to do is to make sure WHERE some_tag = '' or WHERE some_tag is null work properly

@jsternberg
Contributor

@pauldix the second part is what I was referring to. When a tag isn't specified when writing a metric, it's not indexed anywhere so figuring out WHERE some_tag = '' is difficult because we would have to iterate through all of the possible series.

We could make it possible by allowing tags to be the empty string so the user would write:

> insert cpu,host=server01,region= value=2

And then this would work:

> select value from cpu where host = 'server01' and region =~ /.*/

But anything that doesn't have a region would get ignored since it wasn't indexed when it was written. It would be difficult to implement WHERE some_tag is null because of this indexing. One benefit to this is we could differentiate between a tag that is nil and one that is the empty string pretty easily.
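
To illustrate the distinction drawn in this comment, here is a minimal Python sketch of the proposal. The `parse_tags` helper is hypothetical: it ignores escaping and is not the real line protocol parser. It only shows that if `region=` were accepted and stored, an empty value and an absent tag would remain distinguishable.

```python
def parse_tags(series_key):
    """Hypothetical parser: split 'measurement,k=v,...' and keep empty values."""
    _, *pairs = series_key.split(",")
    return dict(pair.split("=", 1) for pair in pairs)

with_empty = parse_tags("cpu,host=server01,region=")
without = parse_tags("cpu,host=server01")

# "" means the tag was written with an empty value;
# None means the tag was never written for this series.
empty_value = with_empty.get("region")
absent_value = without.get("region")
```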

@pauldix
Member

pauldix commented Apr 10, 2016

No, they shouldn't be able to write tags with an empty string. If they're matching on the tag empty string in the query, it would be the same as pulling WHERE !~ /.*/ which should also work, no?

@jsternberg
Contributor

I think I might have a misunderstanding of what a missing tag counts as. For the series cpu,host=server01 should the region be null or ""? If it's null, is there and should there be a way to have a tag value that is the empty string? If it's "", does that mean every series has a near-infinite number of tags with the empty string?

There seems to be some inconsistency in how a missing tag key is treated. Some places use the empty string and some places treat it as null.

jsternberg added a commit that referenced this issue Apr 11, 2016
A missing tag on a point was sometimes treated as `""` and sometimes
treated as a separate `null` entity. This change modifies the equality
operations to always treat a missing tag as an empty string.

Empty tags are *not* indexed and do not have the same performance as a
tag that exists.

Fixes #3773.
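
The semantics stated in the commit message can be sketched as follows. This is a Python model under the assumption that equality comparisons substitute `""` for a missing tag. The `tag_value`, `tag_eq`, and `tag_neq` helpers are hypothetical names, not functions from the InfluxDB code base.

```python
def tag_value(tags, key):
    # Assumption taken from the commit message: a missing tag compares as "".
    return tags.get(key, "")

def tag_eq(tags, key, literal):
    """Model of: WHERE key = 'literal'"""
    return tag_value(tags, key) == literal

def tag_neq(tags, key, literal):
    """Model of: WHERE key != 'literal'"""
    return tag_value(tags, key) != literal

# Hypothetical tag sets; the second series has no region tag.
series_tags = [
    {"host": "server01", "region": "uswest"},
    {"host": "server02"},
]

missing_region = [t["host"] for t in series_tags if tag_eq(t, "region", "")]
has_region = [t["host"] for t in series_tags if tag_neq(t, "region", "")]
```

Because the commit message says empty tags are not indexed, a comparison like this would presumably be evaluated per series rather than through an index lookup, which matches the stated performance caveat.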
@jsternberg jsternberg added this to the 0.13.0 milestone Apr 12, 2016

4 participants