Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[HWKMETRICS-676] When a simple text is supplied for tag value in a qu… #828

Merged
merged 3 commits into from Jun 2, 2017

Conversation

Projects
None yet
8 participants
@stefannegrea
Copy link
Contributor

stefannegrea commented May 25, 2017

…ery, use a simple query to get data from C*.

Stefan Negrea
[HWKMETRICS-676] When a simple text is supplied for tag value in a qu…
…ery, use a simple query to get data from C*.
@jsanda

This comment has been minimized.

Copy link
Contributor

jsanda commented May 25, 2017

I was talking with @burmanm about this and he raised a good question. How can we be certain that the value is not a regex?

@stefannegrea

This comment has been minimized.

Copy link
Contributor Author

stefannegrea commented May 25, 2017

@jsanda, I limited the scope a little for this initial PR. There are two types of expressions supported by the parser, simple and complex text. The parsing expression for simple text is this: SIMPLETEXT : [a-zA-Z_0-9.]+ ; . This PR only treats the simple text expression as a non-regex expression; and this is inline with actual language expectations.

@pilhuhn

This comment has been minimized.

Copy link
Member

pilhuhn commented May 26, 2017

@stefannegrea Couldn't . be part of a regex?

@stefannegrea

This comment has been minimized.

Copy link
Contributor Author

stefannegrea commented May 26, 2017

@pilhuhn, you are right about the dot being part of regex, however the intent was to allow regex only in the complext text (delimited by ").

@pilhuhn

This comment has been minimized.

Copy link
Member

pilhuhn commented May 26, 2017

@stefannegrea Yes I understand. Which is why I asked about the dot - I would not expect it to be part of the simple expression (unlike e.g. underscore or even a dash)

@stefannegrea

This comment has been minimized.

Copy link
Contributor Author

stefannegrea commented May 26, 2017

@pilhuhn, thanks for the clarification. We need the dot in the simple expression because we allow and have clients that push tag names with the dot character (eg. label.pod).

@burmanm

This comment has been minimized.

Copy link
Contributor

burmanm commented May 27, 2017

Don't we have clients that push / also? And who knows what other classes as well. As we never limited the available characters, we have no safe way of telling which characters are used in the tag names.

Would it make more sense to have a parameter (or a function) that tells if the regexp is used or not?

For example: a1 = regexp(a.b) or a1 = exact(a.b). Problem is, we've never really documented the language behavior so I'm not sure what the "contract" we have currently is. We could go either way though, I might prefer the regexp(searchKey) if we consider breaking the current behavior.

@FilipB

This comment has been minimized.

Copy link

FilipB commented May 29, 2017

Performance results:
Average response times for read tests are 10-30 times better depending on type of query.
E.g. following "gauges/stats?tags=userID${userID} AND group_id=groupID${groupID}&bucketDuration=600000ms" is 30 times faster now.

@jshaughn

This comment has been minimized.

Copy link
Contributor

jshaughn commented May 31, 2017

The suggestion from @burmanm makes sense to me. This interim solution is a weak band-aid that will likely hit issues given the limited character set being used for simple text. As soon as possible the language should be extended with some sort of explicit regex decl when regex matching is desired. The syntax should likely be agreed on and shared with alerting

@stefannegrea

This comment has been minimized.

Copy link
Contributor Author

stefannegrea commented May 31, 2017

Should we extend the proposal from @burmanm to SIMPLETEXT expressions too? Or that proposal is only for COMPLEXTEXT expressions? Right now, the semantics of SIMPLETEXT expression for tag names are very clear, it is a strict equality and never regex.

This PR will stand as is if the new regex notation (regex("asb.sdf*")) is only done for "COMPLEXTEXT".

@burmanm

This comment has been minimized.

Copy link
Contributor

burmanm commented Jun 1, 2017

I don't think we should have a complexText / simpleText detection at all since the simpletext does not match all the exact values.

We should probably just change the semantics to be:

a = ^b$ -> exact match always, no regexp (it wants a tag value ^b$, not b)
a = regexp(^b$) -> regexp evaluate, in this case the match is b

@jshaughn

This comment has been minimized.

Copy link
Contributor

jshaughn commented Jun 1, 2017

Right, unless flagged as a regex match it would always be exact match, there would be no categorization of the match string based on the characters in the string. As an aside, alerting is doing the same thing with the limited character set, although it is already performing exact match (not regex) when the value is appropriate. Both metrics and alerting should support an explicit regex directive. There are some back compat issues here, although I think the change in API is required. If required we could support some sort of sysprop that forces regex always, just to potentially cover the back compat situation.

As for the syntax, instead of function-like expression that adds more parentheses, perhaps we should consider a new operator. Keep '=' for exact match and introduce either "MATCHES" or maybe one of '~' or '~=' for regex.

a = ^b$ -> exact match
a MATCHES ^b$ -> regex match
a ~ ^b$ -> alternative regex (I prefer the keyword)

@stefannegrea @burmanm @lucasponce , thoughts?

@lucasponce

This comment has been minimized.

Copy link
Contributor

lucasponce commented Jun 1, 2017

We would need another operator.
Today we use

EQUAL: '=';
NOTEQUAL: '!=';

So, we would need the same both operators but for regex

EQUAL: '=';
NOTEQUAL: '!=';
EQUALREGEX: '~';
NOTEQUALREGEX: '!~';

I prefer the shorter way, less verbosity the better.
At the moment that you chained several expressions, the string could be unmanegable.

@jshaughn

This comment has been minimized.

Copy link
Contributor

jshaughn commented Jun 1, 2017

@lucasponce +1, looking at it again, I agree. I'd just change the enum:

REGEXMATCH: '~';
NOTREGEXMATCH: '!~';

Actually, regex could do the negation itself, but I like having the second operator.

@stefannegrea

This comment has been minimized.

Copy link
Contributor Author

stefannegrea commented Jun 1, 2017

@jshaughn, @lucasponce, @burmanm , I will rework this PR with new operators.

Stefan Negrea
[HWKMETRICS-676] Introduce ~ and !~ operators for regex matching. The…
… = and != operators are now using the text just as a simple value.
@jshaughn

This comment has been minimized.

Copy link
Contributor

jshaughn commented Jun 2, 2017

Great. LGTM. In alerting we will also make this change for the next version.

@jsanda

This comment has been minimized.

Copy link
Contributor

jsanda commented Jun 2, 2017

@stefannegrea can you add a test for the !~ operator?

@jotak

This comment has been minimized.

Copy link
Contributor

jotak commented Jun 2, 2017

For the record I'll have to add ~ and !~ in grafana as well

@jsanda jsanda merged commit 8f9951e into master Jun 2, 2017

2 of 3 checks passed

haw-metrics-perf-stability-test Build finished.
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details

@stefannegrea stefannegrea deleted the HWKMETRICS-676 branch Aug 21, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.