Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EQL: Revisit case insensitivity #61883

Closed
costin opened this issue Sep 2, 2020 · 13 comments
Closed

EQL: Revisit case insensitivity #61883

costin opened this issue Sep 2, 2020 · 13 comments
Labels
:Analytics/EQL EQL querying >enhancement Team:QL (Deprecated) Meta label for query languages team team-discuss

Comments

@costin
Copy link
Member

costin commented Sep 2, 2020

Elasticsearch is by default case sensitive. EQL on the other hand strives to be case insensitive since matching strings against different OSs is not straight-forward (some are insensitive, some aren't).
Hence why string equality / non-equality are by default case-insensitive.

The current approach requires usage of scripting for functions that are case sensitive and is not fully supported for equality/non-equality. We could expand this to the rest of the operators (like >, >=, etc..) but considering this is a rare occurrence for strings, for the time being the scope is on == and !=.

Using operators

Extending the == operator to be case aware is convenient but also quite impactful. That's because in all languages == is an exact equality, John == john is false.
That is everything is case sensitive and insensitivity needs to be added on top.

Either default (sensitive or insensitive) has pros and cons and having a flag that can change the behavior is the ideal way. Currently there is a default through the case_sensitive parameter which can be kept though it would have to be renamed since it's only the equality that we're after so
case_sensitive --> case_sensitive_equality.

The issue with this type of parameter is that all string comparisons have the same sensitivity. Potentially we can introduce dedicated operator such as ~= or ~== to indicate a case insensitive comparison.

The pro of this approach is that there are defined scopes, the downside is that it might be too subtle for folks to pick it up.

Wrapping functions

Another option would be to use some kind of function say insensitive(foo == bar) or sensitive(foo == bar) which is a more verbose way of supporting == and ~= and offering both sensitivities regardless of the global setting.

Impact on functions

As described in #61162, case insensitivity will be an option on a limited number of queries. Currently this translates to:

  • string equality (term and terms)
  • startsWith (prefix query)
  • pattern matching, match, wildcard (wildcard query)

It's worth revising the semantics of insensitivity over all the functions in particular:

  • between, indexOf
  • endsWith- might be rewritten to a wildcard query
  • stringContains - wildcard again

As last note, a global setting will affect both the operator and the functions. Meaning if we need scoping - have functions with both types of sensitivity as well as operators, we need to introduce either dedicated switches/wrapping functions.

My proposal is to look at the case insensitive usage of functions in existing queries and where needed, try to retrofit them onto the existing queries. While we might not cover all possible options, the vast majority of cases / rules might be covered.

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-ql (:Query Languages/EQL)

@elasticmachine elasticmachine added the Team:QL (Deprecated) Meta label for query languages team label Sep 2, 2020
@astefan
Copy link
Contributor

astefan commented Sep 3, 2020

I'm not a big fan of insensitive(foo == bar) as it's too verbose and hard to use. I'd be in favor of considering EQL case insensitive by default, remove the existent case_sensitive parameter and just be explicit about being case sensitive everywhere else and use something like ~= (and ~!=) for a case sensitive equality. For functions, I'd be in favor of an additional optional parameter which will make that functions' functionality to be case sensitive.

@costin
Copy link
Member Author

costin commented Sep 3, 2020

~= (and ~!=) for a case sensitive equality

That to me is confusing. == is exact/case sensitive and ~= inexact/fuzzy/insensitive; not the other way around.
If we overload == to be inexact, I don't see any value in introducing another operator but instead add a parameter for the whole request to indicate the desired behavior.
In other words one cannot mix and match sensitive comparisons - which considering that Python EQL is insensitive, doesn't seem to be a problem.

And if that's the case, there's no need to have a per-string function sensitivity switch.

@astefan
Copy link
Contributor

astefan commented Sep 3, 2020

but instead add a parameter for the whole request to indicate the desired behavior.

And then we have the other issue of having a wide parameter for case sensitivity that will not apply for comparisons and sorting...

@costin
Copy link
Member Author

costin commented Sep 3, 2020

And then we have the other issue of having a wide parameter for case sensitivity that will not apply for comparisons and sorting...

We can rename the parameter to better indicate its scope case_insensitive_equals as oppose to case_insensitiveand even introduce one for functionscase_insensitive_functions` though I don't like the second idea since it applies all functions.

I don't like the idea of a query wide parameter however it some kind of option for changing behavior

@astefan
Copy link
Contributor

astefan commented Sep 3, 2020

There is one other significant advantage with not having wide (per request) values, imo: flexibility. A case sensitive equals and a case insensitive equals could not be used in the same query. One could argue though that eql is case insensitive by default and the number of scenarios where both are used is small. But if there is one scenario where an user will want to use a case sensitive equals, imo that scenario is likely to involve another at least one operation that’s case insensitive.

@costin
Copy link
Member Author

costin commented Sep 10, 2020

If we aim to use EQL (or sub parts of it) in other context, such as Elastic Agent (elastic/beats#20994) or Kibana, I think there is no other way than to build it into the language while keeping the existing behavior around through a flag (like we do today) that we should consider deprecated.

As mentioned before, operators do not allow switches and since == means exact equality I propose to:

  1. ~==/~!=
    Introduce case-insensitive equal/non-equal.

  2. case-sensitivity parameter to string functions

Introduce case-sensitivity param to string comparison functions where this makes sense. Currently this affects startsWith, endsWith, containsString, indexOf and for regex between, match, wildcard.

  1. keep flag for EQL requests to preserve existing case sensitivity behavior but consider it deprecated.

This means queries such as foo == "foo" would be equivalent to foo ~== "foo" using said flag but one would get a deprecating warning to avoid using said flag moving forward.
Internally this means whatever rules we push forward

P.S. It's also worth considering dropping = and favor only == moving forward to avoid having to introduce ~= instead of just ~==.

@rw-access
Copy link
Contributor

rw-access commented Sep 10, 2020

I think custom overloaded operators is a strange leak into the syntax, and I don't understand the appeal. The global toggle should be sufficient. I'm strongly against this ~ proposal, and can share more in the weekly meetup if necessary. It also has a little bit of javascript-like smell, where it has == with coercion and === for equality without coercion. We might want to be more careful before we introduce a similarly controversial syntax to EQL.

I don't see any problem continuing to support = and == from a technical perspective. Both are used very heavily already and there are no assignments in EQL so this could be okay. Although, I do like the idea of only having == for "is equivalent to" (depending on the case-sensitivity toggle), and the usual approach of raising a specific error message for this deprecation/break: "= is no longer supported, please use ==". I think in some contexts, a single equals sign is more appropriate: maxspan=1m, fork=true, etc. Raising an error on = in an expression would make = available to us later. One hypothetical example: an assignment pipe: | set x = "foo"

I think we need to continue keeping the end user in mind, and I'm not so sure that ~!= is a desired operator. Within the rules team which writes EQL and KQL rules, this syntax was not well received. There's also the collision with Lua which uses ~= to mean "not equals" not case "insensitive equals".

@costin
Copy link
Member Author

costin commented Sep 10, 2020

After discussing this topic again in our meeting, it was decided that EQL in Elasticsearch will be case insensitive where needed, without offering an option to turn this off.
That means the case_sensitive parameter will be removed, the equality for strings will be case insensitive as will be the string functions that needed it.
We need to make sure this is properly documented in the docs and individual functions to avoid any confusion.

If/when the topic of case sensitivity arises, we'll deal with it then.

Raised #62255 to track down code changes.

Thanks for the discussions folks!

@costin costin closed this as completed Sep 10, 2020
@costin
Copy link
Member Author

costin commented Sep 17, 2020

Opening this ticket again after the comments on #62255

@sethpayne
Copy link

Given that we'll be supporting runtime fields in ES soon, should we reserve = for defining these fields in EQL?

@costin
Copy link
Member Author

costin commented Sep 22, 2020

Closing per #62255 (comment)

@costin costin closed this as completed Sep 22, 2020
@costin
Copy link
Member Author

costin commented Sep 22, 2020

@sethpayne Let's move the conversion over to #62650

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/EQL EQL querying >enhancement Team:QL (Deprecated) Meta label for query languages team team-discuss
Projects
None yet
Development

No branches or pull requests

5 participants