SQL: Fix issue with wrong NULL optimization #37124

Merged
matriv merged 7 commits into elastic:master on Jan 6, 2019

Contributor

@matriv matriv commented Jan 3, 2019

Unlike most functions, logical operators OR and AND and conditional
functions (COALESCE, LEAST, GREATEST, etc.) cannot be folded to NULL
just because one of their children is NULL. Therefore, their nullable()
implementation cannot return true. It cannot return false either: if
such an expression is wrapped in IS NULL or IS NOT NULL, the wrapper
is folded to false or true respectively, leading to wrong results.

Change the signature of the nullable() method and add a third value,
UNKNOWN, to handle these cases.

Fixes: #35872
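
To make the failure mode concrete, here is a minimal sketch of how an optimizer rule might fold IS NULL / IS NOT NULL from nullability alone. It is an illustration under stated assumptions, not the Elasticsearch code: the class IsNullFolding and the helpers foldIsNull and foldIsNotNull are hypothetical names, and the enum values follow the TRUE/FALSE/UNKNOWN naming used later in this review.

```java
import java.util.Optional;

// Hypothetical stand-in for the optimizer rule; not the actual Elasticsearch code.
final class IsNullFolding {

    // Tries to fold "expr IS NULL" from the nullability of expr alone.
    // Only a definite FALSE (never null) allows folding to the constant false.
    static Optional<Boolean> foldIsNull(Nullability nullability) {
        if (nullability == Nullability.FALSE) {
            return Optional.of(false);
        }
        return Optional.empty(); // leave the expression for runtime evaluation
    }

    // Tries to fold "expr IS NOT NULL"; the mirror image of foldIsNull.
    static Optional<Boolean> foldIsNotNull(Nullability nullability) {
        if (nullability == Nullability.FALSE) {
            return Optional.of(true);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // An expression such as COALESCE(a, b) reports UNKNOWN, so nothing is folded.
        System.out.println(foldIsNull(Nullability.UNKNOWN));    // Optional.empty
        // If it wrongly reported FALSE, IS NOT NULL would be folded to true.
        System.out.println(foldIsNotNull(Nullability.FALSE));   // Optional[true]
    }
}

// Three-valued nullability, as proposed in this PR.
enum Nullability { TRUE, FALSE, UNKNOWN }
```

With a plain boolean there is no way to say "do not fold in either direction", which is the gap the third value is meant to fill.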

@elasticmachine
Collaborator

Pinging @elastic/es-search

Member

@costin costin left a comment

Left some comments.
I think we're trying to address two issues at once:

  1. figuring out whether something produces null or not
  2. figuring out whether something supports null as an input

I'm not sure whether 1 and 2 need to be handled separately; in the meantime, I think the current solution is sound, especially since it affects older versions as well.

@@ -29,6 +29,32 @@
*/
public abstract class Expression extends Node<Expression> implements Resolvable {

public enum Nullable {
Member

This enum is used widely enough that it should be promoted to its own class; there is no need to keep it under Expression.
I wonder if Nullability is a better name: it avoids the clash with the Nullable annotation and also indicates a value of nullable as opposed to being an attribute by itself (or of itself).

Contributor Author

+1 for Nullability

@@ -29,6 +29,32 @@
*/
public abstract class Expression extends Node<Expression> implements Resolvable {

public enum Nullable {
POSSIBLY, // Whether the expression becomes null if at least one param/input is null
Member

POSSIBLY is ambiguous as opposed to NEVER, which is certain.
I would go for something simpler: TRUE, FALSE, UNKNOWN. These are shorter, clear, and close to the SQL semantics.

Member

Also the comment is inaccurate - null can be returned regardless of the input (there might not be any).

NEVER, // The expression can never become null
UNKNOWN; // Cannot determine if the expression supports possible null folding

public static Nullable and(Nullable... args) {
Member

There should be one for or() as well, no?

Contributor Author

Not needed so far.

// UNKNOWN AND <anything> => UNKNOWN
// NEVER AND NEVER => NEVER
// POSSIBLE AND NEVER/POSSIBLE => POSSIBLE
for (int i = 1; i < args.length; i++) {
Member

The NEVER case doesn't seem to be handled. Also, it might make sense to assign some bitmasks to the enum to make the comparison simpler and use just one accumulator.
That is, instead of checking the return value and the next value, it would be easier to just 'combine' the current value (returnValue) with the next value and be done with it.
(An OO way would be to define a method on the enum, but that adds a virtual call for minimal gain.)
Furthermore, since UNKNOWN trumps everything, the loop could be skipped if this value is found:

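Nullability value = null;
for (Nullability n : nullables) {
    switch (n) {
        case UNKNOWN:
            // UNKNOWN trumps everything, so stop early
            return UNKNOWN;
        case POSSIBLE:
            value = n;
            break;
        case NEVER:
            // keep NEVER only if nothing has been recorded yet
            if (value == null) {
                value = n;
            }
            break;
    }
}

return value;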
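
For reference, the combining rule discussed in this thread can be written as a self-contained sketch. It assumes the TRUE/FALSE/UNKNOWN names proposed earlier in this review and falls back to UNKNOWN for empty input; the demo class NullabilityDemo is hypothetical, and this is not necessarily the code that was merged.

```java
// Illustrative demo class; listed first so single-file launchers pick up its main method.
final class NullabilityDemo {
    public static void main(String[] args) {
        System.out.println(Nullability.and(Nullability.TRUE, Nullability.FALSE));    // TRUE
        System.out.println(Nullability.and(Nullability.FALSE, Nullability.FALSE));   // FALSE
        System.out.println(Nullability.and(Nullability.FALSE, Nullability.UNKNOWN)); // UNKNOWN
        System.out.println(Nullability.and());                                       // UNKNOWN
    }
}

enum Nullability {
    TRUE, FALSE, UNKNOWN;

    // Combines the nullability of several children into one value:
    // any UNKNOWN wins, otherwise any TRUE wins, otherwise FALSE.
    static Nullability and(Nullability... args) {
        Nullability value = null;
        for (Nullability n : args) {
            switch (n) {
                case UNKNOWN:
                    return UNKNOWN; // nothing can override UNKNOWN
                case TRUE:
                    value = TRUE;
                    break;
                case FALSE:
                    if (value == null) {
                        value = FALSE;
                    }
                    break;
            }
        }
        // Assumption: empty input is treated as UNKNOWN (a lenient safety net).
        return value != null ? value : UNKNOWN;
    }
}
```

A single accumulator plus an early return keeps the loop to one pass and avoids comparing pairs of values.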

Contributor Author

I like the bitmask idea!

@matriv
Contributor Author

matriv commented Jan 3, 2019

@costin Addressed your comments. Could you please check again?

@matriv matriv added the >bug label Jan 3, 2019
Member

@costin costin left a comment

Replied

}
}
- return false;
+ return Nullability.UNKNOWN;
Member

false -> unknown? Does this fall under the "and" rule?

- DataType dataType, Literal literal) {
- super(location, name, dataType, qualifier, nullable, id, synthetic);
+ public LiteralAttribute(Location location, String name, String qualifier, Nullability nullability, ExpressionId id, boolean synthetic,
+ DataType dataType, Literal literal) {
Member

Incorrect formatting?

Contributor Author

why? It's the method signature and the args should align.

Member

Because the signature is different than before; as far as I recall the alignment doesn't have to be under the same param (see the other constructors).

import org.elasticsearch.xpack.sql.SqlIllegalArgumentException;

public enum Nullability {
TRUE((byte) 1), // Whether the expression can become null
Member

I'd remove the bitmask - it doesn't seem to add much value (see the previous method suggestions for implementing and); shorter and clearer than using bits.

@matriv
Contributor Author

matriv commented Jan 3, 2019

@costin Thanks again, fixup pushed.

- public boolean nullable() {
- return field().nullable() && pattern != null;
+ public Nullability nullable() {
+ if (pattern == null) {
Contributor

I don't think the change here has the same outcome as the previous code?

Contributor Author

Yep, but the previous logic was wrong: LIKE/RLIKE return null if either the pattern or the value being checked is null.
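
A rough sketch of the rule described here, under assumptions: the class LikeNullability and the method likeNullable are hypothetical stand-ins (the real method is nullable() on the LIKE/RLIKE expression), and the body past the truncated `if (pattern == null) {` line in the diff is a guess at the intent, not a quote of the merged code.

```java
// Hypothetical stand-in for the nullable() logic of LIKE/RLIKE; not the merged code.
final class LikeNullability {

    // pattern: the LIKE/RLIKE pattern, possibly null.
    // fieldNullability: nullability of the value being matched.
    static Nullability likeNullable(String pattern, Nullability fieldNullability) {
        if (pattern == null) {
            // A null pattern makes the whole expression null.
            return Nullability.TRUE;
        }
        // Otherwise the result is null exactly when the matched value is null.
        return fieldNullability;
    }

    public static void main(String[] args) {
        System.out.println(likeNullable(null, Nullability.FALSE));  // TRUE
        System.out.println(likeNullable("a%", Nullability.FALSE));  // FALSE
        System.out.println(likeNullable("a%", Nullability.TRUE));   // TRUE
    }
}

enum Nullability { TRUE, FALSE, UNKNOWN }
```

By contrast, the old boolean expression `field().nullable() && pattern != null` reported "not nullable" whenever the pattern was null, which is the opposite of the behaviour described above.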

@@ -1097,12 +1097,12 @@ private boolean canPropagateFoldable(LogicalPlan p) {
@Override
protected Expression rule(Expression e) {
if (e instanceof IsNotNull) {
- if (((IsNotNull) e).field().nullable() == false) {
+ if (((IsNotNull) e).field().nullable() == Nullability.FALSE) {
Contributor

Can you static import this one?

Contributor Author

Cannot: it would clash with Literal.TRUE and Literal.FALSE, which are already statically imported.

Contributor

@astefan astefan left a comment

LGTM. Left a couple of comments.

@astefan
Contributor

astefan commented Jan 4, 2019

I think the correct label for 6.6.x is 6.6.0.

@matriv matriv added v6.6.0 and removed v6.6.1 labels Jan 4, 2019
Member

@costin costin left a comment

LGTM

value = TRUE;
break;
case FALSE:
if (value == null || value == FALSE) {
Member

if (value == FALSE) value = FALSE; doesn't change anything so it can be removed.

}
}
}
return value != null ? value : UNKNOWN;
Member

Does this ever occur? Is the method called over an empty list?

Contributor Author

@matriv matriv Jan 4, 2019

No, it's a "safety net". Should we throw an exception instead?

Member

That might make sense, though my preference is to handle these corner cases leniently.
I was a bit confused by UNKNOWN. I would argue an empty list has FALSE nullability (it can never be null), but then again maybe it's something that's worth having a check for.

@matriv matriv merged commit da3d8fb into elastic:master Jan 6, 2019
@matriv matriv deleted the mt/fix-35872 branch January 6, 2019 16:29
matriv added a commit that referenced this pull request Jan 6, 2019
@matriv
Contributor Author

matriv commented Jan 6, 2019

Backported to 6.x with a6e9e7c

matriv added a commit that referenced this pull request Jan 6, 2019
@matriv
Contributor Author

matriv commented Jan 6, 2019

Backported to 6.6 with 88a3c84

matriv added a commit that referenced this pull request Jan 6, 2019
@matriv
Contributor Author

matriv commented Jan 6, 2019

Backported to 6.5 with be638a3

5 participants