New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SQL: Fix issue with wrong NULL optimization #37124
Conversation
Logical operators `OR` and `AND` as well as conditional functions (`COALESCE`, `LEAST`, `GREATEST`, etc.) cannot be folded to NULL if one of their children is NULL as is the case for most of the functions. Therefore, their `nullable()` implementation cannot return `true`. On the other hand they cannot return `false` as if they're wrapped within an `IS NULL` or `IS NOT NULL` expression, the expression will be folded to `false` and `true` respectively leading to wrong results. Change the signature of `nullable()` method and add a third value `UKNOWN` to handle these cases. Fixes: elastic#35872
Pinging @elastic/es-search |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments.
I think we're trying to address two issues at once:
- figuring out whether something produces null or not
- figure out if something supports null as a input
I'm not sure whether 1 and 2 need to be handled separately and in the meantime I think the current solution is sound. Especially since it affects older versions as well.
@@ -29,6 +29,32 @@ | |||
*/ | |||
public abstract class Expression extends Node<Expression> implements Resolvable { | |||
|
|||
public enum Nullable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good used enough enum that should be promoted to its own class - no need to keep it under Expression
.
I wonder if Nullability
is a better name - it avoid the clash with the Nullable
annotation and also indicates a value of nullable as oppose to being an attribute by itself (or of itself).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for Nullability
@@ -29,6 +29,32 @@ | |||
*/ | |||
public abstract class Expression extends Node<Expression> implements Resolvable { | |||
|
|||
public enum Nullable { | |||
POSSIBLY, // Whether the expression becomes null if at least one param/input is null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
POSSIBLY
is ambiguous as oppose to NEVER
which is certain.
I would go for something simpler - TRUE
, FALSE
, UNKNOWN
- shorter, clear and close to the SQL semantics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also the comment is inaccurate - null can be returned regardless of the input (there might not be any).
NEVER, // The expression can never become null | ||
UNKNOWN; // Cannot determine if the expression supports possible null folding | ||
|
||
public static Nullable and(Nullable... args) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should one for or
as well, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not needed so far.
// UKNOWN AND <anything> => UKNOWN | ||
// NEVER AND NEVER => NEVER | ||
// POSSIBLE AND NEVER/POSSIBLE => POSSIBLE | ||
for (int i = 1; i < args.length; i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The NEVER
case doesn't seem to be handled. Also it might make sense to assign some bitmasks to the enum to make the comparison simpler and use just one accumulator.
That is, instead of checking the return value and the next value, it would be easier to just 'combine' the current value (returnValue) with the next value and be done with.
(an OO way would be to define a method on the enum but that adds a virtual call for minimum gain).
Further more since UNKNOWN trumps everything, the loop could be skipped if this value is found:
Nullability value = null;
for (Nullability n: nullables) {
switch (n) {
case UNKNOWN: return UNKNOWN;
case POSSIBLE: value = n;
case NEVER:
if (value == null) {
value = n;
}
}
}
return value;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the bitmask idea!
@costin Addressed your comments. Could you please check again? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replied
} | ||
} | ||
return false; | ||
return Nullability.UNKNOWN; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
false -> unknown? Does this fall under the "and" rule?
DataType dataType, Literal literal) { | ||
super(location, name, dataType, qualifier, nullable, id, synthetic); | ||
public LiteralAttribute(Location location, String name, String qualifier, Nullability nullability, ExpressionId id, boolean synthetic, | ||
DataType dataType, Literal literal) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incorrect formatting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why? It's the method signature and the args should align.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the signature is different than before; as far as I recall the alignment doesn't have to be under the same param (see the other constructors).
import org.elasticsearch.xpack.sql.SqlIllegalArgumentException; | ||
|
||
public enum Nullability { | ||
TRUE((byte) 1), // Whether the expression can become null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd remove the bitmask - it doesn't seem to add much value (see the previous method suggestions for implementing and); shorter and clearer than using bits.
@costin Thanks again, fixup pushed. |
public boolean nullable() { | ||
return field().nullable() && pattern != null; | ||
public Nullability nullable() { | ||
if (pattern == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the change here has the same outcome as the previous code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, but before the logic was wrong as LIKE/RLIKE return null if the pattern is null or if the value checked is null.
@@ -1097,12 +1097,12 @@ private boolean canPropagateFoldable(LogicalPlan p) { | |||
@Override | |||
protected Expression rule(Expression e) { | |||
if (e instanceof IsNotNull) { | |||
if (((IsNotNull) e).field().nullable() == false) { | |||
if (((IsNotNull) e).field().nullable() == Nullability.FALSE) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you static import this one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cannot, mixes up with Literal.TRUE
and Literal.FALSE
which are already statically imported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Left couple of comments.
I think the correct label for 6.6.x is 6.6.0. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
value = TRUE; | ||
break; | ||
case FALSE: | ||
if (value == null || value == FALSE) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (value == FALSE) value = FALSE;
doesn't change anything so it can be removed.
} | ||
} | ||
} | ||
return value != null ? value : UNKNOWN; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this ever occur? Is the method called over an empty list?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No it's a "safety net". Should we throw an exception instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That might make sense though my preference is to handle this corner cases leniently.
I was a bit confused by UNKNOWN
, I would argue an empty list has FALSE
nullability (it can never be null) but then again maybe it's something that's worth having a check.
Logical operators OR and AND as well as conditional functions (COALESCE, LEAST, GREATEST, etc.) cannot be folded to NULL if one of their children is NULL as is the case for most of the functions. Therefore, their nullable() implementation cannot return true. On the other hand they cannot return false as if they're wrapped within an IS NULL or IS NOT NULL expression, the expression will be folded to false and true respectively leading to wrong results. Change the signature of nullable() method and add a third value UKNOWN to handle these cases. Fixes: #35872
Backported to |
Logical operators OR and AND as well as conditional functions (COALESCE, LEAST, GREATEST, etc.) cannot be folded to NULL if one of their children is NULL as is the case for most of the functions. Therefore, their nullable() implementation cannot return true. On the other hand they cannot return false as if they're wrapped within an IS NULL or IS NOT NULL expression, the expression will be folded to false and true respectively leading to wrong results. Change the signature of nullable() method and add a third value UKNOWN to handle these cases. Fixes: #35872
Backported to |
Logical operators OR and AND as well as conditional functions (COALESCE, LEAST, GREATEST, etc.) cannot be folded to NULL if one of their children is NULL as is the case for most of the functions. Therefore, their nullable() implementation cannot return true. On the other hand they cannot return false as if they're wrapped within an IS NULL or IS NOT NULL expression, the expression will be folded to false and true respectively leading to wrong results. Change the signature of nullable() method and add a third value UKNOWN to handle these cases. Fixes: #35872
Backported to |
Logical operators
OR
andAND
as well as conditional functions(
COALESCE
,LEAST
,GREATEST
, etc.) cannot be folded to NULL if oneof their children is NULL as is the case for most of the functions.
Therefore, their
nullable()
implementation cannot returntrue
. Onthe other hand they cannot return
false
as if they're wrapped withinan
IS NULL
orIS NOT NULL
expression, the expression will be foldedto
false
andtrue
respectively leading to wrong results.Change the signature of
nullable()
method and add a third valueUKNOWN
to handle these cases.
Fixes: #35872