Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQL: Handle NaN returned for aggs in case of nulls #35164

Closed
wants to merge 4 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
Expand Up @@ -515,7 +515,8 @@ public void testTranslateQueryWithGroupByAndHaving() throws IOException {
@SuppressWarnings("unchecked")
Map<String, Object> filterScript = (Map<String, Object>) bucketSelector.get("script");
assertEquals(3, filterScript.size());
assertEquals("InternalSqlScriptUtils.nullSafeFilter(InternalSqlScriptUtils.gt(params.a0,params.v0))",
assertEquals("InternalSqlScriptUtils.nullSafeFilter(InternalSqlScriptUtils.gt(" +
"InternalSqlScriptUtils.nanSafeFilter(params.a0),params.v0))",
filterScript.get("source"));
assertEquals("painless", filterScript.get("lang"));
@SuppressWarnings("unchecked")
Expand Down
13 changes: 13 additions & 0 deletions x-pack/plugin/sql/qa/src/main/resources/agg.sql-spec
Expand Up @@ -467,6 +467,19 @@ SELECT languages, COUNT(*) c FROM test_emp GROUP BY languages HAVING NOT COUNT(*
selectLangGroupByLangHavingDifferent
SELECT languages, COUNT(*) c FROM test_emp GROUP BY languages HAVING COUNT(*) <> 1 ORDER BY languages DESC;

aggWithHavingAndComparisonOnNaN1
SELECT max(languages) AS maxl, gender FROM "test_emp" WHERE emp_no IN(10018, 10019, 10020, 10021) GROUP BY gender HAVING maxl <= 2;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please try other aggs as well?
For example use min and max to trigger a stats agg, kurtosis for matrix, sum (which should always return zero), percentiles and percentile ranks.
If indeed the bucket selector gives us just the value, I think the proper fix should actually be inside ES...
@polyfractal this looks like another instanceof #34903 - what do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, playing catch-up here. What's the underlying aggregation structure being generated? Aggs + bucket_selector? Which value(s) being aggregated are NaN/null?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the example above HAVING maxl <=2 becomes a Painless script doing a bucket selector pointing to MAX agg with the content of (max_agg <=2).
This issue appears though for any agg - when running against a null bucket, instead of returning nulls, the aggs selected by the bucket selector their default values (like in #34903). In case of Max is a NaN but this clearly differs from case to case.
In scripting however we have no understanding of the source so we can't determine that it's an agg since we only receive the value (NaN) or whatever.

I would argue this is an actual bug in ES.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@costin Ahh, yes I see. I agree it's related to #34903. Bucket selector is just resolving to the internal agg object and asking for it's value(), which returns the primitive double internal state unlike xcontent which does the null conversion.

I think we can probably expose the hasValue() method (or whatever the mechanism ends up being) in the bucket_selector painless context so that it can be used from scripts. I don't think it's something we can do automatically though, because of how the pipeline framework works right now. It's a similar issue to #27377 (not quite the same, but similar root issue). I don't think we can fix that easily until the transport client goes away, or else we'll have a big breaking change to how agg objects work (double to Double or Optional)

So the script might need to do a check if there is a value before actually using the value. Something like:

if (hasValue(max1) && max1.value <= 2) {
   ...
}

Not ideal but probably the path of least resistance right now. Or maybe introduce a new valueOrNull() method in the context or something.

I'll make a note on #34903 about pipelines

aggWithHavingAndComparisonOnNaN2
SELECT max(languages) AS maxl, gender FROM "test_emp" WHERE emp_no IN(10018, 10019, 10020, 10021) GROUP BY gender HAVING maxl < 2;
aggWithHavingAndIsNull
SELECT max(languages) AS maxl, gender FROM "test_emp" WHERE emp_no IN(10018, 10019, 10020, 10021) GROUP BY gender HAVING maxl IS NULL;
aggWithHavingAndIsNullAndNegation
SELECT max(languages) AS maxl, gender FROM "test_emp" WHERE emp_no IN(10018, 10019, 10020, 10021) GROUP BY gender HAVING NOT (maxl IS NULL);
aggWithHavingAndIsNotNull
SELECT max(languages) AS maxl, gender FROM "test_emp" WHERE emp_no IN(10018, 10019, 10020, 10021) GROUP BY gender HAVING maxl IS NOT NULL;
aggWithHavingAndIsNotNullAndNegation
SELECT max(languages) AS maxl, gender FROM "test_emp" WHERE emp_no IN(10018, 10019, 10020, 10021) GROUP BY gender HAVING NOT (maxl IS NOT NULL);


// filter with IN
aggMultiWithHavingUsingInAndNullHandling
Expand Down
Expand Up @@ -21,8 +21,8 @@
import org.elasticsearch.xpack.sql.expression.function.scalar.string.ReplaceFunctionProcessor;
import org.elasticsearch.xpack.sql.expression.function.scalar.string.StringProcessor.StringOperation;
import org.elasticsearch.xpack.sql.expression.function.scalar.string.SubstringFunctionProcessor;
import org.elasticsearch.xpack.sql.expression.predicate.logical.BinaryLogicProcessor.BinaryLogicOperation;
import org.elasticsearch.xpack.sql.expression.predicate.conditional.CoalesceProcessor;
import org.elasticsearch.xpack.sql.expression.predicate.logical.BinaryLogicProcessor.BinaryLogicOperation;
import org.elasticsearch.xpack.sql.expression.predicate.logical.NotProcessor;
import org.elasticsearch.xpack.sql.expression.predicate.nulls.IsNotNullProcessor;
import org.elasticsearch.xpack.sql.expression.predicate.operator.arithmetic.BinaryArithmeticProcessor.BinaryArithmeticOperation;
Expand All @@ -44,7 +44,8 @@
@SuppressWarnings("unused")
public final class InternalSqlScriptUtils {

private InternalSqlScriptUtils() {}
private InternalSqlScriptUtils() {
}

//
// Utilities
Expand All @@ -60,7 +61,7 @@ public static <T> Object docValue(Map<String, ScriptDocValues<T>> doc, String fi
}
return null;
}

public static boolean nullSafeFilter(Boolean filter) {
return filter == null ? false : filter.booleanValue();
}
Expand All @@ -73,6 +74,17 @@ public static String nullSafeSortString(Object sort) {
return sort == null ? StringUtils.EMPTY : sort.toString();
}

public static Number nanSafeFilter(Number aggResult) {
if (aggResult == null) {
return null;
}
if ((aggResult instanceof Double && ((Double) aggResult).isNaN()) ||
(aggResult instanceof Float && ((Float) aggResult).isNaN())) {
return null;
}
return aggResult;
}


//
// Operators
Expand All @@ -92,7 +104,7 @@ public static Boolean neq(Object left, Object right) {
public static Boolean lt(Object left, Object right) {
return BinaryComparisonOperation.LT.apply(left, right);
}

public static Boolean lte(Object left, Object right) {
return BinaryComparisonOperation.LTE.apply(left, right);
}
Expand All @@ -108,7 +120,7 @@ public static Boolean gte(Object left, Object right) {
public static Boolean and(Boolean left, Boolean right) {
return BinaryLogicOperation.AND.apply(left, right);
}

public static Boolean or(Boolean left, Boolean right) {
return BinaryLogicOperation.OR.apply(left, right);
}
Expand Down Expand Up @@ -279,21 +291,21 @@ public static Integer dateTimeChrono(Object dateTime, String tzId, String chrono
}
return DateTimeFunction.dateTimeChrono(asDateTime(dateTime), tzId, chronoName);
}

public static String dayName(Object dateTime, String tzId) {
if (dateTime == null || tzId == null) {
return null;
}
return NameExtractor.DAY_NAME.extract(asDateTime(dateTime), tzId);
}

public static String monthName(Object dateTime, String tzId) {
if (dateTime == null || tzId == null) {
return null;
}
return NameExtractor.MONTH_NAME.extract(asDateTime(dateTime), tzId);
}

public static Integer quarter(Object dateTime, String tzId) {
if (dateTime == null || tzId == null) {
return null;
Expand All @@ -307,14 +319,14 @@ private static ZonedDateTime asDateTime(Object dateTime) {
}
throw new SqlIllegalArgumentException("Invalid date encountered [{}]", dateTime);
}

//
// String functions
//
public static Integer ascii(String s) {
return (Integer) StringOperation.ASCII.apply(s);
}

public static Integer bitLength(String s) {
return (Integer) StringOperation.BIT_LENGTH.apply(s);
}
Expand All @@ -326,7 +338,7 @@ public static String character(Number n) {
public static Integer charLength(String s) {
return (Integer) StringOperation.CHAR_LENGTH.apply(s);
}

public static String concat(String s1, String s2) {
return (String) ConcatFunctionProcessor.process(s1, s2);
}
Expand All @@ -350,31 +362,31 @@ public static Integer length(String s) {
public static Integer locate(String s1, String s2) {
return locate(s1, s2, null);
}

public static Integer locate(String s1, String s2, Number pos) {
return LocateFunctionProcessor.doProcess(s1, s2, pos);
}

public static String ltrim(String s) {
return (String) StringOperation.LTRIM.apply(s);
}

public static Integer octetLength(String s) {
return (Integer) StringOperation.OCTET_LENGTH.apply(s);
}

public static Integer position(String s1, String s2) {
return (Integer) BinaryStringStringOperation.POSITION.apply(s1, s2);
}

public static String repeat(String s, Number count) {
return BinaryStringNumericOperation.REPEAT.apply(s, count);
}

public static String replace(String s1, String s2, String s3) {
return (String) ReplaceFunctionProcessor.doProcess(s1, s2, s3);
}

public static String right(String s, Number count) {
return BinaryStringNumericOperation.RIGHT.apply(s, count);
}
Expand Down
Expand Up @@ -58,7 +58,7 @@ default ScriptTemplate scriptWithScalar(ScalarFunctionAttribute scalar) {
}

default ScriptTemplate scriptWithAggregate(AggregateFunctionAttribute aggregate) {
return new ScriptTemplate(processScript("{}"),
return new ScriptTemplate(processScript(Scripts.SQL_SCRIPTS + ".nanSafeFilter({})"),
paramsBuilder().agg(aggregate).build(),
dataType());
}
Expand All @@ -76,4 +76,4 @@ default String processScript(String script) {
default String formatTemplate(String template) {
return Scripts.formatTemplate(template);
}
}
}
Expand Up @@ -15,6 +15,7 @@ class org.elasticsearch.xpack.sql.expression.function.scalar.whitelist.InternalS
boolean nullSafeFilter(Boolean)
double nullSafeSortNumeric(Number)
String nullSafeSortString(Object)
Number nanSafeFilter(Number)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you put the elements in this group in alphabetical order, please? I think that was the original intention with the list of functions here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This name is misleading - the above methods are used to wrap an expression used in filtering; this method should be used against all aggs, regardless of their position or whether the expression is a filter or not.
How about nullAgg()?


#
# Comparison
Expand Down
Expand Up @@ -161,6 +161,21 @@ public void testLikeConstructsNotSupported() {
assertEquals("Scalar function (LTRIM(keyword)) not allowed (yet) as arguments for LIKE", ex.getMessage());
}

public void testTranslateIsNullExpression_HavingClause_Painless() {
LogicalPlan p = plan("SELECT keyword, max(int) FROM test GROUP BY keyword HAVING max(int) IS NULL");
assertTrue(p instanceof Project);
assertTrue(p.children().get(0) instanceof Filter);
Expression condition = ((Filter) p.children().get(0)).condition();
assertFalse(condition.foldable());
QueryTranslation translation = QueryTranslator.toQuery(condition, true);
assertNull(translation.query);
AggFilter aggFilter = translation.aggFilter;
assertEquals("InternalSqlScriptUtils.nullSafeFilter(InternalSqlScriptUtils.not" +
"(InternalSqlScriptUtils.notNull(InternalSqlScriptUtils.nanSafeFilter(params.a0))))",
aggFilter.scriptTemplate().toString());
assertThat(aggFilter.scriptTemplate().params().toString(), startsWith("[{a=MAX(int){a->"));
}

public void testTranslateInExpression_WhereClause() {
LogicalPlan p = plan("SELECT * FROM test WHERE keyword IN ('foo', 'bar', 'lala', 'foo', concat('la', 'la'))");
assertTrue(p instanceof Project);
Expand Down Expand Up @@ -225,7 +240,8 @@ public void testTranslateInExpression_HavingClause_Painless() {
QueryTranslation translation = QueryTranslator.toQuery(condition, true);
assertNull(translation.query);
AggFilter aggFilter = translation.aggFilter;
assertEquals("InternalSqlScriptUtils.nullSafeFilter(InternalSqlScriptUtils.in(params.a0, params.v0))",
assertEquals("InternalSqlScriptUtils.nullSafeFilter(InternalSqlScriptUtils.in(" +
"InternalSqlScriptUtils.nanSafeFilter(params.a0), params.v0))",
aggFilter.scriptTemplate().toString());
assertThat(aggFilter.scriptTemplate().params().toString(), startsWith("[{a=MAX(int){a->"));
assertThat(aggFilter.scriptTemplate().params().toString(), endsWith(", {v=[10, 20]}]"));
Expand All @@ -240,7 +256,8 @@ public void testTranslateInExpression_HavingClause_PainlessOneArg() {
QueryTranslation translation = QueryTranslator.toQuery(condition, true);
assertNull(translation.query);
AggFilter aggFilter = translation.aggFilter;
assertEquals("InternalSqlScriptUtils.nullSafeFilter(InternalSqlScriptUtils.in(params.a0, params.v0))",
assertEquals("InternalSqlScriptUtils.nullSafeFilter(InternalSqlScriptUtils.in(" +
"InternalSqlScriptUtils.nanSafeFilter(params.a0), params.v0))",
aggFilter.scriptTemplate().toString());
assertThat(aggFilter.scriptTemplate().params().toString(), startsWith("[{a=MAX(int){a->"));
assertThat(aggFilter.scriptTemplate().params().toString(), endsWith(", {v=[10]}]"));
Expand All @@ -256,7 +273,8 @@ public void testTranslateInExpression_HavingClause_PainlessAndNullHandling() {
QueryTranslation translation = QueryTranslator.toQuery(condition, true);
assertNull(translation.query);
AggFilter aggFilter = translation.aggFilter;
assertEquals("InternalSqlScriptUtils.nullSafeFilter(InternalSqlScriptUtils.in(params.a0, params.v0))",
assertEquals("InternalSqlScriptUtils.nullSafeFilter(InternalSqlScriptUtils.in(" +
"InternalSqlScriptUtils.nanSafeFilter(params.a0), params.v0))",
aggFilter.scriptTemplate().toString());
assertThat(aggFilter.scriptTemplate().params().toString(), startsWith("[{a=MAX(int){a->"));
assertThat(aggFilter.scriptTemplate().params().toString(), endsWith(", {v=[10, null, 20, 30]}]"));
Expand Down