ESQL: Add support for TEXT fields in comparison operators and SORT #98528

luigidellaquila · 2023-08-16T09:58:19Z

(Refactoring of #98317, extending the operators instead of modifying QL module)

This fix adds support for TEXT fields in comparison operators (==, !=, >, >=, etc., including LIKE and RLIKE) and in the SORT command.

The fix consists of passing one more parameter to the BinaryComparison, Order and RegexMatch constructors (and to those of all their subclasses) to define whether the instance should resolve on TEXT fields (for ESQL) or not (for SQL and EQL).
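A minimal sketch of the extra constructor flag, assuming simplified stand-ins (`DataType`, `TypeResolution`, `ComparisonSketch`) that are hypothetical and not the real QL/ESQL classes: the same comparison rejects TEXT when built for SQL/EQL and accepts it when built for ESQL.

```java
// Hypothetical, simplified stand-ins (not the real Elasticsearch classes).
enum DataType { KEYWORD, TEXT, INTEGER }

record TypeResolution(boolean resolved, String message) {
    static final TypeResolution TYPE_RESOLVED = new TypeResolution(true, null);

    static TypeResolution failed(String message) {
        return new TypeResolution(false, message);
    }
}

class ComparisonSketch {
    private final DataType left;
    private final DataType right;
    // The extra constructor argument: true for ESQL, false for SQL and EQL.
    private final boolean resolveOnText;

    ComparisonSketch(DataType left, DataType right, boolean resolveOnText) {
        this.left = left;
        this.right = right;
        this.resolveOnText = resolveOnText;
    }

    TypeResolution resolveType() {
        boolean hasText = left == DataType.TEXT || right == DataType.TEXT;
        if (hasText && resolveOnText == false) {
            // SQL/EQL behaviour: TEXT is not exact, so the comparison is rejected.
            return TypeResolution.failed("TEXT fields are not supported by this comparison");
        }
        // KEYWORD and TEXT compare as strings; any other pair must have the same type.
        boolean bothStrings = isStringLike(left) && isStringLike(right);
        return left == right || bothStrings
            ? TypeResolution.TYPE_RESOLVED
            : TypeResolution.failed("incompatible types [" + left + "] and [" + right + "]");
    }

    private static boolean isStringLike(DataType type) {
        return type == DataType.KEYWORD || type == DataType.TEXT;
    }
}
```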

Operator pushdown (both for filtering and sorting) is enabled only if the property also declares an exact multi-field that can be used as a fallback; otherwise the optimization is skipped and the operation is executed in-line.
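The pushdown condition can be sketched as follows. `MappedField`, `FieldType` and `PushdownPlanner` are hypothetical names; the idea is that a TEXT field is only pushed down through its keyword multi-field, and without one the operation stays in the compute engine.

```java
import java.util.Optional;

// Hypothetical model of a mapped field; not the real EsField/FieldAttribute API.
enum FieldType { KEYWORD, TEXT, LONG }

record MappedField(String name, FieldType type, MappedField keywordSubfield) {
    /** A non-TEXT field is its own exact form; a TEXT field is exact only via a keyword multi-field. */
    Optional<MappedField> exact() {
        if (type != FieldType.TEXT) {
            return Optional.of(this);
        }
        return Optional.ofNullable(keywordSubfield);
    }
}

final class PushdownPlanner {
    private PushdownPlanner() {}

    /** Field to target in the pushed-down query, or empty to evaluate the operation in-line. */
    static Optional<String> pushdownTarget(MappedField field) {
        return field.exact().map(MappedField::name);
    }
}
```

With `job` mapped as TEXT plus a `job.raw` keyword multi-field, the target is `job.raw`; a TEXT field without such a subfield yields an empty result, i.e. no pushdown.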

[edit]

Added a rule that replaces fields with their exact subfield when possible

Fixes #98642

elasticsearchmachine · 2023-08-16T09:58:43Z

Pinging @elastic/es-ql (Team:QL)

elasticsearchmachine · 2023-08-16T09:58:44Z

Pinging @elastic/elasticsearch-esql (:Query Languages/ES|QL)

luigidellaquila · 2023-08-16T13:22:58Z

@elasticmachine run elasticsearch-ci/part-1

luigidellaquila · 2023-08-16T13:23:14Z

@elasticmachine run elasticsearch-ci/bwc

bpintea

Left one more relevant note, otherwise it LGTM.

bpintea · 2023-08-16T12:56:09Z

...main/java/org/elasticsearch/xpack/esql/expression/predicate/operator/regex/WildcardLike.java

+import static org.elasticsearch.xpack.ql.expression.TypeResolutions.ParamOrdinal.DEFAULT;
+import static org.elasticsearch.xpack.ql.expression.TypeResolutions.isString;
+
+public class WildcardLike extends org.elasticsearch.xpack.ql.expression.predicate.regex.WildcardLike {


Do we expect a case-insensitive LIKE to be added? If not, I guess we could get rid of the c'tor parameter.

bpintea · 2023-08-16T14:21:45Z

...a/org/elasticsearch/xpack/esql/expression/predicate/operator/comparison/LessThanOrEqual.java

+
+    @Override
+    public BinaryComparison reverse() {
+        return super.reverse();


This will return a QL GT.

I'm wondering if all these functions shouldn't rather extend QL's BinaryComparison directly, since they override most methods anyway. Alternatively, there could be an EsqlBinaryComparison (extending QL's) that implements the type resolution, which is identical in all the new comparisons.

Argh! The whole point of extending that method was to return the right instance, and I just missed it.
Fixing it right away
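To illustrate the `reverse()` pitfall, here is a hypothetical two-level hierarchy (not the real QL classes). For the sketch only, it assumes `reverse()` maps `<=` to `>=` and vice versa. Delegating to `super.reverse()` from the ESQL subclass would hand back the QL class, so each override constructs its ESQL counterpart, and a covariant return type makes that visible to the compiler.

```java
// Simplified, hypothetical stand-ins for the QL comparison classes.
class QlComparison {
    final String left;
    final String right;

    QlComparison(String left, String right) {
        this.left = left;
        this.right = right;
    }
}

class QlLessThanOrEqual extends QlComparison {
    QlLessThanOrEqual(String left, String right) { super(left, right); }

    QlComparison reverse() { return new QlGreaterThanOrEqual(left, right); }
}

class QlGreaterThanOrEqual extends QlComparison {
    QlGreaterThanOrEqual(String left, String right) { super(left, right); }

    QlComparison reverse() { return new QlLessThanOrEqual(left, right); }
}

// ESQL-flavoured subclasses: each must build its own counterpart.
class EsqlLessThanOrEqual extends QlLessThanOrEqual {
    EsqlLessThanOrEqual(String left, String right) { super(left, right); }

    @Override
    EsqlGreaterThanOrEqual reverse() {
        // Returning super.reverse() here would yield a QlGreaterThanOrEqual.
        return new EsqlGreaterThanOrEqual(left, right);
    }
}

class EsqlGreaterThanOrEqual extends QlGreaterThanOrEqual {
    EsqlGreaterThanOrEqual(String left, String right) { super(left, right); }

    @Override
    EsqlLessThanOrEqual reverse() { return new EsqlLessThanOrEqual(left, right); }
}
```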

bpintea · 2023-08-16T14:24:54Z

...ck/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/io/stream/PlanNamedTypesTests.java

@@ -354,11 +354,16 @@ public void testLiteralSimple() throws IOException {
    }

    public void testOrderSimple() throws IOException {
-        var orig = new Order(Source.EMPTY, field("val", DataTypes.INTEGER), Order.OrderDirection.ASC, Order.NullsPosition.FIRST);
+        var orig = new org.elasticsearch.xpack.esql.expression.Order(


I guess ESQL's Order could be imported directly.

costin

Thanks for waiting, Luigi, and for looking into this. I left a couple of comments. This is going to clash with #98628; thanks in advance for working around it.

costin · 2023-08-22T01:46:00Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/Verifier.java

This rule made sense before since the comparison classes were part of QL - now that we have our own classes, the type validation should be within the comparison classes themselves.
There's no advantage to it being outside; rather, it's confusing, since the classes inherit the QL behavior, which is used during resolution and then complemented through the Verifier. That is both redundant and error-prone (the type resolution and the Verifier need to be kept in sync and must not trip over one another).

I did some tests, moving this logic inside the resolution flow; the fix is not complex, but it has some practical implications:

The Analyzer does some automatic conversion (KEYWORD->DATETIME) that happens before this rule, and only if the expressions are foldable. At the same time, the resolution is used before the conversion (e.g. in the ResolveFunctions rule) and fails if we keep the logic as it is, so we need to review the logic a bit.

AbstractBinaryComparisonTestCase does not consider these automatic conversions (and today it makes no distinction between foldable and non-foldable expressions, see above), so we have to review that as well.

If you don't mind, I'd prefer to address this problem in a follow-up PR and discuss the changes separately.

WFM - please raise an issue that explains the follow-up items and link it here.

costin · 2023-08-22T01:51:34Z

...in/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/LocalPhysicalPlanOptimizer.java

+                    if (fa.getExactInfo().hasExact() == false) {
+                        return false;
+                    }
+                } else {
+                    return false;
+                }


if ((order.child() instanceof FieldAttribute fa && fa.getExactInfo().hasExact()) == false) { return false; }

Probably a bit more readable:

return orders.stream().allMatch(o -> o.child() instanceof FieldAttribute fa && fa.getExactInfo().hasExact());
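A runnable version of the `allMatch` suggestion, using hypothetical record stand-ins for `Order` and `FieldAttribute` (the real code calls `fa.getExactInfo().hasExact()`; here `hasExact` is simply a record component). Pattern matching for `instanceof` needs Java 16 or later.

```java
import java.util.List;

// Hypothetical, simplified plan expressions (not the real ESQL classes).
interface Expression {}

record Literal(Object value) implements Expression {}

record FieldAttribute(String name, boolean hasExact) implements Expression {}

record Order(Expression child) {}

final class SortPushdownCheck {
    private SortPushdownCheck() {}

    /** Sorts can be pushed down only if every key is a field with an exact (e.g. keyword) form. */
    static boolean canPushSorts(List<Order> orders) {
        // fa is bound only when the instanceof test succeeds, so the && short-circuit is safe.
        return orders.stream().allMatch(o -> o.child() instanceof FieldAttribute fa && fa.hasExact());
    }
}
```

Note that `allMatch` returns `true` for an empty stream.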

costin · 2023-08-22T01:54:54Z

...in/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/LocalPhysicalPlanOptimizer.java

        }

        private List<EsQueryExec.FieldSort> buildFieldSorts(List<Order> orders) {
            List<EsQueryExec.FieldSort> sorts = new ArrayList<>(orders.size());
            for (Order o : orders) {
-                sorts.add(new EsQueryExec.FieldSort(((FieldAttribute) o.child()), o.direction(), o.nullsPosition()));
+                sorts.add(new EsQueryExec.FieldSort(((FieldAttribute) o.child()).exactAttribute(), o.direction(), o.nullsPosition()));


I think the sort would have to be extracted as part of #98642: instead of letting the original field be passed in and selecting its exact attribute, have an optimization rule that determines (potentially locally) that an exact field is available and uses that instead.
This would apply across sort and aggregations transparently without having to switch things in multiple places.

I added a rule that implements #98642 and that also covers this case.
I wanted to do it with a separate PR, but with currently supported data types the only way to write thorough tests is to have both fixes together, so I ended up adding it here.

costin · 2023-08-22T01:57:16Z

...in/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/LocalPhysicalPlanOptimizer.java

-                return ExpressionTranslator.wrapIfNested(new SingleValueQuery(querySupplier.get(), fa.name()), field);
+                return ExpressionTranslator.wrapIfNested(
+                    new SingleValueQuery(querySupplier.get(), fa.exactAttribute().name()),
+                    ((FieldAttribute) field).exactAttribute()


costin · 2023-08-22T01:59:32Z

...k/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizer.java

+                            ? new org.elasticsearch.xpack.esql.expression.predicate.operator.comparison.Equals(
+                                k.source(),
+                                k,
+                                v.iterator().next(),
+                                finalZoneId


Introducing a createEquals method, similar to createIn, avoids the need to duplicate this class; instead, we can keep using the original rule.
Or am I missing something else?
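The `createEquals` suggestion is the template-method pattern: the rule keeps its logic and only the node factories are overridden. A minimal sketch with hypothetical `Expr`, `Equals` and `In` records, where `flavor` just records which module built the node:

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical expression nodes; "flavor" marks which module built the node.
interface Expr {}

record Equals(String field, Object value, String flavor) implements Expr {}

record In(String field, Set<Object> values, String flavor) implements Expr {}

/** QL-style rule: folds field == v1 OR field == v2 ... into a single node. */
class CombineDisjunctionsSketch {
    final Expr combine(String field, List<?> values) {
        Set<Object> distinct = new LinkedHashSet<>(values);
        return distinct.size() == 1 ? createEquals(field, distinct) : createIn(field, distinct);
    }

    // Factory hooks: subclasses change which node classes are produced, not the rule logic.
    protected Expr createEquals(String field, Set<Object> values) {
        return new Equals(field, values.iterator().next(), "ql");
    }

    protected Expr createIn(String field, Set<Object> values) {
        return new In(field, values, "ql");
    }
}

class EsqlCombineDisjunctionsSketch extends CombineDisjunctionsSketch {
    @Override
    protected Expr createEquals(String field, Set<Object> values) {
        return new Equals(field, values.iterator().next(), "esql");
    }

    @Override
    protected Expr createIn(String field, Set<Object> values) {
        return new In(field, values, "esql");
    }
}
```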

costin · 2023-08-22T02:01:20Z

...k/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizer.java

@@ -684,4 +771,35 @@ protected LogicalPlan rule(Limit plan) {
            return p;
        }
    }
+
+    public static class ReplaceRegexMatch extends OptimizerRules.OptimizerExpressionRule<RegexMatch<?>> {


No need to copy the class: extend ReplaceRegexMatch in QL and override regexToEquals to return the ESQL Equals class, similar to EQL.
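Overriding a `regexToEquals` hook lets ESQL reuse the QL rule unchanged. The sketch below is hypothetical: it rewrites `field LIKE pattern` into an equality when the pattern contains no `*` or `?` wildcard, and it assumes `\` escapes the next character. Only the hook differs between the two flavours.

```java
import java.util.Optional;

// Hypothetical nodes; "flavor" records which module created the Equals.
interface Expr {}

record Like(String field, String pattern) implements Expr {}

record Equals(String field, String value, String flavor) implements Expr {}

class ReplaceRegexMatchSketch {
    /** Rewrites field LIKE pattern into an equality when the pattern has no wildcard. */
    final Expr rule(Like like) {
        return literalOf(like.pattern()).<Expr>map(literal -> regexToEquals(like, literal)).orElse(like);
    }

    /** Hook: an ESQL subclass overrides only this to emit its own Equals. */
    protected Expr regexToEquals(Like like, String literal) {
        return new Equals(like.field(), literal, "ql");
    }

    /** Unescaped literal if the pattern has no '*' or '?' wildcard; '\' is assumed to escape the next char. */
    static Optional<String> literalOf(String pattern) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c == '\\' && i + 1 < pattern.length()) {
                sb.append(pattern.charAt(++i)); // escaped char is taken literally
            } else if (c == '*' || c == '?') {
                return Optional.empty(); // a real wildcard: keep the LIKE
            } else {
                sb.append(c);
            }
        }
        return Optional.of(sb.toString());
    }
}

class EsqlReplaceRegexMatchSketch extends ReplaceRegexMatchSketch {
    @Override
    protected Expr regexToEquals(Like like, String literal) {
        return new Equals(like.field(), literal, "esql");
    }
}
```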

costin · 2023-08-22T02:02:46Z

...ql/src/main/java/org/elasticsearch/xpack/esql/expression/predicate/operator/regex/RLike.java

Why are these (regex) classes added?

only to override resolveType()

luigidellaquila · 2023-08-22T09:16:19Z

Thanks for your feedback @costin

this is going to clash with #98628, thanks in advance for working around it.

No problem, I'll take care of resolving the conflicts


elasticsearchmachine · 2023-08-24T07:56:43Z

Hi @luigidellaquila, I've created a changelog YAML for you.

elasticsearchmachine · 2023-08-24T07:58:38Z

Hi @luigidellaquila, I've updated the changelog YAML for you.

costin

Thanks, Luigi, for incorporating the feedback on this wide PR. I left a round of comments.

costin · 2023-08-24T16:46:40Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/Verifier.java

@@ -271,6 +271,9 @@ public static Failure validateBinaryComparison(BinaryComparison bc) {
        if (false == r.resolved()) {
            return fail(bc, r.message());
        }
+        if (DataTypes.isString(bc.left().dataType()) && DataTypes.isString(bc.right().dataType())) {


Why is this needed? Aren't the operators performing their own, full type resolution?
In fact, why still have this rule in place?

See my comment above #98528 (comment)

costin · 2023-08-24T16:48:37Z

...c/main/java/org/elasticsearch/xpack/esql/evaluator/predicate/operator/comparison/Equals.java

+        if (e instanceof FieldAttribute fa && fa.dataType() == DataTypes.TEXT) {
+            return TypeResolution.TYPE_RESOLVED;
+        }


This would make a good utility method in EsqlTypeResolutions, or even in the original TypeResolutions class.

costin · 2023-08-24T16:48:53Z

...n/java/org/elasticsearch/xpack/esql/evaluator/predicate/operator/comparison/GreaterThan.java

+        if (e instanceof FieldAttribute fa && fa.dataType() == DataTypes.TEXT) {
+            return TypeResolution.TYPE_RESOLVED;
+        }


See comment above.

costin · 2023-08-24T16:50:08Z

...sql/src/main/java/org/elasticsearch/xpack/esql/evaluator/predicate/operator/regex/RLike.java

Remind me again why these two classes are here? They don't seem to add anything over the QL ones...

protected TypeResolution resolveType() { return isString(field(), sourceText(), DEFAULT); }

vs

protected TypeResolution resolveType() { return isStringAndExact(field(), sourceText(), DEFAULT); }

When pushing the filter down, we might want to keep it as is, since it can be used as a prefix query, for example.
It's worth raising an issue for it for when we have more time to investigate (or to get some advice from the search team).

costin · 2023-08-24T16:52:29Z

...gin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/aggregate/Count.java

+
+    @Override
+    protected TypeResolution resolveType() {
+        return field().dataType() == DataTypes.TEXT ? TypeResolution.TYPE_RESOLVED : super.resolveType();


This could be another resolution utility method, such as an isString() that for ESQL includes TEXT, or isStringFamily() so as not to confuse it with the method from QL.
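One possible shape for such a helper, assuming hypothetical `DataType` and `TypeResolution` stand-ins; `isStringFamily` follows the name suggested above, while `resolveStringFamily` and its message format are invented for illustration:

```java
// Hypothetical stand-ins; not the real DataTypes / TypeResolution classes.
enum DataType { KEYWORD, TEXT, INTEGER }

record TypeResolution(boolean resolved, String message) {
    static final TypeResolution TYPE_RESOLVED = new TypeResolution(true, null);
}

final class EsqlTypeResolutionsSketch {
    private EsqlTypeResolutionsSketch() {}

    /** ESQL's "string family": KEYWORD or TEXT, named differently from QL's isString to avoid confusion. */
    static boolean isStringFamily(DataType type) {
        return type == DataType.KEYWORD || type == DataType.TEXT;
    }

    /** One shared check for the comparison operators instead of repeating the TEXT special case in each. */
    static TypeResolution resolveStringFamily(DataType type, String sourceText) {
        return isStringFamily(type)
            ? TypeResolution.TYPE_RESOLVED
            : new TypeResolution(false, "argument of [" + sourceText + "] must be [keyword or text], found [" + type + "]");
    }
}
```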

costin · 2023-08-24T16:54:58Z

...k/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizer.java

+
+        protected Equals createEquals(Expression k, Set<Expression> v, ZoneId finalZoneId) {
+            return new Equals(k.source(), k, v.iterator().next(), finalZoneId);
+        }


costin · 2023-08-24T16:57:59Z

...k/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizer.java

@@ -88,6 +92,7 @@ protected static List<Batch<LogicalPlan>> rules() {

        var operators = new Batch<>(
            "Operator Optimization",
+            new ReplaceAttributesWithExact(),


The rewrite/replacement only needs to occur once, so move it into the "Substitutions" batch above.

The name should reflect that not all attributes are changed: ReplaceFieldAttributesWithExactSubfield. Quite long, but clear about the intent.

costin · 2023-08-24T17:03:57Z

...k/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizer.java

+        }
+    }
+
+    private static class ReplaceAttributesWithExact extends OptimizerRules.OptimizerRule<LogicalPlan> {


Only the fields that occur inside certain operations need to be replaced; doing this across all fields is going to have side effects. Fields inside sort and stats need to be changed, and potentially those used inside filters (though that depends on whether full-text search is used on them or not).
If the list is too broad, the rule could look into the fields that need to be preserved as are and then change everything else to be exact (though again that might backfire).

👍 I'm changing this to limit the scope to OrderBy, Aggregate and Filter plans. I'm including Filter since we have no full text search for now, and it would be a shame not pushing down filters when possible. Likely, we'll have to refine this rule in the future.
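A rough sketch of the narrowed scope, on a hypothetical plan model: attributes keep their name, only the physical field they load from switches to the exact subfield, and only inside Filter, OrderBy and Aggregate. A Filter's condition is reduced to a single field here purely for brevity.

```java
import java.util.List;

// Hypothetical plan model: an attribute keeps its user-visible name,
// only the physical field it is loaded from may change.
record Field(String name, String loadedFrom, String exactSubfield) {
    Field toExact() {
        return exactSubfield == null ? this : new Field(name, exactSubfield, null);
    }
}

interface Plan {}

record EsRelation(List<Field> output) implements Plan {}

record Filter(Plan child, Field condition) implements Plan {}

record OrderBy(Plan child, List<Field> keys) implements Plan {}

record Aggregate(Plan child, List<Field> groupings) implements Plan {}

record Project(Plan child, List<Field> projections) implements Plan {}

final class ReplaceFieldsWithExactSketch {
    private ReplaceFieldsWithExactSketch() {}

    /** Switches to exact subfields only inside Filter, OrderBy and Aggregate; other nodes keep their fields. */
    static Plan apply(Plan plan) {
        if (plan instanceof Filter f) {
            return new Filter(apply(f.child()), f.condition().toExact());
        }
        if (plan instanceof OrderBy o) {
            return new OrderBy(apply(o.child()), exact(o.keys()));
        }
        if (plan instanceof Aggregate a) {
            return new Aggregate(apply(a.child()), exact(a.groupings()));
        }
        if (plan instanceof Project p) {
            return new Project(apply(p.child()), p.projections()); // output left untouched
        }
        return plan; // leaves such as EsRelation are not rewritten
    }

    private static List<Field> exact(List<Field> fields) {
        return fields.stream().map(Field::toExact).toList();
    }
}
```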

costin · 2023-08-24T17:05:03Z

...k/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizer.java

+        private NamedExpression toExactAlias(FieldAttribute e, List<? extends NamedExpression> projections) {
+
+            if (e.getExactInfo().hasExact() && e.exactAttribute() != e) {
+                FieldAttribute calculatedExact = toExact(e);
+
+                // avoid using multiple IDs just to load the same field twice
+                NamedExpression exact = projections.stream()
+                    .filter(x -> x.name().equals(calculatedExact.name()))
+                    .map(NamedExpression.class::cast)
+                    .findFirst()
+                    .orElse(calculatedExact);
+
+                return new Alias(e.source(), e.name(), exact);
+            }
+            return e;
+        }
+
+        private static FieldAttribute toExact(FieldAttribute fa) {
+            if (fa.getExactInfo().hasExact() && fa.exactAttribute() != fa) {
+                return fa.exactAttribute();
+            }
+            return fa;


No need to add new aliases and fields - simply replace the FieldAttribute in place, with the same underlying NameId but set the EsField to its exact counterpart. That is the attribute remains essentially the same, it's just the backing field that has changed.
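The in-place replacement suggested above could look roughly like this. `NameId`, `EsField` and `FieldAttribute` are simplified, hypothetical records (the real classes differ): the attribute keeps its name and `NameId`, so references elsewhere in the plan stay valid, and only the backing field changes.

```java
import java.util.Map;

// Hypothetical stand-ins: NameId identifies an attribute throughout the plan,
// EsField is the mapping entry the attribute is backed by.
record NameId(long id) {}

record EsField(String path, String type, Map<String, EsField> subfields) {
    /** For TEXT, the first KEYWORD subfield found; otherwise (or if none exists) this field itself. */
    EsField exact() {
        if (!"text".equals(type)) {
            return this;
        }
        return subfields.values().stream().filter(f -> "keyword".equals(f.type())).findFirst().orElse(this);
    }
}

record FieldAttribute(String name, NameId id, EsField field) {
    /** Same name, same NameId: only the backing field changes, so no Alias or extra attribute is needed. */
    FieldAttribute withExactField() {
        EsField exact = field.exact();
        return exact == field ? this : new FieldAttribute(name, id, exact);
    }
}
```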


luigidellaquila · 2023-08-25T11:22:49Z

@elasticmachine run elasticsearch-ci/bwc

costin

Grazie Luigi for patiently iterating on this PR.
Looks good to me - however I left a comment on why both the field and its raw subfield are being extracted; should just raw be used instead?

Thanks!

costin · 2023-08-29T05:28:10Z

...k/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/EsqlTypeResolutions.java

costin · 2023-08-29T05:30:16Z

...in/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/PhysicalPlanOptimizerTests.java

+            .mapToInt(
+                f -> (EstimatesRowSize.estimateSize(EsqlDataTypes.widenSmallNumericTypes(f.getDataType())) + f.getProperties()
+                    .values()
+                    .stream()
+                    .mapToInt(x -> EstimatesRowSize.estimateSize(EsqlDataTypes.widenSmallNumericTypes(x.getDataType())))
+                    .sum())
+            )


Please leave a comment on why the properties are considered as well.

costin · 2023-08-29T05:31:52Z

...in/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/PhysicalPlanOptimizerTests.java

        assertEquals(Set.of("emp_no"), Sets.newHashSet(names(extract.attributesToExtract())));

        var query = as(extract.child(), EsQueryExec.class);
        assertThat(query.estimatedRowSize(), equalTo(Integer.BYTES + allFieldRowSize));
    }

+    private Set<String> allFields(Map<String, EsField> mapping) {


What purpose does this method serve, and in particular, where is it being used?

It's used in two tests that check the output based on the mapping.
Before this change, it was just mapping.keySet(), but now the mapping also contains a TEXT field with a KEYWORD subfield, so we have to take it into consideration as well.
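A recursive version of such a helper might look like the following. The `EsField` record here is a hypothetical stand-in with just a type and its sub-properties; sub-field names are joined with a dot, so a TEXT `job` with a KEYWORD `raw` subfield yields both `job` and `job.raw`.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical mapping entry: a type plus its sub-properties (multi-fields or object fields).
record EsField(String type, Map<String, EsField> properties) {}

final class MappingFields {
    private MappingFields() {}

    /** Top-level field names plus dotted sub-field names, sorted for stable output. */
    static Set<String> allFields(Map<String, EsField> mapping) {
        Set<String> names = new TreeSet<>();
        collect("", mapping, names);
        return names;
    }

    private static void collect(String prefix, Map<String, EsField> properties, Set<String> out) {
        for (Map.Entry<String, EsField> entry : properties.entrySet()) {
            String name = prefix + entry.getKey();
            out.add(name);
            collect(name + ".", entry.getValue().properties(), out);
        }
    }
}
```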

costin · 2023-08-29T05:33:19Z

...in/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/PhysicalPlanOptimizerTests.java

@@ -423,7 +419,7 @@ public void testExtractorMultiEvalWithSameName() {
        var extract = as(project.child(), FieldExtractExec.class);
        assertThat(
            names(extract.attributesToExtract()),
-            contains("_meta_field", "emp_no", "first_name", "gender", "job.raw", "languages", "last_name", "salary")
+            contains("_meta_field", "emp_no", "first_name", "gender", "job", "job.raw", "languages", "last_name", "salary")


Why are both job and job.raw extracted? Wouldn't job.raw be enough, or does job leak somewhere?

In the initial version of this PR, I replaced all the occurrences in the plan with their exact subfield (including EsRelation), and as a result there was no need to load job anymore, but it made the planning a bit more convoluted, with the addition of some aliases to keep the original names (otherwise the output for from idx would result in job.raw instead of job).
There could be a smarter way to do it that avoids fetching TEXT fields; if so, we can investigate further and do it in a follow-up PR (at a low level there are other optimizations, so we should not load from _source anyway).

luigidellaquila added 4 commits August 14, 2023 18:34

Add support for TEXT operator for binary comparison and order

3079bad

Add support TEXT fields in LIKE and RLIKE

76d9c7b

Add tests

515dbe6

Merge branch 'feature/esql' into esql/text_operators

96acce2

luigidellaquila added >enhancement :Analytics/ES|QL AKA ESQL labels Aug 16, 2023

luigidellaquila requested a review from costin August 16, 2023 09:58

elasticsearchmachine added the Team:QL (Deprecated) Meta label for query languages team label Aug 16, 2023

luigidellaquila mentioned this pull request Aug 16, 2023

ESQL: Add support for TEXT fields in comparison operators and SORT #98317

Closed

Merge branch 'feature/esql' into esql/text_operators

158d743

bpintea approved these changes Aug 16, 2023


Implement review suggestions

c739886

luigidellaquila mentioned this pull request Aug 17, 2023

Add ESQL own flavor of arithmetic operators #98558

Closed

ChrisHegarty deleted the branch elastic:main August 17, 2023 13:05

ChrisHegarty closed this Aug 17, 2023

luigidellaquila reopened this Aug 17, 2023

luigidellaquila changed the base branch from feature/esql to main August 17, 2023 13:28

costin reviewed Aug 22, 2023


luigidellaquila added 4 commits August 23, 2023 14:18

Use exact fields

b80187c

and fix stats validation

Merge branch 'main' into esql/text_operators

3bd2bee

Merge branch 'main' into esql/text_operators

915c30d

Format QL code

ffd27c3

luigidellaquila added the v8.11.0 label Aug 24, 2023

Update docs/changelog/98528.yaml

27814b8

Update docs/changelog/98528.yaml

1496dbe

costin reviewed Aug 24, 2023


luigidellaquila added 2 commits August 25, 2023 12:43

Implement review suggestions

8462246

Merge remote-tracking branch 'luigidellaquila/esql/text_operators' into esql/text_operators

f589afa

Merge branch 'main' into esql/text_operators

7e0c36a

costin approved these changes Aug 29, 2023


Add comments

5bfd341

This was referenced Aug 29, 2023

ES|QL: investigate on regex filters pushdown #98997

Open

ES|QL: move type checks for binary operators from Verifier to the single operators #99035

Open

luigidellaquila merged commit ba87357 into elastic:main Aug 30, 2023
12 checks passed
