[ESQL] Binary Comparison Serialization #107921

not-napoleon · 2024-04-25T18:47:37Z

Prior to this PR, serializing a binary comparison in ES|QL depended on the enum BinaryComparisonProcessor.BinaryComparisonOperation from the QL binary comparison code. That put some distance between the ESQL classes and their serialization logic, and it limited our ability to adjust that logic, since any change would also affect SQL and EQL.

This PR introduces a new ESQL-specific enum for binary comparisons, which has a Writer and a Reader built in, and which implements the standard Writeable interface. This enum is constructed so as to be wire-compatible with the existing enum, so it does not require a transport version change (although any future change to it probably will).
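A minimal sketch of the pattern described above, for readers who have not seen it. It is not the code in this PR: `ComparisonOp`, its ids, and its symbols are hypothetical, and plain `java.io` data streams stand in for Elasticsearch's `StreamInput`/`StreamOutput` so the example runs on its own. The one thing it tries to show is an enum that writes an explicit id and reads it back by lookup, leaving a gap in the id space where an older enum had an extra constant.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for the ESQL-specific enum. The ids and symbols are illustrative.
enum ComparisonOp {
    EQ(0, "=="),
    // id 1 is intentionally unused: it is reserved for a constant that only the older enum has.
    NEQ(2, "!="),
    GT(3, ">"),
    GTE(4, ">="),
    LT(5, "<"),
    LTE(6, "<=");

    private final int id;
    private final String symbol;

    ComparisonOp(int id, String symbol) {
        this.id = id;
        this.symbol = symbol;
    }

    String symbol() {
        return symbol;
    }

    // Writer: the explicit id goes on the wire, never the ordinal.
    void writeTo(DataOutputStream out) throws IOException {
        out.writeByte(id);
    }

    // Reader: look the constant up by id and fail loudly on anything unknown.
    static ComparisonOp readFromStream(DataInputStream in) throws IOException {
        int id = in.readUnsignedByte();
        for (ComparisonOp op : values()) {
            if (op.id == id) {
                return op;
            }
        }
        throw new IOException("No ComparisonOp found for id [" + id + "]");
    }
}

class ComparisonOpDemo {
    // Serialize to a byte array and read it back.
    static ComparisonOp roundTrip(ComparisonOp op) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            op.writeTo(new DataOutputStream(bytes));
            return ComparisonOp.readFromStream(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reports whether a raw id maps to a constant of this enum.
    static boolean isKnownId(int id) {
        try {
            ComparisonOp.readFromStream(new DataInputStream(new ByteArrayInputStream(new byte[] { (byte) id })));
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (ComparisonOp op : ComparisonOp.values()) {
            System.out.println(op + " " + op.symbol() + " -> " + roundTrip(op));
        }
    }
}
```

Because the id is a field, reordering the constants or dropping one leaves every remaining id unchanged, which is what lets a new enum stay wire-compatible with an older one.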

A side effect of this change is removing Null Equals from ESQL serialization. We never actually implemented Null Equals, and the existing class is a stub. I infer that it was only created to allow use of the QL BinaryComparisonOperation enum, which specifies a Null Equals. I did not include it in the new ESQL-specific enum, and so I removed it from the places that reference that enum.

There is also a "shim" mapping from the new ESQL specific enum to the general QL enum. This is necessary for passing up to the parent BinaryOperation class. Changing the argument for that to use an interface like ArithmeticOperation does would require some non-trivial changes to how QL does serialization, which would dramatically increase the surface area of this PR. Medium term, I would like to change EsqlBinaryComparison to inherit directly from BinaryOperator, which will remove the need for that shim. Unfortunately, doing so proved non-trivial, and so I'm saving that for follow up work.
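A rough sketch of what such a shim can look like, under stated assumptions: `QlOperation`, `EsqlOperation`, `QlBinaryParent`, and `EsqlComparison` are hypothetical stand-ins, not the real QL or ESQL classes, and the constant names are illustrative. Each constant of the new enum carries a reference to its counterpart in the old enum, and the subclass hands that counterpart to a parent constructor that still expects the old type.

```java
// Stand-in for the shared QL enum, including a Null Equals constant that ESQL does not use.
enum QlOperation { EQ, NULLEQ, NEQ, GT, GTE, LT, LTE }

// Hypothetical ESQL-side enum: explicit id, symbol, and a "shim" pointing at the QL value.
enum EsqlOperation {
    EQ(0, "==", QlOperation.EQ),
    NEQ(2, "!=", QlOperation.NEQ),
    GT(3, ">", QlOperation.GT),
    GTE(4, ">=", QlOperation.GTE),
    LT(5, "<", QlOperation.LT),
    LTE(6, "<=", QlOperation.LTE);

    private final int id;
    private final String symbol;
    private final QlOperation shim;

    EsqlOperation(int id, String symbol, QlOperation shim) {
        this.id = id;
        this.symbol = symbol;
        this.shim = shim;
    }

    int id() {
        return id;
    }

    String symbol() {
        return symbol;
    }

    QlOperation shim() {
        return shim;
    }
}

// Stand-in for a QL parent class whose constructor still takes the QL enum.
abstract class QlBinaryParent {
    private final QlOperation operation;

    protected QlBinaryParent(QlOperation operation) {
        this.operation = operation;
    }

    QlOperation operation() {
        return operation;
    }
}

// The ESQL class keeps its own enum value and passes only the shim up to the parent.
class EsqlComparison extends QlBinaryParent {
    private final EsqlOperation esqlOperation;

    EsqlComparison(EsqlOperation esqlOperation) {
        super(esqlOperation.shim());
        this.esqlOperation = esqlOperation;
    }

    EsqlOperation esqlOperation() {
        return esqlOperation;
    }
}

class ShimDemo {
    public static void main(String[] args) {
        EsqlComparison comparison = new EsqlComparison(EsqlOperation.GT);
        System.out.println(comparison.esqlOperation() + " maps to QL " + comparison.operation());
    }
}
```

In this arrangement the old enum is only visible at the boundary with the parent class, so removing the shim later touches the constructor call and one enum field.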

Follow-up work:
- Remove remaining references to Null Equals, and the ESQL Null Equals class.
- Move PlanNamedTypes.writeBinComparison and PlanNamedTypes.readBinComparison into EsqlBinaryComparison, and make EsqlBinaryComparison Writeable. This will finish putting the serialization logic next to the object being serialized, for binary comparisons.
- Remove the "shim" by changing EsqlBinaryComparison to inherit directly from BinaryOperation.

elasticsearchmachine · 2024-04-25T18:48:01Z

Pinging @elastic/es-analytical-engine (Team:Analytics)

nik9000 · 2024-04-25T20:33:36Z

...g/elasticsearch/xpack/esql/evaluator/predicate/operator/comparison/EsqlBinaryComparison.java

+                }
+            }
+            throw new IOException("No BinaryComparisonOperation found for id [" + id + "]");
+        }


This is a more normal way to serialize enums and lets us implement wire compatibility more easily. It's annoying to type it out for every single enum, but through hard-won experience we found it to be better.

What is the advantage of having individual, almost identical readFromStream/writeTo methods for every enum, as opposed to the current, centralized (in StreamInput/StreamOutput) approach?

I see that the serialization tests make use of the transport version, which is new here (maybe this already exists in the "old" serialization code, not sure). But can't we do the same in ESQL (outside StreamOutput/StreamInput) and also have common enum serialization/deserialization code?

I can explain. We used to serialize enums the way that QL does - via the ordinals. It means fairly simple things like changing the order of the enums can break serialization. Or inserting a new enum. And adding wire backwards compatibility becomes a problem. You see here that we have to skip a number for wire compatibility. That's easy enough with the id member, but with ordinals it requires an extra unused member. We broke enough enum serialization without realizing it that we decided we should be pedantic about it and make a method like this on each enum.

So, partly it's to remove the concerns around sorting enums or inserting new entries. And partly it's to make wire backwards compatibility easier in the future. Well, in this case, now. And, partly, it's to make QL's enums look like the other well-behaved serializable enums in Elasticsearch.

FWIW, I tend to use a switch statement instead of the loop. But that's not a big deal either way. I'd have done it like this:

    @Override
    public void writeTo(StreamOutput out) throws IOException {
        out.writeByte(code);
    }

    public static Role readFrom(StreamInput in) throws IOException {
        return switch (in.readByte()) {
            case 0 -> DEFAULT;
            case 1 -> INDEX_ONLY;
            case 2 -> SEARCH_ONLY;
            default -> throw new IllegalStateException("unknown role");
        };
    }

But it doesn't matter.

FWIW there are about 200 calls to StreamOutput#writeEnum and StreamOutput#writeOptionalEnum, which do the ordinal thing. Spot checking enums in core yields a bunch that delegate to that and a bunch that do the switch or loop way. I think it's not as settled a thing as I make it out to be. But it really makes me feel good knowing I don't have to worry about changing the order of the constants breaking things over the wire.
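The ordinal concern from this thread can be reproduced in a few lines. This is a sketch with made-up enums (`ColorV1`, `ColorV2`, `StableV1`, `StableV2`), not anything from Elasticsearch: one pair models a sender and a receiver that disagree on constant order and serialize by ordinal, the other pair makes the same change but carries an explicit id.

```java
// Sender's version of a hypothetical enum, serialized by ordinal.
enum ColorV1 { RED, BLUE }

// Receiver's version: GREEN was inserted in the middle, so BLUE's ordinal moved from 1 to 2.
enum ColorV2 { RED, GREEN, BLUE }

// The same evolution with explicit ids: BLUE keeps id 1 no matter where it is declared.
enum StableV1 {
    RED(0), BLUE(1);

    private final int id;

    StableV1(int id) {
        this.id = id;
    }

    int id() {
        return id;
    }
}

enum StableV2 {
    RED(0), GREEN(2), BLUE(1);

    private final int id;

    StableV2(int id) {
        this.id = id;
    }

    int id() {
        return id;
    }

    // Look up by id; unknown ids are an error rather than a silent mismatch.
    static StableV2 fromId(int id) {
        for (StableV2 value : values()) {
            if (value.id == id) {
                return value;
            }
        }
        throw new IllegalArgumentException("unknown id [" + id + "]");
    }
}

class OrdinalDemo {
    // What ordinal-based deserialization amounts to on the receiving side.
    static ColorV2 decodeByOrdinal(int ordinal) {
        return ColorV2.values()[ordinal];
    }

    public static void main(String[] args) {
        int ordinalOnWire = ColorV1.BLUE.ordinal();
        System.out.println("ordinal: sent BLUE, received " + decodeByOrdinal(ordinalOnWire));

        int idOnWire = StableV1.BLUE.id();
        System.out.println("explicit id: sent BLUE, received " + StableV2.fromId(idOnWire));
    }
}
```

With ordinals the receiver decodes a different constant and nothing fails; with ids the value survives the reordering.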

Thanks @nik9000

I opened another issue to talk about removing the enum serialization this way. There's some discussion around doing some other third thing instead. I dunno. I was more sure about this a few hours ago. I mean, I'm sure it's an improvement over what we have, but I'm not sure there aren't other good ways too. OTOH, better is better.

nik9000 · 2024-04-25T20:35:43Z

...ck/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/io/stream/PlanNamedTypesTests.java

@@ -334,15 +331,15 @@ public void testBinComparisonSimple() throws IOException {
        var orig = new Equals(Source.EMPTY, field("foo", DataTypes.DOUBLE), field("bar", DataTypes.DOUBLE));
        BytesStreamOutput bso = new BytesStreamOutput();
        PlanStreamOutput out = new PlanStreamOutput(bso, planNameRegistry);
-        out.writeNamed(BinaryComparison.class, orig);
-        var deser = (Equals) planStreamInput(bso).readNamed(BinaryComparison.class);
+        out.writeNamed(EsqlBinaryComparison.class, orig);


Does this cause us trouble with the name? I know PlanNamedTypes uses the class name sometimes.

I think the PlanNameRegistry only uses the plain class name, so as long as Equals extends BinaryComparison is replaced by Equals extends EsqlBinaryComparison, with the same name Equals, there's a chance this works without issue.

Changing the "category class" EsqlBinaryComparison under which we register Equals could in theory break things, but in practice when we read/write expressions, the PlanNamedTypes only cares that we have a subclass of Expression (which is unchanged).
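To illustrate the reasoning in this reply, here is a toy registry that keys readers by simple class name only. It is a guess at the general shape, not the real PlanNameRegistry: `ToyRegistry`, `OldCategory`, `NewCategory`, and the nested `Equals` classes are all hypothetical. The point is only that a name-keyed lookup does not notice when a class with the same simple name moves under a different parent.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

class NameRegistryDemo {
    // Two hypothetical category classes: the old QL-style parent and the new ESQL-style parent.
    static class OldCategory {}
    static class NewCategory {}

    // The same leaf name, Equals, declared under each parent.
    static class Before { static class Equals extends OldCategory {} }
    static class After { static class Equals extends NewCategory {} }

    // Toy registry keyed by simple class name only; not the real PlanNameRegistry.
    static final class ToyRegistry {
        private final Map<String, Supplier<Object>> readers = new HashMap<>();

        void register(Class<?> type, Supplier<Object> reader) {
            readers.put(type.getSimpleName(), reader);
        }

        Object read(String wireName) {
            Supplier<Object> reader = readers.get(wireName);
            if (reader == null) {
                throw new IllegalArgumentException("unknown name [" + wireName + "]");
            }
            return reader.get();
        }
    }

    // A name written under the old parent is read by a registry built from the new parent.
    static Object readAcrossHierarchies() {
        String wireName = Before.Equals.class.getSimpleName();
        ToyRegistry registry = new ToyRegistry();
        registry.register(After.Equals.class, After.Equals::new);
        return registry.read(wireName);
    }

    public static void main(String[] args) {
        System.out.println(readAcrossHierarchies().getClass().getName());
    }
}
```

If the real registry also keyed on the category class, the lookup would depend on which category the reader asks for, which is the "in theory" risk mentioned above.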

alex-spies

LGTM, I'd wait for a review by Costin/Andrei though since they're more familiar with the QL project.

alex-spies · 2024-04-26T08:25:41Z

...g/elasticsearch/xpack/esql/evaluator/predicate/operator/comparison/EsqlBinaryComparison.java

+        BinaryComparisonOperation(
+            int id,
+            String symbol,
+            BinaryComparisonProcessor.BinaryComparisonOperation shim,


super-nit: maybe a short comment on why this is needed would help.

I really hoped it wouldn't live long enough to need such a comment, but you're probably right. I'll add one.

alex-spies · 2024-04-26T09:11:14Z

...ck/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/io/stream/PlanNamedTypesTests.java

@@ -334,15 +331,15 @@ public void testBinComparisonSimple() throws IOException {
        var orig = new Equals(Source.EMPTY, field("foo", DataTypes.DOUBLE), field("bar", DataTypes.DOUBLE));
        BytesStreamOutput bso = new BytesStreamOutput();
        PlanStreamOutput out = new PlanStreamOutput(bso, planNameRegistry);
-        out.writeNamed(BinaryComparison.class, orig);
-        var deser = (Equals) planStreamInput(bso).readNamed(BinaryComparison.class);
+        out.writeNamed(EsqlBinaryComparison.class, orig);


I think the PlanNameRegistry only uses the plain class name, so as long as Equals extends BinaryComparison is replaced by Equals extends EsqlBinaryComparison, with the same name Equals, there's a chance this works without issue.

Changing the "category class" EsqlBinaryComparison under which we register Equals could in theory break things, but in practice when we read/write expressions, the PlanNamedTypes only cares that we have a subclass of Expression (which is unchanged).

astefan

I understand the decoupling from QL, and I agree with creating our own EsqlBinaryComparison, but I fail to see the benefit of duplicating serialization logic for enums.

astefan · 2024-04-26T09:05:25Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/io/stream/PlanNamedTypes.java

        var left = in.readExpression();
        var right = in.readExpression();
+        // TODO: Remove zoneId entirely


Why this TODO?

I mean, isn't there the possibility of having date > '2024-01-01T01:00:00' in a specific timezone (see Kibana) sometime in the future?

The TODO is because it's a field we don't currently use or test; none of the ES|QL implementations actually use it, so it looks like it does something while doing nothing. I find that confusing, since I don't expect to carry around unused fields. In this particular case, you can see I am not even passing this value on to the constructor.

Beyond that, I don't see the reason that binary comparisons, and only binary comparisons, need to track the time zone they are operating in. If a date is time zone aware, that information should be encoded in the date (2024-04-25T09:15:00+06:00 or similar). Carrying the timezone as part of the operator makes that very confusing. What does it mean to say date > 2024-04-25T09:15:00+06:00 in CET? And if that does have a meaning, why do we not also need it for (e.g.) subtraction of dates?

Ultimately, ES|QL doesn't have timezone support yet, and we don't know what that will look like. Maybe when we build it, we will decide we need to add a timezone back in here. But until we have a plan for that, I don't see why we should carry around a confusing, unused value.
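The point about offsets living in the date value can be checked with plain `java.time`. This sketch says nothing about how ES|QL handles dates; `OffsetComparisonDemo` and its method names are made up for illustration. It shows that once a timestamp carries its own offset, comparing it is a comparison of instants, and the zone the comparison is "performed in" has no effect on the result.

```java
import java.time.OffsetDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

class OffsetComparisonDemo {
    // The same moment written with two different offsets compares as equal.
    static boolean sameInstant() {
        OffsetDateTime plusSix = OffsetDateTime.parse("2024-04-25T09:15:00+06:00");
        OffsetDateTime utc = OffsetDateTime.parse("2024-04-25T03:15:00Z");
        return plusSix.isEqual(utc);
    }

    // A timestamp one hour later stays "after" whichever zone it is rendered in.
    static boolean afterRegardlessOfZone(String zone) {
        OffsetDateTime threshold = OffsetDateTime.parse("2024-04-25T09:15:00+06:00");
        ZonedDateTime candidate = threshold.plusHours(1).atZoneSameInstant(ZoneId.of(zone));
        return candidate.toInstant().isAfter(threshold.toInstant());
    }

    public static void main(String[] args) {
        System.out.println("same instant: " + sameInstant());
        System.out.println("after, viewed from Europe/Paris: " + afterRegardlessOfZone("Europe/Paris"));
        System.out.println("after, viewed from Asia/Tokyo: " + afterRegardlessOfZone("Asia/Tokyo"));
    }
}
```

A zone would only matter for a literal that has no offset of its own, which is the case the earlier question about Kibana is pointing at.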

astefan · 2024-04-26T09:49:23Z

...g/elasticsearch/xpack/esql/evaluator/predicate/operator/comparison/EsqlBinaryComparison.java

+                }
+            }
+            throw new IOException("No BinaryComparisonOperation found for id [" + id + "]");
+        }


What is the advantage of having individual, almost identical readFromStream/writeTo methods for every enum, as opposed to the current, centralized (in StreamInput/StreamOutput) approach?

I see that the serialization tests make use of the transport version, which is new here (maybe this already exists in the "old" serialization code, not sure). But can't we do the same in ESQL (outside StreamOutput/StreamInput) and also have common enum serialization/deserialization code?

astefan

LGTM

not-napoleon · 2024-04-26T17:47:55Z

@elasticmachine update branch

not-napoleon added 8 commits April 25, 2024 09:09

change the category class for binary comparisions in ESQL serialization

27f55c2

let the comparison operators only reference EsqlBinaryComparison

8f206d8

plumb in a new, wire compatible, BinaryComparisonOperation enum

b586400

restore missing import

1864f3e

restore missing import

e36ee0f

wire up the new enum for serialization

c41f404

spelling mistake

9fc9c2c

remove asserts that aren't going to pass yet

91df788

not-napoleon added >non-issue :Analytics/ES|QL AKA ESQL v8.15.0 labels Apr 25, 2024

not-napoleon requested review from costin and nik9000 April 25, 2024 18:47

elasticsearchmachine added the Team:Analytics label (Meta label for analytical engine team (ESQL/Aggs/Geo)) Apr 25, 2024

not-napoleon added 2 commits April 25, 2024 15:06

spotless apply

750d62c

fix test class name

62609f0

nik9000 reviewed Apr 25, 2024

alex-spies reviewed Apr 26, 2024

astefan reviewed Apr 26, 2024

astefan self-requested a review April 26, 2024 17:45

astefan approved these changes Apr 26, 2024

elasticmachine and others added 3 commits April 26, 2024 18:47

Merge branch 'main' into esql-binary-comparison-serialization

c4045a3

add requested coment

57d9759

Merge remote-tracking branch 'refs/remotes/not-napoleon/esql-binary-c…

356ae45

…omparison-serialization' into esql-binary-comparison-serialization

not-napoleon merged commit 4664ced into elastic:main Apr 26, 2024
14 checks passed
