ESQL: Load values a different way #101235

nik9000 · 2023-10-23T20:05:12Z

This changes how we load values in ESQL, delegating to the MappedFieldType like we do with doc values and synthetic source. This allows a much more OO way of getting the loads working which makes that path much easier to read. And! It means those code paths look like doc values. So there's symmetry. It's like it rhymes.

There are a few side effects here:

It's fairly simple to load from ordinals efficiently. I wrote some block-at-a-time code for resolving ordinals and it's about twice as fast. With more work it should be possible to make custom ordinal-shaped blocks move through the system to save space and speed things up.
Most fields can now be loaded from _source. Everything that can be loaded from _source in scripts will load from _source in ESQL.
We get a lot more tests for loading fields in different configurations by piggybacking on the synthetic source testing framework.
Loading from _source no longer sorts the fields. Same for stored fields. Now we keep them in whatever order they were stored in. This is a pretty marginal time save because loading from _source is so much more time consuming than the sort. But it's something.
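
The block-at-a-time ordinal resolution mentioned in the first item can be illustrated with a small sketch. This is not the code from this PR: `Dictionary`, `resolvePerRow`, and `resolvePerBlock` are hypothetical names, and a plain `String[]` stands in for the real terms dictionary. The assumption is only that a block tends to repeat ordinals, so looking up each distinct ordinal once and fanning the result out saves dictionary lookups.

```java
import java.util.Arrays;

final class OrdinalResolutionSketch {
    /** Hypothetical stand-in for a terms dictionary; counts lookups for comparison. */
    static final class Dictionary {
        private final String[] terms;
        int lookups = 0;

        Dictionary(String... terms) {
            this.terms = terms;
        }

        String lookup(int ord) {
            lookups++;
            return terms[ord];
        }
    }

    /** Row-at-a-time: one dictionary lookup per position. */
    static String[] resolvePerRow(Dictionary dict, int[] ords) {
        String[] out = new String[ords.length];
        for (int i = 0; i < ords.length; i++) {
            out[i] = dict.lookup(ords[i]);
        }
        return out;
    }

    /** Block-at-a-time: look up each distinct ordinal once, then fan the values out. */
    static String[] resolvePerBlock(Dictionary dict, int[] ords) {
        int[] sorted = ords.clone();
        Arrays.sort(sorted);
        // Deduplicate the sorted ordinals in place.
        int uniqueCount = 0;
        for (int i = 0; i < sorted.length; i++) {
            if (i == 0 || sorted[i] != sorted[i - 1]) {
                sorted[uniqueCount++] = sorted[i];
            }
        }
        // Resolve each distinct ordinal exactly once.
        String[] resolved = new String[uniqueCount];
        for (int i = 0; i < uniqueCount; i++) {
            resolved[i] = dict.lookup(sorted[i]);
        }
        // Map every position back to its already-resolved value.
        String[] out = new String[ords.length];
        for (int i = 0; i < ords.length; i++) {
            out[i] = resolved[Arrays.binarySearch(sorted, 0, uniqueCount, ords[i])];
        }
        return out;
    }

    public static void main(String[] args) {
        int[] ords = {2, 0, 2, 1, 2, 0};
        Dictionary rowDict = new Dictionary("a", "b", "c");
        Dictionary blockDict = new Dictionary("a", "b", "c");
        resolvePerRow(rowDict, ords);
        resolvePerBlock(blockDict, ords);
        System.out.println("per-row lookups: " + rowDict.lookups);
        System.out.println("per-block lookups: " + blockDict.lookups);
    }
}
```

The sketch trades a sort and a binary search per position for fewer dictionary lookups, which only pays off when lookups are the expensive part. Whether that matches the actual cost profile in Elasticsearch is not something this sketch demonstrates.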

elasticsearchmachine · 2023-10-23T20:05:35Z

Pinging @elastic/es-ql (Team:QL)

elasticsearchmachine · 2023-10-23T20:05:36Z

Pinging @elastic/elasticsearch-esql (:Query Languages/ES|QL)

elasticsearchmachine · 2023-10-23T20:05:36Z

Hi @nik9000, I've created a changelog YAML for you.

nik9000 · 2023-10-23T20:07:37Z

Here's the speedup from decoding ordinals in a more sensible way:

Before ValuesSourceReaderBenchmark.benchmark  in_order  keyword  avgt    7  369.033 ± 12.640  ns/op
 After ValuesSourceReaderBenchmark.benchmark  in_order  keyword  avgt    7  142.628 ±  2.022  ns/op

One thing that's crying out to me - we really want to load from stored fields like _source in a different way. Right now we load just like doc values, but those are fundamentally row-at-a-time mechanisms. And we can do that. Just not right now.

nik9000 · 2023-10-23T20:08:03Z

.../esql/qa/server/single-node/src/yamlRestTest/resources/rest-api-spec/test/90_non_indexed.yml

  - match: { values.0.14: 20 }
-  - match: { values.0.15: null }
+  - match: { values.0.15: 20 }


These are mostly supported now. I suppose they should move out.

nik9000 · 2023-10-23T20:24:19Z

dnhatn · 2023-10-23T20:36:54Z

Before ValuesSourceReaderBenchmark.benchmark in_order keyword avgt 7 369.033 ± 12.640 ns/op
After ValuesSourceReaderBenchmark.benchmark in_order keyword avgt 7 142.628 ± 2.022 ns/op

Wow, this is awesome.

nik9000 · 2023-10-24T13:52:44Z

Wow, this is awesome.

Just use the ordinals properly and everything is faster!

Seriously, though, if we had a BytesRefBlock that allowed an extra layer of indirection we could resolve the ordinals one time and let them flow through the system. That'd be pretty sweet!
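
A rough sketch of what "an extra layer of indirection" could look like, under heavy assumptions: `OrdinalBytesBlock` is a hypothetical class, not the real `BytesRefBlock`, and it uses `byte[]` where the real code would use `BytesRef`. Each position stores an ordinal into a dictionary that was resolved once, so downstream steps such as filtering copy only ints and share the dictionary.

```java
import java.nio.charset.StandardCharsets;

final class OrdinalBlockSketch {
    /** Hypothetical block: one ordinal per position plus a shared, already-resolved dictionary. */
    static final class OrdinalBytesBlock {
        private final byte[][] dictionary; // resolved once, shared between derived blocks
        private final int[] ordinals;      // one entry per position

        OrdinalBytesBlock(byte[][] dictionary, int[] ordinals) {
            this.dictionary = dictionary;
            this.ordinals = ordinals;
        }

        int positionCount() {
            return ordinals.length;
        }

        /** Follows the indirection: position -> ordinal -> bytes. */
        String getString(int position) {
            return new String(dictionary[ordinals[position]], StandardCharsets.UTF_8);
        }

        /** Keeps only the given positions; copies ordinals, never the term bytes. */
        OrdinalBytesBlock filter(int... positions) {
            int[] kept = new int[positions.length];
            for (int i = 0; i < positions.length; i++) {
                kept[i] = ordinals[positions[i]];
            }
            return new OrdinalBytesBlock(dictionary, kept);
        }

        boolean sharesDictionaryWith(OrdinalBytesBlock other) {
            return dictionary == other.dictionary;
        }
    }

    /** Helper that encodes terms as UTF-8 bytes. */
    static byte[][] dictionaryOf(String... terms) {
        byte[][] dictionary = new byte[terms.length][];
        for (int i = 0; i < terms.length; i++) {
            dictionary[i] = terms[i].getBytes(StandardCharsets.UTF_8);
        }
        return dictionary;
    }

    public static void main(String[] args) {
        OrdinalBytesBlock block = new OrdinalBytesBlock(dictionaryOf("error", "info"), new int[] {1, 0, 1, 1});
        OrdinalBytesBlock filtered = block.filter(0, 3);
        for (int p = 0; p < filtered.positionCount(); p++) {
            System.out.println(filtered.getString(p));
        }
    }
}
```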

dnhatn

Beautiful! I think we can leverage the sequential stored-fields reader after this change too. Thank you, Nik!

x-pack/plugin/esql/compute/src/main/java/org/elasticsearch/compute/package-info.java

.../esql/compute/src/main/java/org/elasticsearch/compute/operator/exchange/ExchangeService.java

...l/compute/src/main/java/org/elasticsearch/compute/operator/exchange/ExchangeSinkHandler.java

...compute/src/main/java/org/elasticsearch/compute/operator/exchange/ExchangeSourceHandler.java

x-pack/plugin/esql/compute/src/main/java/org/elasticsearch/compute/data/X-Block.java.st

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plugin/ComputeService.java

.../plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plugin/TransportEsqlQueryAction.java

dnhatn · 2023-10-25T00:04:11Z

server/src/main/java/org/elasticsearch/index/mapper/BlockLoader.java

+     * {@link #beginPositionEntry} followed by two or more {@code append<Type>}
+     * calls, and then {@link #endPositionEntry}.
+     */
+    interface Builder {


I think we should make this Builder Releasable and release it if we hit a breaker when reading values. Let's do this in a follow-up.

++. I think we have to do it to properly track memory on blocks built by field loading. We just aren't doing that yet.
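
The follow-up idea can be sketched as below. Everything here is an assumption for illustration: `Breaker` and `LongBuilder` are hypothetical, and `AutoCloseable` stands in for Elasticsearch's `Releasable`. The point is only that a builder which reserves memory as it appends must give it back when a read is abandoned partway, for example because the breaker tripped.

```java
import java.util.Arrays;

final class ReleasableBuilderSketch {
    /** Hypothetical memory breaker: refuses reservations beyond a fixed limit. */
    static final class Breaker {
        private final long limitBytes;
        private long usedBytes;

        Breaker(long limitBytes) {
            this.limitBytes = limitBytes;
        }

        void reserve(long bytes) {
            if (usedBytes + bytes > limitBytes) {
                throw new IllegalStateException("breaker tripped");
            }
            usedBytes += bytes;
        }

        void release(long bytes) {
            usedBytes -= bytes;
        }

        long usedBytes() {
            return usedBytes;
        }
    }

    /** Hypothetical builder that tracks what it reserved and returns it on close. */
    static final class LongBuilder implements AutoCloseable {
        private final Breaker breaker;
        private long[] values = new long[4];
        private int count;
        private long reservedBytes;

        LongBuilder(Breaker breaker) {
            this.breaker = breaker;
        }

        LongBuilder appendLong(long value) {
            breaker.reserve(Long.BYTES); // may throw before anything is stored
            reservedBytes += Long.BYTES;
            if (count == values.length) {
                values = Arrays.copyOf(values, count * 2);
            }
            values[count++] = value;
            return this;
        }

        /** Hands the values to the caller, who now owns the reserved bytes. */
        long[] build() {
            reservedBytes = 0;
            return Arrays.copyOf(values, count);
        }

        /** Releases whatever was reserved but never handed over by build(). */
        @Override
        public void close() {
            breaker.release(reservedBytes);
            reservedBytes = 0;
        }
    }

    /** Returns false if the breaker tripped; the builder is closed either way. */
    static boolean tryLoad(Breaker breaker, long[] docValues) {
        try (LongBuilder builder = new LongBuilder(breaker)) {
            for (long v : docValues) {
                builder.appendLong(v);
            }
            builder.build();
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Breaker breaker = new Breaker(16);
        System.out.println("loaded: " + tryLoad(breaker, new long[] {1, 2, 3}));
        System.out.println("bytes still reserved: " + breaker.usedBytes());
    }
}
```

In this sketch a successful `build()` leaves the bytes accounted for on the breaker, on the assumption that the resulting block would release them later. That ownership hand-off is a design guess, not something the discussion specifies.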

nik9000 · 2023-10-25T00:14:55Z

I think we can leverage the sequential stored-fields reader after this change too.

I'd love to rework stored fields a bit before trying that. I think there should be a "load stored fields" operator. And we can do that by expanding BlockLoader to add stuff like storedFieldsInfo or something. We could, like, merge all the stored field loading. And if we did that we could super use the sequential reader. Or not. I mean, we have a list of doc ids we're loading then and we could use it!
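
To make the "merge all the stored field loading" idea concrete, here is a minimal sketch. It is not a proposed design: `StoredDocs`, `loadPerField`, and `loadMerged` are hypothetical, and a list of maps stands in for stored fields. The assumption is that reading a stored document is the expensive step, so reading each document once for all requested fields beats reading it once per field.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class MergedStoredFieldsSketch {
    /** Hypothetical stored-document reader that counts how often a document is read. */
    static final class StoredDocs {
        private final List<Map<String, Object>> docs;
        int reads = 0;

        StoredDocs(List<Map<String, Object>> docs) {
            this.docs = docs;
        }

        Map<String, Object> read(int docId) {
            reads++;
            return docs.get(docId);
        }
    }

    /** Per-field loading: every field reads every document again. */
    static Map<String, List<Object>> loadPerField(StoredDocs stored, int[] docIds, List<String> fields) {
        Map<String, List<Object>> columns = new LinkedHashMap<>();
        for (String field : fields) {
            List<Object> column = new ArrayList<>();
            for (int docId : docIds) {
                column.add(stored.read(docId).get(field));
            }
            columns.put(field, column);
        }
        return columns;
    }

    /** Merged loading: read each document once and fan its values out to all columns. */
    static Map<String, List<Object>> loadMerged(StoredDocs stored, int[] docIds, List<String> fields) {
        Map<String, List<Object>> columns = new LinkedHashMap<>();
        for (String field : fields) {
            columns.put(field, new ArrayList<>());
        }
        for (int docId : docIds) {
            Map<String, Object> doc = stored.read(docId);
            for (String field : fields) {
                columns.get(field).add(doc.get(field));
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> docs = List.of(
            Map.<String, Object>of("host", "a", "status", 200),
            Map.<String, Object>of("host", "b", "status", 404)
        );
        StoredDocs stored = new StoredDocs(docs);
        System.out.println(loadMerged(stored, new int[] {0, 1}, List.of("host", "status")));
        System.out.println("document reads: " + stored.reads);
    }
}
```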

nik9000 · 2023-10-25T00:21:41Z

I talked to @dnhatn and @ChrisHegarty about the funny interface I made in code that only has a production impl in compute. It's a "cute" trick to prevent having to drag all of the Block infrastructure into server. It's a lot to drag around and I think it'd be a big change to move it. So I made the interface. I've grown to like it a fair bit because there's lots of complexity in compute that I don't have to share with everyone. I really like that hacking on Block doesn't mean I have to recompile the world. I get that it's weird, but I've grown to like it.

The interface we expose is fairly small. Not actually small. But, like, small-ish. We may one day pull Block into server and remove the interface. But for now we're going to avoid it.
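
The shape of that trick, as I read it, is roughly the following. This is a single-file sketch with hypothetical names (`LongsBuilder`, `load`, `ListLongsBuilder`), not the real `BlockLoader` API: the lower-level module defines a small builder interface and writes through it, while the higher-level module supplies the only production implementation.

```java
import java.util.ArrayList;
import java.util.List;

final class ModuleBoundarySketch {
    // --- Would live in the lower-level module ("server" in the discussion). ---

    /** Small builder interface; loading code never sees a concrete block class. */
    interface LongsBuilder {
        LongsBuilder appendLong(long value);

        LongsBuilder appendNull();
    }

    /** Hypothetical loader that depends only on the interface above. */
    static void load(long[] values, boolean[] missing, LongsBuilder builder) {
        for (int i = 0; i < values.length; i++) {
            if (missing[i]) {
                builder.appendNull();
            } else {
                builder.appendLong(values[i]);
            }
        }
    }

    // --- Would live in the higher-level module ("compute" in the discussion). ---

    /** Stand-in implementation; a list plays the role of the real block machinery. */
    static final class ListLongsBuilder implements LongsBuilder {
        final List<Long> values = new ArrayList<>();

        @Override
        public LongsBuilder appendLong(long value) {
            values.add(value);
            return this;
        }

        @Override
        public LongsBuilder appendNull() {
            values.add(null);
            return this;
        }
    }

    public static void main(String[] args) {
        ListLongsBuilder builder = new ListLongsBuilder();
        load(new long[] {1, 2, 3}, new boolean[] {false, true, false}, builder);
        System.out.println(builder.values);
    }
}
```

With this split, changes to the implementation side do not force the interface side to recompile, which is the benefit described above. The cost is an interface with a single production implementation.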

nik9000 · 2023-12-26T14:35:49Z

This also significantly lowered the per-field overhead of loading fields. Especially empty fields. It turns out that sometimes this is a huge deal.

nik9000 added 10 commits October 11, 2023 16:45

ESQL: Load directly from MappedFieldType

b9d7ff4

wip

40890e1

WIP

2fb64ff

Update bechmark

22af819

Tests

ae0a564

Better signature

89c3822

More signature

e560fb4

Version too

5e5048e

Merge branch 'main' into load_different_way

83a1aec

More counter checks

951423f

nik9000 added >enhancement :Analytics/ES|QL AKA ESQL v8.12.0 labels Oct 23, 2023

elasticsearchmachine added the Team:QL (Deprecated) Meta label for query languages team label Oct 23, 2023

Update docs/changelog/101235.yaml

8419e11

nik9000 added 3 commits October 23, 2023 16:09

Strange

96edc7b

Explain

7aefa23

Merge remote-tracking branch 'refs/remotes/nik9000/load_different_way…

cd47b31

…' into load_different_way

nik9000 requested review from dnhatn, ChrisHegarty and not-napoleon October 23, 2023 20:16

nik9000 changed the title ~~Load different way~~ ESQL: Load values a different way Oct 23, 2023

Looks like this happens sometimes

eb68358

dnhatn approved these changes Oct 25, 2023

nik9000 added 3 commits October 25, 2023 09:53

Merge branch 'main' into load_different_way

4676c48

Fix some renames

a42bf67

More cleanup

f5fc4a2

nik9000 merged commit 4ca793e into elastic:main Oct 25, 2023
13 checks passed

nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Oct 26, 2023

ESQL: Remove unused code

60f823e

This removes a no longer used java file from ESQL. We stopped using it in elastic#101235.

nik9000 mentioned this pull request Oct 26, 2023

ESQL: Remove unused code #101397

Merged

nik9000 added a commit that referenced this pull request Oct 26, 2023

ESQL: Remove unused code (#101397)

a706288

This removes a no longer used java file from ESQL. We stopped using it in #101235.

mark-vieira pushed a commit to mark-vieira/elasticsearch that referenced this pull request Nov 2, 2023

ESQL: Remove unused code (elastic#101397)

0246675

This removes a no longer used java file from ESQL. We stopped using it in elastic#101235.

craigtaverner mentioned this pull request Nov 10, 2023

ESQL: Performance degradation in dissect benchmark #101997

Closed

dnhatn mentioned this pull request Nov 28, 2023

Synthetic source index out of bounds exception #102679

Closed

luigidellaquila mentioned this pull request Nov 30, 2023

ESQL: fallback to source extraction for numeric fields that have fielddata disabled #99213

Closed

2 tasks

nik9000 mentioned this pull request Dec 26, 2023

ESQL: Check on really slow remotes #103704

Open
