Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ESQL: Load values a different way #101235

Merged
merged 18 commits into from Oct 25, 2023
Merged

Conversation

nik9000
Copy link
Member

@nik9000 nik9000 commented Oct 23, 2023

This changes how we load values in ESQL, delegating to the MappedFieldType like we do with doc values and synthetic source. This allows a much more OO way of getting the loads working which makes that path much easier to read. And! It means those code paths look like doc values. So there's symmetry. It's like it rhymes.

There are a few side effects here:

  1. It's fairly simple to load from ordinals efficiently. I wrote some block-at-a-time code for resolving ordinals and it's about twice as fast. With more work it should be possible to make custom ordinal-shaped blocks move through the system to save space and speed things up.
  2. Most fields can now be loaded from _source. Everything that can be loaded from _source in scripts will load from _source in ESQL.
  3. We get a lot more tests for loading fields in different configurations by piggybacking on the synthetic source testing framework.
  4. Loading from _source no longer sorts the fields. Same for stored fields. Now we keep them in whatever they were stored in. This is a pretty marginal time save because loading from _source is so much more time consuming than the sort. But it's something.

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-ql (Team:QL)

@elasticsearchmachine elasticsearchmachine added the Team:QL (Deprecated) Meta label for query languages team label Oct 23, 2023
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/elasticsearch-esql (:Query Languages/ES|QL)

@elasticsearchmachine
Copy link
Collaborator

Hi @nik9000, I've created a changelog YAML for you.

@nik9000
Copy link
Member Author

nik9000 commented Oct 23, 2023

Here's the speedup from decoding ordinals in a more sensible way:

Before ValuesSourceReaderBenchmark.benchmark  in_order  keyword  avgt    7  369.033 ± 12.640  ns/op
 After ValuesSourceReaderBenchmark.benchmark  in_order  keyword  avgt    7  142.628 ±  2.022  ns/op

One thing that's crying out to me - we really want to load from stored fields like _source in a different way. Right now we load just like doc values, but those are fundamentally row-at-a-time mechanisms. And we can do that. Just not right now.

- match: { values.0.14: 20 }
- match: { values.0.15: null }
- match: { values.0.15: 20 }
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are mostly supported now. I suppose they should move out.

@nik9000 nik9000 changed the title Load different way ESQL: Load values a different way Oct 23, 2023
@nik9000
Copy link
Member Author

nik9000 commented Oct 23, 2023

image

@dnhatn
Copy link
Member

dnhatn commented Oct 23, 2023

Before ValuesSourceReaderBenchmark.benchmark in_order keyword avgt 7 369.033 ± 12.640 ns/op
After ValuesSourceReaderBenchmark.benchmark in_order keyword avgt 7 142.628 ± 2.022 ns/op

Wow, this is awesome.

@nik9000
Copy link
Member Author

nik9000 commented Oct 24, 2023

Wow, this is awesome.

Just use the ordinals properly and everything is faster!

Seriously, though, if we had a BytesRefBlock that allowed an extra layer of indirection we could resolve the ordinals one time and let them flow through the system. That'd be pretty sweet!

Copy link
Member

@dnhatn dnhatn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beautiful! I think we can leverage the sequential stored-fields reader after this change too. Thank you, Nik!

* {@link #beginPositionEntry} followed by two or more {@code append<Type>}
* calls, and then {@link #endPositionEntry}.
*/
interface Builder {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should make this Builder Releasable and release it if we hit a breaker when reading values. Let's do this in a follow-up.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++. I think we have to do it to properly track memory on blocks built by field loading. We just aren't doing that yet.

@nik9000
Copy link
Member Author

nik9000 commented Oct 25, 2023

I think we can leverage the sequential stored-fields reader after this change too.

I'd love to rework stored fields a bit before trying that. I think there should be a "load stored fields" operator. And we can do that by expanding BlockLoader to add stuff like storedFieldsInfo or something. We could, like, merge all the stored field loading. and if we did that we could super use the sequential reader. Or not. I mean, we have a list of doc ids we're loading then and we could use it!

@nik9000
Copy link
Member Author

nik9000 commented Oct 25, 2023

I talked to @dnhatn and @ChrisHegarty about the funny interface I made in code that only has a production impl in compute. It's a "cute" trick to prevent having to drag all of the Block infrastructure into server. It's a lot to drag around and I think it'd be a big change to move it. So I made the interface. I've grown to like it a fair bit because there's lots of complexity in compute that I don't have to share with everyone. I really like that hacking on Block doesn't mean I have to recompile the world. I get that it's weird, but I've grown to like it.

The interface we expose is fairly small. Not actually small. But, like, small-ish. We may one day pull Block into server and remove the interface. But for now we're going to avoid it.

@nik9000 nik9000 merged commit 4ca793e into elastic:main Oct 25, 2023
13 checks passed
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Oct 26, 2023
This removes a no longer used java file from ESQL. We stopped using it
in elastic#101235.
nik9000 added a commit that referenced this pull request Oct 26, 2023
This removes a no longer used java file from ESQL. We stopped using it
in #101235.
mark-vieira pushed a commit to mark-vieira/elasticsearch that referenced this pull request Nov 2, 2023
This removes a no longer used java file from ESQL. We stopped using it
in elastic#101235.
@nik9000
Copy link
Member Author

nik9000 commented Dec 26, 2023

This also significantly lowered the per-field overhead of loading fields. Especially empty fields. It turns out that sometimes this is a huge deal.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL >enhancement Team:QL (Deprecated) Meta label for query languages team v8.12.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants