Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ESQL: Speed up reading many nulls #105088

Merged
merged 6 commits into from Feb 9, 2024

Conversation

nik9000
Copy link
Member

@nik9000 nik9000 commented Feb 2, 2024

Sometimes we produce output with a ton of nulls and when we do a significant amount of time is spent on tracking the memory usage of each chunk of null colunms. Whole milliseconds in a request that takes dozens of milliseconds. We can avoid all of this by sharing all constant-null vectors produced for a single block.

When running FROM * in ESQL on a data-less coordinating node this cuts reading reading pages from 15% of the time to 3.4% of the time, cutting the entire operation from 634ms to 599ms. Note that #105067 is already open and should save about 540ms from the operation. It's unlikely that combining these two will yield a 59ms operation, but we live in hope.

Relates to #103369

Sometimes we produce output with a *ton* of `null`s and when we do a
significant amount of time is spent on tracking the memory usage of each
chunk of null colunms. Whole milliseconds in a request that takes dozens
of milliseconds. We can avoid all of this by sharing all constant-null
vectors produced for a single block.

When running `FROM *` in ESQL on a data-less coordinating node this cuts
reading reading pages from 15% of the time to 3.4% of the time, cutting
the entire operation from 634ms to 599ms. Note that elastic#105067 is
already open and should save about 540ms from the operation. It's
unlikely that combining these two will yield a 59ms operation, but we
live in hope.

Relates to elastic#103369
@elasticsearchmachine
Copy link
Collaborator

Hi @nik9000, I've created a changelog YAML for you.

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Feb 2, 2024
Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@dnhatn dnhatn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking of using the local breaker; however, it seems unsafe to use it here. So this approach makes sense to me. LGTM, thanks, Nik!

*/
Block readConstantNullBlock() throws IOException {
int positions = readVInt();
if (lastConstantNullBlock == null) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (lastConstantNullBlock == null || lastConstantNullBlock.getPositionCount() != positions || lastConstantNullBlock.tryIncRef() == false) {
    lastConstantNullBlock = blockFactory.newConstantNullBlock(positions);
}
return lastConstantNullBlock;

If we don't retain the reference in this stream, and use tryIncRef(), then we could avoid managing the life cycle of this input stream. WDYT? However, I am okay with this approach too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spent 20 minutes trying to decide if it was ok to have a reference that may be invalid. I hadn't noticed tryIncRef, but that does make sense. Let me think some more! I do feel like "this is a ref, let's count it". But it is a funny ref. And tryIncRef would do the right stuff.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'll keep it the way I have it. We can flip it later if we feel the need.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am okay with that too.

@nik9000 nik9000 merged commit 21b0caa into elastic:main Feb 9, 2024
14 checks passed
martijnvg pushed a commit to martijnvg/elasticsearch that referenced this pull request Feb 12, 2024
Sometimes we produce output with a *ton* of `null`s and when we do a
significant amount of time is spent on tracking the memory usage of each
chunk of null colunms. Whole milliseconds in a request that takes dozens
of milliseconds. We can avoid all of this by sharing all constant-null
vectors produced for a single block.

When running `FROM *` in ESQL on a data-less coordinating node this cuts
reading reading pages from 15% of the time to 3.4% of the time, cutting
the entire operation from 634ms to 599ms. Note that elastic#105067 is
already open and should save about 540ms from the operation. It's
unlikely that combining these two will yield a 59ms operation, but we
live in hope.

Relates to elastic#103369
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.13.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants