New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ESQL: Speed up reading many nulls #105088
Conversation
Sometimes we produce output with a *ton* of `null`s and when we do a significant amount of time is spent on tracking the memory usage of each chunk of null colunms. Whole milliseconds in a request that takes dozens of milliseconds. We can avoid all of this by sharing all constant-null vectors produced for a single block. When running `FROM *` in ESQL on a data-less coordinating node this cuts reading reading pages from 15% of the time to 3.4% of the time, cutting the entire operation from 634ms to 599ms. Note that elastic#105067 is already open and should save about 540ms from the operation. It's unlikely that combining these two will yield a 59ms operation, but we live in hope. Relates to elastic#103369
Hi @nik9000, I've created a changelog YAML for you. |
Pinging @elastic/es-analytical-engine (Team:Analytics) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking of using the local breaker; however, it seems unsafe to use it here. So this approach makes sense to me. LGTM, thanks, Nik!
*/ | ||
Block readConstantNullBlock() throws IOException { | ||
int positions = readVInt(); | ||
if (lastConstantNullBlock == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (lastConstantNullBlock == null || lastConstantNullBlock.getPositionCount() != positions || lastConstantNullBlock.tryIncRef() == false) {
lastConstantNullBlock = blockFactory.newConstantNullBlock(positions);
}
return lastConstantNullBlock;
If we don't retain the reference in this stream, and use tryIncRef()
, then we could avoid managing the life cycle of this input stream. WDYT? However, I am okay with this approach too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spent 20 minutes trying to decide if it was ok to have a reference that may be invalid. I hadn't noticed tryIncRef
, but that does make sense. Let me think some more! I do feel like "this is a ref, let's count it". But it is a funny ref. And tryIncRef would do the right stuff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'll keep it the way I have it. We can flip it later if we feel the need.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am okay with that too.
Sometimes we produce output with a *ton* of `null`s and when we do a significant amount of time is spent on tracking the memory usage of each chunk of null colunms. Whole milliseconds in a request that takes dozens of milliseconds. We can avoid all of this by sharing all constant-null vectors produced for a single block. When running `FROM *` in ESQL on a data-less coordinating node this cuts reading reading pages from 15% of the time to 3.4% of the time, cutting the entire operation from 634ms to 599ms. Note that elastic#105067 is already open and should save about 540ms from the operation. It's unlikely that combining these two will yield a 59ms operation, but we live in hope. Relates to elastic#103369
Sometimes we produce output with a ton of
null
s and when we do a significant amount of time is spent on tracking the memory usage of each chunk of null colunms. Whole milliseconds in a request that takes dozens of milliseconds. We can avoid all of this by sharing all constant-null vectors produced for a single block.When running
FROM *
in ESQL on a data-less coordinating node this cuts reading reading pages from 15% of the time to 3.4% of the time, cutting the entire operation from 634ms to 599ms. Note that #105067 is already open and should save about 540ms from the operation. It's unlikely that combining these two will yield a 59ms operation, but we live in hope.Relates to #103369