ESQL: Speed up reading many nulls #105088

nik9000 · 2024-02-02T20:54:55Z

Sometimes we produce output with a ton of nulls, and when we do, a significant amount of time is spent tracking the memory usage of each chunk of null columns. Whole milliseconds in a request that takes dozens of milliseconds. We can avoid all of this by sharing all constant-null vectors produced for a single block.

When running FROM * in ESQL on a data-less coordinating node this cuts reading pages from 15% of the time to 3.4% of the time, cutting the entire operation from 634ms to 599ms. Note that #105067 is already open and should save about 540ms from the operation. It's unlikely that combining these two will yield a 59ms operation, but we live in hope.

Relates to #103369
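
A minimal sketch of the idea, not the code in this PR: a reader keeps one counted reference to the most recent constant-null block and hands it out again while the position count matches, so the memory accounting runs once per distinct block rather than once per read. `CountingBreaker`, `NullBlock`, `Reader`, and the `BYTES` size are hypothetical stand-ins, not the real `BlockFactory`/`BlockStreamInput` classes, and the ref counting is single-threaded for brevity.

```java
public class SharedNullBlockSketch {
    /** Hypothetical stand-in for memory tracking: counts how often it is touched. */
    static final class CountingBreaker {
        int adjustments;
        long usedBytes;

        void adjust(long bytes) {
            adjustments++;
            usedBytes += bytes;
        }
    }

    /** Hypothetical ref-counted constant-null block. */
    static final class NullBlock {
        static final long BYTES = 32; // assumed size, for illustration only
        final int positions;
        private final CountingBreaker breaker;
        private int refs = 1; // the creator owns the first reference

        NullBlock(int positions, CountingBreaker breaker) {
            this.positions = positions;
            this.breaker = breaker;
            breaker.adjust(BYTES); // memory is tracked once, on creation
        }

        void incRef() {
            if (refs <= 0) throw new IllegalStateException("already released");
            refs++;
        }

        void decRef() {
            if (refs <= 0) throw new IllegalStateException("already released");
            if (--refs == 0) breaker.adjust(-BYTES); // released with the last reference
        }
    }

    /** Reader that holds its own counted reference to the last null block. */
    static final class Reader implements AutoCloseable {
        private final CountingBreaker breaker;
        private NullBlock last;

        Reader(CountingBreaker breaker) {
            this.breaker = breaker;
        }

        NullBlock readConstantNullBlock(int positions) {
            if (last == null || last.positions != positions) {
                if (last != null) last.decRef(); // drop the reader's reference to the old block
                last = new NullBlock(positions, breaker);
            }
            last.incRef(); // reference handed to the caller
            return last;
        }

        @Override
        public void close() {
            if (last != null) {
                last.decRef(); // the reader must be closed to give back its reference
                last = null;
            }
        }
    }

    public static void main(String[] args) {
        CountingBreaker breaker = new CountingBreaker();
        try (Reader reader = new Reader(breaker)) {
            for (int i = 0; i < 1000; i++) {
                reader.readConstantNullBlock(10).decRef(); // caller releases its reference
            }
            System.out.println("adjustments after 1000 reads: " + breaker.adjustments); // 1
        }
        System.out.println("bytes still tracked after close: " + breaker.usedBytes); // 0
    }
}
```

The cost of this shape is that the reader itself becomes something with a life cycle: forgetting to close it would leak the last block's tracked memory.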


elasticsearchmachine · 2024-02-02T20:55:19Z

Hi @nik9000, I've created a changelog YAML for you.

elasticsearchmachine · 2024-02-02T20:55:19Z

Pinging @elastic/es-analytical-engine (Team:Analytics)

costin

LGTM

dnhatn

I was thinking of using the local breaker; however, it seems unsafe to use it here. So this approach makes sense to me. LGTM, thanks, Nik!

dnhatn · 2024-02-03T05:48:32Z

x-pack/plugin/esql/compute/src/main/java/org/elasticsearch/compute/data/BlockStreamInput.java

+     */
+    Block readConstantNullBlock() throws IOException {
+        int positions = readVInt();
+        if (lastConstantNullBlock == null) {


if (lastConstantNullBlock == null
    || lastConstantNullBlock.getPositionCount() != positions
    || lastConstantNullBlock.tryIncRef() == false) {
    lastConstantNullBlock = blockFactory.newConstantNullBlock(positions);
}
return lastConstantNullBlock;

If we don't retain the reference in this stream, and use tryIncRef(), then we could avoid managing the life cycle of this input stream. WDYT? However, I am okay with this approach too.

nik9000 · Feb 3, 2024

Spent 20 minutes trying to decide if it was ok to have a reference that may be invalid. I hadn't noticed tryIncRef, but that does make sense. Let me think some more! I do feel like "this is a ref, let's count it". But it is a funny ref. And tryIncRef would do the right stuff.

nik9000 · Feb 6, 2024

I think I'll keep it the way I have it. We can flip it later if we feel the need.

dnhatn · Feb 6, 2024

I am okay with that too.
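
For comparison, the `tryIncRef()` alternative suggested in this thread might look roughly like the sketch below. It is an illustration under assumptions, not code from the PR: `NullBlock` and `Reader` are hypothetical, single-threaded stand-ins. The reader remembers the last block without counting its own reference, so the pointer may refer to a block that has already been released; `tryIncRef()` detects that case and the reader allocates a fresh block.

```java
public class TryIncRefSketch {
    /** Hypothetical ref-counted constant-null block. */
    static final class NullBlock {
        final int positions;
        private int refs = 1; // the first caller owns the initial reference

        NullBlock(int positions) {
            this.positions = positions;
        }

        /** Takes a reference only if the block is still alive. */
        boolean tryIncRef() {
            if (refs <= 0) return false;
            refs++;
            return true;
        }

        void decRef() {
            if (refs <= 0) throw new IllegalStateException("already released");
            refs--;
        }

        boolean isReleased() {
            return refs == 0;
        }
    }

    /** Reader whose cached pointer is not counted and may be stale. */
    static final class Reader {
        private NullBlock last;
        int allocations;

        NullBlock readConstantNullBlock(int positions) {
            if (last == null || last.positions != positions || last.tryIncRef() == false) {
                last = new NullBlock(positions); // the caller owns the initial reference
                allocations++;
            }
            return last;
        }
    }

    public static void main(String[] args) {
        Reader reader = new Reader();
        NullBlock a = reader.readConstantNullBlock(10); // allocates
        NullBlock b = reader.readConstantNullBlock(10); // shares a via tryIncRef
        a.decRef();
        b.decRef(); // block is now released, the reader's pointer is stale
        NullBlock c = reader.readConstantNullBlock(10); // tryIncRef fails, allocates again
        System.out.println("shared: " + (a == b) + ", reallocated: " + (c != a));
        System.out.println("allocations: " + reader.allocations);
    }
}
```

The trade-off is the one raised above: the reader needs no close step, but sharing only happens while some caller still holds the previous block.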


nik9000 added >enhancement :Analytics/ES|QL AKA ESQL v8.13.0 labels Feb 2, 2024

nik9000 requested review from not-napoleon, dnhatn and ChrisHegarty February 2, 2024 20:54

elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Feb 2, 2024

Update docs/changelog/105088.yaml

9d53062

nik9000 mentioned this pull request Feb 2, 2024

ESQL: Improve performance for queries with small limit across large clusters #103369

Closed

2 tasks

costin approved these changes Feb 3, 2024


dnhatn approved these changes Feb 3, 2024


nik9000 added 4 commits February 6, 2024 12:03

Merge branch 'main' into cache_read_constant_null

a27d371

Spotless

6e0a7b4

tMerge remote-tracking branch 'refs/remotes/nik9000/cache_read_consta…

cd4cb2d

…nt_null' into cache_read_constant_null

Merge branch 'main' into cache_read_constant_null

77aeef3

nik9000 merged commit 21b0caa into elastic:main Feb 9, 2024
14 checks passed
