ESQL: Load stored fields sequentially #102727

nik9000 · 2023-11-28T21:41:05Z

The Lucene APIs can load stored fields more quickly if you flip them into "sequential" mode - but it's only faster when you are loading a dense set of documents - all bunched up. This flips the stored field loading code to use a sequential reader when the documents are fully dense - literally just counting without gaps - which is precisely what the fetch phase does.

elasticsearchmachine · 2023-11-28T21:41:29Z

Hi @nik9000, I've created a changelog YAML for you.

elasticsearchmachine · 2023-11-28T21:41:29Z

Pinging @elastic/es-ql (Team:QL)

elasticsearchmachine · 2023-11-28T21:41:29Z

Pinging @elastic/elasticsearch-esql (:Query Languages/ES|QL)

nik9000 · 2023-11-28T21:42:15Z

The tests I've been using only load a small number of documents so they don't see any benefit from this. I'm going to run a larger test set and see if that shows anything.

nik9000 · 2023-11-28T22:57:37Z

Oh boy does this help if you run a STATS on a stored field:

before: 193984ms
 after:  82688ms

dnhatn · 2023-11-29T06:28:16Z

.../esql/compute/src/main/java/org/elasticsearch/compute/lucene/ValuesSourceReaderOperator.java

+     * when reading stored fields for the documents contained in {@code docIds}.
+     */
+    private boolean isSequential(IntVector docIds) {
+        return docIds.getInt(docIds.getPositionCount() - 1) - docIds.getInt(0) == docIds.getPositionCount() - 1;


I think we should only switch to the sequential mode when the number of documents exceeds a certain threshold? The search API uses 10 for this. The sequential (i.e., merging) stored field reader eagerly decompresses an entire block.

I can do that! FWIW, we're not going to have many blocks less than 10, but it's fine to do.
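
For readers following the thread, here is a minimal stand-alone sketch of the check being discussed: the doc ids are treated as dense when the last id minus the first id equals the count minus one, and short lists are skipped. It assumes the ids are strictly ascending with no duplicates, which the one-line check in the diff also relies on. The class name `SequentialCheck`, the method `useSequentialReader`, the constant `SEQUENTIAL_BOUNDARY`, and the use of a plain `int[]` instead of `IntVector` are all hypothetical; this is not the code that was merged.

```java
// Hypothetical sketch for illustration only; not the Elasticsearch implementation.
final class SequentialCheck {
    // Assumed cutoff, taken from the "10" mentioned in the review comment above.
    static final int SEQUENTIAL_BOUNDARY = 10;

    /**
     * Returns true when the doc ids form a gapless run that is longer than
     * the boundary. Assumes the ids are strictly ascending (no duplicates).
     */
    static boolean useSequentialReader(int[] sortedDocIds) {
        int count = sortedDocIds.length;
        if (count <= SEQUENTIAL_BOUNDARY) {
            // The sequential reader decompresses a whole block at a time,
            // so it is unlikely to pay off for very short lists.
            return false;
        }
        int first = sortedDocIds[0];
        int last = sortedDocIds[count - 1];
        // With strictly ascending ids, a span of exactly count - 1 means no gaps.
        return last - first == count - 1;
    }
}
```

Whether a list of exactly 10 documents counts as "very short" is an assumption in this sketch; the thread only says the search API uses 10 as the boundary.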

nik9000 · 2023-11-29T17:47:50Z

@dnhatn , I think this is ready for another round.

dnhatn

LGTM. Thanks @nik9000

dnhatn · 2023-11-29T18:10:27Z

.../esql/compute/src/main/java/org/elasticsearch/compute/lucene/ValuesSourceReaderOperator.java

+     * <p>
+     *     The sequential stored field reader decompresses a whole block of docs
+     *     at a time so for very short lists it won't be faster to use it. We use
+     *     {@code 10} documents as the boundary for "very short" because it's what


nik9000 · 2023-11-29T18:36:06Z

Thanks @dnhatn !

nik9000 added >enhancement :Analytics/ES|QL AKA ESQL v8.12.0 labels Nov 28, 2023

elasticsearchmachine added the Team:QL (Deprecated) Meta label for query languages team label Nov 28, 2023

Update docs/changelog/102727.yaml

c0219ff

nik9000 requested a review from dnhatn November 28, 2023 22:57

dnhatn reviewed Nov 29, 2023

dnhatn self-requested a review November 29, 2023 06:28

nik9000 added 3 commits November 29, 2023 11:19

only when bigger

210a177

Spotless

9f2fc68

Merge remote-tracking branch 'refs/remotes/nik9000/esql_sequential_st…

ee95dd4

…ored_fields' into esql_sequential_stored_fields

dnhatn approved these changes Nov 29, 2023

nik9000 merged commit e802a2d into elastic:main Nov 29, 2023
14 checks passed
