ESQL: Load stored fields sequentially #102727
The Lucene APIs can load stored fields more quickly if you flip them into "sequential" mode - but it's only faster when you are loading a dense set of documents - all bunched up. This flips the stored field loading code to use a sequential reader when the documents are fully dense - literally just counting without gaps - which is precisely what the fetch phase does.
Hi @nik9000, I've created a changelog YAML for you.
Pinging @elastic/es-ql (Team:QL)
Pinging @elastic/elasticsearch-esql (:Query Languages/ES|QL)
The tests I've been using only load a small number of documents so they don't see any benefit from this. I'm going to run a larger test set and see if that shows anything.
Oh boy does this help if you run a
* when reading stored fields for the documents contained in {@code docIds}.
*/
private boolean isSequential(IntVector docIds) {
return docIds.getInt(docIds.getPositionCount() - 1) - docIds.getInt(0) == docIds.getPositionCount() - 1;
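The check in this excerpt can be tried in isolation. The following is a minimal sketch, not the PR's code: it uses a plain `int[]` in place of `IntVector`, and the class and method names (`DenseDocIds`, `isDense`) are hypothetical. It assumes the doc ids are non-empty, sorted ascending, and free of duplicates; without those assumptions the subtraction alone does not prove the list has no gaps.

```java
// Hypothetical stand-in for the isSequential check, using int[] instead of IntVector.
final class DenseDocIds {
    private DenseDocIds() {}

    /**
     * Returns true when the ids form a gapless run such as 5, 6, 7, 8.
     * Assumes docIds is non-empty, sorted ascending, and has no duplicates.
     */
    static boolean isDense(int[] docIds) {
        int count = docIds.length;
        // A gapless ascending run of `count` ids spans exactly count - 1.
        return docIds[count - 1] - docIds[0] == count - 1;
    }

    public static void main(String[] args) {
        System.out.println(isDense(new int[] {5, 6, 7, 8})); // 8 - 5 == 3, so true
        System.out.println(isDense(new int[] {5, 7, 8, 9})); // 9 - 5 == 4, so false
    }
}
```

Because only the first and last ids are read, the check is constant time regardless of how many documents are in the list.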
I think we should only switch to the sequential mode when the number of documents exceeds a certain threshold? The search API uses 10 for this. The sequential (i.e., merging) stored field reader eagerly decompresses an entire block.
I can do that! FWIW, we're not going to have many blocks less than 10, but it's fine to do.
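For illustration, the suggested threshold could be combined with the density check roughly as follows. This is a sketch under stated assumptions rather than the code that was merged: the names `SequentialReaderChoice`, `SEQUENTIAL_BOUNDARY`, and `useSequentialReader` are hypothetical, a plain `int[]` stands in for `IntVector`, and treating the boundary as inclusive (`>=`) is a guess, since the discussion only gives the value `10`.

```java
// Hypothetical sketch: only pick the sequential reader for dense lists of at least 10 docs.
final class SequentialReaderChoice {
    /** Value taken from the review discussion; the constant name is an assumption. */
    static final int SEQUENTIAL_BOUNDARY = 10;

    private SequentialReaderChoice() {}

    /**
     * Assumes docIds is sorted ascending with no duplicates.
     * Short lists return false because the sequential reader decompresses
     * a whole block of documents eagerly, which is wasted work for a few docs.
     */
    static boolean useSequentialReader(int[] docIds) {
        int count = docIds.length;
        if (count < SEQUENTIAL_BOUNDARY) {
            return false; // too few documents to pay for block decompression
        }
        return docIds[count - 1] - docIds[0] == count - 1; // dense, gapless run
    }

    public static void main(String[] args) {
        int[] tenDense = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
        int[] nineDense = {0, 1, 2, 3, 4, 5, 6, 7, 8};
        System.out.println(useSequentialReader(tenDense));
        System.out.println(useSequentialReader(nineDense));
    }
}
```

Checking the length first also means an empty list never reaches the array accesses.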
…ored_fields' into esql_sequential_stored_fields
@dnhatn, I think this is ready for another round.
LGTM. Thanks @nik9000
* <p>
* The sequential stored field reader decompresses a whole block of docs
* at a time so for very short lists it won't be faster to use it. We use
* {@code 10} documents as the boundary for "very short" because it's what
💯
Thanks @dnhatn!