Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ESQL: Load stored fields sequentially #102727

Merged
merged 5 commits into from Nov 29, 2023

Conversation

nik9000
Copy link
Member

@nik9000 nik9000 commented Nov 28, 2023

The lucene APIs can load stored fields more quickly if flip them into "sequential" mode - but it's only faster when you are loading a dense set of documents - all bunched up. This flips the stored field loading code to use a sequential reader when the documents are fully dense - literally just counting without gaps - which is precisely what the fetch phase does.

The lucene APIs can load stored fields more quickly if flip them into
"sequential" mode - but it's only faster when you are loading a dense
set of documents - all bunched up. This flips the stored field loading
code to use a sequential reader when the documents are fully dense -
literally just counting without gaps - which is precisely what the fetch
phase does.
@elasticsearchmachine
Copy link
Collaborator

Hi @nik9000, I've created a changelog YAML for you.

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-ql (Team:QL)

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/elasticsearch-esql (:Query Languages/ES|QL)

@nik9000
Copy link
Member Author

nik9000 commented Nov 28, 2023

The tests I've been using only load a small number of documents so they don't see any benefit from this. I'm going to run a larger test set and see if that shows anything.

@nik9000
Copy link
Member Author

nik9000 commented Nov 28, 2023

Oh boy does this help if you run a STATS on a stored field:

before: 193984ms
 after:  82688ms

@nik9000 nik9000 requested a review from dnhatn November 28, 2023 22:57
* when reading stored fields for the documents contained in {@code docIds}.
*/
private boolean isSequential(IntVector docIds) {
return docIds.getInt(docIds.getPositionCount() - 1) - docIds.getInt(0) == docIds.getPositionCount() - 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should only switch to the sequential mode when the number of documents exceeds a certain threshold? The search API uses 10 for this. The sequential (i.e., merging) stored field reader eagerly decompresses an entire block.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can do that! FWIW, we're not going to have many blocks less than 10, but it's fine to do.

@dnhatn dnhatn self-requested a review November 29, 2023 06:28
@nik9000
Copy link
Member Author

nik9000 commented Nov 29, 2023

@dnhatn , I think this is ready for another round.

Copy link
Member

@dnhatn dnhatn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @nik9000

* <p>
* The sequential stored field reader decompresses a whole block of docs
* at a time so for very short lists it won't be faster to use it. We use
* {@code 10} documents as the boundary for "very short" because it's what
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯

@nik9000 nik9000 merged commit e802a2d into elastic:main Nov 29, 2023
14 checks passed
@nik9000
Copy link
Member Author

nik9000 commented Nov 29, 2023

Thanks @dnhatn !

timgrein pushed a commit to timgrein/elasticsearch that referenced this pull request Nov 30, 2023
The lucene APIs can load stored fields more quickly if flip them into
"sequential" mode - but it's only faster when you are loading a dense
set of documents - all bunched up. This flips the stored field loading
code to use a sequential reader when the documents are fully dense -
literally just counting without gaps - which is precisely what the fetch
phase does.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL >enhancement Team:QL (Deprecated) Meta label for query languages team v8.12.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants