Align merging defaults with Lucene's new defaults. #133946
base: main
Lucene recently updated its merging defaults to bias a bit less towards indexing performance and a bit more towards search performance by:

- Increasing the floor segment size from 2MB to 16MB. Segments between 2MB and 16MB will now be merged more aggressively. This is expected to result in ~10 fewer segments per shard.
- Decreasing the number of segments per tier from 10 to 8. This is expected to result in 20% fewer segments between 16MB and 5GB (the min and max merged segment sizes).

This PR aligns Elasticsearch's defaults with these new Lucene defaults. This should especially help queries that have a high per-segment overhead, such as multi-term queries (e.g. fuzzy queries) and vector search. On the other hand, indexing performance may decrease a bit due to more merging.

Note that time-based data (indexes that have a `@timestamp` field) have their own merge factor of 32, so they only get the bump of the floor segment size to 16MB, not the decrease of the number of segments per tier.

Furthermore, Lucene now allows merging up to `maxMergeAtOnce` segments if the merged segment size is below the floor segment size (16MB by default). When `maxMergeAtOnce` is greater than `segmentsPerTier`, this helps tiny segments grow more quickly with less write amplification. So to take advantage of it, I bumped `maxMergeAtOnce` from 10 to 16. This anticipates upcoming behavior in Lucene 11 where `maxMergeAtOnce` gets removed and Lucene will happily merge lots of segments together in a single merge as long as the merged segment size is below the floor segment size.

Closes elastic#129764
Closes elastic#130328
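To make the `maxMergeAtOnce` reasoning a bit more concrete, here is a minimal sketch of the rule described above. It is not Lucene's `TieredMergePolicy` code: the class and method names are hypothetical, the cap of `min(maxMergeAtOnce, segmentsPerTier)` for merges above the floor is a simplifying assumption, and the 1MB flushed-segment size is made up for illustration. The second helper counts how many merge rounds equal-sized tiny segments would need to reach the floor size, which is a rough proxy for how many times each byte gets rewritten.

```java
// Illustrative sketch only; names and the above-floor cap are assumptions,
// not Lucene's actual implementation.
class MergeWidthSketch {

    /** Largest number of segments a single merge may combine under the simplified rule. */
    static int maxSegmentsInOneMerge(double mergedSizeMb, double floorSegmentMb,
                                     double segmentsPerTier, int maxMergeAtOnce) {
        if (mergedSizeMb < floorSegmentMb) {
            // Below the floor, a merge may be as wide as maxMergeAtOnce.
            return maxMergeAtOnce;
        }
        // Assumption: above the floor, both settings cap the merge width.
        return (int) Math.min(maxMergeAtOnce, segmentsPerTier);
    }

    /** Merge rounds needed for equal-sized segments to reach the floor when merging `width` at a time. */
    static int roundsToReachFloor(double segmentMb, double floorSegmentMb, int width) {
        if (segmentMb <= 0 || width < 2) {
            throw new IllegalArgumentException("segmentMb must be > 0 and width must be >= 2");
        }
        int rounds = 0;
        double size = segmentMb;
        while (size < floorSegmentMb) {
            size *= width; // one round merges `width` equal-sized segments into one
            rounds++;
        }
        return rounds;
    }

    public static void main(String[] args) {
        // New defaults: floor 16MB, segmentsPerTier 8, maxMergeAtOnce 16.
        System.out.println(maxSegmentsInOneMerge(12, 16, 8.0, 16)); // 16 (merged size below floor)
        System.out.println(maxSegmentsInOneMerge(64, 16, 8.0, 16)); // 8  (merged size above floor)
        // Hypothetical 1MB flushed segments growing to the 16MB floor:
        System.out.println(roundsToReachFloor(1, 16, 16)); // 1 round when merging 16 at once
        System.out.println(roundsToReachFloor(1, 16, 8));  // 2 rounds when merging 8 at once
    }
}
```

In this idealized model, merging 16 tiny segments at once gets them to the floor in a single pass, whereas a width of 8 rewrites the same bytes twice, which is the write amplification the wider `maxMergeAtOnce` is meant to avoid.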
Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing)
Hi @jpountz, I've created a changelog YAML for you.
LGTM.
          Setting.Property.NodeScope
      );
    - public static final double DEFAULT_SEGMENTS_PER_TIER = 10.0d;
    + public static final double DEFAULT_SEGMENTS_PER_TIER = 8.0d;
This can potentially cause serverless tests to fail, though I'd expect the PR build to catch that (it looks like it ran the serverless tests successfully). We saw that when the Lucene default changed. Perhaps we can do one more CI run (to get another randomized sample)?
Thanks @henningandersen. I'm running CI tests once again.
I cannot see the connection between the Regarding the Serverless test failure (
buildkite test this
buildkite test this
Lucene recently updated its merging defaults to bias a bit less towards indexing performance and a bit more towards search performance by:

- Increasing the floor segment size from 2MB to 16MB. Segments between 2MB and 16MB will now be merged more aggressively. This is expected to result in ~10 fewer segments per shard.
- Decreasing the number of segments per tier from 10 to 8. This is expected to result in 20% fewer segments between 16MB and 5GB (the min and max merged segment sizes).

This PR aligns Elasticsearch's defaults with these new Lucene defaults. This should especially help queries that have a high per-segment overhead, such as multi-term queries (e.g. fuzzy queries) and vector search. On the other hand, indexing performance may decrease a bit due to more merging.

Note that time-based data (indexes that have a `@timestamp` field) have their own merge factor of 32, so they only get the bump of the floor segment size to 16MB, not the decrease of the number of segments per tier.

Furthermore, Lucene now allows merging up to `maxMergeAtOnce` segments if the merged segment size is below the floor segment size (16MB by default). When `maxMergeAtOnce` is greater than `segmentsPerTier`, this helps tiny segments grow more quickly with less write amplification. So to take advantage of it, I bumped `maxMergeAtOnce` from 10 to 16. This anticipates upcoming behavior in Lucene 11 where `maxMergeAtOnce` gets removed and Lucene will happily merge lots of segments together in a single merge as long as the merged segment size is below the floor segment size.

Closes #120624
Closes #129764
Closes #130328
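As a rough illustration of how these defaults split between regular and time-based indexes, the sketch below resolves the two values discussed above. Only `DEFAULT_SEGMENTS_PER_TIER` and its value come from the diff in this PR; the other constant names, the class, and the helper methods are hypothetical and are not Elasticsearch's actual settings code.

```java
// Illustrative sketch only; apart from DEFAULT_SEGMENTS_PER_TIER, the names
// here are assumptions made for this example.
class MergeDefaultsSketch {

    static final double DEFAULT_SEGMENTS_PER_TIER = 8.0d;      // was 10.0d before this change
    static final double DEFAULT_FLOOR_SEGMENT_MB = 16.0d;      // hypothetical name; was 2MB
    static final double TIME_BASED_MERGE_FACTOR = 32.0d;       // hypothetical name; unchanged

    /** Segments per tier: time-based indexes keep their own merge factor. */
    static double segmentsPerTierFor(boolean hasTimestampField) {
        return hasTimestampField ? TIME_BASED_MERGE_FACTOR : DEFAULT_SEGMENTS_PER_TIER;
    }

    /** Floor segment size: the bump to 16MB applies to both kinds of index. */
    static double floorSegmentMbFor(boolean hasTimestampField) {
        return DEFAULT_FLOOR_SEGMENT_MB;
    }

    public static void main(String[] args) {
        System.out.println(segmentsPerTierFor(false)); // 8.0
        System.out.println(segmentsPerTierFor(true));  // 32.0
        System.out.println(floorSegmentMbFor(true));   // 16.0
    }
}
```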