Align merging defaults with Lucene's new defaults. #133946

jpountz · 2025-09-01T14:32:55Z

Lucene recently updated its merging defaults to bias a bit less towards indexing performance and a bit more towards search performance by:

- Increasing the floor segment size from 2MB to 16MB. Segments between 2MB and 16MB will now be merged more aggressively. This is expected to result in ~10 fewer segments per shard.
- Decreasing the number of segments per tier from 10 to 8. This is expected to result in 20% fewer segments between 16MB and 5GB (the min and max merged segment sizes).
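The arithmetic behind these two changes can be sketched as follows. The `TieredMergePolicy.setFloorSegmentMB` javadoc says that segments smaller than the floor are "rounded up" to the floor size for merge selection. Under the new floor, a 4MB segment therefore counts as 16MB, which is why segments between 2MB and 16MB get merged more aggressively. The 20% figure matches the per-tier budget shrinking from 10 to 8, presumably with a similar number of tiers between 16MB and 5GB. The class and method names are hypothetical and only restate the numbers from this description.

```java
/**
 * Back-of-the-envelope arithmetic for the two Lucene default changes.
 * Class and method names are hypothetical; the numbers come from the PR text.
 */
final class MergeDefaultsArithmetic {
    static final double OLD_FLOOR_MB = 2.0;
    static final double NEW_FLOOR_MB = 16.0;
    static final double OLD_SEGMENTS_PER_TIER = 10.0;
    static final double NEW_SEGMENTS_PER_TIER = 8.0;

    /** Size used for merge selection: segments under the floor count as floor-sized. */
    static double effectiveSizeMb(double segmentMb, double floorMb) {
        return Math.max(segmentMb, floorMb);
    }

    /** Relative change of the per-tier segment budget, e.g. -0.2 for 10 -> 8. */
    static double perTierBudgetChange(double oldPerTier, double newPerTier) {
        return (newPerTier - oldPerTier) / oldPerTier;
    }

    public static void main(String[] args) {
        // A 4MB segment was above the old 2MB floor but is below the new 16MB floor.
        System.out.println(effectiveSizeMb(4.0, OLD_FLOOR_MB)); // 4.0
        System.out.println(effectiveSizeMb(4.0, NEW_FLOOR_MB)); // 16.0
        // Per-tier budget: 8 segments instead of 10.
        System.out.println(perTierBudgetChange(OLD_SEGMENTS_PER_TIER, NEW_SEGMENTS_PER_TIER)); // -0.2
    }
}
```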

This PR aligns Elasticsearch's defaults with these new Lucene defaults. This should especially help queries that have a high per-segment overhead, such as multi-term queries (e.g. fuzzy queries) and vector search. On the other hand, indexing performance may decrease a bit due to more merging.

Note that time-based data (indexes that have a `@timestamp` field) have their own merge factor of 32, so they only get the bump of the floor segment size to 16MB, not the decrease of the number of segments per tier.
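Here is a hypothetical sketch of how the new defaults split by index type, as described above. Every index gets the 16MB floor. Only indices without a `@timestamp` field get 8 segments per tier, and time-based indices keep their merge factor of 32. The class, field and method names are illustrative and are not Elasticsearch's `MergePolicyConfig` API.

```java
/**
 * Hypothetical sketch of which new merge defaults apply to which kind of index.
 * Names are illustrative; this is not Elasticsearch's MergePolicyConfig.
 */
final class MergeDefaultsByIndexType {
    static final double FLOOR_SEGMENT_MB = 16.0;   // raised from 2MB for every index
    static final int SEGMENTS_PER_TIER = 8;        // lowered from 10, non-time-based indices only
    static final int TIME_BASED_MERGE_FACTOR = 32; // time-based indices keep their own factor

    /** The two knobs discussed in the PR description, as a plain value object. */
    static final class Knobs {
        final double floorSegmentMb;
        final String tierSettingName; // "segmentsPerTier" or "mergeFactor"
        final int tierSettingValue;

        Knobs(double floorSegmentMb, String tierSettingName, int tierSettingValue) {
            this.floorSegmentMb = floorSegmentMb;
            this.tierSettingName = tierSettingName;
            this.tierSettingValue = tierSettingValue;
        }

        @Override
        public String toString() {
            return "floor=" + floorSegmentMb + "MB, " + tierSettingName + "=" + tierSettingValue;
        }
    }

    /** Time-based data is identified here by the presence of a @timestamp field. */
    static Knobs defaultsFor(boolean hasTimestampField) {
        return hasTimestampField
            ? new Knobs(FLOOR_SEGMENT_MB, "mergeFactor", TIME_BASED_MERGE_FACTOR)
            : new Knobs(FLOOR_SEGMENT_MB, "segmentsPerTier", SEGMENTS_PER_TIER);
    }

    public static void main(String[] args) {
        System.out.println(defaultsFor(false)); // floor=16.0MB, segmentsPerTier=8
        System.out.println(defaultsFor(true));  // floor=16.0MB, mergeFactor=32
    }
}
```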

Furthermore, Lucene now allows merging up to `maxMergeAtOnce` segments if the merged segment size is below the floor segment size (16MB by default). When `maxMergeAtOnce` is greater than `segmentsPerTier`, this helps tiny segments grow more quickly with less write amplification. So to take advantage of it, I bumped `maxMergeAtOnce` from 10 to 16. This anticipates upcoming behavior in Lucene 11 where `maxMergeAtOnce` gets removed and Lucene will happily merge lots of segments together in a single merge as long as the merged segment size is below the floor segment size.
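The write-amplification argument above can be illustrated with a simplified model. This is not Lucene's `TieredMergePolicy` code, and the class and methods are hypothetical. The model makes two assumptions. First, a merge whose result reaches the floor is capped at `min(maxMergeAtOnce, segmentsPerTier)` segments. Second, each merge always picks the smallest segments. Under those assumptions, collapsing sixteen 512KB segments in a single 16-way merge writes every byte once. With 8-way merges, the two intermediate 4MB segments are rewritten a second time.

```java
import java.util.PriorityQueue;

/**
 * Simplified, hypothetical model of the floor-merge rule described above.
 * It is not Lucene's TieredMergePolicy; it only illustrates write amplification.
 */
final class FloorMergeModel {

    /** Max segments one merge may combine: maxMergeAtOnce while the result stays under the floor. */
    static int maxSegmentsPerMerge(long mergedBytes, long floorBytes,
                                   int maxMergeAtOnce, int segmentsPerTier) {
        return mergedBytes < floorBytes
            ? maxMergeAtOnce
            : Math.min(maxMergeAtOnce, segmentsPerTier); // assumed cap above the floor
    }

    /**
     * Repeatedly merges the smallest `width` segments until one segment remains,
     * returning the total bytes written (each merge rewrites all of its input bytes).
     */
    static long bytesWrittenToCollapse(int segmentCount, long segmentBytes, int width) {
        PriorityQueue<Long> segments = new PriorityQueue<>(); // min-heap by size
        for (int i = 0; i < segmentCount; i++) {
            segments.add(segmentBytes);
        }
        long written = 0;
        while (segments.size() > 1) {
            int n = Math.min(width, segments.size());
            long merged = 0;
            for (int i = 0; i < n; i++) {
                merged += segments.poll();
            }
            written += merged;
            segments.add(merged);
        }
        return written;
    }

    public static void main(String[] args) {
        long kb512 = 512L * 1024;
        long floor = 16L * 1024 * 1024;
        int segments = 16;
        // 16 x 512KB = 8MB, below the 16MB floor, so all 16 may be merged at once.
        int width = maxSegmentsPerMerge(segments * kb512, floor, 16, 8);
        System.out.println(width);                                          // 16
        System.out.println(bytesWrittenToCollapse(segments, kb512, width)); // 8388608
        System.out.println(bytesWrittenToCollapse(segments, kb512, 8));     // 16777216
    }
}
```

In this model, `maxMergeAtOnce = 16` writes 8MB to turn the sixteen tiny segments into one. Capping merges at `segmentsPerTier = 8` writes 16MB for the same result.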

Closes #120624
Closes #129764
Closes #130328

elasticsearchmachine · 2025-09-01T14:33:19Z

Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing)

elasticsearchmachine · 2025-09-09T06:57:39Z

Hi @jpountz, I've created a changelog YAML for you.

henningandersen

LGTM.

henningandersen · 2025-09-12T08:57:33Z

server/src/main/java/org/elasticsearch/index/MergePolicyConfig.java

```diff
         Setting.Property.NodeScope
     );
-    public static final double DEFAULT_SEGMENTS_PER_TIER = 10.0d;
+    public static final double DEFAULT_SEGMENTS_PER_TIER = 8.0d;
```


This can potentially cause Serverless tests to fail, though I'd expect the PR build to catch that (it looks like it successfully ran the Serverless tests). We saw that when the Lucene default changed. Perhaps we can do one more CI run (to get another randomized sample)?

jpountz · 2025-09-12T11:35:11Z

Thanks @henningandersen. I'm running CI tests once again.

jpountz · 2025-09-12T12:35:15Z

I cannot see the connection between the elasticsearch-ci/bwc-snapshots-part2 failure and this PR.

Regarding the Serverless test failure (`SearchCommitPrefetcherIT`), I wonder if it happens because the number of commits is greater than 8, which triggers a background merge. If so, changing the number of commits from `randomIntBetween(5, 10)` to `randomIntBetween(5, 8)` should fix the issue.

brianseeders · 2025-09-23T15:36:13Z

buildkite test this

henningandersen · 2025-09-25T10:51:48Z

buildkite test this

jpountz added the :Distributed Indexing/Engine label Sep 1, 2025

elasticsearchmachine added the Team:Distributed Indexing and v9.2.0 labels Sep 1, 2025

jpountz added the >enhancement label Sep 9, 2025

jpountz added 4 commits September 9, 2025 08:57

Update docs/changelog/133946.yaml

9e4b054

changelog

6b3bb74

Fix test failure

8fb9eca

Merge branch 'main' into update_merging_defaults

c94c94b

henningandersen approved these changes Sep 12, 2025

Merge branch 'main' into update_merging_defaults

713ee92

Merge branch 'main' into update_merging_defaults

30af1e6

joegallo assigned henningandersen Sep 23, 2025

joegallo added the external-contributor label Sep 23, 2025

elasticsearchmachine and others added 2 commits September 23, 2025 16:01

[CI] Update transport version definitions

7d7dbfe

Merge branch 'main' into update_merging_defaults

323c767

henningandersen added 2 commits September 25, 2025 21:42

Merge branch 'main' into update_merging_defaults

774ee44

Merge branch 'main' into update_merging_defaults

af939b7

elasticsearchmachine added v9.3.0 and removed v9.2.0 labels Oct 2, 2025
