Conversation

@jpountz jpountz commented Sep 1, 2025

Lucene recently updated its merging defaults to bias a bit less towards indexing performance and a bit more towards search performance by:

  • Increasing the floor segment size from 2MB to 16MB. Segments between 2MB and 16MB will now be merged more aggressively. This is expected to result in ~10 fewer segments per shard.
  • Decreasing the number of segments per tier from 10 to 8. This is expected to result in 20% fewer segments between 16MB and 5GB (the min and max merged segment sizes).

This PR aligns Elasticsearch's defaults with these new Lucene defaults. This should especially help queries that have a high per-segment overhead, such as multi-term queries (e.g. fuzzy queries) and vector search. On the other hand, indexing performance may decrease a bit due to more merging.
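
For illustration, here is a minimal sketch of what these new defaults correspond to on Lucene's `TieredMergePolicy` (the setters are Lucene's public API; how Elasticsearch wires them through its merge policy settings is not shown here):

```java
import org.apache.lucene.index.TieredMergePolicy;

// Minimal sketch, not the Elasticsearch wiring itself: the new defaults
// expressed directly against Lucene's TieredMergePolicy setters.
public class NewTieredMergeDefaults {
    public static void main(String[] args) {
        TieredMergePolicy mergePolicy = new TieredMergePolicy();
        mergePolicy.setFloorSegmentMB(16.0); // floor segment size, previously 2 MB
        mergePolicy.setSegmentsPerTier(8.0); // segments per tier, previously 10

        System.out.println("floorSegmentMB=" + mergePolicy.getFloorSegmentMB()
            + ", segmentsPerTier=" + mergePolicy.getSegmentsPerTier());
    }
}
```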

Note that time-based data (indexes that have a `@timestamp` field) have their own merge factor of 32, so they only get the bump of the floor segment size to 16MB, not the decrease of the number of segments per tier.
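
As a hedged sketch of that time-based configuration, assuming the standard Lucene `LogByteSizeMergePolicy` knobs (mapping the 16MB floor onto the minimum merged size is an illustrative assumption, not necessarily the exact Elasticsearch wiring):

```java
import org.apache.lucene.index.LogByteSizeMergePolicy;

// Hedged sketch: time-based indices keep a merge factor of 32 and only pick up
// the larger 16 MB floor. Mapping that floor onto setMinMergeMB() is an
// assumption made for illustration.
public class TimeBasedMergeSketch {
    public static void main(String[] args) {
        LogByteSizeMergePolicy mergePolicy = new LogByteSizeMergePolicy();
        mergePolicy.setMergeFactor(32);   // unchanged for time-based data
        mergePolicy.setMinMergeMB(16.0);  // assumed equivalent of the 16 MB floor

        System.out.println("mergeFactor=" + mergePolicy.getMergeFactor()
            + ", minMergeMB=" + mergePolicy.getMinMergeMB());
    }
}
```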

Furthermore, Lucene now allows merging up to `maxMergeAtOnce` segments if the merged segment size is below the floor segment size (16MB by default). When `maxMergeAtOnce` is greater than `segmentsPerTier`, this helps tiny segments grow more quickly with less write amplification. So to take advantage of it, I bumped `maxMergeAtOnce` from 10 to 16. This anticipates upcoming behavior in Lucene 11 where `maxMergeAtOnce` gets removed and Lucene will happily merge lots of segments together in a single merge as long as the merged segment size is below the floor segment size.
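
Putting the three values together against Lucene's setters (again only a sketch, not the actual Elasticsearch code): below the 16MB floor a single merge may now pull in up to 16 segments, while tiers of full-sized segments still target 8 segments each.

```java
import org.apache.lucene.index.TieredMergePolicy;

// Sketch of the bumped maxMergeAtOnce: merges whose result stays under the
// 16 MB floor may combine up to 16 segments at once, while "full" tiers still
// target 8 segments per tier.
public class MaxMergeAtOnceSketch {
    public static void main(String[] args) {
        TieredMergePolicy mergePolicy = new TieredMergePolicy();
        mergePolicy.setFloorSegmentMB(16.0);
        mergePolicy.setSegmentsPerTier(8.0);
        mergePolicy.setMaxMergeAtOnce(16); // previously 10

        System.out.println("maxMergeAtOnce=" + mergePolicy.getMaxMergeAtOnce());
    }
}
```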

Closes #120624
Closes #129764
Closes #130328

@jpountz jpountz added the :Distributed Indexing/Engine label (Anything around managing Lucene and the Translog in an open shard) on Sep 1, 2025
@elasticsearchmachine elasticsearchmachine added the Team:Distributed Indexing (Meta label for Distributed Indexing team) and v9.2.0 labels on Sep 1, 2025
@elasticsearchmachine

Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing)

@elasticsearchmachine

Hi @jpountz, I've created a changelog YAML for you.


@henningandersen henningandersen left a comment


LGTM.

     Setting.Property.NodeScope
 );
-public static final double DEFAULT_SEGMENTS_PER_TIER = 10.0d;
+public static final double DEFAULT_SEGMENTS_PER_TIER = 8.0d;

This can potentially cause serverless tests to fail, though I'd expect the PR build to catch that (it looks like it successfully ran the serverless tests). We saw that when the Lucene default changed. Perhaps we can trigger one more CI run (to get another randomized sample)?

jpountz commented Sep 12, 2025

Thanks @henningandersen. I'm running CI tests once again.

jpountz commented Sep 12, 2025

I cannot see the connection between the `elasticsearch-ci/bwc-snapshots-part2` failure and this PR.

Regarding the serverless test failure (`SearchCommitPrefetcherIT`), I wonder if the failure happens because the number of commits is greater than 8, which triggers a background merge. If that is the case, changing the number of commits from `randomIntBetween(5, 10)` to `randomIntBetween(5, 8)` should fix the issue.

@joegallo joegallo added the external-contributor label (Pull request authored by a developer outside the Elasticsearch team) on Sep 23, 2025
@brianseeders

buildkite test this

@henningandersen

buildkite test this
