Skip to content

Conversation

@nielsbauman
Copy link
Contributor

@nielsbauman nielsbauman commented Sep 29, 2025

By checking if the setting value ends with a b, we can skip the ratio/double parsing. The default value is 1gb, so this will skip the ratio parsing by default.
I was doing some benchmarking on component templates and noticed the MergePolicyConfig constructor sticking out slightly in flamegraphs due to the failing double parsing. This minor optimization is almost not worth opening a PR for, but I figured it doesn't hurt to do.

Since the default value for `index.compound_format` is `1gb`, it makes
more sense to try to parse a `ByteSizeValue` first and only after that
try to parse a raw double.
I was doing some benchmarking on component templates and noticed the
`MergePolicyConfig` constructor sticking out slightly in flamegraphs due
to the failing double parsing. This minor optimization is almost not
worth opening a PR for, but I figured it doesn't hurt to do.
@nielsbauman nielsbauman added :Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. >refactoring labels Sep 29, 2025
@elasticsearchmachine elasticsearchmachine added Team:Distributed Indexing Meta label for Distributed Indexing team v9.2.0 labels Sep 29, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing)

assertCompoundThreshold(build("false"), 0.0, ByteSizeValue.ofBytes(Long.MAX_VALUE));
assertCompoundThreshold(build(false), 0.0, ByteSizeValue.ofBytes(Long.MAX_VALUE));
assertCompoundThreshold(build(0), 0.0, ByteSizeValue.ofBytes(Long.MAX_VALUE));
assertCompoundThreshold(build(0), 1.0, ByteSizeValue.ofBytes(0));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you can see in this test change, there is a slight change in behavior when the input value is 0. No other test seems to break on this change. If this change does have unwanted side effects, I'm fine with closing this PR and keeping the code as-is.

@lkts
Copy link
Contributor

lkts commented Oct 1, 2025

What even is index.compound_format? I can't find any documentation for it.

@nielsbauman
Copy link
Contributor Author

What even is index.compound_format? I can't find any documentation for it.

I have absolutely no idea. I didn't really look into the setting itself, just at the parsing.

@lkts
Copy link
Contributor

lkts commented Oct 2, 2025

I am wondering if it's maybe deprecated and we can remove it in the future completely. In the meantime is it maybe better to check for if (INDEX_COMPOUND_FORMAT_SETTING.exists(indexSettings.getSettings())) inside MergePolicyConfig constructor and if it doesn't skip the parsing and use a known default instance?

@nielsbauman
Copy link
Contributor Author

Thanks for the suggestion! I've reverted my original commit and added a9de2dc to implement your suggestion.

Copy link
Contributor

@lkts lkts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but i wonder if we can find anyone familiar with this code to cross check.

Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this was almost not worth opening a PR for, i wonder if the added complexity here is worth it? If it is important we sort of want to capture it in some test or benchmark. If not, we should not do this.

I wonder if a setting infra change could fix this more generically instead? Would be a shame to guard many settings reads like this.

@nielsbauman
Copy link
Contributor Author

@henningandersen the part that was causing it to show up in the flamegraph was the stacktrace creation:
image
Therefore, this doesn't look like something we can tackle with generic setting infra change - please correct me if I'm wrong.

My original change was not to guard the setting reads, but to just swap the order of parsing (see 78e5fcc). That doesn't add any extra complexity, but the "right" order depends on the default value, although I don't expect this default value to change much.

@nielsbauman
Copy link
Contributor Author

@henningandersen any thoughts on my previous comment?

@henningandersen
Copy link
Contributor

henningandersen commented Oct 14, 2025

I prefer the original version (only saw the new version). Can I suggest to follow this pattern, i.e., skip the double parsing if the value ends in b? That way the default value and order are unrelated.

Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we've introduced a change, which needs addressing.

try {
return new CompoundFileThreshold(Double.parseDouble(noCFSRatio));
} catch (NumberFormatException ex) {
throw new IllegalArgumentException(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should still parse it as bytes if it is not parseable as a double. For instance, ByteSizeValue.parseBytesSizeValue trims white space, so " 1gb " would no longer be legal with this change, but was in the past.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added two test cases locally:

        assertCompoundThreshold(build(" 1gb"), 1.0, ByteSizeValue.ofGb(1));
        assertCompoundThreshold(build(" 0"), 0, ByteSizeValue.ofBytes(Long.MAX_VALUE));

and both pass on my current changes. Looking at the source code of Double.parseDouble, that calls FloatingDecimal.readJavaFormatString under the hood, which trims white spaces too:

            in = in.trim(); // don't fool around with white space.

Are there any other examples you can think of that wouldn't be legal with this change? I can invert the if-statement to if (noCFSRatio.endsWith("b") == false && noCFSRatio.endsWith("B") == false); that allows us to do the bytes parsing at the end, if you feel more confident with that change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think " 1gb " would fail?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, sorry, I missed the white space at the end of your example. My bad...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried " 1gb " and that works as well. It took me a minute to realize why, but I noticed we have the following line:

private static CompoundFileThreshold parseCompoundFormat(String noCFSRatio) {
noCFSRatio = noCFSRatio.trim();

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, then "1k" will fail I think? Or "1t"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh, you're right. Sorry, I'm not sharp on this PR... Fixed in 88c62e0.

Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

throw e;
return new CompoundFileThreshold(Double.parseDouble(noCFSRatio));
} catch (NumberFormatException e) {
// ignore, see if it parses as bytes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to retain this as suppressed if the parsing below fails too, just like the original version kept the byte size parsing as suppressed (fine to swap which is suppressed I think).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I realized that too, but couldn't think of a clean way to implement that. I added some logic in 86b35e5. Let me know if that matches what you had in mind.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine to me.

@nielsbauman nielsbauman enabled auto-merge (squash) October 15, 2025 14:40
@nielsbauman nielsbauman disabled auto-merge October 15, 2025 14:40
@nielsbauman nielsbauman changed the title Swap MergePolicyConfig compound format parsing order Optimize parsing of compound format in MergePolicyConfig Oct 15, 2025
@nielsbauman nielsbauman merged commit c09dc8e into elastic:main Oct 15, 2025
34 checks passed
@nielsbauman nielsbauman deleted the merge-policy-parsing branch October 15, 2025 16:54
Kubik42 pushed a commit to Kubik42/elasticsearch that referenced this pull request Oct 16, 2025
…35643)

By checking if the setting value ends with a `b`, we can skip the ratio/double parsing. The default value is `1gb`, so this will skip the ratio parsing by default.
I was doing some benchmarking on component templates and noticed the `MergePolicyConfig` constructor sticking out slightly in flamegraphs due to the failing double parsing.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. >refactoring Team:Distributed Indexing Meta label for Distributed Indexing team v9.3.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants