-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Optimize parsing of compound format in MergePolicyConfig
#135643
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Since the default value for `index.compound_format` is `1gb`, it makes more sense to try to parse a `ByteSizeValue` first and only after that try to parse a raw double. I was doing some benchmarking on component templates and noticed the `MergePolicyConfig` constructor sticking out slightly in flamegraphs due to the failing double parsing. This minor optimization is almost not worth opening a PR for, but I figured it doesn't hurt to do.
|
Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing) |
| assertCompoundThreshold(build("false"), 0.0, ByteSizeValue.ofBytes(Long.MAX_VALUE)); | ||
| assertCompoundThreshold(build(false), 0.0, ByteSizeValue.ofBytes(Long.MAX_VALUE)); | ||
| assertCompoundThreshold(build(0), 0.0, ByteSizeValue.ofBytes(Long.MAX_VALUE)); | ||
| assertCompoundThreshold(build(0), 1.0, ByteSizeValue.ofBytes(0)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As you can see in this test change, there is a slight change in behavior when the input value is 0. No other test seems to break on this change. If this change does have unwanted side effects, I'm fine with closing this PR and keeping the code as-is.
|
What even is |
I have absolutely no idea. I didn't really look into the setting itself, just at the parsing. |
|
I am wondering if it's maybe deprecated and we can remove it in the future completely. In the meantime is it maybe better to check for |
|
Thanks for the suggestion! I've reverted my original commit and added a9de2dc to implement your suggestion. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM but i wonder if we can find anyone familiar with this code to cross check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this was almost not worth opening a PR for, i wonder if the added complexity here is worth it? If it is important we sort of want to capture it in some test or benchmark. If not, we should not do this.
I wonder if a setting infra change could fix this more generically instead? Would be a shame to guard many settings reads like this.
|
@henningandersen the part that was causing it to show up in the flamegraph was the stacktrace creation: My original change was not to guard the setting reads, but to just swap the order of parsing (see 78e5fcc). That doesn't add any extra complexity, but the "right" order depends on the default value, although I don't expect this default value to change much. |
|
@henningandersen any thoughts on my previous comment? |
|
I prefer the original version (only saw the new version). Can I suggest to follow this pattern, i.e., skip the double parsing if the value ends in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we've introduced a change, which needs addressing.
| try { | ||
| return new CompoundFileThreshold(Double.parseDouble(noCFSRatio)); | ||
| } catch (NumberFormatException ex) { | ||
| throw new IllegalArgumentException( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should still parse it as bytes if it is not parseable as a double. For instance, ByteSizeValue.parseBytesSizeValue trims white space, so " 1gb " would no longer be legal with this change, but was in the past.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added two test cases locally:
assertCompoundThreshold(build(" 1gb"), 1.0, ByteSizeValue.ofGb(1));
assertCompoundThreshold(build(" 0"), 0, ByteSizeValue.ofBytes(Long.MAX_VALUE));and both pass on my current changes. Looking at the source code of Double.parseDouble, that calls FloatingDecimal.readJavaFormatString under the hood, which trims white spaces too:
in = in.trim(); // don't fool around with white space.Are there any other examples you can think of that wouldn't be legal with this change? I can invert the if-statement to if (noCFSRatio.endsWith("b") == false && noCFSRatio.endsWith("B") == false); that allows us to do the bytes parsing at the end, if you feel more confident with that change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think " 1gb " would fail?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, sorry, I missed the white space at the end of your example. My bad...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried " 1gb " and that works as well. It took me a minute to realize why, but I noticed we have the following line:
elasticsearch/server/src/main/java/org/elasticsearch/index/MergePolicyConfig.java
Lines 395 to 396 in 5d1a0a6
| private static CompoundFileThreshold parseCompoundFormat(String noCFSRatio) { | |
| noCFSRatio = noCFSRatio.trim(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, then "1k" will fail I think? Or "1t"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ugh, you're right. Sorry, I'm not sharp on this PR... Fixed in 88c62e0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
| throw e; | ||
| return new CompoundFileThreshold(Double.parseDouble(noCFSRatio)); | ||
| } catch (NumberFormatException e) { | ||
| // ignore, see if it parses as bytes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to retain this as suppressed if the parsing below fails too, just like the original version kept the byte size parsing as suppressed (fine to swap which is suppressed I think).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I realized that too, but couldn't think of a clean way to implement that. I added some logic in 86b35e5. Let me know if that matches what you had in mind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks fine to me.
MergePolicyConfig compound format parsing orderMergePolicyConfig
…35643) By checking if the setting value ends with a `b`, we can skip the ratio/double parsing. The default value is `1gb`, so this will skip the ratio parsing by default. I was doing some benchmarking on component templates and noticed the `MergePolicyConfig` constructor sticking out slightly in flamegraphs due to the failing double parsing.

By checking if the setting value ends with a
b, we can skip the ratio/double parsing. The default value is1gb, so this will skip the ratio parsing by default.I was doing some benchmarking on component templates and noticed the
MergePolicyConfigconstructor sticking out slightly in flamegraphs due to the failing double parsing. This minor optimization is almost not worth opening a PR for, but I figured it doesn't hurt to do.