New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8261847: performance of java.lang.Record::toString should be improved #6403
8261847: performance of java.lang.Record::toString should be improved #6403
Conversation
👋 Welcome back vromero! A progress list of the required criteria for merging this PR into |
@vicente-romero-oracle The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command. |
Webrevs
|
There is a limit on the number of arguments that you can feed into |
Class<?>[] types = Stream.of(getters) | ||
.map(g -> g.type().returnType()) | ||
.toList() | ||
.toArray(new Class<?>[getters.length]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not new Class[0]
? It's likely to be faster, isn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I'm not reviewer.)
I think .toArray(Class<?>[]::new)
should be better here. .toList
seems unnecessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Class<?>[] types = Stream.of(getters)
.map(g -> g.type().returnType())
.toArray(Class<?>[]::new);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or do not use a stream but a simple loop, codes in bootstrap methods should be low-level
As Aleksey mentioned, you need a fallback for more that 200 components (limit of StringConcatFactory). |
I recall string concat indy supports up to 200 args while a record can have max 254 components. should we add a test case with a 254-component record to ensure its toString will execute without exceptions? |
More specifically the Tests exercising maxed out records sounds like something we need to add if such doesn't already exist somewhere. |
thanks all for the comments! I will be uploading a new iteration soon |
By the way, considering the fact that the limit of For example, I have to duplicate this value based on Javadoc here while there could have been something similar to |
I have uploaded another iteration, this one can deal with any number of record components, some numbers for when the number of record components is pushed to the maximum (254 components): JDK18 vanilla:
With this PR:
some observations, we are doing better than before but still worst than lombok. I don't think having to create a more complicated MH tree is the key, our numbers are still worst even if there is no need to create too much levels in the MH tree like for 100 record components occupying a spot each. Another observation is that even though our slowest implementation is the one for the Thanks! |
I have done some additional experiments as suggested by Claes, thanks, for different number of record components. In all cases all the components are of type
Then also after a suggestion from Claes I modified the maximum number of slots I would be chopping the arguments into, I gave a try with 20 slots and got these numbers:
It seems like the execution is way faster for these number of slots. |
FYI: this patch also seems to solve JDK-8265747. |
yep, although I prefer to keep JDK-8265747 because it is also referring to the hashCode method, and so far I have been focusing on the |
I suggested this experiment to split up the concatenations more aggressively to diagnose if we're having an issue here where the performance of the method handle produced by While I haven't analyzed the root cause here (I think it seems likely that the JIT bails out at some point when the MH combinator tree grows too large or more likely too deep (long?)) I'd characterize this is a performance bug in the Assuming all you had to do to improve over the current patch was set your local |
Regarding the slot limit in |
I think this splitting logic only works for special cases. In this case, when there is only one object acting as the source of all values. Other splitting solutions needing more inputs require multiple method handles and multiple calls because of these complexly determined limits. The immediate solution here would be to define a static final constant like MAX–STRING-CONCAT–SLOTS and leave the number unspecified. I believe 200 was chosen because it was arbitrarily safe. As Vicente and Claes have shown and I verified in the templated string work, 20-25 slots per call seems to be the sweet spot. Having the resulting method with more inputs seems to make it more difficult to JIT optimize. |
@stuart-marks yes, a general purpose splitting logic moved into the @JimLaskey I don't see why it wouldn't work generally from the point of view of the |
Too me it looks as if it something that has to be considered as part of the specification, at least in the current state.
If it was an unspecified implementation detail, than arbitrary compilers (at least if we speak about For this reason the current state (IMO) should be considered the part of the specification and thus it seems reasonable to either:
I understand that this is mostly out of this PR's scope and should definitely be discussed more wisely but I guess that it's worth starting this discussion here where it can be seen as an issue. |
@cl4es I was thinking of the more general case 200 < N. You were in the 200 < N < 253 case. I think it would be better to pass in a chunk size (<= 200) and an array of ptype and get an array of method handles each taking a chunk size of arguments plus result of prior call. This is more like what javac does. |
hi guys, can I have a thumbs up for this one? I will continue doing additional research for the TIA |
constants[1 + 2 * i] = names.get(i) + "="; | ||
if (i != names.size() - 1) { | ||
constants[1 + 2 * i + 1] = ", "; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering if it would be cleaner to use a full recipe.
StringBuilder recipeSB = new StringBuilder();
recipeSB.append(receiverClass.getSimpleName());
recipeSB.append('[');
for (int i = 0; i < names.size(); i++) {
if (i != 0) {
recipeSB.append(", ");
}
recipeSB.append(names.get(i));
recipeSB.append('=');
recipeSB.append('\1');
}
recipeSB.append(']');
String recipe = recipeSB.toString();
FWIW I did a few experiments trying to move the chunking to The threshold for when the JIT fails to optimize seem to be pushed up a bit and I get a 4x gains when concatenating 100 Strings (though it takes a good while for the microbenchmark to stabilize). It also fails to make much of a difference when looking at 200 arguments (maybe 1.4x win). The experiment I have done so far are crude and aggregates the results into a String builder with no size estimation, so it adds a but of unnecessary allocation that could be improved. I think this needs way more work to be a good enhancement and that splitting up outside is going to remain better for now. A "hidden" drawback with chunking is that you're likely going to be allocating more. I also re-realized that it'll be pretty hard (or impossible) to get around having some MH slot limit: even though we chunk it up internally, the MH we return must have a type that matches the type given to the bootstrap method, and any operations on the tree that require some temporary arguments we need to pass around will require some argument slots. The current strategy doesn't use too many temporaries, so the real limit might be around 250, but it makes sense to give some room for alternative implementations that use more temporaries if that makes things more efficient. |
MethodHandle[] toSplit = getters; | ||
int namesIndex = 0; | ||
do { | ||
/* StringConcatFactory::makeConcatWithConstants can only deal with 200 spots, longs and double occupy two |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typos: spot -> slot, chunck -> chunk, diference -> difference
} | ||
|
||
/** | ||
* Chops the getters into smaller chunks according to the maximum number of spots |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: spots -> slots
* Chops the getters into smaller chunks according to the maximum number of spots | ||
* StringConcatFactory::makeConcatWithConstants can chew | ||
* @param getters the current getters | ||
* @return chunks that wont surpass the maximum number of spots StringConcatFactory::makeConcatWithConstants can chew |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
new Object[0] | ||
).getTarget(); | ||
mhs[splitIndex] = MethodHandles.filterArguments(mhs[splitIndex], 0, currentSplitGetters); | ||
mhs[splitIndex] = MethodHandles.permuteArguments( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is some gnarly logic. Could we add some comments on what permuteArguments with a reorder array of just zeros is doing here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not unusual. It spreads a single argument across several "getters". But a comment wouldn't hurt.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't seen it used like this before. The duplication behavior is mentioned explicitly with an example in the javadoc, sure, but it still took me a deep reading of the docs to understand what was going on here and what the intent was.
@JimLaskey I'm generating several recipes one per each invocation of StringConcatFactory::makeConcatWithConstants I think that generating only one recipe is possible only if the number of |
Thanks for sharing this and for doing the experiment! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding the comment and fixing typos.
this ship is sailing! thanks everyone for the feedback and interest! |
/integrate |
@vicente-romero-oracle This pull request has not yet been marked as ready for integration. |
sure, thanks to you |
@vicente-romero-oracle This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 120 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
/integrate |
Going to push as commit 7b67a49.
Your commit was automatically rebased without conflicts. |
@vicente-romero-oracle Pushed as commit 7b67a49. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
Please review this PR which aims to optimize the implementation of the
toString
method we provide for records. A benchmark comparing the implementation we are providing for records with lombok found out that lombok is much faster mainly because our implementation usesString::format
. This fix is basically delegating on StringConcatFactory::makeConcatWithConstants which is faster.TIA
This is the result of the benchmark comparing records to lombok with vanilla JDK:
after this patch:
Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/6403/head:pull/6403
$ git checkout pull/6403
Update a local copy of the PR:
$ git checkout pull/6403
$ git pull https://git.openjdk.java.net/jdk pull/6403/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 6403
View PR using the GUI difftool:
$ git pr show -t 6403
Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/6403.diff