-
Notifications
You must be signed in to change notification settings - Fork 6.1k
8263087: Add a MethodHandle combinator that switches over a set of MethodHandles #3401
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
👋 Welcome back jvernee! A progress list of the required criteria for merging this PR into |
/csr |
@JornVernee has indicated that a compatibility and specification (CSR) request is needed for this pull request. |
@JornVernee The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command. |
Webrevs
|
Mailing list message from Remi Forax on core-libs-dev: ----- Mail original -----
I think the combinator should be lookupswitch which is more general than tableswitch with a special case when generating the bytecode to generate a tableswitch instead of a lookupswitch if the indexes are subsequent.
As i said in the bug when we discuss about that the filtering function,
This scheme also allows to never JIT compile a branch which is never used. |
Mailing list message from Jorn Vernee on core-libs-dev: On 09/04/2021 18:54, Remi Forax wrote:
One of the bigger downsides I see in supporting lookupswitch directly is
Right, but lookupswitch also ties us into C2's optimization strategy for Though, I'm not saying that it's not worth it to add a lookupswitch WRT picking the translation strategy based on the set of keys; I'm note
Yes, that's a good point, thanks. Thanks for the input, |
Mailing list message from John Rose on core-libs-dev: On Apr 9, 2021, at 9:55 AM, Remi Forax <forax at univ-mlv.fr> wrote:
We can get there in the simpler steps Jorn has outlined. The combinator is much simpler if the case numbers are implicit in [0,N). Then it?s natural to filter on the [0,N) input as a separately factored choice. That also scales to pattern-switch. I agree with the choice to have N call sites. It?s possible to build the one call site version on top using constant combinators but not vice versa. |
Mailing list message from forax at univ-mlv.fr on core-libs-dev: ----- Mail original -----
Hi John,
I fail to see how it can work.
An array of MethodHandles + a default method handle is simpler than an array of sorted ints + an array of MethodHandles + a default method, but not much simpler.
yes, for all the switches, pattern-switch, enum-switch but not for the string switch which requires a lookup switch.
yes, |
Jan Lahoda has made several good examples of that: https://github.com/lahodaj/jdk/blob/switch-bootstrap/src/java.base/share/classes/java/lang/runtime/SwitchBootstraps.java i.e. several filtering strategies for translating a String into a table index (which can then be fed to I ran some benchmarks: Here, 'legacy' is what C2 does with Maybe it's worth it to include such an example & benchmark in this patch as well (show off how to emulate |
I've uploaded a benchmark that simulates a lookup switch using the tableSwitch combinator as well, using a HashMap lookup as a filter: a7157eb For that particular key set (the same as from the graph above), HashMap is faster:
But, I've found it really depends on the key set. However, this is mostly to demonstrate that emulation can have competitive performance with native |
Ok, let restart from the beginning,
The first strategy is an optimization, it will get you good performance by example if you compare a switch on a hirerachy on types and the equivalent method call. But you can not use that strategy for all switch, it's more an optimization. The tests above are using the first strategy, which I think is not what we should implement by default given that it doesn't work will all cases. In the particular case of a switch on string, javac generates two switches, the front one and the back one, if we want to compare, we should implement the second strategy, so indy or the equivalent constant handle should take a String as parameter and return an int. On the test themselves, for the hash, the Map should be directly bound, it will be more efficient, the asm version doesn't not appear in the graphics and there is a missing strategy that is using a MutableCallSite to switch from the a cascade of guards using the real values (store the String value even if it goes to |
Mailing list message from John Rose on core-libs-dev: On Apr 9, 2021, at 11:15 AM, forax at univ-mlv.fr wrote:
If you have a fixed set of N cases it is always valid In the second case, C2 will inline the lookupswitch and The MH combinator for lookupswitch can use a data-driven The SwitchBootstraps class is the place to match a
Simpler by the complexity of the sorting, which is a sharp edge.
Nope; see above.
Above; reduce to perfect hash plus lookupswitch Summary: Switches only need one case label domain: [0,N). ? John |
Mailing list message from John Rose on core-libs-dev: On Apr 9, 2021, at 4:00 PM, John Rose <john.r.rose at oracle.com<mailto:john.r.rose at oracle.com>> wrote: The MH combinator for lookupswitch can use a data-driven This may be confusing on a couple of points. Second, when the input array or List is of int (or |
No, they are using the second strategy. The SwitchBootstraps patch I linked to replaces the front end As John also described, a hypothetical lookupSwitch combinator can be emulated by using a The combinator added by this PR is not meant to replace any part of the String switch translation. For pattern switch the |
Mailing list message from Remi Forax on core-libs-dev:
As you said, the classifier is either a lookup switch or a hashmap.get() or a perfect hash function like ordinal(). This does not mean we do not need the tableswitch combinator, it means we need both. Firthermore, if we do have both combinators, there is no need to a special mechanism, or am i missing something ? R?mi |
Mailing list message from Remi Forax on core-libs-dev: ----- Mail original -----
I agree this is orthogonal and we can continue that discussion without blocking this PR. About your benchmark, did you test with some strings going into "default", because it is usually in that case that you need a proper lookup switch, R?mi |
Yes, for the benchmarks I ran, the default case was just as likely as the other cases, so e.g. if there were 10 cases, there was a 1/11 chance the default case was hit. This might need tweaking to be more realistic but... Note that the cascading guard with test actually works more like a binary search, where each guard tests against a pivot point in the search, and then decides to go either to the left or the right side of the tree. So, when looking up the default value we don't necessarily need to do a search over all the cases. Only for hash collisions does it fall back to a linear search over all the values with the same hash code. This is also how C2 translates |
@JornVernee This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration! |
I've just gotten back after 3 weeks of being sick, and would like to try moving this PR forward again. Are there any remaining concerns with adding a tableSwitch combinator? Reading back some of the discussion, I think the remaining point of contention is about adding a lookupSwitch combinator as well, which I think is a good idea (as a followup) in order to expose the lookupswitch bytecode as a combinator as well, but which doesn't seem like a blocker for this patch. |
I hope you are well now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this looks great!
I have a number of nits and a request to try and remodel so that NamedFunction
isn't responsible for holding the arbitrary intrinsic data that you want to add here.
src/java.base/share/classes/java/lang/invoke/MethodHandleImpl.java
Outdated
Show resolved
Hide resolved
@@ -1068,20 +1095,28 @@ boolean contains(Name name) { | |||
private @Stable MethodHandle resolvedHandle; | |||
@Stable MethodHandle invoker; | |||
private final MethodHandleImpl.Intrinsic intrinsicName; | |||
private final Object intrinsicData; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like the wrong place to add arbitrary data related only to intrinsics. Could this be moved either into MethodHandleImpl$IntrinsicMethodHandle
or - perhaps less appealingly - rewrite the MethodHandleImpl.Intrinsic
to a regular class which can then be instantiated with extra data? That NamedFunction
holds a field of type MethodHandleImpl.Intrinsic
instead of delegating to resolvedHandle.intrinsicName()
seem like a pre-existing kludge that would be good to sort out why/if it's really necessary.
src/java.base/share/classes/java/lang/invoke/InvokerBytecodeGenerator.java
Outdated
Show resolved
Hide resolved
src/java.base/share/classes/java/lang/invoke/InvokerBytecodeGenerator.java
Outdated
Show resolved
Hide resolved
test/micro/org/openjdk/bench/java/lang/invoke/MethodHandlesTableSwitchRandom.java
Show resolved
Hide resolved
test/micro/org/openjdk/bench/java/lang/invoke/MethodHandlesTableSwitchOpaqueSingle.java
Show resolved
Hide resolved
- Reduce benchmark cases - Remove NamedFunction::intrinsicName
f26a908
to
80a706f
Compare
Thanks for the review @cl4es I've addressed your review comments. I've reduced the number of cases in the benchmark to 5, 10, and 25, which is the most interesting area to look at, and also removed the offset cases for the non-constant input benchmarks (proving the offset is constant folded only needs to be done once I think). WRT Also, I've rebased the patch onto the latest mainline, since I was having some issues compiling (something with the boot JDK being the wrong version it seems). I've squashed all my previous commits into a commit labeled 'All changes prior to review' FYI (because of the rebase, this commit shows up after your review comments on the timeline on GitHub). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working through my comments, especially cleaning up how intrinsic data is handled which now looks much more like a natural fit!
I've added a usage example to the javadoc in response to a review comment on the CSR. |
@JornVernee This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 142 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
/integrate |
@JornVernee Since your change was applied there have been 151 commits pushed to the
Your commit was automatically rebased without conflicts. Pushed as commit 3623abb. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
This patch adds a
tableSwitch
combinator that can be used to switch over a set of method handles given an index, with a fallback in case the index is out of bounds, much like thetableswitch
bytecode. Here is a description of how it works (copied from the javadoc):The combinator does not support specifying the starting index, so the switch cases always run from 0 to however many target handles are specified. A starting index can be added manually with another combination step that filters the input index by adding or subtracting a constant from it, which does not affect performance. One of the reasons for not supporting a starting index is that it allows for more lambda form sharing, but also simplifies the implementation somewhat. I guess an open question is if a convenience overload should be added for that case?
Lookup switch can also be simulated by filtering the input through an injection function that translates it into a case index, which has also proven to have the ability to have comparable performance to, or even better performance than, a bytecode-native
lookupswitch
instruction. I plan to add such an injection function to the runtime libraries in the future as well. Maybe at that point it could be evaluated if it's worth it to add a lookup switch combinator as well, but I don't see an immediate need to include it in this patch.The current bytecode intrinsification generates a call for each switch case, which guarantees full inlining of the target method handles. Alternatively we could only have 1 callsite at the end of the switch, where each case just loads the target method handle, but currently this does not allow for inlining of the handles, since they are not constant.
Maybe a future C2 optimization could look at the receiver input for invokeBasic call sites, and if the input is a phi node, clone the call for each constant input of the phi. I believe that would allow simplifying the bytecode without giving up on inlining.
Some numbers from the added benchmarks:
Testing:
Thanks,
Jorn
Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/3401/head:pull/3401
$ git checkout pull/3401
Update a local copy of the PR:
$ git checkout pull/3401
$ git pull https://git.openjdk.java.net/jdk pull/3401/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 3401
View PR using the GUI difftool:
$ git pr show -t 3401
Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/3401.diff