
8263087: Add a MethodHandle combinator that switches over a set of MethodHandles #3401


Closed
wants to merge 3 commits into from


JornVernee
Member

@JornVernee JornVernee commented Apr 8, 2021

This patch adds a tableSwitch combinator that can be used to switch over a set of method handles given an index, with a fallback in case the index is out of bounds, much like the tableswitch bytecode. Here is a description of how it works (copied from the javadoc):

 Creates a table switch method handle, which can be used to switch over a set of target
 method handles, based on a given target index, called selector.

 For a selector value of {@code n}, where {@code n} falls in the range {@code [0, N)},
 and where {@code N} is the number of target method handles, the table switch method
 handle will invoke the n-th target method handle from the list of target method handles.

 For a selector value that does not fall in the range {@code [0, N)}, the table switch
 method handle will invoke the given fallback method handle.

 All method handles passed to this method must have the same type, with the additional
 requirement that the leading parameter be of type {@code int}. The leading parameter
 represents the selector.

 Any trailing parameters present in the type will appear on the returned table switch
 method handle as well. Any arguments assigned to these parameters will be forwarded,
 together with the selector value, to the selected method handle when invoking it.
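To illustrate the semantics described in the javadoc, here is a minimal usage sketch. It assumes the signature the combinator eventually shipped with in JDK 17, `MethodHandles.tableSwitch(MethodHandle fallback, MethodHandle... targets)`; the class `TableSwitchDemo` and its helpers are illustrative names, not part of the patch.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;

// Assumes JDK 17+, where MethodHandles.tableSwitch is available.
final class TableSwitchDemo {
    // Adapts a constant into a case handle of type (int)String that ignores the selector.
    static MethodHandle constantCase(String value) {
        MethodHandle constant = MethodHandles.constant(String.class, value); // ()String
        return MethodHandles.dropArguments(constant, 0, int.class);          // (int)String
    }

    // Fallback first, then the targets for selectors 0, 1 and 2.
    static final MethodHandle SWITCH = MethodHandles.tableSwitch(
            constantCase("default"),
            constantCase("zero"),
            constantCase("one"),
            constantCase("two"));

    static String select(int selector) {
        try {
            return (String) SWITCH.invokeExact(selector);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        for (int i = -1; i <= 3; i++) {
            System.out.println(i + " -> " + select(i)); // -1 and 3 reach the fallback
        }
    }
}
```

Selectors -1 and 3 fall outside [0, 3), so they are routed to the fallback handle.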

The combinator does not support specifying a starting index, so the switch cases always cover [0, N), where N is the number of target handles. A starting index can be added manually with another combination step that filters the input index by adding or subtracting a constant, which does not affect performance. One of the reasons for not supporting a starting index is that it allows for more lambda form sharing; it also simplifies the implementation somewhat. An open question is whether a convenience overload should be added for that case.
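A sketch of the manual starting-index workaround, assuming JDK 17's `MethodHandles.tableSwitch`: a filter subtracts a constant from the selector before it reaches the table switch. `tableSwitchFrom` is a hypothetical shape for the convenience overload the paragraph asks about, not an existing API.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Assumes JDK 17+ (MethodHandles.tableSwitch).
final class OffsetTableSwitch {
    private static final MethodHandle SUBTRACT;

    static {
        try {
            SUBTRACT = MethodHandles.lookup().findStatic(OffsetTableSwitch.class, "subtract",
                    MethodType.methodType(int.class, int.class, int.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Filter that maps selectors in [start, start + N) onto [0, N).
    private static int subtract(int selector, int start) {
        return selector - start;
    }

    // Hypothetical convenience overload: the first target handles selector 'start'.
    static MethodHandle tableSwitchFrom(int start, MethodHandle fallback, MethodHandle... targets) {
        MethodHandle table = MethodHandles.tableSwitch(fallback, targets);      // (int)R
        MethodHandle shift = MethodHandles.insertArguments(SUBTRACT, 1, start); // (int)int
        return MethodHandles.filterArguments(table, 0, shift);                  // (int)R
    }

    static MethodHandle constantCase(String value) {
        return MethodHandles.dropArguments(MethodHandles.constant(String.class, value), 0, int.class);
    }

    // Cases for selectors 10, 11 and 12.
    static final MethodHandle SWITCH = tableSwitchFrom(10, constantCase("other"),
            constantCase("ten"), constantCase("eleven"), constantCase("twelve"));

    static String select(int selector) {
        try {
            return (String) SWITCH.invokeExact(selector);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

Because the filter runs first, the targets and the fallback see the shifted selector (0 for input 10), not the original value.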

A lookup switch can also be simulated by filtering the input through an injection function that translates it into a case index; this has proven to perform comparably to, or even better than, a bytecode-native lookupswitch instruction. I plan to add such an injection function to the runtime libraries in the future as well. At that point it could be evaluated whether it's worth adding a lookup switch combinator too, but I don't see an immediate need to include it in this patch.
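One way such an injection function could look, sketched under the assumption of JDK 17's `tableSwitch`: each key is mapped to its case index through a `HashMap` bound into the filter, and unknown keys map to -1 so that the fallback runs. `lookupSwitch` and `caseIndex` are hypothetical helper names; this is not the injection function the author plans to add.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.HashMap;
import java.util.Map;

// Assumes JDK 17+ (MethodHandles.tableSwitch).
final class EmulatedLookupSwitch {
    private static final MethodHandle CASE_INDEX;

    static {
        try {
            CASE_INDEX = MethodHandles.lookup().findStatic(EmulatedLookupSwitch.class, "caseIndex",
                    MethodType.methodType(int.class, Map.class, int.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Injection function: the case index for 'key', or -1 so the table switch takes its fallback.
    private static int caseIndex(Map<Integer, Integer> keyToIndex, int key) {
        Integer index = keyToIndex.get(key);
        return index == null ? -1 : index;
    }

    // keys[i] selects targets[i]; the targets receive the case index, not the original key.
    static MethodHandle lookupSwitch(int[] keys, MethodHandle fallback, MethodHandle... targets) {
        if (keys.length != targets.length) {
            throw new IllegalArgumentException("keys and targets differ in length");
        }
        Map<Integer, Integer> keyToIndex = new HashMap<>();
        for (int i = 0; i < keys.length; i++) {
            if (keyToIndex.put(keys[i], i) != null) {
                throw new IllegalArgumentException("duplicate key " + keys[i]);
            }
        }
        // Bind an immutable copy of the key set into the filter, giving (int)int.
        MethodHandle filter = MethodHandles.insertArguments(CASE_INDEX, 0, Map.copyOf(keyToIndex));
        return MethodHandles.filterArguments(MethodHandles.tableSwitch(fallback, targets), 0, filter);
    }

    static MethodHandle constantCase(String value) {
        return MethodHandles.dropArguments(MethodHandles.constant(String.class, value), 0, int.class);
    }

    static final MethodHandle SWITCH = lookupSwitch(new int[] {3, 17, 42}, constantCase("other"),
            constantCase("three"), constantCase("seventeen"), constantCase("forty-two"));

    static String select(int key) {
        try {
            return (String) SWITCH.invokeExact(key);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

Since the key set is bound to the filter rather than baked into the switch, the table switch part has the same shape for any key set of the same size.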

The current bytecode intrinsification generates a call for each switch case, which guarantees full inlining of the target method handles. Alternatively, we could have only one call site at the end of the switch, where each case just loads the target method handle, but currently this does not allow the handles to be inlined, since they are not constant.

Maybe a future C2 optimization could look at the receiver input for invokeBasic call sites, and if the input is a phi node, clone the call for each constant input of the phi. I believe that would allow simplifying the bytecode without giving up on inlining.

Some numbers from the added benchmarks:

Benchmark                                        (numCases)  (offset)  (sorted)  Mode  Cnt   Score   Error  Units
MethodHandlesTableSwitchConstant.testSwitch               5         0       N/A  avgt   30   4.186 ± 0.054  ms/op
MethodHandlesTableSwitchConstant.testSwitch               5       150       N/A  avgt   30   4.164 ± 0.057  ms/op
MethodHandlesTableSwitchConstant.testSwitch              10         0       N/A  avgt   30   4.124 ± 0.023  ms/op
MethodHandlesTableSwitchConstant.testSwitch              10       150       N/A  avgt   30   4.126 ± 0.025  ms/op
MethodHandlesTableSwitchConstant.testSwitch              25         0       N/A  avgt   30   4.137 ± 0.042  ms/op
MethodHandlesTableSwitchConstant.testSwitch              25       150       N/A  avgt   30   4.113 ± 0.016  ms/op
MethodHandlesTableSwitchConstant.testSwitch              50         0       N/A  avgt   30   4.118 ± 0.028  ms/op
MethodHandlesTableSwitchConstant.testSwitch              50       150       N/A  avgt   30   4.127 ± 0.019  ms/op
MethodHandlesTableSwitchConstant.testSwitch             100         0       N/A  avgt   30   4.116 ± 0.013  ms/op
MethodHandlesTableSwitchConstant.testSwitch             100       150       N/A  avgt   30   4.121 ± 0.020  ms/op
MethodHandlesTableSwitchOpaqueSingle.testSwitch           5         0       N/A  avgt   30   4.113 ± 0.009  ms/op
MethodHandlesTableSwitchOpaqueSingle.testSwitch           5       150       N/A  avgt   30   4.149 ± 0.041  ms/op
MethodHandlesTableSwitchOpaqueSingle.testSwitch          10         0       N/A  avgt   30   4.121 ± 0.026  ms/op
MethodHandlesTableSwitchOpaqueSingle.testSwitch          10       150       N/A  avgt   30   4.113 ± 0.021  ms/op
MethodHandlesTableSwitchOpaqueSingle.testSwitch          25         0       N/A  avgt   30   4.129 ± 0.028  ms/op
MethodHandlesTableSwitchOpaqueSingle.testSwitch          25       150       N/A  avgt   30   4.105 ± 0.019  ms/op
MethodHandlesTableSwitchOpaqueSingle.testSwitch          50         0       N/A  avgt   30   4.097 ± 0.021  ms/op
MethodHandlesTableSwitchOpaqueSingle.testSwitch          50       150       N/A  avgt   30   4.131 ± 0.037  ms/op
MethodHandlesTableSwitchOpaqueSingle.testSwitch         100         0       N/A  avgt   30   4.135 ± 0.025  ms/op
MethodHandlesTableSwitchOpaqueSingle.testSwitch         100       150       N/A  avgt   30   4.139 ± 0.145  ms/op
MethodHandlesTableSwitchRandom.testSwitch                 5         0      true  avgt   30   4.894 ± 0.028  ms/op
MethodHandlesTableSwitchRandom.testSwitch                 5         0     false  avgt   30  11.526 ± 0.194  ms/op
MethodHandlesTableSwitchRandom.testSwitch                 5       150      true  avgt   30   4.882 ± 0.025  ms/op
MethodHandlesTableSwitchRandom.testSwitch                 5       150     false  avgt   30  11.532 ± 0.034  ms/op
MethodHandlesTableSwitchRandom.testSwitch                10         0      true  avgt   30   5.065 ± 0.076  ms/op
MethodHandlesTableSwitchRandom.testSwitch                10         0     false  avgt   30  13.016 ± 0.020  ms/op
MethodHandlesTableSwitchRandom.testSwitch                10       150      true  avgt   30   5.103 ± 0.051  ms/op
MethodHandlesTableSwitchRandom.testSwitch                10       150     false  avgt   30  12.984 ± 0.102  ms/op
MethodHandlesTableSwitchRandom.testSwitch                25         0      true  avgt   30   8.441 ± 0.165  ms/op
MethodHandlesTableSwitchRandom.testSwitch                25         0     false  avgt   30  13.371 ± 0.060  ms/op
MethodHandlesTableSwitchRandom.testSwitch                25       150      true  avgt   30   8.628 ± 0.032  ms/op
MethodHandlesTableSwitchRandom.testSwitch                25       150     false  avgt   30  13.542 ± 0.020  ms/op
MethodHandlesTableSwitchRandom.testSwitch                50         0      true  avgt   30   4.701 ± 0.015  ms/op
MethodHandlesTableSwitchRandom.testSwitch                50         0     false  avgt   30  13.562 ± 0.063  ms/op
MethodHandlesTableSwitchRandom.testSwitch                50       150      true  avgt   30   7.991 ± 3.111  ms/op
MethodHandlesTableSwitchRandom.testSwitch                50       150     false  avgt   30  13.543 ± 0.088  ms/op
MethodHandlesTableSwitchRandom.testSwitch               100         0      true  avgt   30   4.712 ± 0.020  ms/op
MethodHandlesTableSwitchRandom.testSwitch               100         0     false  avgt   30  13.600 ± 0.085  ms/op
MethodHandlesTableSwitchRandom.testSwitch               100       150      true  avgt   30   4.676 ± 0.011  ms/op
MethodHandlesTableSwitchRandom.testSwitch               100       150     false  avgt   30  13.476 ± 0.043  ms/op

Testing:

  • Running the included benchmarks
  • Inspecting inlining trace and verifying method handle targets are inlined
  • Running TestTableSwitch test (currently the only user of the new code)
  • Running java/lang/invoke tests (just in case)
  • Some manual testing

Thanks,
Jorn


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8263087: Add a MethodHandle combinator that switches over a set of MethodHandles

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/3401/head:pull/3401
$ git checkout pull/3401

Update a local copy of the PR:
$ git checkout pull/3401
$ git pull https://git.openjdk.java.net/jdk pull/3401/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 3401

View PR using the GUI difftool:
$ git pr show -t 3401

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/3401.diff

@bridgekeeper

bridgekeeper bot commented Apr 8, 2021

👋 Welcome back jvernee! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@JornVernee
Member Author

/csr

@openjdk openjdk bot added the csr Pull request needs approved CSR before integration label Apr 8, 2021
@openjdk

openjdk bot commented Apr 8, 2021

@JornVernee has indicated that a compatibility and specification (CSR) request is needed for this pull request.
@JornVernee please create a CSR request and add link to it in JDK-8263087. This pull request cannot be integrated until the CSR request is approved.

@openjdk

openjdk bot commented Apr 8, 2021

@JornVernee The following label will be automatically applied to this pull request:

  • core-libs

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the core-libs core-libs-dev@openjdk.org label Apr 8, 2021
@JornVernee JornVernee marked this pull request as ready for review April 9, 2021 10:46
@openjdk openjdk bot added the rfr Pull request is ready for review label Apr 9, 2021
@mlbridge

mlbridge bot commented Apr 9, 2021

Webrevs

@mlbridge

mlbridge bot commented Apr 9, 2021

Mailing list message from Remi Forax on core-libs-dev:

----- Original Message -----

From: "Jorn Vernee" <jvernee at openjdk.java.net>
To: "core-libs-dev" <core-libs-dev at openjdk.java.net>
Sent: Friday, April 9, 2021 12:51:53
Subject: RFR: 8263087: Add a MethodHandle combinator that switches over a set of MethodHandles

This patch adds a `tableSwitch` combinator that can be used to switch over a set
of method handles given an index, with a fallback in case the index is out of
bounds, much like the `tableswitch` bytecode.

The combinator does not support specifying the starting index, so the switch
cases always run from 0 to however many target handles are specified. A
starting index can be added manually with another combination step that filters
the input index by adding or subtracting a constant from it, which does not
affect performance. One of the reasons for not supporting a starting index is
that it allows for more lambda form sharing, but also simplifies the
implementation somewhat. I guess an open question is if a convenience overload
should be added for that case?

I think the combinator should be lookupswitch, which is more general than tableswitch, with a special case when generating the bytecode to emit a tableswitch instead of a lookupswitch if the indexes are consecutive.

Lookup switch can also be simulated by filtering the input through an injection
function that translates it into a case index, which has also proven to have
the ability to have comparable performance to, or even better performance than,
a bytecode-native `lookupswitch` instruction. I plan to add such an injection
function to the runtime libraries in the future as well. Maybe at that point it
could be evaluated if it's worth it to add a lookup switch combinator as well,
but I don't see an immediate need to include it in this patch.

As I said in the bug when we discussed the filtering function,
I believe that the filtering function for emulating lookupswitch is lookupswitch itself.

The current bytecode intrinsification generates a call for each switch case,
which guarantees full inlining of the target method handles. Alternatively we
could only have 1 callsite at the end of the switch, where each case just loads
the target method handle, but currently this does not allow for inlining of the
handles, since they are not constant.

This scheme also makes it possible to never JIT-compile a branch that is never used.

@mlbridge

mlbridge bot commented Apr 9, 2021

Mailing list message from Jorn Vernee on core-libs-dev:

On 09/04/2021 18:54, Remi Forax wrote:

----- Original Message -----

From: "Jorn Vernee" <jvernee at openjdk.java.net>
To: "core-libs-dev" <core-libs-dev at openjdk.java.net>
Sent: Friday, April 9, 2021 12:51:53
Subject: RFR: 8263087: Add a MethodHandle combinator that switches over a set of MethodHandles
This patch adds a `tableSwitch` combinator that can be used to switch over a set
of method handles given an index, with a fallback in case the index is out of
bounds, much like the `tableswitch` bytecode.

The combinator does not support specifying the starting index, so the switch
cases always run from 0 to however many target handles are specified. A
starting index can be added manually with another combination step that filters
the input index by adding or subtracting a constant from it, which does not
affect performance. One of the reasons for not supporting a starting index is
that it allows for more lambda form sharing, but also simplifies the
implementation somewhat. I guess an open question is if a convenience overload
should be added for that case?
I think the combinator should be lookupswitch, which is more general than tableswitch, with a special case when generating the bytecode to emit a tableswitch instead of a lookupswitch if the indexes are consecutive.

One of the bigger downsides I see in supporting lookupswitch directly is
that the lambda form and intrinsified bytecode become dependent on the
key set, which allows for less sharing. Something that is not/less of a
problem with tableswitch + filter function, because the filter function
could potentially be the same for any key set (where the key set is
bound to the filter function instead).

Lookup switch can also be simulated by filtering the input through an injection
function that translates it into a case index, which has also proven to have
the ability to have comparable performance to, or even better performance than,
a bytecode-native `lookupswitch` instruction. I plan to add such an injection
function to the runtime libraries in the future as well. Maybe at that point it
could be evaluated if it's worth it to add a lookup switch combinator as well,
but I don't see an immediate need to include it in this patch.

As I said in the bug when we discussed the filtering function,
I believe that the filtering function for emulating lookupswitch is lookupswitch itself.

Right, but lookupswitch also ties us into C2's optimization strategy for
lookupswitch. Having the ability to specify the filter function allows
picking a better one for the particular use-case. For instance for
switches with a large-ish number of cases (15+) it's faster to use a
HashMap lookup as a filtering function (according to my benchmarking),
with comparable results to native lookupswitch if the filter function
uses a tree of if/else.

That said, I'm not saying that it's not worth adding a lookupswitch
combinator as well; to me it seems like tableswitch is the more
flexible/minimal primitive, because it doesn't force the use of a
particular lookup strategy.

WRT picking the translation strategy based on the set of keys, I'm not
super keen on that. Since the MethodHandle combinators are a low-level
API, I ended up adopting a simple 'what you see is what you get'
philosophy as much as possible, with the possibility of building other
use-cases on top. i.e. a tableSwitch combinator that reliably translates
into the tableswitch bytecode, a lookupSwitch combinator that reliably
translates into the lookupswitch bytecode, and an exception if I get the
key set wrong, rather than silently switching strategies to one or the
other.

The current bytecode intrinsification generates a call for each switch case,
which guarantees full inlining of the target method handles. Alternatively we
could only have 1 callsite at the end of the switch, where each case just loads
the target method handle, but currently this does not allow for inlining of the
handles, since they are not constant.
This scheme also makes it possible to never JIT-compile a branch that is never used.

Yes, that's a good point, thanks.

Thanks for the input,
Jorn

@mlbridge

mlbridge bot commented Apr 9, 2021

Mailing list message from John Rose on core-libs-dev:

On Apr 9, 2021, at 9:55 AM, Remi Forax <forax at univ-mlv.fr> wrote:

I think the combinator should be lookupswitch, which is more general than tableswitch, with a special case when generating the bytecode to emit a tableswitch instead of a lookupswitch if the indexes are consecutive.

We can get there in the simpler steps Jorn has outlined.

The combinator is much simpler if the case numbers are implicit in [0,N). Then it's natural to filter on the [0,N) input as a separately factored choice. That also scales to pattern-switch.

I agree with the choice to have N call sites. It's possible to build the one call site version on top using constant combinators but not vice versa.
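To make the point about building the one-call-site version on top concrete, here is a sketch (assuming JDK 17's `tableSwitch`) that switches over constant handles to pick a target and then calls it through a single `MethodHandles.exactInvoker`. `singleCallSite` and `choose` are hypothetical names. As the PR text notes for this layout, the picked handle is not a constant at the shared call site, so this variant may give up inlining.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Assumes JDK 17+ (MethodHandles.tableSwitch).
final class SingleCallSiteSwitch {
    // Turns 'handle' into a chooser of type (int, ...)MethodHandle that ignores its arguments.
    private static MethodHandle choose(MethodHandle handle, MethodType type) {
        if (!handle.type().equals(type)) {
            throw new IllegalArgumentException("type mismatch: " + handle.type());
        }
        MethodHandle constant = MethodHandles.constant(MethodHandle.class, handle); // ()MethodHandle
        return MethodHandles.dropArguments(constant, 0, type.parameterList());
    }

    // The table switch only picks a handle; one shared exact invoker then calls it.
    static MethodHandle singleCallSite(MethodHandle fallback, MethodHandle... targets) {
        MethodType type = fallback.type(); // (int, ...)R, shared by all handles
        MethodHandle[] choosers = new MethodHandle[targets.length];
        for (int i = 0; i < targets.length; i++) {
            choosers[i] = choose(targets[i], type);
        }
        MethodHandle chooser = MethodHandles.tableSwitch(choose(fallback, type), choosers);
        MethodHandle invoker = MethodHandles.exactInvoker(type); // (MethodHandle, int, ...)R
        return MethodHandles.foldArguments(invoker, chooser);    // (int, ...)R
    }

    static MethodHandle constantCase(String value) {
        return MethodHandles.dropArguments(MethodHandles.constant(String.class, value), 0, int.class);
    }

    static final MethodHandle SWITCH = singleCallSite(constantCase("default"),
            constantCase("zero"), constantCase("one"), constantCase("two"));

    static String select(int selector) {
        try {
            return (String) SWITCH.invokeExact(selector);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

The reverse direction does not work the same way: once the targets sit behind one shared call site, there is no combinator that splits it back into N call sites with constant receivers.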

@mlbridge

mlbridge bot commented Apr 9, 2021

Mailing list message from forax at univ-mlv.fr on core-libs-dev:

----- Original Message -----

From: "John Rose" <john.r.rose at oracle.com>
To: "Remi Forax" <forax at univ-mlv.fr>
Cc: "Jorn Vernee" <jvernee at openjdk.java.net>, "core-libs-dev" <core-libs-dev at openjdk.java.net>
Sent: Friday, April 9, 2021 20:01:18
Subject: Re: RFR: 8263087: Add a MethodHandle combinator that switches over a set of MethodHandles

Hi John,

On Apr 9, 2021, at 9:55 AM, Remi Forax <forax at univ-mlv.fr> wrote:

I think the combinator should be lookupswitch, which is more general than
tableswitch, with a special case when generating the bytecode to emit a
tableswitch instead of a lookupswitch if the indexes are consecutive.

We can get there in the simpler steps Jorn has outlined.

I fail to see how it can work.

The combinator is much simpler if the case numbers are implicit in [0,N). Then
it's natural to filter on the [0,N) input as a separately factored choice.

An array of MethodHandles + a default method handle is simpler than an array of sorted ints + an array of MethodHandles + a default method, but not much simpler.

That also scales to pattern-switch.

Yes, for all the switches (pattern switch, enum switch), but not for the string switch, which requires a lookup switch.
Can you outline how to use the tableswitch combinator in the case of a switch on strings?

I agree with the choice to have N call sites. It's possible to build the one
call site version on top using constant combinators but not vice versa.

Yes,
Rémi

@JornVernee
Member Author

JornVernee commented Apr 9, 2021

Yes, for all the switches (pattern switch, enum switch), but not for the string switch, which requires a lookup switch.
Can you outline how to use the tableswitch combinator in the case of a switch on strings?

Jan Lahoda has made several good examples of that: https://github.com/lahodaj/jdk/blob/switch-bootstrap/src/java.base/share/classes/java/lang/runtime/SwitchBootstraps.java i.e. several filtering strategies for translating a String into a table index (which can then be fed to tableswitch)

I ran some benchmarks:

[Figure: benchmark results graph, not reproduced]

Here, 'legacy' is what C2 does with lookupswitch.

Maybe it's worth including such an example and benchmark in this patch as well (to show off how to emulate lookupswitch).

@JornVernee
Member Author

JornVernee commented Apr 9, 2021

I've uploaded a benchmark that simulates a lookup switch using the tableSwitch combinator as well, using a HashMap lookup as a filter: a7157eb

For that particular key set (the same as from the graph above), HashMap is faster:

Benchmark                                              Mode  Cnt   Score   Error  Units
MethodHandlesEmulateLookupSwitch.emulatedLookupSwitch  avgt   30  19.450 ± 0.079  ms/op
MethodHandlesEmulateLookupSwitch.nativeLookupSwitch    avgt   30  25.370 ± 0.159  ms/op

But I've found that it really depends on the key set. This is mostly to demonstrate that emulation can be competitive with native lookupswitch. There are other trade-offs; for example, to get constant folding for constant inputs, a different filter has to be used, since C2 cannot see through the HashMap lookup.
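As one possible alternative to the HashMap filter, a classifier can be assembled purely from `guardWithTest`, as a balanced if/else tree over sorted keys (the kind of if/else tree filter mentioned earlier in the thread). This is a sketch with hypothetical names; whether C2 actually folds it for a given constant input is not verified here. It returns -1 for unknown keys, which a table switch would route to its fallback, and it uses only combinators that predate `tableSwitch`.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class IfElseTreeClassifier {
    // (int)int handle that always answers -1, meaning "no case".
    private static final MethodHandle NOT_FOUND =
            MethodHandles.dropArguments(MethodHandles.constant(int.class, -1), 0, int.class);
    private static final MethodHandle EQ;
    private static final MethodHandle LT;

    static {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodType test = MethodType.methodType(boolean.class, int.class, int.class);
            EQ = lookup.findStatic(IfElseTreeClassifier.class, "eq", test);
            LT = lookup.findStatic(IfElseTreeClassifier.class, "lt", test);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private static boolean eq(int a, int b) { return a == b; }
    private static boolean lt(int a, int b) { return a < b; }

    // Builds an (int)int classifier mapping sortedKeys[i] to i, and any other key to -1.
    static MethodHandle classifier(int[] sortedKeys) {
        for (int i = 1; i < sortedKeys.length; i++) {
            if (sortedKeys[i - 1] >= sortedKeys[i]) {
                throw new IllegalArgumentException("keys must be strictly ascending");
            }
        }
        return tree(sortedKeys.clone(), 0, sortedKeys.length - 1);
    }

    // Recursively splits [lo, hi] around its middle key.
    private static MethodHandle tree(int[] keys, int lo, int hi) {
        if (lo > hi) {
            return NOT_FOUND;
        }
        int mid = (lo + hi) >>> 1;
        MethodHandle isMid = MethodHandles.insertArguments(EQ, 1, keys[mid]);  // key == keys[mid]
        MethodHandle isLess = MethodHandles.insertArguments(LT, 1, keys[mid]); // key <  keys[mid]
        MethodHandle hit = MethodHandles.dropArguments(MethodHandles.constant(int.class, mid), 0, int.class);
        return MethodHandles.guardWithTest(isMid, hit,
                MethodHandles.guardWithTest(isLess, tree(keys, lo, mid - 1), tree(keys, mid + 1, hi)));
    }

    static final MethodHandle CLASSIFIER = classifier(new int[] {3, 17, 42});

    static int indexOf(int key) {
        try {
            return (int) CLASSIFIER.invokeExact(key);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```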

@forax
Member

forax commented Apr 9, 2021

OK, let's restart from the beginning.
You have two strategies to desugar a switch:

  • if the code after each case does not mutate any variables, you can desugar each case to a method, more or less like a lambda (it's not exactly like a lambda because there is no capture), and put an indy in front that calls the right method handle
  • you have a front end, with an indy that associates the object with an int, and a back end, which is a tableswitch in the bytecode

The first strategy is an optimization: for example, it gets you good performance if you compare a switch on a hierarchy of types with the equivalent method call. But you cannot use that strategy for all switches.
The second strategy lets you encode any switch.

The tests above use the first strategy, which I think is not what we should implement by default, given that it doesn't work in all cases. In the particular case of a switch on strings, javac generates two switches, a front one and a back one. If we want to compare, we should implement the second strategy, so the indy (or the equivalent constant handle) should take a String as a parameter and return an int.

On the tests themselves: for the hash, the Map should be bound directly, which will be more efficient; the ASM version does not appear in the graphs; and there is a missing strategy that uses a MutableCallSite to switch from a cascade of guards using the real values (storing the String value even if it goes to default) to a lookup switch, which I've found is the optimal strategy in real code (it works a lot like a bimorphic cache, but on string values instead of on classes).
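The MutableCallSite strategy mentioned last can be sketched roughly as follows. The call site starts with a slow path that, on each miss, prepends a `guardWithTest` for the exact string seen (even when it maps to the default case), and after a hypothetical threshold of two guards it relinks to a full lookup. A `HashMap` lookup stands in for the lookupswitch here, and all names are illustrative; this is not Rémi's implementation.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Call site of type (String)int mapping a string to its case index (-1 = default).
final class StringSwitchCallSite extends MutableCallSite {
    private static final int MAX_GUARDS = 2; // hypothetical threshold before relinking
    private static final MethodHandle SLOW_PATH;
    private static final MethodHandle MAP_LOOKUP;
    private static final MethodHandle EQUALS;

    static {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            SLOW_PATH = lookup.findVirtual(StringSwitchCallSite.class, "slowPath",
                    MethodType.methodType(int.class, String.class));
            MAP_LOOKUP = lookup.findStatic(StringSwitchCallSite.class, "mapLookup",
                    MethodType.methodType(int.class, Map.class, String.class));
            EQUALS = lookup.findStatic(Objects.class, "equals",
                    MethodType.methodType(boolean.class, Object.class, Object.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private final Map<String, Integer> index = new HashMap<>();
    private final MethodHandle invoker;
    private int guards;

    StringSwitchCallSite(List<String> keys) {
        super(MethodType.methodType(int.class, String.class));
        for (int i = 0; i < keys.size(); i++) {
            index.put(keys.get(i), i);
        }
        setTarget(SLOW_PATH.bindTo(this));
        invoker = dynamicInvoker();
    }

    static int mapLookup(Map<String, Integer> index, String value) {
        return index.getOrDefault(value, -1);
    }

    // Cache miss: compute the answer, then either prepend a guard for this exact value
    // or, past the threshold, relink to the full lookup. Other threads may briefly run
    // an older target, which still computes the same answers.
    synchronized int slowPath(String value) {
        int result = mapLookup(index, value);
        if (guards < MAX_GUARDS) {
            guards++;
            MethodHandle test = MethodHandles.insertArguments(EQUALS, 0, value)
                    .asType(MethodType.methodType(boolean.class, String.class));
            MethodHandle hit = MethodHandles.dropArguments(
                    MethodHandles.constant(int.class, result), 0, String.class);
            setTarget(MethodHandles.guardWithTest(test, hit, getTarget()));
        } else {
            setTarget(MethodHandles.insertArguments(MAP_LOOKUP, 0, index));
        }
        return result;
    }

    int indexOf(String value) {
        try {
            return (int) invoker.invokeExact(value);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```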

@mlbridge

mlbridge bot commented Apr 9, 2021

Mailing list message from John Rose on core-libs-dev:

On Apr 9, 2021, at 11:15 AM, forax at univ-mlv.fr wrote:

----- Original Message -----

From: "John Rose" <john.r.rose at oracle.com>
To: "Remi Forax" <forax at univ-mlv.fr>
Cc: "Jorn Vernee" <jvernee at openjdk.java.net>, "core-libs-dev" <core-libs-dev at openjdk.java.net>
Sent: Friday, April 9, 2021 20:01:18
Subject: Re: RFR: 8263087: Add a MethodHandle combinator that switches over a set of MethodHandles

Hi John,

On Apr 9, 2021, at 9:55 AM, Remi Forax <forax at univ-mlv.fr> wrote:

I think the combinator should be lookupswitch, which is more general than
tableswitch, with a special case when generating the bytecode to emit a
tableswitch instead of a lookupswitch if the indexes are consecutive.

We can get there in the simpler steps Jorn has outlined.

I fail to see how it can work.

If you have a fixed set of N cases it is always valid
to number them compactly in [0,N). If there is
another association of keys in some set K to cases,
then you simply build a mapping K → [0,N). That
works for enums, lookupswitch, strings, and patterns.
The mapping functions would be:
- Enum::ordinal
- a lookupswitch of cases `case i: return n`, n in [0,N]
- some perfect hash, composed with lookupswitch
- some decision tree that outputs compact case numbers

In the second case, C2 will inline the lookupswitch and
tableswitch together and do branch-to-branch tensioning.
The result will be the same as if the intermediate tableswitch
had not been used.
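The first mapping in the list, `Enum::ordinal`, composes directly with the combinator through `filterArguments`. A minimal sketch, assuming JDK 17's `tableSwitch`; the `Color` enum and the other names are illustrative.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Assumes JDK 17+ (MethodHandles.tableSwitch).
final class EnumTableSwitch {
    enum Color { RED, GREEN, BLUE }

    static MethodHandle constantCase(String value) {
        return MethodHandles.dropArguments(MethodHandles.constant(String.class, value), 0, int.class);
    }

    static final MethodHandle SWITCH;

    static {
        try {
            // Enum::ordinal is the K -> [0,N) mapping, with K = Color: (Color)int.
            MethodHandle ordinal = MethodHandles.lookup()
                    .findVirtual(Color.class, "ordinal", MethodType.methodType(int.class));
            // Cases in declaration order; the fallback is only reachable if the enum
            // gains constants that this table does not know about.
            MethodHandle table = MethodHandles.tableSwitch(constantCase("unknown"),
                    constantCase("red"), constantCase("green"), constantCase("blue"));
            SWITCH = MethodHandles.filterArguments(table, 0, ordinal); // (Color)String
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static String name(Color color) {
        try {
            return (String) SWITCH.invokeExact(color);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

Passing `null` fails inside `ordinal` with a `NullPointerException` (wrapped by the helper here), much like a Java `switch` on a null enum value.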

The MH combinator for lookupswitch can use a data-driven
reverse lookup in a (frozen/stable) int[] array, using binary
search. The bytecode emitter can render such a thing as
an internal lookupswitch, if that seems desirable. But
the stable array with binary search scales to other types
besides int, so it's the right primitive.
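A sketch of the data-driven reverse lookup for `int` keys, assuming JDK 17's `tableSwitch`. `Arrays.binarySearch` over a private, sorted copy of the keys serves as the classifier, and its negative "not found" results fall outside [0, N), so they reach the fallback. The class and helper names are hypothetical. Sorting inside the factory means case i corresponds to the i-th smallest key, which is an ordering contract the caller has to get right.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.Arrays;

// Assumes JDK 17+ (MethodHandles.tableSwitch).
final class BinarySearchSwitch {
    private static final MethodHandle BINARY_SEARCH;

    static {
        try {
            BINARY_SEARCH = MethodHandles.publicLookup().findStatic(Arrays.class, "binarySearch",
                    MethodType.methodType(int.class, int[].class, int.class)); // (int[], int)int
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Classifier (int)int: the rank of a key among the sorted keys, or a negative value.
    static MethodHandle classifier(int[] keys) {
        int[] frozen = keys.clone(); // private copy, so callers cannot mutate the key set later
        Arrays.sort(frozen);
        for (int i = 1; i < frozen.length; i++) {
            if (frozen[i - 1] == frozen[i]) {
                throw new IllegalArgumentException("duplicate key " + frozen[i]);
            }
        }
        return MethodHandles.insertArguments(BINARY_SEARCH, 0, (Object) frozen);
    }

    static MethodHandle constantCase(String value) {
        return MethodHandles.dropArguments(MethodHandles.constant(String.class, value), 0, int.class);
    }

    // Case i handles the i-th smallest key: 3, 17, 42.
    static final MethodHandle SWITCH = MethodHandles.filterArguments(
            MethodHandles.tableSwitch(constantCase("other"),
                    constantCase("three"), constantCase("seventeen"), constantCase("forty-two")),
            0, classifier(new int[] {42, 3, 17}));

    static String select(int key) {
        try {
            return (String) SWITCH.invokeExact(key);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```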

The SwitchBootstraps class is the place to match a
static decision tree or decision chain (of N cases) of
an arbitrary shape to compact case labels in [0,N).
Then they all feed to Jorn's primitive.

The combinator is much simpler if the case numbers are implicit in [0,N). Then
it's natural to filter on the [0,N) input as a separately factored choice.

An array of MethodHandles + a default method handle is simpler than an array of sorted ints + an array of MethodHandles + a default method, but not much simpler.

Simpler by the complexity of the sorting, which is a sharp edge.
The type "sorted int array without duplicates and unaliased
or frozen" is pretty involved. Easy to make mistakes with it.

That also scales to pattern-switch.

Yes, for all the switches (pattern switch, enum switch), but not for the string switch, which requires a lookup switch.

Nope; see above.

Can you outline how to use the tableswitch combinator in the case of a switch on strings?

Above; reduce to perfect hash plus lookupswitch
producing compact int values to feed to a tableswitch.

Summary: Switches only need one case label domain: [0,N).
Everything else is case label mappings.

John

@mlbridge

mlbridge bot commented Apr 9, 2021

Mailing list message from John Rose on core-libs-dev:

On Apr 9, 2021, at 4:00 PM, John Rose <john.r.rose at oracle.com> wrote:

The MH combinator for lookupswitch can use a data-driven
reverse lookup in a (frozen/stable) int[] array, using binary
search. The bytecode emitter can render such a thing as
an internal lookupswitch, if that seems desirable. But
the stable array with binary search scales to other types
besides int, so it's the right primitive.

This may be confusing on a couple of points.
First, the mapping function I'm talking about is not
a MH combinator, but rather a MH factory, which takes
a non-MH argument, probably an unsorted array or List
of items of any type. It then DTRT (internal hash map
or tree map or flat binary search or flat table lookup or
lookupswitch or any combination thereof) to get
an algorithm to classify the array or List elements
into a compact enumeration [0,N).

Second, when the input array or List is of int (or
Integer) then it *might* be a lookupswitch internally,
and I'm abusing the terminology to call this particular
case a lookupswitch. But it's really just a classifier
factory, whose output is a MH of type K → [0,N) for
some K. The output might also be ToIntFunction<K>
for all I care; that can be inter-converted with a MH.
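A sketch of the classifier-factory idea, assuming JDK 17's `tableSwitch`: a hypothetical `classifier` factory returns a `ToIntFunction<K>`, which is turned into a method handle through `ToIntFunction.applyAsInt` and then used as the K → [0,N) filter for a switch on strings. Where the message describes picking among hash maps, trees, binary search, or lookupswitch, this sketch always uses a `HashMap`.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToIntFunction;

// Assumes JDK 17+ (MethodHandles.tableSwitch).
final class ClassifierFactory {
    // Hypothetical factory: keys.get(i) -> i, anything else -> -1.
    static <K> ToIntFunction<K> classifier(List<K> keys) {
        Map<K, Integer> index = new HashMap<>();
        for (int i = 0; i < keys.size(); i++) {
            if (index.putIfAbsent(keys.get(i), i) != null) {
                throw new IllegalArgumentException("duplicate key " + keys.get(i));
            }
        }
        return key -> index.getOrDefault(key, -1);
    }

    // Converts a ToIntFunction into a method handle of type (keyType)int.
    static MethodHandle asHandle(ToIntFunction<?> fn, Class<?> keyType)
            throws ReflectiveOperationException {
        MethodHandle applyAsInt = MethodHandles.publicLookup().findVirtual(ToIntFunction.class,
                "applyAsInt", MethodType.methodType(int.class, Object.class)); // (ToIntFunction, Object)int
        return applyAsInt.bindTo(fn).asType(MethodType.methodType(int.class, keyType));
    }

    static MethodHandle constantCase(String value) {
        return MethodHandles.dropArguments(MethodHandles.constant(String.class, value), 0, int.class);
    }

    static final MethodHandle SWITCH;

    static {
        try {
            MethodHandle table = MethodHandles.tableSwitch(constantCase("unknown"),
                    constantCase("red"), constantCase("yellow"), constantCase("dark red"));
            MethodHandle classify =
                    asHandle(classifier(List.of("apple", "banana", "cherry")), String.class);
            SWITCH = MethodHandles.filterArguments(table, 0, classify); // (String)String
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static String colorOf(String fruit) {
        try {
            return (String) SWITCH.invokeExact(fruit);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```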

@JornVernee
Member Author

You have two strategies to desugar a switch:

  • if the code after each case does not mutate any variables, you can desugar each case to a method, more or less like a lambda (it's not exactly like a lambda because there is no capture), and put an indy in front that calls the right method handle

  • you have a front end, with an indy that associates the object with an int, and a back end, which is a tableswitch in the bytecode

...

The tests above are using the first strategy

No, they are using the second strategy. The SwitchBootstraps patch I linked to replaces the front end lookupswitch of a String switch with an invokedynamic that computes an index for the back end jump table, which is still a tableswitch in the bytecode.

As John also described, a hypothetical lookupSwitch combinator can be emulated by using a k -> [0, N) projection that feeds into the tableSwitch combinator that is proposed by this PR. The point of the examples I linked to was to show several flavors of projection functions as an example of how this could be implemented, and to show that they have competitive performance with a native lookupswitch instruction (the 'legacy' case). i.e. the benchmarks show the difference between lookupswitch implemented in bytecode, and a k -> [0, N) projection function built by an invokedynamic. (sorry, I should have offered more explanation in the first place)

The combinator added by this PR is not meant to replace any part of the String switch translation. For pattern switch the tableSwitch combinator could be used to implement the front end k -> [0, N) projection, but it is not strictly required. Either way, that seems orthogonal to this PR.

@mlbridge

mlbridge bot commented Apr 13, 2021

Mailing list message from Remi Forax on core-libs-dev:

From: "John Rose" <john.r.rose at oracle.com>
To: "Remi Forax" <forax at univ-mlv.fr>
Cc: "Jorn Vernee" <jvernee at openjdk.java.net>, "core-libs-dev"
<core-libs-dev at openjdk.java.net>
Sent: Saturday, April 10, 2021 01:43:49
Subject: Re: [External] : Re: RFR: 8263087: Add a MethodHandle combinator that
switches over a set of MethodHandles

On Apr 9, 2021, at 4:00 PM, John Rose <john.r.rose at oracle.com> wrote:

The MH combinator for lookupswitch can use a data-driven
reverse lookup in a (frozen/stable) int[] array, using binary
search. The bytecode emitter can render such a thing as
an internal lookupswitch, if that seems desirable. But
the stable array with binary search scales to other types
besides int, so it's the right primitive.

This may be confusing on a couple of points.
First, the mapping function I'm talking about is not
a MH combinator, but rather a MH factory, which takes
a non-MH argument, probably an unsorted array or List
of items of any type. It then DTRT (internal hash map
or tree map or flat binary search or flat table lookup or
lookupswitch or any combination thereof) to get
an algorithm to classify the array or List elements
into a compact enumeration [0,N).

Second, when the input array or List is of int (or
Integer) then it *might* be a lookupswitch internally,
and I'm abusing the terminology to call this particular
case a lookupswitch. But it's really just a classifier
factory, whose output is a MH of type K → [0,N) for
some K. The output might also be ToIntFunction<K>
for all I care; that can be inter-converted with a MH.

As you said, the classifier is either a lookup switch, a hashmap.get(), or a perfect hash function like ordinal().
The last two can already be seen as MHs that you can compose.
The only one we cannot currently compose without generating bytecode is the lookup switch, so we should have a lookupswitch combinator.

This does not mean we do not need the tableswitch combinator; it means we need both.

Furthermore, if we do have both combinators, there is no need for a special mechanism, or am I missing something?

Rémi

@mlbridge

mlbridge bot commented Apr 13, 2021

Mailing list message from Remi Forax on core-libs-dev:

----- Original Message -----

From: "Jorn Vernee" <jvernee at openjdk.java.net>
To: "core-libs-dev" <core-libs-dev at openjdk.java.net>
Sent: Tuesday, April 13, 2021 16:59:58
Subject: Re: RFR: 8263087: Add a MethodHandle combinator that switches over a set of MethodHandles

On Thu, 8 Apr 2021 18:51:21 GMT, Jorn Vernee <jvernee at openjdk.org> wrote:

This patch adds a `tableSwitch` combinator that can be used to switch over a set
of method handles given an index, with a fallback in case the index is out of
bounds, much like the `tableswitch` bytecode. Here is a description of how it
works (copied from the javadoc):

 Creates a table switch method handle, which can be used to switch over a set of target
 method handles, based on a given target index, called selector.

 For a selector value of {@code n}, where {@code n} falls in the range {@code [0, N)},
 and where {@code N} is the number of target method handles, the table switch method
 handle will invoke the n-th target method handle from the list of target method handles.

 For a selector value that does not fall in the range {@code [0, N)}, the table switch
 method handle will invoke the given fallback method handle.

 All method handles passed to this method must have the same type, with the additional
 requirement that the leading parameter be of type {@code int}. The leading parameter
 represents the selector.

 Any trailing parameters present in the type will appear on the returned table switch
 method handle as well. Any arguments assigned to these parameters will be forwarded,
 together with the selector value, to the selected method handle when invoking it.
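A minimal usage sketch of the combinator as it eventually shipped in JDK 17, `MethodHandles.tableSwitch(MethodHandle fallback, MethodHandle... targets)`; the case methods are illustrative. All handles share the type `(int,String)String`, where the leading `int` is the selector and the `String` is a trailing argument forwarded to whichever handle is selected.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Minimal use of MethodHandles.tableSwitch (JDK 17+).
class TableSwitchDemo {
    static String caseZero(int selector, String arg) { return "zero:" + arg; }
    static String caseOne(int selector, String arg)  { return "one:" + arg; }
    static String fallback(int selector, String arg) { return "default(" + selector + "):" + arg; }

    static MethodHandle build() throws ReflectiveOperationException {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        MethodType type = MethodType.methodType(String.class, int.class, String.class);
        return MethodHandles.tableSwitch(
                lookup.findStatic(TableSwitchDemo.class, "fallback", type), // selector outside [0, 2)
                lookup.findStatic(TableSwitchDemo.class, "caseZero", type), // selector == 0
                lookup.findStatic(TableSwitchDemo.class, "caseOne", type)); // selector == 1
    }

    public static void main(String[] args) throws Throwable {
        MethodHandle sw = build();
        System.out.println((String) sw.invokeExact(0, "a"));  // zero:a
        System.out.println((String) sw.invokeExact(1, "b"));  // one:b
        System.out.println((String) sw.invokeExact(-1, "c")); // default(-1):c
    }
}
```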

The combinator does not support specifying the starting index, so the switch
cases always run from 0 to however many target handles are specified. A
starting index can be added manually with another combination step that filters
the input index by adding or subtracting a constant from it, which does not
affect performance. One of the reasons for not supporting a starting index is
that it allows for more lambda form sharing, but also simplifies the
implementation somewhat. I guess an open question is if a convenience overload
should be added for that case?
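A sketch of the manual starting-index workaround, assuming JDK 17+ and made-up case methods: cases 100 to 102 are mapped onto [0, 3) by filtering the selector through an `(int)int` handle that subtracts a bound constant. Note that the selected target then receives the adjusted selector, not the original value, and values below the base become negative and so reach the fallback.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Sketch (JDK 17+): a switch over cases 100..102 built from tableSwitch, whose
// cases always start at 0, by subtracting a constant base from the selector first.
class OffsetSwitch {
    static final int BASE = 100;

    static int subtract(int value, int base) { return value - base; }

    // Case handlers receive the adjusted selector (value - BASE).
    static String case100(int adjusted) { return "case 100"; }
    static String case101(int adjusted) { return "case 101"; }
    static String case102(int adjusted) { return "case 102"; }
    static String other(int adjusted)   { return "default"; }

    static MethodHandle build() throws ReflectiveOperationException {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        MethodType caseType = MethodType.methodType(String.class, int.class);
        MethodHandle sw = MethodHandles.tableSwitch(
                lookup.findStatic(OffsetSwitch.class, "other", caseType),
                lookup.findStatic(OffsetSwitch.class, "case100", caseType),
                lookup.findStatic(OffsetSwitch.class, "case101", caseType),
                lookup.findStatic(OffsetSwitch.class, "case102", caseType));
        // (int)int filter computing value - BASE, with BASE bound as a constant
        MethodHandle minusBase = MethodHandles.insertArguments(
                lookup.findStatic(OffsetSwitch.class, "subtract",
                        MethodType.methodType(int.class, int.class, int.class)),
                1, BASE);
        return MethodHandles.filterArguments(sw, 0, minusBase);
    }

    public static void main(String[] args) throws Throwable {
        MethodHandle sw = build();
        System.out.println((String) sw.invokeExact(101)); // case 101
        System.out.println((String) sw.invokeExact(99));  // default (99 - 100 = -1)
    }
}
```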

Lookup switch can also be simulated by filtering the input through an injection
function that translates it into a case index, which has also been shown to
perform comparably to, or even better than, a bytecode-native `lookupswitch`
instruction. I plan to add such an injection function to the runtime libraries
in the future as well. Maybe at that point it could be evaluated whether it's
worth adding a lookup switch combinator too, but I don't see an immediate need
to include it in this patch.
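As a sketch of this two-step approach (JDK 17+; `lookupSwitch` and `indexOf` are hypothetical helpers, not the injection function planned for the runtime libraries), a `HashMap`-backed `(String)int` injection function maps known keys onto [0, N) and unknown keys onto N, and `filterArguments` feeds its result into the selector of a `tableSwitch`.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch (JDK 17+): a lookup switch over String keys, built as
// "key -> [0, N) injection function" followed by tableSwitch.
class LookupSwitchSketch {
    // Hypothetical injection function: the key's case index, or n (out of range) if unknown.
    static int indexOf(Map<String, Integer> index, int n, String key) {
        return index.getOrDefault(key, n);
    }

    // Hypothetical helper: targets[i] handles keys.get(i) (keys must be distinct);
    // all handles have type (int, ...)R.
    static MethodHandle lookupSwitch(List<String> keys, MethodHandle fallback, MethodHandle... targets)
            throws ReflectiveOperationException {
        if (keys.size() != targets.length) {
            throw new IllegalArgumentException("need exactly one target per key");
        }
        Map<String, Integer> index = new HashMap<>();
        for (int i = 0; i < keys.size(); i++) {
            index.put(keys.get(i), i);
        }
        MethodHandle classifier = MethodHandles.lookup().findStatic(LookupSwitchSketch.class,
                "indexOf", MethodType.methodType(int.class, Map.class, int.class, String.class));
        classifier = MethodHandles.insertArguments(classifier, 0, index, keys.size()); // (String)int
        MethodHandle sw = MethodHandles.tableSwitch(fallback, targets);                  // (int, ...)R
        return MethodHandles.filterArguments(sw, 0, classifier);                         // (String, ...)R
    }

    static String onApple(int i)  { return "fruit"; }
    static String onCarrot(int i) { return "vegetable"; }
    static String onOther(int i)  { return "unknown"; }

    // Example instance: a (String)String switch over "apple" and "carrot".
    static MethodHandle produceSwitch() throws ReflectiveOperationException {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        MethodType t = MethodType.methodType(String.class, int.class);
        return lookupSwitch(List.of("apple", "carrot"),
                lookup.findStatic(LookupSwitchSketch.class, "onOther", t),
                lookup.findStatic(LookupSwitchSketch.class, "onApple", t),
                lookup.findStatic(LookupSwitchSketch.class, "onCarrot", t));
    }

    public static void main(String[] args) throws Throwable {
        MethodHandle mh = produceSwitch();
        System.out.println((String) mh.invokeExact("carrot")); // vegetable
        System.out.println((String) mh.invokeExact("kiwi"));   // unknown
    }
}
```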

The current bytecode intrinsification generates a call for each switch case,
which guarantees full inlining of the target method handles. Alternatively we
could only have 1 callsite at the end of the switch, where each case just loads
the target method handle, but currently this does not allow for inlining of the
handles, since they are not constant.
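The two shapes can be pictured with a Java-source analogue. This is a sketch, not the actual LambdaForm bytecode the patch emits, and the case methods are made up. In the first shape every case has its own `invokeExact` call site on a `static final` handle, so each site sees a constant receiver; in the second, each case only loads a handle and a single shared call site invokes whatever was loaded.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Java-source analogue (not the generated LambdaForm bytecode) of the two
// shapes discussed: one call site per case vs. one shared call site.
class CallSiteShapes {
    static String caseA(int selector, String s) { return "a" + s; }
    static String caseB(int selector, String s) { return "b" + s; }
    static String deflt(int selector, String s) { return "default" + s; }

    // static final fields, so the JIT can treat these handles as constants
    static final MethodHandle A, B, DEFAULT;
    static {
        try {
            MethodHandles.Lookup l = MethodHandles.lookup();
            MethodType t = MethodType.methodType(String.class, int.class, String.class);
            A = l.findStatic(CallSiteShapes.class, "caseA", t);
            B = l.findStatic(CallSiteShapes.class, "caseB", t);
            DEFAULT = l.findStatic(CallSiteShapes.class, "deflt", t);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Shape 1: one call per case; each call site has a single constant receiver,
    // which is what allows each target to be inlined.
    static String perCaseCalls(int selector, String arg) throws Throwable {
        switch (selector) {
            case 0:  return (String) A.invokeExact(selector, arg);
            case 1:  return (String) B.invokeExact(selector, arg);
            default: return (String) DEFAULT.invokeExact(selector, arg);
        }
    }

    // Shape 2: cases only pick a handle; the receiver at the single call site
    // is not a constant, so per the discussion the target is not inlined today.
    static String singleCall(int selector, String arg) throws Throwable {
        MethodHandle target;
        switch (selector) {
            case 0:  target = A; break;
            case 1:  target = B; break;
            default: target = DEFAULT; break;
        }
        return (String) target.invokeExact(selector, arg);
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(perCaseCalls(1, "x")); // bx
        System.out.println(singleCall(7, "y"));   // defaulty
    }
}
```

Both shapes compute the same result; they differ only in how much the JIT can see at each call site.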

Maybe a future C2 optimization could look at the receiver input for invokeBasic
call sites, and if the input is a phi node, clone the call for each constant
input of the phi. I believe that would allow simplifying the bytecode without
giving up on inlining.

Some numbers from the added benchmarks:

Benchmark                                        (numCases)  (offset)  (sorted)  Mode  Cnt   Score    Error  Units
MethodHandlesTableSwitchConstant.testSwitch               5         0       N/A  avgt   30   4.186 ±  0.054  ms/op
MethodHandlesTableSwitchConstant.testSwitch               5       150       N/A  avgt   30   4.164 ±  0.057  ms/op
MethodHandlesTableSwitchConstant.testSwitch              10         0       N/A  avgt   30   4.124 ±  0.023  ms/op
MethodHandlesTableSwitchConstant.testSwitch              10       150       N/A  avgt   30   4.126 ±  0.025  ms/op
MethodHandlesTableSwitchConstant.testSwitch              25         0       N/A  avgt   30   4.137 ±  0.042  ms/op
MethodHandlesTableSwitchConstant.testSwitch              25       150       N/A  avgt   30   4.113 ±  0.016  ms/op
MethodHandlesTableSwitchConstant.testSwitch              50         0       N/A  avgt   30   4.118 ±  0.028  ms/op
MethodHandlesTableSwitchConstant.testSwitch              50       150       N/A  avgt   30   4.127 ±  0.019  ms/op
MethodHandlesTableSwitchConstant.testSwitch             100         0       N/A  avgt   30   4.116 ±  0.013  ms/op
MethodHandlesTableSwitchConstant.testSwitch             100       150       N/A  avgt   30   4.121 ±  0.020  ms/op
MethodHandlesTableSwitchOpaqueSingle.testSwitch           5         0       N/A  avgt   30   4.113 ±  0.009  ms/op
MethodHandlesTableSwitchOpaqueSingle.testSwitch           5       150       N/A  avgt   30   4.149 ±  0.041  ms/op
MethodHandlesTableSwitchOpaqueSingle.testSwitch          10         0       N/A  avgt   30   4.121 ±  0.026  ms/op
MethodHandlesTableSwitchOpaqueSingle.testSwitch          10       150       N/A  avgt   30   4.113 ±  0.021  ms/op
MethodHandlesTableSwitchOpaqueSingle.testSwitch          25         0       N/A  avgt   30   4.129 ±  0.028  ms/op
MethodHandlesTableSwitchOpaqueSingle.testSwitch          25       150       N/A  avgt   30   4.105 ±  0.019  ms/op
MethodHandlesTableSwitchOpaqueSingle.testSwitch          50         0       N/A  avgt   30   4.097 ±  0.021  ms/op
MethodHandlesTableSwitchOpaqueSingle.testSwitch          50       150       N/A  avgt   30   4.131 ±  0.037  ms/op
MethodHandlesTableSwitchOpaqueSingle.testSwitch         100         0       N/A  avgt   30   4.135 ±  0.025  ms/op
MethodHandlesTableSwitchOpaqueSingle.testSwitch         100       150       N/A  avgt   30   4.139 ±  0.145  ms/op
MethodHandlesTableSwitchRandom.testSwitch                 5         0      true  avgt   30   4.894 ±  0.028  ms/op
MethodHandlesTableSwitchRandom.testSwitch                 5         0     false  avgt   30  11.526 ±  0.194  ms/op
MethodHandlesTableSwitchRandom.testSwitch                 5       150      true  avgt   30   4.882 ±  0.025  ms/op
MethodHandlesTableSwitchRandom.testSwitch                 5       150     false  avgt   30  11.532 ±  0.034  ms/op
MethodHandlesTableSwitchRandom.testSwitch                10         0      true  avgt   30   5.065 ±  0.076  ms/op
MethodHandlesTableSwitchRandom.testSwitch                10         0     false  avgt   30  13.016 ±  0.020  ms/op
MethodHandlesTableSwitchRandom.testSwitch                10       150      true  avgt   30   5.103 ±  0.051  ms/op
MethodHandlesTableSwitchRandom.testSwitch                10       150     false  avgt   30  12.984 ±  0.102  ms/op
MethodHandlesTableSwitchRandom.testSwitch                25         0      true  avgt   30   8.441 ±  0.165  ms/op
MethodHandlesTableSwitchRandom.testSwitch                25         0     false  avgt   30  13.371 ±  0.060  ms/op
MethodHandlesTableSwitchRandom.testSwitch                25       150      true  avgt   30   8.628 ±  0.032  ms/op
MethodHandlesTableSwitchRandom.testSwitch                25       150     false  avgt   30  13.542 ±  0.020  ms/op
MethodHandlesTableSwitchRandom.testSwitch                50         0      true  avgt   30   4.701 ±  0.015  ms/op
MethodHandlesTableSwitchRandom.testSwitch                50         0     false  avgt   30  13.562 ±  0.063  ms/op
MethodHandlesTableSwitchRandom.testSwitch                50       150      true  avgt   30   7.991 ±  3.111  ms/op
MethodHandlesTableSwitchRandom.testSwitch                50       150     false  avgt   30  13.543 ±  0.088  ms/op
MethodHandlesTableSwitchRandom.testSwitch               100         0      true  avgt   30   4.712 ±  0.020  ms/op
MethodHandlesTableSwitchRandom.testSwitch               100         0     false  avgt   30  13.600 ±  0.085  ms/op
MethodHandlesTableSwitchRandom.testSwitch               100       150      true  avgt   30   4.676 ±  0.011  ms/op
MethodHandlesTableSwitchRandom.testSwitch               100       150     false  avgt   30  13.476 ±  0.043  ms/op

Testing:
- [x] Running of included benchmarks
- [x] Inspecting inlining trace and verifying method handle targets are inlined
- [x] Running TestTableSwitch test (currently the only user of the new code)
- [x] Running java/lang/invoke tests (just in case)
- [x] Some manual testing

Thanks,
Jorn

You have two strategies to desugar a switch:

* if what you do after the case does not mutate any variables, you can desugar
each case to a method, more or less like a lambda (it's not exactly like a
lambda because there is no capture), and you have an indy in front that will
call the right method handle

* you have a front end, with an indy that associates the object with an int, and
a back end which is a tableswitch in the bytecode

...

The tests above are using the first strategy

No, they are using the second strategy. The SwitchBootstraps patch I linked to
replaces the front end `lookupswitch` of a String switch with an
`invokedynamic` that computes an index for the back end jump table, which is
still a `tableswitch` in the bytecode.
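The shape of this second strategy can be sketched in plain Java: a front end maps the String to an int case index, and the back end is an ordinary switch on that index, which javac can compile to a `tableswitch` since the cases are dense. Here `frontEnd` is a hypothetical stand-in for the `invokedynamic` call site, not the SwitchBootstraps API.

```java
import java.util.List;

// Sketch of the "front end + tableswitch back end" strategy for a String switch.
class TwoStageSwitch {
    private static final List<String> CASES = List.of("red", "green", "blue");

    // Hypothetical stand-in for the invokedynamic front end: maps the value to a
    // case index in [0, 3), or -1 (the default case) if it is not a known label.
    static int frontEnd(String s) {
        return CASES.indexOf(s);
    }

    // Back end: a plain switch over dense int cases, which javac can emit as a tableswitch.
    static String describe(String s) {
        switch (frontEnd(s)) {
            case 0:  return "warm";
            case 1:  return "natural";
            case 2:  return "cool";
            default: return "unknown";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("green"));  // natural
        System.out.println(describe("purple")); // unknown
    }
}
```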

As John also described, a hypothetical lookupSwitch combinator can be emulated
by using a `k -> [0, N)` projection that feeds into the tableSwitch combinator
that is proposed by this PR. The point of the examples I linked to was to show
several flavors of projection functions as an example of how this could be
implemented, and to show that they have competitive performance with a native
`lookupswitch` instruction (the 'legacy' case). In other words, the benchmarks
show the difference between a `lookupswitch` implemented in bytecode and a
`k -> [0, N)` projection function built by an `invokedynamic`. (Sorry, I should
have offered more explanation in the first place.)

The combinator added by _this_ PR is not meant to replace any part of the String
switch translation. For pattern switch the `tableSwitch` combinator _could_ be
used to implement the front end `k -> [0, N)` projection, but it is not
strictly required. Either way, that seems orthogonal to this PR.

I agree this is orthogonal and we can continue that discussion without blocking this PR.

About your benchmark: did you test with some strings going into "default"? That is usually the case where you need a proper lookup switch.
Another way to say it: your results are too good when you use a cascade of guardWithTest.

Rémi

@JornVernee
Member Author

JornVernee commented Apr 13, 2021

About your benchmark: did you test with some strings going into "default"? That is usually the case where you need a proper lookup switch.
Another way to say it: your results are too good when you use a cascade of guardWithTest.

Yes, for the benchmarks I ran, the default case was just as likely as the other cases, so e.g. if there were 10 cases, there was a 1/11 chance the default case was hit. This might need tweaking to be more realistic but...

Note that the cascading guardWithTest actually works more like a binary search, where each guard tests against a pivot point in the search, and then decides to go either to the left or the right side of the tree. So, when looking up the default value we don't necessarily need to do a search over all the cases. Only for hash collisions does it fall back to a linear search over all the values with the same hash code.
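A sketch of such an if-tree for plain int keys, ignoring the hash-code step and collision handling (`GwtTree` is a hypothetical name): each inner node is a `guardWithTest` comparing the key against a pivot, and each leaf does one exact comparison, so hits and misses alike resolve after about log2(N) tests instead of N.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Sketch: an (int)int classifier built as a balanced tree of guardWithTest nodes.
class GwtTree {
    static boolean lessThan(int pivot, int key) { return key < pivot; }
    static boolean equal(int expected, int key) { return key == expected; }

    private static final MethodHandle LESS_THAN; // (int,int)boolean
    private static final MethodHandle EQUAL;     // (int,int)boolean
    static {
        try {
            MethodHandles.Lookup l = MethodHandles.lookup();
            MethodType t = MethodType.methodType(boolean.class, int.class, int.class);
            LESS_THAN = l.findStatic(GwtTree.class, "lessThan", t);
            EQUAL = l.findStatic(GwtTree.class, "equal", t);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // sortedKeys must be sorted ascending and distinct. The result maps
    // sortedKeys[i] to i and any other key to sortedKeys.length (N).
    static MethodHandle build(int[] sortedKeys) {
        return node(sortedKeys, 0, sortedKeys.length, constant(sortedKeys.length));
    }

    // (int)int handle that ignores its argument and returns value.
    private static MethodHandle constant(int value) {
        return MethodHandles.dropArguments(MethodHandles.constant(int.class, value), 0, int.class);
    }

    // Builds the subtree for keys[lo, hi); 'miss' is used when no key matches.
    private static MethodHandle node(int[] keys, int lo, int hi, MethodHandle miss) {
        if (hi == lo) {
            return miss;
        }
        if (hi - lo == 1) { // leaf: exact comparison against the single remaining key
            MethodHandle isKey = MethodHandles.insertArguments(EQUAL, 0, keys[lo]);
            return MethodHandles.guardWithTest(isKey, constant(lo), miss);
        }
        int mid = (lo + hi) >>> 1; // pivot: keys below it go left, the rest go right
        MethodHandle below = MethodHandles.insertArguments(LESS_THAN, 0, keys[mid]);
        return MethodHandles.guardWithTest(below,
                node(keys, lo, mid, miss), node(keys, mid, hi, miss));
    }

    public static void main(String[] args) throws Throwable {
        MethodHandle classify = build(new int[] {5, 17, 42, 99});
        System.out.println((int) classify.invokeExact(42)); // 2
        System.out.println((int) classify.invokeExact(6));  // 4 (miss)
    }
}
```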

This is also how C2 translates lookupswitch as far as I know (but maybe John can confirm or deny whether my reading of the C2 code is correct), so I'm not surprised to see that the if-tree approach is so close to a native lookupswitch instruction in performance.

@bridgekeeper

bridgekeeper bot commented May 12, 2021

@JornVernee This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks pass without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@JornVernee
Member Author

I've just gotten back after 3 weeks of being sick, and would like to try moving this PR forward again.

Are there any remaining concerns with adding a tableSwitch combinator?

Reading back some of the discussion, I think the remaining point of contention is about adding a lookupSwitch combinator as well. I think that is a good idea as a follow-up, in order to expose the lookupswitch bytecode as a combinator too, but it doesn't seem like a blocker for this patch.

@forax
Member

forax commented May 12, 2021

I hope you are well now.
You are right, adding a lookupswitch can be done later; I'm fine with the current state of this patch.

Member

@cl4es cl4es left a comment


Overall this looks great!

I have a number of nits and a request to try and remodel so that NamedFunction isn't responsible for holding the arbitrary intrinsic data that you want to add here.

@@ -1068,20 +1095,28 @@ boolean contains(Name name) {
private @Stable MethodHandle resolvedHandle;
@Stable MethodHandle invoker;
private final MethodHandleImpl.Intrinsic intrinsicName;
private final Object intrinsicData;
Member


This seems like the wrong place to add arbitrary data related only to intrinsics. Could this be moved either into MethodHandleImpl$IntrinsicMethodHandle or (perhaps less appealingly) could MethodHandleImpl.Intrinsic be rewritten as a regular class that can then be instantiated with extra data? The fact that NamedFunction holds a field of type MethodHandleImpl.Intrinsic instead of delegating to resolvedHandle.intrinsicName() seems like a pre-existing kludge; it would be good to sort out why, or whether, it's really necessary.

- Reduce benchmark cases
- Remove NamedFunction::intrinsicName
@JornVernee JornVernee force-pushed the SwitchCombinator-SQUASH branch from f26a908 to 80a706f May 17, 2021 17:13
@JornVernee
Member Author

JornVernee commented May 17, 2021

Thanks for the review @cl4es

I've addressed your review comments.

I've reduced the number of cases in the benchmark to 5, 10, and 25, which is the most interesting range to look at, and also removed the offset cases for the non-constant input benchmarks (I think proving that the offset is constant-folded only needs to be done once).

WRT NamedFunction::intrinsicName, I think you're right that we can also just delegate to IntrinsicMethodHandle::intrinsicName, and have it indicate the intrinsic. I've implemented that change. As a result, some NamedFunction constructor call sites no longer attach the intrinsic name to the NamedFunction itself, but re-wrap the resolvedHandle they use in an IntrinsicMethodHandle with the right intrinsic name. This leads to a nice code simplification from being able to remove all the NamedFunction constructor overloads. java/lang/invoke tests are still green, but I'll re-run Tier 1-3 as well to make sure that the difference in resolvedHandle being used for some NamedFunctions doesn't cause any other problems.

Also, I've rebased the patch onto the latest mainline, since I was having some issues compiling (something with the boot JDK being the wrong version it seems). I've squashed all my previous commits into a commit labeled 'All changes prior to review' FYI (because of the rebase, this commit shows up after your review comments on the timeline on GitHub).

Member

@cl4es cl4es left a comment


Thanks for working through my comments, especially cleaning up how intrinsic data is handled which now looks much more like a natural fit!

@JornVernee
Member Author

I've added a usage example to the javadoc in response to a review comment on the CSR.

@openjdk openjdk bot removed the csr Pull request needs approved CSR before integration label May 26, 2021
@openjdk

openjdk bot commented May 26, 2021

@JornVernee This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8263087: Add a MethodHandle combinator that switches over a set of MethodHandles

Reviewed-by: redestad

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 142 new commits pushed to the master branch:

  • 8c4719a: 8265248: Implementation Specific Properties: change prefix, plus add existing properties
  • c59484e: 8267653: Remove Mutex::_safepoint_check_sometimes
  • de91643: 8267611: Print more info when pointer_delta assert fails
  • a4c46e1: 8263202: Update Hebrew/Indonesian/Yiddish ISO 639 language codes to current
  • 9c346a1: 8266963: Remove safepoint poll introduced in 8262443 due to reentrance issue
  • 45e0597: 8264302: Create implementation for Accessibility native peer for Splitpane java role
  • 4343997: 8267708: Remove references to com.sun.tools.javadoc.**
  • f632254: 8267221: jshell feedback is incorrect when creating method with array varargs parameter
  • bf8d4a8: 8267583: jmod fails on symlink to class file
  • 083416d: 8267130: Memory Overflow in Disassembler::load_library
  • ... and 132 more: https://git.openjdk.java.net/jdk/compare/cf97252f3fd4e7bdb57271b92dd2866101d4a94b...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label May 26, 2021
@JornVernee
Member Author

/integrate

@openjdk openjdk bot closed this May 27, 2021
@openjdk openjdk bot added integrated Pull request has been integrated and removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels May 27, 2021
@openjdk

openjdk bot commented May 27, 2021

@JornVernee Since your change was applied there have been 151 commits pushed to the master branch:

  • 85f6165: 8267817: [TEST] Remove unnecessary init in test/micro/org/openjdk/bench/javax/crypto/full/AESGCMBench:setup
  • 7278f56: 8267800: Remove the '_dirty' set in BCEscapeAnalyzer
  • bfa46f0: 8252476: as_Worker_thread() doesn't check what it intends
  • 37bc4e2: 8263635: Add --servername option to jhsdb debugd
  • 6ffa3e6: 8267754: cds/appcds/loaderConstraints/LoaderConstraintsTest.java fails on x86_32 due to customized class loader is not supported
  • 1899f02: 8267805: Add UseVtableBasedCHA to the list of JVM flags known to jtreg
  • 0fc7c8d: 8267751: (test) jtreg.SkippedException has no serial VersionUID
  • a859d87: 8267721: Enable sun/security/pkcs11 tests for Amazon Linux 2 AArch64
  • e630235: 8266851: Implement JEP 403: Strongly Encapsulate JDK Internals
  • 8c4719a: 8265248: Implementation Specific Properties: change prefix, plus add existing properties
  • ... and 141 more: https://git.openjdk.java.net/jdk/compare/cf97252f3fd4e7bdb57271b92dd2866101d4a94b...master

Your commit was automatically rebased without conflicts.

Pushed as commit 3623abb.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@JornVernee JornVernee deleted the SwitchCombinator-SQUASH branch December 5, 2022 20:49
Labels
core-libs core-libs-dev@openjdk.org integrated Pull request has been integrated

3 participants