Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8247937: Specialize downcall binding recipes using MethodHandle combinators #212

Closed

Conversation

@JornVernee
Copy link
Member

JornVernee commented Jun 19, 2020

Hi,

This PR adds specialization of downcall binding recipes using MethodHandles. The main addition to the code is the ProgrammableInvoker::specialize method, but there are also some refactorings which I will explain;

The current code has a single invoke method that takes the high-level input arguments of the down call as an Object[]. This method calls the BindingInterpreter to pre-process arguments, and also move any low-level values produced by the pre-processing directly into the intermediate argument buffer that is later passed to the native code that copies the values into registers and then calls the target function.

This invoke method is split into 2; invokeInterpBindings, which pre-processes the arguments and outputs any low-level values produced by the pre-processing into an Object[].

This Object[] is then passed to invokeMoves, which moves the low-level values into the intermediate argument buffer that is passed to native.

This split is done so that we can replace the invokeInterpBindings call with a specialized MethodHandle chain based on the binding recipe instead, which is built on top a type handle of invokeMoves.

This takes care of the binding recipe specialization. invokeMoves can later be replaced during C2 compilation with code that passes these low-level values into registers directly (but that is for another patch).

BindingInterpreter now doesn't write values to the intermediate buffer directly, and so instead of passing functors to obtain a pointer to write a low-level value to, it now takes functors that handle the reading or writing (see StoreFunc and LoadFunc).

Since the read/write methods in BindingInterpreter are now called from multiple places, I've moved them to SharedUtils.


The process of specializing the binding recipe is as follows:

We first calculate a low-level method type (the 'intrinsicType') based on the MOVE bindings of a particular recipe, this gives us a method type that takes only primitives as arguments, which represent the values to be copied into the various CPU registers before calling the native target function. We then get a low-level MethodHandle that calls invokeMoves, and adapt it to the intrinsic type. On top of that we build the specialized binding recipe method handle chain.

For each argument, we iterate through the bindings in reverse, and for each binding insert a new filter MH onto the handle we already have. At the end we will end up with a handle that features the high-level arguments of our native function (MemorySegment, MemoryAddress, etc.).

The same is done for the return bindings; the return value is repeatedly filtered according to the bindings.

This currently doesn't work for multiple return values, in which case invokeMoves will return an Object[] instead of just a plain Object. This case is not specialized with method handles, but is instead handled solely by invokeInterpBindings. (we detect this in getBoundMethodHandle)

In case the bindings need to do allocation, a NativeAllocationScope is created and passed in as the first argument, which is then forwarded to the various filter MH using mergeArguments (which is a wrapper for MethodHandles::permuteArguments that merges 2 arguments given their indices). The allocation and cleanup of the NativeAllocationScope are handled with a tryFinally combinator.


I've added a system property to disable the MethodHandle specialization so that the performance difference can be measured. The speed up I've seen on the CallOverhead benchmark between this and the current code is about 1.5x (but with the C2 intrinsics this gets us on par with JNI). The difference between specialization turned off and on is a little bigger than that, since the split of the 'invoke' method adds a bit of overhead as well, but it's still on overall gain compared to what we currently have.

Finally, I've also fixed a minor problems with TestUpcall where the expected and actual values were reversed when calling asserts.


If I've missed anything in the explanation, please feel free to ask!

Thanks,
Jorn


Progress

  • Change must not contain extraneous whitespace
  • Change must be properly reviewed

Issue

  • JDK-8247937: Specialize downcall binding recipes using MethodHandle combinators

Reviewers

Download

$ git fetch https://git.openjdk.java.net/panama-foreign pull/212/head:pull/212
$ git checkout pull/212

@bridgekeeper
Copy link

bridgekeeper bot commented Jun 19, 2020

👋 Welcome back jvernee! A progress list of the required criteria for merging this PR into foreign-abi will be added to the body of your pull request.

@openjdk openjdk bot added the rfr label Jun 19, 2020
@mlbridge
Copy link

mlbridge bot commented Jun 19, 2020

Webrevs

Copy link
Collaborator

mcimadamore left a comment

Overall looks good, I've left a couple of minor editorial comments.

The architectural approach, while sound, to seems a bit risky. We're essentially adding yet another way to interpret bindings, and one that is less scrutable (since interpreting happens implicitly, via a MH chain) which will add cost in terms of maintenance going forwards. The speedup, seems good, but at the same time I can't help thinking that the basic binding interpreter has not been optimized much, so we don't really know how much of that performance gap is really due to the fact that interpreting bindings using a chain of MH is faster (I can see advantages in not allocating the binding array, but other than that things look less obvious).

As a code organization strategy, I also wonder if it would pay off to bring more API to bindings, e.g. so that each binding could support a box/unbox/specialize triad of methods - rather than having switches scattered in several places?

@openjdk
Copy link

openjdk bot commented Jun 22, 2020

@JornVernee This change now passes all automated pre-integration checks, type /integrate in a new comment to proceed. After integration, the commit message will be:

8247937: Specialize downcall binding recipes using MethodHandle combinators

Reviewed-by: mcimadamore
  • If you would like to add a summary, use the /summary command.
  • To credit additional contributors, use the /contributor command.
  • To add additional solved issues, use the /issue command.

There are currently no new commits on the foreign-abi branch since the last update of the source branch of this PR. If another commit should be pushed before you perform the /integrate command, your PR will be automatically rebased. If you would like to avoid potential automatic rebasing, specify the current head hash when integrating, like this: /integrate 54e3c3e088dcf3297e8ecb5e01ea6147e8fc542d.

➡️ To integrate this PR with the above commit message to the foreign-abi branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready label Jun 22, 2020
…hes with virtual calls to method on Binding.

- Expand CallOverhead benchmark
@JornVernee JornVernee force-pushed the JornVernee:Call_Intrinsics_Final branch from 816b8ec to 1f5a6b2 Jun 23, 2020
@openjdk openjdk bot removed ready rfr labels Jun 23, 2020
@JornVernee
Copy link
Member Author

JornVernee commented Jun 23, 2020

I've applied your suggestion of putting the code for interpreting, verifying and specializing the Bindings in methods on Binding itself, instead of scattered throughout the switch statements. This removes a bit of code as well, since there is no more need for casting, and fields are directly accessible. There was one place where this didn't completely work out; in the specialization we need to update the argument index (insertPos) when we encounter a MOVE binding, so the code currently has to explicitly check for MOVE bindings.

I've also added a few more CallOverhead benchmarks, with a more diverse argument set. The results show that especially once you start adding more arguments, or the arguments become more complex (see the MemoryAddress case), that the MH specialization starts to make more of a difference.

Benchmark                                            Mode  Cnt     Score    Error  Units
CallOverhead.jni_blank                               avgt   30     8.415 □  0.076  ns/op
CallOverhead.jni_identity                            avgt   30    12.370 □  0.254  ns/op
CallOverhead.panama_args10                           avgt   30   583.976 □  5.895  ns/op
CallOverhead.panama_args10_NO_SPEC                   avgt   30  1319.897 □ 22.884  ns/op
CallOverhead.panama_args5                            avgt   30   502.865 □  8.452  ns/op
CallOverhead.panama_args5_NO_SPEC                    avgt   30   919.254 □ 11.101  ns/op
CallOverhead.panama_blank                            avgt   30   199.373 □  2.102  ns/op
CallOverhead.panama_blank_NO_SPEC                    avgt   30   317.260 □  4.067  ns/op
CallOverhead.panama_identity                         avgt   30   254.405 □  7.173  ns/op
CallOverhead.panama_identity_NO_SPEC                 avgt   30   494.224 □  2.359  ns/op
CallOverhead.panama_identity_memory_address          avgt   30   265.507 □  4.160  ns/op
CallOverhead.panama_identity_memory_address_NO_SPEC  avgt   30   592.381 □ 14.314  ns/op
CallOverhead.panama_identity_struct                  avgt   30   601.288 □ 16.147  ns/op
CallOverhead.panama_identity_struct_NO_SPEC          avgt   30   951.495 □ 15.450  ns/op
@openjdk openjdk bot added ready rfr labels Jun 23, 2020
Copy link
Collaborator

mcimadamore left a comment

Code looks really good - I like how moving stuff to bindings turned out.

Binding interpreter is now very thin, we might consider later if we wanna remove it and add maybe some static helpers to Binding. But that's something that can wait.

@JornVernee
Copy link
Member Author

JornVernee commented Jun 23, 2020

/integrate

@openjdk openjdk bot closed this Jun 23, 2020
@openjdk openjdk bot added integrated and removed ready labels Jun 23, 2020
@openjdk
Copy link

openjdk bot commented Jun 23, 2020

@JornVernee
Pushed as commit 5db208a.

@openjdk openjdk bot removed the rfr label Jun 23, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Linked issues

Successfully merging this pull request may close these issues.

None yet

2 participants
You can’t perform that action at this time.