Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8262118: Specialize upcalls #457

Closed

Conversation

@JornVernee
Copy link
Member

@JornVernee JornVernee commented Feb 22, 2021

Hi,

This patch adds specialization of upcalls, which includes both the specialization of the binding recipe using MethodHandle combinators, as for downcalls, as well as VM support for generating a customized wrapper for a 'low-level' method handle taking and returning only primitive types, which can be used as a base handle in the binding specialization.

The latter is based on Vladimir Ivanov's earlier linkToNative work, with a few changes to the frame layout, exception handling, and added support for attaching non-java threads, as well as re-writing it to support an arbitrary caller ABI.

The current implementation supports x86 only for now - though I've made sure all the tests still pass on AArch64. Stack argument passing is disabled, as for downcalls - there is a mismatch between the encoding for stack slots we use on the Java side, and the encoding we use in the VM, which needs some more consideration. And finally, as with downcalls, multi-register returns are disabled as well. We need to figure out what the right protocol for those should be, maybe taking inspiration from Valhalla.

In the code I've moved some things from ProgrammableInvoker to SharedUtils in order to reuse for upcalls. I also added a bit of debugging code to BufferLayout for dumping stack arguments. Note that there is quite a bit of motion in ProgrammableUpcallHandler as well. This is mostly to untangle the part of an upcall that is specialized using MethodHandle combinators (invokeInterpBindings), from the part that is replaced by the VM's wrapper stub (invokeMoves). I've for instance introduced a simple helper class, called InterpretedHandler, which acts as a wrapper around the target MethodHandle for doing the old-style interpreted calls, instead of using the whole ProgrammableUpcallHandler instance.

I've also removed some of the test configurations. We now only test the default settings of various specialization and intrinsification flags when running TestDowncall, TestUpcall, and TestUpcallHighArity. This patch 2 more of these flags, making a total of 16 combinations, which would take way too long to test, and is probably not necessary. I gave the full testing matrix a run, and found no issues. We can repeat that process periodically to check that all the combinations still work. I've also added some more benchmark cases for the upcalls benchmark.

I also gave the Windows jextract samples a try with this patch and found no issues.

Here are some of the benchmark numbers (from Linux-x64):

Benchmark                Mode  Cnt    Score   Error  Units
Upcalls.jni_args10       avgt   30  134.220 ? 1.553  ns/op
Upcalls.jni_args5        avgt   30   87.340 ? 1.753  ns/op
Upcalls.jni_blank        avgt   30   57.590 ? 0.877  ns/op
Upcalls.jni_identity     avgt   30  109.859 ? 0.930  ns/op
Upcalls.panama_args10    avgt   30   30.606 ? 0.088  ns/op
Upcalls.panama_args5     avgt   30   23.908 ? 0.121  ns/op
Upcalls.panama_blank     avgt   30   20.314 ? 0.137  ns/op
Upcalls.panama_identity  avgt   30   24.894 ? 0.111  ns/op

While these are the numbers with the optimizations disabled (quite a bit worse):

Upcalls.panama_args10    avgt   30  886.885 ? 22.133  ns/op
Upcalls.panama_args5     avgt   30  566.600 ? 12.442  ns/op
Upcalls.panama_blank     avgt   30  212.962 ?  2.386  ns/op
Upcalls.panama_identity  avgt   30  361.358 ?  3.212  ns/op

We beat JNI across the board, and also seem to scale a lot better for larger numbers of arguments. So, this seems like a success :)

Thanks,
Jorn


Progress

  • Change must not contain extraneous whitespace
  • Change must be properly reviewed

Issue

Reviewers

Download

$ git fetch https://git.openjdk.java.net/panama-foreign pull/457/head:pull/457
$ git checkout pull/457

@bridgekeeper
Copy link

@bridgekeeper bridgekeeper bot commented Feb 22, 2021

👋 Welcome back jvernee! A progress list of the required criteria for merging this PR into foreign-memaccess+abi will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

Loading

@openjdk openjdk bot added the rfr label Feb 22, 2021
@mlbridge
Copy link

@mlbridge mlbridge bot commented Feb 22, 2021

Loading

Copy link
Collaborator

@mcimadamore mcimadamore left a comment

Not reviewed in full - added some comments on tests

Loading

test/jdk/java/foreign/TestDowncall.java Show resolved Hide resolved
Loading
test/jdk/java/foreign/TestUpcall.java Show resolved Hide resolved
Loading
Copy link
Collaborator

@mcimadamore mcimadamore left a comment

Looks a very solid piece of work (I've mostly checked the Java side). I like how the code has been consolidated between upcall and downcall paths. I've added some minor comments around for your consideration.

Loading

@mcimadamore
Copy link
Collaborator

@mcimadamore mcimadamore commented Feb 23, 2021

When trying this on my machine (Ubuntu 20.04) I get this:

In file included from ../../src/hotspot/share/prims/foreign_globals.cpp:25:
../../src/hotspot/share/prims/foreign_globals.hpp:40:46: error: 'VMRegPair' has not been declared
   40 |   void calling_convention(BasicType* sig_bt, VMRegPair *parm_regs, uint argcnt) const;
      |                                              ^~~~~~~~~
../../src/hotspot/share/prims/foreign_globals.cpp:102:6: error: no declaration matches 'void CallRegs::calling_convention(BasicType*, VMRegPair*, uint) const'
  102 | void CallRegs::calling_convention(BasicType* sig_bt, VMRegPair *parm_regs, uint argcnt) const {
      |      ^~~~~~~~
In file included from ../../src/hotspot/share/prims/foreign_globals.cpp:25:
../../src/hotspot/share/prims/foreign_globals.hpp:40:8: note: candidate is: 'void CallRegs::calling_convention(BasicType*, int*, uint) const'
   40 |   void calling_convention(BasicType* sig_bt, VMRegPair *parm_regs, uint argcnt) const;
      |        ^~~~~~~~~~~~~~~~~~
../../src/hotspot/share/prims/foreign_globals.hpp:33:8: note: 'struct CallRegs' defined here
   33 | struct CallRegs {
      |        ^~~~~~~~

I compile with disabled precompiled headers, which typically causes issues like this - is there some include missing somewhere?

It seems like I need to add this:

#include "code/vmreg.hpp"

In foreign_globals.hpp to get things working.

Loading

@mcimadamore
Copy link
Collaborator

@mcimadamore mcimadamore commented Feb 23, 2021

Other includes seem to be missing - here's the patch I needed to get it working:

+++ b/src/hotspot/cpu/x86/universalUpcallHandler_x86_64.cpp
@@ -31,6 +31,8 @@
 #include "memory/resourceArea.hpp"
 #include "prims/universalUpcallHandler.hpp"
 #include "runtime/sharedRuntime.hpp"
+#include "runtime/signature.hpp"
+#include "utilities/formatBuffer.hpp"
 #include "utilities/globalDefinitions.hpp"
 
 #define __ _masm->
diff --git a/src/hotspot/share/prims/foreign_globals.hpp b/src/hotspot/share/prims/foreign_globals.hpp
index 4f02ee625cb..44c65771e7c 100644
--- a/src/hotspot/share/prims/foreign_globals.hpp
+++ b/src/hotspot/share/prims/foreign_globals.hpp
@@ -27,6 +27,7 @@
 #include "oops/oopsHierarchy.hpp"
 #include "utilities/growableArray.hpp"
 #include "utilities/macros.hpp"
+#include "code/vmreg.hpp"
 
 #include CPU_HEADER(foreign_globals)

Loading

@mcimadamore
Copy link
Collaborator

@mcimadamore mcimadamore commented Feb 23, 2021

Executed benchmark on my machine:

Benchmark                Mode  Cnt    Score   Error  Units
Upcalls.jni_args10       avgt   30  102.245 ? 1.952  ns/op
Upcalls.jni_args5        avgt   30   59.715 ? 1.488  ns/op
Upcalls.jni_blank        avgt   30   43.189 ? 0.642  ns/op
Upcalls.jni_identity     avgt   30   90.224 ? 1.775  ns/op
Upcalls.panama_args10    avgt   30   24.633 ? 0.260  ns/op
Upcalls.panama_args5     avgt   30   18.708 ? 0.314  ns/op
Upcalls.panama_blank     avgt   30   15.925 ? 0.262  ns/op
Upcalls.panama_identity  avgt   30   19.769 ? 0.266  ns/op

Seems to be from 2x to 4x faster - awesome!

Loading

@JornVernee
Copy link
Member Author

@JornVernee JornVernee commented Feb 23, 2021

Thanks for testing. I forgot that precompiled headers can be an issue, and was testing on multiple Linux machines, one of which doesn't test with PCH disabled. Will give it another try here as well.

Loading

@JornVernee
Copy link
Member Author

@JornVernee JornVernee commented Feb 23, 2021

I've address review comments.

I've updated the AArch64 CallArranger as well as the test to use float as a carrier type when also using vector registers, to avoid the same mismatch.

Loading

Copy link
Collaborator

@mcimadamore mcimadamore left a comment

Looks good!

Loading

@openjdk
Copy link

@openjdk openjdk bot commented Feb 23, 2021

@JornVernee This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8262118: Specialize upcalls

Reviewed-by: mcimadamore, vlivanov

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 385 new commits pushed to the foreign-memaccess+abi branch:

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the foreign-memaccess+abi branch, type /integrate in a new comment.

Loading

@openjdk openjdk bot added the ready label Feb 23, 2021
Copy link
Collaborator

@iwanowww iwanowww left a comment

Looks good!

Loading

@JornVernee
Copy link
Member Author

@JornVernee JornVernee commented Mar 1, 2021

/integrate

Loading

@openjdk openjdk bot closed this Mar 1, 2021
@openjdk openjdk bot added integrated and removed ready rfr labels Mar 1, 2021
@openjdk
Copy link

@openjdk openjdk bot commented Mar 1, 2021

@JornVernee Since your change was applied there have been 385 commits pushed to the foreign-memaccess+abi branch:

Your commit was automatically rebased without conflicts.

Pushed as commit 96c29d5.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Loading

@JornVernee JornVernee deleted the Upcall_Intrin_Rebase branch Nov 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
3 participants