
8264288: Performance issue with MethodHandle.asCollector #3306

Closed
wants to merge 6 commits into from


@JornVernee
Member

@JornVernee JornVernee commented Apr 1, 2021

This patch speeds up MethodHandle.asCollector handles where the array type is not Object[], as well as speeding up all collectors where the arity is greater than 10.
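For readers less familiar with the API, here is a minimal sketch of an asCollector handle whose array type is int[] rather than Object[], which is one of the shapes this patch targets. It adapts Arrays.toString(int[]) so that it takes three ints directly. The class and method names are made up for illustration.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.Arrays;

final class IntAsCollectorExample {
    // Builds a handle of type (int,int,int)String that packs its three int
    // arguments into an int[] and passes that array to Arrays.toString(int[]).
    static MethodHandle intCollector() throws ReflectiveOperationException {
        MethodHandle toString = MethodHandles.lookup().findStatic(
                Arrays.class, "toString", MethodType.methodType(String.class, int[].class));
        // The collected array type is int[], not Object[].
        return toString.asCollector(int[].class, 3);
    }

    public static void main(String[] args) throws Throwable {
        MethodHandle mh = intCollector();
        System.out.println(mh.type());               // (int,int,int)String
        String s = (String) mh.invokeExact(1, 2, 3);  // exact call: three ints in, String out
        System.out.println(s);                        // [1, 2, 3]
    }
}
```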

The old code creates a collector handle by combining a set of hard-coded methods for collecting arguments into an Object[], up to a maximum of ten elements, and then copying this intermediate array into a final array.
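Roughly, the old strategy has the following shape. This is a plain-Java analogy with hypothetical names, not the actual JDK code (which builds the same thing out of method handles): the arguments are first boxed into an intermediate Object[] and then copied into the final int[], whereas a typed collector can allocate the final array once and store into it directly.

```java
import java.util.Arrays;

final class TwoStepCollectSketch {
    // Hypothetical analogue of the old path: collect into an intermediate
    // Object[] (boxing the ints), then copy element by element into the
    // final, correctly typed array.
    static int[] collectTwoStep(int a, int b, int c) {
        Object[] intermediate = {a, b, c};           // extra allocation + boxing
        int[] result = new int[intermediate.length];
        for (int i = 0; i < intermediate.length; i++) {
            result[i] = (Integer) intermediate[i];   // unboxing copy
        }
        return result;
    }

    // Hypothetical analogue of a dedicated typed collector: allocate the
    // final array once and store each argument directly.
    static int[] collectDirect(int a, int b, int c) {
        return new int[] {a, b, c};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(collectTwoStep(1, 2, 3)));
        System.out.println(Arrays.toString(collectDirect(1, 2, 3)));
    }
}
```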

In principle, it shouldn't matter how slow (or fast) this handle is, because it should be replaced by the existing bytecode intrinsification, which does the right thing. But investigation shows that the intrinsification is only applied in a very limited number of cases: Object[] with at most 10 elements, and only for the intermediate collector handles. Every other collector shape incurs overhead because it essentially ends up calling the inefficient fallback handle.

Rather than applying a band-aid (I tried this, but it turned out to be very tricky to untangle the existing code), the new code replaces the existing implementation with a collector handle implemented using a LambdaForm, which removes the need for intrinsification and also greatly reduces the complexity of the implementation. (I plan to expose this collector through a public API in the future as well, so users don't have to go through MHs::identity to make a collector.)
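For reference, this is the MHs::identity workaround mentioned above, written as a small sketch using only the public java.lang.invoke API (the helper name is hypothetical). It produces a (String,String,String)String[] collector, plus an arity-12 one, which is above the old 10-element limit.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.util.Arrays;

final class IdentityCollectorExample {
    // Adapts an identity handle on the array type with asCollector,
    // yielding a handle of type (T, ..., T)T[] with `arity` parameters.
    static MethodHandle collector(Class<?> arrayType, int arity) {
        return MethodHandles.identity(arrayType).asCollector(arrayType, arity);
    }

    public static void main(String[] args) throws Throwable {
        MethodHandle three = collector(String[].class, 3);
        String[] abc = (String[]) three.invokeExact("a", "b", "c");
        System.out.println(Arrays.toString(abc));   // [a, b, c]

        // High arity (> 10), invoked via invokeWithArguments for brevity.
        MethodHandle twelve = collector(String[].class, 12);
        Object[] values = new Object[12];
        for (int i = 0; i < values.length; i++) {
            values[i] = "s" + i;
        }
        String[] many = (String[]) twelve.invokeWithArguments(values);
        System.out.println(many.length);            // 12
    }
}
```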

The old code was also using a special lambda form transform for collecting arguments into an array. I believe this was done to take advantage of the existing-but-limited bytecode intrinsification, at least for Object[] with fewer than 10 elements.

The new code just uses the existing collect arguments transform with the newly added collector handle as filter, and this works just as well for the existing case, but as a bonus is also much simpler, since no separate transform is needed. Using the collect arguments transform should also improve sharing.
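A public-API sketch of the same idea is MethodHandles.collectArguments with an array collector as the filter. This is only an analogy to the internal collect arguments LambdaForm transform, and the class name is made up. It adapts String.join(CharSequence, CharSequence...) so the three elements are passed as separate arguments.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class CollectArgumentsExample {
    // Builds (CharSequence,CharSequence,CharSequence,CharSequence)String:
    // argument 0 is the delimiter; arguments 1..3 are gathered into a
    // CharSequence[] by the collector and passed on to String.join.
    static MethodHandle joinThree() throws ReflectiveOperationException {
        MethodHandle join = MethodHandles.lookup().findStatic(String.class, "join",
                MethodType.methodType(String.class, CharSequence.class, CharSequence[].class));
        MethodHandle collector = MethodHandles.identity(CharSequence[].class)
                .asCollector(CharSequence[].class, 3);
        // Replace join's parameter at position 1 with the collector's three parameters.
        return MethodHandles.collectArguments(join, 1, collector);
    }

    public static void main(String[] args) throws Throwable {
        MethodHandle mh = joinThree();
        String s = (String) mh.invoke("-", "a", "b", "c"); // invoke adapts String -> CharSequence
        System.out.println(s);                              // a-b-c
    }
}
```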

As a result of these changes a lot of code was unused and has been removed in this patch.

Testing: tier 1-3, benchmarking using TypedAsCollector (part of the patch), as well as another variant of the benchmark that used a declared static identity method instead of MHs::identity (not included). Before/after comparison of MethodHandleAs* benchmarks (no differences there).

Here are some numbers from the added benchmark:

Before:

Benchmark                                    Mode  Cnt    Score    Error  Units
TypedAsCollector.testIntCollect              avgt   30  189.156 ±  1.796  ns/op
TypedAsCollector.testIntCollectHighArity     avgt   30  660.549 ± 10.159  ns/op
TypedAsCollector.testObjectCollect           avgt   30    7.092 ±  0.042  ns/op
TypedAsCollector.testObjectCollectHighArity  avgt   30   65.225 ±  0.546  ns/op
TypedAsCollector.testStringCollect           avgt   30   28.511 ±  0.243  ns/op
TypedAsCollector.testStringCollectHighArity  avgt   30   57.054 ±  0.635  ns/op

(As you can see, only the Object[] case with arity less than 10 is fast here.)

After:

Benchmark                                    Mode  Cnt  Score   Error  Units
TypedAsCollector.testIntCollect              avgt   30  6.569 ± 0.131  ns/op
TypedAsCollector.testIntCollectHighArity     avgt   30  8.923 ± 0.066  ns/op
TypedAsCollector.testObjectCollect           avgt   30  6.813 ± 0.035  ns/op
TypedAsCollector.testObjectCollectHighArity  avgt   30  9.718 ± 0.066  ns/op
TypedAsCollector.testStringCollect           avgt   30  6.737 ± 0.016  ns/op
TypedAsCollector.testStringCollectHighArity  avgt   30  9.618 ± 0.052  ns/op
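To make the two tables easier to compare, here is a quick sketch that computes the before/after speedup (before divided by after) from the reported averages. It ignores the error columns and only restates the numbers above.

```java
final class SpeedupSketch {
    // Average scores in ns/op, copied from the Before/After tables.
    static final String[] NAMES = {
        "testIntCollect", "testIntCollectHighArity", "testObjectCollect",
        "testObjectCollectHighArity", "testStringCollect", "testStringCollectHighArity"
    };
    static final double[] BEFORE = {189.156, 660.549, 7.092, 65.225, 28.511, 57.054};
    static final double[] AFTER  = {  6.569,   8.923, 6.813,  9.718,  6.737,  9.618};

    // Speedup factor for benchmark i (higher is better); error bounds are ignored.
    static double speedup(int i) {
        return BEFORE[i] / AFTER[i];
    }

    public static void main(String[] args) {
        for (int i = 0; i < NAMES.length; i++) {
            System.out.printf("%-28s %6.1fx%n", NAMES[i], speedup(i));
        }
    }
}
```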

Thanks,
Jorn


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8264288: Performance issue with MethodHandle.asCollector

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/3306/head:pull/3306
$ git checkout pull/3306

Update a local copy of the PR:
$ git checkout pull/3306
$ git pull https://git.openjdk.java.net/jdk pull/3306/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 3306

View PR using the GUI difftool:
$ git pr show -t 3306

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/3306.diff

@bridgekeeper

@bridgekeeper bridgekeeper bot commented Apr 1, 2021

👋 Welcome back jvernee! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

@openjdk openjdk bot commented Apr 1, 2021

@JornVernee The following label will be automatically applied to this pull request:

  • core-libs

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the core-libs label Apr 1, 2021
@JornVernee JornVernee marked this pull request as ready for review Apr 1, 2021
@openjdk openjdk bot added the rfr label Apr 1, 2021
@mlbridge

@mlbridge mlbridge bot commented Apr 1, 2021

@rose00
rose00 approved these changes Apr 1, 2021
Contributor

@rose00 rose00 left a comment

So many red deletion lines; I'm looking at a beautiful sunset!
It's a sunset for some very old code, some of the first code
I wrote for method handles, long before Lambda Forms
made this sort of task easier.

Thanks very much for cleaning this out. See a few
minor comments on some diff lines.

@@ -648,57 +648,6 @@ LambdaForm collectArgumentsForm(int pos, MethodType collectorType) {
return putInCache(key, form);
}

LambdaForm collectArgumentArrayForm(int pos, MethodHandle arrayCollector) {


@rose00

rose00 Apr 1, 2021
Contributor

It's counter-intuitive that removing a LFE tactic would be harmless.
Each LFE tactic is a point where LFs can be shared, reducing footprint.
But in this case collectArgumentArrayForm is always paired with
collectArgumentsForm, so the latter takes care of sharing.
The actual code which makes the arrays is also shared, via
Makers.TYPED_COLLECTORS (unchanged).

@openjdk

@openjdk openjdk bot commented Apr 1, 2021

@JornVernee This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8264288: Performance issue with MethodHandle.asCollector

Reviewed-by: jrose, vlivanov

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 55 new commits pushed to the master branch:

  • 39719da: 8253266: JList and JTable constructors should clear OPAQUE_SET before calling updateUI
  • a8005ef: 8166727: javac crashed: [jimage.dll+0x1942] ImageStrings::find+0x28
  • 7f9ece2: 8264650: Cross-compilation to macos/aarch64
  • 0039c18: 8264475: CopyArea ignores clip state in metal rendering pipeline
  • f084bd2: 8262355: Support for AVX-512 opmask register allocation.
  • 0780666: 8254050: HotSpot Style Guide should permit using the "override" virtual specifier
  • f259eea: 8264393: JDK-8258284 introduced dangling TLH race
  • 9b2232b: 8264123: add ThreadsList.is_valid() support
  • e8eda65: 8264664: use text blocks in javac module tests
  • cec66cf: 8264572: ForkJoinPool.getCommonPoolParallelism() reports always 1
  • ... and 45 more: https://git.openjdk.java.net/jdk/compare/9061271b0b477dae9db11112a236bedf77df4ac5...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready label Apr 1, 2021
Member

@PaulSandoz PaulSandoz left a comment

That's an elegant solution.

At first I thought it might unduly perturb lambda form generation and caching, but you slotted a different lambda form implementation underneath the varargs implementation.

- Use cached version of store func getter
- Use ARRAY_STORE intrinsic for array stores
- Generate direct call to Array.newInstance instead of using an array constructor handle
- Intrinsify call to Array.newInstance if the component type is constant
@JornVernee
Member Author

@JornVernee JornVernee commented Apr 2, 2021

I've addressed review comments, plus some other things:

  • I realized that I was calling the uncached version of the store function factory. Fixed that.
  • I also realized that there's already an ARRAY_STORE intrinsic, which I'm now using to avoid generating a call.
  • I also realized that since we only have 1 array creation handle per lambda form, we can instead generate a direct call to Array::newInstance instead of going through the array constructor handle (which also avoids having to use a BoundMethodHandle).
  • Finally, I added an intrinsic, under the old NEW_ARRAY name, that intrinsifies a call to Array::newInstance if the component type argument is constant (which it is in this case).
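The two array-creation strategies in the bullets above can be approximated with public API. This is an illustrative sketch only, with hypothetical names: MethodHandles.arrayConstructor and MethodHandles.arrayElementSetter stand in for the internal array constructor handle and store function, and the generated lambda form does not literally look like this.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.reflect.Array;
import java.util.Arrays;

final class ArrayHandlesSketch {
    // Variant 1: an array constructor handle plus a per-element store handle.
    static String[] viaHandles(String a, String b, String c) throws Throwable {
        MethodHandle ctor = MethodHandles.arrayConstructor(String[].class);    // (int)String[]
        MethodHandle store = MethodHandles.arrayElementSetter(String[].class); // (String[],int,String)void
        String[] arr = (String[]) ctor.invokeExact(3);
        store.invokeExact(arr, 0, a);
        store.invokeExact(arr, 1, b);
        store.invokeExact(arr, 2, c);
        return arr;
    }

    // Variant 2: a direct Array.newInstance call with a constant component
    // type, followed by plain array stores.
    static String[] viaNewInstance(String a, String b, String c) {
        String[] arr = (String[]) Array.newInstance(String.class, 3);
        arr[0] = a;
        arr[1] = b;
        arr[2] = c;
        return arr;
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(Arrays.toString(viaHandles("a", "b", "c")));
        System.out.println(Arrays.toString(viaNewInstance("a", "b", "c")));
    }
}
```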

As a result, the lambda form is now fully intrinsified (no more calls in the generated bytecode), e.g.:

static java.lang.Object collector001_LLLL_L(java.lang.Object, java.lang.Object, java.lang.Object, java.lang.Object);
    Code:
       0: iconst_3
       1: anewarray     #12                 // class java/lang/String
       4: astore        4
       6: aload         4
       8: checkcast     #14                 // class "[Ljava/lang/String;"
      11: dup
      12: astore        4
      14: iconst_0
      15: aload_1
      16: checkcast     #12                 // class java/lang/String
      19: aastore
      20: aload         4
      22: iconst_1
      23: aload_2
      24: checkcast     #12                 // class java/lang/String
      27: aastore
      28: aload         4
      30: iconst_2
      31: aload_3
      32: checkcast     #12                 // class java/lang/String
      35: aastore
      36: aload         4
      38: areturn
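For readers who find bytecode hard to scan, this is roughly the Java source the listing above corresponds to. It is a hand-written approximation with a hypothetical class name. It assumes the first parameter is the method handle the lambda form is invoked on, since the listing never loads slot 0.

```java
import java.util.Arrays;

final class CollectorBytecodeEquivalent {
    // Approximate source for collector001_LLLL_L.
    // Assumption: parameter 0 is the method handle itself and is unused.
    static Object collector(Object mh, Object a1, Object a2, Object a3) {
        String[] arr = new String[3];   // iconst_3; anewarray String
        arr[0] = (String) a1;           // aload_1; checkcast String; aastore
        arr[1] = (String) a2;           // aload_2; checkcast String; aastore
        arr[2] = (String) a3;           // aload_3; checkcast String; aastore
        return arr;                     // aload 4; areturn
    }

    public static void main(String[] args) {
        Object result = collector(null, "a", "b", "c");
        System.out.println(Arrays.toString((String[]) result));
    }
}
```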

Thanks,
Jorn

- Add lambda form sharing
- Add test case for collecting a custom class
@JornVernee
Member Author

@JornVernee JornVernee commented Apr 5, 2021

Addressed latest review comments:

  • Reverted back to using an injected constructor handle (to be able to take advantage of lambda form sharing). Sorry for the back and forth.
  • Added lambda form sharing for empty and reference arrays
  • Added a test case for a custom class to VarargsArrayTest, which catches the illegal access error in the intrinsified case that Vlad pointed out (though the intrinsic itself is now removed).

I also had to switch back to the uncached version for creating the array element setter, since the cached version casts the array to a specific type, and if the lambda form is shared this won't work (e.g. casting an Object[] to String[], depending on the order in which the collectors are created). Adding caching there is left as a follow-up.
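A plain-Java sketch of the sharing problem described above, with hypothetical names (the real code is a LambdaForm, not these methods): a store path typed on Object[] works for any reference array, while one that first casts to String[] breaks as soon as it sees an actual Object[].

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;

final class SharedStoreSketch {
    // A store handle typed for Object[] accepts any reference array, since
    // String[] is a subtype of Object[]; the JVM's usual array-store check
    // still applies to the value being stored.
    static void storeGeneric(Object[] array, int index, Object value) throws Throwable {
        MethodHandle store = MethodHandles.arrayElementSetter(Object[].class); // (Object[],int,Object)void
        store.invokeExact(array, index, value);
    }

    // A store path that first casts to String[] only works for arrays that
    // really are String[]; a plain Object[] fails the cast.
    static void storeViaStringCast(Object[] array, int index, Object value) {
        String[] typed = (String[]) array; // ClassCastException for a plain Object[]
        typed[index] = (String) value;
    }

    public static void main(String[] args) throws Throwable {
        storeGeneric(new String[1], 0, "ok");
        storeGeneric(new Object[1], 0, "ok");
        try {
            storeViaStringCast(new Object[1], 0, "ok");
        } catch (ClassCastException e) {
            System.out.println("String[]-specific store rejected a plain Object[]");
        }
    }
}
```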

I also did some benchmarks where I introduced profile pollution manually (by using collectors of different reference types a bunch in the static initializer), but this didn't affect the results, so I think we're indeed safe there.


@iwanowww iwanowww left a comment

Looks good.

@JornVernee
Member Author

@JornVernee JornVernee commented Apr 5, 2021

Thanks for the reviews. I've submitted one more tier1-3 test run and if that's all green I'll go ahead and integrate this.

@JornVernee
Member Author

@JornVernee JornVernee commented Apr 5, 2021

/integrate

@openjdk openjdk bot closed this Apr 5, 2021
@openjdk openjdk bot added integrated and removed ready rfr labels Apr 5, 2021
@openjdk

@openjdk openjdk bot commented Apr 5, 2021

@JornVernee Since your change was applied there have been 59 commits pushed to the master branch:

  • 9201899: 8264729: Random check-in failing header checks.
  • d920f85: 8264540: WhiteBox.metaspaceReserveAlignment should return shared region alignment
  • 104e925: 8264512: jdk/test/jdk/java/util/prefs/ExportNode.java relies on default platform encoding
  • a0ec2cb: 8248862: Implement Enhanced Pseudo-Random Number Generators
  • 39719da: 8253266: JList and JTable constructors should clear OPAQUE_SET before calling updateUI
  • a8005ef: 8166727: javac crashed: [jimage.dll+0x1942] ImageStrings::find+0x28
  • 7f9ece2: 8264650: Cross-compilation to macos/aarch64
  • 0039c18: 8264475: CopyArea ignores clip state in metal rendering pipeline
  • f084bd2: 8262355: Support for AVX-512 opmask register allocation.
  • 0780666: 8254050: HotSpot Style Guide should permit using the "override" virtual specifier
  • ... and 49 more: https://git.openjdk.java.net/jdk/compare/9061271b0b477dae9db11112a236bedf77df4ac5...master

Your commit was automatically rebased without conflicts.

Pushed as commit b7baca7.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

4 participants