Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8252505: C1/C2 compiler support for blackholes #1203

Closed
wants to merge 41 commits into from

Conversation

shipilev
Copy link
Member

@shipilev shipilev commented Nov 13, 2020

JMH uses the Blackhole::consume methods to avoid dead-code elimination of the code that produces benchmark values. It now relies on producing opaque side-effects and breaking inlining. While it was proved useful for many years, it unfortunately comes with several major drawbacks:

  1. Call costs dominate nanobenchmarks. On TR 3970X, the call cost is several nanoseconds.
  2. The work spent in Blackhole.consume dominates nanobenchmarks too. It takes about a nanosecond on TR 3970X.
  3. Argument preparation for call makes different argument types behave differently. This is prominent on architectures where calling conventions for passing e.g. floating-point arguments require elaborate dance.

Supporting this directly in compilers would improve nanobenchmark fidelity.

Instead of introducing public APIs or special-casing JMH methods in JVM, we can hook a new command to compiler control, and let JMH sign up its Blackhole methods for it with -XX:CompileCommand=blackhole,org.openjdk.jmh.infra.Blackhole::consume. This is being prototyped as CODETOOLS-7902762. It makes Blackholes behave substantially better.

C1 code is platform-independent, and it handles blackhole via the intrinsics paths, lowering it to nothing.

C2 code is more complicated. There were four attempts to implement this, and what you see in the PR is the final attempt.

First attempt was to introduce fake store like StoreV ("store void"), and then lower them to nothing. It runs into a series of funky problems: you would like to have at least two shapes of the store to match the store type width not to confuse the optimizer, or even have the whole mirror of Store* hierarchy. Additionally, optimizer tweaks were needed. The awkward problem of GC barrier verification showed up: if StoreV* is a subclass of Store*, then verificators rightfully expect GC barriers before them. If we emit GC, then we need to handle walking over StoreV* nodes in optimizers.

Second attempt was to introduce the special Blackhole node that consumes the values -- basically like C1 implementation does it. Alas, the many attempts to make sure the new node is not DCE'd failed. Roland looked at it, and suggested that there seems to be no way to model the effects we are after: consume the value, but have no observable side effects. So, suggested we instead put the boolean flag onto CallJavaNode, and then match it to nothing in .ad.

...which is the essence of the third attempt. Drag the blackhole through C2 as if it has call-like side effects, and then emit nothing. Instead of boolean flag, the subsequent iteration introduced a full new CallBlackhole node, that is a call as far as optimizers are concerned, and then it is matched to nothing in .ad.

The fourth, and hopefully final attempt is in this PR. It makes the Blackhole the subclass of MemBar, and use the same Matcher path as Op_MemCPUOrder: it does not match to anything, but it survives until matching, and keeps arguments alive. Additionally, C1 and C2 hooks are now using the synthetic _blackhole intrinsic, similarly to existing _compiledLambdaForm. It avoids introducing new nodes in C1. It also seems to require the least fiddling with C2 internals.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

Reviewers

Download

$ git fetch https://git.openjdk.java.net/jdk pull/1203/head:pull/1203
$ git checkout pull/1203

@bridgekeeper
Copy link

bridgekeeper bot commented Nov 13, 2020

👋 Welcome back shade! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Nov 13, 2020

@shipilev The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Nov 13, 2020
@shipilev shipilev marked this pull request as ready for review November 20, 2020 12:26
@openjdk openjdk bot added the rfr Pull request is ready for review label Nov 20, 2020
@mlbridge
Copy link

mlbridge bot commented Nov 20, 2020

@cl4es
Copy link
Member

cl4es commented Nov 20, 2020

Looks like a reasonable enhancement.

Should the BlackholeCommand be a new option instead of a new top-level command? #1276 is about to improve the structure and usability of CompileCommand=option a lot so I suspect it'll be about as straightforward implementation-wise and not that much worse to use (-XX:CompileCommand=Blackhole,<pattern>,true).

When possible I think predicates such as supports_blackhole should be modelled as static const bool fields. Most compilers can't inline methods defined in Matcher, even from code in matcher.cpp. See Matcher::misaligned_doubles_ok et.c.

@shipilev
Copy link
Member Author

Yeah, I can remodel should_blackhole as static const bool and look at #1276 (which I believe integrates soon), and finally pick up the build fixes after foreign-linker integration.

But more importantly, I'd like to see if doing this in .ad is even acceptable. That's my major question here.

@adinn
Copy link
Contributor

adinn commented Nov 25, 2020

But more importantly, I'd like to see if doing this in .ad is even acceptable. That's my major question here.

The problem being that this introduces extra call handling logic into the ad file which clouds the implementation of the arch-specific handling of a CallJavaNode

Modulo the supports_blackholes switch that logic is essentially the same on all platforms, isn't it? Why can you not just fold the relevant case handling into the generated matcher code? Wouldn't that just require a special case switch to decide whether to use the AD file rule to reduce a CallJavaNode vs reducing it to nothing?

@shipilev
Copy link
Member Author

Modulo the supports_blackholes switch that logic is essentially the same on all platforms, isn't it? Why can you not just fold the relevant case handling into the generated matcher code? Wouldn't that just require a special case switch to decide whether to use the AD file rule to reduce a CallJavaNode vs reducing it to nothing?

Is there a guide rail how to do that? Because I cannot see how we can "reduce to nothing" during matching. We still have to match CallJavaNode to something, if not to MachCallJavaNode. For that, we need .ad match rules, AFAICS.

I am exploring if we I can instead do CallBlackholeNode as the subclass of CallJavaNode, and match it directly. It would also allow ask Matcher if underlying .ad supports blackholes by doing is_match_rule_supported. It would still add match rules for blackholes to .ad-s, but at least it would not need to inject into current match rules.

@iwanowww
Copy link

Even if there are no instructions issued, some of the unfortunate effects of a call may be still there (e.g., spills around the place where the node is scheduled, memory state is effectively killed). Fixing that would involve overriding calling conventions, in_RegMask()/out_RegMask(), customize memory effects.

Can you elaborate on your experiment with introducing custom node you mentioned?
Have you tried introducing new control node and just wire data nodes to it?

@adinn
Copy link
Contributor

adinn commented Nov 25, 2020

Is there a guide rail how to do that? Because I cannot see how we can "reduce to nothing" during matching.

Sorry, I should have said 'generated code' not 'generated matcher code'. I wasn't actually thinking of doing anything different during matching per se.

What I thought was you could do something different at emit time. One way would be to redefine the logic of the call node's (generated) emit method. If that code tested the method it was being asked to call and found it marked as a black hole then it could skip executing the emit statements culled from matching rules. That would involve changing the code in adlc/output_c/h.cpp. It might perhaps also require tweaking the code in adlc/formssel.cpp/hpp.

Anyway, given what @iwanowww has said this may be the wrong way to go about it.

@mlbridge
Copy link

mlbridge bot commented Nov 25, 2020

Mailing list message from Andrew Haley on hotspot-compiler-dev:

On 25/11/2020 12:45, Vladimir Ivanov wrote:

Even if there are no instructions issued, some of the unfortunate effects of a call may be still there (e.g., spills around the place where the node is scheduled, memory state is effectively killed). Fixing that would involve overriding calling conventions, in_RegMask()/out_RegMask(), customize memory effects.

Is that even a downside? It does at least allow everything in flight to
become visible. It's certainly an improvement over what we have at the
moment.

But Aleksey, there is an alternative: a store that doesn't do anything.
Did you consider that instead? I guess the problem is that there'd be a lot
more nodes.

--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

@mlbridge
Copy link

mlbridge bot commented Nov 25, 2020

Mailing list message from Vladimir Ivanov on hotspot-compiler-dev:

Even if there are no instructions issued, some of the unfortunate effects of a call may be still there (e.g., spills around the place where the node is scheduled, memory state is effectively killed). Fixing that would involve overriding calling conventions, in_RegMask()/out_RegMask(), customize memory effects.

Is that even a downside? It does at least allow everything in flight to
become visible. It's certainly an improvement over what we have at the
moment.

I'm under the impression that the main driver for this feature is
performance, so I consider anything which affects performance a downside.

But Aleksey, there is an alternative: a store that doesn't do anything.
Did you consider that instead? I guess the problem is that there'd be a lot
more nodes.

Yes, wiring the node into memory graph should work as well.

I don't see why a single node (covering all basic types) can't do the job.

Best regards,
Vladimir Ivanov

@openjdk
Copy link

openjdk bot commented Nov 25, 2020

@shipilev this pull request can not be integrated into master due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:

git checkout JDK-8252505-blackholes
git fetch https://git.openjdk.java.net/jdk master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push

@openjdk openjdk bot added merge-conflict Pull request has merge conflict with target branch hotspot hotspot-dev@openjdk.org and removed rfr Pull request is ready for review hotspot-compiler hotspot-compiler-dev@openjdk.org labels Nov 25, 2020
@shipilev
Copy link
Member Author

Can you elaborate on your experiment with introducing custom node you mentioned?
Have you tried introducing new control node and just wire data nodes to it?

See the updated PR description. Yes, I tried to introduce a new node and just wire the data nodes in it, but then I failed (miserably) to make sure the node is not considered dead by subsequent optimizations. Roland looked at it too, and did not think we can manage it. So we decided instead to piggyback on calls. New version hopefully makes it much cleaner: it is now CallBlackholeJava node. We can try and unhook it from CallJava hierarchy, and try to manage its effects more explicitly, but my prior experience tells me it is not as simple as it looks at the beginning.

@shipilev
Copy link
Member Author

Even if there are no instructions issued, some of the unfortunate effects of a call may be still there (e.g., spills around the place where the node is scheduled, memory state is effectively killed). Fixing that would involve overriding calling conventions, in_RegMask()/out_RegMask(), customize memory effects.

Right, that is what Ideal and RegMask handling in new (Mach)CallBlackholeJava node does. On the upside, it IMO makes Blackhole semantics close to what I want in JMH: it is like a call, but without the actual call. So obvious code generation quirks handled already, I think other effects are good to have.

@shipilev
Copy link
Member Author

Mailing list message from Andrew Haley on hotspot-compiler-dev:
But Aleksey, there is an alternative: a store that doesn't do anything.
Did you consider that instead? I guess the problem is that there'd be a lot
more nodes.

I did try that. It was the very first attempt at doing it in C2, but it is harder than it looks. I updated the PR description with some history of attempts.

@openjdk openjdk bot added the rfr Pull request is ready for review label Nov 25, 2020
@openjdk openjdk bot added ready Pull request is ready to be integrated rfr Pull request is ready for review labels Dec 4, 2020
@shipilev
Copy link
Member Author

shipilev commented Dec 4, 2020

I think testing is all-green in GH Actions. I also passed tier{1,2} locally. Also passes JMH validation tests. I think we are ready to integrate this. @theRealAph, @adinn, @cl4es, want to ack the current patch?

@theRealAph
Copy link
Contributor

I think testing is all-green in GH Actions. I also passed tier{1,2} locally. Also passes JMH validation tests. I think we are ready to integrate this. @theRealAph, @adinn, @cl4es, want to ack the current patch?

I'm testing on AArch64 now.

Copy link
Contributor

@theRealAph theRealAph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AArch64 looks good. This is a substantial improvement. There's still a pesky volatile load of isDone inside the inner loop, but that's for another day.

@shipilev
Copy link
Member Author

shipilev commented Dec 6, 2020

Thank you all, that was a wild ride.

/integrate

@openjdk openjdk bot closed this Dec 6, 2020
@openjdk openjdk bot added integrated Pull request has been integrated and removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Dec 6, 2020
@openjdk
Copy link

openjdk bot commented Dec 6, 2020

@shipilev Since your change was applied there have been 28 commits pushed to the master branch:

  • 972bc3b: 8256167: Convert JDK use of Reference::get to Reference::refersTo
  • 78be334: 8242332: Add SHA3 support to SunPKCS11 provider
  • c4339c3: 8243614: Typo in ReentrantLock's Javadoc
  • d3ac1bf: 8198390: Test MultiResolutionDrawImageWithTransformTest.java fails when -esa is passed
  • 51d325e: 8257633: Missing -mmacosx-version-min=X flag when linking libjvm
  • e27ea4d: 8257750: writeBuffer field of java.io.DataOutputStream should be final
  • dd0b945: 8257531: Super word not applied to a loop of simple Buffer operations
  • d76039d: 8257725: No throws of SSLHandshakeException
  • fcc8479: 8257724: Incorrect package of the linked class in BaseSSLSocketImpl
  • d8ac76f: 8257651: LambdaEagerInit.java test failed in 2 different ways
  • ... and 18 more: https://git.openjdk.java.net/jdk/compare/d08c612b4059f0ffa0dc97b5df45a5d7e4b2de8a...master

Your commit was automatically rebased without conflicts.

Pushed as commit e590618.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@cl4es
Copy link
Member

cl4es commented Dec 6, 2020

Looks much better! No platform dependent changes and seems relatively straightforward.

Since it's only used by BlackholeNode then maybe RegMask::All should have been a field in BlackholeNode rather than a global?

(Caught a cold last week and lost track of this - apologies for going silent and being late to the party here.)

@thomaswue
Copy link
Contributor

thomaswue commented Dec 6, 2020

Is this the first compile command that can be used to modify the functional behaviour of the Java user application?

@shipilev
Copy link
Member Author

shipilev commented Dec 7, 2020

Is this the first compile command that can be used to modify the functional behaviour of the Java user application?

I believe you can say that if we ignore BreakAt* commands.

@thomaswue
Copy link
Contributor

That could set a dangerous precedent as it is otherwise a strongly assumed global invariant that the executing Java code has identical semantics independent of whether it is compiled or interpreted. It is also the first intrinsic that is dynamically configurable from the command line.

I do agree introducing a whitebox API recognized by the compiler for microbenchmarking is the right approach; and much better than the previous one trying to outsmart the compiler with some complex Java code.

Why did you dismiss the option to add a blackhole method to the JDK?

@shipilev
Copy link
Member Author

shipilev commented Dec 7, 2020

That could set a dangerous precedent as it is otherwise a strongly assumed global invariant that the executing Java code has identical semantics independent of whether it is compiled or interpreted. It is also the first intrinsic that is dynamically configurable from the command line.

As I said above, BreakAt already changes the semantics. Compiler commands is a low-level API to instruct compiler to do something special. Here, we want compiler to do something special. blackhole being treated as intrinsic in VM code is the implementation detail. The previous incarnations of the patch were not doing it, and it was only done to simplify the internal compiler hooks.

Why did you dismiss the option to add a blackhole method to the JDK?

Because in my mind, public Java API does not carry its weight here, for a single-purpose use in JMH.

@thomaswue
Copy link
Contributor

It is not an "implementation detail" if the change in semantics is visible to the user application. We are forced to implement this new feature in the GraalVM compiler or there will be unexpected observable differences in behaviour when executing the same Java application with the same command line flags.

It is still a public API if it is a flag on the Java command line.

What do you mean by "does not carry its weight"? Is the process to introduce a Java API too heavy-weight and therefore this solution was chosen?

@shipilev
Copy link
Member Author

shipilev commented Dec 7, 2020

It is not an "implementation detail" if the change in semantics is visible to the user application. We are forced to implement this new feature in the GraalVM compiler or there will be unexpected observable differences in behaviour when executing the same Java application with the same command line flags.

The fact that blackhole is implemented as C1/C2 intrinsic is the implementation detail. It does not have to be, and it would not affect the externally usable CompileCommand. If Graal (or any other JIT compiler, for that matter) wants to hook into OpenJDK JIT compiler infrastructure, then I think it is reasonable to expect it reacts on compile commands in the similar manner.

It is still a public API if it is a flag on the Java command line.

I think that redefines what public API is. JVM flags are not Java public API.

What do you mean by "does not carry its weight"? Is the process to introduce a Java API too heavy-weight and therefore this solution was chosen?

Public Java API comes with the associated maintainability, reliability, security costs. Once you put the method into public Java API, there is (almost) no way back. If you have a bug in public Java API method, you have a problem. If that bug is a security one (e.g. crashing the JVM by 3rd party code), you have a huge problem. For the similar reasons, we have -XX:CompileCommand=inline,*, but @ForceInline is not a public API -- we can instruct compiler to do special thing, but that requires the explicit opt-in by those who control the JVM run.

@thomaswue
Copy link
Contributor

There are similar expectations for backwards compatibility and ongoing maintenance of command line flags (except the ones marked explicitly as "experimental"). There is certainly also an expectation that flags are not creating new security vulnerabilities.

Whether a compiler inlines a method or not is not observable other than performance differences. This new command however can change the semantics of the Java program in unexpected ways by effectively removing the content of a method. The "inline" compile command and this new "blackhole" compile command are therefore not comparable.

I think introducing a blackhole method in the JDK would be the right thing to do instead of short-cutting the process. Maintaining such a method would also be trivial as it is only a few lines of code. Even an empty method would be OK. The method becomes part of the API; the intrinsification is part of the "implementation detail".

@shipilev
Copy link
Member Author

shipilev commented Dec 7, 2020

There are similar expectations for backwards compatibility and ongoing maintenance of command line flags (except the ones marked explicitly as "experimental"). There is certainly also an expectation that flags are not creating new security vulnerabilities.

Whether a compiler inlines a method or not is not observable other than performance differences. This new command however can change the semantics of the Java program in unexpected ways by effectively removing the content of a method. The "inline" compile command and this new "blackhole" compile command are therefore not comparable.

BreakAt compile command also changes the semantics of the Java program.

Speaking of Graal, I see that it already intrinsifies JMH Blackhole methods directly. I cannot see how escapes your argument: compiler changes the semantics by effectively removing the content of a method. And it does so without user explicitly asking for it?

I think introducing a blackhole method in the JDK would be the right thing to do instead of short-cutting the process. Maintaining such a method would also be trivial as it is only a few lines of code. Even an empty method would be OK. The method becomes part of the API; the intrinsification is part of the "implementation detail".

I think you severely underestimate the costs of going that way. LOC count has little to do with the costs here. I have been on that road before, and that is why I believe it does not worth it ("does not carry its weight"), when there is a way to get what JMH wants with the VM-specific compile commands. You are welcome to try proposing the public APIs for this.

I can retract the -XX:CompileCommand addition and intrinsify JMH methods directly, like Graal does. But I think that would be less nice, as it makes JDK aware about non-JDK code, and forces JMH to stick with the current Blackhole Java signatures.

@iwanowww
Copy link

iwanowww commented Dec 7, 2020

As an afterthought, I would file a CSR request for this change (though CompileCommand has diagnostic nature, it is a product flag after all). If it turns out it's not an appropriate addition, a separate diagnostic/experimental flag can be added instead. (Though that would be unfortunate.)

If possible change in behavior is a real concern, the implementation can be changed to affect only effect-free methods (e.g., applied only to empty methods).

The upside of exposing the functionality as a command-line option (diagnostic or experimental, but even product) is it has much lower level of commitment associated with it. If there's enough motivation/justification to graduate it into public Java API, it can be done later when there's enough confidence that it's the right fit there.

@iwanowww
Copy link

iwanowww commented Dec 7, 2020

If it turns out it's not an appropriate addition, a separate diagnostic/experimental flag can be added instead.

Or, as an alternative, -XX:CompileCommand=blackhole could require -XX:+UnlockDiagnosticVMOptions/-XX:+UnlockExperimentalVMOptions to have any effect (as if it was a non-product flag).

@thomaswue
Copy link
Contributor

If possible change in behavior is a real concern, the implementation can be changed to affect only effect-free methods (e.g., applied only to empty methods).

Yes, applying it only to effect-free methods would be perfect. For such methods it is indeed comparable with the other compiler commands.

@simonis
Copy link
Member

simonis commented Dec 7, 2020

This change would have definitely required a CSR as described in Hotspot Command-line Flags: Kinds, Lifecycle and the CSR Process.

Besides that, I don't understand @thomaswue argumentation that he's "forced to implement this new feature in the GraalVM compiler or there will be unexpected observable differences in behaviour when executing the same Java application with the same command line flags". -XX: options are definitely an implementation detail of OpenJDK and as such not related to the Java SE specification in any way. The fact that the -XX: options are sub-devided into product, diagnostic and experimental options is again specific to the OpenJDK project and only denotes the level of support these options get within the project and not their relation to the Java SE specification.

Misusing the new new option will already have unpredictable effects in OpenJDK itself and make it behave "Java SE" incompatible. I don't think there's a requirement that all configurations (i.e. all combinations of command line options) of a "Java SE" implementation have to be "Java SE" compatible and this is especially true for extended options like -XX:. And by the way, in addition to the BreakAt* command mentioned before, we already have other such options like for example -XX:AbortVMOnException= and -XX:AbortVMOnExceptionMessage= which make the OpenJDK behave non-Java SE compatible (not sure if you've implemented these in GraalVM in order to fully emulate OpenJDK :)

As far as I can see, a CSR (JDK-8257827) for this issue has already been created, which is good. Details like declaring this new option product, experimental or diagnostic should be discussed in the CSR, but in the end I don't think the OpenJDK project has a responsibility to justify its implementation details to other projects which try to emulate its behaviour.

@thomaswue
Copy link
Contributor

thomaswue commented Dec 8, 2020

@simonis As one of the many OpenJDK downstream projects, GraalVM inherits by default all code and flags. We add one additional JIT compiler option and test thoroughly that there are zero observable differences other than performance when that option is enabled (which is the default in the GraalVM distribution).

I don't think the OpenJDK project has a responsibility to justify its implementation details to other projects which try to emulate its behaviour.

My understanding as a member of the OpenJDK community was that a GitHub PR like this is the appropriate place to engage in collaborative discussions. Let me know if this kind of engagement is undesirable.

@simonis
Copy link
Member

simonis commented Dec 8, 2020

I've just realized that apparently, after this PR was closed, comments on the PR were still forwarded to the corresponding mailing list thread whereas answers on the mailing list thread were not appended to this PR any more. If you want to enjoy the full conversation please have a look at the hotspot-dev mailing list archive.

PS: I've also opened SKARA-843: Mails are not forwarded to a closed PR any more for the Skara team to check this behaviour.

@shipilev shipilev deleted the JDK-8252505-blackholes branch January 4, 2021 08:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot hotspot-dev@openjdk.org integrated Pull request has been integrated
7 participants