8252505: C1/C2 compiler support for blackholes #1203
Conversation
👋 Welcome back shade! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request.
Looks like a reasonable enhancement. Should the […] When possible, I think predicates such as […]
Yeah, I can remodel […]. But more importantly, I'd like to see if doing this in […]
The problem being that this introduces extra call-handling logic into the ad file, which clouds the implementation of the arch-specific handling of a […]. Modulo the supports_blackholes switch, that logic is essentially the same on all platforms, isn't it? Why can you not just fold the relevant case handling into the generated matcher code? Wouldn't that just require a special-case switch to decide whether to use the AD file rule to reduce a […]
Is there a guide rail for how to do that? Because I cannot see how we can "reduce to nothing" during matching. We still have to match […]. I am exploring whether I can instead do […]
Even if there are no instructions issued, some of the unfortunate effects of a call may still be there (e.g., spills around the place where the node is scheduled; memory state is effectively killed). Fixing that would involve overriding calling conventions and in_RegMask()/out_RegMask(), and customizing memory effects. Can you elaborate on the experiment with introducing a custom node you mentioned?
Sorry, I should have said 'generated code', not 'generated matcher code'. I wasn't actually thinking of doing anything different during matching per se. What I thought was that you could do something different at emit time. One way would be to redefine the logic of the call node's (generated) emit method: if that code tested the method it was being asked to call and found it marked as a blackhole, it could skip executing the emit statements culled from the matching rules. That would involve changing the code in adlc/output_c/h.cpp. It might perhaps also require tweaking the code in adlc/formssel.cpp/hpp. Anyway, given what @iwanowww has said, this may be the wrong way to go about it.
Mailing list message from Andrew Haley on hotspot-compiler-dev: On 25/11/2020 12:45, Vladimir Ivanov wrote: […]
Is that even a downside? It does at least allow everything in flight to […]. But Aleksey, there is an alternative: a store that doesn't do anything.
Mailing list message from Vladimir Ivanov on hotspot-compiler-dev:
I'm under the impression that the main driver for this feature is […]
Yes, wiring the node into the memory graph should work as well. I don't see why a single node (covering all basic types) can't do the job.
@shipilev this pull request can not be integrated into master due to one or more merge conflicts. To resolve them, run the following commands in your local repository:

git checkout JDK-8252505-blackholes
git fetch https://git.openjdk.java.net/jdk master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push
See the updated PR description. Yes, I tried to introduce a new node and just wire the data nodes into it, but then I failed (miserably) to make sure the node is not considered dead by subsequent optimizations. Roland looked at it too, and did not think we can manage it. So we decided instead to piggyback on calls. The new version hopefully makes it much cleaner: it is now […]
Right, that is what […]
I did try that. It was the very first attempt at doing it in C2, but it is harder than it looks. I updated the PR description with some history of the attempts.
I think testing is all green in GH Actions. tier{1,2} also passed locally, and the JMH validation tests pass as well. I think we are ready to integrate this. @theRealAph, @adinn, @cl4es, want to ack the current patch?
I'm testing on AArch64 now.
AArch64 looks good. This is a substantial improvement. There's still a pesky volatile load of isDone inside the inner loop, but that's for another day.
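For context, a schematic of the measurement-loop shape being referred to (illustrative names; not actual JMH-generated code). The volatile flag re-read on every iteration is the load mentioned above:

```java
// Schematic benchmark measurement loop (illustrative, not JMH-generated code).
// The loop must re-read the volatile 'isDone' flag on every iteration, which is
// the volatile load inside the inner loop referred to above.
public class LoopShape {
    volatile boolean isDone;   // flipped asynchronously when the measurement window ends
    long operations;

    long work(long x) {
        return x * 0x9E3779B97F4A7C15L + 1;   // stand-in for the benchmark body
    }

    void measurementLoop() {
        long acc = 0;
        do {
            acc += work(acc);
            operations++;
        } while (!isDone);     // volatile read on every iteration
        System.out.println(acc + " after " + operations + " ops");
    }

    public static void main(String[] args) {
        LoopShape ls = new LoopShape();
        new Thread(() -> {
            try { Thread.sleep(100); } catch (InterruptedException ignored) {}
            ls.isDone = true;
        }).start();
        ls.measurementLoop();
    }
}
```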
Thank you all, that was a wild ride. /integrate
@shipilev Since your change was applied there have been 28 commits pushed to the master branch.
Your commit was automatically rebased without conflicts. Pushed as commit e590618. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.
Looks much better! No platform-dependent changes, and it seems relatively straightforward. Since it's only used by […] (Caught a cold last week and lost track of this - apologies for going silent and being late to the party here.)
Is this the first compile command that can be used to modify the functional behaviour of the Java user application?
I believe you can say that, if we ignore […]
That could set a dangerous precedent, as it is otherwise a strongly assumed global invariant that executing Java code has identical semantics independent of whether it is compiled or interpreted. It is also the first intrinsic that is dynamically configurable from the command line. I do agree that introducing a whitebox API recognized by the compiler for microbenchmarking is the right approach, and much better than the previous one of trying to outsmart the compiler with some complex Java code. Why did you dismiss the option to add a blackhole method to the JDK?
As I said above, […]
Because in my mind, a public Java API does not carry its weight here, for a single-purpose use in JMH.
It is not an "implementation detail" if the change in semantics is visible to the user application. We are forced to implement this new feature in the GraalVM compiler, or there will be unexpected observable differences in behaviour when executing the same Java application with the same command line flags. It is still a public API if it is a flag on the Java command line. What do you mean by "does not carry its weight"? Is the process of introducing a Java API too heavyweight, and was this solution therefore chosen?
The fact that […]
I think that redefines what a public API is. JVM flags are not Java public API.
Public Java API comes with associated maintainability, reliability, and security costs. Once you put a method into the public Java API, there is (almost) no way back. If you have a bug in a public Java API method, you have a problem. If that bug is a security one (e.g., crashing the JVM from 3rd-party code), you have a huge problem. For similar reasons, we have […]
There are similar expectations for backwards compatibility and ongoing maintenance of command line flags (except the ones marked explicitly as "experimental"). There is certainly also an expectation that flags do not create new security vulnerabilities. Whether a compiler inlines a method or not is not observable other than through performance differences. This new command, however, can change the semantics of the Java program in unexpected ways by effectively removing the content of a method. The "inline" compile command and this new "blackhole" compile command are therefore not comparable. I think introducing a blackhole method in the JDK would be the right thing to do instead of short-cutting the process. Maintaining such a method would also be trivial, as it is only a few lines of code. Even an empty method would be OK. The method becomes part of the API; the intrinsification is part of the "implementation detail".
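To make the concern concrete, here is a small hedged sketch (hypothetical class and method names, not part of the patch) of how blackholing a method that is not effect-free could make observed behaviour depend on whether callers are compiled:

```java
// Hypothetical illustration of the semantics concern. 'record' has a real side effect;
// if it were registered with -XX:CompileCommand=blackhole,SemanticsConcern::record,
// compiled callers would treat the call as a blackhole and drop the body, while the
// interpreter would still execute it, so the final count could differ between runs.
public class SemanticsConcern {
    static long counter;

    static void record(long x) {
        counter++;   // side effect that a blackholed call would lose
    }

    public static void main(String[] args) {
        for (long i = 0; i < 2_000_000; i++) {
            record(i);
        }
        // Without the blackhole command this prints 2000000; with it, the printed
        // value could be lower once the loop is JIT-compiled (the concern discussed above).
        System.out.println(counter);
    }
}
```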
Speaking of Graal, I see that it already intrinsifies JMH […]
I think you severely underestimate the costs of going that way. LOC count has little to do with the costs here. I have been down that road before, and that is why I believe it is not worth it ("does not carry its weight") when there is a way to get what JMH wants with VM-specific compile commands. You are welcome to try proposing public APIs for this. I can retract the […]
As an afterthought, I would file a CSR request for this change (though CompileCommand has a diagnostic nature, it is a product flag after all). If it turns out it's not an appropriate addition, a separate diagnostic/experimental flag can be added instead. (Though that would be unfortunate.) If a possible change in behavior is a real concern, the implementation can be changed to affect only effect-free methods (e.g., applied only to empty methods). The upside of exposing the functionality as a command-line option (diagnostic or experimental, but even product) is that it carries a much lower level of commitment. If there's enough motivation/justification to graduate it into a public Java API, that can be done later, when there's enough confidence that it's the right fit there.
Or, as an alternative, […]
Yes, applying it only to effect-free methods would be perfect. For such methods it is indeed comparable with the other compiler commands.
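For illustration, a minimal sketch of the "effect-free" shape under discussion (illustrative names, not from the patch): an empty void method whose only role is to keep its argument alive, so there is nothing for blackholing to remove:

```java
// Illustrative effect-free sink: the body is empty, so interpreted and compiled
// semantics stay identical even when calls to it are blackholed.
public final class Sinks {
    public static void sink(long value) {
        // intentionally empty: there is no behavior that blackholing could drop
    }

    public static void main(String[] args) {
        long acc = 0;
        for (long i = 0; i < 1_000; i++) {
            acc += i;
            sink(acc);   // keeps acc "used" without any side effects
        }
        System.out.println(acc);   // same output whether sink() is blackholed or not
    }
}
```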
This change would have definitely required a CSR, as described in "Hotspot Command-line Flags: Kinds, Lifecycle and the CSR Process". Besides that, I don't understand @thomaswue's argument that he's "forced to implement this new feature in the GraalVM compiler or there will be unexpected observable differences in behaviour when executing the same Java application with the same command line flags". Misusing the new option will already have unpredictable effects in OpenJDK itself and make it behave in a "Java SE"-incompatible way. I don't think there's a requirement that all configurations (i.e. all combinations of command line options) of a "Java SE" implementation have to be "Java SE" compatible, and this is especially true for extended options like […]. As far as I can see, a CSR (JDK-8257827) for this issue has already been created, which is good. Details like declaring this new option product, experimental or diagnostic should be discussed in the CSR, but in the end I don't think the OpenJDK project has a responsibility to justify its implementation details to other projects which try to emulate its behaviour.
@simonis As one of the many OpenJDK downstream projects, GraalVM inherits all code and flags by default. We add one additional JIT compiler option and test thoroughly that there are zero observable differences other than performance when that option is enabled (which is the default in the GraalVM distribution).
My understanding as a member of the OpenJDK community was that a GitHub PR like this is the appropriate place to engage in collaborative discussions. Let me know if this kind of engagement is undesirable.
I've just realized that apparently, after this PR was closed, comments on the PR were still forwarded to the corresponding mailing list thread, whereas answers on the mailing list thread were not appended to this PR any more. If you want to enjoy the full conversation, please have a look at the hotspot-dev mailing list archive. PS: I've also opened SKARA-843 ("Mails are not forwarded to a closed PR any more") for the Skara team to check this behaviour.
JMH uses the `Blackhole::consume` methods to avoid dead-code elimination of the code that produces benchmark values. It currently relies on producing opaque side effects and breaking inlining. While that has proved useful for many years, it unfortunately comes with several major drawbacks. Supporting blackholes directly in the compilers would improve nanobenchmark fidelity.

Instead of introducing public APIs or special-casing JMH methods in the JVM, we can hook a new command into compiler control and let JMH sign up its Blackhole methods for it with `-XX:CompileCommand=blackhole,org.openjdk.jmh.infra.Blackhole::consume`. This is being prototyped as CODETOOLS-7902762. It makes Blackholes behave substantially better.

C1 code is platform-independent, and it handles the blackhole via the intrinsics path, lowering it to nothing.
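For illustration, a minimal sketch of the kind of method this command targets and how it would be registered; the class below is a hypothetical stand-in (JMH's real entry point is `org.openjdk.jmh.infra.Blackhole::consume`):

```java
// Hypothetical blackhole-style sink, mirroring what JMH's Blackhole::consume does
// conceptually: the body is deliberately empty, and the compiler hook keeps the
// argument alive while emitting no code for the call.
public final class MyBlackhole {

    public static void consume(int value) {
        // intentionally empty
    }

    public static void consume(Object value) {
        // intentionally empty
    }

    public static void main(String[] args) {
        int acc = 0;
        for (int i = 0; i < 1_000_000; i++) {
            acc += i;
            consume(acc);   // keeps acc from being optimized away in a benchmark-style loop
        }
    }
}

// Registered for blackholing on the command line, e.g. (hypothetical class name):
//   java -XX:CompileCommand=blackhole,MyBlackhole::consume MyBlackhole
// JMH itself would register org.openjdk.jmh.infra.Blackhole::consume this way.
```

The point of the command is that the compilers keep the consumed value alive but emit nothing for the call, so the benchmarked computation cannot be dead-code eliminated.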
C2 code is more complicated. There were four attempts to implement this, and what you see in the PR is the final one.

The first attempt was to introduce a fake store like `StoreV` ("store void") and then lower it to nothing. It ran into a series of funky problems: you would like to have at least two shapes of the store to match the store type width, so as not to confuse the optimizer, or even to mirror the whole `Store*` hierarchy. Additionally, optimizer tweaks were needed. The awkward problem of GC barrier verification also showed up: if `StoreV*` is a subclass of `Store*`, then the verifiers rightfully expect GC barriers before them, and if we emit those barriers, then we need to handle walking over `StoreV*` nodes in the optimizers.

The second attempt was to introduce a special `Blackhole` node that consumes the values, basically like the C1 implementation does. Alas, the many attempts to make sure the new node is not DCE'd failed. Roland looked at it and suggested that there seems to be no way to model the effects we are after: consume the value, but have no observable side effects. So he suggested we instead put a boolean flag onto `CallJavaNode` and then match it to nothing in `.ad`.

...which is the essence of the third attempt: drag the blackhole through C2 as if it has call-like side effects, and then emit nothing. Instead of a boolean flag, a subsequent iteration introduced a full new `CallBlackhole` node, which is a call as far as the optimizers are concerned and is then matched to nothing in `.ad`.

The fourth, and hopefully final, attempt is in this PR. It makes `Blackhole` a subclass of `MemBar` and uses the same `Matcher` path as `Op_MemCPUOrder`: it does not match to anything, but it survives until matching and keeps its arguments alive. Additionally, the C1 and C2 hooks now use the synthetic `_blackhole` intrinsic, similarly to the existing `_compiledLambdaForm`. This avoids introducing new nodes in C1 and seems to require the least fiddling with C2 internals.
Download
$ git fetch https://git.openjdk.java.net/jdk pull/1203/head:pull/1203
$ git checkout pull/1203