8321371: SpinPause() not implemented for bsd_aarch64/macOS #16994
👋 Welcome back fbredberg! A progress list of the required criteria for merging this PR into |
Looks good.
@fbredber This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 196 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@fisk, @dholmes-ora, @dcubed-ojdk, @eastig, @shipilev) but any other Committer may sponsor as well. ➡️ To flag this PR as ready for integration with the above commit message, type |
Seems reasonable to me, but we really need the Aarch64 folk to chime in on this one.
Thanks
Paging @eastig and @theRealAph here. |
Hi @fbredber, If you want to implement
|
Based on |
A better way to find out which instruction to use is to run microbenchmarks/benchmarks. See #6803. |
Hi @eastig, I initially did what you suggest, but after some internal discussions I changed the implementation to a single yield instruction. Here's some background info from JDK-8321371:
Fredrik: My thought was to implement SpinPause() for MacOS by copying the implementation from src/hotspot/os_cpu/linux_aarch64/os_linux_aarch64.cpp to src/hotspot/os_cpu/bsd_aarch64/os_bsd_aarch64.cpp.
@shipilev: I remember trying the same thing along with JDK-8318986, but I realized SpinPause() is only used from the quite hot VM native code, and so it would probably affect GC and runtime performance. The actual Thread.onSpinWait from Java code should already be handled by intrinsics. I suspect the overhead of doing the stub call is already similar to whatever hint we finally emit in the stub, but the WX transition back and forth is likely to be quite bad to make often. SpinWait is quite likely used in busy loops, so this would add up. So if we are doing this, we need to check how much this actually costs.
Fredrik: I was running some performance tests after removing ObjectMonitor::NotRunnable() (see JDK-8320317). The performance went up on Linux x86 and Windows x86 by approximately 12%, but went down by roughly the same amount on macOS AArch64. The performance decreased only slightly on Linux AArch64. So I started to focus on the differences between macOS and Linux on AArch64 and found out that SpinPause() is implemented on Linux but not on macOS. So I copied the source from Linux to macOS (or bsd_aarch64 if you'd like) and re-ran the tests. This seemed to help bring macOS back to the Linux level on AArch64. I do agree that the overhead of doing the stub call is already similar to whatever hint we finally emit in the stub, and that the WX transition back and forth is likely to be quite bad. My measurements showed that among the different OnSpinWaitInst options, "isb" generated the best result.
If we could get rid of the OnSpinWaitInst options and just hard code an isb instruction (or any other instruction that people can agree upon) like it's done on x86, that would probably be best. For now I just wanted macOS AArch64 to be on par with Linux after the removal of NotRunnable().
About measuring the actual cost: some of the performance tests show notoriously unstable values when run multiple times. I've focused on the ones that I feel are stable.
After some internal discussions, it seems like the most reasonable thing to do is to implement SpinPause() using a single inline yield instruction. This way we get rid of both the call and the WX stuff. This solution also showed better performance figures than the OnSpinWaitInst options did. The reason for using the yield instruction instead of the isb instruction (which showed slightly better performance figures) is that the yield instruction is meant for this kind of use case. So even if isb is slightly better on today's silicon, yield is likely to be better in the long run. |
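For readers following along, the "single inline yield instruction" idea can be sketched roughly as below. This is a minimal illustration and not the code from the PR: the names `spin_pause_sketch` and `spin_for` are hypothetical, and the return convention (1 when a hint was emitted, 0 otherwise) is an assumption based on the PR description saying the unimplemented function "only returns 0". On non-AArch64 builds the sketch compiles to a no-op so it stays portable.

```cpp
#include <cassert>

// Hypothetical stand-in for a SpinPause()-like function; not HotSpot's code.
// Assumed convention: return 1 if a hint instruction was emitted, else 0.
extern "C" int spin_pause_sketch() {
#if defined(__aarch64__) && (defined(__GNUC__) || defined(__clang__))
  // YIELD is the AArch64 hint meant for spin-wait loops. Emitting it inline
  // avoids both a stub call and any W^X transition. The "memory" clobber
  // keeps the compiler from reordering memory accesses across the hint.
  __asm__ volatile("yield" ::: "memory");
  return 1;
#else
  // Other targets: no hint emitted in this sketch.
  return 0;
#endif
}

// Calls the hint `iterations` times and returns how many hints were emitted.
int spin_for(int iterations) {
  assert(iterations >= 0);
  int issued = 0;
  for (int i = 0; i < iterations; i++) {
    issued += spin_pause_sketch();
  }
  return issued;
}
```

Calling `spin_for(8)` returns 8 on an AArch64 build with GCC or Clang and 0 elsewhere, which makes it easy to check whether the hint path was compiled in.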
@fbredber I read comments on JDK-8321371.
Also without reviewing your benchmarking results, it is difficult to say what you measured. Could you share them? You might also try another implementation: #5562 (comment) |
According to #5562 (comment), |
@shipilev wrote on JDK-8321371:
I don't know what WX transition is.
As bsd_aarch64 does not define what to use for spin wait, the intrinsic has no code. |
The discussion of the ISB based approach: RFC: AArch64: Implementing spin pauses with ISB |
I wanted the same but it was decided it'd be nice to have other options for spin wait/pause: |
I understand. Here the discussion went something like this: The BSD port is almost strictly a MacOSX port. The configurability is more costly on MacOSX, because of the need to call os::current_thread_enable_wx(). The vast majority of MacOSX users don't want to configure which SpinPause instruction to use; they want something that is good straight out of the box. Since there is a dedicated instruction (yield) in AArch64 for this use case, we ought to use it, because it's in the interest of the CPU vendor to have as good a yield implementation as possible for each and every variety of the CPU. So if the user upgrades to a new CPU version, SpinPause should still perform as well as possible, without the need to reconfigure. For more info about the WX stuff, see here: |
Am I correct that in our current implementation if we call code of any generated stub we need to use
And they don't need to. The configuration feature is for us to define the best spin pause implementation for an AArch64 CPU. As I wrote, if you don't choose an instruction for spin pause, you will have an empty onSpinWait intrinsic.
It looks like vendors have not been interested in
We usually know ahead of a user, when a widely used CPU family gets new features useful for JVM. By the time users get a new CPU, they will get JVM supporting the new hardware features. With the current JDK Project release model, it's more likely users get them as soon as possible. |
@fbredber |
Let’s zoom out and look at the big picture for a bit. So yield is the obvious instruction dedicated for this purpose in the ISA, and has been for a very long time. I suppose early ARM chips didn’t have a whole lot of concurrency and implementing it wasn’t all that beneficial as you frankly didn’t spend considerable time spinning. And now we seemingly have come to a classic chicken and egg problem. Software people like us don’t want to use the obvious yield instruction intended for this exact purpose, because hardware vendors haven’t implemented it. And hardware vendors don’t want to implement it, because no software is using it.
It feels like we had a sort of similar situation with neon vs SVE. All hardware was running neon, and nobody was running SVE. That doesn’t make it very encouraging as a software developer to implement SVE support in software, for an imaginary chip that doesn’t exist. And the fact that software doesn’t implement SVE doesn’t make it very encouraging to implement it in hardware. Yet we did it because it was the right thing to do, and no benchmark thanked us for it.
In the short term, it would seem like a better idea to use ISB, if you look at micro benchmarks. But what we are doing then is IMO what we tell our Java users not to do, and for good reason. We say “don’t use Unsafe to expose JDK and JVM internals that you happen to know how it works today, but you don’t know how it will work tomorrow, so you can look like a winner in a microbenchmark”. We are looking past the intended ISA contract, and look at current implementations today, and finding that as a hack, the ISB instruction which was designed to deal with cross modifying code, and not at all designed for this, is currently a better fit for doing what the yield instruction should have done had it been implemented, as the current ISB implementation has a long latency.
And then a couple of years later, when the next LTS is released, the Apple M3 is released. And then by the time we hit bulk of mainstream adoption, maybe we cycle through M4 and M5, and perhaps we end up with a hyper threaded core with 4 threads per core, and the isb turns out to be a disaster as it impedes progress of the other threads on the core to avoid shared resources being tripped over by cross modifying code, which is the exact opposite behaviour of what a pause instruction should do. Or maybe it won’t happen and everything is fine. The point is that we don’t know.
We are now in a situation where on MacOS the spinning hint does nothing. There is no cost over the baseline of making that spinning hint bind to the obvious yield instruction. It might not do a lot on Apple silicon today, and as of today, isb would probably yield better results (pun intended). But we don’t have to choose ISB just because it looks better today. We can instead be bold and break the chicken and egg situation, and use yield. Then the ball is in the HW court to do the right thing. Because of this, I still think the right thing to do is to bind SpinPause to the obvious ISA intended “yield” instruction, even though it is currently a nop. |
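To make the "spinning hint" concrete, here is a rough sketch of the kind of busy loop where such a hint sits. It is illustrative only and not taken from HotSpot: `cpu_relax_sketch` and `spin_until_set` are hypothetical names, and the bounded spin count is an assumption chosen to keep the example simple.

```cpp
#include <atomic>

// Hypothetical pause hint: YIELD on AArYch64-capable GCC/Clang builds,
// nothing at all on other targets.
static inline void cpu_relax_sketch() {
#if defined(__aarch64__) && (defined(__GNUC__) || defined(__clang__))
  __asm__ volatile("yield" ::: "memory");
#endif
}

// Spins until `flag` becomes true or `max_spins` iterations have passed.
// Returns true if the flag was observed set, false otherwise.
bool spin_until_set(const std::atomic<bool>& flag, int max_spins) {
  for (int i = 0; i < max_spins; i++) {
    if (flag.load(std::memory_order_acquire)) {
      return true;
    }
    // Tell the core we are only waiting, so it may favour other work.
    cpu_relax_sketch();
  }
  // One last check after the final pause.
  return flag.load(std::memory_order_acquire);
}
```

The hint does not change the loop's result; it only affects how the core behaves while waiting, which is why the choice of instruction shows up in benchmarks rather than in correctness.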
@fisk I don't mind to use
I think hardcoding @fbredber @fisk In
In
To enable onSpinWait intrinsic (this or a separate PR) you will need to set
You will also need to update tests:
|
@eastig I think the essence of what you are proposing - having a non-WX solution for making the instruction selection configurable, but defaulting to yield - sounds very reasonable. Good suggestion! |
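The shape of that proposal (instruction selection stays configurable, the default is yield, and no generated stub is called) might look roughly like the sketch below. Everything here is hypothetical: `SpinInst`, `parse_spin_inst` and `spin_pause_with` are illustration-only names, the fallback to yield for unknown names is an assumption (a real VM flag would more likely reject bad values), and returning 0 for "none" is a choice made for this sketch rather than the behaviour of the merged code.

```cpp
#include <cstring>

// The four choices named in the PR description: none, nop, isb, yield.
enum class SpinInst { None, Nop, Isb, Yield };

// Maps an option string to a SpinInst. Assumption for this sketch:
// a null or unrecognised name falls back to Yield, the proposed default.
SpinInst parse_spin_inst(const char* name) {
  if (name == nullptr) return SpinInst::Yield;
  if (std::strcmp(name, "none") == 0) return SpinInst::None;
  if (std::strcmp(name, "nop") == 0) return SpinInst::Nop;
  if (std::strcmp(name, "isb") == 0) return SpinInst::Isb;
  return SpinInst::Yield;
}

// Emits the selected hint inline, with no stub call and no W^X transition.
// Returns 1 if an instruction was emitted, 0 otherwise (sketch convention).
int spin_pause_with(SpinInst inst) {
#if defined(__aarch64__) && (defined(__GNUC__) || defined(__clang__))
  switch (inst) {
    case SpinInst::None:
      return 0;
    case SpinInst::Nop:
      __asm__ volatile("nop" ::: "memory");
      return 1;
    case SpinInst::Isb:
      __asm__ volatile("isb" ::: "memory");
      return 1;
    case SpinInst::Yield:
      __asm__ volatile("yield" ::: "memory");
      return 1;
  }
  return 0;
#else
  (void)inst;  // No hint available on this target in the sketch.
  return 0;
#endif
}
```

A `switch` like this introduces a conditional branch on every call, so it is the simplest form of the idea rather than the tightest one.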
Sounds like a way forward. I'll try it out. Thanks! |
I've now rewritten |
Looks good.
@fbredber
|
I just like to keep away from conditional branches in code that is supposed to be in tight loops. :) |
@fbredber - This PR's description and the final comment in JDK-8321371 |
Thumbs up with a couple of minor nits.
I like the idea of hooking the os_bsd_aarch64 version of SpinPause
into the OnSpinWaitInst option. This will enable easier experimentation
in the future without specially built binaries to switch the SpinPause
implementation.
Fixed |
Thumbs up.
lgtm
Thanks for the review guys. Since it's now the night before my Christmas vacation, I will wait and integrate this PR another year. |
/integrate |
All right, this one is pretty wild, but I somehow like it.
/sponsor |
Going to push as commit fc04750.
Your commit was automatically rebased without conflicts. |
The SpinPause() function only returns 0 on bsd_aarch64 (i.e. macOS).
This PR implements SpinPause() for macOS on AArch64 and makes it possible to choose between none, nop, isb and yield by using the OnSpinWaitInst option. The same functionality is found on AArch64-based Linux platforms.
Tested successfully on macosx-aarch64 tier1-tier5.
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/16994/head:pull/16994
$ git checkout pull/16994
Update a local copy of the PR:
$ git checkout pull/16994
$ git pull https://git.openjdk.org/jdk.git pull/16994/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 16994
View PR using the GUI difftool:
$ git pr show -t 16994
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/16994.diff