8351997: AArch64: Interpreter volatile reference stores with G1 are not sequentially consistent #25483
|
👋 Welcome back eosterlund! A progress list of the required criteria for merging this PR into |
|
@fisk This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be: You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 424 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
Webrevs
|
|
/cc hotspot-runtime |
|
Can confirm that this observably mitigates the reported issue with Exchanger in ExchangeLoops. |
|
@robcasloz |
|
/label remove hotspot-compiler |
|
@fisk |
|
Good catch. It's useful to look at how this mistake was made. The code was edited by four or five different authors, none of whom were intimately familiar with the port, leaving traps into which others fell. Silently clobbering a register in a convenience method is so dangerous that it (eh, probably) should never be done.
|
|
I agree with you @theRealAph, but I wonder if we should separate the bug fix (which probably needs extensive back porting), from the very reasonable refactoring you propose. |
|
/label add hotspot |
|
@fisk |
I could live with that, but IMO it's reasonable to do it now. |
tschatzl
left a comment
The AArch64 code seems good. However, the RISC-V code seems broken too, while the other ports look fine.
@RealFYang , maybe you can have a look?
Ouch. I think this is fine without further refactoring, especially since I see it should be backported into JDK 21 as well.
I was wondering why we haven't caught this in jcstress. jcstress runs in int/C1/C2 modes specifically to catch issues like these. I believe this slipped through because all of our seqcst tests, Dekker included, operate on primitives. So we never actually explore what happens with reference loads/stores, and as this bug shows, there are interesting interactions with GC barriers. I'll see how to amend jcstress to cover this...
It's many things, but IMO fine is not one of them. It would surely be better if this evil were expunged from JDK 21 as well, lest it also confuse a backporter.
It's interesting to see how to do that. One question for @fisk : did this problem manifest interpreter-only, or with a combination of interpreted in one thread, compiled in the other? |
Maybe a "here be dragons" warning would suffice. |
|
I tried to follow the Maybe we should add an |
Indeed, yes. My only defence is that I was following the practice in the x86 port. It's not much of an excuse, but there it is...
That wouldn't hurt. |
Yeah it looks like this will need backporting all the way back to 8 - before the barrier registers were explicit at all at the use site, and before there was even an access API.
I was also confused why jcstress hasn't caught this - seems like basic seq cst testing would have hashed this out. Great that you found out why. :-) |
It manifested with -Xint and with -XX:TieredStopAtLevel=1, but mysteriously not when allowing C2 compilation. I traced down the reason to be the extra dmb you added to the following conditional leading fence for volatile loads in the interpreter: This extra leading fence on volatile loads when C2 is available masked the lack of trailing fence on the store side. Therefore, the Dekker duality in the test that hung worked out anyway then. |
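A minimal Java-level sketch of the masking effect described above, not the interpreter code itself. It assumes that `VarHandle.fullFence()` can stand in for the `dmb` the interpreter emits; the class, field, and method names are hypothetical. Each side of the Dekker pattern does a release store followed by an acquire load, and a full fence placed either right after the store (trailing) or right before the load (leading) separates the two. With neither fence the both-null outcome is permitted.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.util.concurrent.CyclicBarrier;

/** Illustrative model only: explicit fences stand in for the interpreter's dmb instructions. */
public class FencePlacement {
    public static class Cell { Object x; Object y; }

    static final VarHandle X, Y;
    static {
        try {
            MethodHandles.Lookup l = MethodHandles.lookup();
            X = l.findVarHandle(Cell.class, "x", Object.class);
            Y = l.findVarHandle(Cell.class, "y", Object.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    /** One side of the Dekker pattern: store to `mine`, then load from `other`. */
    static Object side(Cell c, VarHandle mine, VarHandle other,
                       boolean trailingFence, boolean leadingFence) {
        mine.setRelease(c, new Object());          // store without any implied full fence
        if (trailingFence) VarHandle.fullFence();  // fence belonging to the store
        if (leadingFence) VarHandle.fullFence();   // fence belonging to the load
        return other.getAcquire(c);
    }

    /** Counts runs in which both threads read null from the other side's field. */
    public static int bothNull(int iterations, boolean trailingFence, boolean leadingFence) {
        CyclicBarrier start = new CyclicBarrier(2);
        Object[] seen = new Object[2];
        int count = 0;
        for (int i = 0; i < iterations; i++) {
            final Cell c = new Cell();
            Thread t1 = new Thread(() -> {
                await(start);
                seen[0] = side(c, X, Y, trailingFence, leadingFence);
            });
            Thread t2 = new Thread(() -> {
                await(start);
                seen[1] = side(c, Y, X, trailingFence, leadingFence);
            });
            t1.start();
            t2.start();
            try {
                t1.join();
                t2.join();
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
            if (seen[0] == null && seen[1] == null) count++;
        }
        return count;
    }

    private static void await(CyclicBarrier b) {
        try { b.await(); } catch (Exception e) { throw new RuntimeException(e); }
    }

    public static void main(String[] args) {
        System.out.println("trailing only: " + bothNull(2_000, true, false));
        System.out.println("leading only:  " + bothNull(2_000, false, true));
        System.out.println("no fence:      " + bothNull(2_000, false, false));
    }
}
```

The first two configurations should report zero on a conforming JVM. The third may or may not report a nonzero count, depending on hardware and timing, which is the situation a store without its trailing fence ends up in when the load side has no leading fence.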
Ha, yes. It's a funny old world. For what it's worth, I wanted to use seq cst loads and stores for all volatile accesses in the interpreter, but I was talked out of it. |
Yeah, here it is: ...on Graviton 3: Perhaps confusingly, this only reproduces when I supply In retrospect, I think conditionalizing barrier emit scheme on the presence of particular compilers is counter-productive, especially when the default behavior (C2 is enabled) is to emit the barriers. In this instance, this would have eliminated another degree of freedom in testing, and maybe made this bug less of a bug, but merely a nuisance :) Test starts to pass with the patch from this PR. |
That wouldn't hurt in that function, and indeed in most functions that have a bunch of register arguments. And even better would be to explicitly pass in that temp register instead of hard coding it. Having said that, I hesitate a bit mixing in orthogonal changes to this train that will have to go all the way back to JDK 8. But I'm happy to add that assert for the follow-up patch that tries to strengthen this code for the future. Hope that's okay. |
Thanks for checking.
Yeah I got a bit confused about that too. On the one hand, it looks like a weird optimization for a mode of execution that seems to care less about optimizations. But I suppose the reason for making it conditional might rather have been to be more precise about what the actual constraints are, and not to conservatively mask cases that absolutely should not need the fence for correctness. That way, if our understanding of what we need for correctness is off, we get to find out that something is very wrong, which fortunately we now did. Otherwise we would probably never have noticed it, but it would still arguably be wrong. For example, if the store is interpreted (with the bug, so missing its trailing fence) and the load is C2 compiled, there is still a problem, right? It is just less likely to get that kind of mixed execution in code exercising the races.
Awesome. |
I would have liked that solution. We have MO_ decorators, so would have been pretty neat to do the right thing in the backend instead, and not have mixed seq cst bindings that are subtly different and have to play along with each other. Oh well. |
fbredber
left a comment
I think the problem is well understood by now, and I'm sure this will generate at least one follow-up patch. But for now, I'm happy with this PR as is. Great work!
I like to think so, but I can't rightly remember. At the time, I don't think I knew of any test failures: the problem was purely theoretical. |
If you add the following comment above every call to
|
Hmm yes that feels like a good compromise. I added the comment. |
shipilev
left a comment
Well, since we are introducing hunks near the do_oop_store calls, and thus extending the scope of the patch, at this point we can just inline do_oop_store (and maybe do_oop_load?), like Andrew initially suggested. This will also match what RISC-V already did: c5a1543
RISC-V doesn't really have the problem of backporting all the way to JDK 8. I'd really like to make that cosmetic change in the next follow-up PR instead, as previously discussed. The comments hold true all the way back to JDK 8 and don't change the logic, so I can go along with that. And I'd rather take the risk of getting some comment wrong on the way back to JDK 8 than fiddle with the guts of all this unrelated code, which has changed substantially since back then. Does that sound okay? |
shipilev
left a comment
For me, straight-up inlining:
__ store_heap_oop(dst, val, r10, r11, r3, decorators);
...conveys a similar message to // Clobbers: r10, r11, r3.
But I shall not quibble.
|
Thanks for the reviews everyone! |
|
/integrate |
|
Going to push as commit 83b15da.
Your commit was automatically rebased without conflicts. |
The optimized fast_aputfield bytecode on AArch64 stores the field flags in r3, and performs the leading and trailing fencing depending on its volatile bit being set or not. However, r3 is also the last temp register passed in to the barrier set for reference stores, and G1 clobbers it in a way that may clear the volatile bit. Then the trailing fence won't get executed, and sequential consistency is broken.
My fix puts the flags in r5 instead, which is the register that was used by normal aputfield bytecodes. This way, barriers don't clobber the volatile bits.
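The clobbering described above can be illustrated with a toy register-file model. This is a sketch under stated assumptions, not HotSpot code: the `Reg` enum, the `VOLATILE_BIT` layout, and both methods are hypothetical, and the barrier is simplified to "zero every temp register it is handed". The temp list r10, r11, r3 mirrors the `store_heap_oop` call quoted in the review.

```java
import java.util.EnumMap;
import java.util.Map;

/** Toy model, not HotSpot code: shows how a clobbered temp register can drop a trailing fence. */
public class ClobberSketch {
    public enum Reg { R3, R5, R10, R11 }

    // Hypothetical flag layout: bit 0 marks the field as volatile.
    static final long VOLATILE_BIT = 1L;

    /** Stand-in for a GC store barrier that is free to overwrite its temp registers. */
    static void storeWithBarrier(Map<Reg, Long> regs, Reg... temps) {
        for (Reg t : temps) {
            regs.put(t, 0L); // scratch use destroys whatever was there
        }
    }

    /** Returns true if the trailing fence would still be executed after the store. */
    public static boolean trailingFenceExecuted(Reg flagsReg) {
        Map<Reg, Long> regs = new EnumMap<>(Reg.class);
        for (Reg r : Reg.values()) {
            regs.put(r, 0L);
        }
        regs.put(flagsReg, VOLATILE_BIT);                  // the field is volatile
        storeWithBarrier(regs, Reg.R10, Reg.R11, Reg.R3);  // r3 is the last temp
        return (regs.get(flagsReg) & VOLATILE_BIT) != 0;   // bit tested after the store
    }

    public static void main(String[] args) {
        System.out.println("flags in r3: fence executed = " + trailingFenceExecuted(Reg.R3));
        System.out.println("flags in r5: fence executed = " + trailingFenceExecuted(Reg.R5));
    }
}
```

In this model the run with the flags in r3 reports `false` and the run with the flags in r5 reports `true`, which matches the shape of the fix: keep the flags in a register the barrier is never given as a temp.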
This bug has been observed to mess up a classic Dekker duality in the java.util.concurrent.Exchanger class, leading to a hang in the test/jdk/java/util/concurrent/Exchanger/ExchangeLoops.java test that exercises it. Using G1 and -Xint a reproducer hangs 30/100 times in mach5. With the fix, the same reproducer hangs 0/100 times.
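For readers unfamiliar with the Dekker duality, here is a minimal litmus-style sketch on volatile reference fields. It is neither the ExchangeLoops test nor a jcstress test; the class and method names are hypothetical. Each thread stores a reference into its own field and then loads the other thread's field. Under the Java memory model, volatile accesses are sequentially consistent, so both threads reading null is forbidden. Instance fields are used so that the stores are `putfield` bytecodes.

```java
import java.util.concurrent.CyclicBarrier;

/** Dekker-style litmus test on volatile reference fields (illustrative sketch). */
public class RefDekker {
    public static class Cell {
        volatile Object x;
        volatile Object y;
    }

    /** Runs the litmus test and returns how often both threads read null. */
    public static int forbiddenOutcomes(int iterations) {
        final Object[] seen = new Object[2];
        final CyclicBarrier start = new CyclicBarrier(2);
        int forbidden = 0;
        for (int i = 0; i < iterations; i++) {
            final Cell c = new Cell();
            Thread t1 = new Thread(() -> {
                await(start);
                c.x = new Object(); // volatile reference store, goes through the GC barrier
                seen[0] = c.y;      // volatile load of the other side's field
            });
            Thread t2 = new Thread(() -> {
                await(start);
                c.y = new Object();
                seen[1] = c.x;
            });
            t1.start();
            t2.start();
            try {
                t1.join();
                t2.join();
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
            if (seen[0] == null && seen[1] == null) {
                forbidden++; // neither thread saw the other's store
            }
        }
        return forbidden;
    }

    private static void await(CyclicBarrier b) {
        try { b.await(); } catch (Exception e) { throw new RuntimeException(e); }
    }

    public static void main(String[] args) {
        System.out.println("forbidden outcomes: " + forbiddenOutcomes(10_000));
    }
}
```

A conforming JVM should report zero forbidden outcomes. Starting two fresh threads per iteration keeps the sketch short but makes the race window narrow, so it is far less effective at exposing a missing fence than a dedicated stress harness.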
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/25483/head:pull/25483
$ git checkout pull/25483
Update a local copy of the PR:
$ git checkout pull/25483
$ git pull https://git.openjdk.org/jdk.git pull/25483/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 25483
View PR using the GUI difftool:
$ git pr show -t 25483
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/25483.diff