-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8326936: RISC-V: Shenandoah GC crashes due to incorrect atomic memory operations #18039
Conversation
👋 Welcome back MaxXSoft! A progress list of the required criteria for merging this PR into |
Webrevs
|
Hi, I don't witness such an issue when running those regression tests with jdk binaries built with native GCC-12 on Licheepi-4a or Unmatched boards. Do you know why? Does that mean this will only trigger for jdk binaries built with GCC-13? Guess @robehn who owns https://bugs.openjdk.org/browse/JDK-8316186 might also be interested in this. |
Hello, @RealFYang ! I can reproduce this crash using the following environment and configuration:
This bug produces a bad
With this patch, the dump of
(The base commit of jdk is 4dd6c44) The good and bad libjvm.so file: libjvm.so.tar.gz I haven't tried other GCC versions yet. |
Hi, on vacation, back next week. In this case old_value is unsigned, but in the assembly we loaded a sign extended value. Anyhow can't we just use a int32_t when comparing, and cast that back to the 'uint32_t' on return? (and still use _atomic...) Sorry not really available, as I said back next week. |
Hello, @robehn ! Thanks for your reply and suggestion. I tried this before, and this workaround does not work. That's why I reported this bug to GCC, it's so weird.
According to what I observed, it's correct. RISC-V calling convention says:
And then it says:
So the return value should be sign-extended, even if it's unsigned (according to my understand). Here's a simple example: https://godbolt.org/z/n6vfoYPPx. GCC generates
Seems we can't. Check this example for more details: https://godbolt.org/z/cv5ejTfxh. I wrote both the inline assembly version and the built-in version CAS, and tested them on the same code snippet. The inline assembly version works well due to the ABI, but the built-in version is still buggy. I also tried to build JDK with built-in atomic with a unsigned to signed conversion as the above example, and unsurprisingly it failed on Shenandoah-related tests. In addition I removed the redundant
|
Hi, When I use this patch, it doesn't compile, and reports the following error msg:
gcc version:
|
Sorry for that, I fixed this assertion in the latest commit. Now it should compile. Please let me know if not. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, I have several minor comments. BTW: I am curious about why GCC old versions like GCC-11/12 would work. Can you take a look? Thanks.
#elif defined(__GNUC__) && __riscv_xlen <= 32 | ||
#define CORRECT_COMPILER_ATOMIC_SUPPORT | ||
#endif | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't make sense to me to have this CORRECT_COMPILER_ATOMIC_SUPPORT
macro.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reason for adding this macro:
__atomic_compare_exchange
is correct in Clang, we can directly use it withuint32_t
for potential optimization opportunities.- If this bug has been fixed in a later version of GCC, we can enable this macro for it, and leave the macro disabled for other GCC versions.
#ifndef CORRECT_COMPILER_ATOMIC_SUPPORT | ||
template<> | ||
template<typename T> | ||
inline T Atomic::PlatformCmpxchg<4>::operator()(T volatile* dest __attribute__((unused)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a code comment about why we should have this defined? Better to add a link to the related GCC bug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
STATIC_ASSERT(4 == sizeof(T)); | ||
|
||
T old_value; | ||
long flag; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be consistent with the preceding Atomic::PlatformCmpxchg<1>
, I would suggest change long flag;
into uint64_t rc_temp;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
@RealFYang Thanks for your comments!
I'm trying to figure this out. Are you building in |
Yeah, I am talking about linux-riscv64 release build built with GCC-11/12. |
Hey,
This method should, at least often, be inlined, so old_value will not be passed as a return value.
Yes, I don't find a good way to force it. It possible using volatile but it involves storing and load from stack. Thanks, Robbin |
Hello! I tried to build linux-riscv64 release with GCC 12, the test result shows that it works. I also checked the dump of
But it seems that other optimizations interfere with the generated code. If someone changes the source code, the sign extension will disappear due to the GCC bug. Check out this short example simplified from the code of So we still need this patch on GCC 11/12. |
Hi @robehn , Thanks for your suggestions!
You are right. I tried the inline assembly version CAS on Clang (although we don't enable this patch for Clang), and the parameter don't seem to be sign-extended unless we force it: https://godbolt.org/z/fM8n9GMcz. I added an explicit type cast for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
I played a bit inline asm and it's actually handled pretty horrible with regards to zero/sign extensions. It seems like in some cases it is not possible to get the compiler to do the correct thing.
|
@MaxXSoft This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 70 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@RealFYang, @robehn) but any other Committer may sponsor as well. ➡️ To flag this PR as ready for integration with the above commit message, type |
if (order != memory_order_relaxed) { | ||
FULL_MEM_BARRIER; | ||
} | ||
return old_value; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we do an explicit type conversion here? Like: return (T)old_value;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
@RealFYang @robehn Thanks for your review! /integrate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks. The condition for defining macro CORRECT_COMPILER_ATOMIC_SUPPORT
needs to be improved when the gcc bugfix is backported to gcc branches 11, 12 and 13 [1].
[1] https://gcc.gnu.org/pipermail/gcc-patches/2024-February/646822.html
/sponsor |
Going to push as commit a089ed2.
Your commit was automatically rebased without conflicts. |
@RealFYang @MaxXSoft Pushed as commit a089ed2. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
/backport jdk22u |
@MaxXSoft To use the |
/backport jdk22u |
@zifeihan the backport was successfully created on the branch backport-zifeihan-a089ed2b in my personal fork of openjdk/jdk22u. To create a pull request with this backport targeting openjdk/jdk22u:master, just click the following link: The title of the pull request is automatically filled in correctly and below you find a suggestion for the pull request body:
If you need to update the source branch of the pull then run the following commands in a local clone of your personal fork of openjdk/jdk22u:
|
With compressed oops enabled, Shenandoah GC crashes in the concurrent marking phase due to some incorrect atomic memory operations. This resulted in the failure of multiple related tests, including
gc/shenandoah/TestSmallHeap.java
,gc/metaspace/TestMetaspacePerfCounters.java#Shenandoah-64
, and so on, tested on XuanTie C910 and QEMU.This failure is related to a GCC bug we recently discovered: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114130.
In detail, there's a pattern in method
ShenandoahHeap::conc_update_with_forwarded
:atomic_update_oop
then compressesobj
into a 32-bit word and calls the built-in__atomic_compare_exchange
. The latter produces incorrect assembly that comparing this unsigned 32-bit word with the sign-extended result oflr.w
instructions.This bug can be bypassed by adding an explicit null check (like
if (obj && in_collection_set(obj))
), or adding compiler flag-fno-delete-null-pointer-checks
.In previous commits, JDK-8316186 removed RISC-V's
PlatformCmpxchg<4>
but left the flag-fno-delete-null-pointer-checks
enabled. Then JDK-8316893 removed the flag and made the bug visible. This patch addsPlatformCmpxchg<4>
back.Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/18039/head:pull/18039
$ git checkout pull/18039
Update a local copy of the PR:
$ git checkout pull/18039
$ git pull https://git.openjdk.org/jdk.git pull/18039/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 18039
View PR using the GUI difftool:
$ git pr show -t 18039
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/18039.diff
Webrev
Link to Webrev Comment