@fisk
Contributor

@fisk fisk commented May 28, 2025

The optimized fast_aputfield bytecode on AArch64 stores the field flags in r3, and performs the leading and trailing fencing depending on its volatile bit being set or not. However, r3 is also the last temp register passed in to the barrier set for reference stores, and G1 clobbers it in a way that may clear the volatile bit. Then the trailing fence won't get executed, and sequential consistency is broken.

My fix puts the flags in r5 instead, which is the register that was used by normal aputfield bytecodes. This way, barriers don't clobber the volatile bits.
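
The clobbering described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the register file is a plain array, the bit position `VOLATILE_BIT` and the helper names are hypothetical, and the "barrier" simply zeroes its temp register, whereas the real G1 barrier only clobbers it in a way that may clear the bit.

```java
final class FlagsClobberModel {
    // Hypothetical bit position; the real one is ResolvedFieldEntry::is_volatile_shift.
    static final long VOLATILE_BIT = 1L;

    // Toy stand-in for a reference-store barrier that uses its last temp as scratch.
    // Assumption: zeroing the temp models "clobbers it in a way that may clear the bit".
    static void storeBarrier(long[] regs, int lastTemp) {
        regs[lastTemp] = 0;
    }

    // Returns true if the trailing fence would still run for a volatile reference
    // store when the field flags are kept in register number flagsReg.
    static boolean trailingFenceRuns(int flagsReg) {
        long[] regs = new long[32];
        regs[flagsReg] = VOLATILE_BIT;   // flags loaded before the store
        storeBarrier(regs, 3);           // r3 is passed as the last temp register
        return (regs[flagsReg] & VOLATILE_BIT) != 0;
    }

    public static void main(String[] args) {
        System.out.println("flags in r3, fence runs: " + trailingFenceRuns(3));
        System.out.println("flags in r5, fence runs: " + trailingFenceRuns(5));
    }
}
```

In this model, keeping the flags in r3 loses the volatile bit across the barrier, while keeping them in r5 preserves it.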

This bug has been observed to mess up a classic Dekker duality in the java.util.concurrent.Exchanger class, leading to a hang in the test/jdk/java/util/concurrent/Exchanger/ExchangeLoops.java test that exercises it. Using G1 and -Xint, a reproducer hangs 30/100 times in mach5. With the fix, the same reproducer hangs 0/100 times.
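
For readers unfamiliar with the Dekker duality, a rough plain-Java approximation might look like the sketch below. The class and method names are hypothetical, and this is not the mach5 reproducer. Each thread stores a reference to its own volatile instance field and then loads the other field; under sequential consistency at least one thread must see the other's store. A single race per thread pair is a weak stressor, so a dedicated harness such as jcstress is far more likely to expose a real failure.

```java
import java.util.concurrent.CountDownLatch;

final class RefDekkerSketch {
    // Instance fields, so that stores go through putfield rather than putstatic.
    volatile Object a;
    volatile Object b;

    // Runs the race `trials` times and counts outcomes where both loads saw null,
    // which sequential consistency forbids.
    static int countForbidden(int trials) {
        int forbidden = 0;
        for (int i = 0; i < trials; i++) {
            RefDekkerSketch s = new RefDekkerSketch();
            Object[] seen = new Object[2];
            CountDownLatch go = new CountDownLatch(1);
            Thread t1 = new Thread(() -> { awaitQuietly(go); s.a = "A"; seen[0] = s.b; });
            Thread t2 = new Thread(() -> { awaitQuietly(go); s.b = "B"; seen[1] = s.a; });
            t1.start();
            t2.start();
            go.countDown();              // release both threads at roughly the same time
            try {
                t1.join();               // join makes the writes to seen[] visible here
                t2.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
            if (seen[0] == null && seen[1] == null) forbidden++;
        }
        return forbidden;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("forbidden outcomes: " + countForbidden(10_000));
    }
}
```

On a correct JVM the count must be zero. Running with -Xint and -XX:+UseG1GC on an affected AArch64 build is the configuration described above, though this sketch is not guaranteed to trigger the bug.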


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8351997: AArch64: Interpreter volatile reference stores with G1 are not sequentially consistent (Bug - P2)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/25483/head:pull/25483
$ git checkout pull/25483

Update a local copy of the PR:
$ git checkout pull/25483
$ git pull https://git.openjdk.org/jdk.git pull/25483/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 25483

View PR using the GUI difftool:
$ git pr show -t 25483

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/25483.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented May 28, 2025

👋 Welcome back eosterlund! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented May 28, 2025

@fisk This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8351997: AArch64: Interpreter volatile reference stores with G1 are not sequentially consistent

Reviewed-by: shade, aph, fbredberg

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 424 new commits pushed to the master branch:

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the rfr Pull request is ready for review label May 28, 2025
@openjdk

openjdk bot commented May 28, 2025

@fisk The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label May 28, 2025
@mlbridge

mlbridge bot commented May 28, 2025

Webrevs

@robcasloz
Contributor

/cc hotspot-runtime

@viktorklang-ora
Contributor

Can confirm that this observably mitigates the reported issue with Exchanger in ExchangeLoops.

@openjdk openjdk bot added the hotspot-runtime hotspot-runtime-dev@openjdk.org label May 28, 2025
@openjdk

openjdk bot commented May 28, 2025

@robcasloz
The hotspot-runtime label was successfully added.

@fisk
Contributor Author

fisk commented May 28, 2025

/label remove hotspot-compiler

@openjdk openjdk bot removed the hotspot-compiler hotspot-compiler-dev@openjdk.org label May 28, 2025
@openjdk

openjdk bot commented May 28, 2025

@fisk
The hotspot-compiler label was successfully removed.

@theRealAph
Contributor

Good catch.

It's useful to look at how this mistake was made. The code was edited by four or five different authors, none of whom were intimately familiar with the port, leaving traps into which others fell. Silently clobbering a register in a convenience method is so dangerous that it (eh, probably) should never be done.

do_oop_store does nothing useful. Please delete it and replace its usages with explicit calls to store_heap_oop, with clearly labelled parameters, like this:

diff --git a/src/hotspot/cpu/aarch64/templateTable_aarch64.cpp b/src/hotspot/cpu/aarch64/templateTable_aarch64.cpp
index 4c1e4ce3a05..bf688cc01b7 100644
--- a/src/hotspot/cpu/aarch64/templateTable_aarch64.cpp
+++ b/src/hotspot/cpu/aarch64/templateTable_aarch64.cpp
@@ -1144,7 +1144,7 @@ void TemplateTable::aastore() {
   // Get the value we will store
   __ ldr(r0, at_tos());
   // Now store using the appropriate barrier
-  do_oop_store(_masm, element_address, r0, IS_ARRAY);
+  __ store_heap_oop(element_address, r0, /*temps*/ r10, r11, r3, IS_ARRAY);
   __ b(done);
 
   // Have a null in r0, r3=array, r2=index.  Store null at ary[idx]

@fisk
Contributor Author

fisk commented May 28, 2025

I agree with you @theRealAph, but I wonder if we should separate the bug fix (which probably needs extensive back porting), from the very reasonable refactoring you propose.

@fisk
Contributor Author

fisk commented May 28, 2025

/label add hotspot

@openjdk openjdk bot added the hotspot hotspot-dev@openjdk.org label May 28, 2025
@openjdk

openjdk bot commented May 28, 2025

@fisk
The hotspot label was successfully added.

@fisk
Contributor Author

fisk commented May 28, 2025

@robehn BTW looks like RISC-V has the same issue. Thanks @tschatzl for noticing.

@theRealAph
Contributor

> I agree with you @theRealAph, but I wonder if we should separate the bug fix (which probably needs extensive back porting), from the very reasonable refactoring you propose.

I could live with that, but IMO it's reasonable to do it now.

Contributor

@tschatzl tschatzl left a comment

The AArch64 code seems good. However, the RISC-V code seems broken too, while the other ports look fine.
@RealFYang, maybe you can have a look?

Member

@shipilev shipilev left a comment

Ouch. I think this is fine without further refactoring, especially since I see it should be backported into JDK 21 as well.

I was wondering why we haven't caught this in jcstress. jcstress runs in int/C1/C2 modes specifically to catch issues like these. I believe this slipped through because all of our seqcst tests, Dekker included, operate on primitives. So we never actually explore what happens with reference load/stores, and as this bug shows, there are interesting interactions with GC barriers. I'll see how to amend jcstress to cover this...

@openjdk openjdk bot added the ready Pull request is ready to be integrated label May 28, 2025
@theRealAph
Contributor

> Ouch. I think this is fine without further refactoring, especially since I see it should be backported into JDK 21 as well.

It's many things, but IMO fine is not one of them. It would surely be better if this evil were expunged from JDK 21 as well, lest it also confuse a backporter.

> I was wondering why we haven't caught this in jcstress. jcstress runs in int/C1/C2 modes specifically to catch issues like these. I believe this slipped through because all of our seqcst tests, Dekker included, operate on primitives. So we never actually explore what happens with reference load/stores, and as this bug shows, there are interesting interactions with GC barriers. I'll see how to amend jcstress to cover this...

It would be interesting to see how to do that. One question for @fisk: did this problem manifest interpreter-only, or with a combination of interpreted in one thread, compiled in the other?

@theRealAph
Contributor

> It would surely be better if this evil were expunged from JDK 21 as well, lest it also confuse a backporter.

Maybe a "here be dragons" warning would suffice.

@robehn
Contributor

robehn commented May 28, 2025

> @robehn BTW looks like RISC-V has the same issue. Thanks @tschatzl for noticing.

Yes, thanks.

@fbredber
Contributor

I tried to follow the r5 register to see if it's safe to use, and got a bit scared when I saw that r5 is used inside TemplateTable::load_resolved_field_entry() as a temp register when calling MacroAssembler::resolve_oop_handle(). But that's no problem since is_static is false. Doing manual register allocation in the interpreter is a roller coaster that travels between hope and despair.

Maybe we should add an assert_different_registers() statement that includes both r5 and rscratch2 after if (is_static) in TemplateTable::load_resolved_field_entry()?

@theRealAph
Contributor

> I tried to follow the r5 register to see if it's safe to use, and got a bit scared when I saw that r5 is used inside TemplateTable::load_resolved_field_entry() as a temp register when calling MacroAssembler::resolve_oop_handle(). But that's no problem since is_static is false. Doing manual register allocation in the interpreter is a roller coaster that travels between hope and despair.

Indeed, yes. My only defence is that I was following the practice in the x86 port. It's not much of an excuse, but there it is...

> Maybe we should add an assert_different_registers() statement that includes both r5 and rscratch2 after if (is_static) in TemplateTable::load_resolved_field_entry()?

That wouldn't hurt.

@fisk
Contributor Author

fisk commented May 28, 2025

> Ouch. I think this is fine without further refactoring, especially since I see it should be backported into JDK 21 as well.

Yeah it looks like this will need backporting all the way back to 8 - before the barrier registers were explicit at all at the use site, and before there was even an access API.

> I was wondering why we haven't caught this in jcstress. jcstress runs in int/C1/C2 modes specifically to catch issues like these. I believe this slipped through because all of our seqcst tests, Dekker included, operate on primitives. So we never actually explore what happens with reference load/stores, and as this bug shows, there are interesting interactions with GC barriers. I'll see how to amend jcstress to cover this...

I was also confused why jcstress hasn't caught this - seems like basic seq cst testing would have hashed this out. Great that you found out why. :-)

@fisk
Contributor Author

fisk commented May 28, 2025

> One question for @fisk: did this problem manifest interpreter-only, or with a combination of interpreted in one thread, compiled in the other?

It manifested with -Xint and with -XX:TieredStopAtLevel=1, but mysteriously not when allowing C2 compilation. I traced down the reason to be the extra dmb you added to the following conditional leading fence for volatile loads in the interpreter:

  // 8179954: We need to make sure that the code generated for
  // volatile accesses forms a sequentially-consistent set of
  // operations when combined with STLR and LDAR.  Without a leading
  // membar it's possible for a simple Dekker test to fail if loads
  // use LDR;DMB but stores use STLR.  This can happen if C2 compiles
  // the stores in one method and we interpret the loads in another.
  if (!CompilerConfig::is_c1_or_interpreter_only_no_jvmci()) {
    Label notVolatile;
    __ tbz(r3, ResolvedFieldEntry::is_volatile_shift, notVolatile);
    __ membar(MacroAssembler::AnyAny);
    __ bind(notVolatile);
  }

This extra leading fence on volatile loads when C2 is available masked the lack of trailing fence on the store side. Therefore, the Dekker duality in the test that hung worked out anyway then.
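
The masking effect can be illustrated with a toy store-buffer model. This is a sketch under loud assumptions: it is not the AArch64 memory model, the names are hypothetical, each thread has a one-entry store buffer that drains at an arbitrary time, and a fence simply waits for the buffer to drain. Both threads run the same program on their own variable and then load the other's. The model cannot distinguish a trailing fence on the store from a leading fence on the load, which is why either one alone is enough to rule out the forbidden outcome here.

```java
import java.util.List;

final class StoreBufferModel {
    enum Op { STORE, FENCE, LOAD }

    // True if some interleaving lets both threads miss each other's store.
    static boolean forbiddenReachable(List<Op> program) {
        return explore(program, new int[2], new boolean[2], new boolean[2], new boolean[2]);
    }

    // Exhaustive search over interleavings. pc: next op per thread; buffered: store
    // still in the thread's buffer; visible: store reached memory; missed: the
    // thread's load did not observe the other thread's store.
    private static boolean explore(List<Op> prog, int[] pc, boolean[] buffered,
                                   boolean[] visible, boolean[] missed) {
        if (pc[0] == prog.size() && pc[1] == prog.size()) {
            return missed[0] && missed[1];
        }
        for (int t = 0; t < 2; t++) {
            if (buffered[t]) {                       // choice: thread t's buffer drains
                boolean[] b = buffered.clone();
                boolean[] v = visible.clone();
                b[t] = false;
                v[t] = true;
                if (explore(prog, pc, b, v, missed)) return true;
            }
            if (pc[t] < prog.size()) {               // choice: thread t runs its next op
                Op op = prog.get(pc[t]);
                if (op == Op.FENCE && buffered[t]) continue;  // fence waits for the drain
                int[] p = pc.clone();
                boolean[] b = buffered.clone();
                boolean[] m = missed.clone();
                p[t]++;
                if (op == Op.STORE) b[t] = true;
                if (op == Op.LOAD) m[t] = !visible[1 - t];
                if (explore(prog, p, b, visible, m)) return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("no fence:   " + forbiddenReachable(List.of(Op.STORE, Op.LOAD)));
        System.out.println("with fence: " + forbiddenReachable(List.of(Op.STORE, Op.FENCE, Op.LOAD)));
    }
}
```

In this model the store-then-load program can reach the "both missed" outcome, while any fence between the store and the load excludes it.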

@theRealAph
Contributor

> This extra leading fence on volatile loads when C2 is available masked the lack of trailing fence on the store side. Therefore, the Dekker duality in the test that hung worked out anyway then.

Ha, yes. It's a funny old world. For what it's worth, I wanted to use seq cst loads and stores for all volatile accesses in the interpreter, but I was talked out of it.

@shipilev
Member

shipilev commented May 28, 2025

> I was wondering why we haven't caught this in jcstress. jcstress runs in int/C1/C2 modes specifically to catch issues like these. I believe this slipped through because all of our seqcst tests, Dekker included, operate on primitives. So we never actually explore what happens with reference load/stores, and as this bug shows, there are interesting interactions with GC barriers. I'll see how to amend jcstress to cover this...

Yeah, here it is:

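package org.openjdk.jcstress.tests.volatiles;

import org.openjdk.jcstress.annotations.Actor;
import org.openjdk.jcstress.annotations.JCStressTest;
import org.openjdk.jcstress.annotations.Outcome;
import org.openjdk.jcstress.annotations.State;
import org.openjdk.jcstress.infra.results.LL_Result;

import static org.openjdk.jcstress.annotations.Expect.ACCEPTABLE;
import static org.openjdk.jcstress.annotations.Expect.FORBIDDEN;

@JCStressTest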
@Outcome(id = {"null, A", "B, null", "B, A"}, expect = ACCEPTABLE, desc = "Trivial under sequential consistency")
@Outcome(id = "null, null",                   expect = FORBIDDEN,  desc = "Violates sequential consistency")
@State
public class RefDekkerTest {
    volatile Object a;
    volatile Object b;

    @Actor
    public void actor1(LL_Result r) {
        a = new String("A");
        r.r1 = b;
    }

    @Actor
    public void actor2(LL_Result r) {
        b = new String("B");
        r.r2 = a;
    }
}

...on Graviton 3:

% build/linux-aarch64-server-release/images/jdk/bin/java -jar jcstress.jar -t RefDekker -tb 1m -f 10 -sc false

...... [FAILED] o.o.j.t.volatiles.RefDekkerTest

  Results across all configurations:

      RESULT      SAMPLES     FREQ      EXPECT  DESCRIPTION
        B, A      377,670    0.06%  Acceptable  Trivial under sequential consistency
     B, null  288,159,312   46.22%  Acceptable  Trivial under sequential consistency
     null, A  331,478,806   53.17%  Acceptable  Trivial under sequential consistency
  null, null    3,385,362    0.54%   Forbidden  Violates sequential consistency

org.openjdk.jcstress.tests.volatiles.RefDekkerTest [-XX:TieredStopAtLevel=1]: Observed forbidden state: null, null (Violates sequential consistency)
org.openjdk.jcstress.tests.volatiles.RefDekkerTest [-Xint]: Observed forbidden state: null, null (Violates sequential consistency)

Perhaps confusingly, this only reproduces when I supply -sc false. The "normal" way for jcstress to separately compile/interpret methods is via compiler control; this is what -sc true (the default) does. That mode, I think, accidentally passes for the reason in Erik's comment above: #25483 (comment) -- we still interpret in the mode that has the extra fence. With -sc false, the more blunt -Xint and -XX:TieredStopAtLevel=1 get used, and those are seen to fail.

In retrospect, I think conditionalizing the barrier emission scheme on the presence of particular compilers is counter-productive, especially when the default behavior (C2 is enabled) is to emit the barriers. In this instance, emitting them unconditionally would have eliminated another degree of freedom in testing, and maybe made this less of a bug and merely a nuisance :)

Test starts to pass with the patch from this PR.

@fisk
Contributor Author

fisk commented May 28, 2025

> I tried to follow the r5 register to see if it's safe to use, and got a bit scared when I saw that r5 is used inside TemplateTable::load_resolved_field_entry() as a temp register when calling MacroAssembler::resolve_oop_handle(). But that's no problem since is_static is false. Doing manual register allocation in the interpreter is a roller coaster that travels between hope and despair.

> Maybe we should add an assert_different_registers() statement that includes both r5 and rscratch2 after if (is_static) in TemplateTable::load_resolved_field_entry()?

That wouldn't hurt in that function, and indeed in most functions that have a bunch of register arguments. Even better would be to explicitly pass in that temp register instead of hard coding it. Having said that, I hesitate a bit to mix orthogonal changes into this train that will have to go all the way back to JDK 8. But I'm happy to add that assert in the follow-up patch that tries to strengthen this code for the future. Hope that's okay.

@fisk
Contributor Author

fisk commented May 28, 2025

> Perhaps confusingly, this only reproduces when I supply -sc false. The "normal" way for jcstress to separately compile/interpret methods is via compiler control; this is what -sc true (the default) does. That mode, I think, accidentally passes for the reason in Erik's comment above: #25483 (comment) -- we still interpret in the mode that has the extra fence. With -sc false, the more blunt -Xint and -XX:TieredStopAtLevel=1 get used, and those are seen to fail.

Thanks for checking.

> In retrospect, I think conditionalizing the barrier emission scheme on the presence of particular compilers is counter-productive, especially when the default behavior (C2 is enabled) is to emit the barriers. In this instance, emitting them unconditionally would have simplified testing a bit, and maybe made this bug less of a nuisance :)

Yeah, I got a bit confused about that too. On the one hand, it looks like a weird optimization for a mode of execution that seems to care less about optimizations. But I suppose the reason for making it conditional might rather have been to be more precise about what the actual constraints are, and not to conservatively mask cases that absolutely should not need the fence for correctness. If our understanding of what we need for correctness is off, then something is very wrong and we want to know about it, which fortunately we have now found to be the case. Otherwise we would probably never have noticed it, but it would still arguably be wrong. For example, if the store is interpreted (with the buggy missing trailing fence) and the load is C2 compiled, there is still a problem, right? It is just less likely to get that kind of mixed execution in code exercising the races.

> Test starts to pass with the patch from this PR.

Awesome.

@fisk
Contributor Author

fisk commented May 28, 2025

> > This extra leading fence on volatile loads when C2 is available masked the lack of trailing fence on the store side. Therefore, the Dekker duality in the test that hung worked out anyway then.

> Ha, yes. It's a funny old world. For what it's worth, I wanted to use seq cst loads and stores for all volatile accesses in the interpreter, but I was talked out of it.

I would have liked that solution. We have MO_ decorators, so would have been pretty neat to do the right thing in the backend instead, and not have mixed seq cst bindings that are subtly different and have to play along with each other. Oh well.

Contributor

@fbredber fbredber left a comment

I think the problem is well understood by now, and I'm sure this will generate at least one follow-up patch. But for now, I'm happy with this PR as is. Great work!

@theRealAph
Contributor

> But I suppose the reason for making it conditional might have rather been to be more precise about what the actual constraints are and not try to conservatively mask cases that absolutely should not need the fence for correctness.

I like to think so, but I can't rightly remember. At the time, I don't think I knew of any test failures: the problem was purely theoretical.

@theRealAph
Contributor

> > It would surely be better if this evil were expunged from JDK 21 as well, lest it also confuse a backporter.

> Maybe a "here be dragons" warning would suffice.

If you add the following comment above every call to do_oop_store() I'll approve this patch:

// Clobbers: r10, r11, r3

@fisk
Contributor Author

fisk commented Jun 2, 2025

> > > It would surely be better if this evil were expunged from JDK 21 as well, lest it also confuse a backporter.

> > Maybe a "here be dragons" warning would suffice.

> If you add the following comment above every call to do_oop_store() I'll approve this patch:

> // Clobbers: r10, r11, r3

Hmm yes that feels like a good compromise. I added the comment.

@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Jun 2, 2025
Member

@shipilev shipilev left a comment

Well, we are introducing hunks near the do_oop_store calls, and thus extending the scope of the patch. At this point, we can just inline do_oop_store (and maybe do_oop_load?), as Andrew initially suggested. This will also match what RISC-V already did: c5a1543

@fisk
Contributor Author

fisk commented Jun 2, 2025

> Well, we are introducing hunks near the do_oop_store calls, and thus extending the scope of the patch. At this point, we can just inline do_oop_store (and maybe do_oop_load?), as Andrew initially suggested. This will also match what RISC-V already did: c5a1543

RISC-V doesn't really have the problem of backporting all the way to JDK 8. I'd really like to make that cosmetic change in the next follow-up PR instead, as previously discussed. The comments hold true all the way back to JDK 8 and don't change the logic, so I can go along with that. And I'd rather take the risk of getting some comment wrong on the way back to JDK 8 than fiddle with the guts of all this unrelated code, which has changed substantially since back then. Does that sound okay?

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jun 2, 2025
Member

@shipilev shipilev left a comment

For me, straight-up inlining:

  __ store_heap_oop(dst, val, r10, r11, r3, decorators);

...conveys a similar message to // Clobbers: r10, r11, r3.

But I shall not quibble.

@fisk
Contributor Author

fisk commented Jun 2, 2025

Thanks for the reviews everyone!

@fisk
Contributor Author

fisk commented Jun 2, 2025

/integrate

@openjdk

openjdk bot commented Jun 2, 2025

Going to push as commit 83b15da.
Since your change was applied there have been 427 commits pushed to the master branch:

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Jun 2, 2025
@openjdk openjdk bot closed this Jun 2, 2025
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Jun 2, 2025
@openjdk

openjdk bot commented Jun 2, 2025

@fisk Pushed as commit 83b15da.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Labels

hotspot hotspot-dev@openjdk.org hotspot-runtime hotspot-runtime-dev@openjdk.org integrated Pull request has been integrated

8 participants