Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8252857: AArch64: Shenandoah C1 CAS is not sequentially consistent #280

Closed
wants to merge 2 commits into from

Conversation

@nick-arm
Copy link
Member

@nick-arm nick-arm commented Sep 21, 2020

C1 atomic operations are supposed to be sequentially consistent by
default but the variant in the Shenandoah C1 barrier set assembler only
provides a half-barrier when the CAS succeeds. Added a trailing full
barrier and load-acquire to exactly match the non-Shenandoah C1 CAS
implementation. This prevents any memory accesses following the CAS
operation being observed before it.

Also adjusted the documentation for ShenandoahBarrierSetAssembler
::cmpxchg_oop as it currently claims to be sequentially consistent by
default but it's not clear what the "default" values of acquire and
release should be, and the comment for acquire/release implies to me
that setting them to true would relax the ordering guarantees but
actually it's the opposite. I tried to simplify this and make it less
ambiguous.

One other thing I noticed when reading this: the comment

// Step 4. CAS has failed because the value most recently fetched
// from addr (which is now held in tmp1) ...

is wrong on non-LSE systems because tmp1 is rscratch1 and that is
clobbered by the call to __ cmpxchg() at the end of step3. Although it
doesn't matter because the value in tmp1 is not used after that point.
Adjusted the comment to clarify this.

Tested hotspot_all_no_apps, jdk_core plus jcstress with -jvmArgs
'-XX:+UseShenandoahGC -XX:ShenandoahGCMode=iu -XX:TieredStopAtLevel=1'.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8252857: AArch64: Shenandoah C1 CAS is not sequentially consistent

Reviewers

Download

$ git fetch https://git.openjdk.java.net/jdk pull/280/head:pull/280
$ git checkout pull/280

C1 atomic operations are supposed to be sequentially consistent by
default but the variant in the Shenandoah C1 barrier set assembler only
provides a half-barrier when the CAS succeeds. Added a trailing full
barrier and load-acquire to exactly match the non-Shenandoah C1 CAS
implementation. This prevents any memory accesses following the CAS
operation being observed before it.

Also adjusted the documentation for ShenandoahBarrierSetAssembler
::cmpxchg_oop as it currently claims to be sequentially consistent by
default but it's not clear what the "default" values of acquire and
release should be, and the comment for acquire/release implies to me
that setting them to true would relax the ordering guarantees but
actually it's the opposite. I tried to simplify this and make it less
ambiguous.

One other thing I noticed when reading this: the comment

   // Step 4. CAS has failed because the value most recently fetched
   // from addr (which is now held in tmp1) ...

is wrong on non-LSE systems because tmp1 is rscratch1 and that is
clobbered by the call to __ cmpxchg() at the end of step3. Although it
doesn't matter because the value in tmp1 is not used after that point.
Adjusted the comment to clarify this.

Tested hotspot_all_no_apps, jdk_core plus jcstress with -jvmArgs
'-XX:+UseShenandoahGC -XX:ShenandoahGCMode=iu -XX:TieredStopAtLevel=1'.
@bridgekeeper
Copy link

@bridgekeeper bridgekeeper bot commented Sep 21, 2020

👋 Welcome back ngasson! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr label Sep 21, 2020
@openjdk
Copy link

@openjdk openjdk bot commented Sep 21, 2020

@nick-arm The following labels will be automatically applied to this pull request: hotspot shenandoah.

When this pull request is ready to be reviewed, an RFR email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label (add|remove) "label" command.

@nick-arm
Copy link
Member Author

@nick-arm nick-arm commented Sep 21, 2020

/label add hotspot-gc
/label remove hotspot

@openjdk openjdk bot added the hotspot-gc label Sep 21, 2020
@openjdk
Copy link

@openjdk openjdk bot commented Sep 21, 2020

@nick-arm
The hotspot-gc label was successfully added.

@nick-arm
Copy link
Member Author

@nick-arm nick-arm commented Sep 21, 2020

/label remove hotspot

@openjdk openjdk bot removed the hotspot label Sep 21, 2020
@openjdk
Copy link

@openjdk openjdk bot commented Sep 21, 2020

@nick-arm
The hotspot label was successfully removed.

@openjdk
Copy link

@openjdk openjdk bot commented Sep 21, 2020

@nick-arm The hotspot label was not set.

@mlbridge
Copy link

@mlbridge mlbridge bot commented Sep 21, 2020

Webrevs

Copy link
Contributor

@rkennke rkennke left a comment

Looks good to me. @shipilev should look at it, too.

@openjdk
Copy link

@openjdk openjdk bot commented Sep 21, 2020

@nick-arm This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for more details.

After integration, the commit message for the final commit will be:

8252857: AArch64: Shenandoah C1 CAS is not sequentially consistent

Reviewed-by: rkennke, shade

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 88 new commits pushed to the master branch:

  • e9c1782: 8252752: Clear card table for old regions during scan in G1
  • 276fcee: 8252835: Revert fix for JDK-8246051
  • ca1ed16: 8253639: Change os::attempt_reserve_memory_at parameter order
  • fed3636: 8252219: C2: Randomize IGVN worklist for stress testing
  • 625a935: 8253638: Cleanup os::reserve_memory and remove MAP_FIXED
  • f014854: 8253624: gtest fails when run from make with read-only source directory
  • 7817963: 8247691: [aarch64] Incorrect handling of VM exceptions in C1 deopt stub/traps
  • 79904c1: 8252730: jlink does not create reproducible builds on different servers
  • ea7c47c: 7179006: [macosx] Print-to-file doesn't work: printing to the default printer instead.
  • b66fa8f: 8253572: [windows] CDS archive may fail to open with long file names
  • ... and 78 more: https://git.openjdk.java.net/jdk/compare/34ec1bedd1c2d774c52f3664f9b57a4422877bfe...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready label Sep 21, 2020
@mlbridge
Copy link

@mlbridge mlbridge bot commented Sep 21, 2020

Mailing list message from Andrew Haley on shenandoah-dev:

On 21/09/2020 10:11, Nick Gasson wrote:

C1 atomic operations are supposed to be sequentially consistent by
default but the variant in the Shenandoah C1 barrier set assembler only
provides a half-barrier when the CAS succeeds. Added a trailing full
barrier and load-acquire to exactly match the non-Shenandoah C1 CAS
implementation. This prevents any memory accesses following the CAS
operation being observed before it.

We only need the trailing membar if is_c1_or_interpreter_only(). Otherwise
all volatile loads in C1 are preceded by a leading membar() so we don't
need a trailing one as well.

462 // By default, this operation has relaxed memory ordering

No, it has whatever memory ordering you tell it. Plus, if you pass
both acquire and release, sequential consistency.

--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

@mlbridge
Copy link

@mlbridge mlbridge bot commented Sep 22, 2020

Mailing list message from Nick Gasson on hotspot-gc-dev:

On 09/21/20 18:23 pm, Andrew Haley wrote:

We only need the trailing membar if is_c1_or_interpreter_only(). Otherwise
all volatile loads in C1 are preceded by a leading membar() so we don't
need a trailing one as well.

Yes, OK. But the same applies to LIR_Assembler::casw() and casl().
Should I change those as well? I don't see any reason for them to be
different.

462 // By default, this operation has relaxed memory ordering

No, it has whatever memory ordering you tell it. Plus, if you pass
both acquire and release, sequential consistency.

If volatile load is implemented as "ldr ; dmb ishld" then this function
alone doesn't provide sequential consistency even with acquire and
release both set. That's why the original comment is misleading.

Can you suggest an alternate wording here? Perhaps we can delete all
mention of memory ordering and leave it up to the reader.

--
Thanks,
Nick

@mlbridge
Copy link

@mlbridge mlbridge bot commented Sep 22, 2020

Mailing list message from Andrew Haley on hotspot-gc-dev:

On 22/09/2020 02:47, Nick Gasson wrote:

On 09/21/20 18:23 pm, Andrew Haley wrote:

We only need the trailing membar if is_c1_or_interpreter_only(). Otherwise
all volatile loads in C1 are preceded by a leading membar() so we don't
need a trailing one as well.

Yes, OK. But the same applies to LIR_Assembler::casw() and casl().
Should I change those as well? I don't see any reason for them to be
different.

Neither do I, but I'd rather you didn't touch anything that wasn't
broken.

The history is a little untidy: to begin with I thought we'd be able
to use ldar/stlr instructions as intended by Arm to provide a
fully-sequentially- consistent set of operations, but it became clear
that assumptions about memory fences were so baked into C1 that it'd
be too disruptive to fix.

[ Originally cas used ldaxr;stlxr. On Feb 14 2014 I changed it to be
membar;ldxr;stxr;membar. On April 1 (!) 2014 to ldaxr;stxr;membar. On
Apr 10 2014 to ldaxr;stlxr;membar. This was a the result of much
toing-and-froing with various people, including Arm. ]

Given that we now have a membar in C1 before a volatile load when
needed for compatibility with C2, I don't believe we also need the
trailing membars after volatile stores.

However, I also don't believe that removing them is worth the effort
and risk. But that isn't the same as saying we should start *adding*
unnecessary membars now. C1 performance isn't critical, but stability
is.

462 // By default, this operation has relaxed memory ordering

No, it has whatever memory ordering you tell it. Plus, if you pass
both acquire and release, sequential consistency.

If volatile load is implemented as "ldr ; dmb ishld" then this function
alone doesn't provide sequential consistency even with acquire and
release both set. That's why the original comment is misleading.

I see.

Can you suggest an alternate wording here? Perhaps we can delete all
mention of memory ordering and leave it up to the reader.

Perhaps so.

--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

@mlbridge
Copy link

@mlbridge mlbridge bot commented Sep 22, 2020

Mailing list message from Nick Gasson on hotspot-gc-dev:

On 09/22/20 16:49 pm, Andrew Haley wrote:

The history is a little untidy: to begin with I thought we'd be able
to use ldar/stlr instructions as intended by Arm to provide a
fully-sequentially- consistent set of operations, but it became clear
that assumptions about memory fences were so baked into C1 that it'd
be too disruptive to fix.

[ Originally cas used ldaxr;stlxr. On Feb 14 2014 I changed it to be
membar;ldxr;stxr;membar. On April 1 (!) 2014 to ldaxr;stxr;membar. On
Apr 10 2014 to ldaxr;stlxr;membar. This was a the result of much
toing-and-froing with various people, including Arm. ]

Given that we now have a membar in C1 before a volatile load when
needed for compatibility with C2, I don't believe we also need the
trailing membars after volatile stores.

However, I also don't believe that removing them is worth the effort
and risk. But that isn't the same as saying we should start *adding*
unnecessary membars now. C1 performance isn't critical, but stability
is.

I see, thanks for the explanation. I've added the
is_c1_or_interpreter_only() check with a comment to explain why and
edited the documentation for cmpxchg_oop() a bit more. Hopefully the bot
will post the new webrev link...

--
Thanks,
Nick

@nick-arm
Copy link
Member Author

@nick-arm nick-arm commented Sep 28, 2020

Any more comments on the latest version?

@shipilev
Copy link
Contributor

@shipilev shipilev commented Sep 28, 2020

Sorry for the delay. This looks good, thanks.

@nick-arm
Copy link
Member Author

@nick-arm nick-arm commented Sep 28, 2020

/integrate

@openjdk openjdk bot closed this Sep 28, 2020
@openjdk openjdk bot added integrated and removed ready rfr labels Sep 28, 2020
@openjdk
Copy link

@openjdk openjdk bot commented Sep 28, 2020

@nick-arm Since your change was applied there have been 89 commits pushed to the master branch:

  • c2692f8: 8225329: -XX:+PrintBiasedLockingStatistics causes crash during initia…
  • e9c1782: 8252752: Clear card table for old regions during scan in G1
  • 276fcee: 8252835: Revert fix for JDK-8246051
  • ca1ed16: 8253639: Change os::attempt_reserve_memory_at parameter order
  • fed3636: 8252219: C2: Randomize IGVN worklist for stress testing
  • 625a935: 8253638: Cleanup os::reserve_memory and remove MAP_FIXED
  • f014854: 8253624: gtest fails when run from make with read-only source directory
  • 7817963: 8247691: [aarch64] Incorrect handling of VM exceptions in C1 deopt stub/traps
  • 79904c1: 8252730: jlink does not create reproducible builds on different servers
  • ea7c47c: 7179006: [macosx] Print-to-file doesn't work: printing to the default printer instead.
  • ... and 79 more: https://git.openjdk.java.net/jdk/compare/34ec1bedd1c2d774c52f3664f9b57a4422877bfe...master

Your commit was automatically rebased without conflicts.

Pushed as commit 8e87d46.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
3 participants