Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8273127: Shenandoah: Adopt relaxed order for update oop #5299

Closed
wants to merge 1 commit into from

Conversation

@weixlu
Copy link
Contributor

@weixlu weixlu commented Aug 30, 2021

Shenandoah evacuates object and then updates the reference in load barriers. Currently, memory order release is adopted in atomic_update_oop(), and we propose to use relaxed instead. Since memory order conservative is adopted in updating forwardee, this membar ensures that object copy is finished before updating the reference. In the scenario when a thread reads self-healed address from forwardee, relaxed is also fine due to the data dependency of the self-healed address.
This relaxation of memory order brings us some performance improvement. We have run specjbb2015 on AArch64. Critical-JOPS increases by 3% while max-JOPS maintains almost the same.

baseline_1:RUN RESULT: hbIR (max attempted) = 122581, hbIR (settled) = 107513, max-jOPS = 102968, critical-jOPS = 37913
baseline_2:RUN RESULT: hbIR (max attempted) = 122581, hbIR (settled) = 106769, max-jOPS = 101742, critical-jOPS = 37151
baseline_3:RUN RESULT: hbIR (max attempted) = 122581, hbIR (settled) = 110118, max-jOPS = 101742, critical-jOPS = 37041

optimized_1:RUN RESULT: hbIR (max attempted) = 122581, hbIR (settled) = 111708, max-jOPS = 101742, critical-jOPS = 37687
optimized_2:RUN RESULT: hbIR (max attempted) = 147077, hbIR (settled) = 122581, max-jOPS = 104425, critical-jOPS = 39663
optimized_3:RUN RESULT: hbIR (max attempted) = 122581, hbIR (settled) = 107286, max-jOPS = 102968, critical-jOPS = 38579

Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8273127: Shenandoah: Adopt relaxed order for update oop

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/5299/head:pull/5299
$ git checkout pull/5299

Update a local copy of the PR:
$ git checkout pull/5299
$ git pull https://git.openjdk.java.net/jdk pull/5299/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 5299

View PR using the GUI difftool:
$ git pr show -t 5299

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/5299.diff

@bridgekeeper
Copy link

@bridgekeeper bridgekeeper bot commented Aug 30, 2021

👋 Welcome back weixlu! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

Loading

@openjdk openjdk bot added the rfr label Aug 30, 2021
@openjdk
Copy link

@openjdk openjdk bot commented Aug 30, 2021

@weixlu The following labels will be automatically applied to this pull request:

  • hotspot-gc
  • shenandoah

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

Loading

@mlbridge
Copy link

@mlbridge mlbridge bot commented Aug 30, 2021

Webrevs

Loading

@shipilev
Copy link
Contributor

@shipilev shipilev commented Aug 30, 2021

This conflicts with #2496, which would use "release" when updating the forwardee. I remember trying this, and "relaxed" was not sufficient, with observable failures (IIRC, in tier1 with -XX:+UseShenandoahGC).

Loading

@shipilev
Copy link
Contributor

@shipilev shipilev commented Aug 30, 2021

Actually, let me think about it a bit more. Maybe if we do OrderAccess::release() in fwdptr installation (thus still relaxing it from conservative), we can use relaxed during concurrent heap updates.

Loading

@shipilev
Copy link
Contributor

@shipilev shipilev commented Aug 30, 2021

I believe this is fine, yet I am checking other places and running concurrency-heavy tests. The comment deserves to be updated as well, like this: https://cr.openjdk.java.net/~shade/8273127/update-comments.patch -- what do you think?

Loading

@weixlu
Copy link
Contributor Author

@weixlu weixlu commented Aug 31, 2021

@shipilev Thanks for your suggestions. Yes, using memory order release in forwardee installation is reasonable and brings visibly performance improvement. If forwardee installation is “release”, it is incorrect to use “relaxed” in updating reference, since updating reference is possible to reorder before object copy. To fix this, I agree to your suggestion that we can use OrderAccess::release (unbounded release) to replace Atomic::cmpxchg release (bounded release). Unbounded release is enough to ensure correctness in “relaxed” updating reference, but it may result in a little bit side effect in performance, compared to bounded release.
Besides, could we use memory_order_acq_rel in Atomic::cmpxchg of fwdptr installation? I guess this is also enough for “relaxed” updating reference, and it is more relax than memory_order_conservative by one membar on AArch64.
In short, we can

  1. use OrderAccess::release in forwardee installation and memory_order_relaxed in updating reference.
  2. use memory_order_acq_rel in forwardee installation and memory_order_relaxed in updating reference.

I plan to run specjbb on both cases and see which one gives more performance boost. Looking forward to your suggestions!

Loading

@shipilev
Copy link
Contributor

@shipilev shipilev commented Aug 31, 2021

I think forwardee installation is better done with acq_rel, given that two GC threads may try to install it, and the failing thread should consume/acquire the other forwardee. Not sure the OrderAccess::release provides the same semantics. I have updated #2496 with more discussion and implementation.

Loading

@weixlu
Copy link
Contributor Author

@weixlu weixlu commented Sep 2, 2021

@shipilev Here is the specjbb2015 result. opt1 is acq_rel, opt2 is OrderAccess::release

baseline_1:RUN RESULT: hbIR (max attempted) = 34282, hbIR (settled) = 31247, max-jOPS = 30168, critical-jOPS = 20319
baseline_2:RUN RESULT: hbIR (max attempted) = 34282, hbIR (settled) = 32419, max-jOPS = 29825, critical-jOPS = 20986
baseline_3:RUN RESULT: hbIR (max attempted) = 34282, hbIR (settled) = 32419, max-jOPS = 30168, critical-jOPS = 21053
opt1_1:RUN RESULT: hbIR (max attempted) = 34282, hbIR (settled) = 31780, max-jOPS = 29140, critical-jOPS = 16247
opt1_2:RUN RESULT: hbIR (max attempted) = 34282, hbIR (settled) = 32419, max-jOPS = 29140, critical-jOPS = 18131
opt1_3:RUN RESULT: hbIR (max attempted) = 34282, hbIR (settled) = 31780, max-jOPS = 29483, critical-jOPS = 15928
opt2_1:RUN RESULT: hbIR (max attempted) = 34282, hbIR (settled) = 32788, max-jOPS = 30168, critical-jOPS = 22622
opt2_2:RUN RESULT: hbIR (max attempted) = 34282, hbIR (settled) = 31780, max-jOPS = 29825, critical-jOPS = 22462
opt2_3:RUN RESULT: hbIR (max attempted) = 34282, hbIR (settled) = 33186, max-jOPS = 30168, critical-jOPS = 22845

It seems OrderAccess::release performs better than acq_rel.
I notice that you have implemented acquire in getting forwardee in #2496, so I guess the acq here in acq_rel of forwardee installation is redundant? Given the specjbb result, I prefer to use OrderAccess::release here in forwardee installation. It only serves as storestore barrier on AArch64 and it doesn't provide the same semantics as you have mentioned above about consume/acquire the other forwardee.

Loading

@shipilev
Copy link
Contributor

@shipilev shipilev commented Sep 2, 2021

It seems OrderAccess::release performs better than acq_rel.

Interesting result, thanks. I think that's because on current ARM 8.2 acq_rel is effectively seq_cst, which is stronger than release. ARM 8.3 introduces the RCpc atomics that are weaker, but we are not there yet. Anyhow, in #2496, I tried to do the acquire conditionally now, leaving only release CAS to hopefully avoid this pitfall.

I also think that PR now subsumes this one: it does conditionally relaxed heap updates, while leaving "release" for self-healing paths that might run concurrently during evacuation. Would you mind testing #2496 instead? Maybe also #2496 + OrderAccess::release variant to see if we even need it now performance-wise? If you agree, I would suggest closing this PR and moving the rest of the work under #2496.

Loading

@weixlu
Copy link
Contributor Author

@weixlu weixlu commented Sep 3, 2021

@shipilev Sure, let's discuss under #2496 and I will try testing its performance. Thanks for your suggestions.

Loading

@weixlu weixlu closed this Sep 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
2 participants