8252990: Intrinsify Unsafe.storeStoreFence #6136

Closed
shipilev wants to merge 4 commits into master from JDK-8252990-storeStoreFence

@shipilev shipilev commented Oct 27, 2021

Unsafe.storeStoreFence currently delegates to the stronger Unsafe.storeFence. We can teach the compilers to map it directly to the already existing rules that handle MemBarStoreStore. As with the explicit LoadFence/StoreFence, we introduce a special node to differentiate the explicit fence from implicit store-store barriers. storeStoreFence is usually used to simulate safe final-field-like constructions in special JDK classes, like ConstantCallSite and friends.
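A minimal sketch of the "final-field-like" construction idiom, for readers outside the JDK. It assumes the public VarHandle.storeStoreFence() (Java 9+) as a stand-in for the internal jdk.internal.misc.Unsafe.storeStoreFence, which ordinary code cannot call without extra module exports; the Holder class and its fields are hypothetical. The constructor fills non-final fields and then issues a StoreStore fence, so those stores cannot be reordered past the later store that publishes the reference. Racy readers are assumed to rely on dependent-load ordering, as final fields do in practice; that is not a formal JMM guarantee for non-final fields.

```java
import java.lang.invoke.VarHandle;

public class StoreStorePublish {

    // Hypothetical holder with deliberately non-final fields.
    static final class Holder {
        int x;
        int y;

        Holder(int x, int y) {
            this.x = x;
            this.y = y;
            // Keep the initializing stores above ordered before any later
            // store, including the one that publishes this object.
            VarHandle.storeStoreFence();
        }
    }

    // Plain (non-volatile) publication slot.
    static Holder shared;

    public static void main(String[] args) throws InterruptedException {
        Thread writer = new Thread(() -> shared = new Holder(1, 2));
        writer.start();
        // join() already establishes happens-before here; the fence is what
        // matters for readers that race with the writer instead of joining.
        writer.join();
        Holder h = shared;
        System.out.println(h.x + h.y); // prints 3
    }
}
```

The fence sits at the end of the constructor, mirroring where the JIT places the implicit barrier for final fields.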

Motivating performance differences on benchmarks from JDK-8276054, on ARM32 (Raspberry Pi 4):

Benchmark                      Mode  Cnt   Score    Error  Units
Multiple.plain                 avgt    3   2.669 ±  0.004  ns/op
Multiple.release               avgt    3  16.688 ±  0.057  ns/op
Multiple.storeStore            avgt    3  14.021 ±  0.144  ns/op // Better

MultipleWithLoads.plain        avgt    3   4.672 ±  0.053  ns/op
MultipleWithLoads.release      avgt    3  16.689 ±  0.044  ns/op
MultipleWithLoads.storeStore   avgt    3  14.012 ±  0.010  ns/op // Better

MultipleWithStores.plain       avgt    3  14.687 ±  0.009  ns/op
MultipleWithStores.release     avgt    3  45.393 ±  0.192  ns/op
MultipleWithStores.storeStore  avgt    3  38.048 ±  0.033  ns/op // Better

Publishing.plain               avgt    3  27.079 ±  0.201  ns/op
Publishing.release             avgt    3  27.088 ±  0.241  ns/op
Publishing.storeStore          avgt    3  27.009 ±  0.259  ns/op // Within error, hidden by allocation

Single.plain                   avgt    3   2.670 ±  0.002  ns/op
Single.releaseFence            avgt    3   6.675 ±  0.001  ns/op
Single.storeStoreFence         avgt    3   8.012 ±  0.027  ns/op // Worse; seems to be an ARM32 implementation artifact

The same benchmarks on AArch64 (Raspberry Pi 3):

Benchmark                      Mode  Cnt   Score   Error  Units

Multiple.plain                 avgt    3   5.914 ± 0.115  ns/op
Multiple.release               avgt    3  10.149 ± 0.059  ns/op
Multiple.storeStore            avgt    3   6.757 ± 0.138  ns/op // Better

MultipleWithLoads.plain        avgt    3  11.849 ± 0.331  ns/op
MultipleWithLoads.release      avgt    3  35.565 ± 1.144  ns/op
MultipleWithLoads.storeStore   avgt    3  19.441 ± 0.471  ns/op // Better

MultipleWithStores.plain       avgt    3   5.920 ± 0.213  ns/op
MultipleWithStores.release     avgt    3  20.286 ± 0.347  ns/op
MultipleWithStores.storeStore  avgt    3  12.686 ± 0.230  ns/op // Better

Publishing.plain               avgt    3  22.261 ± 1.630  ns/op
Publishing.release             avgt    3  22.269 ± 0.576  ns/op
Publishing.storeStore          avgt    3  17.464 ± 0.397  ns/op // Better

Single.plain                   avgt    3   5.916 ± 0.063  ns/op
Single.release                 avgt    3  10.148 ± 0.401  ns/op
Single.storeStore              avgt    3   6.767 ± 0.164  ns/op // Better

As expected, this does not affect x86_64 at all, because both release and storeStore fences are effectively no-ops there and only constrain compiler optimizations:

Benchmark                      Mode  Cnt  Score   Error  Units

Multiple.plain                 avgt    3  0.406 ± 0.002  ns/op
Multiple.release               avgt    3  0.409 ± 0.018  ns/op
Multiple.storeStore            avgt    3  0.406 ± 0.001  ns/op

MultipleWithLoads.plain        avgt    3  4.328 ± 0.006  ns/op
MultipleWithLoads.release      avgt    3  4.600 ± 0.014  ns/op
MultipleWithLoads.storeStore   avgt    3  4.602 ± 0.006  ns/op

MultipleWithStores.plain       avgt    3  0.812 ± 0.001  ns/op
MultipleWithStores.release     avgt    3  0.812 ± 0.002  ns/op
MultipleWithStores.storeStore  avgt    3  0.812 ± 0.002  ns/op

Publishing.plain               avgt    3  6.370 ± 0.059  ns/op
Publishing.release             avgt    3  6.358 ± 0.436  ns/op
Publishing.storeStore          avgt    3  6.367 ± 0.054  ns/op

Single.plain                   avgt    3  0.407 ± 0.039  ns/op
Single.releaseFence            avgt    3  0.406 ± 0.001  ns/op
Single.storeStoreFence         avgt    3  0.406 ± 0.001  ns/op
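To make the plain/release/storeStore comparison concrete, here is a hypothetical sketch of what the three variants might look like; it is not the JDK-8276054 benchmark code, and the class and method names are invented for illustration. It uses the public VarHandle fences, which document the difference: releaseFence orders prior loads and stores before later stores, while storeStoreFence orders only prior stores before later stores, a strictly weaker constraint that some hardware can implement more cheaply.

```java
import java.lang.invoke.VarHandle;

public class FenceVariants {
    int a;
    int b;

    // No fence: the two stores may be reordered.
    void plain(int v) {
        a = v;
        b = v;
    }

    // Release fence: prior loads AND stores are ordered before later stores.
    void release(int v) {
        a = v;
        VarHandle.releaseFence();
        b = v;
    }

    // Store-store fence: only prior stores are ordered before later stores.
    void storeStore(int v) {
        a = v;
        VarHandle.storeStoreFence();
        b = v;
    }

    public static void main(String[] args) {
        FenceVariants f = new FenceVariants();
        f.plain(1);
        f.release(2);
        f.storeStore(3);
        // Single-threaded, all variants leave the same state; fences only
        // restrict how other threads may observe the order of the stores.
        System.out.println(f.a + " " + f.b); // prints "3 3"
    }
}
```

On x86_64 neither fence is expected to emit a hardware barrier instruction, which is consistent with the flat x86_64 numbers; the fences still prevent the JIT from reordering the stores.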

Additional testing:

  • Linux x86_64 fastdebug tier1
  • Linux AArch64 fastdebug tier1
  • Linux x86_64 Fences benchmark
  • Linux AArch64 Fences benchmark
  • Linux ARM32 Fences benchmark
  • Linux AArch64 jcstress quick run

Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/6136/head:pull/6136
$ git checkout pull/6136

Update a local copy of the PR:
$ git checkout pull/6136
$ git pull https://git.openjdk.java.net/jdk pull/6136/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 6136

View PR using the GUI difftool:
$ git pr show -t 6136

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/6136.diff

@bridgekeeper bridgekeeper bot commented Oct 27, 2021

👋 Welcome back shade! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot commented Oct 27, 2021

@shipilev The following labels will be automatically applied to this pull request:

  • core-libs
  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added hotspot core-libs labels Oct 27, 2021
@shipilev shipilev marked this pull request as ready for review Oct 27, 2021
@shipilev shipilev marked this pull request as draft Oct 27, 2021
@shipilev shipilev marked this pull request as ready for review Oct 27, 2021
@openjdk openjdk bot added the rfr label Oct 27, 2021
@mlbridge mlbridge bot commented Oct 27, 2021

Webrevs

@dholmes-ora dholmes-ora left a comment

I'm certainly no JIT expert, but the pattern for adding the new intrinsic seems consistent with the existing code.

Thanks,
David

src/java.base/share/classes/jdk/internal/misc/Unsafe.java (outdated)
@openjdk openjdk bot commented Oct 28, 2021

@shipilev This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8252990: Intrinsify Unsafe.storeStoreFence

Reviewed-by: dholmes, thartmann, whuang

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 99 new commits pushed to the master branch:

  • 92be9d8: 8276236: Table headers missing in Formatter api docs
  • 9bf3165: 8276164: RandomAccessFile#write method could throw IndexOutOfBoundsException that is not described in javadoc
  • 0488ebd: 8276105: C2: Conv(D|F)2(I|L)Nodes::Ideal should handle rounding correctly
  • acceffc: 8273704: DrawStringWithInfiniteXform.java failed : drawString with InfiniteXform transform takes long time
  • 2eafa03: 8276234: Trivially clean up locale-related code
  • 47e7a42: 8262945: [macos] Regression Manual Test for Key Events Fails
  • 99b7b95: 8276205: Shenandoah: CodeCache_lock should always be held for initializing code cache iteration
  • 9771544: 8260428: Drop support for pre JDK 1.4 DatagramSocketImpl implementations
  • e265f83: 8276107: Preventive collections trigger before maxing out heap
  • c8abe35: 8276121: G1: Remove unused and uninitialized _g1h in g1SATBMarkQueueSet.hpp
  • ... and 89 more: https://git.openjdk.java.net/jdk/compare/f6232982b91cb2314e96ddbde3984836a810a556...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready label Oct 28, 2021

@TobiHartmann TobiHartmann left a comment

That looks good to me.

@Wanghuang-Huawei Wanghuang-Huawei left a comment

LGTM

@shipilev shipilev commented Nov 1, 2021

Finally revived my quiet AArch64 dev board and added AArch64 results, which are even better than on ARM32. Updated the PR description with the perf results.

@shipilev shipilev commented Nov 2, 2021

jcstress and tier1 pass on AArch64. Seems like we are good to go.

/integrate

@openjdk openjdk bot commented Nov 2, 2021

Going to push as commit b7a06be.
Since your change was applied there have been 99 commits pushed to the master branch (the same commits listed above).

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot closed this Nov 2, 2021
@openjdk openjdk bot added integrated and removed ready rfr labels Nov 2, 2021
@openjdk openjdk bot commented Nov 2, 2021

@shipilev Pushed as commit b7a06be.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@shipilev shipilev deleted the JDK-8252990-storeStoreFence branch Nov 8, 2021
Labels
core-libs hotspot integrated
4 participants