shipilev
Member

@shipilev shipilev commented Jul 11, 2024

JDK-8240696 added the native method for Reference.clear. The original patch skipped intrinsification of this method because we thought Reference.clear was not on a performance-sensitive path. However, it shows up prominently in simple benchmarks that touch, for example, ThreadLocal cleanups. See the bug for an example profile with RRWL benchmarks.

We need to know the actual oop strongness/weakness before we call into the C2 Access API, so this work is modeled on the existing code for the refersTo0 intrinsics. C2 Access also needs support for AS_NO_KEEPALIVE on stores.
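
The shape borrowed from refersTo0 can be sketched in plain Java. This is a simplified model, not the JDK source: `SketchReference`, `SketchPhantomReference`, the `Strength` enum and the `lastClearStrength` field are hypothetical stand-ins, and the `clear0` methods here are ordinary Java methods where the real ones are native and intrinsified. The point is only that each class reaches its own `clear0`, so the strength is known at the call site instead of being guessed.

```java
// Simplified model of the clear() -> clearImpl() -> clear0() split.
// All names except clear/clearImpl/clear0 are hypothetical.
class ClearDispatchSketch {
    enum Strength { WEAK, PHANTOM }

    static class SketchReference<T> {
        T referent;
        Strength lastClearStrength; // records which clear0 ran, for illustration only

        SketchReference(T referent) { this.referent = referent; }

        // Public entry point.
        public void clear() { clearImpl(); }

        // Package-private hook that the phantom subclass overrides.
        void clearImpl() { clear0(); }

        // Stand-in for the native/intrinsic method with weak strength.
        private void clear0() {
            referent = null;
            lastClearStrength = Strength.WEAK;
        }
    }

    static class SketchPhantomReference<T> extends SketchReference<T> {
        SketchPhantomReference(T referent) { super(referent); }

        @Override
        void clearImpl() { clear0(); }

        // Separate stand-in, so phantom strength is fixed at this call site.
        private void clear0() {
            referent = null;
            lastClearStrength = Strength.PHANTOM;
        }
    }

    public static void main(String[] args) {
        SketchReference<Object> weak = new SketchReference<>(new Object());
        SketchReference<Object> phantom = new SketchPhantomReference<>(new Object());
        weak.clear();
        phantom.clear();
        System.out.println(weak.lastClearStrength + " " + phantom.lastClearStrength);
    }
}
```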

Additional testing:

  • Linux x86_64 server fastdebug, all
  • Linux AArch64 server fastdebug, all

Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8329597: C2: Intrinsify Reference.clear (Enhancement - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/20139/head:pull/20139
$ git checkout pull/20139

Update a local copy of the PR:
$ git checkout pull/20139
$ git pull https://git.openjdk.org/jdk.git pull/20139/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 20139

View PR using the GUI difftool:
$ git pr show -t 20139

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/20139.diff

Webrev

Link to Webrev Comment

@shipilev
Member Author

shipilev commented Jul 11, 2024

On Mac AArch64, which suffers from both native call and WX transition:

Benchmark                   Mode  Cnt  Score   Error  Units

# Intrinsic OFF
ReferenceClear.phantom      avgt    9  52,297 ± 0,294  ns/op
ReferenceClear.phantom_new  avgt    9  57,075 ± 0,296  ns/op
ReferenceClear.soft         avgt    9  52,567 ± 0,393  ns/op
ReferenceClear.soft_new     avgt    9  57,640 ± 0,264  ns/op
ReferenceClear.weak         avgt    9  53,018 ± 1,285  ns/op
ReferenceClear.weak_new     avgt    9  57,227 ± 0,483  ns/op

# Intrinsic ON (default)
ReferenceClear.phantom      avgt    9   0,780 ± 0,017  ns/op
ReferenceClear.soft         avgt    9   0,784 ± 0,022  ns/op
ReferenceClear.weak         avgt    9   0,793 ± 0,033  ns/op
ReferenceClear.phantom_new  avgt    9   3,018 ± 0,015  ns/op
ReferenceClear.soft_new     avgt    9   3,268 ± 0,014  ns/op
ReferenceClear.weak_new     avgt    9   3,004 ± 0,057  ns/op

On x86_64 m7a.16xlarge, which only suffers from the native call:

Benchmark                   Mode  Cnt  Score   Error  Units

# Intrinsic OFF
ReferenceClear.phantom      avgt    9  14.643 ± 0.049  ns/op
ReferenceClear.soft         avgt    9  14.939 ± 0.438  ns/op
ReferenceClear.weak         avgt    9  14.648 ± 0.081  ns/op
ReferenceClear.phantom_new  avgt    9  19.859 ± 2.405  ns/op
ReferenceClear.soft_new     avgt    9  20.208 ± 1.805  ns/op
ReferenceClear.weak_new     avgt    9  20.385 ± 2.570  ns/op

# Intrinsic ON (default)
ReferenceClear.phantom      avgt    9   0.821 ± 0.010  ns/op
ReferenceClear.soft         avgt    9   0.817 ± 0.007  ns/op
ReferenceClear.weak         avgt    9   0.819 ± 0.010  ns/op
ReferenceClear.phantom_new  avgt    9   4.195 ± 0.729  ns/op
ReferenceClear.soft_new     avgt    9   4.315 ± 0.599  ns/op
ReferenceClear.weak_new     avgt    9   3.986 ± 0.596  ns/op
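
To read the two tables side by side, the speedup is simply the OFF score divided by the ON score. The small program below recomputes it for the `phantom` and `phantom_new` rows from the numbers quoted above. The class and method names are illustrative, and the error bars are ignored, so the ratios are rough.

```java
// Rough speedup = (Intrinsic OFF ns/op) / (Intrinsic ON ns/op); error bars ignored.
class ReferenceClearSpeedup {
    static double speedup(double offNsPerOp, double onNsPerOp) {
        return offNsPerOp / onNsPerOp;
    }

    public static void main(String[] args) {
        // Scores copied from the benchmark tables above.
        System.out.printf("Mac AArch64 phantom:      %.1fx%n", speedup(52.297, 0.780));
        System.out.printf("Mac AArch64 phantom_new:  %.1fx%n", speedup(57.075, 3.018));
        System.out.printf("x86_64 phantom:           %.1fx%n", speedup(14.643, 0.821));
        System.out.printf("x86_64 phantom_new:       %.1fx%n", speedup(19.859, 4.195));
    }
}
```

On these numbers the plain `phantom` case improves by roughly 67x on the Mac and roughly 18x on x86_64, while the `phantom_new` case improves by roughly 19x and 5x respectively.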

@bridgekeeper

bridgekeeper bot commented Jul 11, 2024

👋 Welcome back shade! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Jul 11, 2024

@shipilev This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8329597: C2: Intrinsify Reference.clear

Reviewed-by: rcastanedalo, eosterlund, kvn

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 103 new commits pushed to the master branch:

  • ebc17c7: 8339637: (tz) Update Timezone Data to 2024b
  • e7cf25c: 8340801: Disable ubsan checks in some awt/2d coding
  • 577babf: 8334010: VM crashes with ObjectAlignmentInBytes > GCCardSizeInBytes
  • b9b0bd0: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing
  • 724de68: 8342081: Shenandoah: Remove extra ShenandoahMarkUpdateRefsSuperClosure
  • e4ff553: 8341931: os_linux gtest uses lambdas with explicit capture lists
  • e94e3bb: 8324672: Update jdk/java/time/tck/java/time/TCKInstant.java now() to be more robust
  • 6d7e679: 8340790: Open source several AWT Dialog tests - Batch 4
  • 86ce19e: 8341142: Maintain a single source file for sun.net.www.protocol.jar.JarFileFactory
  • b9cabbe: 8341997: Tests create files in src tree instead of scratch dir
  • ... and 93 more: https://git.openjdk.org/jdk/compare/580eb62dc097efeb51c76b095c1404106859b673...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk

openjdk bot commented Jul 11, 2024

@shipilev The following labels will be automatically applied to this pull request:

  • core-libs
  • graal
  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added graal graal-dev@openjdk.org hotspot hotspot-dev@openjdk.org core-libs core-libs-dev@openjdk.org labels Jul 11, 2024
@shipilev shipilev marked this pull request as ready for review July 11, 2024 19:14
@openjdk openjdk bot added the rfr Pull request is ready for review label Jul 11, 2024
@mlbridge

mlbridge bot commented Jul 11, 2024

@fisk
Contributor

fisk commented Jul 12, 2024

The reason we did not do this before is that this is not a strong reference store. Strong reference stores with a SATB collector will keep the referent alive, which is typically the exact opposite of what a user wants when they clear a Reference.
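
A toy model may help show why a strong store is the wrong tool here. Under snapshot-at-the-beginning (SATB) marking, the pre-write barrier records the value being overwritten, so the collector still treats it as live for the current cycle. The Java sketch below is a simulation with made-up names (`ToySatbHeap`, `storeStrong`, `storeNoKeepAlive`), not HotSpot code: clearing through the strong path logs the old referent, while the no-keepalive path does not.

```java
import java.util.ArrayList;
import java.util.List;

// Toy simulation of an SATB pre-write barrier; all names are hypothetical.
class ToySatbHeap {
    boolean markingActive = true;                       // pretend a concurrent mark is running
    final List<Object> satbBuffer = new ArrayList<>();  // old values kept alive this cycle

    static class Slot {
        Object value;
        Slot(Object value) { this.value = value; }
    }

    // Strong store: the pre-barrier logs the overwritten value.
    void storeStrong(Slot slot, Object newValue) {
        if (markingActive && slot.value != null) {
            satbBuffer.add(slot.value);
        }
        slot.value = newValue;
    }

    // No-keepalive store: the overwritten value is not logged.
    void storeNoKeepAlive(Slot slot, Object newValue) {
        slot.value = newValue;
    }

    public static void main(String[] args) {
        ToySatbHeap heap = new ToySatbHeap();
        Slot a = new Slot(new Object());
        Slot b = new Slot(new Object());
        heap.storeStrong(a, null);       // old referent is recorded, hence kept alive
        heap.storeNoKeepAlive(b, null);  // old referent is simply dropped
        System.out.println("kept alive by barrier: " + heap.satbBuffer.size());
    }
}
```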

@shipilev
Member Author

> The reason we did not do this before is that this is not a strong reference store. Strong reference stores with a SATB collector will keep the referent alive, which is typically the exact opposite of what a user wants when they clear a Reference.

You mean not doing this store just on the Java side? Yes, I agree, it would be awkward. In intrinsic, we are storing with the same decorators that JVM_ReferenceClear is using, which should be good with SATB collectors. Perhaps I am misunderstanding the comment.

@fisk
Contributor

fisk commented Jul 12, 2024

> > The reason we did not do this before is that this is not a strong reference store. Strong reference stores with a SATB collector will keep the referent alive, which is typically the exact opposite of what a user wants when they clear a Reference.
>
> You mean not doing this store just on the Java side? Yes, I agree, it would be awkward. In intrinsic, we are storing with the same decorators that JVM_ReferenceClear is using, which should be good with SATB collectors. Perhaps I am misunderstanding the comment.

The runtime use of the Access API knows how to resolve an unknown oop ref strength using AccessBarrierSupport::resolve_unknown_oop_ref_strength. However, we do not have support for that in the C2 backend. In fact, it does not understand non-strong oop stores at all, because there hasn't really been a use case for it other than clearing a Reference. That's the precise reason why we do not have a clear intrinsic; it would have to add that infrastructure.
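
The resolution step mentioned above can be pictured roughly as follows. This is a Java analogy with hypothetical names (`RefStrength`, `resolve`), and the three-way split is an assumption about how the runtime behaves, not a quote of the HotSpot source: a store that does not target the referent field is treated as strong, and a referent store is phantom or weak depending on the Reference type.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

// Toy analogy for resolving an unknown reference strength at run time.
// RefStrength and resolve are hypothetical names, not HotSpot identifiers.
class UnknownStrengthSketch {
    enum RefStrength { STRONG, WEAK, PHANTOM }

    // base: the object being stored into.
    // isReferentField: whether the store targets the Reference's referent field.
    static RefStrength resolve(Object base, boolean isReferentField) {
        if (!(base instanceof Reference) || !isReferentField) {
            return RefStrength.STRONG;
        }
        return (base instanceof PhantomReference) ? RefStrength.PHANTOM : RefStrength.WEAK;
    }

    public static void main(String[] args) {
        Object target = new Object();
        Reference<Object> weak = new WeakReference<Object>(target);
        Reference<Object> phantom =
                new PhantomReference<Object>(target, new ReferenceQueue<Object>());
        System.out.println(resolve(weak, true));
        System.out.println(resolve(phantom, true));
        System.out.println(resolve(new Object(), false));
    }
}
```

A compiled intrinsic has no such run-time step to lean on, which is why the strength has to be known when the code is generated.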

@shipilev
Member Author

shipilev commented Jul 12, 2024

> The runtime use of the Access API knows how to resolve an unknown oop ref strength using AccessBarrierSupport::resolve_unknown_oop_ref_strength. However, we do not have support for that in the C2 backend. In fact, it does not understand non-strong oop stores at all.

Aw, nice usability landmine. I thought C2 barrier set would assert on me if it cannot deliver. Apparently not, I see it just does pre-barriers when it is not sure what strongness the store has. Hrmpf. OK, let me see what can be done here. It might just be easier to further specialize Reference.clear in subclasses and carry down the actual strongness, like we do with refersTo0 currently. This would still require C2 backend adjustments to handle AS_NO_KEEPALIVE on stores, but at least we would not have to guess about the strongness type in the C2 intrinsic.

@shipilev shipilev marked this pull request as draft July 12, 2024 13:32
@openjdk openjdk bot removed the rfr Pull request is ready for review label Jul 12, 2024
@kimbarrett

kimbarrett commented Jul 15, 2024

> > The runtime use of the Access API knows how to resolve an unknown oop ref strength using AccessBarrierSupport::resolve_unknown_oop_ref_strength. However, we do not have support for that in the C2 backend. In fact, it does not understand non-strong oop stores at all.
>
> Aw, nice usability landmine. I thought C2 barrier set would assert on me if it cannot deliver. Apparently not, [...]

Reference.refersTo has similar issues. See refersToImpl and refersTo0 in both Reference and PhantomReference.
I think you should be able to model on those and the intrinsic implementation for refersTo to get what you want.

One additional complication is that Reference.enqueue intentionally calls clear0. If implementing clear similarly
to refersTo, then enqueue should be changed to call clearImpl.
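
The enqueue point can be illustrated with a small model. The names `SketchRef` and `LoggingRef` and the list standing in for a reference queue are hypothetical. The motivation shown here, that `enqueue` should still clear the referent when a subclass overrides the public `clear()`, is an assumption about why the short-circuit exists; the thread only says it is intentional.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of enqueue() going through clearImpl() rather than clear().
// SketchRef, LoggingRef and the List-based queue are hypothetical.
class EnqueueSketch {
    static class SketchRef<T> {
        T referent;
        final List<SketchRef<?>> queue; // stand-in for a ReferenceQueue

        SketchRef(T referent, List<SketchRef<?>> queue) {
            this.referent = referent;
            this.queue = queue;
        }

        public void clear() { clearImpl(); }

        void clearImpl() { clear0(); }

        private void clear0() { referent = null; } // stand-in for the native method

        public boolean enqueue() {
            clearImpl(); // not clear(): an overridden clear() is bypassed
            return queue.add(this);
        }
    }

    // A subclass whose clear() override does not actually clear anything.
    static class LoggingRef<T> extends SketchRef<T> {
        int clearCalls;

        LoggingRef(T referent, List<SketchRef<?>> queue) { super(referent, queue); }

        @Override
        public void clear() { clearCalls++; }
    }

    public static void main(String[] args) {
        List<SketchRef<?>> queue = new ArrayList<>();
        LoggingRef<Object> ref = new LoggingRef<>(new Object(), queue);
        ref.enqueue();
        System.out.println("referent cleared: " + (ref.referent == null)
                + ", overridden clear() calls: " + ref.clearCalls);
    }
}
```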

@kimbarrett

> > Aw, nice usability landmine. I thought C2 barrier set would assert on me if it cannot deliver. Apparently not, [...]
>
> Reference.refersTo has similar issues. See refersToImpl and refersTo0 in both Reference and PhantomReference. I think you should be able to model on those and the intrinsic implementation for refersTo to get what you want.
>
> One additional complication is that Reference.enqueue intentionally calls clear0. If implementing clear similarly to refersTo, then enqueue should be changed to call clearImpl.

I should have read what I was replying to more carefully, rather than focusing on what was further up in the thread.
Looks like you (@shipilev) already spotted the refersTo stuff. But the enqueue => clear0 could have easily been missed,
so perhaps not an entirely unneeded suggestion.

@shipilev shipilev force-pushed the JDK-8329597-intrinsify-reference-clear branch from fecf4af to ba820da Compare July 16, 2024 18:50
@shipilev
Member Author

> I should have read what I was replying to more carefully, rather than focusing on what was further up in the thread. Looks like you (@shipilev) already spotted the refersTo stuff. But the enqueue => clear0 could have easily been missed, so perhaps not an entirely unneeded suggestion.

Yeah, thanks. The enqueue => clear0 was indeed easy to miss.

Pushed a crude prototype that follows the refersTo example and drills some new AS_NO_KEEPALIVE holes in the C2 Access API to cover this intrinsic case. Super untested. IR tests are still failing; I'll take a more in-depth look there. (Perhaps it will not be possible to clearly match the absence of the pre-barrier in IR tests, we'll see.)

@shipilev
Member Author

Split out the refersTo test to #20215.

@shipilev
Member Author

shipilev commented Jul 17, 2024

Yeah, so this version seems to work well on tests.

I am being extra paranoid about only accepting null stores, since AS_NO_KEEPALIVE means all other barriers like inter-generational post-barriers in G1 should still work. G1 barrier set delegates the stores to CardTable/ModRefBarrierSet, which: a) does not know which barriers can be bypassed by AS_NO_KEEPALIVE; b) calls back G1BarrierSet for prebarrier generation, but already loses the decorators. So the simplest way to deal with this is to handle this special case specially.
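
The null-only restriction can be pictured as a guard in front of barrier selection. The following is a toy Java model with hypothetical names (`Barrier`, `barriersFor`), not the C2 or G1 code, and the exact barrier choices are assumptions made for illustration: an ordinary reference store gets a pre-barrier plus a post-barrier for non-null values, a no-keepalive store of null gets neither, and a no-keepalive store of anything else is rejected rather than guessed at.

```java
import java.util.EnumSet;

// Toy model of choosing barriers for a reference store; names are hypothetical.
class NoKeepAliveStoreSketch {
    enum Barrier { SATB_PRE, CARD_POST }

    static EnumSet<Barrier> barriersFor(boolean noKeepAlive, boolean valueIsNull) {
        if (noKeepAlive) {
            if (!valueIsNull) {
                // Only the "clear to null" case is handled; anything else is refused.
                throw new UnsupportedOperationException(
                        "no-keepalive stores are only supported for null values");
            }
            // Null cannot create a cross-region pointer, and the old value
            // must not be kept alive, so no barrier is emitted at all.
            return EnumSet.noneOf(Barrier.class);
        }
        EnumSet<Barrier> result = EnumSet.of(Barrier.SATB_PRE);
        if (!valueIsNull) {
            result.add(Barrier.CARD_POST);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("strong, non-null: " + barriersFor(false, false));
        System.out.println("strong, null:     " + barriersFor(false, true));
        System.out.println("no-keepalive, null: " + barriersFor(true, true));
    }
}
```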

I think this is sanely insane, given how sharp-edged AS_NO_KEEPALIVE is.

@shipilev shipilev marked this pull request as ready for review July 17, 2024 18:46
@openjdk openjdk bot added the rfr Pull request is ready for review label Jul 17, 2024
@shipilev
Member Author

shipilev commented Oct 7, 2024

Tests pass with the new change. I eyeballed the G1 perfasm output on the new benchmark, and there are no barriers in sight either.

Contributor

@fisk fisk left a comment

One last thing...

Contributor

@fisk fisk left a comment

Looks good.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Oct 13, 2024
@shipilev
Member Author

Thanks! I see Kim reviewed JDK parts, so we need another Reviewer for Hotspot parts.

Contributor

@vnkozlov vnkozlov left a comment

C2 change (intrinsics code) is fine.

@shipilev
Member Author

Thanks for review, folks. I am re-running testing locally here. Would appreciate if you can give this patch a spin through your CIs as well.

@robcasloz
Contributor

> Thanks for review, folks. I am re-running testing locally here. Would appreciate if you can give this patch a spin through your CIs as well.

I will run some internal CI testing and report back in one or two days.

@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Oct 15, 2024
@robcasloz
Contributor

> I will run some internal CI testing and report back in one or two days.

The test results look good. I tested the changes (up to commit 9f7ad7a) on top of jdk-24+19 running tier1-tier5 on all Oracle-supported platforms.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Oct 16, 2024
@shipilev
Member Author

Thank you for testing! Here goes.

/integrate

@openjdk

openjdk bot commented Oct 16, 2024

Going to push as commit 7625b29.
Since your change was applied there have been 115 commits pushed to the master branch:

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Oct 16, 2024
@openjdk openjdk bot closed this Oct 16, 2024
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Oct 16, 2024
@openjdk

openjdk bot commented Oct 16, 2024

@shipilev Pushed as commit 7625b29.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.
