Skip to content
This repository has been archived by the owner. It is now read-only.

8269897: Shenandoah: Treat UNKNOWN refs access as strong #219

Closed
wants to merge 4 commits into from

Conversation

rkennke
Copy link
Contributor

@rkennke rkennke commented Jul 6, 2021

We've observed test failures in jcstress, see:
https://bugs.openjdk.java.net/browse/JDK-8269897

We used to treat UNKNOWN reference accesses like weak accesses. UNKNOWN is used for Unsafe, reflection and JNI accesses, where it cannot be determined at compilation-time if we are accessing a regular field or a Reflection.referent field. The rationale for treating UNKNOWN as weak was that if the reference is a regular reference, then the value would be strongly reachable anyway, and only if it is a referent field would reachability matter. However, it turns out that this assumption is wrong: the test shows that a reference that is only weakly reachable can be legitimately written into a field, thus resurrecting the reference, and when that weakly reachable reference is loaded, it would be (wrongly) filtered as NULL.

A fix is to treat UNKNOWN accesses as strong. Accessing Reference.referent via reflection, JNI or Unsafe is Bad Idea anyway.
This test shows the problem with CAS, but I believe it affects all accesses via reflection, JNI, etc.

Testing:

  • the provided jcstress test
  • hotspot_gc_shenandoah
  • tier1

Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8269897: Shenandoah: Treat UNKNOWN refs access as strong

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk17 pull/219/head:pull/219
$ git checkout pull/219

Update a local copy of the PR:
$ git checkout pull/219
$ git pull https://git.openjdk.java.net/jdk17 pull/219/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 219

View PR using the GUI difftool:
$ git pr show -t 219

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk17/pull/219.diff

@bridgekeeper
Copy link

@bridgekeeper bridgekeeper bot commented Jul 6, 2021

👋 Welcome back rkennke! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr label Jul 6, 2021
@openjdk
Copy link

@openjdk openjdk bot commented Jul 6, 2021

@rkennke The following labels will be automatically applied to this pull request:

  • hotspot-gc
  • shenandoah

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added hotspot-gc shenandoah labels Jul 6, 2021
@mlbridge
Copy link

@mlbridge mlbridge bot commented Jul 6, 2021

Webrevs

Copy link
Contributor

@shipilev shipilev left a comment

It looks okay to me, and it fixes the jcstress failures. But I see one of the tier1 tests failing:

$ make run-test TEST=gc/TestReferenceClearDuringReferenceProcessing.java TEST_VM_OPTS="-XX:+UseShenandoahGC"
...
TEST RESULT: Failed. Execution failed: `main' threw exception: java.lang.RuntimeException: ref not enqueued

I wonder if that failure indicates that fix is still incomplete or too conservative.

@rkennke
Copy link
Contributor Author

@rkennke rkennke commented Jul 7, 2021

It looks okay to me, and it fixes the jcstress failures. But I see one of the tier1 tests failing:

$ make run-test TEST=gc/TestReferenceClearDuringReferenceProcessing.java TEST_VM_OPTS="-XX:+UseShenandoahGC"
...
TEST RESULT: Failed. Execution failed: `main' threw exception: java.lang.RuntimeException: ref not enqueued

I wonder if that failure indicates that fix is still incomplete or too conservative.

Changing behaviour for oop_load_in_heap() seems to break it (although it is not quite obvious to me how - the test does not seem to access Reference.referent itself, especially not via reflection/unsafe/etc. Maybe something internal in Reference queue does, and expects it to return NULL on unreachable.)
Let's limit the change to treat UNKNOWN as STRONG only for CAE.

Copy link
Contributor

@shipilev shipilev left a comment

Need synopsis change after limiting a fix?
Need to handle xchg case too.

This patch passes tier1 with Shenandoah for me, but fails jcstress bundle when run with C1. A relevant ShenandoahBarrierSetC1 change is quite probably missing.

Also, probably Unsafe get and set require the same handling: under concurrent weak roots, it is safer to return overly strong value, not NULL for strong refs. Let it be users' problem accessing weak values with Unsafe?

@@ -640,7 +640,7 @@ Node* ShenandoahBarrierSetC2::atomic_cmpxchg_val_at_resolved(C2AtomicParseAccess
load_store = kit->gvn().transform(new DecodeNNode(load_store, load_store->get_ptr_type()));
}
#endif
load_store = kit->gvn().transform(new ShenandoahLoadReferenceBarrierNode(NULL, load_store, access.decorators()));
load_store = kit->gvn().transform(new ShenandoahLoadReferenceBarrierNode(NULL, load_store, access.decorators() & ~ON_UNKNOWN_OOP_REF));
Copy link
Contributor

@shipilev shipilev Jul 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment:

    // Convert ON_UNKNOWN_OOP_REF to strong, so that we don't return NULL for weak values.
    // See ShenandoahBarrierSet::See oop_atomic_cmpxchg_not_in_heap for explanation.

@@ -267,7 +267,7 @@ inline oop ShenandoahBarrierSet::AccessBarrier<decorators, BarrierSetT>::oop_ato

// Note: We don't need a keep-alive-barrier here. We already enqueue any loaded reference for SATB anyway,
// because it must be the previous value.
res = ShenandoahBarrierSet::barrier_set()->load_reference_barrier<decorators, T>(res, NULL);
res = ShenandoahBarrierSet::barrier_set()->load_reference_barrier<decorators & ~ON_UNKNOWN_OOP_REF, T>(res, NULL);
Copy link
Contributor

@shipilev shipilev Jul 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment:

  // Note: We don't need a keep-alive-barrier here. We already enqueue any loaded reference for SATB anyway,
  // because it must be the previous value. Convert ON_UNKNOWN_OOP_REF to strong, so that we don't return
  // NULL for weak values, breaking CAS contract. The downside is this method effectively resurrecting
  // weak values without GC knowing about it. If Unsafe/JNI users are using this method on weak values
  // (i.e. doing unsafe accesses to Reference.referent), they are on their own.

Copy link
Contributor

@shipilev shipilev Jul 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also believe "xchg" needs the same fix and a comment:

  // Note: We don't need a keep-alive-barrier here. We already enqueue any loaded reference for SATB anyway,
  // because it must be the previous value. Convert ON_UNKNOWN_OOP_REF to strong, so that we don't return
  // NULL for weak values. ShenandoahBarrierSet::See oop_atomic_cmpxchg_not_in_heap for explanation.

@@ -640,7 +640,7 @@ Node* ShenandoahBarrierSetC2::atomic_cmpxchg_val_at_resolved(C2AtomicParseAccess
load_store = kit->gvn().transform(new DecodeNNode(load_store, load_store->get_ptr_type()));
}
#endif
load_store = kit->gvn().transform(new ShenandoahLoadReferenceBarrierNode(NULL, load_store, access.decorators()));
load_store = kit->gvn().transform(new ShenandoahLoadReferenceBarrierNode(NULL, load_store, access.decorators() & ~ON_UNKNOWN_OOP_REF));
Copy link
Contributor

@zhengyu123 zhengyu123 Jul 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This load should go through slow path when GC is in progress, right? Should let slow path to resolve the strength? sounds there is possibility to resurrect dead oop.

@rkennke
Copy link
Contributor Author

@rkennke rkennke commented Jul 8, 2021

Fixing this in jdk18 first, should backport to jdk17 later, after some testing in 18. See: openjdk/jdk#4697

@rkennke rkennke closed this Jul 8, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
hotspot-gc rfr shenandoah
3 participants