Skip to content
This repository has been archived by the owner on Sep 19, 2023. It is now read-only.

8288139: JavaThread touches oop after GC barrier is detached #21

Closed
wants to merge 10 commits into from
Closed

8288139: JavaThread touches oop after GC barrier is detached #21

wants to merge 10 commits into from

Conversation

dcubed-ojdk
Copy link
Member

@dcubed-ojdk dcubed-ojdk commented Jun 15, 2022

Update SharedRuntime::get_java_tid() to verify that the calling thread is safely
accessing its own threadObj(). This check uses the new is_gc_barrier_detached()
function added by JDK-8288497 add support for JavaThread::is_gc_barrier_detached().

The above check was used to reproduce the failure mode without Shenandoah
and the remainder of the fix relocates the offending code from
ThreadsSMRSupport::remove_thread() to Threads::remove(). The work of
removed the 'tid' entry from the ThreadIdTable is still done under the
protection of the Threads_lock.

This fix along with the fix for JDK-8288497 has been tested in Mach5 Tier[1-8].
There are no related failures in Mach5 Tier[1-7]; Mach5 Tier8 is still running.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8288139: JavaThread touches oop after GC barrier is detached

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk19 pull/21/head:pull/21
$ git checkout pull/21

Update a local copy of the PR:
$ git checkout pull/21
$ git pull https://git.openjdk.org/jdk19 pull/21/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 21

View PR using the GUI difftool:
$ git pr show -t 21

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk19/pull/21.diff

@dcubed-ojdk dcubed-ojdk marked this pull request as ready for review June 15, 2022 16:07
@dcubed-ojdk
Copy link
Member Author

/label add hotspot-runtime

@bridgekeeper
Copy link

bridgekeeper bot commented Jun 15, 2022

👋 Welcome back dcubed! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@dcubed-ojdk dcubed-ojdk changed the base branch from master to pr/20 June 15, 2022 16:23
@openjdk openjdk bot added rfr Pull request is ready for review hotspot-runtime hotspot-runtime-dev@openjdk.org labels Jun 15, 2022
@openjdk
Copy link

openjdk bot commented Jun 15, 2022

@dcubed-ojdk
The hotspot-runtime label was successfully added.

@mlbridge
Copy link

mlbridge bot commented Jun 15, 2022

Webrevs

@dcubed-ojdk
Copy link
Member Author

dcubed-ojdk commented Jun 15, 2022

@dholmes-ora, @pchilano and @robehn - Pinging you guys since we were all
involved with JDK-8286830 ~HandshakeState should not touch oops

@fisk - Pinging you since you are interested in GC barriers.

Copy link

@pchilano pchilano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! I only have one comment but made it in 8288497 because it applies more to that change.

Thanks,
Patricio

@dcubed-ojdk
Copy link
Member Author

@pchilano - Thanks for the review!

@@ -996,11 +996,12 @@ JRT_ENTRY_NO_ASYNC(void, SharedRuntime::register_finalizer(JavaThread* current,
JRT_END

jlong SharedRuntime::get_java_tid(Thread* thread) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional RFE: can we not declare this to take JavaThread and so avoid the checks and casts below?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can research that and file a follow up RFE if necessary.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New RFE:
JDK-8289004 investigate if SharedRuntime::get_java_tid parameter should be a JavaThread*

Comment on lines +3597 to +3603
if (ThreadIdTable::is_initialized()) {
// This cleanup must be done before the current thread's GC barrier
// is detached since we need to touch the threadObj oop.
jlong tid = SharedRuntime::get_java_tid(p);
ThreadIdTable::remove_thread(tid);
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like exposing ThreadSMR internal details like this. Can we at least make this a little ThreadSMRSupport method that we call from here.

Or ... can we just move ThreadsSMRSupport::remove_thread(p); to before on_thread_detach?

Or can we grab the pid before on_thread_detach and pass it to ThreadsSMRSupport::remove_thread?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ummmm... The code that I moved is not a TheadSMR internal detail. That code
is part of the M&M support that was added by Daniil in:

JDK-8185005 Improve performance of ThreadMXBean.getThreadInfo(long ids[], int maxDepth)
https://bugs.openjdk.org/browse/JDK-8185005

I don't remember why Daniil put that code in ThreadsSMRSupport::remove_thread(p),
but it is just as appropriate to put it in Threads::remove(). That M&M cleanup code
is part of the cleanup necessary when a JavaThread is removed. The important part
is to call that code after the Threads_lock is grabbed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Dan, I had forgotten about that and just assumed it was ThreadSMR related. Moving it as you did is now not a concern.

Comment on lines 1001 to 1003
guarantee(current != thread || !JavaThread::cast(thread)->is_gc_barrier_detached(),
"current cannot touch oops after its GC barrier is detached.");
oop obj = JavaThread::cast(thread)->threadObj();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the oop-touching-safety check should be done in threadObj() itself so that all callers are protected.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The placement in SharedRuntime::get_java_tid() is because that is the failure
mode that this bug is about. My next stress testing experiment will be to put
a similar guarantee() in threadObj(), but I expect there to be more fan-out
from that placement.

This fix is tightly focued on the bug as it is reported because I plan to fix
it in JDK19.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah ... hmmm ... threadObj is still the better place for any safety check ... arguably it should be even deeper in OopHandle.

I'm not sure what fanout you are concerned about. If we find there are other invalid oop accesses then we would want to fix them in 19 - no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to fix this bug in JDK19 since it is a blocker for @zhengyu123. I want to
keep this fix small and focused on the exact crash associated with the bug.

Yes, threadObj() or deeper is a better place. No argument, but I plan to do that
as a new hunt and I expect there to be many problems and I don't want to hold
up this fix for additional hunting. If I find more issues, then we'll determine how
to get them fixed in JDK19.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New bug:
JDK-8289091 move oop safety check from SharedRuntime::get_java_tid() to JavaThread::threadObj()

@dcubed-ojdk
Copy link
Member Author

@dholmes-ora - Thanks for the review! I think I've replied to everything.
Please let me know if I missed something...

@robehn
Copy link
Contributor

robehn commented Jun 16, 2022

Hi

After we have done:

if (JvmtiEnv::environments_might_exist()) {
    JvmtiExport::cleanup_thread(this);
}

I don't think there is any valid use case of ThreadIdTable.
So can't we just remove the thread from the table here instead:

I.e.

if (JvmtiEnv::environments_might_exist()) {
    JvmtiExport::cleanup_thread(this);
}

if (ThreadIdTable::is_initialized()) {
    jlong tid = SharedRuntime::get_java_tid(thread);
    ThreadIdTable::remove_thread(tid);
}

?

@dcubed-ojdk
Copy link
Member Author

The ThreadIdTable::remove_thread() call has to be done under the protection
of the Threads_lock if I remember correctly and the Threads_lock is not held
at that point in the code. There's some race condition with the other code
where I updated the comments.

Copy link
Member

@dholmes-ora dholmes-ora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Dan.

@robehn
Copy link
Contributor

robehn commented Jun 17, 2022

Ok, seems fine, thanks.

Copy link

@fisk fisk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

@dcubed-ojdk
Copy link
Member Author

@dholmes-ora, @robehn and @fisk - Thanks for the re-review.
I'm doing another rename in JDK-8288497 so that will ripple into here...

Copy link

@pchilano pchilano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still good.

@dcubed-ojdk
Copy link
Member Author

@pchilano - Thanks for the re-review!

@dcubed-ojdk
Copy link
Member Author

The latest version's Mach5 Tier[1-7] looks good. Mach5 Tier8 was just started.

@dcubed-ojdk
Copy link
Member Author

The Mach5 Tier8 run had no failures. All testing looks great.

@openjdk
Copy link

openjdk bot commented Jun 21, 2022

⚠️ @dcubed-ojdk This pull request contains merges that bring in commits not present in the target repository. Since this is not a "merge style" pull request, these changes will be squashed when this pull request in integrated. If this is your intention, then please ignore this message. If you want to preserve the commit structure, you must change the title of this pull request to Merge <project>:<branch> where <project> is the name of another project in the OpenJDK organization (for example Merge jdk:master).

@openjdk-notifier openjdk-notifier bot changed the base branch from pr/20 to master June 21, 2022 16:16
@openjdk-notifier
Copy link

The dependent pull request has now been integrated, and the target branch of this pull request has been updated. This means that changes from the dependent pull request can start to show up as belonging to this pull request, which may be confusing for reviewers. To remedy this situation, simply merge the latest changes from the new target branch into this pull request by running commands similar to these in the local repository for your personal fork:

git checkout JDK-8288139
git fetch https://git.openjdk.org/jdk19 master
git merge FETCH_HEAD
# if there are conflicts, follow the instructions given by git merge
git commit -m "Merge master"
git push

@openjdk
Copy link

openjdk bot commented Jun 21, 2022

@dcubed-ojdk This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8288139: JavaThread touches oop after GC barrier is detached

Reviewed-by: pchilanomate, dholmes, rehn, eosterlund

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been no new commits pushed to the master branch. If another commit should be pushed before you perform the /integrate command, your PR will be automatically rebased. If you prefer to avoid any potential automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jun 21, 2022
@dcubed-ojdk
Copy link
Member Author

/integrate

@dcubed-ojdk
Copy link
Member Author

@dholmes-ora - Thanks for the re-review!

@openjdk
Copy link

openjdk bot commented Jun 21, 2022

Going to push as commit a144988.

@openjdk openjdk bot added the integrated Pull request has been integrated label Jun 21, 2022
@openjdk openjdk bot closed this Jun 21, 2022
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Jun 21, 2022
@openjdk
Copy link

openjdk bot commented Jun 21, 2022

@dcubed-ojdk Pushed as commit a144988.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@dcubed-ojdk dcubed-ojdk deleted the JDK-8288139 branch June 21, 2022 20:53
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
hotspot-runtime hotspot-runtime-dev@openjdk.org integrated Pull request has been integrated
5 participants