Skip to content
This repository has been archived by the owner. It is now read-only.

8273107: RunThese24H times out with "java.lang.management.ThreadInfo.getLockName()" is null #25

Closed
wants to merge 4 commits into from

Conversation

dcubed-ojdk
Copy link
Member

@dcubed-ojdk dcubed-ojdk commented Dec 14, 2021

RunThese24H sometimes times out with a couple of error msgs:

  • "java.lang.management.ThreadInfo.getLockName()" is null
  • ForkJoin common pool thread stuck

The '"java.lang.management.ThreadInfo.getLockName()" is null' error msg was
due to RunThese's use of an older JCK test suite which has since been fixed.

The 'ForkJoin common pool thread stuck' failure mode is likely due to a thread
spending a lot of time in ObjectSynchronizer::monitors_iterate() due to a
VM_ThreadDump::doit() call. I say "likely" because I've never been able to
reproduce this failure mode in testing outside of Mach5. With the Mach5
sightings, all we have are thread dumps and core files and not a live process.

The VM_ThreadDump::doit() call is trying to gather owned monitor information
for all threads in the system. I've seen sightings of this failure mode with > 2000
threads. I've also seen passing runs with > 1.7 million monitors on the in-use list.
Imagine searching a larger in-use list for > 2000 threads. It just doesn't scale.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8273107: RunThese24H times out with "java.lang.management.ThreadInfo.getLockName()" is null

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk18 pull/25/head:pull/25
$ git checkout pull/25

Update a local copy of the PR:
$ git checkout pull/25
$ git pull https://git.openjdk.java.net/jdk18 pull/25/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 25

View PR using the GUI difftool:
$ git pr show -t 25

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk18/pull/25.diff

@bridgekeeper
Copy link

@bridgekeeper bridgekeeper bot commented Dec 14, 2021

👋 Welcome back dcubed! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

@openjdk openjdk bot commented Dec 14, 2021

@dcubed-ojdk The following label will be automatically applied to this pull request:

  • hotspot-runtime

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-runtime label Dec 14, 2021
@dcubed-ojdk
Copy link
Member Author

@dcubed-ojdk dcubed-ojdk commented Dec 14, 2021

I have logging in place for all of the RunThese configs to gather a little
bit of 'monitorinflation' info. So far a new sighting has not happened,
but I'm hopeful that the stars will align sometime soon.

I have learned thru my own test jobs that a RunThese24H job doesn't
always include calls to VM_ThreadDump::doit() so it seems that there's
yet another factor in these RunThese24H failures.

@dcubed-ojdk
Copy link
Member Author

@dcubed-ojdk dcubed-ojdk commented Dec 14, 2021

The proposed solution in this patch is to call deflate_idle_monitors() from
VM_ThreadDump::doit() with a new and optional ObjectMonitorsHashtable
parameter. As deflate_idle_monitors() and deflate_monitor_list() do their
work, owned ObjectMonitors are added to the ObjectMonitorsHashtable
indexed by the owning JavaThread. After the deflate_idle_monitors() call
is finished VM_ThreadDump::doit() can find and return information about
the owned ObjectMonitors for each JavaThread without doing a linear walk
of the in-use monitor list for each JavaThread. A JavaThread* is used to
lookup a LinkedList in the ObjectMonitorsHashtable and if one exists, then
the JavaThread* owns all the ObjectMonitors on that LinkedList.

Also, the usual work done by deflate_idle_monitors() and deflate_monitor_list()
will serve to reduce the length of the in-use monitor list which makes the
in-use monitor list more efficient for other parts of the system.

@dcubed-ojdk
Copy link
Member Author

@dcubed-ojdk dcubed-ojdk commented Dec 14, 2021

This fix has been tested with Mach5 Tier[1-8]. I've also done some targeted
memory leak testing since ObjectMonitorsHashtable is new (see the bug
report for details). I've rebased the fix to the latest JDK18 repo as of 2021.12.14
mid-day and I'm doing some sanity checking with Mach5 Tier[1-3].

It would be good to hear from @dholmes-ora, @coleenp, @fisk and @robehn
since they were my reviewers on:

JDK-8253064 monitor list simplifications and getting rid of TSM
https://bugs.openjdk.java.net/browse/JDK-8253064

Update: Mach5 Tier[1-3] testing looks good.

@coleenp
Copy link
Contributor

@coleenp coleenp commented Dec 14, 2021

This looks good to me. You solved both the problem of the very long monitor list and the per-thread iteration too. I like it!

@dcubed-ojdk dcubed-ojdk marked this pull request as ready for review Dec 15, 2021
@dcubed-ojdk
Copy link
Member Author

@dcubed-ojdk dcubed-ojdk commented Dec 15, 2021

@coleenp - Thanks for the draft review! It's now officially ready...

@openjdk openjdk bot added the rfr label Dec 15, 2021
@mlbridge
Copy link

@mlbridge mlbridge bot commented Dec 15, 2021

Webrevs

@@ -57,6 +57,44 @@
#include "utilities/events.hpp"
#include "utilities/preserveException.hpp"

class CleanupObjectMonitorsHashtable: StackObj {
public:
bool do_entry(JavaThread*& key, ObjectMonitorsHashtable::PtrList*& list) {
Copy link
Member

@dholmes-ora dholmes-ora Dec 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see this being called by anything ???

Copy link
Member Author

@dcubed-ojdk dcubed-ojdk Dec 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See L71 below: _ptrs->unlink(&cleanup); // cleanup the LinkedLists
That passes the cleanup object to ResourceHashtable::unlink()
function which calls the do_entry() function in the cleanup object.

Copy link
Member

@dholmes-ora dholmes-ora Dec 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see (well I don't really see as this is very obscure). I would have expected CleanupObjectMonitorsHashtable to be a subtype of something that I could then see is used by the ResourceHashtable. I'm going to guess that there is some template usage here that ensures the passed in object has the right API even if there is no subtyping relationship. Hard to see these connections at a distance.

To be clear not a review as I don't understand our Hashtable's enough to make much sense of what is actually happening here.

Copy link
Contributor

@coleenp coleenp Dec 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be a nice place for a lambda (on the list of future RFEs). You could also put this class in scope of the function that calls it which might help. (optional)

Copy link
Member Author

@dcubed-ojdk dcubed-ojdk Dec 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ResourceHashtable::unlink() is definitely one of those template
things that I don't really like, but I have to use it... All that stuff
is in src/hotspot/share/utilities/resourceHash.hpp, but I don't
see any precedent for putting the CleanupObjectMonitorsHashtable
definition in there. I think I'll leave it where it is.

@dcubed-ojdk
Copy link
Member Author

@dcubed-ojdk dcubed-ojdk commented Dec 15, 2021

@dholmes-ora - Thanks for reviewing!

robehn
robehn approved these changes Dec 15, 2021
@@ -279,6 +279,18 @@ void VM_ThreadDump::doit() {
concurrent_locks.dump_at_safepoint();
}

ObjectMonitorsHashtable table;
ObjectMonitorsHashtable* tablep = nullptr;
Copy link

@robehn robehn Dec 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like you can remove this pointer.
ThreadStackTrace::dump_stack_at_safepoint also looks at "with_locked_monitors", if false it ignores the table.
So there should not be any need to pass in null.

Copy link
Member Author

@dcubed-ojdk dcubed-ojdk Dec 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now ThreadStackTrace::dump_stack_at_safepoint() is flexible in
that it does not require ObjectMonitorsHashtable* table to be non-null
when _with_locked_monitors is true.

From this call site, when _with_locked_monitors is true we always pass
a non-nullptr ObjectMonitorsHashtable* table and we do that because
we're at a safepoint and likely have to deal with a lot of threads if not all
the threads in the system.

For other ThreadStackTrace::dump_stack_at_safepoint() call sites,
I want things to be more flexible. Mostly because I haven't done
detailed analysis of them to justify changing their behavior.

// Get the ObjectMonitors locked by this thread (if any):
ObjectMonitorsHashtable::PtrList* list = table->get_entry(_thread);
if (list != nullptr) {
ObjectSynchronizer::monitors_iterate(&imc, list, _thread);
Copy link

@robehn robehn Dec 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The InflatedMonitorsClosure walks the stack until object is found with this method:
ThreadStackTrace::is_owned_monitor_on_stack(oop object)
for every object...

If you instead just collect all locks on stack with one pass you don't have to walk the stack over and over, which should be major speedup.

Copy link
Member Author

@dcubed-ojdk dcubed-ojdk Dec 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like an interesting RFE, but I'd prefer that be investigated
separately from this bug.

Copy link
Member Author

@dcubed-ojdk dcubed-ojdk Dec 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarification: ThreadStackTrace::is_owned_monitor_on_stack() is not actually
walking the (JavaThread's) stack, but it is walking the collection of
StackFrameInfo objects that was gathered earlier in
ThreadStackTrace::dump_stack_at_safepoint(). Adding a gather-locked-monitor
helper object to the "add_stack_frame()" call would provide an easy hook and
then InflatedMonitorsClosure could use the new helper object. Something like
that anyway...

Copy link

@robehn robehn Dec 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, sorry, we loop the stack frame info for every object. I.e. a loop in a loop which is bad for time-complexity.
If the we created a sorted list from the frame info, we can binary search it instead.
Since a thread often only have at max a few locks, but could have hundreds of frames, just to be able skip looping over all frames without locks would be very beneficial.

@openjdk
Copy link

@openjdk openjdk bot commented Dec 15, 2021

@dcubed-ojdk This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8273107: RunThese24H times out with "java.lang.management.ThreadInfo.getLockName()" is null

Reviewed-by: rehn, coleenp, eosterlund

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 17 new commits pushed to the master branch:

  • 8fbe172: 8278604: SwingSet2 table demo does not have accessible description set for images
  • b46f0b0: 8277447: Hotspot C1 compiler crashes on Kotlin suspend fun with loop
  • 9cd7090: 8278587: StringTokenizer(String, String, boolean) documentation bug
  • fffa73c: 8269425: 2 jdk/jfr/api/consumer/streaming tests failed to attach
  • b9a477b: 8275638: GraphKit::combine_exception_states fails with "matching stack sizes" assert
  • bb7efb3: 8278790: Inner loop of long loop nest runs for too few iterations
  • 8494fec: 8278796: Incorrect behavior of FloatVector.withLane on X86
  • f5d7c77: 8276826: Clarify the ModuleDescriptor.Version specification’s treatment of repeated punctuation characters
  • be6b90d: 8278574: update --help-extra message to include default value of --finalization option
  • aec1b03: 8278389: SuspendibleThreadSet::_suspend_all should be volatile/atomic
  • ... and 7 more: https://git.openjdk.java.net/jdk18/compare/475ec8e6c5abc3431344d69bd46395e8c4b46e4c...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready label Dec 15, 2021
Copy link
Contributor

@coleenp coleenp left a comment

This looks good.

@@ -57,6 +57,44 @@
#include "utilities/events.hpp"
#include "utilities/preserveException.hpp"

class CleanupObjectMonitorsHashtable: StackObj {
public:
bool do_entry(JavaThread*& key, ObjectMonitorsHashtable::PtrList*& list) {
Copy link
Contributor

@coleenp coleenp Dec 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be a nice place for a lambda (on the list of future RFEs). You could also put this class in scope of the function that calls it which might help. (optional)

@dcubed-ojdk
Copy link
Member Author

@dcubed-ojdk dcubed-ojdk commented Dec 15, 2021

@coleenp - Thanks for chiming in on the official review.

@robehn - Thanks for reviewing!

@dcubed-ojdk
Copy link
Member Author

@dcubed-ojdk dcubed-ojdk commented Dec 15, 2021

Mach5 Tier[4567] testing looks good.
Mach5 Tier8 is still running and won't finish for at least another day or two.

Update: Mach5 Tier8 testing for v0 finished and looks good.

Copy link

@fisk fisk left a comment

Looks okay, but I found a potential issue.

ObjectMonitorsHashtable::PtrList* list = get_entry(jt);
if (list != nullptr) {
// Add to existing list from the hash table:
list->add(om);
Copy link

@fisk fisk Dec 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two lines are executed in both the if and else clause. Consider putting it after instead.

Copy link
Member Author

@dcubed-ojdk dcubed-ojdk Dec 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll take a look at doing that. Thanks for the suggestion.

ObjectMonitorsHashtable::PtrList* list,
JavaThread* thread) {
typedef LinkedListIterator<ObjectMonitor*> ObjectMonitorIterator;
ObjectMonitorIterator iter = ObjectMonitorIterator(list->head());
Copy link

@fisk fisk Dec 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to create an object and then copy it to iter. I think the more obvious syntax is ObjectMonitorIterator iter(list->head());

Copy link
Member Author

@dcubed-ojdk dcubed-ojdk Dec 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch. I'll fix that.

// unowned ObjectMonitors that cannot be deflated because of a waiter. Since
// deflate_idle_monitors() and deflate_monitor_list() can be called more than
// once, we have to make sure the entry has not already been added.
JavaThread* jt = (JavaThread*)mid->owner();
Copy link

@fisk fisk Dec 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmmm. I'm not sure this is correct. The owned of an ObjectMonitor is a JavaThread most of the time, but not always. I think that if the owner started off with a stack lock, and another thread inflated the lock due to contention, then the ObjectMonitor will get the stack lock as the owner, as the random other thread is too lazy to look up which thread stack the stack lock belongs to.

Because of that, I think we might miss locks that really are owned by a thread, as the thread key now will not map to such locks. Some kind of ownership lookup probably needs to happen here.

Copy link
Member Author

@dcubed-ojdk dcubed-ojdk Dec 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch on an oddity of ObjectMonitor ownership!

The baseline code has the same "issue":

void ObjectSynchronizer::monitors_iterate(MonitorClosure* closure, JavaThread* thread) {

<snip>

    ObjectMonitor* mid = iter.next();
    if (mid->owner() != thread) {
      continue;
    }

Please notice that I said "issue" and not "bug". This isn't a bug and you really
won't like the reason that it's not a bug. I know that I don't like it...

Because of this ownership check monitors_iterate() won't apply the closure
(InflatedMonitorsClosure in this case) to an ObjectMonitor that is marked as
owned by a stack lock address because it will never match the target JavaThread*.

Here's the closure:

// Iterate through monitor cache to find JNI locked monitors
class InflatedMonitorsClosure: public MonitorClosure {
private:
  ThreadStackTrace* _stack_trace;
public:
  InflatedMonitorsClosure(ThreadStackTrace* st) {
    _stack_trace = st;
  }
  void do_monitor(ObjectMonitor* mid) {
    oop object = mid->object();
    if (!_stack_trace->is_owned_monitor_on_stack(object)) {
      _stack_trace->add_jni_locked_monitor(object);
    }
  }
};

This particular monitors_iterate() call uses InflatedMonitorsClosure to
gather just the JNI locked monitors owned by the target JavaThread*.
A JNI locked monitor will never have a stack lock address as its owner
marking. It will always have a JavaThread*. So the baseline code isn't
"broken" because it will find the JNI locked monitors just fine. Similarly
the new code isn't "broken" because it will also find the JNI locked
monitors just fine.

I definitely need to clarify the baseline comments and the new code
comments to make this subtlety much more clear.

@dcubed-ojdk
Copy link
Member Author

@dcubed-ojdk dcubed-ojdk commented Dec 16, 2021

@fisk - Thanks for the review!

@dcubed-ojdk
Copy link
Member Author

@dcubed-ojdk dcubed-ojdk commented Dec 16, 2021

I made changes to address @fisk's review. I've also included comment
updates to make clear the limitations of monitors_iterate(). I need a
re-review from @fisk and a re-review from @coleenp would also be
a nice thing to have. Thanks, in advance, for the re-review(s)...

Tested this latest patch with Mach5 Tier1. Mach5 Tier[23] are running now.

Update: Mach5 Tier[1-5] look good.

@dcubed-ojdk
Copy link
Member Author

@dcubed-ojdk dcubed-ojdk commented Dec 17, 2021

The ObjectMonitor::_owner field can contain a JavaThread*, the address of
a BasicLock on a JavaThread's stack, nullptr or DEFLATER_MARKER so the
_owner field is of type void*.

@coleenp pointed out off-line that I had a bit of type confusion in my new
ObjectMonitorsHashtable. Instead of indexing the Hashtable with a
JavaThread*, I should be indexing the Hashtable with a void* to reflect
the fact that my _owner field can be a JavaThread* or the address of a
BasicLock on the JavaThread's stack.

My most recent push to this PR cleans up the type confusion. This update
is tested with Mach5 Tier1 as a sanity check.

Update: Also ran Mach5 Tier6 and Tier7 on this latest patch and those results
look good. Tier8 is running now.

Update: Tier8 results also look good.

@fisk
Copy link

@fisk fisk commented Dec 17, 2021

I made changes to address @fisk's review. I've also included comment

updates to make clear the limitations of monitors_iterate(). I need a

re-review from @fisk and a re-review from @coleenp would also be

a nice thing to have. Thanks, in advance, for the re-review(s)...

Tested this latest patch with Mach5 Tier1. Mach5 Tier[23] are running now.

Update: Mach5 Tier[1-5] look good.

When the owner field of an ObjectMonitor is the address of a stack lock instead of a thread, it doesn't mean that it is a stack lock. It means that it was a stack lock when it was acquired by that thread, but has since then been upgraded to a full ObjectMonitor (by another thread), due to contention. But the fact that it was a stack lock is irrelevant in this context; we care about what it is now. And it is an inflated lock that just happens to have a slightly obscured owner field, as the owner thread was not easily available when it was inflated.

In other words, I still believe this is a bug. Such a monitor that gets upgraded from a stack lock to a full ObjectMonitor, can - unline actual stack locks - be involved in a deadlock that we are trying to find. If we don't associate such ObjectMonitor objects with its owner thread, then we will not be able to find the deadlock as we are supposed to with this code.

Copy link
Contributor

@coleenp coleenp left a comment

Approve again. This makes me a lot happier.

@coleenp
Copy link
Contributor

@coleenp coleenp commented Dec 17, 2021

Re: both Robbin's and Erik's comments. I think Robbin's should be an RFE and Erik's should be another bug report and fixed separately. This change fixes the timeout that's been observed multiple times and preserves existing behavior (with existing bugs). I think some redesign is in order to make this correct all the time.

@fisk
Copy link

@fisk fisk commented Dec 17, 2021

Re: both Robbin's and Erik's comments. I think Robbin's should be an RFE and Erik's should be another bug report and fixed separately. This change fixes the timeout that's been observed multiple times and preserves existing behavior (with existing bugs). I think some redesign is in order to make this correct all the time.

I'm okay with fixing the other bug in a separate RFR.

fisk
fisk approved these changes Dec 17, 2021
Copy link

@fisk fisk left a comment

If the bug I pointed out is filed separately, then this looks good.

robehn
robehn approved these changes Dec 17, 2021
@fisk
Copy link

@fisk fisk commented Dec 17, 2021

For the record, Threads::owning_thread_from_monitor_owner seems to be the function to look up who really own an ObjectMonitor, based on the value in the owner field, to deal with the stuff I talked about.

@dcubed-ojdk
Copy link
Member Author

@dcubed-ojdk dcubed-ojdk commented Dec 17, 2021

@coleenp, @fisk and @robehn - Thanks for the re-reviews and the approvals.

@dcubed-ojdk
Copy link
Member Author

@dcubed-ojdk dcubed-ojdk commented Dec 17, 2021

I need to address these comments from @fisk:

When the owner field of an ObjectMonitor is the address of a stack lock instead
of a thread, it doesn't mean that it is a stack lock. It means that it was a stack
lock when it was acquired by that thread, but has since then been upgraded to
a full ObjectMonitor (by another thread), due to contention. But the fact that it
was a stack lock is irrelevant in this context; we care about what it is now. And
it is an inflated lock that just happens to have a slightly obscured owner field,
as the owner thread was not easily available when it was inflated.

What @fisk says here is correct and I'm probably guilty of over simplifying what
I was trying to explain earlier. Here's what I said earlier for context:

The baseline code has the same "issue":

void ObjectSynchronizer::monitors_iterate(MonitorClosure* closure, JavaThread* thread) {

ObjectMonitor* mid = iter.next();
if (mid->owner() != thread) {
  continue;
}

Please notice that I said "issue" and not "bug". This isn't a bug and you really
won't like the reason that it's not a bug. I know that I don't like it...

Because of this ownership check monitors_iterate() won't apply the closure
(InflatedMonitorsClosure in this case) to an ObjectMonitor that is marked as
owned by a stack lock address because it will never match the target JavaThread*.

@fisk, please notice that I don't say that this is a stack lock. I say this is

an ObjectMonitor that is marked as owned by a stack lock address

and that is the "bug" in your opinion. The baseline code and new code
do not recognize that the owner is not a JavaThread* and don't apply
the closure even though the stack lock address might belong to the
current target JavaThread.

I think we are in strong agreement about what the attributes of the "bug" are.

In other words, I still believe this is a bug. Such a monitor that gets upgraded
from a stack lock to a full ObjectMonitor, can - unline actual stack locks - be
involved in a deadlock that we are trying to find. If we don't associate such
ObjectMonitor objects with its owner thread, then we will not be able to find
the deadlock as we are supposed to with this code.

You are missing part of the algorithm in the code that calls monitors_iterate().
The monitors_iterate() part ONLY EXISTS to find the JNI locked ObjectMonitors.
It does that by finding all of the ObjectMonitors on the in-use list that are owned
by the target JavaThread* (and by owned I mean _owner == JavaThread*). It uses
the closure to search for each matched ObjectMonitor* in the JavaThread's stack
info that it squirreled away before making the call to monitors_iterate(). If it finds
the ObjectMonitor* on the JavaThread's stack, then that ObjectMonitor* IS NOT
a JNI locked ObjectMonitor and monitors_iterate() ignores it.

If monitors_iterate() does not find the ObjectMonitor* in the JavaThread's stack
info, then it assumes that the ObjectMonitor* is a JNI locked monitor and it adds
that ObjectMonitor* to the thread dump info.

Because an ObjectMonitor that is marked as owned by a stack lock address won't
match the target JavaThread*, it won't get processed by the closure that is called
by monitors_iterate(). However, we don't care because that ObjectMonitor* IS NOT
a JNI locked monitor and that is what monitors_iterate() is looking for. It must be a
monitor that is mentioned in the JavaThread's stack because it was acquired that
way originally so it will ALREADY be included in the thread's info that was gathered
before we made the monitors_iterate() call. Nothing is missed here!

I have tried to clarify this in the comments that I added as part of this patch. I'll
re-review the comments again and see if there is anything else I can add or change.

@dcubed-ojdk
Copy link
Member Author

@dcubed-ojdk dcubed-ojdk commented Dec 17, 2021

I've filed the following RFE for @robehn's improvement:

JDK-8278968 InflatedMonitorsClosure repeats walks of StackFrameInfo and should cache instead
https://bugs.openjdk.java.net/browse/JDK-8278968

@dcubed-ojdk
Copy link
Member Author

@dcubed-ojdk dcubed-ojdk commented Dec 17, 2021

/integrate

@openjdk
Copy link

@openjdk openjdk bot commented Dec 17, 2021

Going to push as commit a5f5d60.
Since your change was applied there have been 17 commits pushed to the master branch:

  • 8fbe172: 8278604: SwingSet2 table demo does not have accessible description set for images
  • b46f0b0: 8277447: Hotspot C1 compiler crashes on Kotlin suspend fun with loop
  • 9cd7090: 8278587: StringTokenizer(String, String, boolean) documentation bug
  • fffa73c: 8269425: 2 jdk/jfr/api/consumer/streaming tests failed to attach
  • b9a477b: 8275638: GraphKit::combine_exception_states fails with "matching stack sizes" assert
  • bb7efb3: 8278790: Inner loop of long loop nest runs for too few iterations
  • 8494fec: 8278796: Incorrect behavior of FloatVector.withLane on X86
  • f5d7c77: 8276826: Clarify the ModuleDescriptor.Version specification’s treatment of repeated punctuation characters
  • be6b90d: 8278574: update --help-extra message to include default value of --finalization option
  • aec1b03: 8278389: SuspendibleThreadSet::_suspend_all should be volatile/atomic
  • ... and 7 more: https://git.openjdk.java.net/jdk18/compare/475ec8e6c5abc3431344d69bd46395e8c4b46e4c...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated label Dec 17, 2021
@openjdk openjdk bot closed this Dec 17, 2021
@openjdk openjdk bot removed ready rfr labels Dec 17, 2021
@openjdk
Copy link

@openjdk openjdk bot commented Dec 17, 2021

@dcubed-ojdk Pushed as commit a5f5d60.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

fisk
fisk approved these changes Dec 17, 2021
Copy link

@fisk fisk left a comment

I guess you can recursively JNI lock an ObjectMonitor, that has been promoted from a stack lock. Then it's a lock held by JNI and we won't report it here.

But this is JVMTI code so there is probably some other random check somewhere else that checks for that so I'm just going to assume this works somehow.

Ship it!

@dcubed-ojdk
Copy link
Member Author

@dcubed-ojdk dcubed-ojdk commented Dec 17, 2021

If you recursively JNI lock an ObjectMonitor that has been promoted from a stack lock,
then I think the code will update the _owner field from the address of the original stack
lock to the owning JavaThread*. Only the owning JavaThread can recursively enter an
existing ObjectMonitor and only the owning JavaThread can update that _owner field...

@dcubed-ojdk
Copy link
Member Author

@dcubed-ojdk dcubed-ojdk commented Dec 17, 2021

@fisk - Thanks for closing the loop on this thread.

@dcubed-ojdk dcubed-ojdk deleted the JDK-8273107 branch Dec 17, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
hotspot-runtime integrated
5 participants