Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8241403: JavaThread::get_thread_name() should be ThreadSMR-aware #2535

Closed
wants to merge 9 commits into from

Conversation

dcubed-ojdk
Copy link
Member

@dcubed-ojdk dcubed-ojdk commented Feb 11, 2021

A minor fix to add a new function:

bool Thread::is_JavaThread_protected(const JavaThread* p)

that returns true when the target JavaThread* is protected and false
otherwise. Update JavaThread::get_thread_name() to create a
ThreadsListHandle and use the new is_JavaThread_protected(). Also
update JvmtiTrace::safe_get_thread_name() to use the new
is_JavaThread_protected().

This fix is tested via a Mach5 Tier[1-8] run.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8241403: JavaThread::get_thread_name() should be ThreadSMR-aware

Reviewers

Download

$ git fetch https://git.openjdk.java.net/jdk pull/2535/head:pull/2535
$ git checkout pull/2535

@dcubed-ojdk
Copy link
Member Author

/label add hotspot-runtime

@bridgekeeper
Copy link

bridgekeeper bot commented Feb 11, 2021

👋 Welcome back dcubed! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@dcubed-ojdk
Copy link
Member Author

/label add serviceability

@openjdk openjdk bot added the hotspot-runtime hotspot-runtime-dev@openjdk.org label Feb 11, 2021
@openjdk
Copy link

openjdk bot commented Feb 11, 2021

@dcubed-ojdk
The hotspot-runtime label was successfully added.

@openjdk openjdk bot added the serviceability serviceability-dev@openjdk.org label Feb 11, 2021
@openjdk
Copy link

openjdk bot commented Feb 11, 2021

@dcubed-ojdk
The serviceability label was successfully added.

@dcubed-ojdk
Copy link
Member Author

@dholmes-ora and @fisk - It looks like both of you are interested
in this fix. @robehn - This one is Thread-SMR related so you might
also want to take a look.

@dcubed-ojdk
Copy link
Member Author

@sspitsyn - I'm making a minor JVM/TI tracing tweak here so this
might also interest you.

@dcubed-ojdk dcubed-ojdk marked this pull request as ready for review February 11, 2021 22:08
@openjdk openjdk bot added the rfr Pull request is ready for review label Feb 11, 2021
@mlbridge
Copy link

mlbridge bot commented Feb 11, 2021

Copy link
Contributor

@robehn robehn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems alright.

@openjdk
Copy link

openjdk bot commented Feb 12, 2021

@dcubed-ojdk This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8241403: JavaThread::get_thread_name() should be ThreadSMR-aware

Reviewed-by: rehn, coleenp, dholmes

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 38 new commits pushed to the master branch:

  • 3088e1e: 8262430: doclint warnings in java.base module
  • 67b9e5a: 8262420: typo: @implnote in java.desktop module
  • 240f2a1: 8260366: ExtendedSocketOptions can deadlock in some circumstances
  • de3f519: 8258897: wrong translation of capturing local classes inside nested lambdas
  • d7efb4c: 8262199: issue in jli args.c
  • 7603278: 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set
  • 0a4e710: 8261954: Dependencies: Improve iteration over class hierarchy under context class
  • 722142e: 8261520: JDK-8261302 breaks runtime/NMT/CheckForProperDetailStackTrace.java
  • bcca100: 4710675: JTextArea.setComponentOrientation does not work with correct timing
  • fce5765: 8262433: doclint: reference error in module jdk.incubator.foreign
  • ... and 28 more: https://git.openjdk.java.net/jdk/compare/c769388d0f0ef1377b2652e445cd9cbe39711b1c...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Feb 12, 2021
@dcubed-ojdk
Copy link
Member Author

@robehn - Thanks for the review!

Copy link
Contributor

@coleenp coleenp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, I have a change - could you see if you agree.
It's nice not to have to take Threads_lock to get the name of the thread.

@@ -1687,6 +1689,7 @@ class JavaThread: public Thread {
void verify();
const char* get_thread_name() const;
protected:
friend class JvmtiTrace; // so get_thread_name_string() can be called
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to think of a way to not have JvmtiTrace not be a friend of JavaThread for this, maybe by adding a default value parameter to return "<NOT FILLED IN>" rather than Thread::name.
is_JavaThread_protected only seems to be called by JvmtiTrace also, so should be private (with the friend, which also makes the friend unfortunate).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JavaThread::get_thread_name() also calls is_JavaThread_protected().

The "friend" is so that JvmtiTrace can call get_thread_name_string()
and we can get rid of the JvmtiTrace version of the logic. I kept the
"" rather than figure out a way to call Thread::name()
so we don't introduce the possibility of a compatibility issue for any
code that might depend on that hand rolled string value...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So Thread::is_JavaThread_protected() should be "protected" then, not public.
yes, I was suggesting adding a default last parameter like
JavaThread::get_thread_name(char* default = Thread::name());
and pass "<NOT_FILLED_IN>" from JVMTI. Then JVMTI doesn't have to be a friend and have more visibility to the JavaThread class than it should have.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahhh.... I think I finally understand what you mean...
I'll look in the AM after I've had some coffee... :-)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name "default" is reserved so I went with "def_name".
You can't call Thread::name() from the declaration:

./open/src/hotspot/share/runtime/thread.hpp:1690:62: error: call to non-static member function without an object argument
  const char* get_thread_name(const char* def_name = Thread::name()) const;
                                                     ~~~~~~~~^~~~

so I went with setting def_name = NULL in the declaration and decoding that state in get_thread_name():
return (def_name != NULL) ? def_name : Thread::name();

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

default_name, it's not long enough to abbreviate... but yes. thanks!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And it looks like making is_JavaThread_protected() a protected function doesn't work either since JavaThread::get_thread_name() can't call it:

./open/src/hotspot/share/runtime/thread.cpp:2857:15: error: 'is_JavaThread_protected' is a protected member of 'Thread'
  if (thread->is_JavaThread_protected(this)) {
              ^
./open/src/hotspot/share/runtime/thread.cpp:488:14: note: can only access this member on an object of type 'JavaThread'
bool Thread::is_JavaThread_protected(const JavaThread* p) {
             ^
1 error generated.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JavaThread::get_thread_name() is calling is_JavaThread_protected() on the calling Thread and NOT on the this JavaThread so protected doesn't work.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok.

Copy link
Contributor

@sspitsyn sspitsyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Dan,
It looks good to me modulo your your discussion with Coleen on JvmtiTrace being a friend.
Thanks,
Serguei

@dcubed-ojdk
Copy link
Member Author

dcubed-ojdk commented Feb 13, 2021

Mach5 Tier[1-8] testing on CR1 is in process:

  • Mach5 Tier[1-3] - no failures
  • Mach5 Tier4 - 2 unrelated, known failures - still running
  • Mach5 Tier5 - 1 unrelated, known failure
  • Mach5 Tier6 - no failures - still running
  • Mach5 Tier7 - no failures
  • Mach5 Tier8 - no failures

@robehn, @coleenp, @sspitsyn - please take a look when you get the chance.

Copy link
Contributor

@sspitsyn sspitsyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The update looks good.
Thanks,
Serguei

Copy link
Contributor

@coleenp coleenp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@dcubed-ojdk
Copy link
Member Author

@coleenp and @sspitsyn - Thanks for the re-reviews!

@dcubed-ojdk
Copy link
Member Author

@dholmes and @fisk - did either or both of you plan to review this one?

@dcubed-ojdk
Copy link
Member Author

Merged with master and retested with Mach5 Tier1.

Copy link
Member

@dholmes-ora dholmes-ora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Dan,

Sorry but I have a lot of issues with this. After thinking about it a lot I don't think the current approach is what is needed. To repeat what I wrote in one of the comments I think the simple fix here is to replace the use of Threads_lock in the caller with suitable use of a TLH, and then replace the assert_locked_or_safepoint(Threads_lock) with assert(is_JavaThread_protected(t)) where is_JavaThread_protected checks the target for either being the current thread or included in a TLH of the current thread. I don't think get_thread_name() should try to protect the target as that is the responsibility of the caller.

Thanks,
David

@@ -1005,7 +1005,6 @@ void CompileBroker::init_compiler_sweeper_threads() {
_compilers[1]->set_num_compiler_threads(i + 1);
if (TraceCompilerThreads) {
ResourceMark rm;
MutexLocker mu(Threads_lock);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code seems potentially broken both before and after this change. If the ct thread is protected already (by current call chain) you don't need the lock, if it isn't protected then you can't safely call get_thread_name() on it without first ensuring it is still a valid reference.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ouch! You are correct that any time after the make_thread() call,
the newly created JavaThread could run and exit quickly. In the
baseline code, the Threads_lock grab only keeps 'ct' from exiting
while the lock is held. It doesn't help if 'ct' exited before the lock
was grabbed.

I'm not sure why you would think that 'ct' could be protected by
the current call chain. Of course, I could be missing something.

I'm going to see if I can do something here with a TLH.

@@ -482,6 +482,27 @@ void Thread::check_for_dangling_thread_pointer(Thread *thread) {
}
#endif

// Is the target JavaThread protected by this Thread:
bool Thread::is_JavaThread_protected(const JavaThread* p) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this a function on Thread instead of JavaThread??

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made it a function on Thread because a Thread can use a ThreadsListHandle
to protect access to JavaThreads and we're checking the ThreadsList(s) that
are associated with the calling Thread.

It dawns on me that the comment could have been more clear:

// Is the target JavaThread protected by the calling Thread:

If I had written the comment that way, then I would have realized that the
function should be static in Thread and then it could call Thread::current()
itself and then make its checks of the ThreadsList(s).

Comment on lines 281 to 283
// If the target JavaThread is not protected, then we return the
// specified non-NULL string:
return thread->as_Java_thread()->get_thread_name("<NOT FILLED IN>");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not what the original logic is doing. We only return "NOT FILLED IN" if we find that the java.lang.Thread oop does not have a name set. That's not normally possible but this code is intended to be 100% safe, hence the check. (And it may be possible to see null during the attach process ...)

I would expect the target thread to already be protected when this method is called in any case. So while this code partially duplicates JavaThread::get_thread_name() I'm not convinced we need to change this code to use JavaThread::get_thread_name() instead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My intent was to have safe_get_thread_name() take advantage of
JavaThread::get_thread_name() when we know that we're dealing
with a JavaThread. Shared code means less maintenance, but it
does look like I didn't achieve equivalence so I'm going to back
out my changes for JvmtiTrace::safe_get_thread_name().

Comment on lines 487 to 490
if (current() == p) {
// Current thread is always protected:
return true;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is true but it is awkward that we are going to manifest Thread::current multiple times when using this - see below.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The static Thread::is_JavaThread_protected() now calls:

 bool Thread::is_JavaThread_protected(const JavaThread* p) {
  Thread* thread = Thread::current();
  if (thread == p) {

and because the function is now static, the caller doesn't have to call
Thread::current() and just calls Thread::is_JavaThread_protected(target).

assert_locked_or_safepoint_or_handshake(Threads_lock, this);
}
const char* JavaThread::get_thread_name(const char* default_name) const {
Thread* thread = Thread::current();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you manifest Thread::current() and then is_JavaThread_protected will manifest it again.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm dropping the call to Thread::current() in JavaThread::get_thread_name().

}
const char* JavaThread::get_thread_name(const char* default_name) const {
Thread* thread = Thread::current();
ThreadsListHandle tlh(thread);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would only expect this code to create a TLH if the target is not protected, otherwise why do we need to search through the existing thread-lists as it would suffice to see if the target is within the newly created TLH ??
I expected the basic logic here to be:
if (thread is protected) return name; else { protect the thread; return name; }

though if it is cheaper to create a TLH than to search the existing ones, then we may as well just create the TLH, check that the target has been captured and then return its name.

That said we need to carefully examine where the taking of the Threads_lock would occur in the thread termination sequence, to understand what observable states are possible with respect to the target having a threadObj. (Though this may have already changed with the introduction of Thread-SMR.)

That all said I'm not so sure the only thing we actually need to do for this issue is to replace the use of Threads_lock, in the caller, with suitable TLH usage, and then in get_thread_name() assert that the target is in fact protected.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dropped the TLH into JavaThread::get_thread_name() because I was focused
on protecting the fetching of the name from the JavaThread. In reality that
doesn't happen in a vacuum so a TLH should really exist in the caller or some
where higher in the call stack.

There is a subtle point here: Creating a TLH in JavaThread::get_thread_name()
may not actually protect "this" JavaThread. "this" JavaThread would only be
protected by a newly created TLH, if it is on the current ThreadsList. If we
happen to call java_thread->get_thread_name() too early in the lifecycle or
too late in the lifecycle, then that newly created TLH doesn't help.

The original code's assertion check was a little complicated, but still was
a good idea:

if we're not at a safepoint and if we're not calling the function on our self
   then assert_locked_or_safepoint_or_handshake(Threads_lock)

I'm going to add back a simpler assertion that takes advantage of the
fact that Thread::is_JavaThread_protected() returns true when the
target is the current thread.

@dcubed-ojdk
Copy link
Member Author

@dholmes-ora - I'm really glad that I waited for your review! Thanks for taking the time.

@mlbridge
Copy link

mlbridge bot commented Feb 17, 2021

Mailing list message from David Holmes on serviceability-dev:

Hi Dan,

On 18/02/2021 3:36 am, Daniel D.Daugherty wrote:

On Wed, 17 Feb 2021 07:32:46 GMT, David Holmes <dholmes at openjdk.org> wrote:

Daniel D. Daugherty has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:

- Merge branch 'master' into JDK-8241403
- Address coleenp CR0 comments.
- 8241403: JavaThread::get_thread_name() should be ThreadSMR-aware

Hi Dan,

Sorry but I have a lot of issues with this. After thinking about it a lot I don't think the current approach is what is needed. To repeat what I wrote in one of the comments I think the simple fix here is to replace the use of Threads_lock in the caller with suitable use of a TLH, and then replace the assert_locked_or_safepoint(Threads_lock) with assert(is_JavaThread_protected(t)) where is_JavaThread_protected checks the target for either being the current thread or included in a TLH of the current thread. I don't think get_thread_name() should try to protect the target as that is the responsibility of the caller.

Thanks,
David

@dholmes-ora - I'm really glad that I waited for your review! Thanks for taking the time.

I think there are more issues with the existing code that then impact
any changes.

The general rule to operate on a target thread is that it must not be
able to terminate (or more strongly be deallocated) whilst you may
operate on it, but there are many ways this can be true:

- current thread
- at a safepoint
- in a handshake
- captured in a TLH
- target thread is immortal
- current thread holds a resource that the target needs before it exits

This last category is very general and can't really be captured by an
assertion. Holding the Threads_lock was one old example (and could be
asserted), but there are other possibilities including:

- current thread holds the ObjectMonitor of the target's j.l.Thread instance
- current thread is "synchronizing" with the target and the target can't
proceed until after the current thread releases it

So I think the current assertion in get_thread_name is more restrictive
than is actually necessary (imagine adding a logging statement in the OM
code that prints the name of the thread it has selected to unpark -
perfectly safe but no safepoint active and no Threads_lock held).
Changing that assertion is also more involved than just checking for an
active TLH as there are many safety conditions, not all of which can be
checked explicitly.

I think the CompileBroker code was a case of dealing with immortal
threads (before UseDynamicNumberOfCompilerThreads was introduced) and
the acquisition of the Threads_lock was simply done to placate the
assertion. With dynamic compiler threads it is I guess theoretically
possible that a newly created thread might terminate again before the
creating thread reaches the get_thread_name() call - but exceedingly
unlikely. Not sure how to fix that one without splitting apart the
thread creation and start processes.

Cheers,
David
-----

@dcubed-ojdk
Copy link
Member Author

The changes in CR2 are:

  • Make Thread::is_JavaThread_protected() static and make it clear that
    it checks to see if the calling thread is protecting the target;
    added VMThread access as always protected.
  • Revert changes in JvmtiTrace::safe_get_thread_name().
  • Revert the JavaThread::get_thread_name() default_name parameter and
    don't create a ThreadsListHandle in the function; restore a simpler
    assert for the case where the target thread is not protected.
  • Add Threads::threads_do() and Threads::java_threads_do() that work
    with a JavaThreadIteratorWithHandle.
  • Update src/hotspot/share/services/management.cpp to switch from using
    the Threads_lock to using JavaThreadIteratorWithHandle.
  • Redo compileBroker.cpp changes to use a ThreadsListHandle to protect
    the CompilerThread/JavaThread.
  • Remove a stale comment from src/hotspot/share/prims/jvm.cpp.

You'll notice some temporary testing code in JavaThread::get_thread_name():

  guarantee(false, "current_thread=" INTPTR_FORMAT
            " is not protecting this=" INTPTR_FORMAT,
            p2i(current_thread), p2i(this));

I'm doing my current Mach5 Tier[1-8] run with this guarantee() in place.
It hasn't fired in Mach5 Tier[1-7]; Tier8 is still running, but it hasn't
fired yet. What this means is that we don't currently have a test case that
always exercises the unprotected path; we might have a test case that
sometimes exercises the unprotected path depending on timing and/or
options, but nothing so far. My intention with this fix is not to leave any
unprotected paths for JavaThread::get_thread_name() and these test results
give me confidence, but not complete assurance.

@dcubed-ojdk
Copy link
Member Author

Here's another thing that I noticed and want to discuss with the reviwers:

I added a new comment above the simpler assert that I restored into
JavaThread::get_thread_name():

  // Note: Since this JavaThread isn't protected by a TLH, the call to
  // this->is_handshake_safe_for() may crash, but we have debug bits so...
  assert(SafepointSynchronize::is_at_safepoint() ||
         this->is_handshake_safe_for(current_thread), "JavaThread="
         INTPTR_FORMAT " is not protected, not at a safepoint and "
         "not handshake safe.", p2i(this));

The baseline code used:

assert_locked_or_safepoint_or_handshake(Threads_lock, this);

and that function looks like this:

void assert_locked_or_safepoint_or_handshake(const Mutex* lock, const JavaThread* thread) {
  if (thread->is_handshake_safe_for(Thread::current())) return;
  assert_locked_or_safepoint(lock);
}

and is_handshake_safe_for() looks like this:

  // A JavaThread can always safely operate on it self and other threads
  // can do it safely if they are the active handshaker.
  bool is_handshake_safe_for(Thread* th) const {
    return _handshake.active_handshaker() == th || this == th;
  }

so the baseline code is also using "this" JavaThread without knowing that
it is safe. The interesting thing about is_handshake_safe_for() is that it
is safe if the target is the active handshaker for the current thread.
However, if the target is not the active handshaker for the current thread,
then calling thread->is_handshake_safe_for() and accessing the
_handshake field in that JavaThread may not be safe.

@dcubed-ojdk
Copy link
Member Author

dcubed-ojdk commented Feb 20, 2021

The CR2 changes are being tested with Mach5 Tier[1-8]:

Mach5 Tier[12458] - no failures
Mach5 Tier3 - 1 unrelated, known failure.
Mach5 Tier6 - 2 unrelated failures, 1 known, 1 unknown.
Mach5 Tier7 - 3 unrelated, known failures

Copy link
Member

@dholmes-ora dholmes-ora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Dan,

Still a few comments/concerns - see below.

Thanks,
David

Comment on lines 1006 to 1007
ThreadsListHandle tlh;
assert(tlh.includes(ct), "ct=" INTPTR_FORMAT " exited unexpectedly.", p2i(ct));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This case seems to be one where we are creating an immortal compiler thread (either because they are all immortal, or this is the first compiler thread and it is always immortal (verify that?)) - so the assert seems fine.

Comment on lines 1027 to 1028
ThreadsListHandle tlh;
assert(tlh.includes(ct), "ct=" INTPTR_FORMAT " exited unexpectedly.", p2i(ct));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto

Comment on lines 1114 to 1115
ThreadsListHandle tlh;
assert(tlh.includes(ct), "ct=" INTPTR_FORMAT " exited unexpectedly.", p2i(ct));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be the fully dynamic case. Here the TLH is potentially racing with thread termination and so may or may not capture ct. Rather than an assertion the call to get_thread_name() below should be guarded by tlh.includes(ct).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Here's the new block:

      JavaThread *ct = make_thread(compiler_t, compiler2_object(i), _c2_compile_queue, _compilers[1], THREAD);
      if (ct == NULL) break;
      ThreadsListHandle tlh;
      if (!tlh.includes(ct)) break;
      _compilers[1]->set_num_compiler_threads(i + 1);

So we "break" if the tlh doesn't include the JavaThread which is the same
behavior as when ct == NULL). Please note that I followed the existing
code style of putting the if-statement break on a single line.

Comment on lines 1135 to 1136
ThreadsListHandle tlh;
assert(tlh.includes(ct), "ct=" INTPTR_FORMAT " exited unexpectedly.", p2i(ct));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto - the assert is not appropriate for this case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto for the resolution.

@@ -482,16 +482,17 @@ void Thread::check_for_dangling_thread_pointer(Thread *thread) {
}
#endif

// Is the target JavaThread protected by this Thread:
// Is the target JavaThread protected by the calling Thread:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/calling/current/ ?

We could also expand this to clarify that even if a target JavaThread is not explicitly protected by the current thread it can still be safe to access if the two threads engage in a suitable protocol.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That comment used to be:

// Is the target JavaThread protected by this Thread:

so I changed it to "the calling Thread" because I wanted it to be clear that
we're checking to see if the Thread that's calling is_JavaThread_protected()
is protecting the target JavaThread. I didn't use "current" because I thought
"calling" would be more clear.

I could add more comments about the target JavaThread being possibly
protected by other protocols, but I thought that might raise questions
about why the code doesn't check for those things and I didn't want to
go down that rabbit hole...

Comment on lines 2565 to 2568
// Note: Since this JavaThread isn't protected by a TLH, the call to
// this->is_handshake_safe_for() may crash, but we have debug bits so...
assert(SafepointSynchronize::is_at_safepoint() ||
this->is_handshake_safe_for(current_thread), "JavaThread="
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree this is a pre-existing bug as accessing this may not be safe if the target is not protected. But if we crash attempting the access that implies the assert failed so ...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... that will tell us that we have a spot in the code where a TLH is
missing once we've analyzed the crash. So are you okay with this
assert() having a bug since the original assert has the same bug?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It has occurred to me that I should have moved all of the
protection checking logic and assertion checks into the
new Thread::is_JavaThread_protected() function. That will
allow JavaThread::get_thread_name() to become very simple.

Comment on lines 847 to 848
JavaThreadIteratorWithHandle jtiwh;
Threads::threads_do(&jtiwh, &vmtcc);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is at all necessary. Threads::add and Threads::remove still use the Threads_lock so this code is safe as it stands. Switching to use JTIWH may be an alternative, and may even be desirable, but that hasn't been established, and is not necessary for safety and so seems outside the scope of this bug IMO.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Holding the Threads_lock is no longer being considered as
proper protection for the JavaThread::get_thread_name() call
so I had to change the code to use a JavaThreadIteratorWithHandle
so that we have the protection of the TLH. That closure eventually
gets us down to a JavaThread::get_thread_name() call...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the code is now broken as it needs the Threads_lock to guard against non-JavaThreads being created or terminating.
The code also appears misleading because the passing of the JTIWH makes it look (to me) like it only deals with JavaThreads.
I think this code needs to be:

    JavaThreadIteratorWithHandle jtiwh;  // ensure JavaThreads are safe to access
    MutexLocker ml(Threads_lock);        // ensure non_JavaThreads are safe to access
    Threads::threads_do(&jtiwh, &vmtcc); // process all threads

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm backing out the changes to src/hotspot/share/services/management.cpp.
By removing the Threads_lock usage, I'm allowing those closures to run in parallel
with other closures that are using the original Threads::java_threads_do() and
Threads::threads_do() calls that do use the Threads_lock. Those other uses might
be assuming exclusivity for their algorithms and my M&M changes break that.

Copy link
Member Author

@dcubed-ojdk dcubed-ojdk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dholmes-ora - thanks for the re-review!

Comment on lines 1114 to 1115
ThreadsListHandle tlh;
assert(tlh.includes(ct), "ct=" INTPTR_FORMAT " exited unexpectedly.", p2i(ct));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Here's the new block:

      JavaThread *ct = make_thread(compiler_t, compiler2_object(i), _c2_compile_queue, _compilers[1], THREAD);
      if (ct == NULL) break;
      ThreadsListHandle tlh;
      if (!tlh.includes(ct)) break;
      _compilers[1]->set_num_compiler_threads(i + 1);

So we "break" if the tlh doesn't include the JavaThread which is the same
behavior as when ct == NULL). Please note that I followed the existing
code style of putting the if-statement break on a single line.

Comment on lines 1135 to 1136
ThreadsListHandle tlh;
assert(tlh.includes(ct), "ct=" INTPTR_FORMAT " exited unexpectedly.", p2i(ct));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto for the resolution.

@@ -482,16 +482,17 @@ void Thread::check_for_dangling_thread_pointer(Thread *thread) {
}
#endif

// Is the target JavaThread protected by this Thread:
// Is the target JavaThread protected by the calling Thread:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That comment used to be:

// Is the target JavaThread protected by this Thread:

so I changed it to "the calling Thread" because I wanted it to be clear that
we're checking to see if the Thread that's calling is_JavaThread_protected()
is protecting the target JavaThread. I didn't use "current" because I thought
"calling" would be more clear.

I could add more comments about the target JavaThread being possibly
protected by other protocols, but I thought that might raise questions
about why the code doesn't check for those things and I didn't want to
go down that rabbit hole...

Comment on lines 2565 to 2568
// Note: Since this JavaThread isn't protected by a TLH, the call to
// this->is_handshake_safe_for() may crash, but we have debug bits so...
assert(SafepointSynchronize::is_at_safepoint() ||
this->is_handshake_safe_for(current_thread), "JavaThread="
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... that will tell us that we have a spot in the code where a TLH is
missing once we've analyzed the crash. So are you okay with this
assert() having a bug since the original assert has the same bug?

Comment on lines 847 to 848
JavaThreadIteratorWithHandle jtiwh;
Threads::threads_do(&jtiwh, &vmtcc);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Holding the Threads_lock is no longer being considered as
proper protection for the JavaThread::get_thread_name() call
so I had to change the code to use a JavaThreadIteratorWithHandle
so that we have the protection of the TLH. That closure eventually
gets us down to a JavaThread::get_thread_name() call...

Copy link
Member

@dholmes-ora dholmes-ora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Dan,

Sorry still some issue here.

Okay let me back up again. The old code for get_thread_name() thinks it is only safe to access the name of another thread if we hold the Threads_lock or we are at a safepoint. That isn't true as I've pointed out before - there are other ways for the access to be safe. And as a result of this, code calling get_thread_name() would acquire the Threads_lock just to get past the assertion e.g the CompilerBroker code. I've now ascertained that the CompilerBroker calls to get_thread_name() are perfectly safe by construction due to holding the CompilerThread_lock and/or dealing with an immortal thread.

The intent of filing this issue was for get_thread_name() to also (instead?) recognise that it is safe if the target is protected by a TLH - thus relaxing the original constraint by adding another checkable condition - and to recognise that a handshake also makes things safe. I didn't recognise at the time that these are still overly strong requirements.

So the change you are proposing is to replace the "needs the Threads_lock" condition with "must have an active TLH" condition when calling get_thread_name(). So now the unnecessary use of the Threads_lock in the CompileBroker is replaced by unnecessary use of the TLH - which is unfortunate IMO but a fix for that is out of scope for this change.

Also the management code changes now require use of a TLH - but there is a bug in replacing the Threads_lock usage: see below!

As a Threads_lock->TLH conversion for users of get_thread_name() I can accept the change in its current form. But it needs to restore the asserts in the CompileBroker code as commented below and fixing the Threads_lock bug in the management code.

Thanks,
David

@@ -1109,10 +1111,11 @@ void CompileBroker::possibly_add_compiler_threads(Thread* THREAD) {
#endif
JavaThread *ct = make_thread(compiler_t, compiler2_object(i), _c2_compile_queue, _compilers[1], THREAD);
if (ct == NULL) break;
ThreadsListHandle tlh;
if (!tlh.includes(ct)) break;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change raised a red flag because if the new thread already terminated it should decrement the number of compiler threads on its way out, but we are skipping the increment by breaking here. So that prompted me to look closer at the bigger picture for these compiler threads and their lifecycle. The current code is called whilst holding the CompilerThread_lock, which means that the new thread can't terminate because it has to also acquire the CompilerThread_lock to decide whether it needs to terminate. So access to ct is guaranteed to be safe and no TLH is needed at all - other than because get_thread_name() requires it. So this should be a simple assertion afterall.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. I've switched this one back to an assertion after the TLH.
Thanks for the analysis.

Comment on lines 1135 to 1136
ThreadsListHandle tlh;
if (!tlh.includes(ct)) break;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. I've switched this one back to an assertion after the TLH.

Comment on lines 2565 to 2568
// Note: Since this JavaThread isn't protected by a TLH, the call to
// this->is_handshake_safe_for() may crash, but we have debug bits so...
assert(SafepointSynchronize::is_at_safepoint() ||
this->is_handshake_safe_for(current_thread), "JavaThread="
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

Comment on lines 847 to 848
JavaThreadIteratorWithHandle jtiwh;
Threads::threads_do(&jtiwh, &vmtcc);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the code is now broken as it needs the Threads_lock to guard against non-JavaThreads being created or terminating.
The code also appears misleading because the passing of the JTIWH makes it look (to me) like it only deals with JavaThreads.
I think this code needs to be:

    JavaThreadIteratorWithHandle jtiwh;  // ensure JavaThreads are safe to access
    MutexLocker ml(Threads_lock);        // ensure non_JavaThreads are safe to access
    Threads::threads_do(&jtiwh, &vmtcc); // process all threads

@@ -2907,6 +2950,11 @@ void Threads::threads_do(ThreadClosure* tc) {
non_java_threads_do(tc);
}

void Threads::threads_do(JavaThreadIteratorWithHandle* jtiwh_p, ThreadClosure* tc) {
java_threads_do(jtiwh_p, tc);
non_java_threads_do(tc);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should still have

assert_locked_or_safepoint(Threads_lock);

for non-JavaThread access.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm backing out both:

Threads::java_threads_do(JavaThreadIteratorWithHandle* jtiwh_p, ThreadClosure* tc)
Threads::threads_do(JavaThreadIteratorWithHandle* jtiwh_p, ThreadClosure* tc)

and I'm backing out the changes to src/hotspot/share/services/management.cpp.
By removing the Threads_lock usage, I'm allowing those closures to run in parallel
with other closures that are using the original Threads::java_threads_do() and
Threads::threads_do() calls that do use the Threads_lock. Those other uses might
be assuming exclusivity for their algorithms and my M&M changes break that.

Comment on lines 1703 to 1704
JavaThreadIteratorWithHandle jtiwh;
Threads::threads_do(&jtiwh, &ttc);