Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8261391: ZGC crash - SEGV in RevokeOneBias::do_thread #2571

Closed
wants to merge 4 commits into from
Closed
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
39 changes: 29 additions & 10 deletions src/hotspot/share/runtime/handshake.cpp
Expand Up @@ -48,19 +48,23 @@ class HandshakeOperation : public CHeapObj<mtThread> {
// Once it reaches zero all handshake operations have been performed.
int32_t _pending_threads;
JavaThread* _target;
Thread* _requester;

// Must use AsyncHandshakeOperation when using AsyncHandshakeClosure.
HandshakeOperation(AsyncHandshakeClosure* cl, JavaThread* target) :
HandshakeOperation(AsyncHandshakeClosure* cl, JavaThread* target, Thread* requester) :
_handshake_cl(cl),
_pending_threads(1),
_target(target) {}
_target(target),
_requester(requester) {}

public:
HandshakeOperation(HandshakeClosure* cl, JavaThread* target) :
HandshakeOperation(HandshakeClosure* cl, JavaThread* target, Thread* requester) :
_handshake_cl(cl),
_pending_threads(1),
_target(target) {}
_target(target),
_requester(requester) {}
virtual ~HandshakeOperation() {}
void prepare(JavaThread* current_target, Thread* executing_thread);
void do_handshake(JavaThread* thread);
bool is_completed() {
int32_t val = Atomic::load(&_pending_threads);
Expand All @@ -77,7 +81,7 @@ class AsyncHandshakeOperation : public HandshakeOperation {
jlong _start_time_ns;
public:
AsyncHandshakeOperation(AsyncHandshakeClosure* cl, JavaThread* target, jlong start_ns)
: HandshakeOperation(cl, target), _start_time_ns(start_ns) {}
: HandshakeOperation(cl, target, NULL), _start_time_ns(start_ns) {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not at all clear why requester is always NULL here. There is obviously a requester thread even in the async case. If that thread created a Handle in the op then don't we need to call start_processing on it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The asynchronous handshake may never be executed, thus having an infinity life-time. (thread never leaves native/blocked)
Referring to a the requesting thread means we must keep it alive forever also.
If you refer to stack-based resources from a thread, means that thread should not touch it's stack before you are done.
If the requester can't use it stack until you are done, you essentially have a synchronous handshake.

The only way we have to keep the requesting thread alive is to put in a ThreadsList, which means we would keep all threads alive potentially forever (until end of VM-life). But still a Handle would be unsafe because it's stack-based.

Therefore it's hard-coded to NULL. (are we suppose to use nullptr now ?)

virtual ~AsyncHandshakeOperation() { delete _handshake_cl; }
jlong start_time() const { return _start_time_ns; }
};
Expand Down Expand Up @@ -276,6 +280,22 @@ class VM_HandshakeAllThreads: public VM_Handshake {
VMOp_Type type() const { return VMOp_HandshakeAllThreads; }
};

void HandshakeOperation::prepare(JavaThread* current_target, Thread* executing_thread) {
if (current_target->is_terminated()) {
// Will never execute any handshakes on this thread.
return;
}
if (current_target != executing_thread) {
// Only when the target is not executing the handshake it self.
robehn marked this conversation as resolved.
Show resolved Hide resolved
StackWatermarkSet::start_processing(current_target, StackWatermarkKind::gc);
}
if (_requester != NULL && _requester != executing_thread && _requester->is_Java_thread()) {
// The handshake closure may contain oop Handles from the _requester.
// We must make sure we can use them.
StackWatermarkSet::start_processing(_requester->as_Java_thread(), StackWatermarkKind::gc);
}
}

void HandshakeOperation::do_handshake(JavaThread* thread) {
jlong start_time_ns = 0;
if (log_is_enabled(Debug, handshake, task)) {
Expand Down Expand Up @@ -305,14 +325,14 @@ void HandshakeOperation::do_handshake(JavaThread* thread) {
}

void Handshake::execute(HandshakeClosure* hs_cl) {
HandshakeOperation cto(hs_cl, NULL);
HandshakeOperation cto(hs_cl, NULL, Thread::current());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is unfortunate that we need to manifest the current thread here (and elsewhere).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I did that since I did not want to change the API.
I was thinking of:
Handshake::execute(HandshakeClosure* hs_cl, Thread* requester = NULL)
And then go over calls site and see if we have the current thread in a variable.
But I felt that would be better off in another change, if wanted.

VM_HandshakeAllThreads handshake(&cto);
VMThread::execute(&handshake);
}

void Handshake::execute(HandshakeClosure* hs_cl, JavaThread* target) {
JavaThread* self = JavaThread::current();
HandshakeOperation op(hs_cl, target);
HandshakeOperation op(hs_cl, target, Thread::current());

jlong start_time_ns = os::javaTimeNanos();

Expand Down Expand Up @@ -430,6 +450,7 @@ void HandshakeState::process_self_inner() {
bool async = op->is_async();
log_trace(handshake)("Proc handshake %s " INTPTR_FORMAT " on " INTPTR_FORMAT " by self",
async ? "asynchronous" : "synchronous", p2i(op), p2i(_handshakee));
op->prepare(_handshakee, _handshakee);
op->do_handshake(_handshakee);
if (async) {
log_handshake_info(((AsyncHandshakeOperation*)op)->start_time(), op->name(), 1, 0, "asynchronous");
Expand Down Expand Up @@ -522,9 +543,7 @@ HandshakeState::ProcessResult HandshakeState::try_process(HandshakeOperation* ma
pr_ret = HandshakeState::_succeeded;
}

if (!_handshakee->is_terminated()) {
StackWatermarkSet::start_processing(_handshakee, StackWatermarkKind::gc);
}
op->prepare(_handshakee, current_thread);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, just I'm clear on this fix:

  • You are replacing this single StackWatermarkSet::start_processing call on the handshakee with a call to the new prepare() function.
  • The new prepare() function always does the StackWatermarkSet::start_processing call on the current_target which is the handshakee.
  • The new prepare() function does a StackWatermarkSet::start_processing call on executioner/executer/executing_thread when that thread is different than the requester/handshaker thread.

So you have multiple names for the "target": target, handshakee, and current_target. Is there a reason to create the new name current_target instead of using either target or handshakee that you've used before? Since current_target is new name in this fix, I would like to see that one renamed to either target or handshakee in this fix.

I think you also have multiple names for the "requester": requester and handshaker, but I'm less confused by that naming. In this fix, you're now adding a new name: executioner which could be called executer or executing_thread.

Perhaps you could clean up this naming confusion is follow-up RFE that only fixes naming.

Copy link
Contributor Author

@robehn robehn Feb 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1: Request the handshake

  • The thread doing this is the owner of the handshake operation
  • Handles in the handshake operation belongs to this thread and we must do SWS::start_proc() on it

2: Target thread(s)

  • The thread(s) which the operation is performed on
  • We must do SWS::start_proc() on it

3: Executor of the handshake

  • The thread that executes the code that the handshake operation contains.

Handshake operation:
_requester is the thread requested the operation
_target is NULL if we target all threads, otherwise it is the target thread.

The prepare() method is part of the handshake operation and need some additional information.
It needs to know the thread executing the handshake and it needs the current target (since _target may be NULL).

The HandshakeState name for the thread owning the handshake state is handshakee, which is a private implementation detail.
When the handshake state is processing handshakes it calls prepare with _handshakee as current_target and current_thread as parameters to prepare().

target may be NULL
current target may not be NULL
handshakee refers to the handshake state owner (not the target of a handshake operation)

They are thus not interchangeable, but at the moment of execution they can all refer to the same thread.

Regarding handshaker, we have _active_handshaker (same as executioner) inside HandshakeState.
In the context of a handshake operation is is not clear if handshaker would refer to the requester or the executor, therefore I did not use it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I now see that I missed call prepare when executing the handshake by handshakee, fixing!


_active_handshaker = current_thread;
op->do_handshake(_handshakee);
Expand Down