Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8261391: ZGC crash - SEGV in RevokeOneBias::do_thread #2571

Closed
wants to merge 4 commits into from
Closed
Changes from all commits
Commits
File filter
Filter file types
Jump to
Jump to file
Failed to load files.

Always

Just for now

@@ -48,19 +48,23 @@ class HandshakeOperation : public CHeapObj<mtThread> {
// Once it reaches zero all handshake operations have been performed.
int32_t _pending_threads;
JavaThread* _target;
Thread* _requester;

// Must use AsyncHandshakeOperation when using AsyncHandshakeClosure.
HandshakeOperation(AsyncHandshakeClosure* cl, JavaThread* target) :
HandshakeOperation(AsyncHandshakeClosure* cl, JavaThread* target, Thread* requester) :
_handshake_cl(cl),
_pending_threads(1),
_target(target) {}
_target(target),
_requester(requester) {}

public:
HandshakeOperation(HandshakeClosure* cl, JavaThread* target) :
HandshakeOperation(HandshakeClosure* cl, JavaThread* target, Thread* requester) :
_handshake_cl(cl),
_pending_threads(1),
_target(target) {}
_target(target),
_requester(requester) {}
virtual ~HandshakeOperation() {}
void prepare(JavaThread* current_target, Thread* executing_thread);
void do_handshake(JavaThread* thread);
bool is_completed() {
int32_t val = Atomic::load(&_pending_threads);
@@ -77,7 +81,7 @@ class AsyncHandshakeOperation : public HandshakeOperation {
jlong _start_time_ns;
public:
AsyncHandshakeOperation(AsyncHandshakeClosure* cl, JavaThread* target, jlong start_ns)
: HandshakeOperation(cl, target), _start_time_ns(start_ns) {}
: HandshakeOperation(cl, target, NULL), _start_time_ns(start_ns) {}

This comment has been minimized.

@dholmes-ora

dholmes-ora Feb 18, 2021
Member

Not at all clear why requester is always NULL here. There is obviously a requester thread even in the async case. If that thread created a Handle in the op then don't we need to call start_processing on it?

This comment has been minimized.

@robehn

robehn Feb 18, 2021
Author Contributor

The asynchronous handshake may never be executed, thus having an infinity life-time. (thread never leaves native/blocked)
Referring to a the requesting thread means we must keep it alive forever also.
If you refer to stack-based resources from a thread, means that thread should not touch it's stack before you are done.
If the requester can't use it stack until you are done, you essentially have a synchronous handshake.

The only way we have to keep the requesting thread alive is to put in a ThreadsList, which means we would keep all threads alive potentially forever (until end of VM-life). But still a Handle would be unsafe because it's stack-based.

Therefore it's hard-coded to NULL. (are we suppose to use nullptr now ?)

virtual ~AsyncHandshakeOperation() { delete _handshake_cl; }
jlong start_time() const { return _start_time_ns; }
};
@@ -276,6 +280,22 @@ class VM_HandshakeAllThreads: public VM_Handshake {
VMOp_Type type() const { return VMOp_HandshakeAllThreads; }
};

void HandshakeOperation::prepare(JavaThread* current_target, Thread* executing_thread) {
if (current_target->is_terminated()) {
// Will never execute any handshakes on this thread.
return;
}
if (current_target != executing_thread) {
// Only when the target is not executing the handshake itself.
StackWatermarkSet::start_processing(current_target, StackWatermarkKind::gc);
}
Comment on lines +288 to +291

This comment has been minimized.

@dholmes-ora

dholmes-ora Feb 22, 2021
Member

As per my comment to Erik, this situation seems like a programming error to me as there you should not be a way for the target to create a Handle that the executing thread then gets a hold off - data should only flow one way from the requesting thread to the executing thread.

This comment has been minimized.

@robehn

robehn Feb 22, 2021
Author Contributor

StackWatermarkSet::start_processing(current_target, StackWatermarkKind::gc); is for thread local resources which may contain oops. A handshake may read from e.g. the target threads stack, so before the executing thread reads from the stack, we make sure it is safe.

if (_requester != NULL && _requester != executing_thread && _requester->is_Java_thread()) {
// The handshake closure may contain oop Handles from the _requester.
// We must make sure we can use them.
StackWatermarkSet::start_processing(_requester->as_Java_thread(), StackWatermarkKind::gc);
}
}

void HandshakeOperation::do_handshake(JavaThread* thread) {
jlong start_time_ns = 0;
if (log_is_enabled(Debug, handshake, task)) {
@@ -305,14 +325,14 @@ void HandshakeOperation::do_handshake(JavaThread* thread) {
}

void Handshake::execute(HandshakeClosure* hs_cl) {
HandshakeOperation cto(hs_cl, NULL);
HandshakeOperation cto(hs_cl, NULL, Thread::current());

This comment has been minimized.

@dholmes-ora

dholmes-ora Feb 18, 2021
Member

It is unfortunate that we need to manifest the current thread here (and elsewhere).

This comment has been minimized.

@robehn

robehn Feb 18, 2021
Author Contributor

Yes, I did that since I did not want to change the API.
I was thinking of:
Handshake::execute(HandshakeClosure* hs_cl, Thread* requester = NULL)
And then go over calls site and see if we have the current thread in a variable.
But I felt that would be better off in another change, if wanted.

VM_HandshakeAllThreads handshake(&cto);
VMThread::execute(&handshake);
}

void Handshake::execute(HandshakeClosure* hs_cl, JavaThread* target) {
JavaThread* self = JavaThread::current();
HandshakeOperation op(hs_cl, target);
HandshakeOperation op(hs_cl, target, Thread::current());

jlong start_time_ns = os::javaTimeNanos();

@@ -430,6 +450,7 @@ void HandshakeState::process_self_inner() {
bool async = op->is_async();
log_trace(handshake)("Proc handshake %s " INTPTR_FORMAT " on " INTPTR_FORMAT " by self",
async ? "asynchronous" : "synchronous", p2i(op), p2i(_handshakee));
op->prepare(_handshakee, _handshakee);
op->do_handshake(_handshakee);
if (async) {
log_handshake_info(((AsyncHandshakeOperation*)op)->start_time(), op->name(), 1, 0, "asynchronous");
@@ -522,9 +543,7 @@ HandshakeState::ProcessResult HandshakeState::try_process(HandshakeOperation* ma
pr_ret = HandshakeState::_succeeded;
}

if (!_handshakee->is_terminated()) {
StackWatermarkSet::start_processing(_handshakee, StackWatermarkKind::gc);
}
op->prepare(_handshakee, current_thread);

This comment has been minimized.

@dcubed-ojdk

dcubed-ojdk Feb 16, 2021
Member

Okay, just I'm clear on this fix:

  • You are replacing this single StackWatermarkSet::start_processing call on the handshakee with a call to the new prepare() function.
  • The new prepare() function always does the StackWatermarkSet::start_processing call on the current_target which is the handshakee.
  • The new prepare() function does a StackWatermarkSet::start_processing call on executioner/executer/executing_thread when that thread is different than the requester/handshaker thread.

So you have multiple names for the "target": target, handshakee, and current_target. Is there a reason to create the new name current_target instead of using either target or handshakee that you've used before? Since current_target is new name in this fix, I would like to see that one renamed to either target or handshakee in this fix.

I think you also have multiple names for the "requester": requester and handshaker, but I'm less confused by that naming. In this fix, you're now adding a new name: executioner which could be called executer or executing_thread.

Perhaps you could clean up this naming confusion is follow-up RFE that only fixes naming.

This comment has been minimized.

@robehn

robehn Feb 17, 2021
Author Contributor

1: Request the handshake

  • The thread doing this is the owner of the handshake operation
  • Handles in the handshake operation belongs to this thread and we must do SWS::start_proc() on it

2: Target thread(s)

  • The thread(s) which the operation is performed on
  • We must do SWS::start_proc() on it

3: Executor of the handshake

  • The thread that executes the code that the handshake operation contains.

Handshake operation:
_requester is the thread requested the operation
_target is NULL if we target all threads, otherwise it is the target thread.

The prepare() method is part of the handshake operation and need some additional information.
It needs to know the thread executing the handshake and it needs the current target (since _target may be NULL).

The HandshakeState name for the thread owning the handshake state is handshakee, which is a private implementation detail.
When the handshake state is processing handshakes it calls prepare with _handshakee as current_target and current_thread as parameters to prepare().

target may be NULL
current target may not be NULL
handshakee refers to the handshake state owner (not the target of a handshake operation)

They are thus not interchangeable, but at the moment of execution they can all refer to the same thread.

Regarding handshaker, we have _active_handshaker (same as executioner) inside HandshakeState.
In the context of a handshake operation is is not clear if handshaker would refer to the requester or the executor, therefore I did not use it.

This comment has been minimized.

@robehn

robehn Feb 17, 2021
Author Contributor

I now see that I missed call prepare when executing the handshake by handshakee, fixing!


_active_handshaker = current_thread;
op->do_handshake(_handshakee);
ProTip! Use n and p to navigate between commits in a pull request.