-
Notifications
You must be signed in to change notification settings - Fork 5.8k
8274687: JDWP can deadlock VM if Java thread waits in blockOnDebuggerSuspend #5805
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8274687: JDWP can deadlock VM if Java thread waits in blockOnDebuggerSuspend #5805
Conversation
👋 Welcome back rrich! A progress list of the required criteria for merging this PR into |
@@ -746,7 +746,16 @@ blockOnDebuggerSuspend(jthread thread) | |||
node = findThread(NULL, thread); | |||
if (node != NULL) { | |||
while (node && node->suspendCount > 0) { | |||
/* Resume requires the event handlerLock so we have to release it */ | |||
eventHandler_unlock(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not so sure this is always safe. It might be fine in the context of resetting the connection, but not during normal debug agent operations. It allows for another event to be processed when the lock is suppose to keep event processing serialized. What happens for example if we hit the Thread.resume()
breakpoint again on a different thread?
It might be best to limit doing this only when threadControl_reset()
is currently executing (you could set a flag there), although it seems more like a band aid than a proper fix. I could imagine there might still be scenarios were releasing the lock during reset might be problematic, although probably extremely unlikely to ever be noticed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not so sure this is always safe. It might be fine in the context of
resetting the connection, but not during normal debug agent operations. It
allows for another event to be processed when the lock is suppose to keep
event processing serialized. What happens for example if we hit the
Thread.resume()
breakpoint again on a different thread?
I agree that this is probably not safe. E.g. when releasing handlerLock the
handler chain for EI_BREAKPOINT could be modified which is being iterated in
the caller function event_callback().
jdk/src/jdk.jdwp.agent/share/native/libjdwp/eventHandler.c
Lines 647 to 673 in 8609ea5
node = getHandlerChain(evinfo->ei)->first; | |
classname = getClassname(evinfo->clazz); | |
while (node != NULL) { | |
/* save next so handlers can remove themselves */ | |
HandlerNode *next = NEXT(node); | |
jboolean shouldDelete; | |
if (eventFilterRestricted_passesFilter(env, classname, | |
evinfo, node, | |
&shouldDelete)) { | |
HandlerFunction func; | |
func = HANDLER_FUNCTION(node); | |
if ( func == NULL ) { | |
EXIT_ERROR(AGENT_ERROR_INTERNAL,"handler function NULL"); | |
} | |
(*func)(env, evinfo, node, eventBag); | |
} | |
if (shouldDelete) { | |
/* We can safely free the node now that we are done | |
* using it. | |
*/ | |
(void)freeHandler(node); | |
} | |
node = next; | |
} |
I should have checked before.
It might be best to limit doing this only when
threadControl_reset()
is
currently executing (you could set a flag there), although it seems more like
a band aid than a proper fix. I could imagine there might still be scenarios
were releasing the lock during reset might be problematic, although probably
extremely unlikely to ever be noticed.
I looked at threadControl_resumeThread(). It appears to me that resuming a
thread is not possible while some thread is waiting in blockOnDebuggerSuspend()
because threadControl_resumeThread() locks handlerLock.
eventHandler_lock(); /* for proper lock order */ |
Am I missing something?
Maybe the code in handleAppResumeBreakpoint() could moved to doPendingTasks()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding threadControl_resumeThread() it does appear that it would block, as would threadControl_resumeAll(), which seems problematic in that blockOnDebuggerSuspend() won't exit until the suspendCount == 0. So it's unclear to me how this is suppose to work if a resume() can't be done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems if we had a test case where the debugger did a ThreadReference.suspend(), then the debuggee did a Thread.resume() on the same thread, and then debugger did a ThreadReference.resume(), it would block, and with no way to unblock it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll see if I can implement such a test. Also I have I prototype that defers the actions of handleAppResumeBreakpoint() until doPendingTasks() is running. This seems to work in the various jdi/jdwp tests which likely do not cover the functionality very well though.
I've implemented a test that shows the jdwp agent deadlocks also trying to resume one/all threads if a thread is waiting in |
This change fixes the deadlock described in the JBS-bug by:
Releasing
handlerLock
before waiting onthreadLock
inblockOnDebuggerSuspend()
Notifying on
threadLock
after resuming all threads inthreadControl_reset()
The PR has 3 commits:
The first commit delays the cleanup actions after leaving the loop in
debugLoop_run()
because of a Dispose command. The delay allows to reproduce the deadlockrunning the dispose003 test with the command
The second commit contains the fix described above. With it the deadlock
cannot be reproduced anymore.
The third commit removes the diagnostic code introduced with the first commit again.
The fix passed our nightly regression testing: JCK and JTREG, also in Xcomp mode with fastdebug and release builds on all platforms.
Progress
Issue
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/5805/head:pull/5805
$ git checkout pull/5805
Update a local copy of the PR:
$ git checkout pull/5805
$ git pull https://git.openjdk.java.net/jdk pull/5805/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 5805
View PR using the GUI difftool:
$ git pr show -t 5805
Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/5805.diff