-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
proc: changed windows backend to deal with simultaneous breakpoints #598
Conversation
@@ -127,6 +127,20 @@ func Launch(cmd []string) (*Process, error) { | |||
return nil, ProcessExitedError{Pid: dbp.Pid, Status: exitCode} | |||
} | |||
|
|||
for _, thread := range dbp.Threads { | |||
_, err := _SuspendThread(thread.os.hThread) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to wrap this in execPtraceFunc
as well, correct? I assume it has the same origin thread restrictions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lukehoban code didn't, their documentation doesn't mention it, and it doesn't seem to cause problems.
I've found out that some of the errors that still happen are because somehow we get new threads starting while even after we suspended all existing threads in setCurrentBreakpoints, I changed waitForDebugEvent to suspend new threads in that section of the code and the only errors I still see are in TestIssue414, which isn't a very important test. I've put those changes in a separate commit (addenda) for easier reviewing, I'll squash before merge. |
for { | ||
var err error | ||
var tid int | ||
dbp.execPtraceFunc(func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like the idea of this function having the side effect of potentially causing the inferior to execute instructions. We continue and then immediately execute a non-blocking wait, which will return an error when we do not have any pending events. That means we never stop this thread, correct?
Is is not possible to just loop through WaitForDebugEvent
in non-blocking mode until we have no events? I assume all events would be queued and we shouldn't have to continue the inferior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The loop on dbp.Threads
calls _SuspendTherad
on every thread so the inferior shouldn't be able to execute any instructions.
WaitForDebugEvent
doesn't seem to return anything unless we call ContinueDebugEvent
in between.
rebased on master, TestIssue414 is flaky but that shouldn't be a problem after #603. |
var err error | ||
var tid int | ||
dbp.execPtraceFunc(func() { | ||
err = _ContinueDebugEvent(uint32(dbp.Pid), uint32(dbp.os.breakThread), _DBG_CONTINUE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the only thing that sticks out to me in this patch set. Here, we're continuing a thread and waiting for a debug event, however in the meantime it's possible that the thread will be executing instructions. This function should not have that side effect.
A non-blocking wait for debug event should suffice, yeah? If there's no events we simply move on. I don't understand the need to continue a thread in a function that is setting state on our thread objects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we suspend all threads (which we did above this line) ContinueDebugEvent
doesn't resume execution, it just acknowledges that we have processed the event we received. I couldn't figure out any other way of doing this that would work with the windows API. If we don't do this later on on the Continue procedure while we are singlestepping CurrentThread over its breakpoint instruction we receive events for other threads and we are not equipped to dealing with them there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, so if I understand correctly, it behaves similar to the Darwin API, where you can suspend a thread multiple times, and have to resume an equal number of times for it to actually execute instructions. Is that correct, from your understanding? If so, could you add a comment / link to docs to make it clear we're not actually continuing execution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's like the darwin API, however I don't actually ever suspend the thread more than once, when one of the threads generates a debug event windows will halt the execution of all threads but this state doesn't count as suspended, it's weird.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a comment describing what we are doing on that function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I think maybe we should alias _DBG_CONTINUE
to _DBG_EXCEPTION_HANDLED
. When reading up on the Windows debug API the other day, IIRC, that existed for 32 bit, but not 64 bit. However, semantically it's more accurate to what we're trying to accomplish, and makes it more explicit from the callers point of view.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Googling DBG_EXCEPTION_HANDLED leads me to this blog post which suggests that at some point DBG_EXCEPTION_HANDLED and DBG_CONTINUE did different things and had different values, I think appropriating the name as a synonym muddles the water but if you insist I'll switch.
Overall looks good. Can you rebase this on master so we can pull in the patch fixing prologue detection and get the AppVeyor CI to pass? |
a22e5a5
to
35b9c9f
Compare
Rebased, the only failures I see (and the one in Appveyor) is connected to the Step code, I will debug it after #603, assuming it persists. |
I added a commit that fixes #594, I can split it out if you want to review it separately, it's pretty simple. |
@@ -778,7 +778,9 @@ func TestStacktraceGoroutine(t *testing.T) { | |||
|
|||
for i, g := range gs { | |||
locations, err := g.Stacktrace(40) | |||
assertNoError(err, t, "GoroutineStacktrace()") | |||
if err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs a comment / explanation / TODO about why we're ignoring errors.
Overall LGTM, just a few comments. After those are addressed I will merge this. |
Done. |
2d29759
to
ff41c6c
Compare
Rebased on master, I can't replicate the failure I see in appveyor, someone that can will have to fix that. I still think this is an improvement and should be merged. |
Implementation of nextInProgress was wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…o-delve#598) * proc: changed windows backend to deal with simultaneous breakpoints * bugfix: forgot to add windowsPrologue3 to the prologues list in e4c7df1 * Tolerate errors returned by Stacktrace in TestStacktraceGoroutine. * bugfix: proc: propagate debug events we don't cause back to the target process Fixes: go-delve#594 * proc: fixed TestStepConcurrentPtr Implementation of nextInProgress was wrong.
This solves most of the test flakiness that the windows back end has on multi core.
Before this patch when multiple breakpoints were hit simultaneously only one was reported, the other one was processed by (_Thread).singleStep while stepping over the breakpoint which caused the second breakpoint to be missed and the first breakpoint to be processed twice, because (_Thread).singleStep was unable to complete.
This change is