Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: Go 1.14/Windows asynchronous preemption mechanism likely incompatible with debugging #36494

Open
aarzilli opened this issue Jan 10, 2020 · 12 comments

Comments

@aarzilli
Copy link
Contributor

@aarzilli aarzilli commented Jan 10, 2020

If I understand correctly asynchronous preemption is implemented in Windows by:

  1. Calling SuspendThread on the target thread
  2. Waiting for SuspendThread to take effect by calling GetThreadContext
  3. Injecting a function call on the target thread by manipulating the context and calling SetThreadContext
  4. Calling ResumeThread on the target thread

This procedure being implemented by preemptM in src/runtime/os_windows.go.

However, from what I can observe, it seems that GetThreadContext will also return when the target thread get suspended after hitting a software breakpoint (placed by a debugger). This has two effects: first the breakpoint will be missed (because preemptM will manipulate the thread context to insert a function call, masking the breakpoint), secondly the return address for the injected call will be in the middle of an instruction (because the software breakpoint was partially overwriting an instruction).

A debugger that knows about this can work around it by checking if a thread is stopped on the entry point of asyncPreempt, and fix its return address. For this to work however preemptM needs to fully finish its context manipulation, if it ends up being stopped between the point where GetThreadContext returns and SetThreadContext is called the same problem will happen.

I'm not sure that there is a way to use software breakpoints with Go 1.14 with async preemption, as it currently is. Did I misunderstand anything?

@ALTree
Copy link
Member

@ALTree ALTree commented Jan 10, 2020

aarzilli added a commit to aarzilli/delve that referenced this issue Jan 10, 2020
See golang/go#36494 for a description of why
full support for 1.14 under windows is problematic.
aarzilli added a commit to aarzilli/delve that referenced this issue Jan 10, 2020
See golang/go#36494 for a description of why
full support for 1.14 under windows is problematic.
@aclements
Copy link
Member

@aclements aclements commented Jan 10, 2020

Can you explain a bit more about how software breakpoints work under Windows (or point me to the relevant APIs)? I thought this would all be fine because SuspendThread acts like a semaphore, but I keep finding more and more little surprises with SuspendThread. :P

One possible, slightly awful work around may be for the debugging to poke a 1 into runtime.debug.asyncpreemptoff when it attaches. (Or if it's starting the process itself, add asyncpreemptoff=1 to GODEBUG.) Obviously not ideal because it may change program behavior.

@aarzilli
Copy link
Contributor Author

@aarzilli aarzilli commented Jan 10, 2020

Debuggers call WaitForDebugEvent, when a breakpoint happens (of any kind) the thread that executed the breakpoint is stopped and WaitForDebugEvent returns. Apparently this is also enough for any GetThreadContext call to return. A software breakpoint is just an INT 3 instruction overwriting the first byte of some other instruction, since executing it also updates the value of RIP the context returned by GetThreadContext will have RIP one byte inside whatever instruction was overwritten with the breakpoint.

One only-works-with-delve solution would be to add this between the SuspendThread call and the GetThreadContext call in preemptM:

if debuggerAttached {
    INT 3
}

with debuggerAttached being some global variable that the debugger is supposed to set after attaching. This should prevent (if I'm not wrong) preemptM from seeing the target thread's context before the debugger has a chance to and prevent preemptM from seeing the thread in a weird state.

The only other change needed in the debugger would be to ignore this breakpoint inside preemptM.

Of course this won't solve the problem for any debugger other than delve (also: I haven't tested this, so it's possible that it doesn't work at all).

aarzilli added a commit to aarzilli/delve that referenced this issue Jan 21, 2020
See golang/go#36494 for a description of why
full support for 1.14 under windows is problematic.
aarzilli added a commit to aarzilli/delve that referenced this issue Jan 21, 2020
@bcmills bcmills added the OS-Windows label Jan 22, 2020
@bcmills bcmills added this to the Go1.14 milestone Jan 22, 2020
aarzilli added a commit to aarzilli/delve that referenced this issue Jan 24, 2020
See golang/go#36494 for a description of why
full support for 1.14 under windows is problematic.
aarzilli added a commit to aarzilli/delve that referenced this issue Jan 24, 2020
aarzilli added a commit to aarzilli/delve that referenced this issue Jan 24, 2020
See golang/go#36494 for a description of why
full support for 1.14 under windows is problematic.
aarzilli added a commit to aarzilli/delve that referenced this issue Jan 24, 2020
aarzilli added a commit to aarzilli/delve that referenced this issue Jan 24, 2020
See golang/go#36494 for a description of why
full support for 1.14 under windows is problematic.
aarzilli added a commit to aarzilli/delve that referenced this issue Jan 24, 2020
aarzilli added a commit to aarzilli/delve that referenced this issue Jan 24, 2020
See golang/go#36494 for a description of why
full support for 1.14 under windows is problematic.
aarzilli added a commit to aarzilli/delve that referenced this issue Jan 24, 2020
aarzilli added a commit to aarzilli/delve that referenced this issue Feb 9, 2020
See golang/go#36494 for a description of why
full support for 1.14 under windows is problematic.
aarzilli added a commit to aarzilli/delve that referenced this issue Feb 9, 2020
derekparker pushed a commit to go-delve/delve that referenced this issue Feb 11, 2020
* tests: misc test fixes for go1.14

- math.go is now ambiguous due to changes to the go runtime so specify
  that we mean our own math.go in _fixtures
- go list -m requires vendor-mode to be disabled so pass '-mod=' to it
  in case user has GOFLAGS=-mod=vendor
- update version of go/packages, required to work with go 1.14 (and
  executed go mod vendor)
- Increased goroutine migration in one development version of Go 1.14
  revealed a problem with TestCheckpoints in command_test.go and
  rr_test.go. The tests were always wrong because Restart(checkpoint)
  doesn't change the current thread but we can't assume that when the
  checkpoint was taken the current goroutine was running on the same
  thread.

* goversion: update maximum supported version

* Makefile: disable testing lldb-server backend on linux with Go 1.14

There seems to be some incompatibility with lldb-server version 6.0.0
on linux and Go 1.14.

* proc/gdbserial: better handling of signals

- if multiple signals are received simultaneously propagate all of them to the
  target threads instead of only one.
- debugserver will drop an interrupt request if a target thread simultaneously
  receives a signal, handle this situation.

* dwarf/line: normalize backslashes for windows executables

Starting with Go 1.14 the compiler sometimes emits backslashes as well
as forward slashes in debug_line, normalize everything to / for
conformity with the behavior of previous versions.

* proc/native: partial support for Windows async preempt mechanism

See golang/go#36494 for a description of why
full support for 1.14 under windows is problematic.

* proc/native: disable Go 1.14 async preemption on Windows

See golang/go#36494
@toothrot toothrot modified the milestones: Go1.14, Go1.15 Feb 25, 2020
@aclements
Copy link
Member

@aclements aclements commented Mar 2, 2020

Just checking my understanding: is the problem ultimately a race between two threads trying to change a third thread's context? Say, the debugger injects a breakpoint into the process. Thread D in the debugger blocks on WaitForDebugEvent. Thread A in the application hits the breakpoint, which suspends thread A (and all other threads) and unblocks thread D. Simultaneously, thread P in the runtime attempts to preempt thread A by also suspending it. But both thread D and P need to manipulate thread A's context: thread D to back up RIP by 1, and thread P to inject an asyncPreempt call.

It's tricky to come up with a specific sequence of steps that leads to this problem because WaitForDebugEvent suspends the whole process when any thread hits a debug event. The only way I can think of requires whole process suspend to be non-atomic:

  1. Thread P calls SuspendThread to queue a suspend for thread A and GetThreadContext, which blocks.
  2. Thread A hits a breakpoint, which enters the kernel. This suspends thread A and runs APCs for A, which wakes thread P's GetThreadContext. Suppose other threads haven't been suspended yet.
  3. Thread P's GetThreadContext returns.
  4. The kernel suspends all the other threads (including thread P).
  5. Thread D wakes up from WaitForDebugEvent, uses GetThreadContext and SetThreadContext to back up RIP by 1, and resumes the process. Thread A still has a non-zero suspend count, so it remains suspended.
  6. Thread P calls SetThreadContext to inject the asyncPreempt call, which that smashes thread D's context update, and resumes thread A.
  7. Eventually asyncPreempt returns to the RIP after the INT3, which isn't a valid instruction (or is the wrong instruction). Badness.

This has a pretty narrow race window (I think), but could still happen, which is bad. Are there also "easier" ways to get into a bad state?

@aarzilli
Copy link
Contributor Author

@aarzilli aarzilli commented Mar 2, 2020

I think what happens is:

  1. Thread A hits a breakpoint and enters the kernel
  2. while thread A is in the kernel, thread P calls SuspendThread (which just increments a counter) and GetThreadContext which returns immediately because thread A is already suspended
  3. after a while thread A successfully suspends the rest of the process
  4. thread D wakes up from WaitForDebugEvent

the race window for this would be the scheduling quantum which is plenty big enough for this to happen regularly in a stress test. The way to reproduce this is to revert the changes I made to Delve to disable async preemption and then run TestBreakpointCounts (or better enable TestBreakpointCountsWithDetection and run that one) inside pkg/proc/proc_test.go.

I don't have a minimal reproducer at the moment but the code of Delve that actually runs with that test isn't a lot.

@aarzilli
Copy link
Contributor Author

@aarzilli aarzilli commented Mar 2, 2020

PS. I don't know for sure that that's what happens, I don't have access to the source code of windows, and it would imply an unfortunate implementation of GetThreadContext but I don't think that's unlikely since this isn't the "intended" use of SuspendThread/GetThreadContext.

@artli
Copy link

@artli artli commented Mar 20, 2020

I seem to be having a related problem: when debugging in GoLand, the process of stepping over the source code lines often gets interrupted and instead of the next line the execution continues at runtime/preempt_amd64.s for no apparent reason.

I am not familiar with the details of the debugging process nor asynchronous preemption, but assuming my problem is connected to what is discussed in this thread, do you possibly know any kind of a temporary fix that can prevent this from happening and let me debug without interruption?

@networkimprov
Copy link

@networkimprov networkimprov commented Mar 20, 2020

@artli did you try running with environment variable GODEBUG=asyncpreemptoff=1?

@artli
Copy link

@artli artli commented Mar 20, 2020

@networkimprov, no, I haven't. Next time I have this issue I'try this method, thanks for the suggestion!

@aarzilli
Copy link
Contributor Author

@aarzilli aarzilli commented Mar 22, 2020

@artli the latest verison of Delve (1.4.0) should disable asynchronous preemption automatically on windows. And the symptoms are not what I would expect. Have you reported this to GoLand?

@artli
Copy link

@artli artli commented Mar 31, 2020

@aarzilli, oh, turns out my installation of GoLand is way out of date and has Delve 1.1.0 vendored with it (for reference, the version can be checked by running <GoLand installation path>\plugins\go\lib\dlv\dlv.exe version). I'll update it and hope the problem goes away. Should have done that first, I guess :)

alexsaezm pushed a commit to alexsaezm/delve that referenced this issue Apr 24, 2020
* tests: misc test fixes for go1.14

- math.go is now ambiguous due to changes to the go runtime so specify
  that we mean our own math.go in _fixtures
- go list -m requires vendor-mode to be disabled so pass '-mod=' to it
  in case user has GOFLAGS=-mod=vendor
- update version of go/packages, required to work with go 1.14 (and
  executed go mod vendor)
- Increased goroutine migration in one development version of Go 1.14
  revealed a problem with TestCheckpoints in command_test.go and
  rr_test.go. The tests were always wrong because Restart(checkpoint)
  doesn't change the current thread but we can't assume that when the
  checkpoint was taken the current goroutine was running on the same
  thread.

* goversion: update maximum supported version

* Makefile: disable testing lldb-server backend on linux with Go 1.14

There seems to be some incompatibility with lldb-server version 6.0.0
on linux and Go 1.14.

* proc/gdbserial: better handling of signals

- if multiple signals are received simultaneously propagate all of them to the
  target threads instead of only one.
- debugserver will drop an interrupt request if a target thread simultaneously
  receives a signal, handle this situation.

* dwarf/line: normalize backslashes for windows executables

Starting with Go 1.14 the compiler sometimes emits backslashes as well
as forward slashes in debug_line, normalize everything to / for
conformity with the behavior of previous versions.

* proc/native: partial support for Windows async preempt mechanism

See golang/go#36494 for a description of why
full support for 1.14 under windows is problematic.

* proc/native: disable Go 1.14 async preemption on Windows

See golang/go#36494
@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jun 16, 2020

Rolling forward to 1.16.

@ianlancetaylor ianlancetaylor modified the milestones: Go1.15, Go1.16 Jun 16, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
8 participants
You can’t perform that action at this time.