New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Go can't be recorded #1861
Comments
WFM with Go 1.6 in Fedora 24 and Ubuntu 16.06. Guess I'll have to try it in Ubuntu 14.04. |
This is a stack overflow calling into the syscallbuf. Presumably this is Go's split-stack implementation allocating a small initial stack and allocating new stack pages based on assumptions about stack usage which are broken by syscallbuf. Need to think about what to do here. |
I think we need to have the syscallbuf code switch to an alternate stack. The question is where to put those stacks. Perhaps the simplest thing to do would be to use the thread_locals page. (Currently we assume syscallbuf only needs 2K of stack but that's conservative and could be reduced further.) The problem is that all those thread_locals pages are at the same address so this will muck up watchpoints and other debugging of anything on the syscallbuf stack. |
Hmm in fact if we used the thread_locals page then whenever you're stopped in gdb and you try to get a backtrace of another thread that's blocked in syscallbuf, you'll get the wrong answer. That would be really bad. |
BTW the reason this doesn't show up as easily in Go 1.6 is that in Go 1.3 they switched to contiguous stacks: when they run out of stack space, a new stack is created by copying the old stack into a larger area. Since they're still relying on manual stack-size checking and not kernel stack extension, this is probably still an rr issue, just harder to trigger. |
Another possible approach is to detect SIGSEGVs triggered by the instruction that extends the syscallbuf stack, and in that case redirect program state to a stub that just triggers a non-buffered syscall and returns. Unfortunately these stacks might not have a guard page (they don't seem to in Go 1.2) so we can't rely on the SIGSEGV being triggered --- we might just start overwriting some other memory. So let's not do this. |
Hmm, apart from the stack used by the syscallbuf hook code itself, there's also the 256-byte area we skip in X64SyscallStubExtendedJump to avoid corrupting the redzone. |
I think we need to allocate a per-thread page for a syscallbuf alternate stack. We can store its address in the preload_thread_locals page. X64SyscallStubExtendedJump should switch stacks to this page to avoid the redzone issue. The tricky thing is, how do we allocate and deallocate these pages? |
Ah, I think we can just use part of the per-thread scratch buffer! Extend it by a page and use the last page. |
Getting the CFI right so stack unwinding works will be a bit tricky but we should be able to manage it. |
A potential problem with this is that a signal handler that runs during a blocked syscallbuf syscall runs on this alternate stack :-(. |
Can we just use sigaltstack for that? |
No because these are application signal handlers. If they expect to see the stack pointer pointing to an application stack, they'll get confused no matter which alt-stack we use. Another issue which I've just uncovered is the signal handler calling a hooked system call. It tries to reuse the alt-stack, corrupting the stack frames already using it... |
I can avoid the latter issue with some extra code to detect nested usage of the alt-stack, and for nested calls, suppress the stack switch. The question is whether signal handlers will get confused in the presence of our stack switch, e.g. if they try to just scan from current $rsp to some top-of-stack saved in TLS. |
Go, of course, uses SA_STACK for its signal handlers so they all use their own alt-stacks. So we have the situation that for Go we need to use our own alt-stack, but possibly other applications could be harmed by that. |
I think that it might be possible, if necessary, to have rr switch the stack pointer back to the original stack pointer before delivering the signal, and restore it if/when the handler does a sigreturn. So I'm going to go with this approach. |
Handling |
This is wrong. |
|
This change was quite complicated and some additional scrutiny would be welcome :-). |
Setup: rr revision f024ebc, Ubuntu 14.04 on amd64, golang package version 2:1.2.1-2ubuntu1
STR:
rr record go help
produces this:However,
rr record --no-syscall-buffer
seems to work, slowly but usably, both ongo help
and the less-trivial original test case (the BoringSSL test suite).The text was updated successfully, but these errors were encountered: