runtime: can not read stacktrace from a core file #25218
Comments
I assume this is all with GOTRACEBACK=crash. GDB is breaking at the first SIGABRT, which isn't what actually kills the process:
So that much makes sense. Is the real complaint here that the application's stack trace doesn't appear in the core dump?
Yes. I would expect some goroutine's stack to have a frame for main.main and main.crash in it, at least.
Well, it would still be on the goroutine's stack, but the thread is on the signal handler stack when it dies, not the goroutine stack. And IIRC Austin said there's no way to convince GDB to show you a different stack for a thread while handling a core file without writing a custom backtracer, and even then it was very hard or didn't work or something. My inclination would be to deregister the signal handler for sig in dieFromSignal. If the first signal kills the process then you'd get the first stack in the core file, which is what we want. But all this stuff is incredibly subtle, so there's probably a reason that wouldn't work, maybe some cgo interop thing. Does Delve do any better than GDB here? In principle it could show the goroutine stack along with gsignal.
It used to do better, but it doesn't anymore after that commit. In particular:
doesn't look true to me. What it looks like to me is that the thread is running on the normal stack of goroutine 1; if it was a signal handling stack it would have goid == 0, right? Also, how does sigfwdgo call dieFromSignal?
Where are you seeing goroutine 1 that you trust? The panic backtrace shows goroutine 1, but that happens before dieFromSignal. gdb shows 1 in info goroutines, but it combines scheduler information with thread state in a way that'll just end up showing whatever the thread's up to, not the user code that was running for goroutine 1.
Line 637 in 28b40f3
or am I missing something? I'm probably out of my depth at this point. Hopefully Austin or Elias can weigh in.
I'm seeing it while debugging delve, the TLS for the first thread contains a pointer to the same g struct that's in the first entry of runtime.allgs. And it has goid == 1.
oh I didn't see that.
If you want to handle this case I think you have to. This area of the code is responsible for handling signals that might be caused by C code, so it can't blindly muck with Go stuff until it knows what's up. The setg call you hoped had run is here: Line 343 in 28b40f3
and only runs if sigfwdgo doesn't do anything. sigtrampgo checks the stack pointer to decide if gsignal or g0 is running: Lines 307 to 308 in 28b40f3
(I still kinda think clearing out the signal handler in dieFromSignal is reasonable.)
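For readers following along, the check being referenced is essentially a bounds test on the faulting thread's stack pointer. A minimal stand-alone sketch of the idea (the stackBounds type and the addresses below are invented for illustration, not the runtime's actual g.stack representation):

```go
package main

import "fmt"

// stackBounds is a stand-in for the lo/hi range the runtime records for
// each stack; the concrete values used below are made up.
type stackBounds struct{ lo, hi uintptr }

func (s stackBounds) contains(sp uintptr) bool { return s.lo < sp && sp < s.hi }

// whichStack classifies a stack pointer against the signal stack and the
// scheduler stack, the same kind of test sigtrampgo performs to decide
// whether it still needs to switch to gsignal.
func whichStack(sp uintptr, gsignal, g0 stackBounds) string {
	switch {
	case gsignal.contains(sp):
		return "gsignal"
	case g0.contains(sp):
		return "g0"
	default:
		return "goroutine stack (or foreign stack)"
	}
}

func main() {
	gsignal := stackBounds{lo: 0xc00000, hi: 0xc08000}
	g0 := stackBounds{lo: 0xd00000, hi: 0xd80000}
	fmt.Println(whichStack(0xc04ff0, gsignal, g0)) // gsignal
	fmt.Println(whichStack(0xa12340, gsignal, g0)) // goroutine stack (or foreign stack)
}
```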
Ok, the thread's sp is actually inside gsignal's stack. Do you know where the sp for the normal goroutine stack is saved in this case? g.sched.sp is zero.
That's a good point. The goroutine wasn't preempted normally, so nothing in the runtime will know what its stack pointer was. The only place you could find it is the ctx argument to the signal handler (the thing we wrap a sigctxt around), and even that will be awkward to interpret in the case of a chain of signal handlers.
I think the concern here is if the signal needs to be forwarded. E.g., if you have some SIGABRT-trapping crash handling service installed, we want the signal to get forwarded to it. Maybe dieFromSignal could check if we're forwarding it (which we never will be in pure Go programs) and, if not, go straight to its fallback path that clears the handler and raises?
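As a user-level sketch of the "clear the handler and re-raise" mechanism being proposed (this is ordinary os/signal code, not the runtime's dieFromSignal): once the handler is reset to the default disposition and the signal re-raised, the process dies from the signal itself, so the exit status — and, for SIGABRT under GOTRACEBACK=crash, the core dump — reflects the original fault rather than a later, unrelated exit path.

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"syscall"
)

func main() {
	// Catch SIGTERM so we get a chance to do some cleanup first.
	ch := make(chan os.Signal, 1)
	signal.Notify(ch, syscall.SIGTERM)

	sig := <-ch
	fmt.Println("cleaning up after", sig)

	// The analogue of the fallback path discussed above: drop our handler
	// (restoring the default disposition) and re-raise, so the process is
	// killed by the signal itself rather than by a plain exit.
	signal.Reset(sig)
	_ = syscall.Kill(syscall.Getpid(), sig.(syscall.Signal))

	// If the re-raised signal somehow doesn't terminate us, fall back.
	os.Exit(1)
}
```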
It is not so crazy to use the signal context to figure out where a signal handler was invoked. That is what gdb does. The key point is to reliably determine whether you are running in a signal handler. I think gdb does that by recognizing the signal trampoline that is on the return stack, which resumes normal execution if the signal handler returns. |
I don't think the chained signal handlers are a problem. We don't manipulate the context when calling down the chain. I was thinking we might be able to stash this away in |
I don't think we need to do anything in
If we don't forward the signal, we potentially bypass crash reporters (particularly on iOS and Android) and skew people's crash metrics. But I think I'm missing something here:
The saved base pointer in the root frame of the signal stack should point back to the goroutine stack, right? (Do we know why the debuggers aren't following that base pointer? Am I just confused?)
Since that was sort of addressed to me, I'll say that looking with GDB I think you're right that following the base pointer chain gets you to the right stack, and that seems better than using the signal handler context. Based on Stack Overflow I guess GDB only follows the base pointers when it's out of options.
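To illustrate what "following the base pointer" means for a debugger reading a core file, here is a rough amd64 sketch; readWord is a hypothetical memory accessor (not a real gdb or Delve API), and the assumed layout — caller's RBP at [rbp], return address at [rbp+8] — only holds for code compiled with frame pointers enabled:

```go
package main

import "fmt"

// readWord stands in for however the debugger reads the inferior's memory
// out of the core file; it is not a real debugger API.
type readWord func(addr uint64) (uint64, error)

type frame struct{ pc, bp uint64 }

// walkFramePointers follows the saved-RBP chain starting at bp. On amd64
// with frame pointers enabled, [bp] holds the caller's RBP and [bp+8] the
// return address, so the walk can cross from the signal stack back onto
// the goroutine stack. It refuses to walk toward smaller addresses, the
// same sanity check mentioned below for corrupted stacks and cycles.
func walkFramePointers(bp uint64, read readWord, max int) []frame {
	var frames []frame
	for i := 0; i < max && bp != 0; i++ {
		savedBP, err1 := read(bp)
		retPC, err2 := read(bp + 8)
		if err1 != nil || err2 != nil {
			break
		}
		frames = append(frames, frame{pc: retPC, bp: bp})
		if savedBP <= bp { // downward or cyclic chain: stop
			break
		}
		bp = savedBP
	}
	return frames
}

func main() {
	// A tiny fake memory image: a two-frame chain, entirely made up.
	mem := map[uint64]uint64{
		0x7000: 0x8000, 0x7008: 0x401234, // frame on the "signal stack"
		0x8000: 0x0000, 0x8008: 0x40abcd, // frame on the "goroutine stack"
	}
	read := func(addr uint64) (uint64, error) { return mem[addr], nil }
	for _, f := range walkFramePointers(0x7000, read, 16) {
		fmt.Printf("pc=%#x bp=%#x\n", f.pc, f.bp)
	}
}
```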
Following the base pointer will presumably only work when the base pointer is used by the code being interrupted, so it will be unreliable if the signal fires while executing C code compiled with
I think gdb also won't follow a frame pointer if it points to a smaller address, to avoid walking corrupted stacks or going in cycles. But why doesn't it unwind through the signal context? That I would expect to work here.
Maybe it's because you aren't telling it where you saved rdx on the stack?
Never mind, I see now that x86_64_fallback_frame_state always knows how to get the context. My next guess is that, if gdb's documentation is correct, x86_64_fallback_frame_state is only used when no FDE is found, which is not the case for runtime.sigreturn.
@aarzilli, did you want anything else here?
I'm satisfied by blacklisting runtime.sigreturn's FDE on delve's side; I've left this issue open because I suspect it would be useful to other debuggers to remove it from the executable entirely.
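For other debugger authors, the workaround amounts to refusing to use the FDE that covers runtime.sigreturn, so the unwinder falls back to its signal-frame logic instead. A very rough sketch of that filtering step, with invented types (this is not Delve's actual code; the fde type and funcNameAt helper are hypothetical):

```go
// Package unwindsketch illustrates skipping one function's FDE during
// unwinding; it is an illustration, not a real debugger API.
package unwindsketch

// fde models a DWARF Frame Description Entry as a debugger might see it.
type fde struct {
	begin, end uint64 // PC range the entry covers
}

// funcNameAt maps a PC to the containing function's name via the binary's
// symbol table; its implementation is omitted here.
var funcNameAt func(pc uint64) string

// usableFDE reports whether an FDE should drive unwinding. Rejecting the
// one covering runtime.sigreturn forces the fallback path that knows how
// to recover the interrupted registers from the signal context instead.
func usableFDE(f fde) bool {
	return funcNameAt(f.begin) != "runtime.sigreturn"
}
```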
Please answer these questions before submitting your issue. Thanks!
What version of Go are you using (`go version`)?

This problem was introduced by b1d1ec9 (CL 110065) and is still present at tip.
Does this issue reproduce with the latest release?
No.
What operating system and processor architecture are you using (`go env`)?

What did you do?
Given:
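The program itself isn't reproduced here; a minimal crasher along these lines (hypothetical, not necessarily the reporter's exact code) exercises the same path when run with GOTRACEBACK=crash and core dumps enabled:

```go
package main

// Hypothetical stand-in for the missing reproducer: a main.crash frame is
// what we would expect to find in the core file. Run with GOTRACEBACK=crash
// (and core dumps enabled via ulimit) to get a core dump on the crash.
func crash() {
	var p *int
	*p = 1 // nil dereference: SIGSEGV, then the runtime aborts the process
}

func main() {
	crash()
}
```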
Running it under gdb will produce this stacktrace:
However, letting it produce a core file and then reading that core file with gdb produces this:
I'm not sure what's happening here. Is the signal handler running and overwriting part of the stack?