Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: panicmem should expose the address it's panicking about. #37023

Open
seebs opened this issue Feb 4, 2020 · 8 comments
Open

runtime: panicmem should expose the address it's panicking about. #37023

seebs opened this issue Feb 4, 2020 · 8 comments

Comments

@seebs
Copy link
Contributor

@seebs seebs commented Feb 4, 2020

What version of Go are you using (go version)?

1.13

Does this issue reproduce with the latest release?

probably

What operating system and processor architecture are you using (go env)?

AMD64

What did you do?

I don't know, but it resulted in a segfault. :) But then I started experimenting with panic and recover, and discovered (somewhat to my surprise) that I can catch a SEGV's panic.

What did you expect to see?

A runtime error/exception/something that has comparable information to what I'd get if I didn't recover().

What did you see instead?

The string "invalid memory address or nil pointer dereference"

So, contrast:

unexpected fault address 0x7f84e68d6112
fatal error: fault
[signal SIGSEGV: segmentation violation code=0x1 addr=0x7f84e68d6112 pc=0x5f17cb]

goroutine 992352 [running]:
runtime.throw(0xfab924, 0x5)
        /usr/local/Cellar/go/1.13/libexec/src/runtime/panic.go:774 +0x72 fp=0xd5b852e7f8 sp=0xd5b852e7c8 pc=0x42f3b2
runtime.sigpanic()
        /usr/local/Cellar/go/1.13/libexec/src/runtime/signal_unix.go:401 +0x3de fp=0xd5b852e828 sp=0xd5b852e7f8 pc=0x444a3e
[...]

If I look at [runtime/]debug.StackTrace(), I can get some of this information, but I can't necessarily get the actual address of the unexpected fault. It looks to me like runtime has this information at the time when the panic is generated (or would be, if I turned on SetPanicOnFault):

        case _SIGSEGV:
                if (g.sigcode0 == 0 || g.sigcode0 == _SEGV_MAPERR || g.sigcode0 == _SEGV_ACCERR) && g.sigcode1 < 0x1000 {
                        panicmem()
                }
                // Support runtime/debug.SetPanicOnFault.
                if g.paniconfault {
                        panicmem()
                }
                print("unexpected fault address ", hex(g.sigcode1), "\n")
                throw("fault")

But it looks like panicmem doesn't expose these values. Possibly it should. I can't think of a reasonable approach immediately -- obviously I'd just like the siginfo_t exposed, except that this is unportable even by memory fault handling standards and won't be compatible between implementations. But it seems like exposing the affected address (g.sigcode1) and possibly the kind-of-error (SEGV_MAPERR or SEGV_ACCERR?)

But as a simple baseline that I think is probably supportable on current systems, if panicmem took a uintptr, and the resulting panic were an error type that could be queried for the uintptr, or displayed it by default, that might make life easier.

Because on the one hand, I want to catch the panics and prevent them from breaking things, but on the other hand, diagnosing them sort of benefits from having the address be available.

(On the third hand, it might be that changing the exact panic result in these cases would break the Compatibility Promise, without some hypothetical new debug.SetFancyPanicOnFault() that would not have been used in the existing code.)

@bcmills bcmills changed the title propsoal: panicmem should expose the address it's panicking about. proposal: panicmem should expose the address it's panicking about. Feb 4, 2020
@gopherbot gopherbot added this to the Proposal milestone Feb 4, 2020
@gopherbot gopherbot added the Proposal label Feb 4, 2020
@bcmills bcmills changed the title proposal: panicmem should expose the address it's panicking about. proposal: runtime: panicmem should expose the address it's panicking about. Feb 4, 2020
@randall77

This comment has been minimized.

Copy link
Contributor

@randall77 randall77 commented Feb 4, 2020

We could do something about this. The types of runtime panics are explicitly unspecified: https://golang.org/ref/spec#Run_time_panics
We can add a Addr() method to the value we panic with, so you can do

type addrable interface {
    Addr() uintptr
}
e := recover()
if a, ok := e.(addrable), ok {
   _ = a.Addr()
}

Should we include kind-of also? Not sure. We can rapidly reach implementation-specific areas, which I don't think we want to do.

@rsc

This comment has been minimized.

Copy link
Contributor

@rsc rsc commented Feb 5, 2020

You can't always get the address either. For example on amd64 if you fault because the address is malformed (top 17 bits not all 0s or all 1s) then you don't get the address in the fault at all. In that case it definitely can't be passed anywhere else. I am not convinced we should do this.

Why do you want it? There's not usually much use to it.

@seebs

This comment has been minimized.

Copy link
Contributor Author

@seebs seebs commented Feb 5, 2020

Yeah, I noticed the lack of address with high bits set recently. I remember all the mc68k programs that said "oh hey the address bus is only 24 bits, we can use that top word and just mask it out", and I am so looking forward to having that again in amd64 in a little bit here.

The reason I want it is that it turns out to let me distinguish between "i have no idea what garbage was here" and "wow what an amazing coincidence that address is in a space that got munmapped recently".

So if I get a throw(), I can only dump every mmap operation as it happens just in case there's a crash later, and then after a week of logging, I get a crash in which the relevant thing is in the last five lines of the logs. If I use PanicOnFault, I can't tell which of the things are relevant, but I can look at recent activity. If I used PanicOnFault, and got the address (when possible/applicable), I could look at the recent activity and report exactly what happened, and life would be a lot simpler. (And it really does seem to be days-to-weeks between events.)

I agree that there's not usually much use to it, but in the specific case where mmap is involved (which is pretty much the only time I'd be likely to want PanicOnFault anyway), the distinction between "an invalid address" and "an invalid address which I can trace to a specific memory-mapped region" is very significant.

@rsc

This comment has been minimized.

Copy link
Contributor

@rsc rsc commented Feb 12, 2020

Adding just the address does help with PanicOnFault, so I could see maybe doing this.
Thoughts, @aclements, @ianlancetaylor, @randall77?

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Feb 12, 2020

I don't see a problem as long as we document that the information is system-dependent, and is not always available or reliable.

Would this be a language spec change or just a runtime package change? The spec says that runtime errors have "perhaps other methods" so I guess we can just do it in the runtime.

There are three signals that produce a runtime panic: SIGBUS, SIGSEGV, SIGFPE. They all set the si_addr field. SIGBUS and SIGSEGV set it to the memory address that caused the fault. SIGFPE sets it to the memory address of the instruction that faulted. Should we provide an address in all cases? I think not, as on a soft floating point implementation we will get a floating point exception without a signal and without a specific associated instruction.

@seebs

This comment has been minimized.

Copy link
Contributor Author

@seebs seebs commented Feb 12, 2020

Intuitively, I would be really confused if a runtime panic from SIGFPE gave me a different kind of address than the other two do, and it's redundant with the already-available stack backtrace. So I think it'd probably be more confusing than not to provide that one. (If SIGFPE gave the address of the floating point object, that might almost be useful, but not very because if you know what code is executing, you're already able to narrow it down pretty well.)

I don't think it makes sense to add a spec change, because this is so specific to a particular runtime implementation; if you don't have runtime.PanicOnFault, it'll never come up.

@rsc

This comment has been minimized.

Copy link
Contributor

@rsc rsc commented Feb 26, 2020

SIGFPE is already turning into a different panic.
If the target here is PanicOnFault then that would only apply to the panic generated in that case, which is SIGBUS, SIGSEGV.
This need not be a language spec change (it doesn't mention PanicOnFault either), and we can be suitably non-committal about how good the address is.

Based on the discussion, it seems like this is a likely accept.

@rsc rsc moved this from Active to Likely Accept in Proposals Feb 26, 2020
@rsc

This comment has been minimized.

Copy link
Contributor

@rsc rsc commented Mar 4, 2020

No change in consensus, so accepting.

@rsc rsc moved this from Likely Accept to Accepted in Proposals Mar 4, 2020
@rsc rsc modified the milestones: Proposal, Backlog Mar 4, 2020
@rsc rsc changed the title proposal: runtime: panicmem should expose the address it's panicking about. runtime: panicmem should expose the address it's panicking about. Mar 4, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Proposals
Accepted
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
5 participants
You can’t perform that action at this time.