Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime/debug: add SetCrashOutput(file *os.File) #42888

Open
egonelbre opened this issue Nov 30, 2020 · 117 comments
Open

runtime/debug: add SetCrashOutput(file *os.File) #42888

egonelbre opened this issue Nov 30, 2020 · 117 comments

Comments

@egonelbre
Copy link
Contributor

@egonelbre egonelbre commented Nov 30, 2020

Currently there are difficulties with nicely capturing crash output from Go programs. Crashes print to stderr, however, usually both stdout and stderr are used by programs for other purposes. While it's not wrong to output it to stderr, it mixes two outputs together making later separation more difficult.

Capturing crashes is useful for post-mortem debugging and sending reports. Of course, due to unknown crashed program state using a custom callback is understandably dangerous.

A simpler alternative would be to allow changing file descriptor where to print crashes. In the simplest form it could look like:

package main
import "runtime/debug"

func init(){
	debug.SetCrashOutputFD(3)
}

This would allow passing in a fd for separate file, pipe to another "crash monitor" process or connect to an external server. Of course there's a slight possibility of write taking a long time, when passing in a pipe/connection slowing down the crashing of the program.

With regards to naming it could also be SetCrashFD, SetTracebackFD or SetTracebackOutputFD.

@gopherbot gopherbot added this to the Proposal milestone Nov 30, 2020
@ianlancetaylor ianlancetaylor changed the title proposal: allow setting fd for crash output proposal: runtime/debug: allow setting fd for crash output Nov 30, 2020
@ianlancetaylor ianlancetaylor added this to Incoming in Proposals Nov 30, 2020
@prattmic
Copy link
Member

@prattmic prattmic commented Nov 30, 2020

As a data point, the gVisor project wants to redirect panic/throw output. It works around this problem by dup'ing over stderr with the FD of the desired crash output location (https://cs.opensource.google/gvisor/gvisor/+/master:runsc/cli/main.go;l=194-198?q=dup3&ss=gvisor%2Fgvisor). stderr is otherwise unused by the program, so this works OK. This is a workable solution, but it would certainly be nicer to be able to change the destination directly.

@mvdan
Copy link
Member

@mvdan mvdan commented Nov 30, 2020

Thinking outloud, the runtime prints other output besides just crashes, such as what one can enable via GODEBUG. Should that be included in this option?

@prattmic
Copy link
Member

@prattmic prattmic commented Nov 30, 2020

I'd think (initially, at least), yes. I'd probably call this something like SetRuntimeFD for any runtime-generated output.

On the other hand, there are nice opportunities if a separate "fatal" FD could be provided that only includes fatal panic/throw output, plus any related prints immediately before a throw. That would allow a nice simplification of "any bytes received on this FD indicates a crash". But it means extra complexity for a rather edge case [1].

[1] I can't think of any non-fatal runtime output that isn't behind a GODEBUG option.

@egonelbre
Copy link
Contributor Author

@egonelbre egonelbre commented Nov 30, 2020

I would say, yes, to including other runtime output as well. If there's a need for additional separation it can be always introduced later. But, I wouldn't be also opposed to introducing it immediately either.

I think the fatal output can be detected when the fd write closes due to the program stopping.

@alexbrainman
Copy link
Member

@alexbrainman alexbrainman commented Dec 1, 2020

@egonelbre how do you propose to use your API on Windows?

I agree. I wanted this feature to capture crash dumps of a Windows service.

/cc @zx2c4

Alex

@zx2c4
Copy link
Contributor

@zx2c4 zx2c4 commented Dec 1, 2020

I've handled this until now with defer/recover in each go routine, which is really very subpar and doesn't handle all crashes.

Rather than setting an output fd, what about setting an output callback? There'd be certain restrictions on what it could do, obviously, but it could still probably accomplish most things that people want. In my case, it'd just copy the bytes to some mmap'd ring buffer log file.

@egonelbre
Copy link
Contributor Author

@egonelbre egonelbre commented Dec 1, 2020

@alexbrainman it would behave in a similar manner that you can provide a fd that outputs to a file or a pipe. Flushing it to Event Log would require some external "crash reporting" service that reads that file or pipe. This isn't ideal, but would allow better handling than what can be currently done. Ideal scenario would be to write directly to event-log, however that would probably require specialized machinery.

@zx2c4 I believe the callback has been proposed a few times, however I cannot find the exact proposals. I think the fundamental issue is that you don't know really much about the crashed system - what's broken and what's not. e.g. maybe you got a panic during sigprof where some code holds a runtime lock "x", which your code needs and you'll deadlock when calling file.Write. This is all hypothetical of course.

@zx2c4
Copy link
Contributor

@zx2c4 zx2c4 commented Dec 1, 2020

what's not. e.g. maybe you got a panic during sigprof where some code holds a runtime lock "x", which your code needs and you'll deadlock when calling file.Write. This is all hypothetical of course.

Right. If you're using debug.SetCrashCallback, then you are signing up for that kind of fun, and it's up to to write Go code that calls the minimal set of things needed. If the runtime can already do that for spitting to a fd, so can a custom handler function.

Ideal scenario would be to write directly to event-log, however that would probably require specialized machinery.

Writing to event log isn't too bad. It'd support the callback-based approach I mentioned.
(In my case, log writes look like this: https://git.zx2c4.com/wireguard-windows/tree/ringlogger/ringlogger.go#n105 )

@prattmic
Copy link
Member

@prattmic prattmic commented Dec 1, 2020

Right. If you're using debug.SetCrashCallback, then you are signing up for that kind of fun, and it's up to to write Go code that calls the minimal set of things needed. If the runtime can already do that for spitting to a fd, so can a custom handler function.

This means no heap allocation, no stack splits, no map access (IIRC), particularly if you want this to cover runtime throws. Not impossible, but a pretty high bar to set for an API.

@zx2c4
Copy link
Contributor

@zx2c4 zx2c4 commented Dec 1, 2020

This means no heap allocation, no stack splits, no map access (IIRC), particularly if you want this to cover runtime throws. Not impossible, but a pretty high bar to set for an API.

But also not so out of place for something in debug, right? We're offering some way to hook into runtime internals, and with that comes the responsibilities of being in the runtime. But maybe there's a better idea here:

The other way of doing this might be to have an unstable and unexported function runtime.setCrashHook, which can then be used by wrapper packages like:

//go:linkname setCrashHook runtime.setCrashHook

That then represents the lowest level. At a higher level, then the debug package can use that to implement things like debug.SetCrashFD and the Windows service package can implement things like svc.SetCrashEventlog. And then most users use these normal functions that our libraries can provide.

Meanwhile, insane users like me can dip down into runtime.setCrashHook, along with all of its dangers, for hooking it up my custom ringbuffer mmap'd logger thing, knowing full well that if anything goes wrong, this is unsupported and my fault, etc.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Dec 2, 2020

The restrictions on an output callback would be severe, as @prattmic says, and failure modes would be unpredictable. In Go we try to avoid that kind of deeply unsafe operation.

I think it would be better to keep this issue focused on the much safer operation of setting a file descriptor.

That said, I don't understand what would happen if the descriptor is a pipe that nothing is reading from. Or a file opened on a networked file system that is not responding. What should do in cases like that?

@zx2c4
Copy link
Contributor

@zx2c4 zx2c4 commented Dec 2, 2020

The restrictions on an output callback would be severe, as @prattmic says, and failure modes would be unpredictable. In Go we try to avoid that kind of deeply unsafe operation.

What do you think of my proposal above of allowing this to be a general callback via go:linkname -- i.e. just for libraries? That's a mere "implementation detail", but it'd go a long way of enabling this to be extensible for the adventurous.

@prattmic
Copy link
Member

@prattmic prattmic commented Dec 2, 2020

That said, I don't understand what would happen if the descriptor is a pipe that nothing is reading from. Or a file opened on a networked file system that is not responding. What should do in cases like that?

I'm not sure we need to worry about those (beyond documentation, perhaps). stderr could already be any of those kinds of descriptors, as set by the parent process, so the same problems would exist today.

@egonelbre
Copy link
Contributor Author

@egonelbre egonelbre commented Dec 2, 2020

That said, I don't understand what would happen if the descriptor is a pipe that nothing is reading from. Or a file opened on a networked file system that is not responding. What should do in cases like that?

I guess this is the main danger with the proposal. As @prattmic mentioned, somebody could pipe stderr to somewhere that isn't being read. I didn't test it, but I think that would block in the same way.

Using a non-blocking write that drops data when the fd is not responding would be nice, however I suspect that would be difficult to implement.

It'll definitely need examples how to write the listening side.

Other than that, I don't have any further ideas.

@egonelbre
Copy link
Contributor Author

@egonelbre egonelbre commented Dec 2, 2020

This might be still difficult due to the constraints, but if the writing is on a different thread and the previous write hasn't finished in appropriate time, the write can be dropped. The threshold could be configured via GODEBUG or similar.

@rsc
Copy link
Contributor

@rsc rsc commented Dec 2, 2020

Should we just have GODEBUG=debugfd=3?

/cc @aclements

@rsc rsc moved this from Incoming to Active in Proposals Dec 2, 2020
@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Dec 2, 2020

@zx2c4 I'm not fond of documenting a callback to use with go:linkname either. I tend to think that people who operate at that level can customize their Go distribution.

@aclements
Copy link
Member

@aclements aclements commented Dec 2, 2020

I'd be okay with GODEBUG=debugfd=3. It's not straightforward to implement, though, since we don't distinguish between runtime output and anything else printed by print(ln). It would actually be much easier to implement just for tracebacks because we could use something like the gp.writebuf redirection, just to an FD instead. But maybe that's better anyway?

I agree with @alexbrainman that I'm not sure how this would work on Windows. Internally, we treat any "fd" other than 1 or 2 as a raw Windows handle value, but I don't know if such a thing can be meaningfully passed as an environment variable. Is it possible for a parent process to create a handle and pass it to a child like this?

On the topic of a callback, in addition to the subtle and rather onerous restrictions on the callback, those restrictions can also change with new releases. We don't expose anything remotely like that right now.

@egonelbre
Copy link
Contributor Author

@egonelbre egonelbre commented Dec 2, 2020

I did think about GODEBUG, but I suspect that would be complicated to use in Windows services since you don't have a nice way to figure out the fd prior to starting it. The main process needs to respond to messages (https://github.com/golang/sys/blob/ef89a241ccb347aa16709cf015e91af5a08ddd33/windows/svc/example/service.go#L23).

debug.SetCrashFD allows the main process to start a "monitor process" and set the fd accordingly, while the main process can handle the "service" responsibilities.

@alexbrainman
Copy link
Member

@alexbrainman alexbrainman commented Dec 2, 2020

@alexbrainman it would behave in a similar manner that you can provide a fd that outputs to a file or a pipe. Flushing it to Event Log would require some external "crash reporting" service that reads that file or pipe. This isn't ideal, but would allow better handling than what can be currently done. Ideal scenario would be to write directly to event-log, however that would probably require specialized machinery.

@egonelbre I don't understand what you are suggesting.

If I have a Windows service written in Go, how can I redirect its crash dump to a file by using debug.SetCrashOutputFD(3) ? Let's say I want crash dump file to be called c:\a.txt.

Similarly, I don't see how using debug.SetCrashOutputFD(3) would help me write to Event Log.

Maybe we can use file path instead of file descriptor? Like debug.SetCrashOutput("/tmp/a.txt").

Alex

@aclements
Copy link
Member

@aclements aclements commented Dec 2, 2020

@alexbrainman , for writing to a file, I imagine you would open the file and then pass that handle to debug.SetCrashOutputFD (or whatever). I don't know how the event log works, but I imagine you'd start a second process with a pipe between them and the second process would be responsible for writing to the event log.

debug.SetCrashFD allows the main process to start a "monitor process" and set the fd accordingly, while the main process can handle the "service" responsibilities.

Is there a reason this can't be done the other way around? The first process is the monitor process and opens the FD, then starts the second process with the GODEBUG environment set. Does this not work for Windows services?

(I'm not necessarily opposed to having a runtime/debug function for this, just trying to understand the space of constraints. One nice thing about the environment variable is that it can catch very early panics, like during package init.)

@zx2c4
Copy link
Contributor

@zx2c4 zx2c4 commented Dec 2, 2020

@zx2c4 I'm not fond of documenting a callback to use with go:linkname either. I tend to think that people who operate at that level can customize their Go distribution.

Yea, documenting it is probably the wrong thing to do. But having something in there akin to nanotime would be "very nice"...and would allow me to implement something for Go's Windows service package that uses it to ship things to eventlog automatically.

Anyway, I realize pushing this too hard is an loosing battle, for good reason. But if somehow the implementation happened to be structured in a way that made it coincidentally possible to go:linkname it from elsewhere, I would be very happy.

@zx2c4
Copy link
Contributor

@zx2c4 zx2c4 commented Dec 2, 2020

Is it possible for a parent process to create a handle and pass it to a child like this?

You can make a handle inheritable, and then pass its value as an argument to the new process. The value will remain the same in the new process. Alternatively, DuplicateHandleEx allows the one process to duplicate a handle into another process.

@egonelbre
Copy link
Contributor Author

@egonelbre egonelbre commented Dec 2, 2020

@alexbrainman, so roughly what I'm thinking is the following. However, I might also forget some internal details that could make the following difficult with Windows.

The most basic thing is to send the output to a separate file:

package main

import (
	"os"
	"runtime/debug"
)

func main() {
	f, err := os.Open("service_log.txt")
	if err != nil {
		os.Exit(1)
		return
	}

	debug.SetCrashFD(f.Fd())

	// rest of the service logic
}

The next step would be to create a watchdog sub-process:

package main

import (
	"os"
	"os/exec"
	"runtime/debug"
)

func main() {
	r, w, err := os.Pipe()
	if err != nil {
		os.Exit(1)
		return
	}

	// this could also self-start, but in a different mode
	// but we'll use a separate binary for now
	cmd := exec.Command("watchdog.exe") 
	cmd.Stdin = r
	// also set other flags here to ensure that the watchdog
	// doesn't immediately close together with the parent.
	cmd.Start()

	debug.SetCrashFD(w.Fd())

	// rest of the service logic
}

You could also create governors using ExtraFiles:

package main

import (
	"os"
	"os/exec"
	"runtime/debug"
)

func main() {
	r, w, err := os.Pipe()
	if err != nil {
		os.Exit(1)
		return
	}

	cmd := exec.Command("watchdog.exe") 
	cmd.ExtraFiles = []*os.File{w}
	cmd.Start()

	// monitor `r`
}

// The other process has `debug.SetCrashFD(3)` in init,
// with some check that `3` actually exists.

For windows, I think (named) pipes could also be used:

f, err := os.Open(`\\.\pipe\randomhexvalue`)
...
debug.SetCrashFD(f.Fd())

@rsc
Copy link
Contributor

@rsc rsc commented Apr 28, 2021

On further reflection, one important benefit of *os.File would be that it would let the runtime/debug package hold on to the underlying file object and keep it from being garbage collected, which in turn would keep the fd from being closed. Changing to *os.File.

@rsc rsc changed the title proposal: runtime/debug: add SetCrashFD(fd uintptr) proposal: runtime/debug: add SetCrashOutput(file *os.File) Apr 28, 2021
@prattmic
Copy link
Member

@prattmic prattmic commented Apr 28, 2021

@prattmic, re "prints immediately prior to a throw." I think it would be fine for the runtime to set a flag before those prints that includes them in the throw. That can count as "crash output".

Yup, I totally agree.

@rsc
Copy link
Contributor

@rsc rsc commented May 5, 2021

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

@rsc rsc moved this from Active to Likely Accept in Proposals May 5, 2021
@zx2c4
Copy link
Contributor

@zx2c4 zx2c4 commented May 5, 2021

I'm still catching up on Go things. We've had a lot of discussion here so I was wondering what parts of this are in the "likely accept". Does that include something like https://go-review.googlesource.com/c/go/+/278792/ for the purpose of the eventlog x/sys/windows API?

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented May 5, 2021

@zx2c4 I believe that what is accepted is:

  • add debug.SetCrashFD(file *os.File). That file gets crash output only, not user-generated prints, nor runtime-generated tracing prints (like GODEBUG=gctrace=1). Stderr still gets the crash output too.
  • Add a linkname backdoor for x/sys/windows (*Log).SetCrashEvent(eid uint32).

@rsc
Copy link
Contributor

@rsc rsc commented May 12, 2021

Yes, see #42888 (comment).

@rsc rsc moved this from Likely Accept to Accepted in Proposals May 12, 2021
@rsc
Copy link
Contributor

@rsc rsc commented May 12, 2021

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

@rsc rsc changed the title proposal: runtime/debug: add SetCrashOutput(file *os.File) runtime/debug: add SetCrashOutput(file *os.File) May 12, 2021
@rsc rsc removed this from the Proposal milestone May 12, 2021
@rsc rsc added this to the Backlog milestone May 12, 2021
@zx2c4
Copy link
Contributor

@zx2c4 zx2c4 commented May 12, 2021

Ooops, about 5 hours late.

The "Add a linkname backdoor for x/sys/windows (*Log).SetCrashEvent(eid uint32)." part would get all println statements, right, while SetCrashFD wouldn't. Correct?

@zx2c4
Copy link
Contributor

@zx2c4 zx2c4 commented May 12, 2021

I've updated/rebased https://go-review.googlesource.com/c/go/+/278792 which should be inline with this proposal.

@alexbrainman
Copy link
Member

@alexbrainman alexbrainman commented May 13, 2021

The "Add a linkname backdoor for x/sys/windows (*Log).SetCrashEvent(eid uint32)." part would get all println statements, right, while SetCrashFD wouldn't. Correct?

That is not how I read #42888 (comment). See add debug.SetCrashFD(fd uintptr). That fd gets crash output only, not user-generated prints, nor runtime-generated tracing prints (like GODEBUG=gctrace=1). ... bit.

Alex

@zx2c4
Copy link
Contributor

@zx2c4 zx2c4 commented May 13, 2021

Based on #42888 (comment) I would assume this is including all println statements. Remember: the whole point of the linkname escape hatch is for when fd-based output isn't possible.

@alexbrainman
Copy link
Member

@alexbrainman alexbrainman commented May 14, 2021

Based on #42888 (comment) I would assume this is including all println statements.

I still disagree with you. But I will let others to confirm.

Alex

@zx2c4
Copy link
Contributor

@zx2c4 zx2c4 commented Jun 10, 2021

@rsc - Could we get https://go-review.googlesource.com/c/go/+/278792 in for 1.17? It'd be a nice first step.

crawshaw added a commit to tailscale/tailscale that referenced this issue Aug 3, 2021
Given that golang/go#42888 is coming, this
catches most practical panics without interfering in our development
environments.

Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
crawshaw added a commit to tailscale/tailscale that referenced this issue Aug 3, 2021
Given that golang/go#42888 is coming, this
catches most practical panics without interfering in our development
environments.

Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
crawshaw added a commit to tailscale/tailscale that referenced this issue Aug 3, 2021
Given that golang/go#42888 is coming, this
catches most practical panics without interfering in our development
environments.

Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
@zx2c4
Copy link
Contributor

@zx2c4 zx2c4 commented Nov 3, 2021

@rsc - following up on this... could you change your -2 to a +2 on https://go-review.googlesource.com/c/go/+/278792 ?

gopherbot pushed a commit that referenced this issue Nov 4, 2021
…pointer

The x/sys/windows package currently uses go:linkname for other facilities
inside of runtime that are not suitable to be exposed as a public API
due to their dangers but are still necessary for manipulating any
low-level plumbing that the runtime controls.

Logging, via the built-in println and panic handler, is one such
low-level plumbing feature. In this case, x/sys/windows/svc needs to be
able to redirect panics to the Windows event log. Because the event log
is a complicated interface, this requires a bit more fiddling than the
simple solution used on Android (baking it into runtime itself), and
because Windows services are very diverse, the event log might not even
always be a desirable destination.

This commit accomplishes this by exposing a function pointer called
"overrideWrite" that low-level runtime packages like x/sys/windows/svc
can use to redirect output logs toward the event log or otherwise.

It is not safe or acceptable to use as a generic mechanism, and for that
reason, we wouldn't want to expose this as a real stable API, similar to
the other instances of go:linkname in x/sys/windows. But for packages
that must interoperate with low-level Go runtime fundamentals, this is a
safety hatch for packages that are developed in tandem with the runtime.
x/sys/windows is one such package.

Updates #42888.

Change-Id: I77a32ff7e1494324e8cc38e792e007f86d32672d
Reviewed-on: https://go-review.googlesource.com/c/go/+/278792
Trust: Jason A. Donenfeld <Jason@zx2c4.com>
Run-TryBot: Jason A. Donenfeld <Jason@zx2c4.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
@odeke-em
Copy link
Member

@odeke-em odeke-em commented Mar 5, 2022

@rsc - following up on this... could you change your -2 to a +2 on https://go-review.googlesource.com/c/go/+/278792 ?

Thanks for the ping @zx2c4, Russ did turn it into a +2 and that change got merged on Nov 4th 2021.

@tmm1
Copy link
Contributor

@tmm1 tmm1 commented Mar 30, 2022

The method proposed here would be very helpful in capturing crashes on iOS when using embedded golang via x/mobile

I am using a similar technique in #19744 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Proposals
Accepted
Development

No branches or pull requests