runtime: unexpected fault address 0xffffffffffffffff (and other more innocent addresses) #15861

Open
MStoykov opened this Issue May 27, 2016 · 5 comments


MStoykov commented May 27, 2016

After updating the go version from 1.5.1 to 1.6.2 our application crashed right after startup in production.
We run it with start-stop-daemon and use the '-u' flag to change the user. This is relevant because the crash vanished when we instead used sudo to run start-stop-daemon as that user, without the flag. We figured this out through experimentation.

# this crashes
start-stop-daemon -u user --start golang-cool-api-v2000 
# this doesn't crash
sudo -u user start-stop-daemon --start golang-cool-api-v2000 
  1. What version of Go are you using (go version)?
    1.6.2
  2. What operating system and processor architecture are you using (go env)?
    Linux 3.2.12-gentoo 64bit
  3. What did you do?
    Update to golang 1.6.2 from 1.5.1
  4. What did you expect to see?
    No difference :)
  5. What did you see instead?
unexpected fault address 0xffffffffffffffff
fatal error: fault
[signal 0xb code=0x1 addr=0xffffffffffffffff pc=0xffffffffffffffff]

goroutine 19 [running]:
runtime.throw(0x9a3280, 0x5)
        /usr/local/go/src/runtime/panic.go:547 +0x90 fp=0xc82002aed8 sp=0xc82002aec0
runtime.sigpanic()
        /usr/local/go/src/runtime/sigpanic_unix.go:27 +0x2ab fp=0xc82002af28 sp=0xc82002aed8
runtime.call32(0x0, 0xc820109d20, 0xc820bea000, 0x1000000010)
        /usr/local/go/src/runtime/asm_amd64.s:472 +0x3e fp=0xc82002af50 sp=0xc82002af28
runtime.runfinq()
        /usr/local/go/src/runtime/mfinal.go:200 +0x299 fp=0xc82002afc0 sp=0xc82002af50
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1998 +0x1 fp=0xc82002afc8 sp=0xc82002afc0
created by runtime.createfing
        /usr/local/go/src/runtime/mfinal.go:139 +0x60

The address isn't the same between crashes:

unexpected fault address 0xc8206fdd4a
fatal error: fault
[signal 0xb code=0x2 addr=0xc8206fdd4a pc=0xc8206fdd4a]

goroutine 18 [running]:
runtime.throw(0x9a3280, 0x5)
        /usr/local/go/src/runtime/panic.go:547 +0x90 fp=0xc82002e6d8 sp=0xc82002e6c0
runtime.sigpanic()
        /usr/local/go/src/runtime/sigpanic_unix.go:27 +0x2ab fp=0xc82002e728 sp=0xc82002e6d8
runtime.call32(0x0, 0xc820c4a040, 0xc820c5c000, 0x1000000010)
        /usr/local/go/src/runtime/asm_amd64.s:472 +0x3e fp=0xc82002e750 sp=0xc82002e728
runtime.runfinq()
        /usr/local/go/src/runtime/mfinal.go:200 +0x299 fp=0xc82002e7c0 sp=0xc82002e750
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1998 +0x1 fp=0xc82002e7c8 sp=0xc82002e7c0
created by runtime.createfing
        /usr/local/go/src/runtime/mfinal.go:139 +0x60

This stack trace doesn't contain any relevant information for me. The stack frames of the other goroutines also differ between crashes.
It might be relevant (given the mfinal.go in the stack) that this happened twice after the 9th GC and once after the 7th GC (GODEBUG="gctrace=2" is set).

Contributor

ianlancetaylor commented May 27, 2016

The crashes point to memory corruption when running a finalizer. Where does your code call runtime.SetFinalizer? Do you use cgo or SWIG? Do you use unsafe?

Look closely at code that uses objects for which runtime.SetFinalizer has been called, and make sure those objects are alive. See #13347 for the kinds of problems that can occur.

@ianlancetaylor ianlancetaylor added this to the Go1.7Maybe milestone May 27, 2016

MStoykov commented May 30, 2016

We have a single runtime.SetFinalizer in all the code including every dependency (except the golang standard library). There is no cgo - we have a counter on the status page saying how many calls to cgo there have been which stays at 1. No SWIG either. No unsafe, I double checked because I was surprised we managed to remove every instance of it.

The single finalizer (which is not needed and which we should probably remove) is unlikely to be called so early in the execution of the server. And even if it is, this does not look like #13347.

The interesting thing here is that running it with start-stop-daemon and the -u flag is the only thing that makes it crash. The documentation for start-stop-daemon says that it sets HOME to the user's home, so we tried that with sudo as well, and there was no crash:

HOME=/dev/null sudo -u user start-stop-daemon --start ./nedomi # yes the user home is /dev/null

I'm going to run a few more tests later today in the hope of giving better directions on how to reproduce it than just "run it with start-stop-daemon on the Gentoo that we are using".

Contributor

davecheney commented May 30, 2016

You can delete that finalizer; all os.File objects have a finalizer which does the same thing.

On Mon, 30 May 2016, 18:00 Mihail Stoykov notifications@github.com wrote:

We have a single runtime.SetFinalizer
https://github.com/ironsmile/nedomi/blob/8380d698b367963a6b63e27a161d43224acabc9f/logger/ironsmile/impl.go#L92
in all the code including every dependency (except the golang standard
library). There is no cgo - we have a counter on the status page saying how
many calls to cgo there have been which stays at 1. No SWIG either. No
unsafe, I double checked because I was surprised we managed to remove every
instance of it.

The single finalizer (which is not needed and we should probably remove
it) is unlikely to be called so early in the execution of the server. And
even if it does it does not look like #13347
#13347 .

The interesting thing here is that running it with start-stop-daemon and
-u flag is the only thing that makes it crash. The documentation for
start-stop-daemon says that it sets HOME to the user's home which was tried
with sudo and no crash:

HOME=/dev/null sudo -u user start-stop-daemon --start ./nedomi # yes the user home is /dev/null

I'm gonna try to run a few more tests later today in hopes of giving
better directions on how reproduce it from run it with start-stop-daemon
on the gentoo that we are using


@ianlancetaylor ianlancetaylor changed the title from unexpected fault address 0xffffffffffffffff (and other more innocent addresses) to runtime: unexpected fault address 0xffffffffffffffff (and other more innocent addresses) Jun 9, 2016

Contributor

ianlancetaylor commented Jun 9, 2016

The docs I see for start-stop-daemon suggest that -u doesn't do anything with --start other than determine whether the process is already running. To set the user ID under which the program runs, you use the -c option. But there are probably multiple versions of start-stop-daemon, I don't know which one you are using. What does your program do if you just run it without using either start-stop-daemon or sudo?

MStoykov commented Jun 10, 2016

The manual for the start-stop-daemon used on the machine says that -u is used to switch user ('-c' is a synonym for it). Running nedomi (the program in question) in any way other than with start-stop-daemon and '-u' works, in so far as it doesn't crash for at least a few minutes.
Quotes:

     -u, --user user[:group]
             Start the daemon as the user and update $HOME accordingly or stop daemons owned by the user. You can optionally append a group name here also.
     -c, --chuid user
             Same as the -u, --user option.

We actually discovered that nedomi runs at all on the server when I ran it without any wrapper to see how long it would take to crash.
We tried different arguments to start-stop-daemon and/or sudo, as well as changing $HOME to the actual home of the user while using sudo. Only start-stop-daemon with '-u' led to a very fast crash. Everything else seemed to run flawlessly for at least a minute. The server is still running with no crashes while using sudo + start-stop-daemon.

Unfortunately, more pressing things at work mean that I can't investigate further how it crashes. The installed start-stop-daemon should be built from this tree, as it belongs to the openrc-0.9.9.3 package.

@ianlancetaylor ianlancetaylor modified the milestones: Unplanned, Go1.7Maybe Jun 10, 2016
