cmd/link: unexpected fault address (when low on disk space?) #37310

Open
bradfitz opened this issue Feb 20, 2020 · 2 comments

bradfitz (Contributor) commented Feb 20, 2020

I just got some mysterious linker errors. I suspect they're because this machine only has 20 MB of free disk space (which I just noticed).

# tailscale.io/control/cfgdb.test
unexpected fault address 0x7f24b561d000
fatal error: fault
[signal SIGBUS: bus error code=0x2 addr=0x7f24b561d000 pc=0x463c2e]

goroutine 1 [running]:
runtime.throw(0x6b7228, 0x5)
        /home/bradfitz/go/src/runtime/panic.go:1112 +0x72 fp=0xc003df2ee8 sp=0xc003df2eb8 pc=0x432f62
runtime.sigpanic()
        /home/bradfitz/go/src/runtime/signal_unix.go:674 +0x443 fp=0xc003df2f18 sp=0xc003df2ee8 pc=0x449533
runtime.memmove(0x7f24b561b6b0, 0x7f24b6c77c85, 0x22df)
        /home/bradfitz/go/src/runtime/memmove_amd64.s:108 +0xbe fp=0xc003df2f20 sp=0xc003df2f18 pc=0x463c2e
cmd/link/internal/ld.(*OutBuf).Write(0xc000024900, 0x7f24b6c77c85, 0x22df, 0x22df, 0x1, 0x1, 0x0)
        /home/bradfitz/go/src/cmd/link/internal/ld/outbuf.go:65 +0xa1 fp=0xc003df2f70 sp=0xc003df2f20 pc=0x5c83a1
cmd/link/internal/ld.(*OutBuf).WriteSym(0xc000024900, 0xc002807a90)
        /home/bradfitz/go/src/cmd/link/internal/ld/outbuf.go:159 +0x6c fp=0xc003df2fc0 sp=0xc003df2f70 pc=0x5c8b7c
cmd/link/internal/ld.blk(0xc000024900, 0xc004dd0000, 0x18d8, 0x1c00, 0x5ca6b0, 0x31299c, 0xc0063c4b00, 0x1, 0x1)
        /home/bradfitz/go/src/cmd/link/internal/ld/data.go:787 +0x10f fp=0xc003df3090 sp=0xc003df2fc0 pc=0x570d1f
cmd/link/internal/ld.CodeblkPad(0xc000001e00, 0x401000, 0x31299c, 0xc0063c4b00, 0x1, 0x1)
        /home/bradfitz/go/src/cmd/link/internal/ld/data.go:701 +0xbb fp=0xc003df31a0 sp=0xc003df3090 pc=0x5703fb
cmd/link/internal/amd64.asmb(0xc000001e00)
        /home/bradfitz/go/src/cmd/link/internal/amd64/asm.go:669 +0xc6 fp=0xc003df3200 sp=0xc003df31a0 pc=0x5ec9f6
cmd/link/internal/ld.Main(0x899300, 0x10, 0x20, 0x1, 0x7, 0x10, 0x6c25b5, 0x1b, 0x6be64d, 0x14, ...)
        /home/bradfitz/go/src/cmd/link/internal/ld/main.go:269 +0xd61 fp=0xc003df3358 sp=0xc003df3200 pc=0x5c7451
main.main()
        /home/bradfitz/go/src/cmd/link/main.go:68 +0x1bc fp=0xc003df3f88 sp=0xc003df3358 pc=0x63d2bc
runtime.main()
        /home/bradfitz/go/src/runtime/proc.go:203 +0x212 fp=0xc003df3fe0 sp=0xc003df3f88 pc=0x4355b2
runtime.goexit()
        /home/bradfitz/go/src/runtime/asm_amd64.s:1375 +0x1 fp=0xc003df3fe8 sp=0xc003df3fe0 pc=0x4629c1
FAIL    tailscale.io/control/cfgdb [build failed]
FAIL

And another, which looks like the same failure:

goroutine 1 [running]:
runtime.throw(0x6b7228, 0x5)
        /home/bradfitz/go/src/runtime/panic.go:1112 +0x72 fp=0xc00004eee8 sp=0xc00004eeb8 pc=0x432f62
runtime.sigpanic()
        /home/bradfitz/go/src/runtime/signal_unix.go:674 +0x443 fp=0xc00004ef18 sp=0xc00004eee8 pc=0x449533
runtime.memmove(0x7f29fc0a5000, 0x7f29fc6447ce, 0x4b)
        /home/bradfitz/go/src/runtime/memmove_amd64.s:205 +0x1b2 fp=0xc00004ef20 sp=0xc00004ef18 pc=0x463d22
cmd/link/internal/ld.(*OutBuf).Write(0xc000024900, 0x7f29fc6447ce, 0x4b, 0x4b, 0x7f2a23f586e0, 0x3f, 0xb)
        /home/bradfitz/go/src/cmd/link/internal/ld/outbuf.go:65 +0xa1 fp=0xc00004ef70 sp=0xc00004ef20 pc=0x5c83a1
cmd/link/internal/ld.(*OutBuf).WriteSym(0xc000024900, 0xc000a5d5f0)
        /home/bradfitz/go/src/cmd/link/internal/ld/outbuf.go:159 +0x6c fp=0xc00004efc0 sp=0xc00004ef70 pc=0x5c8b7c
cmd/link/internal/ld.blk(0xc000024900, 0xc0022e8000, 0xa80, 0xc00, 0x401000, 0x102b7c, 0xc000b93280, 0x1, 0x1)
        /home/bradfitz/go/src/cmd/link/internal/ld/data.go:787 +0x10f fp=0xc00004f090 sp=0xc00004efc0 pc=0x570d1f
cmd/link/internal/ld.CodeblkPad(0xc000001e00, 0x401000, 0x102b7c, 0xc000b93280, 0x1, 0x1)
        /home/bradfitz/go/src/cmd/link/internal/ld/data.go:701 +0xbb fp=0xc00004f1a0 sp=0xc00004f090 pc=0x5703fb
cmd/link/internal/amd64.asmb(0xc000001e00)
        /home/bradfitz/go/src/cmd/link/internal/amd64/asm.go:669 +0xc6 fp=0xc00004f200 sp=0xc00004f1a0 pc=0x5ec9f6
cmd/link/internal/ld.Main(0x899300, 0x10, 0x20, 0x1, 0x7, 0x10, 0x6c25b5, 0x1b, 0x6be64d, 0x14, ...)
        /home/bradfitz/go/src/cmd/link/internal/ld/main.go:269 +0xd61 fp=0xc00004f358 sp=0xc00004f200 pc=0x5c7451
main.main()
        /home/bradfitz/go/src/cmd/link/main.go:68 +0x1bc fp=0xc00004ff88 sp=0xc00004f358 pc=0x63d2bc
runtime.main()
        /home/bradfitz/go/src/runtime/proc.go:203 +0x212 fp=0xc00004ffe0 sp=0xc00004ff88 pc=0x4355b2
runtime.goexit()
        /home/bradfitz/go/src/runtime/asm_amd64.s:1375 +0x1 fp=0xc00004ffe8 sp=0xc00004ffe0 pc=0x4629c1
FAIL    tailscale.com/logtail/filch [build failed]
FAIL

cherrymui (Contributor) commented Feb 20, 2020

Yeah, this is writing to mmap'd memory backed by the output file. SIGBUS may occur if a write to mmap'd memory fails (maybe it cannot be flushed to disk?).
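This kind of fault can be reproduced without filling a disk. The following Linux-only sketch uses a different trigger than a full disk, which is an assumption made for convenience. It maps one page of a file with `MAP_SHARED`, shrinks the file to zero bytes, and then stores into the mapping. The kernel reports that store as SIGBUS, just like the linker's `memmove` in the traces. The sketch also assumes `runtime/debug.SetPanicOnFault` is an acceptable way to "fail nicely". With it enabled, the runtime raises a recoverable panic instead of printing `unexpected fault address` and exiting. The helper name `storeAfterShrink` is hypothetical.

```go
// Linux-only sketch: provoke SIGBUS on a MAP_SHARED file mapping and use
// runtime/debug.SetPanicOnFault to turn it into a recoverable panic.
// The trigger (shrinking the file after mapping it) stands in for a full
// disk; both surface as a bus error on the store.
package main

import (
	"fmt"
	"os"
	"runtime"
	"runtime/debug"
	"syscall"
)

// storeAfterShrink (hypothetical helper) maps a one-page file, truncates the
// file to zero bytes, then writes into the mapping. It reports whether the
// write faulted.
func storeAfterShrink() (faulted bool, err error) {
	f, err := os.CreateTemp("", "mmap-sigbus-*")
	if err != nil {
		return false, err
	}
	defer os.Remove(f.Name())
	defer f.Close()

	size := os.Getpagesize()
	if err := f.Truncate(int64(size)); err != nil {
		return false, err
	}
	data, err := syscall.Mmap(int(f.Fd()), 0, size,
		syscall.PROT_READ|syscall.PROT_WRITE, syscall.MAP_SHARED)
	if err != nil {
		return false, err
	}
	defer syscall.Munmap(data)

	// After this, the mapped page lies entirely beyond end of file.
	if err := f.Truncate(0); err != nil {
		return false, err
	}

	// SetPanicOnFault is per-goroutine; restore the previous setting on return.
	defer debug.SetPanicOnFault(debug.SetPanicOnFault(true))
	defer func() {
		if r := recover(); r != nil {
			if _, ok := r.(runtime.Error); ok {
				faulted = true
				return
			}
			panic(r) // not a memory fault: re-raise
		}
	}()
	data[0] = 1 // SIGBUS here; with SetPanicOnFault it becomes a panic
	return false, nil
}

func main() {
	faulted, err := storeAfterShrink()
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	fmt.Println("store through mapping faulted:", faulted)
}
```

Recovering this way only reports the problem more politely. The bytes that were supposed to reach the file are still lost.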

It would be good if we could fail gracefully, but I don't know how. Maybe write (using file I/O) to the last byte of the output file before mmap? If that write fails, we can exit with a nice error. Would that work? cc @aclements

toothrot added this to the Backlog milestone Feb 20, 2020

ianlancetaylor (Contributor) commented Feb 21, 2020

I haven't looked at the code, but if we're going to mmap the output file, we should fallocate it first. Otherwise, it's possible for the program to exit after writing all blocks to the memory-mapped area and calling munmap and close, but before disk blocks have been assigned. In that case, the generated executable may be incomplete, which is bad. Or, of course, we can get a SIGBUS during execution, which is less bad but still bad. The fix is to either fallocate or fdatasync, and fallocate is faster.
