os: FreeBSD system crashes when adding regular files to kqueue #19093

Open
ianlancetaylor opened this Issue Feb 14, 2017 · 12 comments

@ianlancetaylor
Contributor

ianlancetaylor commented Feb 14, 2017

While testing https://golang.org/cl/36800, it appears that the FreeBSD amd64 system crashes when adding regular disk files to kqueue. This is based on testing using gomote, which creates a VM running FreeBSD 10.1 (I think). I am going to disable using kqueue on FreeBSD for regular files; it will still be used for sockets as usual. I am opening this issue to note the problem in the hopes that it can be fixed by someone more familiar with FreeBSD.

@gopherbot

gopherbot commented Feb 14, 2017

CL https://golang.org/cl/36800 mentions this issue.

@ianlancetaylor ianlancetaylor added this to the Go1.9Maybe milestone Feb 15, 2017

gopherbot pushed a commit that referenced this issue Feb 15, 2017

os: use poller for file I/O
This changes the os package to use the runtime poller for file I/O
where possible. When a system call blocks on a pollable descriptor,
the goroutine will be blocked on the poller but the thread will be
released to run other goroutines. When using a non-pollable
descriptor, the os package will continue to use thread-blocking system
calls as before.

For example, on GNU/Linux, the runtime poller uses epoll. epoll does
not support ordinary disk files, so they will continue to use blocking
I/O as before. The poller will be used for pipes.

Since this means that the poller is used for many more programs, this
modifies the runtime to only block waiting for the poller if there is
some goroutine that is waiting on the poller. Otherwise, there is no
point, as the poller will never make any goroutine ready. This
preserves the runtime's current simple deadlock detection.

This seems to crash FreeBSD systems, so it is disabled on FreeBSD.
This is issue 19093.

Using the poller on Windows requires opening the file with
FILE_FLAG_OVERLAPPED. We should only do that if we can remove that
flag if the program calls the Fd method. This is issue 19098.

Update #6817.
Update #7903.
Update #15021.
Update #18507.
Update #19093.
Update #19098.

Change-Id: Ia5197dcefa7c6fbcca97d19a6f8621b2abcbb1fe
Reviewed-on: https://go-review.googlesource.com/36800
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
@asomers

asomers commented Mar 6, 2017

Do you have a stacktrace of the FreeBSD system?

@asomers

asomers commented Apr 17, 2017

@ianlancetaylor Can you please post some more information about the crash? The stack trace, output of uname -a, and steps to reproduce would be helpful.

@ianlancetaylor

Contributor

ianlancetaylor commented Apr 17, 2017

Sorry for not noticing your earlier message.

I do not have a stack trace of the system. I am accessing the system using https://godoc.org/golang.org/x/build/cmd/gomote, which runs FreeBSD in a VM, and I am deducing that the system has crashed because it stops responding for a while, and when it does start responding the system uptime has been reset to a very small number. I assume that what is happening is that the VM is being discarded and restarted. Perhaps @bradfitz can say something more about that.

This is how I reproduce it (I just verified that it still fails like this for me). First, remove these lines from src/os/file_unix.go:

	// Don't try to use kqueue with regular files on FreeBSD.
	// It crashes the system unpredictably while running all.bash.
	// Issue 19093.
	if runtime.GOOS == "freebsd" {
		pollable = false
	}

Then run

name=`gomote create freebsd-amd64-110`
gomote put14 $name
gomote push $name
gomote run $name go/src/make.bash

When I try this it gets to the point where it prints

##### Building packages and commands for freebsd/amd64.

which is the point where it starts running cmd/go built with the new polling code. At that point the run stops and gomote prints

2017/04/17 11:06:00 Buildlet https://farmer.golang.org:443 failed three heartbeats; final error: timeout waiting for headers

Further attempts to use gomote run $name fail with

Error running run: Error trying to execute /usr/bin/uptime: buildlet: HTTP status 502 Bad Gateway: 

Then after a while the gomote usually recovers, but running uptime shows that the system has rebooted.

With luck you can create the problem on a real FreeBSD system, or on a VM which you fully control, by simply editing src/os/file_unix.go as suggested and then running make.bash.

@bradfitz

Member

bradfitz commented Apr 17, 2017

I suppose gomote & the build system could be modified to watch the VM's serial console & log that, but I don't have time for that at the moment.

For debugging this, somebody should just use a FreeBSD VM directly, not via gomote.

@asomers

asomers commented Apr 17, 2017

I'll try it on a real FreeBSD system. But when you've done it using gomote, did you notice if there was anything in /var/crash after the reboot? /var/crash/minfree and /var/crash/bounds are normal, but any other file would be interesting.

@bradfitz

Member

bradfitz commented Apr 17, 2017

The gomote proxy server that owns the VM destroys it immediately if it fails its heartbeats. The user (Ian, here) would have no way to check those things before the machine was nuked.

@asomers

asomers commented Apr 17, 2017

I can't reproduce on FreeBSD head or stable/10 amd64. Can you please figure out the exact version that gomote is using, and any nondefault configurations?

@asomers

asomers commented Apr 19, 2017

I can reproduce it on 11.0-RELEASE. This turns out to be a duplicate of FreeBSD PR 214923, even though the stack traces look different. Here's the stacktrace:

Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address   = 0x18
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80a80c99
stack pointer           = 0x28:0xfffffe085fb26690
frame pointer           = 0x28:0xfffffe085fb26870
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 954 (go_bootstrap)
trap number             = 12
panic: page fault
cpuid = 2
KDB: stack backtrace:
#0 0xffffffff80b24477 at kdb_backtrace+0x67
#1 0xffffffff80ad97e2 at vpanic+0x182
#2 0xffffffff80ad9653 at panic+0x43
#3 0xffffffff80fa1d51 at trap_fatal+0x351
#4 0xffffffff80fa1f43 at trap_pfault+0x1e3
#5 0xffffffff80fa14ec at trap+0x26c
#6 0xffffffff80f845a1 at calltrap+0x8
#7 0xffffffff80a802c5 at kern_kevent+0xb5
#8 0xffffffff80a8014c at sys_kevent+0x11c
#9 0xffffffff80fa26ae at amd64_syscall+0x4ce
#10 0xffffffff80f8488b at Xfast_syscall+0xfb

This bug was fixed in head by r310302 and later MFCed to stable/11 by r310578, but no update was ever issued for FreeBSD 11.0. I checked with secteam, and they don't plan to issue an update. So you should continue to use your current workaround until 11.1 is released, which will happen sometime in June.

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=214923
https://svnweb.freebsd.org/base?view=revision&revision=310302

@bradfitz

Member

bradfitz commented Apr 19, 2017

@asomers, thanks for the investigation and update!

@bradfitz

Member

bradfitz commented Nov 22, 2017

> I suppose gomote & the build system could be modified to watch the VM's serial console & log that, but I don't have time for that at the moment.

Btw, x/build/cmd/debugnewvm does the serial console thing, but gomote doesn't yet. It sounds like it's not needed now, since the bug has been fixed.

@ianlancetaylor, do you want to conditionally enable this behavior based on FreeBSD version?

In the meantime I can update our builder image from 11.0 to 11.1. We still have our 9.3 builders running too, which would test the fallback path if you make it $FREE_VERSION >= 11.1.
