runtime: 'wait: bad address' on FreeBSD/amd64 #6372

Closed
wathiede opened this issue Sep 12, 2013 · 9 comments

@wathiede
Member

I run a buildbot for bradfitz's camlistore project on my FreeBSD/amd64 machine (uname:
FreeBSD sagan.sf.xinu.tv 8.3-RELEASE-p9 FreeBSD 8.3-RELEASE-p9 #0: Fri Jul 26 23:07:20
UTC 2013     root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64). 

Once in a long while I see something fail to build with 'wait: bad address'. Camlistore
builds against go-release and go-tip for every change submitted to Go or camlistore.
Today I saw the error with tip 98971b9411b9, but I think I've seen the error with
go-release too. I'd say it happens about once a week.

Doing a Google search for 'site:build.golang.org "wait: bad address"' shows
two failed builds on the official Go buildbot with the same error, both on FreeBSD/amd64.

I have no idea what is going on here, but I'm filing the issue in the event someone has
an idea how to track this down.
@robpike
Contributor

robpike commented Sep 13, 2013

Comment 1:

This error has come up occasionally when doing heavy workloads on FreeBSD. We do not
have a handle on it, although some changes involving the madvise system call have reduced
its frequency.
My uninformed opinion is that it is either a kernel bug in FreeBSD or the Go
implementation stumbling over an inconsistency between FreeBSD and the other Unix
implementations.
If you look into the problem, you'll see it's all but inconceivable that this error can
arise. An address becomes invalid in a situation where that truly cannot happen.
We need more reproducible examples or fewer FreeBSDs.

Labels changed: added os-freebsd, priority-someday, removed priority-triage.

Status changed to Accepted.

@ianlancetaylor
Contributor

Comment 2:

In fairness, it could happen in principle if there were a GC bug. The goroutine would
call wait, which would cause a thread to suspend until the wait system call returned.
The wait system call would be passed a pointer to an integer on the heap. A GC bug could
free that integer even though there is a pointer to it on the goroutine stack. It's
possible that everything else on the page would also be freed. The scavenger could then
release the page back to the OS via madvise. Then the wait could return and get precisely
that error.
It doesn't seem very likely, but I can't think of anything else other than a kernel bug.
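
To make that call path concrete, here is a minimal sketch; the throwaway child command ("true") and the direct use of syscall.Wait4 are illustrative choices, not taken from the issue. The kernel copies the exit status out through the pointer passed to wait4, so if the word it points at sat on a page that had been handed back to the OS, the copyout would fail with EFAULT, which the os and syscall packages report as "wait: bad address".

// Sketch only: start a child process, then reap it with a direct
// syscall.Wait4. The status word lives in Go-managed memory, and the
// blocked wait4 holds a pointer to it while the goroutine is suspended.
package main

import (
	"fmt"
	"os/exec"
	"syscall"
)

func main() {
	cmd := exec.Command("true") // placeholder command; any short-lived child will do
	if err := cmd.Start(); err != nil {
		panic(err)
	}

	var status syscall.WaitStatus
	// The kernel writes the exit status through &status. If that memory were
	// freed and released via madvise before wait4 returned, the kernel's
	// copyout would fail with EFAULT ("bad address").
	if _, err := syscall.Wait4(cmd.Process.Pid, &status, 0, nil); err != nil {
		fmt.Println("wait:", err) // the failure seen in this issue
		return
	}
	fmt.Println("exited:", status.ExitStatus())
}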

@wathiede
Member Author

Comment 3:

Simple, and reproduces fairly quickly:
$ go run wait.go
2013/09/12 18:45:03 Found 8 CPUs, spawning go routines
2013/09/12 18:45:07 5 wait: bad address
exit status 1
$ go run wait.go
2013/09/12 18:45:12 Found 8 CPUs, spawning go routines
2013/09/12 18:45:20 2 wait: bad address
exit status 1
$ go run wait.go
2013/09/12 18:45:30 Found 8 CPUs, spawning go routines
2013/09/12 18:45:33 4 wait: bad address
exit status 1
$ go run wait.go
2013/09/12 18:46:52 Found 8 CPUs, spawning go routines
2013/09/12 18:46:53 4 53 wait: bad address
exit status 1
$ go run wait.go
2013/09/12 18:48:44 Found 8 CPUs, spawning go routines
2013/09/12 18:48:47 7 648 wait: bad address
exit status 1

Attachments:

  1. bug6372.go (534 bytes)
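
The attached file is not inlined in this dump, so the following is only a hedged reconstruction of what bug6372.go likely did, inferred from the output above: one goroutine per CPU, each repeatedly exec'ing a trivial command and logging its index, iteration count, and the error on the first failure. The child command ("true"), the GOMAXPROCS call, and the exact log format are guesses.

// Approximate reconstruction of the attached reproducer (not the original
// 534-byte file). Each goroutine loops on fork/exec + wait4 until wait4
// fails; on the affected FreeBSD kernels it eventually returns EFAULT.
package main

import (
	"log"
	"os/exec"
	"runtime"
)

func main() {
	n := runtime.NumCPU()
	runtime.GOMAXPROCS(n) // run the goroutines in parallel (the default was 1 before Go 1.5)
	log.Printf("Found %d CPUs, spawning go routines", n)

	for i := 0; i < n; i++ {
		go func(id int) {
			for iter := 0; ; iter++ {
				// Run does a fork/exec and then waits for the child to exit.
				if err := exec.Command("true").Run(); err != nil {
					log.Fatalln(id, iter, err) // e.g. "4 53 wait: bad address"
				}
			}
		}(i)
	}
	select {} // block the main goroutine; log.Fatalln exits on the first error
}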

@davecheney
Contributor

Comment 4:

Thanks for the repro.

Labels changed: added priority-soon, go1.2, removed priority-someday.

@wathiede
Member Author

Comment 5:

To explore Ian's suggestion, I ran:
$ GOGC=off go run wait.go
2013/09/12 18:58:52 Found 8 CPUs, spawning go routines
It ran for 30 minutes before I got bored and Ctrl-C'd it.

@davecheney
Contributor

Comment 7:

Slightly simplified example: http://play.golang.org/p/ROB_uGzYxR
# panics within seconds
[dfc@deadwood ~/src]$ GOMAXPROCS=2 go run bug6372.go
2013/09/16 15:39:06 Found 2 CPUs, spawning go routines
2013/09/16 15:39:07 1 1035 wait: bad address
exit status 1
# runs for longer than my attention span would allow.
[dfc@deadwood ~/src]$ GOMAXPROCS=1 go run bug6372.go
2013/09/16 15:39:11 Found 1 CPUs, spawning go routines
^Cexit status 2
Is there a way to increase the number of GC worker threads without increasing the number
of concurrent G's?

@rsc
Contributor

rsc commented Sep 16, 2013

Comment 8:

Thank you for the very simplified case. I have reproduced the problem in C. It is a
FreeBSD kernel bug. I filed http://www.freebsd.org/cgi/query-pr.cgi?pr=182161 and will
work around it in the Go library. The workaround is to avoid the SYSCALL instruction.

Owner changed to @rsc.

@rsc
Contributor

rsc commented Sep 16, 2013

Comment 9:

This issue was closed by revision 555da73.

Status changed to Fixed.

@wathiede
Member Author

Comment 10:

In case you hadn't seen, this breaks the build:
http://build.golang.org/log/a09e574dfbb72c98721571ed8e87e634faeb7863

bradfitz pushed a commit that referenced this issue Jan 18, 2015
This manually reverts 555da73 from #6372 which implies a
minimum FreeBSD version of 8-STABLE.
Updates docs to mention new minimum requirement.

Fixes #9627

Change-Id: I40ae64be3682d79dd55024e32581e3e5e2be8aa7
Reviewed-on: https://go-review.googlesource.com/3020
Reviewed-by: Minux Ma <minux@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
@rsc rsc added this to the Go1.2 milestone Apr 14, 2015
@rsc rsc removed the go1.2 label Apr 14, 2015
@golang golang locked and limited conversation to collaborators Jun 25, 2016
@rsc rsc removed their assignment Jun 22, 2022
This issue was closed.