possible GC-related panic #5554

Closed
rogpeppe opened this issue May 24, 2013 · 21 comments

Contributor

@rogpeppe rogpeppe commented May 24, 2013

e570c2daeaca (release-branch.go1.1) go1.1/release

To reproduce (currently we haven't managed to narrow
it down at all, sorry):

cd $GOPATH/src
mkdir -p launchpad.net
cd launchpad.net
bzr branch lp:~rogpeppe/juju-core/dimitern-041-provisioner-api-calls-GC-bug juju-core

# use the Go-only version of goyaml to rule
# out possible unsafe interactions.
bzr branch lp:~niemeyer/goyaml/go-port goyaml

go get launchpad.net/juju-core/state/apiserver
while true; do
    go test launchpad.net/juju-core/state/apiserver
done

There are expected test failures (this problem was encountered
during development). But every so often we see a panic like this,
which looks highly suspect.

unexpected fault address 0xc200b00000
fatal error: fault
[signal 0xb code=0x1 addr=0xc200b00000 pc=0x40e1a4]

goroutine 3181 [running]:
[fp=0xc200141608] runtime.throw(0xd02b37)
    /home/rog/go-release/src/pkg/runtime/panic.c:473 +0x67
[fp=0xc200141620] runtime.sigpanic()
    /home/rog/go-release/src/pkg/runtime/os_linux.c:239 +0xe7
[fp=0xc2001419c0] scanblock(0x7f2d2d5de000, 0x7f2d2d5dee88, 0x9a, 0xc200141900)
    /home/rog/go-release/src/pkg/runtime/mgc0.c:791 +0x534
[fp=0xc200141a10] markroot(0xc2000fe480, 0x1000000029)
    /home/rog/go-release/src/pkg/runtime/mgc0.c:1269 +0xab
[fp=0xc200141a88] runtime.parfordo(0xc2000fe480)
    /home/rog/go-release/src/pkg/runtime/parfor.c:105 +0x9b
[fp=0xc200141bb8] gc(0x7f2d2d5c575c)
    /home/rog/go-release/src/pkg/runtime/mgc0.c:2000 +0x29d
----- stack segment boundary -----
[fp=0x7f2d2d5c5770] runtime.gc(0xc200000000)
    /home/rog/go-release/src/pkg/runtime/mgc0.c:1927 +0x11b
[fp=0x7f2d2d5c57c8] runtime.mallocgc(0xc0, 0x100000000, 0xc200000001)
    /home/rog/go-release/src/pkg/runtime/zmalloc_linux_amd64.c:101 +0x1e4
[fp=0x7f2d2d5c5800] makeslice1(0x7de1e0, 0x0, 0x8, 0x7f2d2d5c5848)
    /home/rog/go-release/src/pkg/runtime/slice.c:63 +0xb6
[fp=0x7f2d2d5c5830] runtime.makeslice(0x7de1e0, 0x0, 0x8, 0x7f2d2d5c5800, 0x0, ...)
    /home/rog/go-release/src/pkg/runtime/slice.c:34 +0x9a
[fp=0x7f2d2d5c5908] labix.org/v2/mgo/bson.(*decoder).readSliceDoc(0xc2008f6c30,
0xc20012d000, 0x7dcfe0, 0xc20012d000, 0x7ede20, ...)
    /home/rog/src/go-alt/src/labix.org/v2/mgo/bson/decode.go:282 +0x3e
[fp=0x7f2d2d5c5c10] labix.org/v2/mgo/bson.(*decoder).readElemTo(0xc2008f6c30, 0x7ede20,
0xc2002603d0, 0x146, 0xc200260304, ...)
    /home/rog/src/go-alt/src/labix.org/v2/mgo/bson/decode.go:413 +0x24e6
[fp=0x7f2d2d5c5ca8] labix.org/v2/mgo/bson.func·001(0xc2008f6c04, 0xc2007822d0, 0x1)
    /home/rog/src/go-alt/src/labix.org/v2/mgo/bson/decode.go:329 +0x104
[fp=0x7f2d2d5c5ce8] labix.org/v2/mgo/bson.(*decoder).readDocWith(0xc2008f6c30,
0x7f2d2d5c5d40)
    /home/rog/src/go-alt/src/labix.org/v2/mgo/bson/decode.go:350 +0x12f
[fp=0x7f2d2d5c5d78] labix.org/v2/mgo/bson.(*decoder).readDocElems(0xc2008f6c30,
0xc20012d000, 0x8b5920, 0x8d2e00, 0x8b5901, ...)
    /home/rog/src/go-alt/src/labix.org/v2/mgo/bson/decode.go:332 +0xe9
[fp=0x7f2d2d5c5f98] labix.org/v2/mgo/bson.(*decoder).readDocTo(0xc2008f6c30, 0x8b5920,
0xc200260320, 0x176)
    /home/rog/src/go-alt/src/labix.org/v2/mgo/bson/decode.go:190 +0x105a
----- stack segment boundary -----
[fp=0x7f2d2d6cf420] labix.org/v2/mgo/bson.(*decoder).readElemTo(0xc2008f6c30, 0x7ede20,
0xc200260310, 0x146, 0xc200260303, ...)
    /home/rog/src/go-alt/src/labix.org/v2/mgo/bson/decode.go:381 +0xda
[fp=0x7f2d2d6cf4b8] labix.org/v2/mgo/bson.func·001(0xc2008f6c03, 0xc2007822c0, 0x5)
    /home/rog/src/go-alt/src/labix.org/v2/mgo/bson/decode.go:329 +0x104
[fp=0x7f2d2d6cf4f8] labix.org/v2/mgo/bson.(*decoder).readDocWith(0xc2008f6c30,
0x7f2d2d6cf550)
    /home/rog/src/go-alt/src/labix.org/v2/mgo/bson/decode.go:350 +0x12f
[fp=0x7f2d2d6cf588] labix.org/v2/mgo/bson.(*decoder).readDocElems(0xc2008f6c30,
0xc20012d000, 0x8b5920, 0x8d2e00, 0x101, ...)
    /home/rog/src/go-alt/src/labix.org/v2/mgo/bson/decode.go:332 +0xe9
[fp=0x7f2d2d6cf7a8] labix.org/v2/mgo/bson.(*decoder).readDocTo(0xc2008f6c30, 0x8b5920,
0xc20026d500, 0x176)
    /home/rog/src/go-alt/src/labix.org/v2/mgo/bson/decode.go:190 +0x105a
[fp=0x7f2d2d6cf838] labix.org/v2/mgo/bson.Unmarshal(0xc200305e60, 0x4a, 0x4a, 0x8a50c0,
0xc20026d500, ...)
    /home/rog/src/go-alt/src/labix.org/v2/mgo/bson/bson.go:459 +0x157
[fp=0x7f2d2d6cfa58] labix.org/v2/mgo.(*Iter).Next(0xc200298a50, 0x8a50c0, 0xc20026d500,
0x8)
    /home/rog/src/go-alt/src/labix.org/v2/mgo/session.go:2311 +0x5b2
[fp=0x7f2d2d6cff18] launchpad.net/juju-core/state/watcher.(*Watcher).sync(0xc200424840,
0xc200910000, 0x7f2d2d6cff70)
    /home/rog/src/go-alt/src/launchpad.net/juju-core/state/watcher/watcher.go:364 +0x17e
[fp=0x7f2d2d6cff90] launchpad.net/juju-core/state/watcher.(*Watcher).loop(0xc200424840,
0x943d40, 0xc20079bf28)
    /home/rog/src/go-alt/src/launchpad.net/juju-core/state/watcher/watcher.go:231 +0x16b
[fp=0x7f2d2d6cffb8] launchpad.net/juju-core/state/watcher.func·001()
    /home/rog/src/go-alt/src/launchpad.net/juju-core/state/watcher/watcher.go:119 +0x2c
[fp=0x7f2d2d6cffc0] runtime.goexit()
    /home/rog/go-release/src/pkg/runtime/proc.c:1223
created by launchpad.net/juju-core/state/watcher.New
    /home/rog/src/go-alt/src/launchpad.net/juju-core/state/watcher/watcher.go:121 +0x100
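For anyone decoding the signal summary line in the trace above, here is a minimal Go sketch that pulls the numeric fields out of captured test output. The names `faultLine`, `Fault` and `parseFault` are made up for this illustration and are not part of the Go runtime or of juju-core. On Linux, signal 0xb (11) is SIGSEGV and code 0x1 is SEGV_MAPERR, i.e. the faulting address was not mapped, which fits a collector scanning past valid memory.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// faultLine matches the signal summary printed with a fatal runtime fault,
// e.g. "[signal 0xb code=0x1 addr=0xc200b00000 pc=0x40e1a4]".
var faultLine = regexp.MustCompile(
	`\[signal (0x[0-9a-f]+) code=(0x[0-9a-f]+) addr=(0x[0-9a-f]+) pc=(0x[0-9a-f]+)\]`)

// Fault holds the numeric fields of one signal summary line.
// (Hypothetical helper type, for illustration only.)
type Fault struct {
	Signal, Code, Addr, PC uint64
}

// parseFault extracts the first signal summary found in output.
// The boolean result is false when no such line is present.
func parseFault(output string) (Fault, bool) {
	m := faultLine.FindStringSubmatch(output)
	if m == nil {
		return Fault{}, false
	}
	var v [4]uint64
	for i := range v {
		// Base 0 lets ParseUint honour the "0x" prefix.
		n, err := strconv.ParseUint(m[i+1], 0, 64)
		if err != nil {
			return Fault{}, false
		}
		v[i] = n
	}
	return Fault{Signal: v[0], Code: v[1], Addr: v[2], PC: v[3]}, true
}

func main() {
	out := "unexpected fault address 0xc200b00000\n" +
		"fatal error: fault\n" +
		"[signal 0xb code=0x1 addr=0xc200b00000 pc=0x40e1a4]\n"
	if f, ok := parseFault(out); ok {
		fmt.Printf("signal=%d code=%d addr=%#x pc=%#x\n", f.Signal, f.Code, f.Addr, f.PC)
	}
}
```

A helper like this could be used in the retry loop to stop at the first run whose output contains a runtime fault, rather than at one of the expected test failures.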
Contributor Author

@rogpeppe rogpeppe commented May 24, 2013

Comment 1:

I can't reproduce this against tip (d6e06d0f3c29), FWIW.
Member

@minux minux commented May 24, 2013

Comment 2:

Have you bisected yet?
I suspect one of the fixes for the issues labeled Go1.1.1 fixes this issue.
Contributor Author

@rogpeppe rogpeppe commented May 24, 2013

Comment 4:

No, and I really haven't got enough time to bisect currently - I've got a hundred things
to do before I go away on holiday next week.
I hadn't realised there was a prospective go1.1.1. That makes me happy.
I also suspect that the issue has been fixed.
Contributor

@davecheney davecheney commented May 24, 2013

Comment 5:

I'll try to bisect, as I have the setup to run the juju tests.

Owner changed to @davecheney.

Contributor

@davecheney davecheney commented May 24, 2013

Comment 6:

I can reproduce the problem (almost exactly the same panic message) at the original
revision:
unexpected fault address 0xc200b00000  
fatal error: fault
[signal 0xb code=0x1 addr=0xc200b00000 pc=0x40e1a4]
goroutine 3168 [running]:
[fp=0xc2008d9608] runtime.throw(0xd05b37)
        /home/dfc/go/src/pkg/runtime/panic.c:473 +0x67
[fp=0xc2008d9620] runtime.sigpanic()   
        /home/dfc/go/src/pkg/runtime/os_linux.c:239 +0xe7
[fp=0xc2008d99c0] scanblock(0x7f6031625000, 0x7f6031626488, 0xda, 0xc2008d9900)
        /home/dfc/go/src/pkg/runtime/mgc0.c:791 +0x534
[fp=0xc2008d9a10] markroot(0xc20011c000, 0x1000000028)
        /home/dfc/go/src/pkg/runtime/mgc0.c:1269 +0xab
[fp=0xc2008d9a88] runtime.parfordo(0xc20011c000)
        /home/dfc/go/src/pkg/runtime/parfor.c:105 +0x9b
[fp=0xc2008d9bb8] gc(0x7f603160520c)   
        /home/dfc/go/src/pkg/runtime/mgc0.c:2000 +0x29d
----- stack segment boundary -----
[fp=0x7f6031605220] runtime.gc(0xc200000000)
        /home/dfc/go/src/pkg/runtime/mgc0.c:1927 +0x11b
[fp=0x7f6031605278] runtime.mallocgc(0x480, 0x100000000, 0x0)
        /home/dfc/go/src/pkg/runtime/zmalloc_linux_amd64.c:101 +0x1e4
[fp=0x7f60316052b0] hash_grow(0x7e5d20, 0xc2008cf7c0)
        /home/dfc/go/src/pkg/runtime/hashmap.c:-203 +0x78

Status changed to Started.

Contributor

@davecheney davecheney commented May 24, 2013

Comment 7:

I believe this was fixed by 2c128d417029, but I can't see why it would.
Member

@dvyukov dvyukov commented May 24, 2013

Comment 8:

I have no idea how it fixes something... but good it's fixed :)
Contributor

@davecheney davecheney commented May 25, 2013

Comment 9:

I've also confirmed that applying only 2c128d417029 to the release-branch.go1.1 branch
fixes the crash, so I think it is safe to say that as long as 2c128d417029/9557043 is
applied for Go 1.1.1 we should be OK.
@rogpeppe, the apiserver returns funcs which return funcs, doesn't it? That was the
original problem that issue #5493 addressed.

Labels changed: added priority-soon, go1.1.1, removed priority-triage.

Member

@dvyukov dvyukov commented May 28, 2013

Comment 10:

It still looks strange. I've tried to reproduce it, but failed. It either hangs or says:
$ go test -v -gocheck.vv
=== RUN TestAll
START: api_test.go:0: suite.SetUpSuite
PASS: api_test.go:0: suite.SetUpSuite   0.000s
START: api_test.go:828: suite.TestBadLogin
START: api_test.go:0: suite.SetUpTest
[LOG] 60.04749 INFO mongod: error command line: unknown option sslOnNormalPorts
[LOG] 60.04754 INFO mongod: use --help for help
$ mongod --version
db version v2.4.3
Tue May 28 10:35:48.334 git version: fe1743177a5ea03e91e0052fb5e2cb2945f6d95f
Contributor

@davecheney davecheney commented May 28, 2013

Comment 12:

I'm really sorry, our tests only work with a special version of MongoDB which is not
generally available. I have an environment where I can run the tests if you need me to
check things.
Member

@dvyukov dvyukov commented May 28, 2013

Comment 13:

"Remote" debugging is a very slow and unpleasant thing.
Is it possible to hack the code to work with stock mongod? Remove some flags, etc?
Contributor

@davecheney davecheney commented May 28, 2013

Comment 14:

http://juju-dist.s3.amazonaws.com/tools/mongo-2.2.0-quantal-amd64.tgz
If you are not using a quantal series installation, don't worry, it is compiled statically.
It would be wise to uninstall mongod-2.4.3, unless you like 200 Hz timers preventing your
machine from reaching a decent sleep state.
Member

@dvyukov dvyukov commented May 28, 2013

Comment 15:

Owner changed to @dvyukov.

Member

@dvyukov dvyukov commented May 28, 2013

Comment 16:

Mailed https://golang.org/cl/9831043
Member

@dvyukov dvyukov commented May 28, 2013

Comment 17:

This issue was closed by revision 2f5825d.

Status changed to Fixed.

Contributor

@davecheney davecheney commented May 29, 2013

Comment 18:

I can confirm e84e7204b01b has fixed the crash.
Contributor

@davecheney davecheney commented May 29, 2013

Comment 19:

This CL also fixed a crash I was experiencing under freebsd/arm which I had not
reported. Prior to this revision the freebsd builder process itself would crash with
malloc/free deadlocks or other GC-related "can't happen" failures. Since rebuilding at
tip the builder has run reliably for 12 hours.
Member

@dvyukov dvyukov commented May 29, 2013

Comment 20:

Great!
Contributor

@davecheney davecheney commented May 30, 2013

Comment 21:

Oops, I spoke too soon: see issue #5594. However, stability is much improved after 5554 was
closed, so 5594 is probably an unrelated crash.
Contributor

@adg adg commented Jun 5, 2013

Comment 22:

This issue was closed by revision b88e87d911e1.

@rogpeppe rogpeppe added fixed labels Jun 5, 2013
@rsc rsc added this to the Go1.1.1 milestone Apr 14, 2015
@rsc rsc removed the go1.1.1 label Apr 14, 2015
adg added a commit that referenced this issue May 11, 2015
««« CL 9831043 / e84e7204b01b
runtime: fix heap corruption during GC
The 'n' variable is used during rescan initiation in GC_END case,
but it's overwritten with chan capacity in GC_CHAN case.
As the result rescan is done with the wrong object size.
Fixes #5554.

R=golang-dev, khr
CC=golang-dev
https://golang.org/cl/9831043
»»»

R=dvyukov, khr, dave
CC=golang-dev
https://golang.org/cl/10028044
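
The class of bug described in the commit message can be shown with a toy model in Go. This is only a sketch: the real code is C in the runtime's mgc0.c, and every name below (`op`, `instr`, `walkBuggy`, `walkFixed`) is hypothetical. It assumes a loop that interprets per-object scan instructions, where `n` holds the object size, one case borrows `n` as scratch space for a channel capacity, and a later case reads `n` to size a rescan.

```go
package main

import "fmt"

// op is a toy instruction kind, loosely modelled on the GC_CHAN and GC_END
// cases named in the commit message. It is not the runtime's real encoding.
type op int

const (
	opChan op = iota // scan a channel buffer
	opEnd            // end of object: start a rescan using the object size
)

type instr struct {
	op      op
	chanCap int // only meaningful for opChan
}

// walkBuggy reuses n for the channel capacity, so the rescan that starts
// at opEnd sees the capacity instead of the object size.
func walkBuggy(objSize int, prog []instr) (rescanSize, slots int) {
	n := objSize
	for _, in := range prog {
		switch in.op {
		case opChan:
			n = in.chanCap // bug: clobbers the object size
			slots += n
		case opEnd:
			return n, slots
		}
	}
	return n, slots
}

// walkFixed keeps the capacity in its own variable, leaving n intact.
func walkFixed(objSize int, prog []instr) (rescanSize, slots int) {
	n := objSize
	for _, in := range prog {
		switch in.op {
		case opChan:
			chanCap := in.chanCap
			slots += chanCap
		case opEnd:
			return n, slots
		}
	}
	return n, slots
}

func main() {
	prog := []instr{{op: opChan, chanCap: 4}, {op: opEnd}}
	size, _ := walkBuggy(64, prog)
	fmt.Println("buggy rescan size:", size) // prints 4
	size, _ = walkFixed(64, prog)
	fmt.Println("fixed rescan size:", size) // prints 64
}
```

In the toy model the wrong size is merely returned. In a real collector, a rescan driven by the wrong object size would presumably walk the wrong extent of memory, which is consistent with the unmapped-address fault in scanblock reported above.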
@golang golang locked and limited conversation to collaborators Jun 24, 2016
This issue was closed.