possible GC-related panic #5554

Closed
rogpeppe opened this Issue May 24, 2013 · 21 comments

rogpeppe commented May 24, 2013

e570c2daeaca (release-branch.go1.1) go1.1/release

To reproduce (currently we haven't managed to narrow
it down at all, sorry):

cd $GOPATH/src
mkdir -p launchpad.net
cd launchpad.net
bzr branch lp:~rogpeppe/juju-core/dimitern-041-provisioner-api-calls-GC-bug juju-core

# use the Go-only version of goyaml to rule
# out possible unsafe interactions.
bzr branch lp:~niemeyer/goyaml/go-port goyaml

go get launchpad.net/juju-core/state/apiserver
while true; do
    go test launchpad.net/juju-core/state/apiserver
done
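The shell loop above can also be written as a small Go program that keeps running the tests and stops at the first run whose output contains the fault. This is a minimal sketch and not part of the original report. The program name `repro` and the helper `isGCFault` are hypothetical. Matching only on `unexpected fault address` is also an assumption, so other symptoms of heap corruption would not stop the loop.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// faultSignature is the first line of the crash reported in this issue.
const faultSignature = "unexpected fault address"

// isGCFault reports whether test output contains the crash signature,
// as opposed to one of the suite's expected, ordinary failures.
func isGCFault(output string) bool {
	return strings.Contains(output, faultSignature)
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: repro <import path>")
		return
	}
	pkg := os.Args[1]
	for run := 1; ; run++ {
		var out bytes.Buffer
		cmd := exec.Command("go", "test", pkg)
		cmd.Stdout = &out
		cmd.Stderr = &out
		if err := cmd.Run(); err != nil {
			// A non-zero exit is expected because the suite has known
			// failures; any other error (e.g. go not on PATH) is fatal.
			if _, ok := err.(*exec.ExitError); !ok {
				fmt.Fprintln(os.Stderr, err)
				os.Exit(1)
			}
		}
		if isGCFault(out.String()) {
			fmt.Printf("fault reproduced on run %d\n", run)
			os.Stdout.Write(out.Bytes())
			return
		}
	}
}
```

Run it as `go run repro.go launchpad.net/juju-core/state/apiserver`. Like the shell loop, it runs until the fault appears. Unlike the shell loop, it suppresses ordinary test failures and prints only the crashing run.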

There are expected test failures (this problem was encountered
during development). But every so often we see a panic like this,
which looks highly suspect.

unexpected fault address 0xc200b00000
fatal error: fault
[signal 0xb code=0x1 addr=0xc200b00000 pc=0x40e1a4]

goroutine 3181 [running]:
[fp=0xc200141608] runtime.throw(0xd02b37)
    /home/rog/go-release/src/pkg/runtime/panic.c:473 +0x67
[fp=0xc200141620] runtime.sigpanic()
    /home/rog/go-release/src/pkg/runtime/os_linux.c:239 +0xe7
[fp=0xc2001419c0] scanblock(0x7f2d2d5de000, 0x7f2d2d5dee88, 0x9a, 0xc200141900)
    /home/rog/go-release/src/pkg/runtime/mgc0.c:791 +0x534
[fp=0xc200141a10] markroot(0xc2000fe480, 0x1000000029)
    /home/rog/go-release/src/pkg/runtime/mgc0.c:1269 +0xab
[fp=0xc200141a88] runtime.parfordo(0xc2000fe480)
    /home/rog/go-release/src/pkg/runtime/parfor.c:105 +0x9b
[fp=0xc200141bb8] gc(0x7f2d2d5c575c)
    /home/rog/go-release/src/pkg/runtime/mgc0.c:2000 +0x29d
----- stack segment boundary -----
[fp=0x7f2d2d5c5770] runtime.gc(0xc200000000)
    /home/rog/go-release/src/pkg/runtime/mgc0.c:1927 +0x11b
[fp=0x7f2d2d5c57c8] runtime.mallocgc(0xc0, 0x100000000, 0xc200000001)
    /home/rog/go-release/src/pkg/runtime/zmalloc_linux_amd64.c:101 +0x1e4
[fp=0x7f2d2d5c5800] makeslice1(0x7de1e0, 0x0, 0x8, 0x7f2d2d5c5848)
    /home/rog/go-release/src/pkg/runtime/slice.c:63 +0xb6
[fp=0x7f2d2d5c5830] runtime.makeslice(0x7de1e0, 0x0, 0x8, 0x7f2d2d5c5800, 0x0, ...)
    /home/rog/go-release/src/pkg/runtime/slice.c:34 +0x9a
[fp=0x7f2d2d5c5908] labix.org/v2/mgo/bson.(*decoder).readSliceDoc(0xc2008f6c30,
0xc20012d000, 0x7dcfe0, 0xc20012d000, 0x7ede20, ...)
    /home/rog/src/go-alt/src/labix.org/v2/mgo/bson/decode.go:282 +0x3e
[fp=0x7f2d2d5c5c10] labix.org/v2/mgo/bson.(*decoder).readElemTo(0xc2008f6c30, 0x7ede20,
0xc2002603d0, 0x146, 0xc200260304, ...)
    /home/rog/src/go-alt/src/labix.org/v2/mgo/bson/decode.go:413 +0x24e6
[fp=0x7f2d2d5c5ca8] labix.org/v2/mgo/bson.func·001(0xc2008f6c04, 0xc2007822d0, 0x1)
    /home/rog/src/go-alt/src/labix.org/v2/mgo/bson/decode.go:329 +0x104
[fp=0x7f2d2d5c5ce8] labix.org/v2/mgo/bson.(*decoder).readDocWith(0xc2008f6c30,
0x7f2d2d5c5d40)
    /home/rog/src/go-alt/src/labix.org/v2/mgo/bson/decode.go:350 +0x12f
[fp=0x7f2d2d5c5d78] labix.org/v2/mgo/bson.(*decoder).readDocElems(0xc2008f6c30,
0xc20012d000, 0x8b5920, 0x8d2e00, 0x8b5901, ...)
    /home/rog/src/go-alt/src/labix.org/v2/mgo/bson/decode.go:332 +0xe9
[fp=0x7f2d2d5c5f98] labix.org/v2/mgo/bson.(*decoder).readDocTo(0xc2008f6c30, 0x8b5920,
0xc200260320, 0x176)
    /home/rog/src/go-alt/src/labix.org/v2/mgo/bson/decode.go:190 +0x105a
----- stack segment boundary -----
[fp=0x7f2d2d6cf420] labix.org/v2/mgo/bson.(*decoder).readElemTo(0xc2008f6c30, 0x7ede20,
0xc200260310, 0x146, 0xc200260303, ...)
    /home/rog/src/go-alt/src/labix.org/v2/mgo/bson/decode.go:381 +0xda
[fp=0x7f2d2d6cf4b8] labix.org/v2/mgo/bson.func·001(0xc2008f6c03, 0xc2007822c0, 0x5)
    /home/rog/src/go-alt/src/labix.org/v2/mgo/bson/decode.go:329 +0x104
[fp=0x7f2d2d6cf4f8] labix.org/v2/mgo/bson.(*decoder).readDocWith(0xc2008f6c30,
0x7f2d2d6cf550)
    /home/rog/src/go-alt/src/labix.org/v2/mgo/bson/decode.go:350 +0x12f
[fp=0x7f2d2d6cf588] labix.org/v2/mgo/bson.(*decoder).readDocElems(0xc2008f6c30,
0xc20012d000, 0x8b5920, 0x8d2e00, 0x101, ...)
    /home/rog/src/go-alt/src/labix.org/v2/mgo/bson/decode.go:332 +0xe9
[fp=0x7f2d2d6cf7a8] labix.org/v2/mgo/bson.(*decoder).readDocTo(0xc2008f6c30, 0x8b5920,
0xc20026d500, 0x176)
    /home/rog/src/go-alt/src/labix.org/v2/mgo/bson/decode.go:190 +0x105a
[fp=0x7f2d2d6cf838] labix.org/v2/mgo/bson.Unmarshal(0xc200305e60, 0x4a, 0x4a, 0x8a50c0,
0xc20026d500, ...)
    /home/rog/src/go-alt/src/labix.org/v2/mgo/bson/bson.go:459 +0x157
[fp=0x7f2d2d6cfa58] labix.org/v2/mgo.(*Iter).Next(0xc200298a50, 0x8a50c0, 0xc20026d500,
0x8)
    /home/rog/src/go-alt/src/labix.org/v2/mgo/session.go:2311 +0x5b2
[fp=0x7f2d2d6cff18] launchpad.net/juju-core/state/watcher.(*Watcher).sync(0xc200424840,
0xc200910000, 0x7f2d2d6cff70)
    /home/rog/src/go-alt/src/launchpad.net/juju-core/state/watcher/watcher.go:364 +0x17e
[fp=0x7f2d2d6cff90] launchpad.net/juju-core/state/watcher.(*Watcher).loop(0xc200424840,
0x943d40, 0xc20079bf28)
    /home/rog/src/go-alt/src/launchpad.net/juju-core/state/watcher/watcher.go:231 +0x16b
[fp=0x7f2d2d6cffb8] launchpad.net/juju-core/state/watcher.func·001()
    /home/rog/src/go-alt/src/launchpad.net/juju-core/state/watcher/watcher.go:119 +0x2c
[fp=0x7f2d2d6cffc0] runtime.goexit()
    /home/rog/go-release/src/pkg/runtime/proc.c:1223
created by launchpad.net/juju-core/state/watcher.New
    /home/rog/src/go-alt/src/launchpad.net/juju-core/state/watcher/watcher.go:121 +0x100

rogpeppe commented May 24, 2013

Comment 1:

I can't reproduce this against tip (d6e06d0f3c29), FWIW.

minux commented May 24, 2013

Comment 2:

Have you bisected yet?
I'd suspect that some of the fixes for issues labeled Go1.1.1 fix this issue.

minux commented May 24, 2013

Comment 3:

Have you bisected yet?
I'd suspect that some of the fixes for issues labeled Go1.1.1 fix this issue.

rogpeppe commented May 24, 2013

Comment 4:

No, and I really haven't got enough time to bisect currently - I've got a hundred things
to do before I go away on holiday next week.
I hadn't realised there was a prospective go1.1.1. That makes me happy.
I also suspect that the issue has been fixed.

davecheney commented May 24, 2013

Comment 5:

I'll try to bisect, as I have the setup to run the juju tests.

Owner changed to @davecheney.

davecheney commented May 24, 2013

Comment 6:

I can reproduce the problem (almost exactly the same panic message) at the original
revision
unexpected fault address 0xc200b00000  
fatal error: fault
[signal 0xb code=0x1 addr=0xc200b00000 pc=0x40e1a4]
goroutine 3168 [running]:
[fp=0xc2008d9608] runtime.throw(0xd05b37)
        /home/dfc/go/src/pkg/runtime/panic.c:473 +0x67
[fp=0xc2008d9620] runtime.sigpanic()   
        /home/dfc/go/src/pkg/runtime/os_linux.c:239 +0xe7
[fp=0xc2008d99c0] scanblock(0x7f6031625000, 0x7f6031626488, 0xda, 0xc2008d9900)
        /home/dfc/go/src/pkg/runtime/mgc0.c:791 +0x534
[fp=0xc2008d9a10] markroot(0xc20011c000, 0x1000000028)
        /home/dfc/go/src/pkg/runtime/mgc0.c:1269 +0xab
[fp=0xc2008d9a88] runtime.parfordo(0xc20011c000)
        /home/dfc/go/src/pkg/runtime/parfor.c:105 +0x9b
[fp=0xc2008d9bb8] gc(0x7f603160520c)   
        /home/dfc/go/src/pkg/runtime/mgc0.c:2000 +0x29d
----- stack segment boundary -----
[fp=0x7f6031605220] runtime.gc(0xc200000000)
        /home/dfc/go/src/pkg/runtime/mgc0.c:1927 +0x11b
[fp=0x7f6031605278] runtime.mallocgc(0x480, 0x100000000, 0x0)
        /home/dfc/go/src/pkg/runtime/zmalloc_linux_amd64.c:101 +0x1e4
[fp=0x7f60316052b0] hash_grow(0x7e5d20, 0xc2008cf7c0)
        /home/dfc/go/src/pkg/runtime/hashmap.c:-203 +0x78

Status changed to Started.

davecheney commented May 24, 2013

Comment 7:

I believe this was fixed by 2c128d417029, but I can't see why that change would fix it.

dvyukov commented May 24, 2013

Comment 8:

I have no idea how it fixes anything... but it's good that it's fixed :)

davecheney commented May 25, 2013

Comment 9:

I've also confirmed that applying only 2c128d417029 to the release-branch.go1.1 branch
fixes the crash, so I think it is safe to say that as long as 2c128d417029/9557043 is
applied for Go 1.1.1, we should be OK.
@rogpeppe, the apiserver returns funcs which return funcs, doesn't it? That was the
original issue, #5493, that this change addressed.

Labels changed: added priority-soon, go1.1.1, removed priority-triage.
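For readers unfamiliar with the shape davecheney asks about, here is a minimal Go illustration of a func that returns a func which itself returns a func. The names are hypothetical. The code is not a reproducer for #5493 or for this issue; it only shows the closure pattern in question.

```go
package main

import "fmt"

// makeMethod is a hypothetical example: it returns a func that, when
// called, returns another func. Each closure captures variables from the
// enclosing one, so the innermost func keeps both prefix and name alive.
func makeMethod(prefix string) func(name string) func() string {
	return func(name string) func() string {
		return func() string {
			return prefix + "." + name
		}
	}
}

func main() {
	lookup := makeMethod("Client")
	call := lookup("Login")
	fmt.Println(call()) // Client.Login
}
```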

dvyukov commented May 28, 2013

Comment 10:

It still looks strange. I've tried to reproduce it, but failed. It either hangs or says:
$ go test -v -gocheck.vv
=== RUN TestAll
START: api_test.go:0: suite.SetUpSuite
PASS: api_test.go:0: suite.SetUpSuite   0.000s
START: api_test.go:828: suite.TestBadLogin
START: api_test.go:0: suite.SetUpTest
[LOG] 60.04749 INFO mongod: error command line: unknown option sslOnNormalPorts
[LOG] 60.04754 INFO mongod: use --help for help
$ mongod --version
db version v2.4.3
Tue May 28 10:35:48.334 git version: fe1743177a5ea03e91e0052fb5e2cb2945f6d95f

davecheney commented May 28, 2013

Comment 12:

I'm really sorry; our tests only work with a special version of MongoDB, which is not
generally available. I have an environment where I can run the tests if you need me to check anything.

dvyukov commented May 28, 2013

Comment 13:

"Remote" debugging is a very slow and unpleasant thing.
Is it possible to hack the code to work with stock mongod, e.g. by removing some flags?

davecheney commented May 28, 2013

Comment 14:

http://juju-dist.s3.amazonaws.com/tools/mongo-2.2.0-quantal-amd64.tgz. If you are not
using a quantal series installation, don't worry: it is compiled statically.
It would be wise to uninstall mongod 2.4.3, unless you like 200 Hz timers preventing your
machine from reaching a decent sleep state.

dvyukov commented May 28, 2013

Comment 15:

Owner changed to @dvyukov.

dvyukov commented May 28, 2013

Comment 16:

Mailed https://golang.org/cl/9831043

dvyukov commented May 28, 2013

Comment 17:

This issue was closed by revision 2f5825d.

Status changed to Fixed.

davecheney commented May 29, 2013

Comment 18:

I can confirm e84e7204b01b has fixed the crash.

davecheney commented May 29, 2013

Comment 19:

This CL also fixed a crash I was experiencing on freebsd/arm, which I had not
reported. Prior to this revision, the freebsd builder process itself would crash with
malloc/free deadlocks or other GC-related "can't happen" failures. Since rebuilding at
tip, the builder has run reliably for 12 hours.

dvyukov commented May 29, 2013

Comment 20:

Great!

davecheney commented May 30, 2013

Comment 21:

Oops, I spoke too soon: see issue #5594. However, stability is much improved since #5554 was
closed, so #5594 is probably an unrelated crash.

adg commented Jun 5, 2013

Comment 22:

This issue was closed by revision b88e87d911e1.

rogpeppe added fixed labels Jun 5, 2013

rsc added this to the Go1.1.1 milestone Apr 14, 2015

rsc removed the go1.1.1 label Apr 14, 2015

adg added a commit that referenced this issue May 11, 2015

[release-branch.go1.1] runtime: fix heap corruption during GC
««« CL 9831043 / e84e7204b01b
runtime: fix heap corruption during GC
The 'n' variable is used during rescan initiation in GC_END case,
but it's overwritten with chan capacity in GC_CHAN case.
As the result rescan is done with the wrong object size.
Fixes #5554.

R=golang-dev, khr
CC=golang-dev
https://golang.org/cl/9831043
»»»

R=dvyukov, khr, dave
CC=golang-dev
https://golang.org/cl/10028044
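The commit message describes a variable-reuse bug. One variable, `n`, is meant to hold the object size needed when the `GC_END` case starts a rescan, but the `GC_CHAN` case overwrites it with the channel capacity first. The Go program below is a hypothetical, heavily simplified model of that pattern, not the runtime's code (which was C in `mgc0.c`). The instruction names `opObject`, `opChan` and `opEnd` are invented stand-ins.

```go
package main

import "fmt"

// Hypothetical, simplified model of a GC "program" interpreter; only the
// variable-reuse pattern from the commit message is modelled here.
type opKind int

const (
	opObject opKind = iota // start of an object; arg is its size in bytes
	opChan                 // a channel field; arg is the channel capacity
	opEnd                  // end of object; schedule a rescan of the object
)

type instr struct {
	kind opKind
	arg  int
}

// rescanSizesBuggy reuses one variable, n, for two unrelated quantities,
// so an opChan between opObject and opEnd clobbers the object size.
func rescanSizesBuggy(prog []instr) []int {
	var sizes []int
	var n int
	for _, in := range prog {
		switch in.kind {
		case opObject:
			n = in.arg
		case opChan:
			n = in.arg // overwrites the object size with the chan capacity
		case opEnd:
			sizes = append(sizes, n)
		}
	}
	return sizes
}

// rescanSizesFixed keeps the object size in its own variable, so the
// channel capacity can no longer leak into the rescan.
func rescanSizesFixed(prog []instr) []int {
	var sizes []int
	var objSize int
	for _, in := range prog {
		switch in.kind {
		case opObject:
			objSize = in.arg
		case opChan:
			// The capacity (in.arg) would drive scanning of the channel
			// buffer; it is deliberately not stored in objSize.
		case opEnd:
			sizes = append(sizes, objSize)
		}
	}
	return sizes
}

func main() {
	// A 64-byte object containing a channel with capacity 8.
	prog := []instr{{opObject, 64}, {opChan, 8}, {opEnd, 0}}
	fmt.Println("buggy:", rescanSizesBuggy(prog)) // buggy: [8]
	fmt.Println("fixed:", rescanSizesFixed(prog)) // fixed: [64]
}
```

Giving the object size a variable that no other case writes mirrors the shape of the fix the commit message describes: the rescan then uses the object's own size rather than whatever the last channel reported.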

gopherbot locked and limited conversation to collaborators Jun 24, 2016

This issue was closed.
