Skip to content

runtime: PreciseGC causing app to crash during longevity tests #5443

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
ugorji opened this issue May 11, 2013 · 19 comments
Closed

runtime: PreciseGC causing app to crash during longevity tests #5443

ugorji opened this issue May 11, 2013 · 19 comments

Comments

@ugorji
Copy link
Contributor

ugorji commented May 11, 2013

My longevity test crashes after handling a few thousand requests. By turning off
PreciseGC, the crashes stopped.

I have run my application under different linux kernels
- 3.5.0-26-generic
- 3.5.0-28-generic

I have used different versions of go1.1:
- 1.1rc1
- some older randomly picked version
- tip

I didn't notice this issue before because I ran my app with ab -n 2000 -c 16
http://localhost:8001/...

However, once I increased it to let it run for hours (e.g. set n to 2000000), it starts
crashing after handling a few thousand requests e.g. after 4957, 16688, 1721, 7820,
8406, 13981 requests. 

I've looked through my code over the last 2 days to see if there is something I'm doing,
and have seen nothing to suggest my code may be the culprit. As expected, I make
judicious use of sync.Mutex, sync.Cond and channels in my codebase, and I don't copy
sync.Mutex or sync.Cond anywhere.

However, when I switched turned off PreciseGC (by setting IgnorePreciseGC=1 in mgc0.c
and rebuilding), the crashes stopped, and my load test continued forever (till my HDD /
partition filled up - I just posted a message on the golang-devs about that). 

I use ulimit to prevent an issue in my code from locking up my whole machine (it
happened many times during development). So I run my app using sh run.sh, where run.sh
contains:
  ulimit -v 2000000 && ulimit -c unlimited && GOTRACEBACK=crash GOMAXPROCS=8 ./myapp

When I use gdb to look at the crash, I get that the crash typically happens around the
same point, but that point typically changes each time I reboot my machine. The points I
have found are:

---------------------------------------
(gdb) bt
#0  runtime.raise (sig=void) at /opt/go-tip/src/pkg/runtime/sys_linux_amd64.s:85
#1  0x000000000041aa1f in runtime.crash ()
#2  0x0000000000413bd7 in runtime.dopanic (unused=void) at
/opt/go-tip/src/pkg/runtime/panic.c:434
#3  0x0000000000413d17 in runtime.throw (s=void) at
/opt/go-tip/src/pkg/runtime/panic.c:472
#4  0x0000000000412cc7 in runtime.sigpanic () at
/opt/go-tip/src/pkg/runtime/os_linux.c:239
#5  0x000000000041b2eb in runtime.copy (to=void, fm=void, width=void, ret=void)
    at /opt/go-tip/src/pkg/runtime/slice.c:257
#6  0x000000000059cdcc in bufio.(*Reader).Read (b=0xc20021f720, p=..., n=1, err=...)
    at /opt/go-tip/src/pkg/bufio/bufio.go:156
#7  0x000000000043f327 in io.ReadAtLeast (r=..., buf=..., min=1, n=0, err=...) at
/opt/go-tip/src/pkg/io/io.go:284
#8  0x000000000043f47f in io.ReadFull (r=..., buf=..., n=4305628, err=...) at
/opt/go-tip/src/pkg/io/io.go:302
#9  0x000000000050fbcd in ugorji.net/ndb.(*client).readD (c=0xc200219870, numToRead=1,
expecting=0 '\000', bs=..., 
    cat=...) at /home/ugorji/depot/repo/src/ugorji.net/ndb/client.go:141
#10 0x000000000050ff6d in ugorji.net/ndb.(*client).readTag (c=0xc200219870, v=0 '\000')
    at /home/ugorji/depot/repo/src/ugorji.net/ndb/client.go:156
#11 0x0000000000511b52 in ugorji.net/ndb.(*Db).SvrQuery (l=0xc2001b0240, ctxId=...,
pkey=0xc2005e6a10, kind=..., 
    opts=0x0, filters=..., qs=..., err=void) at /home/ugorji/depot/repo/src/ugorji.net/ndb/client.go:421
#12 0x00000000005222cf in ugorji.net/ndb.func·001 (k3=..., cursor3=..., err3=...)
    at /home/ugorji/depot/repo/src/ugorji.net/ndb/low_level_driver.go:195
#13 0x00000000004232e0 in ?? () at /opt/go-tip/src/pkg/runtime/asm_amd64.s:278
#14 0x0000000000000000 in ?? ()
(gdb) q

------------------------------------
(gdb) bt
#0  runtime.raise (sig=void) at /opt/go-tip/src/pkg/runtime/sys_linux_amd64.s:85
#1  0x000000000041aa1f in runtime.crash ()
#2  0x0000000000413bd7 in runtime.dopanic (unused=void) at
/opt/go-tip/src/pkg/runtime/panic.c:434
#3  0x0000000000413d17 in runtime.throw (s=void) at
/opt/go-tip/src/pkg/runtime/panic.c:472
#4  0x0000000000412cc7 in runtime.sigpanic () at
/opt/go-tip/src/pkg/runtime/os_linux.c:239
#5  0x0000000000423c75 in runtime.memmove (to=void, fr=void, n=void)
    at /opt/go-tip/src/pkg/runtime/memmove_amd64.s:48
#6  0x000000000041b2dc in runtime.copy (to=void, fm=void, width=void, ret=void)
    at /opt/go-tip/src/pkg/runtime/slice.c:259
#7  0x000000000059ce3c in bufio.(*Reader).Read (b=0xc2003044e0, p=..., n=2, err=...)
    at /opt/go-tip/src/pkg/bufio/bufio.go:156
#8  0x000000000043f357 in io.ReadAtLeast (r=..., buf=..., min=2, n=0, err=...) at
/opt/go-tip/src/pkg/io/io.go:284
#9  0x000000000043f4af in io.ReadFull (r=..., buf=..., n=833226801153, err=...) at
/opt/go-tip/src/pkg/io/io.go:302
#10 0x000000000050fbfd in ugorji.net/ndb.(*client).readD (c=0xc20035bd20, numToRead=2,
expecting=0 '\000', bs=..., 
    cat=...) at /home/ugorji/depot/repo/src/ugorji.net/ndb/client.go:141
#11 0x000000000051009f in ugorji.net/ndb.(*client).readNumNoTag (c=0xc20035bd20, v=0)
    at /home/ugorji/depot/repo/src/ugorji.net/ndb/client.go:184
#12 0x00000000005106d0 in ugorji.net/ndb.(*client).readSliceNoTag (c=0xc20035bd20, v=...)
    at /home/ugorji/depot/repo/src/ugorji.net/ndb/client.go:226
#13 0x000000000051216b in ugorji.net/ndb.(*Db).SvrQuery (l=0xc2001ae240, ctxId=...,
pkey=0xc2006e2c10, kind=..., 
    opts=0x0, filters=..., qs=..., err=void) at /home/ugorji/depot/repo/src/ugorji.net/ndb/client.go:439
#14 0x00000000005222ff in ugorji.net/ndb.func·001 (k3=..., cursor3=..., err3=...)
    at /home/ugorji/depot/repo/src/ugorji.net/ndb/low_level_driver.go:195
#15 0x0000000000423310 in ?? () at /opt/go-tip/src/pkg/runtime/asm_amd64.s:278
#16 0x0000000000000000 in ?? ()

------------------------------------

(gdb) bt
#0  runtime.raise (sig=void) at /opt/go-tip/src/pkg/runtime/sys_linux_amd64.s:85
#1  0x000000000041aa1f in runtime.crash ()
#2  0x0000000000413bd7 in runtime.dopanic (unused=void) at
/opt/go-tip/src/pkg/runtime/panic.c:434
#3  0x0000000000413d17 in runtime.throw (s=void) at
/opt/go-tip/src/pkg/runtime/panic.c:472
#4  0x0000000000412cc7 in runtime.sigpanic () at
/opt/go-tip/src/pkg/runtime/os_linux.c:239
#5  0x00000000007e2881 in string.* ()
#6  0x000000000059c8cc in bufio.(*Reader).fill (b=0xc2002d7e40) at
/opt/go-tip/src/pkg/bufio/bufio.go:79
#7  0x000000000059ccd0 in bufio.(*Reader).Read (b=0xc2002d7e40, p=..., n=1, err=...)
    at /opt/go-tip/src/pkg/bufio/bufio.go:147
#8  0x000000000043f327 in io.ReadAtLeast (r=..., buf=..., min=1, n=0, err=...) at
/opt/go-tip/src/pkg/io/io.go:284
#9  0x000000000043f47f in io.ReadFull (r=..., buf=..., n=4305628, err=...) at
/opt/go-tip/src/pkg/io/io.go:302
#10 0x000000000050fbcd in ugorji.net/ndb.(*client).readD (c=0xc200254640, numToRead=1,
expecting=0 '\000', bs=..., 
    cat=...) at /home/ugorji/depot/repo/src/ugorji.net/ndb/client.go:141
#11 0x000000000050ff6d in ugorji.net/ndb.(*client).readTag (c=0xc200254640, v=0 '\000')
    at /home/ugorji/depot/repo/src/ugorji.net/ndb/client.go:156
#12 0x0000000000511b52 in ugorji.net/ndb.(*Db).SvrQuery (l=0xc2001af240, ctxId=...,
pkey=0xc20055b830, kind=..., 
    opts=0x0, filters=..., qs=..., err=void) at /home/ugorji/depot/repo/src/ugorji.net/ndb/client.go:421
#13 0x00000000005222cf in ugorji.net/ndb.func·001 (k3=..., cursor3=..., err3=...)
    at /home/ugorji/depot/repo/src/ugorji.net/ndb/low_level_driver.go:195
#14 0x00000000004232e0 in ?? () at /opt/go-tip/src/pkg/runtime/asm_amd64.s:278
#15 0x0000000000000000 in ?? ()
(gdb) 

-------------------------------------

(gdb) bt
#0  runtime.raise (sig=void) at /opt/go-tip/src/pkg/runtime/sys_linux_amd64.s:85
#1  0x000000000041aa1f in runtime.crash ()
#2  0x0000000000413bd7 in runtime.dopanic (unused=void) at
/opt/go-tip/src/pkg/runtime/panic.c:434
#3  0x0000000000413d17 in runtime.throw (s=void) at
/opt/go-tip/src/pkg/runtime/panic.c:472
#4  0x0000000000412cc7 in runtime.sigpanic () at
/opt/go-tip/src/pkg/runtime/os_linux.c:239
#5  0x00000000004232e0 in ?? () at /opt/go-tip/src/pkg/runtime/asm_amd64.s:278
#6  0x00007f704c166100 in ?? ()
#7  0x00007f704c166fb8 in ?? ()
#8  0x00007f704c1660f8 in ?? ()
#9  0x000000000040b4c7 in runtime.MCentral_AllocList (c=void, n=void, pfirst=void)
    at /opt/go-tip/src/pkg/runtime/mcentral.c:62
#10 0x00007f705e3a7000 in ?? ()
#11 0x00007f705e3f7000 in ?? ()
#12 0x00007f704c1db000 in ?? ()
#13 0x00007f704c1e3000 in ?? ()
#14 0x00007f704c192000 in ?? ()
#15 0x00007f704c173000 in ?? ()
#16 0x00007f704c1c9000 in ?? ()
#17 0x00007f704c0b6000 in ?? ()
#18 0x00007f704c0d2000 in ?? ()
#19 0x0000000000000006 in ?? ()
#20 0x0000000000000280 in ?? ()
#21 0x000000000041ae99 in runtime.appendstr (t=void, x=void, y=void, ret=void)
    at /opt/go-tip/src/pkg/runtime/slice.c:147
#22 0x2010000000000076 in ?? ()
#23 0x0000000000000001 in ?? ()
#24 0x0000000000000003 in ?? ()
#25 0x0000000000410503 in runtime.markallocated (v=void, n=void, noptr=void)
    at /opt/go-tip/src/pkg/runtime/mgc0.c:2213
#26 0x000000000041e103 in runtime.mallocgc (size=void, flag=void, dogc=void, zeroed=void)
    at /opt/go-tip/src/pkg/runtime/malloc.goc:88
#27 0x000000000041e103 in runtime.mallocgc (size=void, flag=void, dogc=void, zeroed=void)
    at /opt/go-tip/src/pkg/runtime/malloc.goc:88
#28 0x0000000000421d5b in gostringsize (ret=void, l=void) at
/opt/go-tip/src/pkg/runtime/string.goc:48
#29 0x0000000000422411 in runtime.slicebytetostring (b=void, s=void) at
/opt/go-tip/src/pkg/runtime/string.goc:274
#30 0x000000000046533e in strconv.quoteWith (s=640, quote=43 '+', ASCIIonly=false,
~anon3=...)
    at /opt/go-tip/src/pkg/strconv/quote.go:82
#31 0x00000000004660c9 in strconv.Quote (s=..., ~anon1=...) at
/opt/go-tip/src/pkg/strconv/quote.go:91
#32 0x000000000042a6ca in fmt.(*fmt).fmt_q (f=0x2010000000017902, s=...) at
/opt/go-tip/src/pkg/fmt/format.go:325
---Type <return> to continue, or q <return> to quit---
#33 0x000000c2004de468 in ?? ()
#34 0x2010000000017902 in ?? ()
#35 0x000000000000000a in ?? ()
#36 0x0000000000000000 in ?? ()
(gdb) 

-----------------------------

(gdb) bt
#0  runtime.raise (sig=void) at /opt/go-tip/src/pkg/runtime/sys_linux_amd64.s:85
#1  0x000000000041aa1f in runtime.crash ()
#2  0x0000000000413bd7 in runtime.dopanic (unused=void) at
/opt/go-tip/src/pkg/runtime/panic.c:434
#3  0x0000000000413d17 in runtime.throw (s=void) at
/opt/go-tip/src/pkg/runtime/panic.c:472
#4  0x0000000000412ca7 in runtime.sigpanic () at
/opt/go-tip/src/pkg/runtime/os_linux.c:239
#5  0x000000000040a8e7 in runtime.efacethash (e1=void, ret=void) at
/opt/go-tip/src/pkg/runtime/iface.c:670
#6  0x000000000042f042 in fmt.(*pp).printField (p=0xc200130d00, field=..., verb=118,
plus=false, goSyntax=false, 
    depth=0, wasString=false) at /opt/go-tip/src/pkg/fmt/print.go:765
#7  0x00000000004348c6 in fmt.(*pp).doPrintf (p=..., format=..., a=...) at
/opt/go-tip/src/pkg/fmt/print.go:1111
#8  0x000000000042bfa5 in fmt.Sprintf (format=..., a=..., ~anon2=...) at
/opt/go-tip/src/pkg/fmt/print.go:229
#9  0x0000000000534d4a in ugorji.net/logging.logR (calldepth=2 '\002', ctx=...,
level=1000, message=..., 
    params=..., err=void) at /home/ugorji/depot/repo/src/ugorji.net/logging/logging.go:264
#10 0x0000000000535053 in ugorji.net/logging.Trace (ctx=..., message=..., params=...,
~anon3=...)
    at /home/ugorji/depot/repo/src/ugorji.net/logging/logging.go:284
#11 0x0000000000511b9a in ugorji.net/ndb.(*Db).SvrQuery (l=0xc2001b0240, ctxId=...,
pkey=0xc200968630, kind=..., 
    opts=0x0, filters=..., qs=..., err=void) at /home/ugorji/depot/repo/src/ugorji.net/ndb/client.go:422
#12 0x000000000052235f in ugorji.net/ndb.func·001 (k3=..., cursor3=..., err3=...)
    at /home/ugorji/depot/repo/src/ugorji.net/ndb/low_level_driver.go:195
#13 0x00000000004232e0 in ?? () at /opt/go-tip/src/pkg/runtime/asm_amd64.s:278
#14 0x0000000000000000 in ?? ()

------------------------------------

Which compiler are you using (5g, 6g, 8g, gccgo)?
6g

Which operating system are you using?
Linux 3.5.0-28-generic #42-Ubuntu SMP Fri Mar 8 23:18:20 UTC 2013 x86_64 x86_64 x86_64
GNU/Linux


Which version are you using?  (run 'go version')
go version devel +30c566874b83 Wed May 08 16:06:25 2013 -0700 linux/amd64

Please provide any additional information below.
@bradfitz
Copy link
Contributor

Comment 1:

Do you have code you can share?
Or can you reduce your crash down to a more minimal repro that you can then share?

Status changed to WaitingForReply.

@ugorji
Copy link
Contributor Author

ugorji commented May 12, 2013

Comment 2:

I have been trying for 2 days, but can't narrow it down. The crash point changes like
every day, so it's hard to pin-point, and the crash doesn't come from a specific place
in my code.
The only constant is that the crash disappears once I disabled PreciseGC.
If it will help, I can build and privately share my the application binaries (both the
app and the datastore), so you can reproduce in-house.

@ugorji
Copy link
Contributor Author

ugorji commented May 12, 2013

Comment 3:

I can reproduce this now even without concurrent access.
I added these lines to my code (paraphrased as):
    func logR(calldepth uint8, ctx interface{}, level Level, message string, params ...interface{},
    ) (err error) {
    //snip ...
        fmt.Printf("====> Params: len: %d\n", len(params))
        fmt.Printf("====> %#v\n", params)
        r.Message = fmt.Sprintf(message, params...)
    //snip ...
    }
#### I attached logging.go in case that will help.
In my code, I print out the arguments to Sprintf. Since it's always the same request,
the same sequence of lines should be printed. I then request a page in a loop. T
Notice that, at the time the crash happens, we should have been printing out 
[]interface {}{interface {}(nil)}
But at a certain point, accessing that within fmt.Sprintf crashes with a unexpected
fault address 0x0.
Like I mentioned earlier, this is not limited to fmt.Sprintf. The crash point has
changed a number of 
times, but has been constant at this point the last few days.
I've tried to trigger this standalone (ie code that loops forever and does this sprintf
of a []interface{}{nil}... and calls runtime.GC() each time, but can't trigger it
standalone. This last time during my longevity test, it triggered it with no concurrent
requests after 47275 requests.
====> []interface {}{"/p/core/topic/1688849860329472"}
====> Params: len: 1
====> []interface {}{"topic"}
====> Params: len: 4
====> []interface {}{"TP", "", 1688849860329472, interface {}(nil)}
====> Params: len: 4
====> []interface {}{1688849860329472, 0x6, 0x1, 1688849860329472}
====> Params: len: 2
====> []interface {}{(*ndb.Key)(0xc2003bdb90), []uint8{0x20, 0x6, 0x0, 0x0, 0x0, 0x1,
0x8e, 0x2}}
====> Params: len: 2
====> []interface {}{"TC", ""}
====> Params: len: 2
====> []interface {}{"TC", ""}
====> Params: len: 0
====> []interface {}(nil)
====> Params: len: 4
====> []interface {}{(*ndb.Key)(0xc2003bdb90), "TC", (*app.QueryOpts)(nil),
[]*app.QueryFilter(nil)}
====> Params: len: 1
====> []interface {}{[]uint8{0x20, 0x6, 0x0, 0x0, 0x0, 0x1, 0x8e, 0x2}}
====> Params: len: 2
====> []interface {}{1, []uint8{0x20, 0x6, 0x0, 0x0, 0x0, 0x1, 0x8e, 0x2}}
====> Params: len: 2
====> []interface {}{8, []uint8{0x6, 0x0, 0x8}}
====> Params: len: 2
====> []interface {}{0, []uint8{0x6, 0x0, 0x0}}
====> Params: len: 1
====> []interface {}{interface {}(nil)}
<snip>
====> []interface {}{"/p/core/topic/1688849860329472"}
====> Params: len: 1
====> []interface {}{"topic"}
====> Params: len: 4
====> []interface {}{"TP", "", 1688849860329472, interface {}(nil)}
====> Params: len: 4
====> []interface {}{1688849860329472, 0x6, 0x1, 1688849860329472}
====> Params: len: 2
====> []interface {}{(*ndb.Key)(0xc200585190), []uint8{0x20, 0x6, 0x0, 0x0, 0x0, 0x1,
0x8e, 0x2}}
====> Params: len: 2
====> []interface {}{"TC", ""}
====> Params: len: 2
====> []interface {}{"TC", ""}
====> Params: len: 0
====> []interface {}(nil)
====> Params: len: 4
====> []interface {}{(*ndb.Key)(0xc200585190), "TC", (*app.QueryOpts)(nil),
[]*app.QueryFilter(nil)}
====> Params: len: 1
====> []interface {}{[]uint8{0x20, 0x6, 0x0, 0x0, 0x0, 0x1, 0x8e, 0x2}}
====> Params: len: 2
====> []interface {}{1, []uint8{0x20, 0x6, 0x0, 0x0, 0x0, 0x1, 0x8e, 0x2}}
====> Params: len: 2
====> []interface {}{8, []uint8{0x6, 0x0, 0x8}}
====> Params: len: 2
====> []interface {}{0, []uint8{0x6, 0x0, 0x0}}
====> Params: len: 1
unexpected fault address 0x0
fatal error: fault
[signal 0xb code=0x80 addr=0x0 pc=0x409682]
goroutine 47300 [running]:
[fp=0x7f6e2f967328] runtime.throw(0x9c5297)
    /opt/go-tip/src/pkg/runtime/panic.c:473 +0x67
[fp=0x7f6e2f967340] runtime.sigpanic()
    /opt/go-tip/src/pkg/runtime/os_linux.c:239 +0xe7
[fp=0x7f6e2f9673d8] itab(0x6fe1a0, 0xdeaddeaddeaddead, 0x1)
    /opt/go-tip/src/pkg/runtime/iface.c:46 +0x62
[fp=0x7f6e2f9673f8] runtime.assertE2I2(0x6fe1a0, 0xdeaddeaddeaddead, 0xdeaddeaddeaddead,
0x0, 0x0, ...)
    /opt/go-tip/src/pkg/runtime/iface.c:507 +0x5c
[fp=0x7f6e2f967480] fmt.(*pp).handleMethods(0xc2003570d0, 0x10000000076, 0x1,
0xdeaddeaddead0000)
    /opt/go-tip/src/pkg/fmt/print.go:673 +0x69
[fp=0x7f6e2f9674f8] fmt.(*pp).printValue(0xc2003570d0, 0x69f860, 0xc2003bd030, 0x146,
0x10000000076, ...)
    /opt/go-tip/src/pkg/fmt/print.go:847 +0x26e
[fp=0x7f6e2f967f80] fmt.(*pp).printReflectValue(0xc2003570d0, 0x692e40, 0xc200587040,
0x172, 0x10000000076, ...)
    /opt/go-tip/src/pkg/fmt/print.go:985 +0xfa4
----- stack segment boundary -----
[fp=0x7f6e1c0262f0] fmt.(*pp).printField(0xc2003570d0, 0x692e40, 0xc200587040,
0x10000000076, 0x0, ...)
    /opt/go-tip/src/pkg/fmt/print.go:813 +0x3d1
[fp=0x7f6e1c026670] fmt.(*pp).doPrintf(0xc2003570d0, 0x7af6d0, 0xa, 0x7f6e1c0267b0, 0x1,
...)
    /opt/go-tip/src/pkg/fmt/print.go:1111 +0x1276
[fp=0x7f6e1c0266c0] fmt.Fprintf(0xc2000f44b0, 0xc200000008, 0x7af6d0, 0xa,
0x7f6e1c0267b0, ...)
    /opt/go-tip/src/pkg/fmt/print.go:214 +0x80
[fp=0x7f6e1c026718] fmt.Printf(0x7af6d0, 0xa, 0x7f6e1c0267b0, 0x1, 0x1, ...)
    /opt/go-tip/src/pkg/fmt/print.go:223 +0x8c
[fp=0x7f6e1c0269d8] ugorji.net/logging.logR(0xc2001cee02, 0x76c340, 0xc200358d20,
0xc2001c03e9, 0x80ef70, ...)
    /home/ugorji/depot/repo/src/ugorji.net/logging/logging.go:277 +0x79f
[fp=0x7f6e1c026a38] ugorji.net/logging.Trace(0x76c340, 0xc200358d20, 0x80ef70, 0x2f,
0xc2003bd030, ...)
    /home/ugorji/depot/repo/src/ugorji.net/logging/logging.go:303 +0x83
[fp=0x7f6e1c026e70] ugorji.net/ndb.(*Db).SvrQuery(0xc2001aa240, 0x76c340, 0xc200358d20,
0xc200585190, 0x7a0a00, ...)
    /home/ugorji/depot/repo/src/ugorji.net/ndb/client.go:423 +0x922
[fp=0x7f6e1c026f88] ugorji.net/ndb.func·001(0x0, 0x0, 0x0, 0x0, 0x0, ...)
    /home/ugorji/depot/repo/src/ugorji.net/ndb/low_level_driver.go:195 +0x16c
----- stack segment boundary -----
[fp=0x7f6e2fa73420] ugorji.net/db.QuerySupport(0xc2001e8640, 0xc200358d20, 0xc200298be0,
0x19, 0x7a0a00, ...)
    /home/ugorji/depot/repo/src/ugorji.net/db/datastore.go:760 +0xac0
[fp=0x7f6e2fa73580] ugorji.net/ndb.LowLevelDriver.Query(0xc2001ca080, 0xc2001a8400,
0xc2001aa240, 0xc2001c39a0, 0xc2001c39a0, ...)
    /home/ugorji/depot/repo/src/ugorji.net/ndb/low_level_driver.go:208 +0x434
[fp=0x7f6e2fa73638] ugorji.net/ndb.(*LowLevelDriver).Query(0xc2001c9420, 0xc2001e8640,
0xc200358d20, 0xc20023b240, 0xc200585190, ...)
    /home/ugorji/depot/repo/src/ugorji.net/ndb/blob.go:0 +0x173
[fp=0x7f6e2fa736d0] ugorji.net/baseapp.(*tlld).Query(0xc200124e20, 0xc2001e8640,
0xc200358d20, 0xc20023b240, 0xc200585190, ...)
    /home/ugorji/depot/repo/src/ugorji.net/baseapp/baseapp.go:0 +0x117
[fp=0x7f6e2fa738f8] ugorji.net/richapp.TopicView(0xc2001e8640, 0xc200358d20,
0xc2001e8680, 0xc200358d50, 0xc2003571a0, ...)
    /home/ugorji/depot/repo/src/ugorji.net/richapp/actions.go:106 +0x2a3
[fp=0x7f6e2fa73938] ugorji.net/web.HandlerFunc.HandleHttp(0x833f58, 0xc2001e8640,
0xc200358d20, 0xc2001e8680, 0xc200358d50, ...)
    /home/ugorji/depot/repo/src/ugorji.net/web/webrouter.go:132 +0x57
[fp=0x7f6e2fa739b0] ugorji.net/web.Dispatch(0xc2001e8640, 0xc200358d20, 0xc2001b42a0,
0xc2001e8680, 0xc200358d50, ...)
    /home/ugorji/depot/repo/src/ugorji.net/web/webrouter.go:290 +0x191
[fp=0x7f6e2fa73ae0] ugorji.net/baseapp.HTTPHandler.ServeHTTP(0xc2001a95a0, 0xc2001d2bb0,
0xc2002149c0, 0x0, 0xc2003c2840, ...)
    /home/ugorji/depot/repo/src/ugorji.net/baseapp/baseapp.go:338 +0x627
[fp=0x7f6e2fa73b20] ugorji.net/baseapp.(*HTTPHandler).ServeHTTP(0xc2002149e0,
0xc2003c2840, 0xc2004498c0, 0xc2003571a0)
    /home/ugorji/depot/repo/src/ugorji.net/baseapp/baseapp.go:0 +0xa8
[fp=0x7f6e2fa73da8] ugorji.net/blackannex/server.func·003(0xc2003c2840, 0xc2004498c0,
0xc2003571a0)
    /home/ugorji/depot/repo/src/ugorji.net/blackannex/server/local.go:156 +0xa6c
[fp=0x7f6e2fa73dc8] net/http.HandlerFunc.ServeHTTP(0xc200206c90, 0xc2003c2840,
0xc2004498c0, 0xc2003571a0)
    /opt/go-tip/src/pkg/net/http/server.go:1149 +0x3e
[fp=0x7f6e2fa73e10] ugorji.net/web.(*Server).ServeHTTP(0xc2001ee180, 0xc2003c2840,
0xc2004498c0, 0xc2003571a0)
    /home/ugorji/depot/repo/src/ugorji.net/web/server.go:88 +0x1a5
[fp=0x7f6e2fa73e50] net/http.serverHandler.ServeHTTP(0xc2001ce4b0, 0xc2003c2840,
0xc2004498c0, 0xc2003571a0)
    /opt/go-tip/src/pkg/net/http/server.go:1517 +0x16c
[fp=0x7f6e2fa73fb0] net/http.(*conn).serve(0xc20035a240)
    /opt/go-tip/src/pkg/net/http/server.go:1096 +0x765
[fp=0x7f6e2fa73fb8] runtime.goexit()
    /opt/go-tip/src/pkg/runtime/proc.c:1223
created by net/http.(*Server).Serve
    /opt/go-tip/src/pkg/net/http/server.go:1564 +0x266
<snip>
##### gdb bt output from the core file
#0  runtime.raise (sig=void) at /opt/go-tip/src/pkg/runtime/sys_linux_amd64.s:85
#1  0x000000000041aa1f in runtime.crash ()
#2  0x0000000000413bd7 in runtime.dopanic (unused=void) at
/opt/go-tip/src/pkg/runtime/panic.c:434
#3  0x0000000000413d17 in runtime.throw (s=void) at
/opt/go-tip/src/pkg/runtime/panic.c:472
#4  0x0000000000412cc7 in runtime.sigpanic () at
/opt/go-tip/src/pkg/runtime/os_linux.c:239
#5  0x0000000000409682 in itab (inter=void, type=void, canfail=void) at
/opt/go-tip/src/pkg/runtime/iface.c:43
#6  0x000000000040a3dc in runtime.assertE2I2 (inter=void, e=void, ret=void, ok=void)
    at /opt/go-tip/src/pkg/runtime/iface.c:507
#7  0x000000000042ea89 in fmt.(*pp).handleMethods (p=0xc2003570d0, verb=118, plus=false,
goSyntax=true, depth=1, 
    wasString=false, handled=false) at /opt/go-tip/src/pkg/fmt/print.go:673
#8  0x000000000042fcbe in fmt.(*pp).printValue (p=..., value=..., verb=118, plus=false,
goSyntax=true, depth=1, 
    wasString=70) at /opt/go-tip/src/pkg/fmt/print.go:847
#9  0x0000000000430d14 in fmt.(*pp).printReflectValue (p=..., value=..., verb=118,
plus=false, goSyntax=true, 
    depth=0, wasString=false) at /opt/go-tip/src/pkg/fmt/print.go:985
#10 0x00000000004232e0 in ?? () at /opt/go-tip/src/pkg/runtime/asm_amd64.s:278
#11 0x000000c2003570d0 in ?? ()
#12 0x0000000000692e40 in egcdata ()
#13 0x000000c200587040 in ?? ()
#14 0x0000000000000172 in ?? ()
#15 0x0000010000000076 in ?? ()
#16 0x0000000000000000 in ?? ()
(gdb) q

Attachments:

  1. logging.go (8780 bytes)

@ianlancetaylor
Copy link
Contributor

Comment 4:

Though it's hard to be sure, the symptoms strongly suggest some sort of memory
corruption.  Precise GC significantly exacerbates the effect of corruption.  Doing
things like running Sprintf in a loop are unlikely to make a difference.  What we need
to figure out is what is causing the corruption in the first place.
Does your program use cgo at all?
Does the problem happen when you set GOMAXPROCS = 1?
It may help to set Debug to 2 near the top of pkg/runtime/mgc0.c.  That will cause a lot
of printing, but will also detect heap corruption earlier.
It is likely that we will need to have a test case that we can reproduce on our own
machines, however we get there.

Labels changed: added priority-asap, removed priority-triage.

Status changed to Accepted.

@ugorji
Copy link
Contributor Author

ugorji commented May 13, 2013

Comment 5:

My program doesn't use cgo.
I just ran with GOMAXPROCS=1. It takes a much longer time to get the crash (after 76,000
requests are handled), and I see the crash when I send log output stderr (I think stderr
introduces some more timing issues that help see the crash). I've attached the log once
the crash occurred. The gdb out is shown inline below (note it is different). 
I'd also run w/ Debug=2 now and see if something jumps out / makes sense to me.
If not, I can package the built app and send to you privately along with the
dependencies so you can reproduce on your machines. What email address can I send it to?

Attachments:

  1. stack-and-gdb-out-GOMAXPROCS-eq-1.txt (56204 bytes)

@ugorji
Copy link
Contributor Author

ugorji commented May 13, 2013

Comment 6:

I just ran w/ Debug=2, GOMAXPROCS=8, no logging to stderr, and see the crash after 3851
requests.
I've attached the stderr output which is just the output from Debug=2. I don't know how
to interpret it. Also attached the gdb output.

Attachments:

  1. err.txt (6575094 bytes)
  2. gdb.txt (3171 bytes)

@dvyukov
Copy link
Member

dvyukov commented May 13, 2013

Comment 7:

Please patch in https://golang.org/cl/8724043
And run with it.

@ugorji
Copy link
Contributor Author

ugorji commented May 13, 2013

Comment 8:

With Dmitri's electric-fence patch,
It made things slower. Saw a lot of stalls during run. Crash happened much faster, after
135 requests. Different crash point.
(gdb) bt
#0  runtime.raise (sig=void) at /opt/go-tip/src/pkg/runtime/sys_linux_amd64.s:85
#1  0x000000000041a4ff in runtime.crash ()
#2  0x00000000004136b7 in runtime.dopanic (unused=void) at
/opt/go-tip/src/pkg/runtime/panic.c:434
#3  0x00000000004137f7 in runtime.throw (s=void) at
/opt/go-tip/src/pkg/runtime/panic.c:472
#4  0x000000000040bacc in runtime.SysMap (v=void, n=void) at
/opt/go-tip/src/pkg/runtime/mem_linux.c:121
#5  0x0000000000410861 in runtime.MHeap_MapBits (h=void) at
/opt/go-tip/src/pkg/runtime/mgc0.c:2412
#6  0x000000000041e761 in runtime.MHeap_SysAlloc (h=void, n=void) at
/opt/go-tip/src/pkg/runtime/malloc.goc:442
#7  0x0000000000422d10 in ?? () at /opt/go-tip/src/pkg/runtime/asm_amd64.s:278
#8  0x00007f44c942f000 in ?? ()
#9  0x0000000000100000 in ?? ()
#10 0x00007f44bd8cf100 in ?? ()
#11 0x00007f44bd8cffb8 in ?? ()
#12 0x00007f44bd8cf0f0 in ?? ()
#13 0x0000000000411035 in MHeap_Grow (h=void, npage=void) at
/opt/go-tip/src/pkg/runtime/mheap.c:224
#14 0x0000000000000000 in ?? ()

@dvyukov
Copy link
Member

dvyukov commented May 13, 2013

Comment 9:

Unfortunately it's just out-of-memory crash.

@ianlancetaylor
Copy link
Contributor

Comment 10:

What would be infinitely more helpful than the built binary would be the source code, so
that we can rebuild the binary ourselves with a patched runtime.  Is that a possibility?

@ugorji
Copy link
Contributor Author

ugorji commented May 13, 2013

Comment 11:

That's fair. I realize that's the best course of action at this point.
I'm not comfortable with it. However, I will share the source with the understanding
that only a very few specific people working on this issue will see the source and their
copies will be deleted once the issue is resolved. I will put the source up on google
drive, and grant access to the google accounts of those folks.
If that is ok, please tell me your email and I will follow up offline.

@dvyukov
Copy link
Member

dvyukov commented May 14, 2013

Comment 12:

Please apply https://golang.org/cl/8517044
and try with.
It's dirty and triggers compiler warnings, it's fine.

@ugorji
Copy link
Contributor Author

ugorji commented May 14, 2013

Comment 13:

Ran with Dmitri's patch @ 8517044
stderr attached (contains some useful information, I think).
gdb output:
(gdb) bt
#0  runtime.raise (sig=void) at /opt/go-tip/src/pkg/runtime/sys_linux_amd64.s:85
#1  0x000000000041b85f in runtime.crash ()
#2  0x00000000004149e7 in runtime.dopanic (unused=void) at
/opt/go-tip/src/pkg/runtime/panic.c:434
#3  0x0000000000414b27 in runtime.throw (s=void) at
/opt/go-tip/src/pkg/runtime/panic.c:472
#4  0x0000000000413ad7 in runtime.sigpanic () at
/opt/go-tip/src/pkg/runtime/os_linux.c:239
#5  0x000000000040a907 in runtime.efacethash (e1=void, ret=void) at
/opt/go-tip/src/pkg/runtime/iface.c:670
#6  0x000000000042ff72 in fmt.(*pp).printField (p=0xc2001d8820, field=..., verb=118,
plus=false, goSyntax=false, 
    depth=0, wasString=false) at /opt/go-tip/src/pkg/fmt/print.go:765
#7  0x00000000004357f6 in fmt.(*pp).doPrintf (p=..., format=..., a=...) at
/opt/go-tip/src/pkg/fmt/print.go:1111
#8  0x000000000042ced5 in fmt.Sprintf (format=..., a=..., ~anon2=...) at
/opt/go-tip/src/pkg/fmt/print.go:229
#9  0x0000000000527137 in ugorji.net/logging.logR (calldepth=2 '\002', ctx=...,
level=101 'e', message=..., 
    params=..., err=void) at /home/ugorji/depot/repo/src/ugorji.net/logging/logging.go:280
#10 0x0000000000527441 in ugorji.net/logging.Trace (ctx=..., message=..., params=...,
~anon3=...)
    at /home/ugorji/depot/repo/src/ugorji.net/logging/logging.go:305
#11 0x0000000000504a7a in ugorji.net/ndb.(*Db).SvrQuery (l=0xc2001ae240, ctxId=...,
pkey=0xc2006fcf60, kind=..., 
    opts=0x0, filters=..., qs=..., err=void) at /home/ugorji/depot/repo/src/ugorji.net/ndb/client.go:420
#12 0x000000000051529f in ugorji.net/ndb.func·001 (k3=..., cursor3=..., err3=...)
    at /home/ugorji/depot/repo/src/ugorji.net/ndb/low_level_driver.go:195
#13 0x0000000000424290 in ?? () at /opt/go-tip/src/pkg/runtime/asm_amd64.s:278
#14 0x0000000000000000 in ?? ()

Attachments:

  1. err.txt (70550 bytes)

@ianlancetaylor
Copy link
Contributor

Comment 14:

Sure, I can promise to only share the source with a couple of people, and to delete it
when done.  I can also sign an NDA if you prefer.  My e-mail address is iant@golang.org.

@ugorji
Copy link
Contributor Author

ugorji commented May 15, 2013

Comment 15:

Thanks Ian. Your word will suffice for now.
I've shared the source with you and Dmitry.

@ugorji
Copy link
Contributor Author

ugorji commented May 15, 2013

Comment 16:

I have verified fix by Dmitry: https://golang.org/cl/9372044 
Thanks Dmitry.

@ianlancetaylor
Copy link
Contributor

Comment 17:

Labels changed: added go1.1.1.

@dvyukov
Copy link
Member

dvyukov commented May 15, 2013

Comment 18:

This issue was closed by revision c6293d2.

Status changed to Fixed.

@adg
Copy link
Contributor

adg commented Jun 5, 2013

Comment 19:

This issue was closed by revision b7405fc44ff7.

@rsc rsc added this to the Go1.1.1 milestone Apr 14, 2015
@rsc rsc removed the go1.1.1 label Apr 14, 2015
adg added a commit that referenced this issue May 11, 2015
««« CL 9372044 / 1abed5873071
runtime: fix GC scanning of slices
If a slice points to an array embedded in a struct,
the whole struct can be incorrectly scanned as the slice buffer.
Fixes #5443.

R=cshapiro, iant, r, cshapiro, minux.ma
CC=bradfitz, gobot, golang-dev
https://golang.org/cl/9372044
»»»

R=cshapiro, iant
CC=golang-dev
https://golang.org/cl/10027043
@golang golang locked and limited conversation to collaborators Jun 24, 2016
This issue was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

7 participants