New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: large core files #28081

Open
randall77 opened this Issue Oct 8, 2018 · 9 comments

Comments

Projects
None yet
6 participants
@randall77
Contributor

randall77 commented Oct 8, 2018

package main

var p *int

func main() {
	*p = 0
}

Run with

$ ulimit -c unlimited
$ GOTRACEBACK=crash go run test.go
$ ls -l core

Go 1.11 and tip generate a ~100MB core. Go 1.10 and earlier only generate a ~2MB core.
Probably related to the arena changes in 1.11.

@aclements @heschik @hyangah

@mknyszek

This comment has been minimized.

Contributor

mknyszek commented Oct 8, 2018

Reproduced locally, but if I stat the file I'm pretty sure it's appearing as sparse.

While the reported size is 100MB, (# of blocks) * (# of bytes per block) = 8640 * 4096 ~= 34K for me.

@aclements

This comment has been minimized.

Member

aclements commented Oct 8, 2018

Almost certainly related to the arena changes.

These core files are large, but are very sparse, so their actual on-disk size is much smaller.

@aclements

This comment has been minimized.

Member

aclements commented Oct 8, 2018

If we care about fixing this, we probably need to map the arenas PROT_NONE initially and incrementally re-map them read/write. That means we can't just make a giant span out of each new arena, but perhaps that's fine. We'd have to keep track of how far we'd mapped, but if we're willing to only keep the latest arena partially mapped, that could be a single value in mheap.

@randall77

This comment has been minimized.

Contributor

randall77 commented Oct 8, 2018

Ok, it sounds like disk space is not a huge issue.
The core dump reader does mmap the file, so it burns address space in the reader. That's not a big deal, as long as it's not also burning physical memory. Is mmap smart enough to share zero pages when mmap'ing a sparse file? I hope so.
(The reader should ideally never access the zero pages, so linux might never get to that decision point.)

@aclements

This comment has been minimized.

Member

aclements commented Oct 8, 2018

Huh, it's not smart enough to zero-share those pages, even if it's a read-only shared mapping.

package main

import (
	"log"
	"os"
	"syscall"
	"time"
)

func main() {
	const size = 512 << 20
	f, _ := os.Create("/tmp/sparse")
	f.Truncate(size)
	p, _ := syscall.Mmap(int(f.Fd()), 0, size, syscall.PROT_READ, syscall.MAP_SHARED)
	sum := 0
	for i := range p {
		sum += int(p[i])
	}
	if sum != 0 {
		log.Fatal(sum)
	}
	time.Sleep(10 * time.Second)
}
@gaolizhou

This comment has been minimized.

gaolizhou commented Oct 11, 2018

Hi Guys, Here is the backtrace for the 100M memory malloc.

[root@10-25-51-148 ~]# pmap -x `pgrep mem-test`
17590:   ./mem-test
Address           Kbytes     RSS   Dirty Mode  Mapping
0000000000400000     316     244       0 r-x-- mem-test
000000000044f000     416     100       0 r---- mem-test
00000000004b7000      12      12      12 rw--- mem-test
00000000004ba000     124      36      36 rw---   [ anon ]
000000c000000000   65536    2048    2048 rw---   [ anon ]
00007f0f11962000   35264    2168    2168 rw---   [ anon ]
00007ffe11d76000     132      16      16 rw---   [ stack ]
00007ffe11dec000       8       4       0 r-x--   [ anon ]
ffffffffff600000       4       0       0 r-x--   [ anon ]
---------------- ------- ------- ------- 
total kB          101812    4628    4280
(gdb) bt
#0  runtime.sysMmap () at /home/lizgao/go/src/runtime/sys_linux_amd64.s:450
#1  0x0000000000403261 in runtime.mmap (addr=0xc000000000, fd=-1, flags=34, n=67108864, off=0, prot=0, ~r6=, ~r7=)
    at /home/lizgao/go/src/runtime/cgo_mmap.go:39
#2  0x0000000000410d6e in runtime.sysReserve (n=, v=, ~r2=) at /home/lizgao/go/src/runtime/mem_linux.go:144
#3  0x00000000004091b9 in runtime.(*mheap).sysAlloc (h=0x4be660 , n=67108864, size=0, v=0x0) at /home/lizgao/go/src/runtime/malloc.go:541
#4  0x000000000041c372 in runtime.(*mheap).grow (h=0x4be660 , npage=, ~r1=) at /home/lizgao/go/src/runtime/mheap.go:919
#5  0x000000000041c1f7 in runtime.(*mheap).allocSpanLocked (h=0x4be660 , npage=1, stat=0x4d6c68 , ~r2=)
    at /home/lizgao/go/src/runtime/mheap.go:848
#6  0x000000000041ba09 in runtime.(*mheap).alloc_m (h=0x4be660 , large=false, npage=1, spanclass=42 '*', ~r3=)
    at /home/lizgao/go/src/runtime/mheap.go:692
#7  0x00000000004451ac in runtime.(*mheap).alloc.func1 () at /home/lizgao/go/src/runtime/mheap.go:759
#8  0x000000000041bcaa in runtime.(*mheap).alloc (h=, large=, needzero=, npage=, 
    spanclass=, ~r4=) at /home/lizgao/go/src/runtime/mheap.go:758
#9  0x0000000000410874 in runtime.(*mcentral).grow (c=0x4c0418 , ~r0=) at /home/lizgao/go/src/runtime/mcentral.go:232
#10 0x00000000004103c8 in runtime.(*mcentral).cacheSpan (c=0x4c0418 , ~r0=) at /home/lizgao/go/src/runtime/mcentral.go:106
#11 0x000000000040ff85 in runtime.(*mcache).refill (c=0x7ffff7fbd000, spc=42 '*') at /home/lizgao/go/src/runtime/mcache.go:118
#12 0x0000000000444692 in runtime.(*mcache).nextFree.func1 () at /home/lizgao/go/src/runtime/malloc.go:749
#13 0x0000000000409536 in runtime.(*mcache).nextFree (c=0x7ffff7fbd000, s=, shouldhelpgc=, spc=42 '*', v=)
    at /home/lizgao/go/src/runtime/malloc.go:748
#14 0x0000000000409e83 in runtime.mallocgc (needzero=true, size=384, typ=0x46baa0, ~r3=) at /home/lizgao/go/src/runtime/malloc.go:903
#15 0x000000000040a268 in runtime.newobject (typ=, ~r1=) at /home/lizgao/go/src/runtime/malloc.go:1032
#16 0x000000000042b411 in runtime.malg (stacksize=, ~r1=) at /home/lizgao/go/src/runtime/proc.go:3285
#17 0x0000000000420649 in runtime.mpreinit (mp=) at /home/lizgao/go/src/runtime/os_linux.go:311
#18 0x0000000000424ff1 in runtime.mcommoninit (mp=0x4ba3a0 ) at /home/lizgao/go/src/runtime/proc.go:624
#19 0x0000000000424cb9 in runtime.schedinit () at /home/lizgao/go/src/runtime/proc.go:546
#20 0x0000000000446b8a in runtime.rt0_go () at /home/lizgao/go/src/runtime/asm_amd64.s:195
#21 0x0000000000000001 in ?? ()
#22 0x00007fffffffe578 in ?? ()
#23 0x0000000000000001 in ?? ()
#24 0x00007fffffffe578 in ?? ()
#25 0x0000000000000000 in ?? ()
(gdb) c
Continuing.

Breakpoint 1, runtime.sysMmap () at /home/lizgao/go/src/runtime/sys_linux_amd64.s:450
450             MOVQ    addr+0(FP), DI
(gdb) bt
#0  runtime.sysMmap () at /home/lizgao/go/src/runtime/sys_linux_amd64.s:450
#1  0x0000000000403261 in runtime.mmap (addr=0xc000000000, fd=-1, flags=50, n=67108864, off=0, prot=3, ~r6=, ~r7=)
    at /home/lizgao/go/src/runtime/cgo_mmap.go:39
#2  0x0000000000410e1a in runtime.sysMap (n=, sysStat=, v=0xc000000000) at /home/lizgao/go/src/runtime/mem_linux.go:154
#3  0x0000000000408e87 in runtime.(*mheap).sysAlloc (h=0x4be660 , n=67108864, size=, v=)
    at /home/lizgao/go/src/runtime/malloc.go:619
#4  0x000000000041c372 in runtime.(*mheap).grow (h=0x4be660 , npage=, ~r1=) at /home/lizgao/go/src/runtime/mheap.go:919
#5  0x000000000041c1f7 in runtime.(*mheap).allocSpanLocked (h=0x4be660 , npage=1, stat=0x4d6c68 , ~r2=)
    at /home/lizgao/go/src/runtime/mheap.go:848
#6  0x000000000041ba09 in runtime.(*mheap).alloc_m (h=0x4be660 , large=false, npage=1, spanclass=42 '*', ~r3=)
    at /home/lizgao/go/src/runtime/mheap.go:692
#7  0x00000000004451ac in runtime.(*mheap).alloc.func1 () at /home/lizgao/go/src/runtime/mheap.go:759
#8  0x000000000041bcaa in runtime.(*mheap).alloc (h=, large=, needzero=, npage=, 
    spanclass=, ~r4=) at /home/lizgao/go/src/runtime/mheap.go:758
#9  0x0000000000410874 in runtime.(*mcentral).grow (c=0x4c0418 , ~r0=) at /home/lizgao/go/src/runtime/mcentral.go:232
#10 0x00000000004103c8 in runtime.(*mcentral).cacheSpan (c=0x4c0418 , ~r0=) at /home/lizgao/go/src/runtime/mcentral.go:106
#11 0x000000000040ff85 in runtime.(*mcache).refill (c=0x7ffff7fbd000, spc=42 '*') at /home/lizgao/go/src/runtime/mcache.go:118
#12 0x0000000000444692 in runtime.(*mcache).nextFree.func1 () at /home/lizgao/go/src/runtime/malloc.go:749
#13 0x0000000000409536 in runtime.(*mcache).nextFree (c=0x7ffff7fbd000, s=, shouldhelpgc=, spc=42 '*', v=)
    at /home/lizgao/go/src/runtime/malloc.go:748
#14 0x0000000000409e83 in runtime.mallocgc (needzero=true, size=384, typ=0x46baa0, ~r3=) at /home/lizgao/go/src/runtime/malloc.go:903
#15 0x000000000040a268 in runtime.newobject (typ=, ~r1=) at /home/lizgao/go/src/runtime/malloc.go:1032
#16 0x000000000042b411 in runtime.malg (stacksize=, ~r1=) at /home/lizgao/go/src/runtime/proc.go:3285
#17 0x0000000000420649 in runtime.mpreinit (mp=) at /home/lizgao/go/src/runtime/os_linux.go:311
#18 0x0000000000424ff1 in runtime.mcommoninit (mp=0x4ba3a0 ) at /home/lizgao/go/src/runtime/proc.go:624
#19 0x0000000000424cb9 in runtime.schedinit () at /home/lizgao/go/src/runtime/proc.go:546
#20 0x0000000000446b8a in runtime.rt0_go () at /home/lizgao/go/src/runtime/asm_amd64.s:195
#21 0x0000000000000001 in ?? ()
#22 0x00007fffffffe578 in ?? ()
#23 0x0000000000000001 in ?? ()
#24 0x00007fffffffe578 in ?? ()
#25 0x0000000000000000 in ?? ()
(gdb) c
Continuing.

Breakpoint 1, runtime.sysMmap () at /home/lizgao/go/src/runtime/sys_linux_amd64.s:450
450             MOVQ    addr+0(FP), DI
(gdb) bt
#0  runtime.sysMmap () at /home/lizgao/go/src/runtime/sys_linux_amd64.s:450
#1  0x0000000000403261 in runtime.mmap (addr=0x0, fd=-1, flags=34, n=33554432, off=0, prot=3, ~r6=, ~r7=)
    at /home/lizgao/go/src/runtime/cgo_mmap.go:39
#2  0x000000000041099e in runtime.sysAlloc (n=33554432, sysStat=, ~r2=) at /home/lizgao/go/src/runtime/mem_linux.go:21
#3  0x000000000040a899 in runtime.persistentalloc1 (align=67108864, size=33554432, sysStat=0x0, ~r3=) at /home/lizgao/go/src/runtime/malloc.go:1172
#4  0x0000000000444765 in runtime.persistentalloc.func1 () at /home/lizgao/go/src/runtime/malloc.go:1143
#5  0x000000000040a632 in runtime.persistentalloc (align=, size=, sysStat=, ~r3=)
    at /home/lizgao/go/src/runtime/malloc.go:1142
#6  0x0000000000408fce in runtime.(*mheap).sysAlloc (h=0x4be660 , n=67108864, size=, v=)
    at /home/lizgao/go/src/runtime/malloc.go:627
#7  0x000000000041c372 in runtime.(*mheap).grow (h=0x4be660 , npage=, ~r1=) at /home/lizgao/go/src/runtime/mheap.go:919
#8  0x000000000041c1f7 in runtime.(*mheap).allocSpanLocked (h=0x4be660 , npage=1, stat=0x4d6c68 , ~r2=)
    at /home/lizgao/go/src/runtime/mheap.go:848
#9  0x000000000041ba09 in runtime.(*mheap).alloc_m (h=0x4be660 , large=false, npage=1, spanclass=42 '*', ~r3=)
    at /home/lizgao/go/src/runtime/mheap.go:692
#10 0x00000000004451ac in runtime.(*mheap).alloc.func1 () at /home/lizgao/go/src/runtime/mheap.go:759
#11 0x000000000041bcaa in runtime.(*mheap).alloc (h=, large=, needzero=, npage=, 
    spanclass=, ~r4=) at /home/lizgao/go/src/runtime/mheap.go:758
#12 0x0000000000410874 in runtime.(*mcentral).grow (c=0x4c0418 , ~r0=) at /home/lizgao/go/src/runtime/mcentral.go:232
#13 0x00000000004103c8 in runtime.(*mcentral).cacheSpan (c=0x4c0418 , ~r0=) at /home/lizgao/go/src/runtime/mcentral.go:106
#14 0x000000000040ff85 in runtime.(*mcache).refill (c=0x7ffff7fbd000, spc=42 '*') at /home/lizgao/go/src/runtime/mcache.go:118
#15 0x0000000000444692 in runtime.(*mcache).nextFree.func1 () at /home/lizgao/go/src/runtime/malloc.go:749
#16 0x0000000000409536 in runtime.(*mcache).nextFree (c=0x7ffff7fbd000, s=, shouldhelpgc=, spc=42 '*', v=)
    at /home/lizgao/go/src/runtime/malloc.go:748
#17 0x0000000000409e83 in runtime.mallocgc (needzero=true, size=384, typ=0x46baa0, ~r3=) at /home/lizgao/go/src/runtime/malloc.go:903
#18 0x000000000040a268 in runtime.newobject (typ=, ~r1=) at /home/lizgao/go/src/runtime/malloc.go:1032
#19 0x000000000042b411 in runtime.malg (stacksize=, ~r1=) at /home/lizgao/go/src/runtime/proc.go:3285
#20 0x0000000000420649 in runtime.mpreinit (mp=) at /home/lizgao/go/src/runtime/os_linux.go:311
#21 0x0000000000424ff1 in runtime.mcommoninit (mp=0x4ba3a0 ) at /home/lizgao/go/src/runtime/proc.go:624
#22 0x0000000000424cb9 in runtime.schedinit () at /home/lizgao/go/src/runtime/proc.go:546
#23 0x0000000000446b8a in runtime.rt0_go () at /home/lizgao/go/src/runtime/asm_amd64.s:195
#24 0x0000000000000001 in ?? ()
#25 0x00007fffffffe578 in ?? ()
#26 0x0000000000000001 in ?? ()
#27 0x00007fffffffe578 in ?? ()
#28 0x0000000000000000 in ?? ()
(gdb) c
Continuing.

Breakpoint 1, runtime.sysMmap () at /home/lizgao/go/src/runtime/sys_linux_amd64.s:450
450             MOVQ    addr+0(FP), DI
(gdb) bt
#0  runtime.sysMmap () at /home/lizgao/go/src/runtime/sys_linux_amd64.s:450
#1  0x0000000000403261 in runtime.mmap (addr=0x0, fd=-1, flags=34, n=2162688, off=0, prot=3, ~r6=, ~r7=)
    at /home/lizgao/go/src/runtime/cgo_mmap.go:39
#2  0x000000000041099e in runtime.sysAlloc (n=2162688, sysStat=, ~r2=) at /home/lizgao/go/src/runtime/mem_linux.go:21
#3  0x000000000040a899 in runtime.persistentalloc1 (align=67108864, size=2162688, sysStat=0x4d6cb8 , ~r3=)
    at /home/lizgao/go/src/runtime/malloc.go:1172
#4  0x0000000000444765 in runtime.persistentalloc.func1 () at /home/lizgao/go/src/runtime/malloc.go:1143
#5  0x000000000040a632 in runtime.persistentalloc (align=, size=, sysStat=, ~r3=)
    at /home/lizgao/go/src/runtime/malloc.go:1142
#6  0x0000000000408f9c in runtime.(*mheap).sysAlloc (h=0x4be660 , n=67108864, size=, v=)
    at /home/lizgao/go/src/runtime/malloc.go:640
#7  0x000000000041c372 in runtime.(*mheap).grow (h=0x4be660 , npage=, ~r1=) at /home/lizgao/go/src/runtime/mheap.go:919
#8  0x000000000041c1f7 in runtime.(*mheap).allocSpanLocked (h=0x4be660 , npage=1, stat=0x4d6c68 , ~r2=)
    at /home/lizgao/go/src/runtime/mheap.go:848
#9  0x000000000041ba09 in runtime.(*mheap).alloc_m (h=0x4be660 , large=false, npage=1, spanclass=42 '*', ~r3=)
    at /home/lizgao/go/src/runtime/mheap.go:692
#10 0x00000000004451ac in runtime.(*mheap).alloc.func1 () at /home/lizgao/go/src/runtime/mheap.go:759
#11 0x000000000041bcaa in runtime.(*mheap).alloc (h=, large=, needzero=, npage=, 
    spanclass=, ~r4=) at /home/lizgao/go/src/runtime/mheap.go:758
#12 0x0000000000410874 in runtime.(*mcentral).grow (c=0x4c0418 , ~r0=) at /home/lizgao/go/src/runtime/mcentral.go:232
#13 0x00000000004103c8 in runtime.(*mcentral).cacheSpan (c=0x4c0418 , ~r0=) at /home/lizgao/go/src/runtime/mcentral.go:106
#14 0x000000000040ff85 in runtime.(*mcache).refill (c=0x7ffff7fbd000, spc=42 '*') at /home/lizgao/go/src/runtime/mcache.go:118
#15 0x0000000000444692 in runtime.(*mcache).nextFree.func1 () at /home/lizgao/go/src/runtime/malloc.go:749
#16 0x0000000000409536 in runtime.(*mcache).nextFree (c=0x7ffff7fbd000, s=, shouldhelpgc=, spc=42 '*', v=)
    at /home/lizgao/go/src/runtime/malloc.go:748
#17 0x0000000000409e83 in runtime.mallocgc (needzero=true, size=384, typ=0x46baa0, ~r3=) at /home/lizgao/go/src/runtime/malloc.go:903
#18 0x000000000040a268 in runtime.newobject (typ=, ~r1=) at /home/lizgao/go/src/runtime/malloc.go:1032
#19 0x000000000042b411 in runtime.malg (stacksize=, ~r1=) at /home/lizgao/go/src/runtime/proc.go:3285
#20 0x0000000000420649 in runtime.mpreinit (mp=) at /home/lizgao/go/src/runtime/os_linux.go:311
#21 0x0000000000424ff1 in runtime.mcommoninit (mp=0x4ba3a0 ) at /home/lizgao/go/src/runtime/proc.go:624
#22 0x0000000000424cb9 in runtime.schedinit () at /home/lizgao/go/src/runtime/proc.go:546
#23 0x0000000000446b8a in runtime.rt0_go () at /home/lizgao/go/src/runtime/asm_amd64.s:195
#24 0x0000000000000001 in ?? ()
#25 0x00007fffffffe578 in ?? ()
#26 0x0000000000000001 in ?? ()
#27 0x00007fffffffe578 in ?? ()
#28 0x0000000000000000 in ?? ()
(gdb) 
@aclements

This comment has been minimized.

Member

aclements commented Oct 11, 2018

@gaolizhou, thanks for the backtrace. We do understand exactly what's causing this issue. It's just an unintended consequence of a very intentional change that switched from mapping the heap in small increments to mapping it in 64MB increments.

@bcmills bcmills added the Performance label Oct 23, 2018

@bcmills

This comment has been minimized.

Member

bcmills commented Oct 23, 2018

We do understand exactly what's causing this issue. It's just an unintended consequence of a very intentional change that switched from mapping the heap in small increments to mapping it in 64MB increments.

Do we have a fix in mind? If not, does the benefit of that change outweigh the regression in core sizes?

@bcmills bcmills added this to the Go1.12 milestone Oct 23, 2018

@aclements

This comment has been minimized.

Member

aclements commented Oct 23, 2018

Do we have a fix in mind? If not, does the benefit of that change outweigh the regression in core sizes?

Map arenas PROT_NONE initially and incrementally re-map them PROT_READ|PROT_WRITE as we need more space. We only need to keep track of the "current arena" and how far we've mapped it (any other arena can be fully mapped).

I don't think this is very hard to implement or adds much complexity, though everything in sysAlloc is a little tricky.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment