Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: high memory usage during compilation #23242

Closed
quantexlab opened this issue Dec 26, 2017 · 7 comments

Comments

Projects
None yet
5 participants
@quantexlab
Copy link

commented Dec 26, 2017

What version of Go are you using (go version)?

1.9.2

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

GOHOSTARCH="amd64"
GOHOSTOS="darwin"

What did you do?

package main

type TreeCell struct {
    Tabs func() *string
}

func Cell() *string {
    s:= ""
    return &s
}

func Table(Line *[]TreeCell) {
    if Line != nil {
        Num["rtt"] = Line
    }
}

var (
    Num map[string]*[]TreeCell 
)

func main() {

    Table(&[]TreeCell{
        TreeCell{Tabs: Cell},
        TreeCell{Tabs: Cell},
        ...repeat 15000 times
        TreeCell{Tabs: Cell},
    })
}

go build -a -v -gcflags "-N -l" -ldflags "-s -w"
Size of executable file 1,9Mb

Andreys-Mac-Pro:test andrey$ size -A test
test  :
section                size       addr
__text              1459891   16781312
__rodata             158107   18241216
__symbol_stub1            0   18399323
__typelink             2016   18399328
__itablink                8   18401344
__gosymtab                0   18401352
__gopclntab          194547   18401376
__nl_symbol_ptr           0   18599936
__noptrdata            2584   18599936
__data                 1952   18602528
__bss                123400   18604480
__noptrbss             9016   18727904
Total               1951521

if I change the func() *string to the interface{}

type TreeCell struct {
    Tabs interface{}
}

then size of executable file 32Mb

Andreys-Mac-Pro:test andrey$ size -A test
test  :
section                 size       addr
__text               1864389   16781312
__rodata            30375699   18645728
__symbol_stub1             0   49021427
__typelink              2016   49021440
__itablink                 8   49023456
__gosymtab                 0   49023464
__gopclntab           319155   49023488
__nl_symbol_ptr            0   49344512
__noptrdata             2584   49344512
__data                  1952   49347104
__bss                 123400   49349056
__noptrbss              9016   49472480
Total               32698219

If we exclude the -gcflags '-N -l' params, then interface{} is 2MB func() *string is 32MB

Next Step
I increase size of function

func Cell() *string {
	s:= "01234567890102345678901234567890123456789012345678900123456789010234567890123456789012345678901234567890"
	return &s
}

or

func Cell() *string {
	s:= ""
	for i := 0; i < 5; i++ {
		s= s +s
		for j := 0; j < 5; j++ {
			s= s +s
			for k := 0; k < 5; k++ {
				s= s +"df"
			}
		}
	}
	return &s
}

But the size remained the same. I think that this is not inline problem.

My questions

  1. Why increased __rodata 30375699 and not __text?
  2. Why the flag works differently on func() *string and interface{}?
  3. I think this is a compiler bug.

https://stackoverflow.com/questions/47968970/inlining-and-output-binary-size

@bradfitz bradfitz added this to the Unplanned milestone Dec 26, 2017

@quantexlab

This comment has been minimized.

Copy link
Author

commented Dec 26, 2017

UPD: I simplified the code so that the effect remained.

package main

type TreeCell struct {
	Tabs interface{}//func() 
}

func Cell() {
       // empty
}

func main() {
	var Num map[string]*[]TreeCell 

	Num["rtt"] = &[]TreeCell{
               TreeCell{Tabs: Cell},
               TreeCell{Tabs: Cell},
                ...repeat 15000 times
		TreeCell{Tabs: Cell},
		TreeCell{Tabs: Cell},
	}
}
Andreys-Mac-Pro:test andrey$ size -A test
test  :
section                 size       addr
__text               1864213   16781312
__rodata            30377587   18645536
__symbol_stub1             0   49023123
__typelink              2012   49023136
__itablink                 8   49025152
__gosymtab                 0   49025160
__gopclntab           319003   49025184
__nl_symbol_ptr            0   49344512
__noptrdata             2584   49344512
__data                  1952   49347104
__bss                 123368   49349056
__noptrbss              9016   49472448
Total               32699743
@mvdan

This comment has been minimized.

Copy link
Member

commented Dec 26, 2017

Have you tried earlier Go versions, such as 1.0, 1.3 or 1.7? It would be useful to see if the unusual increment in size was added recently or not.

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Dec 26, 2017

This seems to be fixed in 1.10, although the compilation seems to use a lot more memory.

@bradfitz bradfitz changed the title Output binary size cmd/compile: high memory usage during compilation Dec 27, 2017

@bradfitz

This comment has been minimized.

Copy link
Member

commented Dec 27, 2017

/cc @mdempsky

@bradfitz bradfitz modified the milestones: Unplanned, Go1.11 Dec 27, 2017

@mvdan

This comment has been minimized.

Copy link
Member

commented Dec 27, 2017

Might be related to #19818, which is about extra binary size (and extra compiler work) when multiple equal statictmps are converted into interface.

That issue has a CL from @josharian, but it has been inactive recently since he's on leave. I believe he'll be back for 1.11. In any case, worth investigating if the fix for that issue also fixes this one.

@quantexlab

This comment has been minimized.

Copy link
Author

commented Dec 27, 2017

Ver 1.7.6
with -N -l
Interface{} - time of compilation ~ 46 sec. Crash

> Executing task: go build -a -v -gcflags " -N -l" -ldflags "-s -w"<

runtime/internal/sys
runtime/internal/atomic
runtime
quantex/test
# quantex/test
runtime.cgocall: nosplit stack overflow
        584     assumed on entry to runtime.cgocall (nosplit)
        528     after runtime.cgocall (nosplit) uses 56
        520     on entry to runtime.entersyscall (nosplit)
        480     after runtime.entersyscall (nosplit) uses 40
        472     on entry to runtime.reentersyscall (nosplit)
        352     after runtime.reentersyscall (nosplit) uses 120
        344     on entry to runtime.casgstatus (nosplit)
        168     after runtime.casgstatus (nosplit) uses 176
        160     on entry to runtime.throw (nosplit)
        128     after runtime.throw (nosplit) uses 32
        120     on entry to runtime.dopanic (nosplit)
        32      after runtime.dopanic (nosplit) uses 88
        24      on entry to runtime.getcallerpc (nosplit)
        8       after runtime.getcallerpc (nosplit) uses 16
        0       on entry to runtime.nextBarrierPC (nosplit)
        -16     after runtime.nextBarrierPC (nosplit) uses 16
runtime.cgocallbackg: nosplit stack overflow
        584     assumed on entry to runtime.cgocallbackg (nosplit)
        488     after runtime.cgocallbackg (nosplit) uses 96
        480     on entry to runtime.exitsyscall (nosplit)
        344     after runtime.exitsyscall (nosplit) uses 136
        336     on entry to runtime.exitsyscallfast (nosplit)
        168     after runtime.exitsyscallfast (nosplit) uses 168
        160     on entry to runtime.writebarrierptr (nosplit)
        104     after runtime.writebarrierptr (nosplit) uses 56
        96      on entry to runtime.cgoCheckWriteBarrier (nosplit)
        24      after runtime.cgoCheckWriteBarrier (nosplit) uses 72
        16      on entry to runtime.cgoIsGoPointer (nosplit)
        -40     after runtime.cgoIsGoPointer (nosplit) uses 56
runtime.cgoCheckSliceCopy: nosplit stack overflow
        584     assumed on entry to runtime.cgoCheckSliceCopy (nosplit)
        512     after runtime.cgoCheckSliceCopy (nosplit) uses 72
        504     on entry to runtime.cgoCheckTypedBlock (nosplit)
        288     after runtime.cgoCheckTypedBlock (nosplit) uses 216
        280     on entry to runtime.cgoCheckBits (nosplit)
        136     after runtime.cgoCheckBits (nosplit) uses 144
        128     on entry to runtime.cgoIsGoPointer (nosplit)
        72      after runtime.cgoIsGoPointer (nosplit) uses 56
        64      on entry to runtime.inHeapOrStack (nosplit)
        8       after runtime.inHeapOrStack (nosplit) uses 56
        0       on entry to runtime.(*mspan).base
        -8      on entry to runtime.morestack (nosplit)
runtime.(*cpuProfile).addWithFlushlog: nosplit stack overflow
        584     assumed on entry to runtime.(*cpuProfile).addWithFlushlog (nosplit)
        216     after runtime.(*cpuProfile).addWithFlushlog (nosplit) uses 368
        208     on entry to runtime.(*cpuProfile).evict (nosplit)
        0       after runtime.(*cpuProfile).evict (nosplit) uses 208
        -8      on entry to function pointer
        -16     on entry to runtime.morestack (nosplit)
runtime.typedmemmove: nosplit stack overflow
        584     assumed on entry to runtime.typedmemmove (nosplit)
        536     after runtime.typedmemmove (nosplit) uses 48
        528     on entry to runtime.cgoCheckMemmove (nosplit)
        480     after runtime.cgoCheckMemmove (nosplit) uses 48
        472     on entry to runtime.cgoCheckTypedBlock (nosplit)
        256     after runtime.cgoCheckTypedBlock (nosplit) uses 216
        248     on entry to runtime.cgoCheckBits (nosplit)
        104     after runtime.cgoCheckBits (nosplit) uses 144
        96      on entry to runtime.cgoIsGoPointer (nosplit)
        40      after runtime.cgoIsGoPointer (nosplit) uses 56
        32      on entry to runtime.inHeapOrStack (nosplit)
        -24     after runtime.inHeapOrStack (nosplit) uses 56
runtime.typedslicecopy: nosplit stack overflow
        584     assumed on entry to runtime.typedslicecopy (nosplit)
        440     after runtime.typedslicecopy (nosplit) uses 144
        432     on entry to runtime.cgoCheckSliceCopy (nosplit)
        360     after runtime.cgoCheckSliceCopy (nosplit) uses 72
        352     on entry to runtime.cgoCheckTypedBlock (nosplit)
        136     after runtime.cgoCheckTypedBlock (nosplit) uses 216
        128     on entry to runtime.cgoCheckBits (nosplit)
        -16     after runtime.cgoCheckBits (nosplit) uses 144
runtime.entersyscallblock: nosplit stack overflow
        584     assumed on entry to runtime.entersyscallblock (nosplit)
        368     after runtime.entersyscallblock (nosplit) uses 216
        360     on entry to runtime.casgstatus (nosplit)
        184     after runtime.casgstatus (nosplit) uses 176
        176     on entry to runtime.throw (nosplit)
        144     after runtime.throw (nosplit) uses 32
        136     on entry to runtime.dopanic (nosplit)
        48      after runtime.dopanic (nosplit) uses 88
        40      on entry to runtime.getcallerpc (nosplit)
        24      after runtime.getcallerpc (nosplit) uses 16
        16      on entry to runtime.nextBarrierPC (nosplit)
        0       after runtime.nextBarrierPC (nosplit) uses 16
        -8      on entry to runtime.panicindex
runtime.exitsyscall: nosplit stack overflow
        584     assumed on entry to runtime.exitsyscall (nosplit)
        448     after runtime.exitsyscall (nosplit) uses 136
        440     on entry to runtime.exitsyscallfast (nosplit)
        272     after runtime.exitsyscallfast (nosplit) uses 168
        264     on entry to runtime.writebarrierptr (nosplit)
        208     after runtime.writebarrierptr (nosplit) uses 56
        200     on entry to runtime.cgoCheckWriteBarrier (nosplit)
        128     after runtime.cgoCheckWriteBarrier (nosplit) uses 72
        120     on entry to runtime.cgoIsGoPointer (nosplit)
        64      after runtime.cgoIsGoPointer (nosplit) uses 56
        56      on entry to runtime.inHeapOrStack (nosplit)
        0       after runtime.inHeapOrStack (nosplit) uses 56
        -8      on entry to runtime.(*mspan).base
runtime.cgocallback_gofunc: nosplit stack overflow
        584     assumed on entry to runtime.cgocallback_gofunc (nosplit)
        560     after runtime.cgocallback_gofunc (nosplit) uses 24
        552     on entry to runtime.cgocallbackg (nosplit)
        456     after runtime.cgocallbackg (nosplit) uses 96
        448     on entry to runtime.exitsyscall (nosplit)
        312     after runtime.exitsyscall (nosplit) uses 136
        304     on entry to runtime.exitsyscallfast (nosplit)
        136     after runtime.exitsyscallfast (nosplit) uses 168
        128     on entry to runtime.writebarrierptr (nosplit)
        72      after runtime.writebarrierptr (nosplit) uses 56
        64      on entry to runtime.cgoCheckWriteBarrier (nosplit)
        -8      after runtime.cgoCheckWriteBarrier (nosplit) uses 72
Der Terminalprozess wurde mit folgendem Exitcode beendet: 2

func() - Crash als with interface{}, time of compilation ~ 11 sec

without -N -l
Interface{} = ?? MB, I did not wait for the compilation to finish. I think that this does not work

func() - size 0,9 MB, time of compilation ~ 11 sec

Andreys-Mac-Pro:test andrey$ size -A test
test  :
section               size     addr
__text              525296     8192
__rodata            138504   533504
__typelink            1764   672008
__itablink               8   673776
__gosymtab               0   673784
__gopclntab         159429   673792
__symbol_stub1           0   833221
__nl_symbol_ptr          0   835584
__noptrdata           1696   835584
__data                1664   837280
__bss               108016   838944
__noptrbss           18848   946976
Total               955225

Ver 1.10 Beta
with -N -l
Interface{} = 2.2MB time of compilation ~ 22 sec

Andreys-Mac-Pro:test andrey$ size -A test
test  :
section                size       addr
__text              1691157   16781312
__rodata             179683   18472480
__symbol_stub1            0   18652163
__typelink             1696   18652192
__itablink                8   18653888
__gosymtab                0   18653896
__gopclntab          279421   18653920
__nl_symbol_ptr           0   18935808
__noptrdata            2536   18935808
__data                 6064   18938368
__bss                115336   18944448
__noptrbss             9048   19059808
Total               2284949

func() = ?? MB, I did not wait for the compilation to finish. I think that this does not work

without -N -l
Interface{} = 1.6 MB time of compilation ~ 3 sec

Andreys-Mac-Pro:test andrey$ size -A test
test  :
section                size       addr
__text              1147015   16781312
__rodata             169157   17928352
__symbol_stub1            0   18097509
__typelink             1692   18097536
__itablink                8   18099232
__gosymtab                0   18099240
__gopclntab          248317   18099264
__nl_symbol_ptr           0   18350080
__noptrdata            2536   18350080
__data                 6064   18352640
__bss                115336   18358720
__noptrbss             9048   18474080
Total               1699173

func() - size 1,3MB time of compilation ~ 8 sec

Andreys-Mac-Pro:test andrey$ size -A test
test  :
section               size       addr
__text              851295   16781312
__rodata            173598   17632608
__symbol_stub1           0   17806206
__typelink            1696   17806208
__itablink               8   17807904
__gosymtab               0   17807912
__gopclntab         185893   17807936
__nl_symbol_ptr          0   17997824
__noptrdata           2536   17997824
__data                6064   18000384
__bss               115336   18006464
__noptrbss            9048   18121824
Total              1345474

@ianlancetaylor ianlancetaylor modified the milestones: Go1.11, Go1.12 Jun 30, 2018

@dr2chase

This comment has been minimized.

Copy link
Contributor

commented Nov 13, 2018

Checking to see if this is still a problem:

My input, with gratuitous assignment to global to ensure no dead code elimination:

package main

type TreeCell struct {
	Tabs interface{} //func()
}

func Cell() {
	// empty
}

var Sink *[]TreeCell

func main() {
	var Num map[string]*[]TreeCell

	Num["rtt"] = &[]TreeCell{
		TreeCell{Tabs: Cell}, 
                 // Repeated 14,999 more times
	}
	Sink = Num["rtt"]
}
~/work/tmp/i23242$ time go build .
real	0m2.671s
user	0m2.032s
sys	0m0.478s
~/work/tmp/i23242$ size -A i23242 
i23242  :
section                 size       addr
__text               1111260   16781312
__rodata              188693   17892576
__symbol_stub1           198   18081280
__typelink              1852   18081504
__itablink                 8   18083360
__gosymtab                 0   18083368
__gopclntab           478257   18083392
__nl_symbol_ptr          264   18563072
__noptrdata             2856   18563360
__data                  6608   18566240
__bss                 111600   18572864
__noptrbss              9496   18684480
__zdebug_abbrev          276   18694144
__zdebug_line          59275   18694420
__zdebug_frame         14477   18753695
__zdebug_pubname        2043   18768172
__zdebug_pubtype        8350   18770215
__debug_gdb_scri          51   18778565
__zdebug_info         106122   18778616
__zdebug_loc           43288   18884738
__zdebug_ranges        14797   18928026
Total                2159771

Changing the declaration to

type TreeCell struct {
	Tabs func() //interface{} //func()
}

It took less time, and took less space, but not that much.

~/work/tmp/i23242$ time go build .
real	0m1.946s
user	0m0.946s
sys	0m0.450s
~/work/tmp/i23242$ size -A i23242 
i23242  :
section                size       addr
__text               600944   16781312
__rodata             187928   17382272
__symbol_stub1          198   17570208
__typelink             1848   17570432
__itablink                8   17572280
__gosymtab                0   17572288
__gopclntab          255729   17572288
__nl_symbol_ptr         264   17829888
__noptrdata            2856   17830176
__data                 6608   17833056
__bss                111600   17839680
__noptrbss             9496   17951296
__zdebug_abbrev         276   17960960
__zdebug_line         39261   17961236
__zdebug_frame        14454   18000497
__zdebug_pubname       2041   18014951
__zdebug_pubtype       8332   18016992
__debug_gdb_scri         51   18025324
__zdebug_info        106010   18025375
__zdebug_loc          43241   18131385
__zdebug_ranges       14796   18174626
Total               1405941

User time is in ratio 2.032 to 0.946 (2.15x).
Text size is 1.85x larger.

Two-thirds of the difference in size is in the text.
We might want to direct a little attention to the line number churn, but that would only reduce the size increase 1/3 if it were reduced to zero.

Experimentally, I upped the repetition to 150,000, and saw no scary non-linearity.
Section size differences were 10x larger (linear).
Build time difference was 19.169 to 7.21 (2.7x) -- not linear, but not scary.
Text size is 2.61x larger, which makes the build time ratio seem plausible.

I think this is fixed.

@dr2chase dr2chase closed this Nov 13, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.