
cmd/compile: share statictmp #19818

Open
QuestionPython opened this issue Apr 1, 2017 · 11 comments

@QuestionPython commented Apr 1, 2017

Hello,

I have two files:
one: a Go file that repeats fmt.Print("...........\n") 10,000 times
two: a C (gcc) file that repeats printf("...........\n"); 10,000 times

The compiled C binary is about 200 KB.
The compiled Go binary is about 50 MB (and it also takes a long time to build).
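
A small generator sketch (not part of the original report; the file names gen.go and repeat.go are made up) that writes the Go variant described above:

// gen.go: writes a main package that calls fmt.Print 10,000 times.
package main

import (
	"fmt"
	"os"
	"strings"
)

func main() {
	var b strings.Builder
	b.WriteString("package main\n\nimport \"fmt\"\n\nfunc main() {\n")
	for i := 0; i < 10000; i++ {
		b.WriteString("\tfmt.Print(\"...........\\n\")\n")
	}
	b.WriteString("}\n")
	if err := os.WriteFile("repeat.go", []byte(b.String()), 0o644); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}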

@ALTree (Member) commented Apr 1, 2017

I've tried a file with 10k fmt.Println("0123456789!!!") calls, and while I can reproduce the issue, the tip compiler behaves much better than the 1.8 compiler.

go1.8:

271.93user 1.12system 2:10.43elapsed 209%CPU (0avgtext+0avgdata 702816maxresident)k

tip:

23.48user 0.30system 0:23.02elapsed 103%CPU (0avgtext+0avgdata 347308maxresident)k

so tip is more than 10x faster and also uses half the RAM. The tip executable is also "only" 28M (vs 51M for go1.8). Still quite big.

@randall77 (Contributor) commented Apr 1, 2017

A couple of things I noticed in this example.

  1. We're using a separate autotmp for each call for the ... args. We could use the same one repeatedly. (We already know it doesn't escape.)
  2. Each string gets its own statictmp for converting it to an interface. We could share them. (I thought we already did?)
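
To make these two observations concrete, a minimal sketch (not from the original comment) of the kind of code in question:

package main

import "fmt"

func main() {
	// Each call converts the constant "hello123456789\n" to an interface{}
	// for fmt.Print's variadic parameter. At the time of this comment the
	// compiler emitted a fresh autotmp (the backing array of the
	// []interface{} argument) and a fresh statictmp (the read-only string
	// value the interface's data word points at) per call site, even
	// though the constant is identical every time.
	fmt.Print("hello123456789\n")
	fmt.Print("hello123456789\n")
	fmt.Print("hello123456789\n")
}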

@josharian (Contributor) commented Apr 1, 2017

We share the string data, but not (yet) the statictmp.
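
A rough way to see that distinction from user code (an implementation-dependent sketch, not from the thread; the eface struct below just mirrors the runtime's empty-interface layout):

package main

import (
	"fmt"
	"unsafe"
)

// eface mirrors the runtime's empty-interface representation: a type
// pointer and a data pointer. For a string constant stored in an
// interface{}, the data pointer refers to a read-only statictmp holding
// the string header, which in turn points at the shared string bytes.
type eface struct {
	typ  unsafe.Pointer
	data unsafe.Pointer
}

func main() {
	var a interface{} = "hello123456789\n"
	var b interface{} = "hello123456789\n"
	pa := (*eface)(unsafe.Pointer(&a)).data
	pb := (*eface)(unsafe.Pointer(&b)).data
	// While statictmps are per call site, pa and pb point at two distinct
	// symbols even though the string bytes behind them are shared; once
	// statictmps are deduplicated they can be the same pointer.
	fmt.Println(pa == pb)
}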

@bradfitz changed the title from "Size of Compiled is Big!" to "cmd/compile: share statictmp" Apr 1, 2017
@bradfitz added the Performance label Apr 1, 2017
@bradfitz added this to the Unplanned milestone Apr 1, 2017
@QuestionPython (Author) commented Apr 1, 2017

my go file as input:

package main
import (
    "fmt"
    //"html"
    //"log"
    //"net/http"
)
func main() {
	fmt.Print("hello123456789\n")
	fmt.Print("hello123456789\n")
	fmt.Print("hello123456789\n")
	// ... repeated 10,000 times in total
}

my c file as input:

#include <stdio.h>

int main()
{
	printf("hello123456789\n");
	printf("hello123456789\n");
	/* ... repeated 10,000 times in total */
}

For C I use gcc; for Go, go1.8. My OS is Ubuntu.

@gopherbot commented Apr 29, 2017

CL https://golang.org/cl/42170 mentions this issue.

josharian added a commit to josharian/go that referenced this issue May 2, 2017
DO NOT SUBMIT

[generates broken code]

To avoid allocating when placing a constant in an interface,
we generate a static symbol containing that constant's value.

However, constants tend to re-occur.
This CL makes the symbols content-addressable,
so that repeated constants can share them.

For the code like that in golang#19818 (1024 fmt.Printlns in a main),
this reduces the object file size by 10%.

name  old time/op       new time/op       delta
Pkg         161ms ± 2%        147ms ± 2%   -8.28%  (p=0.008 n=5+5)

name  old user-time/op  new user-time/op  delta
Pkg         194ms ± 5%        180ms ± 4%   -7.06%  (p=0.016 n=5+5)

name  old alloc/op      new alloc/op      delta
Pkg        36.2MB ± 0%       34.7MB ± 0%   -4.09%  (p=0.008 n=5+5)

name  old allocs/op     new allocs/op     delta
Pkg          191k ± 1%         179k ± 0%   -6.43%  (p=0.008 n=5+5)

name  old object-bytes  new object-bytes  delta
Pkg          462k ± 0%         415k ± 0%  -10.18%  (p=0.008 n=5+5)

name  old export-bytes  new export-bytes  delta
Pkg          51.0 ± 0%         51.0 ± 0%     ~     (all equal)


Change-Id: I6e04f52b41fd77a2181466160795f35d13b491b4
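
The core idea of the CL, sketched outside the compiler (the symbols map, statictmpFor, and the use of a SHA-256 key are illustrative, not the actual implementation): key each static symbol by a hash of its contents, so a repeated constant resolves to the symbol that was already emitted.

package main

import (
	"crypto/sha256"
	"fmt"
)

// symbols maps a content hash to the name of the symbol already emitted
// for those bytes.
var symbols = map[[sha256.Size]byte]string{}

// statictmpFor returns a symbol name for data, reusing an existing symbol
// when identical contents have been seen before.
func statictmpFor(data []byte) string {
	h := sha256.Sum256(data)
	if name, ok := symbols[h]; ok {
		return name // same contents: share the existing symbol
	}
	name := fmt.Sprintf("stmp_%d", len(symbols))
	symbols[h] = name
	return name
}

func main() {
	a := statictmpFor([]byte("hello123456789\n"))
	b := statictmpFor([]byte("hello123456789\n"))
	fmt.Println(a, b, a == b) // the repeated constant maps to one symbol
}
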
josharian added a commit to josharian/go that referenced this issue May 10, 2017
DO NOT SUBMIT

[generates broken code]

@agnivade (Contributor) commented Jul 16, 2019

I get impressive results with 1.13beta1 now. My file has 10k fmt.Print("hello123456789\n")s.

$ time go1.13beta1 build foo.go
real	0m6.420s
user	0m14.874s
sys	0m0.233s

And binary size is 2.9MB.

So, compared to the earlier tip numbers above, the build is approximately 2x faster and the binary is about 9x smaller.

Here is the difference in the generated code for one fmt.Print line between 1.11.4 and 1.13beta1:

-0x0032 00050 (fmter.go:8)       XORPS   X0, X0
-0x0035 00053 (fmter.go:8)       MOVUPS  X0, ""..autotmp_0+528(SP)
-0x003d 00061 (fmter.go:8)       PCDATA  $2, $1
-0x003d 00061 (fmter.go:8)       LEAQ    type.string(SB), AX
-0x0044 00068 (fmter.go:8)       PCDATA  $2, $0
-0x0044 00068 (fmter.go:8)       MOVQ    AX, ""..autotmp_0+528(SP)
-0x004c 00076 (fmter.go:8)       PCDATA  $2, $2
-0x004c 00076 (fmter.go:8)       LEAQ    "".statictmp_0(SB), CX
-0x0053 00083 (fmter.go:8)       PCDATA  $2, $0
-0x0053 00083 (fmter.go:8)       MOVQ    CX, ""..autotmp_0+536(SP)
-0x005b 00091 (fmter.go:8)       PCDATA  $2, $2
-0x005b 00091 (fmter.go:8)       LEAQ    ""..autotmp_0+528(SP), CX
-0x0063 00099 (fmter.go:8)       PCDATA  $2, $0
-0x0063 00099 (fmter.go:8)       MOVQ    CX, (SP)
-0x0067 00103 (fmter.go:8)       MOVQ    $1, 8(SP)
-0x0070 00112 (fmter.go:8)       MOVQ    $1, 16(SP)
-0x0079 00121 (fmter.go:8)       CALL    fmt.Print(SB)
+0x0032 00050 (fmter.go:8)       XORPS   X0, X0
+0x0035 00053 (fmter.go:8)       MOVUPS  X0, ""..autotmp_191+544(SP)
+0x003d 00061 (fmter.go:8)       PCDATA  $0, $1
+0x003d 00061 (fmter.go:8)       LEAQ    type.string(SB), AX
+0x0044 00068 (fmter.go:8)       PCDATA  $0, $0
+0x0044 00068 (fmter.go:8)       MOVQ    AX, ""..autotmp_191+544(SP)
+0x004c 00076 (fmter.go:8)       PCDATA  $0, $2
+0x004c 00076 (fmter.go:8)       LEAQ    ""..stmp_0(SB), CX
+0x0053 00083 (fmter.go:8)       PCDATA  $0, $0
+0x0053 00083 (fmter.go:8)       MOVQ    CX, ""..autotmp_191+552(SP)
@odeke-em (Member) commented Oct 17, 2019

On my MacBook Pro, my results are almost just like @agnivade's:

$ echo -e 'package main\nimport "fmt"\nfunc main() {' > main.go && \
for ((i=0; i<10000; i++)) do echo -e 'fmt.Print("...........\\n")' >> main.go;done && \
echo -e '}' >> main.go && go fmt main.go && time go build -o binary main.go  && ls -lrth
main.go

real	0m0.302s
user	0m0.385s
sys	0m0.196s

total 6464
-rw-r--r--  1 emmanuelodeke  staff   273K 17 Oct 15:46 main.go
-rwxr-xr-x  1 emmanuelodeke  staff   2.9M 17 Oct 15:46 binary
@agnivade (Contributor) commented Oct 18, 2019

It's not "almost just like" 😛. It's an order of magnitude improvement. From 14s in user time to 0.3s. The assembly output is still the same though.

@odeke-em (Member) commented Oct 18, 2019

> It's not "almost just like" 😛. It's an order of magnitude improvement. From 14s in user time to 0.3s. The assembly output is still the same though.

Hahah, yes indeed! I noticed that, but I didn't want to post overly optimistic numbers here in case the speedup came from caching. Do you get similar results on your machine?

@agnivade (Contributor) commented Oct 18, 2019

I don't. I still get the same results as before with tip.

@odeke-em (Member) commented Oct 18, 2019