New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

go/format: add benchmarks #26528

Open
kevinburke opened this Issue Jul 22, 2018 · 7 comments

Comments

Projects
None yet
4 participants
@kevinburke
Contributor

kevinburke commented Jul 22, 2018

I maintain a (fork of a) tool called go-bindata that compiles static files into a binary. Often the compiled file is quite large - tens of megabytes are not uncommon. I call format.Source on this generated file to avoid gofmt thrash. However, I noticed recently that calling format.Source takes 60% of the runtime of the go-bindata program.

Currently there are no benchmarks for format.Source. It might be good to add some to see if there are quick wins to get that function to return more quickly.

A starting point may be the benchmark here: https://github.com/kevinburke/go-bindata/blob/master/benchmark_test.go. Committing an extremely large file to source might not be the best way to do this, though.

kevinburke added a commit to kevinburke/go-bindata that referenced this issue Jul 22, 2018

Improve performance
Previously calling format.Source would take up most of the runtime
(see golang/go#26528 for more info). But most of the file is already
formatted correctly, we mostly need to use go/format to whitespace
align the maps per the Go guidelines. Calling go/format on just this
part of the file yields a significant performance improvement by
avoiding calling it on the entire body of the file.

Use a bytes.Buffer as a file argument instead of an io.Writer which
allows using WriteByte and WriteString in places. Avoid calling
fmt.Fprintf with no arguments.

    name       old time/op    new time/op    delta
    Bindata-4     558ms ± 1%     216ms ± 6%   -61.34%  (p=0.008 n=5+5)

    name       old speed      new speed      delta
    Bindata-4  12.3MB/s ± 1%  31.8MB/s ± 6%  +159.03%  (p=0.008 n=5+5)

    name       old alloc/op   new alloc/op   delta
    Bindata-4     353MB ± 0%     162MB ± 0%   -54.21%  (p=0.008 n=5+5)

    name       old allocs/op  new allocs/op  delta
    Bindata-4     46.1k ± 0%     15.7k ± 0%   -66.00%  (p=0.008 n=5+5)
@josharian

This comment has been minimized.

Contributor

josharian commented Jul 22, 2018

I suspect go/format performance hotspots vary a lot based on input, with a long tail of pathological problematic inputs. Instead of a generic “add benchmarks” issue, which is hard to fix, let’s instead file performance issues for specific inputs, like go-bindata-style output.

We don’t want to add a large file to the repo, but I suspect you could create a suitable input (for reproduction purposes) on the fly, in memory.

Also, have you tried with 1.11 betas yet? I made some improvements to text/tabwriter this cycle that might help.

@josharian josharian added this to the Go1.12 milestone Jul 22, 2018

@kevinburke

This comment has been minimized.

Contributor

kevinburke commented Jul 22, 2018

Also, have you tried with 1.11 betas yet? I made some improvements to text/tabwriter this cycle that might help.

I have, thanks! I'm not sure about the benchmarking output. When we use format.Source on the whole file, it looks slower.

1.9.7: https://travis-ci.org/kevinburke/go-bindata/jobs/406748540
tip: https://travis-ci.org/kevinburke/go-bindata/jobs/406748542

I just merged a change that only calls format.Source on a small part of the file. With that change it looks like there's an improvement between 1.9.7 and tip.

https://travis-ci.org/kevinburke/go-bindata/jobs/406757117
https://travis-ci.org/kevinburke/go-bindata/jobs/406757130

@kevinburke

This comment has been minimized.

Contributor

kevinburke commented Jul 22, 2018

So what are my options here for a benchmark? I am guessing these are not possible:

  • importing github.com/kevinburke/go-bindata from the benchmark
  • checking in large file(s) into the testdata
  • adding a benchmark in the x repos

What about downloading a large file from the public internet, and skipping the benchmark if the download fails? This one for example would be suitable. https://github.com/kevinburke/go-bindata/blob/master/testdata/assets/bindata.go

Otherwise, I suppose I could check in a medium size photo (or use an existing one in the git repository), and manually reconstruct the process of building the bindata file.

@josharian

This comment has been minimized.

Contributor

josharian commented Jul 22, 2018

Skimming the beginning of that file, I'm guessing you could get similar performance behavior with just something like:

package p

var x1 = []byte("some very very very long string created with bytes.Repeat")

var x2 = []byte("another long string")

var x3 = []byte("as many of these as you need to recreate your performance characteristics")

That should be easy to create in memory with a plain old for loop. In any case, I'd start there and use pprof profiles to convince yourself that the cpu/memory behavior is similar. If it isn't, tweak the simple generated file until it is. If somehow (which I would find very surprising) the contents of the strings involved matter, just generate random bytes using math/rand.

@gopherbot

This comment has been minimized.

gopherbot commented Jul 23, 2018

Change https://golang.org/cl/125437 mentions this issue: go/format: add benchmark

@bronze1man

This comment has been minimized.

bronze1man commented Jul 24, 2018

@kevinburke
You have another choose that you can skip the step format.Source to your generated go files. Almost no one will read your generated files,I think it is ok to leave it unformat, you just need it do not change with the same input data (like sort the import path part).
You can also generate a formated generated go files. It is not so hard then it looks like.
Both way can save you a lot of generate time.

I used to use format.Source to format all my generated files, then I find that simple skip the format step can save me more than 50% of the generate time.
So I just skip the format step.

@gopherbot

This comment has been minimized.

gopherbot commented Aug 5, 2018

Change https://golang.org/cl/127928 mentions this issue: go/format: benchmark performance on large autogen file

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment