cmd/go: build dependent packages as soon as export data is ready #15734

josharian · 2016-05-18T21:44:50Z

This is a trace of the activity on an 8 core machine running 'go build -a std':

For those who want to explore more, here is an html version. (Hint: Use a, d, w, and s keys to navigate.)

There are a few early bottlenecks (runtime, reflect, fmt) and a long near linear section at the end (net, crypto/x509, crypto/tls, net/http). Critical path scheduling (#8893) could help some with this, as could scheduling cgo invocations earlier (#15681). This issue is to discuss another proposal that complements those.

We currently wait until a package is finished building before building packages that depend on it. However, dependent packages only need export data, not machine code, to start building. I believe export data could be available once we're done with escape analysis and closure transformation, before we run walk.

For the bottlenecks listed above:

| package | time until export data available | total compilation time |
| --- | --- | --- |
| runtime | 226ms | 1300ms |
| reflect | 174ms | 960ms |
| fmt | 33ms | 229ms |
| net | 114ms | 846ms |
| crypto/x509 | 66ms | 253ms |
| crypto/tls | 82ms | 461ms |
| net/http | 168ms | 1310ms |

Though slightly optimistic (writing export data isn't instantaneous), this does suggest that this would in general significantly reduce time spent waiting for dependencies to compile.

This pattern of large, slow, linear dependency chains also shows up in bigger projects, like juju.

@rsc implemented one enabling piece by adding a flag to emit export data separately from machine code.

Remaining work to implement this, and open questions:

- Emitting export data before walk means that inlined functions would get walked and expanded at use rather than at initial package compilation. Does this matter? If so, an alternative is to change the compiler structure to walk all functions and then compile all functions. Would this increase high water memory mark?
- How would the compiler signal to cmd/go that it is done emitting export data? I don't know of a clean, simple, portable cross-process semaphore.
- This would be a pretty major upheaval in how cmd/go schedules builds. Making this work more fine-grained would be useful anyway, but it'd be a lot of high risk change.

Given the scope of the change, I'm marking this as a proposal. I'd love feedback.

bradfitz · 2016-05-18T22:05:21Z

Love it. Also glad that traceview ended up working out.

> How would the compiler signal to cmd/go that it is done emitting export data? I don't know of a clean, simple, portable cross-process semaphore.

Localhost TCP? Later: multi-machine TCP and making cmd/compile and cmd/link use a VFS (and network impls) rather than the os package for file access. Imagining running a large build on a cloud machine and cmd/go spinning up some helper Kubernetes (or whatever) containers to speed the build, then going away when the build is done, paying for them by the number of seconds they were running.

mdempsky · 2016-05-18T22:11:01Z

> How would the compiler signal to cmd/go that it is done emitting export data?

A relatively simple (but local-only) way would be:

1. Each time cmd/go execs cmd/compile, it creates a new os.Pipe, passes the writing end as one of cmd/compile's ExtraFiles, along with a command line flag like -signalexportdata.
2. In cmd/compile, if we see -signalexportdata, we close the pipe FD after writing out export data to disk.
3. Back in cmd/go, when we see the pipe has been closed, we can assume export data is written out.

I also suspect you don't actually need separate export data file functionality for this. Since the export data is at the beginning of the .a file, we can just write a partial .a file, signal cmd/go, and then finish writing later.

griesemer · 2016-05-18T22:16:25Z

Interesting. I suspect also that the time from start of compilation until the time the export data becomes available will become smaller over time (faster frontend), while the backend may become slower (relatively), due to more powerful optimizations. Seems like a good idea to me.

mwhudson · 2016-05-18T22:18:58Z

Only vaguely connected random idea: If the export data hasn't changed, can you skip the compilation of dependent packages?

mdempsky · 2016-05-18T22:22:06Z

@mwhudson I think so, but that could be done even without this proposal.

josharian · 2016-05-19T17:47:07Z

This proposal also got discussed at #15736. Based on that discussion, I will re-evaluate this proposal once cmd/compile itself is more concurrent.

@mwhudson moved your suggestion to #15752.

josharian · 2017-07-06T20:29:54Z

@rsc is it safe to assume that this will be more or less independent of your 1.10 cmd/go work?

rsc · 2017-10-25T04:34:16Z

I'm very skeptical this is worth the complexity. It would require support for "half-completed" actions in the go command where the compile step half-completes early and then fully-completes later. I don't believe the payoff here would be worth the significant increase in complexity. I guess you could run two compiles, so that you generate the export data first and then the whole object second, but that just does more work overall. Even if it improves latency in certain cases, more work overall is a net loss.

Critical path scheduling or just working on making the compiler faster seems like a better use of time.

rsc · 2019-04-30T15:49:18Z

Especially with good caching I think this is less and less important, and no less complex to implement. Closing, to better reflect our intention not to do this.

josharian added this to the Proposal milestone May 18, 2016

josharian added the ToolSpeed label May 18, 2016

This was referenced May 18, 2016

runtime: refactor into separate subpackages #11647

Open

cmd/go: add timing information to -debug-actiongraph #15736

Closed

josharian mentioned this issue May 19, 2016

cmd/go: only rebuild dependent packages when export data has changed #15752

Open

josharian self-assigned this May 19, 2016

mdempsky mentioned this issue Aug 17, 2016

cmd/go: schedule cgo compilation early #15681

Open

bradfitz changed the title ~~proposal: build dependent packages as soon as export data is ready~~ cmd/go: build dependent packages as soon as export data is ready Aug 22, 2016

bradfitz modified the milestones: Unplanned, Proposal Aug 22, 2016

josharian mentioned this issue Oct 24, 2016

cmd/compile: improve inlining cost model #17566

Open

josharian mentioned this issue Nov 3, 2016

build: all.bash fails to saturate 6 cores #17751

Open

josharian mentioned this issue Apr 8, 2017

cmd/compile: parallelize compilation #15756

Closed

josharian mentioned this issue Apr 21, 2017

cmd/compile: read and process import data lazily #20070

Closed

rsc closed this as completed Apr 30, 2019

golang locked and limited conversation to collaborators Apr 29, 2020

gopherbot added the FrozenDueToAge label Apr 29, 2020

rsc unassigned josharian Jun 23, 2022
