cmd/go: schedule cgo compilation early #15681

Open
josharian opened this Issue May 14, 2016 · 10 comments

@josharian
Contributor

josharian commented May 14, 2016

From a quick scan of the code, it appears that cgo does not depend on a package's dependencies having been compiled. Given that, and given that cgo (and the resulting C compiler invocations) are generally very slow, it is probably worth scheduling all invocations of cmd/cgo at the very beginning of a build. (As a corollary, this might also mean scheduling the compilation of cmd/cgo itself early.)

cc @ianlancetaylor for input about whether cgo invocations can be safely pushed to the head of the queue.

One downside: It is unclear whether this is just a poor man's version of #8893, which didn't appear to yield much fruit. That could use more investigation.

@josharian josharian added this to the Go1.8 milestone May 14, 2016

@bradfitz bradfitz added the ToolSpeed label May 14, 2016

@ianlancetaylor

Contributor

ianlancetaylor commented May 14, 2016

I agree that cgo (and SWIG) can be run immediately, without waiting for any Go files to be compiled.

Dmitry's CL for #8893, and your version of it, prove nothing about this one way or the other, as they do not break out the cgo portion of building a package from the rest of building a package. All the cgo support is wrapped up in the same function that compiles the package files: builder.build.

@petermattis

petermattis commented May 16, 2016

We can also parallelize the C compiler invocations within a package, which can give a very big compilation speedup for a cgo-heavy project. See https://go-review.googlesource.com/#/c/4931/.

@josharian

Contributor

josharian commented May 16, 2016

Thanks, @petermattis, I was just trying to find that CL. If I do any serious surgery here, I will see about making individual C compilation fine-grained.

@mdempsky

Member

mdempsky commented Aug 15, 2016

As mentioned in #16623, I propose the opposite approach: instead of trying to push cmd/cgo earlier, let's push the C compilations later.

Currently the only reason we have to wait for the C sources to finish compiling is so we can run cmd/cgo -dynimport and generate a bunch of //go:cgo_import_dynamic directives. But cmd/compile doesn't actually need these directives: it simply stashes them into the compiled package artifact so cmd/link can find them.

If cmd/go were responsible for saving the directives instead, we could run cmd/compile immediately after the first cmd/cgo run. Then Go package compilation would never be blocked waiting on C compilations, and the C compilations could all run in parallel, blocking only the link operations that depend on them.

@josharian

Contributor

josharian commented Aug 16, 2016

> let's push the C compilations later

C compilation is slow. Don't we want it to be as early as possible, so that it isn't the lone straggler at the end?

I agree that it'd be very good to not have to wait for cgo/C to be done to start compiling Go; I just want to make sure we don't push C to the end as a consequence of that.

@mdempsky

Member

mdempsky commented Aug 17, 2016

I don't mean C compilations need to be delayed per se, but they're trivially parallelizable and nothing fundamentally depends on them except for the linker. On the other hand, Go compilations do necessarily depend on other compilations, so it seems beneficial to prioritize scheduling them to unblock more work. E.g., in your graph for #15734, we have a long bottleneck for package runtime at the beginning, but it looks like we have more than enough idle CPUs later in the build to handle the C compilations.

My hypothesis is that if we're able to 1) remove the unnecessary dependency from Go compilations on C compilations, and 2) implement something like #15734; then the Go dependency graph's scheduling delays should essentially flatten, allowing us to naturally schedule C compilations earlier.

I suppose what would be really beneficial here is to collect fine-grained trace timing data, and then analyze how much more optimally it could have been scheduled if we relax various dependencies.
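One rough way to do that kind of analysis is to compare critical-path lengths (the best possible wall-clock time with unlimited CPUs) before and after relaxing a dependency. The sketch below is only an illustration: the task names and durations are made up, not measured trace data, and `task`, `graph`, and `criticalPath` are hypothetical helpers. It assumes the graph is acyclic.

```go
package main

import "fmt"

// task is one build step with a duration in arbitrary time units.
type task struct {
	dur  int
	deps []string
}

// criticalPath returns the length of the longest dependency chain,
// i.e. the earliest time at which every task could have finished.
// It assumes the graph has no cycles.
func criticalPath(tasks map[string]task) int {
	memo := map[string]int{}
	var finish func(name string) int
	finish = func(name string) int {
		if v, ok := memo[name]; ok {
			return v
		}
		start := 0
		for _, d := range tasks[name].deps {
			if f := finish(d); f > start {
				start = f
			}
		}
		memo[name] = start + tasks[name].dur
		return memo[name]
	}
	longest := 0
	for name := range tasks {
		if f := finish(name); f > longest {
			longest = f
		}
	}
	return longest
}

// graph builds a made-up example: package p uses cgo, q imports p,
// r imports q. goWaitsForC controls whether compiling p's Go code
// depends on p's C compilation.
func graph(goWaitsForC bool) map[string]task {
	compileP := task{dur: 2, deps: []string{"cgo p"}}
	if goWaitsForC {
		compileP.deps = append(compileP.deps, "cc p")
	}
	return map[string]task{
		"cgo p":     {dur: 1},
		"cc p":      {dur: 10, deps: []string{"cgo p"}},
		"compile p": compileP,
		"compile q": {dur: 2, deps: []string{"compile p"}},
		"compile r": {dur: 2, deps: []string{"compile q"}},
		"link":      {dur: 1, deps: []string{"compile r", "cc p"}},
	}
}

func main() {
	fmt.Println("Go waits for C:", criticalPath(graph(true)))  // prints 18
	fmt.Println("Go does not:   ", criticalPath(graph(false))) // prints 12
}
```

With real trace timings substituted for the invented durations, the same comparison would show how much a given dependency costs. It ignores CPU limits, so it gives a lower bound rather than an achievable schedule.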

@josharian

Contributor

josharian commented Aug 17, 2016

> I don't mean C compilations need to be delayed per se

Ack

Seems plausible. Definitely worth a run. The cmd/go tracing stuff is near the bottom of my list of pending CLs to get mailed/fixed/submitted, but I will get to them eventually. :) But I think the data about C and cgo is probably clear enough already that we can just move forward with your approach.

@rsc rsc modified the milestones: Unplanned, Go1.8 Oct 21, 2016

@pwaller

Contributor

pwaller commented Apr 5, 2017

Pinging the thread while I wait here for 31 idle cores to complete a several minute CGo compilation... 🍅

@petermattis

petermattis commented Apr 5, 2017

@pwaller If you're willing to rebuild your Go toolchain (which is really quite easy), see https://github.com/cockroachdb/cockroach/blob/master/build/parallelbuilds-go1.8.patch. The patch applies cleanly to go1.8 and we use it for development of CockroachDB.

@mdempsky

Member

mdempsky commented Apr 12, 2018

I played with this a little last night. In particular, I changed cgo -dynimport to directly write a dummy .o object file with embedded cgo directives (as opposed to writing out a .go file), then cmd/go just needs to append it to the .a archive. cmd/link automatically does the right thing.

The main limiting factor then is that cmd/go's build graph currently uses a single monolithic Action to represent package compilation, but ideally we'd separate cgo compilation into (at least) two Actions:

  1. Run cgo to generate .c and .go files, and run go tool compile -o pkg.a -linkobj pkg.la pkg/*.go. At this point, pkg.a (the compiler's export data output) is fully ready for downstream Go compilations.
  2. Run cmd/asm and gcc to compile all the other compilation units within the package, and append them to the .la file. Once those are all done, the .la file is ready for any dependent cmd/link operations.

The actions could probably be broken down even further (e.g., to parallelize the C compilations) or given finer-grained dependencies (e.g., using gcc's -M flag to recognize when individual .o files need to be rebuilt), but that first split is probably the biggest win for most cgo users.
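For the gcc -M idea, the compiler prints a Makefile-style rule listing the files a compilation unit depends on. A minimal sketch of reading that rule follows; `parseDepRule` is a hypothetical helper, and it assumes a single rule with no escaped spaces or colons in file names (so it would not handle Windows drive letters).

```go
package main

import (
	"fmt"
	"strings"
)

// parseDepRule splits one Makefile-style rule, as printed by gcc -M,
// into its target and the list of prerequisite files.
// Assumption: file names contain no spaces or colons.
func parseDepRule(out string) (string, []string, error) {
	// A backslash followed by a newline continues the rule on the next line.
	joined := strings.ReplaceAll(out, "\\\n", " ")
	target, rest, ok := strings.Cut(joined, ":")
	if !ok {
		return "", nil, fmt.Errorf("no ':' in dependency rule %q", out)
	}
	return strings.TrimSpace(target), strings.Fields(rest), nil
}

func main() {
	// Example text in the shape gcc -M produces; the file names are made up.
	out := "foo.o: foo.c foo.h \\\n  bar.h\n"
	target, deps, err := parseDepRule(out)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(target, deps) // prints: foo.o [foo.c foo.h bar.h]
}
```

A build tool could then compare the modification times (or content hashes) of the listed files against the existing .o file to decide whether that one unit needs recompiling.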
