cmd/compile: read and process import data lazily #20070
As an experiment, I changed the compiler to report the number of packages, types, and functions built up as a result of processing imports for a given package:
This program has no imports, so no overhead:
This program imports fmt, which turns out to mean processing 38 types and 20 functions, only one of which is needed:
This program imports net/http, which turns out to mean processing 231 types and 857 functions, only one of which is needed:
The overhead here is more than 100X. Many real-world packages are even bigger than the ones in the standard library.
In a typical source file or package, especially during an incremental build, I suspect the compiler spends significantly more time loading and type-checking all these unnecessary imported definitions than it does on the necessary ones and on the actual source file.
Now that we have a clean encapsulation of import/export format handling, it seems like the next step should be to make the import/export format indexed in some way, so that it can be consulted on demand during the compilation rather than loaded entirely up front. That is, when you import "fmt" you'd open the export data for fmt, and then when you get to fmt.Println you'd pull in just what you need to understand fmt.Println (in this case, likely just the function type).
Similarly, for the HTTP server, invoking http.ListenAndServe would mean pulling in the function type and possibly then also learning about http.Handler, since that is the type of the second argument. That might in turn require learning about some more types, since Handler's ServeHTTP takes an http.ResponseWriter and a *http.Request. The latter is a struct with many other types, and it might require loading some large number of other types. Ideally this could be cut off through more lazy loading (none of this is required to type check that the argument implements http.Handler).
The problem will only get worse as packages grow. For protobuf-generated code, where the export data summarizes essentially the entirety of all imported packages recursively, not loading this information into the compiler until it is needed would be a big win.
Even in the main repo, there are indications this could be a big win. Consider a full build of cmd/go (just to take a decent-sized program) and its dependencies, compiled with inlining completely disabled:
Here's the same full build with our current default inlining:
And here's the same full build with the more aggressive mid-stack inlining:
The difference here is that more inlining means more function bodies that must be imported and type-checked. My guess is that, for any single package being compiled, nearly all of the function bodies included in export data are irrelevant (but which ones are irrelevant changes depending on the package being compiled).
I'm assuming here that the increase in time is due to importing the new data and not writing it out. I haven't checked that, but the export data is typically read back many times but only written out once (or zero times) during a given build, so I think that's a reasonable assumption.
It's obviously too late for Go 1.9, but it seems like this would be an effective way to speed up the compiler in the Go 1.10 cycle.
@mdempsky and I were just discussing this as well. I noticed it with the cmd/compile/internal/ARCH packages, whose compilation time appears to be dominated by reading the import data for the ssa package, which has lots of export data. @mdempsky was going to look into whether we can reduce the size of that export data by making better decisions about which functions might potentially be inlined by other packages, which might (maybe) happen for 1.9, and which would help both writing and reading.
One other bit of anecdotal evidence that this matters came from @davecheney's observation that the biggest compile time speedup last cycle came from switching to the binary export format.
Nice observation. I'd argue that in a more realistic scenario, a single package probably uses more functionality from an imported package; and that same package may be imported multiple times by a larger package (and then the import happens only the very first time), so the overhead is present but perhaps not as extreme. But there's no denying that this doesn't scale well and needs to be addressed eventually.
Here are some approaches that come to mind:
I suspect 1) could be done even for 1.9 if we permit say 1 or 2 weeks extra past the official freeze for this.
True, but the importing and type-checking are not the whole story. Inlining also means that functions get larger, which means more nodes, more time in the backend, more progs, etc. I'd wager that those effects are also non-trivial.
I still think this is a good idea, I'm just not sure that the inlining data is clear evidence for it, or that it will necessarily make mid-stack inlining much cheaper, compilation-wise.
One more thing: I suspect it's a common scenario that a package A, which imports a package C, partially exports pieces of C in the export data of A. Another package B may do the same, partially exporting a part of C, maybe the same as A did, maybe a different part.
A client of A and B now needs to import multiple parts of C, possibly different ones, possibly overlapping ones. Across a sufficiently large program it could be much more efficient to import C in full exactly once.
Could this finer-grained importing be turned around to create finer-grained dependences? I worked on a Brand X The-Java-Programming-Language™ bytecode-to-native static compiler that did this down to the field-offset level, and it was nice. Obviously we can end up spending more time on dependence paperwork than redoing the work, but perhaps there's a sweet spot at a finer granularity than current.