-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cmd/compile: objects emit generic instantiations for imported packages #56718
Comments
I think this is blocked by #47448. |
@cherrymui points out it should be possible to cross-reference DUPOK symbols from other packages by just not defining them. This appears to demonstrate it working correctly across packages: https://go.dev/play/p/0SheVE56Lhm |
Agreed that #47448 is not fundamental to emitting undefined references to things in other packages. In this case, however, the object file has no need to mention |
Signal boost from Google-internal issue b/340857204: It seems like this issue has become more pressing after CL 578815 added more generics to the Every Go Protobuf generated code file ( Despite not using the newly added symbols at all, the .a files for these Protobuf generated code targets have grown. Because Google programs use a large number of Protobufs, the collective size increase is pronounced enough to push many Google-internal programs over the edge in terms of allowed link action input size (“Forge: Total size of inputs too large”). Would it be possible to prioritize this issue to get the .a file size down again? Thanks. |
I was looking into the issues mentioned in #29561 and #50438, which led me to conclude that this is the main culprit behind the bloated build cache and long compilation times that we're seeing. I posted a detailed investigation with numbers from a real-world codebase in #50438, but the short of it is that our codebase (which a year or so ago compiled in seconds) takes several minutes to compile following large-scale refactoring involving generics, and we end up with a 5 GB build cache with many individual ~300 MB package archives, most of which seems to be duplicated generic instantiations. The artifact sizes are further exacerbated by the long type names mentioned in #50438. You can find a MWE at https://github.com/arvidfm/go-cache-bloat which shows how you can produce a 50 MB package archive from a 15 line package within a 130 line codebase. This is a major issue for us as the slowdown from the recompilation of the instantiations makes it harder for us to iterate on our code, and the resulting size of the build cache artifacts made our CI environment run out of space so frequently that we had to create a script to prune our build cache before each build. |
With the use of atomic.Pointer in package sync, this program is now generating the full compiled code for atomic.Pointer into its object file even though it's not being used at all:
The part I didn't omit with ... is all that should be here, plus maybe a package init. The omitted part appears to be the entire compilation of four different instantiations of sync/atomic.Pointer, plus all their supporting type declarations:
Given that there is no code being emitted for this package at all, there's no need to emit all of this. (It's possible they are referenced from the definition of
go:info.sync.Map
, but that is not being emitted here either. The generated code correctly assumes that will be provided by package sync's object file.)Overall the size increase is 33X since Go 1.19:
Obviously there is a small denominator in that 33X, but there is also very little use of generics yet.
There is a deeper problem than dead code here too.
When we were talking about instantiation in the early days of generics, one of the ways we discussed to avoid code size issues would be for packages to know which instantiations they use are already present in packages they import, so that definitions would not need to be generated repeatedly in multiple packages. For example, since package sync generated the definition for atomic.Pointer[any] any program directly or indirectly importing package sync can use atomic.Pointer[any] “for free” without itself regenerating that code. Of course, there's a search problem for answering the question of whether any package in the transitive import closure has already generated the code. But at the least, if we see a type in the ordinary course of reading export data for imported packages, we should mark it as not needed to be regenerated: anything in export data is handled by the package that wrote the export data. Applying that rule has no search problem.
In the standard library we are barely starting to use generics, yet we have 15 packages whose objects repeat the atomic.Pointer code from sync, 13 that repeat atomic.Pointer code from go/token, 9 that repeat an atomic.Pointer[string] from internal/godebug (in my working tree), and three that repeat an atomic.Pointer[response] from net/http:
For what it's worth, the objects we shipped in Go 1.19 appear to have the same duplication of atomic.Pointer[go/token.File], so the bug is not new.
15 doesn't look so bad, but when compiling Kubernetes we end up with 122 copies each of four different atomic.Pointer instantiations from package sync:
If this is easy to fix it would be good to do it for Go 1.20. If not, then since Go 1.19 appears to behave the same way, it's probably not strictly a release blocker.
The text was updated successfully, but these errors were encountered: