cmd/compile: long symbol names for instantiated generics => large object files (though not executables) #50438
It would help if you could provide a small single source file that produces a large output using go tool compile -o x.a x.go
Here is a single file of about 1500 lines. https://gotipplay.golang.org/p/kMBLHxigsDo
With the test case above, the largest symbol in the object file is
It's 30111 bytes (according to This may be working as expected, but CC @randall77 @danscales in case it is not.
https://gotipplay.golang.org/p/8WZz12Ay6vz This is almost the same code, but the Option, Try, HCons and Iterator types are changed to struct types.
The object file size has been reduced to 17436020. It seems that object files are created inefficiently when generic interfaces refer to each other. I therefore think it would be better not to use type parameters with interface types in production, to reduce compile time.
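As a rough illustration of the change described above, the following sketch contrasts a generic interface whose methods return generic interface types with a struct-based equivalent. It is not the code from the linked playground or from csgura/fp; the names OptionI, OptionS, Some and None are hypothetical and chosen only for this example.

```go
package main

import "fmt"

// OptionI is a hypothetical generic interface whose method returns
// another generic interface type (the pattern reported as costly above).
type OptionI[T any] interface {
	IsDefined() bool
	Get() T
	OrElse(other OptionI[T]) OptionI[T]
}

// OptionS is a hypothetical struct-based alternative that offers the
// same operations on a concrete type instead of an interface.
type OptionS[T any] struct {
	value   T
	present bool
}

// Some wraps a value that is present.
func Some[T any](v T) OptionS[T] { return OptionS[T]{value: v, present: true} }

// None returns an empty option.
func None[T any]() OptionS[T] { return OptionS[T]{} }

func (o OptionS[T]) IsDefined() bool { return o.present }

// Get returns the stored value, or the zero value of T if absent.
func (o OptionS[T]) Get() T { return o.value }

// OrElse returns o if it holds a value, otherwise other.
func (o OptionS[T]) OrElse(other OptionS[T]) OptionS[T] {
	if o.present {
		return o
	}
	return other
}

func main() {
	fmt.Println(Some(42).OrElse(None[int]()).Get())
	fmt.Println(None[string]().OrElse(Some("fallback")).Get())
}
```

Whether a given codebase sees a similar size reduction would need to be measured; this only shows the shape of the refactoring.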
As Ian's symbol example shows, the symbols can be quite large for the names of instantiated functions/methods, when the type arguments are instantiated types that are nested and the descriptions of some of the underlying types (e.g. interfaces) are large. For the name of a shape type (the proxy type standing for all the types that a particular instantiation will handle), we use the standard printing (via LinkString) of the shared underlying type (e.g
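To get a feel for how names grow when instantiated types are nested as type arguments, here is a small sketch. It prints the length of the runtime type name reported by the reflect package, which is only an analogy: it is not the linker symbol name or the LinkString output the compiler uses. Pair, L1, L2, L3 and typeNameLen are hypothetical names introduced for this example.

```go
package main

import (
	"fmt"
	"reflect"
)

// Pair is a hypothetical generic type used only for illustration.
type Pair[A, B any] struct {
	First  A
	Second B
}

// Each level uses the previous instantiation as both type arguments,
// so the fully spelled-out type name contains the previous name twice.
type (
	L1 = Pair[int, string]
	L2 = Pair[L1, L1]
	L3 = Pair[L2, L2]
)

// typeNameLen returns the length in bytes of the runtime type name of v.
// This is the reflect name, used here as a stand-in for symbol names.
func typeNameLen(v any) int {
	return len(reflect.TypeOf(v).String())
}

func main() {
	for i, v := range []any{L1{}, L2{}, L3{}} {
		fmt.Printf("depth %d: type name is %d bytes\n", i+1, typeNameLen(v))
	}
}
```

Each extra level of nesting embeds the full name of the previous level, which is the kind of growth that makes deeply nested instantiations expensive to name.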
No. This is something for a release note, not a compiler warning. Everything works correctly, it just takes more space than expected. The compiler never issues warnings anyhow.
FWIW, I've written a few tests to show that while package archives are definitely larger (usually 2x), the resulting binary executable shows no real difference -- https://github.com/akutz/go-generics-the-hard-way/blob/main/06-benchmarks/03-file-sizes.md.
@danscales This is in the 1.18 milestone; time to move to 1.19? Thanks.
Yes, I'll move to 1.19. Thanks!
I spent some time digging into this as we've been having significant issues with the build cache growing much too large, even with the type hashing introduced with Go 1.22 to fix #65030. I've created a MWE based on my findings which generates a 50 MB object file from ~130 lines of code: https://github.com/arvidfm/go-cache-bloat. The large object files seem to be a result of several factors compounding to make the issue even worse:
For us this adds up to several ~300 MB package archives being generated as part of the compilation, totalling ~5 GB for a single compilation from a clean cache. Making a single change to a central package results in most other packages being recompiled, resulting in another 5 GB, which quickly adds up.
I suspect (5) is unavoidable (and if anything desirable performance wise were it not for the other issues), and (3) is probably a requirement for interfaces to work. Hopefully something can be done about (1), (2) and (4), however. Maybe someone more familiar with the compiler internals could chime in to note which of these might be easiest to tackle. (It seems to me like it should be possible to avoid instantiating a type if the same instance is already present in one of the imported package archives.)
To better explain what I mean by (4), consider the following example, where the object files for both the
// a/a.go
package a
type GenericType[T any] struct{}
func (GenericType[T]) GenericFunc() {}
type A struct{}
func (A) AFunc(GenericType[int]) {}
// main.go
package main
import "example.com/cache/a"
type B struct {
A a.A
}
func main() {}
This compounds for long dependency chains like
To get a better idea of specifically where the bloat is coming from, I ran a test on a real-world codebase that makes heavy use of generics. I created a new package containing a single file which calls a single generic function, referencing a type from another package that in turn results in a chain of generic type instantiations. The package looked something like this:
package mytest
import (
"github.com/blah/blah/core"
"github.com/blah/blah/users"
)
func doThing() {
core.DoAction[users.User]()
}
Running
I suspect the vast, vast majority of data here is from duplicate method instantiations already present in other package archives. Each individual entry in the reloc section is small, but there are just so many of them that they add up, presumably due to the duplicate method instantiations. As for how to best mitigate the issue, this is the best I've managed to come up with in terms of practical advice for the end user:
Of course, most of this is absolutely horrible advice from a maintainability, readability and runtime performance perspective, and not particularly helpful if you've already implemented a certain architecture and can't afford to break compatibility, so I do hope that the issue can be fixed on the compiler side.
Turns out that there is an open issue for the duplicate generic instantiation (or at least a specific case of it): #56718
#50438 (comment) thank you for the breakdown. I am not sure when I will have cycles to investigate this, but I would like to so I will optimistically assign this to myself.
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (go env)?
go env Output
What did you do?
What did you expect to see?
A build cache directory of a reasonable size, or a warning about an incorrectly used generic type.
What did you see instead?
Please excuse my poor English.
It seems to be related to this issue as well. #50204
I have written some algebraic data types to test the generics in Go 1.18 (https://github.com/csgura/fp.git).
After running the tests, I noticed that the build cache was using a very large amount of disk.
I guess the cause lies in some recursive types (HList and curried Func) and in interface types that use type parameters.
When I modified the code so that a generic interface type does not return another generic interface type, the size of the build cache was significantly reduced.
This fix is applied in the master branch.
It is not a big problem right now, but I think it will become one as more and more projects use generics.