cmd/compile: long symbol names for instantiated generics => large object files (though not executables) #50438
Comments
It would help if you could provide a small single source file that produces a large output using go tool compile -o x.a x.go |
Here is a single file of about 1500 lines. https://gotipplay.golang.org/p/kMBLHxigsDo
|
With the test case above, the largest symbol in the object file is
It's 30111 bytes (according to This may be working as expected, but CC @randall77 @danscales in case it is not. |
https://gotipplay.golang.org/p/8WZz12Ay6vz It is almost the same code, but the Option, Try, HCons and Iterator types are changed to struct types.
The object file size has been reduced to 17436020. It seems that object files are created inefficiently when generic interfaces refer to each other. Because of this, I think it would be better not to use type parameters with interface types in production code, to reduce compile time. |
As Ian's symbol example shows, the symbols can be quite large for the names of instantiated functions/methods, when the type arguments are instantiated types that are nested and the descriptions of some of the underlying types (e.g. interfaces) are large. For the name of a shape type (the proxy type standing for all the types that a particular instantiation will handle), we use the standard printing (via LinkString) of the shared underlying type (e.g |
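A minimal sketch of why these names grow so quickly, assuming a hypothetical Pair type that is not part of the test case above. It measures the runtime type name reported by reflect, which is not the linker symbol name discussed here, but it follows the same pattern of spelling out every nested type argument in full:

```go
package main

import (
	"fmt"
	"reflect"
)

// Pair is a hypothetical generic type used only for this illustration.
type Pair[A, B any] struct {
	First  A
	Second B
}

// typeNameLen returns the length of the runtime type name of T.
func typeNameLen[T any]() int {
	return len(reflect.TypeOf((*T)(nil)).Elem().String())
}

// Each level nests the previous one in both type arguments.
type (
	level1 = Pair[int, int]
	level2 = Pair[level1, level1]
	level3 = Pair[level2, level2]
)

func main() {
	fmt.Println(typeNameLen[level1]())
	fmt.Println(typeNameLen[level2]())
	fmt.Println(typeNameLen[level3]())
}
```

Each extra level of nesting more than doubles the name length, because both type arguments are written out in full every time they appear.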
No. This is something for a release note, not a compiler warning. Everything works correctly, it just takes more space than expected. The compiler never issues warnings anyhow. |
FWIW, I've written a few tests to show that while package archives are definitely larger (usually 2x), the resulting binary executable shows no real difference -- https://github.com/akutz/go-generics-the-hard-way/blob/main/06-benchmarks/03-file-sizes.md. |
@danscales This is in the 1.18 milestone; time to move to 1.19? Thanks. |
Yes, I'll move to 1.19. Thanks! |
I spent some time digging into this as we've been having significant issues with the build cache growing much too large, even with the type hashing introduced with Go 1.22 to fix #65030. I've created a MWE based on my findings which generates a 50 MB object file from ~130 lines of code: https://github.com/arvidfm/go-cache-bloat. The large object files seem to be a result of several factors compounding to make the issue even worse:
For us this adds up to several ~300 MB package archives being generated as part of the compilation, totalling ~5 GB for a single compilation from a clean cache. Making a single change to a central package results in most other packages being recompiled, resulting in another 5 GB, which quickly adds up.
I suspect (5) is unavoidable (and if anything desirable performance wise were it not for the other issues), and (3) is probably a requirement for interfaces to work. Hopefully something can be done about (1), (2) and (4), however. Maybe someone more familiar with the compiler internals could chime in to note which of these might be easiest to tackle. (It seems to me like it should be possible to avoid instantiating a type if the same instance is already present in one of the imported package archives.)
To better explain what I mean by (4), consider the following example, where the object files for both the
// a/a.go
package a
type GenericType[T any] struct{}
func (GenericType[T]) GenericFunc() {}
type A struct{}
func (A) AFunc(GenericType[int]) {}
// main.go
package main
import "example.com/cache/a"
type B struct {
	A a.A
}
func main() {}
This compounds for long dependency chains like
To get a better idea of specifically where the bloat is coming from, I ran a test on a real-world codebase that makes heavy use of generics. I created a new package containing a single file which calls a single generic function, referencing a type from another package that in turn results in a chain of generic type instantiations. The package looked something like this:
package mytest
import (
	"github.com/blah/blah/core"
	"github.com/blah/blah/users"
)
func doThing() {
	core.DoAction[users.User]()
}
Running
I suspect the vast, vast majority of data here is from duplicate method instantiations already present in other package archives. Each individual entry in the reloc section is small, but there are just so many of them that they add up, presumably due to the duplicate method instantiations. As for how to best mitigate the issue, this is the best I've managed to come up with in terms of practical advice for the end user:
Of course, most of this is absolutely horrible advice from a maintainability, readability and runtime performance perspective, and not particularly helpful if you've already implemented a certain architecture and can't afford to break compatibility, so I do hope that the issue can be fixed on the compiler side. |
Turns out that there is an open issue for the duplicate generic instantiation (or at least a specific case of it): #56718 |
#50438 (comment) thank you for the breakdown. I am not sure when I will have cycles to investigate this, but I would like to so I will optimistically assign this to myself. |
I started looking at https://github.com/arvidfm/go-cache-bloat . My initial observation is that it is requesting a very large number of types be instantiated, and those types have a large number of methods. I will keep looking, but TBH the number of types instantiated seems like the problem that should be solved. Compressing the size of the strings would help, but it is a secondary symptom. FWIW switching (If you have a more realistic example than go-cache-bloat that would help a lot.) |
@timothy-king The code in that repo is a contrived version (intentionally crafted to result in as much blowup as possible in as few lines of code as possible) of a pattern we use in practice in our codebase for a sort of typesafe query AST builder. Say we have a number of different models containing data (think e.g. tables in SQL), and we want to represent typed expressions that manipulate data from those models. You can't combine expressions evaluated against different models, which we want to encode at the type level. So we might represent this like:
type Expression[Model, Type any] struct {
	rawExpr string
}
To make building these expressions more ergonomic and readable, we have a lot of small methods for composing expressions, e.g.:
func (e Expression[Model, Type]) Add(expr Expression[Model, Type]) Expression[Model, Type] {
	return Expression[Model, Type]{rawExpr: fmt.Sprintf("%s + %s", e.rawExpr, expr.rawExpr)}
}
func (e Expression[Model, Type]) Eq(value Type) Expression[Model, bool] {
	return Expression[Model, bool]{rawExpr: fmt.Sprintf("%s == %v", e.rawExpr, value)}
}
func (e Expression[Model, Type]) IsNull() Expression[Model, bool] {
	// ...
}
etc etc, for a ton of different types of operators and functions. This allows us to type e.g.
Now, the above is very simplified; we don't actually just do raw string interpolation, and this being an AST, we have many different node types which are all interrelated, meaning that (as I've only later come to realise) instantiating one node will also tend to instantiate a bunch of other related node types and their methods. We also use a lot of embedding in order to reuse functionality between node types (e.g. there are multiple different types of expressions), which essentially duplicates the method symbol names for each embedding type. We also define various interfaces to allow us to e.g. introspect expressions without having to know the The Type switches of the form that you see in
The insidious part of this is that in practice, the issue really only starts to manifest at scale, when the generic types are already used pervasively throughout the codebase. This means that by the time you come to realise that there is a problem, it's very difficult and time consuming to refactor the code in order to soothe the compiler. I would also find it unfortunate if a pattern that I find very ergonomic to use while also adding a lot of type safety would be discouraged just because of a limitation of the compiler. |
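For readers who want to try the pattern, here is a self-contained sketch of the simplified Expression type described in this comment. The User and Order model types and the Field constructor are hypothetical names added for illustration; they are not taken from the real codebase:

```go
package main

import "fmt"

// User and Order are hypothetical model types, used only as type-level tags.
type User struct{}
type Order struct{}

// Expression is an expression evaluated against Model that yields a Type.
type Expression[Model, Type any] struct {
	rawExpr string
}

// Field is a hypothetical constructor for a reference to a named field.
func Field[Model, Type any](name string) Expression[Model, Type] {
	return Expression[Model, Type]{rawExpr: name}
}

// Add combines two expressions over the same model and value type.
func (e Expression[Model, Type]) Add(expr Expression[Model, Type]) Expression[Model, Type] {
	return Expression[Model, Type]{rawExpr: fmt.Sprintf("%s + %s", e.rawExpr, expr.rawExpr)}
}

// Eq compares the expression to a value and yields a boolean expression.
func (e Expression[Model, Type]) Eq(value Type) Expression[Model, bool] {
	return Expression[Model, bool]{rawExpr: fmt.Sprintf("%s == %v", e.rawExpr, value)}
}

func main() {
	age := Field[User, int]("age")
	bonus := Field[User, int]("bonus")
	fmt.Println(age.Add(bonus).Eq(30).rawExpr) // prints "age + bonus == 30"

	// Mixing models is rejected at compile time:
	// total := Field[Order, int]("total")
	// age.Add(total) // Expression[Order, int] is not Expression[User, int]
}
```

Every distinct (Model, Type) pair used this way is a separate instantiation of Expression and of all its methods, which is where the number of generated symbols starts to multiply.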
@arvidfm Thank you for the additional context. I can see how one could end up in this situation a bit better. FWIW the point I was trying to make is that I think the more promising direction is to find a way to shift the asymptotic growth of the cache. String compression or sharing might shrink the object files (though possibly at the cost of readability and tool complexity), but it won't fundamentally change the growth rate for how many methods are generated. So I don't think this issue (long symbol names) is the most promising way of addressing your problem. Addressing point 4 in #50438 (comment) would be a real shift in the asymptotics of the build cache in this example. So #56718 seems like the more relevant issue. |
@timothy-king Yes, I agree that #56718 is the more immediately pressing issue, and also what will result in the biggest short-term gains. I do think it's worth thinking about ways to deal with this issue specifically as well though, since there are pathological cases where it would still crop up for individual compilation units even with the other issue fixed, e.g. if all your instantiations happen in the same package. Probably the most promising avenues to explore in terms of this issue specifically would be finding ways to make the symbol names themselves shorter (lower threshold for hashing symbol names, creating a single hash for all type parameters instead of one per type parameter, abbreviating long package names, etc) as well as reducing the number of methods compiled to begin with (don't compile methods unless explicitly called or reachable from a type made into an interface value, avoid creating wrapper methods unless actually used, allow explicitly marking methods as not available to satisfy interfaces or to call via reflection), but I recognise neither is straightforward (especially the latter). |
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (go env)?
go env Output
What did you do?
What did you expect to see?
A build cache directory of a reasonable size, or a warning about an incorrectly used generic type.
What did you see instead?
Please excuse my poor English.
This seems to be related to #50204 as well.
I have written some algebraic data types to test the generics in Go 1.18
( https://github.com/csgura/fp.git ).
After running the tests, I noticed that the build cache was using a very large amount of disk space.
I guess the cause lies in some recursive types (HList and curried Func) and in the interface types that use type parameters.
When I modified the code so that a generic interface type does not return another generic interface type, the size of the build cache was significantly reduced.
This fix is applied in the master branch.
It is not a problem right now, but I think it will become a big problem as more and more projects use generics.