cmd/link: include per-package aggregate static temp symbols? #39053
I've been working on a Go binary size analysis tool that aims to attribute (nearly) every byte of the binary to a function or package.
One thing it currently doesn't handle well in the general case is summing the sizes of static temps (..stmp_NNN), because those symbols are removed from the symbol table by default (except in external linking mode).
Worse, because Mach-O symbols carry no sizes, I can't even accurately measure the symbols that do exist: the stmp values live in DATA but have no symbols of their own, so their bytes get lumped into the sizes I compute for neighboring symbols and I end up with the wrong sizes for those:
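For illustration, a minimal sketch of the usual workaround for size-less symbol tables, and of why unnamed stmps break it: each symbol's size is estimated as the gap to the next symbol's address. The `machoSym` type, names, and addresses are hypothetical; real code would read symbols with `debug/macho`, whose `Symbol` has a `Value` but no size.

```go
package main

import (
	"fmt"
	"sort"
)

// machoSym is a hypothetical stand-in for a Mach-O symbol: an address, no size.
type machoSym struct {
	Name string
	Addr uint64
}

// inferSizes estimates each symbol's size as the distance to the next
// symbol's address (or to sectionEnd for the last symbol).
func inferSizes(syms []machoSym, sectionEnd uint64) map[string]uint64 {
	sorted := append([]machoSym(nil), syms...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Addr < sorted[j].Addr })
	sizes := make(map[string]uint64, len(sorted))
	for i, s := range sorted {
		end := sectionEnd
		if i+1 < len(sorted) {
			end = sorted[i+1].Addr
		}
		sizes[s.Name] = end - s.Addr
	}
	return sizes
}

func main() {
	// Hypothetical layout: main.table is really 32 bytes, but an unnamed
	// 64-byte stmp follows it, so the whole gap is charged to main.table.
	syms := []machoSym{
		{"main.table", 0x1000},
		{"main.other", 0x1060}, // 0x1000 + 32 + 64
	}
	sizes := inferSizes(syms, 0x1080)
	fmt.Println(sizes["main.table"], sizes["main.other"]) // 96 32
}
```

With a symbol covering the stmp bytes, the gap after `main.table` would shrink to its true 32 bytes.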
Looking at a binary with the
So, my request: can we aggregate all the
Then I could both compute the total stmp size per package (e.g. unicode is 68KB, crypto/tls is 12KB) and accurately compute the sizes of other symbols (Mach-O symbols without a size).
This would make binaries slightly bigger (bounded, at least, by the number of packages), but IMO it would enable the analysis needed to make them much smaller. (Alternatively, the information could be written to a separate file.)
It would also require placing the stmp values together in the binary; they're currently scattered around:
What I'd like to see is something like:
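Independently of the exact format requested, here is a minimal sketch of what grouping stmps by package implies: once each package's stmps are contiguous, one aggregate symbol per package can cover them with a single start address and size. The `stmp` and `region` types and the `..stmps` suffix are hypothetical, and alignment padding is ignored.

```go
package main

import (
	"fmt"
	"sort"
)

// stmp is a hypothetical static temp: owning package and size in bytes.
type stmp struct {
	Pkg  string
	Size uint64
}

// region describes a hypothetical per-package aggregate symbol.
type region struct {
	Pkg   string
	Start uint64
	Size  uint64
}

// layoutByPackage places stmps contiguously, grouped by package, starting
// at base, and returns one region per package (alignment ignored).
func layoutByPackage(stmps []stmp, base uint64) []region {
	sorted := append([]stmp(nil), stmps...)
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].Pkg < sorted[j].Pkg })
	var regions []region
	addr := base
	for _, s := range sorted {
		if len(regions) == 0 || regions[len(regions)-1].Pkg != s.Pkg {
			regions = append(regions, region{Pkg: s.Pkg, Start: addr})
		}
		regions[len(regions)-1].Size += s.Size
		addr += s.Size
	}
	return regions
}

func main() {
	stmps := []stmp{{"unicode", 100}, {"crypto/tls", 40}, {"unicode", 60}}
	for _, r := range layoutByPackage(stmps, 0x2000) {
		fmt.Printf("%s..stmps start=%#x size=%d\n", r.Pkg, r.Start, r.Size)
	}
}
```

This prints `crypto/tls..stmps start=0x2000 size=40` and `unicode..stmps start=0x2028 size=160`.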
Aggregating stmps seems likely to interfere with the linker's dead code elimination: if any stmp in the aggregated blob is made live (presumably by an access from a function), then the entire blob is live.
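To make the concern concrete, a minimal sketch of reachability-based dead code elimination over a hypothetical reference graph, comparing live bytes when stmps are separate symbols versus one aggregated blob. Symbol names and sizes are invented for illustration; this is not the linker's data model.

```go
package main

import "fmt"

// reachable returns the symbols reachable from roots in a hypothetical
// reference graph (symbol -> symbols it refers to).
func reachable(refs map[string][]string, roots []string) map[string]bool {
	live := make(map[string]bool)
	work := append([]string(nil), roots...)
	for len(work) > 0 {
		s := work[len(work)-1]
		work = work[:len(work)-1]
		if live[s] {
			continue
		}
		live[s] = true
		work = append(work, refs[s]...)
	}
	return live
}

// liveBytes sums the sizes of live symbols.
func liveBytes(live map[string]bool, size map[string]int) int {
	n := 0
	for s := range live {
		n += size[s]
	}
	return n
}

func main() {
	// p.stmps is a hypothetical aggregate of stmp_1 and stmp_2.
	size := map[string]int{"main.main": 100, "stmp_1": 10, "stmp_2": 1000, "p.stmps": 1010}
	// Separate symbols: main.main only references stmp_1.
	separate := map[string][]string{"main.main": {"stmp_1"}}
	// Aggregated: the same reference now points into the whole blob.
	aggregated := map[string][]string{"main.main": {"p.stmps"}}
	fmt.Println(liveBytes(reachable(separate, []string{"main.main"}), size))   // 110
	fmt.Println(liveBytes(reachable(aggregated, []string{"main.main"}), size)) // 1110
}
```

One way around this would be to compute the aggregate only after dead code elimination, so it covers only the stmps that survived.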
It seems you would get better data by writing some throwaway code in the linker rather than relying on the symbol table. Many interesting syms these days don't go into the symbol table: during 1.15, in the new linker, we converted quite a lot of the function-specific DWARF symbols into anonymous syms, which makes them invisible to any tool that keys off the symtab.
NB: in today's linker, when field tracking is enabled, the dead code elimination pass builds a "Reachparent" graph in which each edge (X -> Y) indicates that symbol X was marked live because symbol Y was live. You could hijack the same mechanism for your purposes, though I think it would need some additional hacking for "new" syms that the linker invents from nothing. I can send you pointers/examples if you want to try prototyping this.
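The Reachparent idea can be illustrated outside the linker. The following is only a standalone sketch over a hypothetical string-keyed graph, not the linker's internal API: a worklist marking pass records, for each newly live symbol X, the symbol Y that made it live, and following those edges back to a root explains (and could attribute the size of) any stmp.

```go
package main

import (
	"fmt"
	"strings"
)

// markWithParents marks symbols reachable from roots in a hypothetical
// reference graph and records parent[X] = Y when X became live because Y was.
func markWithParents(refs map[string][]string, roots []string) map[string]string {
	parent := make(map[string]string)
	live := make(map[string]bool)
	for _, r := range roots {
		live[r] = true // roots are live by definition and have no parent
	}
	queue := append([]string(nil), roots...)
	for len(queue) > 0 {
		y := queue[0]
		queue = queue[1:]
		for _, x := range refs[y] {
			if !live[x] {
				live[x] = true
				parent[x] = y
				queue = append(queue, x)
			}
		}
	}
	return parent
}

// chain explains why sym is live by following parent edges back to a root.
func chain(parent map[string]string, sym string) string {
	path := []string{sym}
	for p, ok := parent[sym]; ok; p, ok = parent[p] {
		path = append(path, p)
	}
	return strings.Join(path, " <- ")
}

func main() {
	refs := map[string][]string{
		"main.main":       {"unicode.IsUpper"},
		"unicode.IsUpper": {"unicode..stmp_3"},
	}
	parent := markWithParents(refs, []string{"main.main"})
	fmt.Println(chain(parent, "unicode..stmp_3"))
	// unicode..stmp_3 <- unicode.IsUpper <- main.main
}
```

Because each symbol gets a parent at most once and roots never get one, the parent edges form a forest, so `chain` always terminates.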
@cherrymui I think it’d be better to set up content addressability in the compiler, but having the linker do it might be a good backstop and/or source of cheap wins. I think all statictmps are readonly, but we could be extra safe and only do symbols explicitly marked as readonly (and hidden).
Here, x points to a static temp, which is mutable.
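As a general illustration of why content-addressing (deduplicating) static temps is only safe for read-only data: in the sketch below, two package-level pointers are initialized from identical composite literals. We assume, as is typical of the compiler's static initialization, that each literal's backing array is laid out as its own static temp; whatever the layout, the program shows the data is writable and must stay distinct. The variable names are hypothetical.

```go
package main

import "fmt"

// Identical contents, but each composite literal is a distinct, writable
// variable, so merging them by content would change program behavior.
var a = &[3]int{1, 2, 3}
var b = &[3]int{1, 2, 3}

func main() {
	a[0] = 42                       // write through a; must not be visible through b
	fmt.Println(a[0], b[0], a == b) // 42 1 false
}
```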