Currently, the generated unicode/tables.go gives the R16 and R32 fields of each
RangeTable a separate slice, each with its own backing array.
Rearranging the code generated by maketables.go (in a way that is invisible to the
exported API) so that the RangeTable slices point into one large shared array for R16
and one for R32 reduces the contribution of the unicode tables to binary size by ~35k
bytes. If issue #7599 were fixed as well, the savings would be ~45k to ~60k bytes.
Details on the savings are below.
Questions:
(1) Are these space savings significant enough to warrant possible inclusion in Go 1.3,
or should I wait to polish + mail the CL until Go 1.4?
(2) Is there a reason not to do this rearrangement?
(3) Is there a fix to the toolchain that achieves these reductions in a better, cleaner,
or deeper way? (For example, instead of creating a separate backing array symbol and
slice header symbol for statically initialized slices, one could create a single
symbol containing the slice header followed by the array. That would provide some
space savings.)
Details on the size changes:
$ cat radical.go
package main

import "unicode"

func main() {
	_ = unicode.Radical
}
Built with 6g.
Binary size before: 733664 bytes. Binary size after: 699296 bytes (34368 bytes smaller).
Largest symbols before:
$ go tool nm -size -sort size radical | head -n 50
4e0c0 101365 R _esymtab
4e0c0 101365 R _pclntab
4e0c0 101365 R _etypelink
4e0c0 101365 R _symtab
87200 56984 B runtime.mheap
3d340 49024 R _gcbss
319f8 47372 R go.string.*
265a0 46168 R _rodata
265a0 46168 R type.*
81fc0 21056 B _bufferList
492c0 18192 R _gcdata
492c0 18192 R _egcbss
7e100 16064 B _semtable
22920 15088 T unicode.init
Largest symbols after:
$ go tool nm -size -sort size radical | head -n 50
4efa0 102141 R _pclntab
878c0 56984 B runtime.mheap
3e420 52360 R _gcbss
33c18 42956 R go.string.*
2a5c0 38488 R type.*
2a5c0 38488 R _rodata
22920 31504 T unicode.init
82680 21056 B _bufferList
6a740 20904 D unicode.allRange16
7e7c0 16064 B _semtable
4b0c0 14856 R _gcdata
68020 10016 D unicode.allRange32
7c7a0 8192 B _pdesc
7a800 8096 B _hash
The main size savings here come from reducing the number of small symbols generated
to hold statically initialized autotmp values, each of which carries its own overhead
(name, padding, etc.).
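A rough way to gauge how many separate backing arrays the old layout implies is to count the non-empty R16 and R32 slices reachable from the exported unicode.Categories, unicode.Scripts and unicode.Properties maps. This sketch (countSlices is a hypothetical helper) assumes one backing-array symbol per non-empty slice, as described at the top; it misses tables not listed in those three maps, and the counts depend on the Unicode version of the Go release it runs on, so they will not match the Go 1.2-era tables exactly.

```go
package main

import (
	"fmt"
	"unicode"
)

// countSlices visits each distinct RangeTable in the exported maps once and
// counts its non-empty R16/R32 slices and the range entries they hold.
func countSlices() (tables, r16Slices, r32Slices, r16Entries, r32Entries int) {
	seen := make(map[*unicode.RangeTable]bool)
	maps := []map[string]*unicode.RangeTable{
		unicode.Categories, unicode.Scripts, unicode.Properties,
	}
	for _, m := range maps {
		for _, t := range m {
			if seen[t] { // aliases point at the same table; count it once
				continue
			}
			seen[t] = true
			tables++
			if len(t.R16) > 0 {
				r16Slices++
				r16Entries += len(t.R16)
			}
			if len(t.R32) > 0 {
				r32Slices++
				r32Entries += len(t.R32)
			}
		}
	}
	return
}

func main() {
	tables, n16, n32, e16, e32 := countSlices()
	fmt.Printf("%d tables: %d R16 slices (%d entries), %d R32 slices (%d entries)\n",
		tables, n16, e16, n32, e32)
}
```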
The increase in the size of unicode.init (from 15088 to 31504 bytes) could be addressed
by fixing issue #7599.
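Comparing the two nm listings by eye is error-prone. The following sketch, with hypothetical helpers parseNM and diffNM, parses lines in the "address size type name" form shown above and reports the per-symbol size change; the sample input is a few lines copied from the listings. Both listings were cut off by head, so a symbol that appears in only one of them may simply have fallen below the cutoff in the other.

```go
package main

import (
	"bufio"
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// parseNM maps symbol name -> size for lines of the form "addr size type name".
// Lines that do not fit that shape are skipped; the first size seen for a name wins.
func parseNM(out string) map[string]int {
	sizes := make(map[string]int)
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		f := strings.Fields(sc.Text())
		if len(f) < 4 {
			continue
		}
		n, err := strconv.Atoi(f[1])
		if err != nil {
			continue
		}
		if _, dup := sizes[f[3]]; !dup {
			sizes[f[3]] = n
		}
	}
	return sizes
}

// diffNM returns after minus before for every symbol in either listing.
func diffNM(before, after string) map[string]int {
	b, a := parseNM(before), parseNM(after)
	d := make(map[string]int)
	for name, n := range a {
		d[name] = n - b[name] // missing in before counts as 0
	}
	for name, n := range b {
		if _, ok := a[name]; !ok {
			d[name] = -n
		}
	}
	return d
}

func main() {
	before := "22920 15088 T unicode.init\n319f8 47372 R go.string.*\n265a0 46168 R type.*\n"
	after := "22920 31504 T unicode.init\n33c18 42956 R go.string.*\n" +
		"2a5c0 38488 R type.*\n6a740 20904 D unicode.allRange16\n"
	d := diffNM(before, after)
	names := make([]string, 0, len(d))
	for name := range d {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Printf("%s %+d\n", name, d[name])
	}
	// go.string.* -4416
	// type.* -7680
	// unicode.allRange16 +20904
	// unicode.init +16416
}
```

In practice the two strings would be the full output of `go tool nm -size -sort size` for each binary.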