unicode: decrease binary size #7600

@josharian

Currently, the generated unicode tables.go sets up a separate slice for each R16/R32 in
each RangeTable, each with its own backing array.

Rearranging the code generated by maketables.go (in a way that is invisible to the
exported API) so that the RangeTable slices point into a big, shared R16/R32 array
reduces the contribution of the unicode tables to binary size by ~35k. If issue #7599
were fixed as well, the space savings would be ~45k to ~60k. Details on the savings
below.


Questions:

(1) Are these space savings significant enough to warrant possible inclusion in Go 1.3,
or should I wait to polish + mail the CL until Go 1.4?
(2) Is there a reason not to do this rearrangement?
(3) Is there a fix to the toolchain that achieves these reductions in a better / cleaner
/ deeper way? (For example, instead of creating a separate backing array symbol and
slice header symbol for statically initialized slices, one could just create a single
slice symbol containing the slice header followed by the array. That would provide some
space savings.)


Details on the size changes:

$ cat radical.go
package main

import "unicode"

func main() {
    _ = unicode.Radical
}


Build with 6g.

Binary size before: 733664 bytes. Binary size after: 699296 bytes.


Largest symbols before:

$ go tool nm -size -sort size radical | head -n 50
   4e0c0     101365 R _esymtab
   4e0c0     101365 R _pclntab
   4e0c0     101365 R _etypelink
   4e0c0     101365 R _symtab
   87200      56984 B runtime.mheap
   3d340      49024 R _gcbss
   319f8      47372 R go.string.*
   265a0      46168 R _rodata
   265a0      46168 R type.*
   81fc0      21056 B _bufferList
   492c0      18192 R _gcdata
   492c0      18192 R _egcbss
   7e100      16064 B _semtable
   22920      15088 T unicode.init


Largest symbols after:

$ go tool nm -size -sort size radical | head -n 50
   4efa0     102141 R _pclntab
   878c0      56984 B runtime.mheap
   3e420      52360 R _gcbss
   33c18      42956 R go.string.*
   2a5c0      38488 R type.*
   2a5c0      38488 R _rodata
   22920      31504 T unicode.init
   82680      21056 B _bufferList
   6a740      20904 D unicode.allRange16
   7e7c0      16064 B _semtable
   4b0c0      14856 R _gcdata
   68020      10016 D unicode.allRange32
   7c7a0       8192 B _pdesc
   7a800       8096 B _hash


The main size savings here come from a reduction in the number of small symbols
generated to hold statically initialized autotmp values, each with its own overhead
(name, padding, etc.).

The increase in the size of unicode.init is addressable via issue #7599.
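As a rough cross-check of the figures above, the arithmetic can be written out as below. The constants are copied from the measurements in this report. The assumption that fixing issue #7599 would recover exactly the growth in unicode.init is mine, made only for this back-of-the-envelope estimate.

```go
package main

import "fmt"

// Sizes in bytes, taken from the measurements reported above.
const (
	binaryBefore = 733664
	binaryAfter  = 699296
	initBefore   = 15088 // unicode.init before the change
	initAfter    = 31504 // unicode.init after the change
)

// savings is the observed reduction in total binary size.
func savings() int { return binaryBefore - binaryAfter }

// initGrowth is how much unicode.init grew as a side effect.
func initGrowth() int { return initAfter - initBefore }

// potentialSavings assumes (hypothetically) that the whole growth of
// unicode.init could be eliminated.
func potentialSavings() int { return savings() + initGrowth() }

func main() {
	fmt.Println("observed savings:", savings())
	fmt.Println("unicode.init growth:", initGrowth())
	fmt.Println("savings if init growth were eliminated:", potentialSavings())
}
```

Under that assumption the total lands inside the ~45k to ~60k range quoted at the top of this report.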
