unicode: decrease binary size #7600

Open
josharian opened this Issue Mar 20, 2014 · 6 comments

josharian commented Mar 20, 2014

Currently, the generated unicode tables.go sets up a separate slice for each R16/R32 in
each RangeTable, each with its own backing array.

Rearranging the code generated by maketables.go (in a way that is invisible to the
exported API) so that the RangeTable slices point into a big, shared R16/R32 array
reduces the contribution of the unicode tables to binary size by ~35k. If issue #7599
were fixed as well, the space savings would be ~45k to ~60k. Details on the savings
below.


Questions:

(1) Are these space savings significant enough to warrant possible inclusion in Go 1.3,
or should I wait to polish + mail the CL until Go 1.4?
(2) Is there a reason not to do this rearrangement?
(3) Is there a fix to the toolchain that achieves these reductions in a better / cleaner
/ deeper way? (For example, instead of creating a separate backing array symbol and
slice header symbol for statically initialized slices, one could just create a single
slice symbol containing the slice header followed by the array. That would provide some
space savings.)


Details on the size changes:

$ cat radical.go
package main

import "unicode"

func main() {
    _ = unicode.Radical
}


Build with 6g.

Binary size before: 733664 bytes. Binary size after: 699296 bytes (34368 bytes smaller).


Largest symbols before:

$ go tool nm -size -sort size radical | head -n 50
   4e0c0     101365 R _esymtab
   4e0c0     101365 R _pclntab
   4e0c0     101365 R _etypelink
   4e0c0     101365 R _symtab
   87200      56984 B runtime.mheap
   3d340      49024 R _gcbss
   319f8      47372 R go.string.*
   265a0      46168 R _rodata
   265a0      46168 R type.*
   81fc0      21056 B _bufferList
   492c0      18192 R _gcdata
   492c0      18192 R _egcbss
   7e100      16064 B _semtable
   22920      15088 T unicode.init


Largest symbols after:

$ go tool nm -size -sort size radical | head -n 50
   4efa0     102141 R _pclntab
   878c0      56984 B runtime.mheap
   3e420      52360 R _gcbss
   33c18      42956 R go.string.*
   2a5c0      38488 R type.*
   2a5c0      38488 R _rodata
   22920      31504 T unicode.init
   82680      21056 B _bufferList
   6a740      20904 D unicode.allRange16
   7e7c0      16064 B _semtable
   4b0c0      14856 R _gcdata
   68020      10016 D unicode.allRange32
   7c7a0       8192 B _pdesc
   7a800       8096 B _hash


The main size savings here come from a reduction in the number of small symbols
generated to hold statically initialized autotmp values, each with its own overhead
(name, padding, etc.).

The increase in the size of unicode.init is addressable via issue #7599.

ianlancetaylor commented Mar 20, 2014

Comment 1:

Labels changed: added repo-main, release-go1.3maybe.

josharian commented Mar 21, 2014

Comment 2:

Owner changed to @josharian.

Status changed to Started.

rsc commented Apr 3, 2014

Comment 3:

If the problem is padding in the linker we should fix the linker. The rewrite forces the
linking of all unicode table data even if you import unicode and only refer to
unicode.Greek.
Right now importing unicode and not referring to anything still pulls everything in,
because the map init-time code keeps the dead symbol removal from working. But let's not
add a second reason.
Leaving this issue open to be about making unicode take less memory, but I think we'll
need a different approach.

Labels changed: added release-go1.4, removed release-go1.3maybe.

josharian commented Apr 3, 2014

Comment 4:

Agreed that we should fix it more deeply.
It is not just padding. It's also the autotemp symbol name showing up in multiple places,
the autogenerated init code, etc. Some of these will be fixable head on; reducing the
number of symbols will also help. See the discussion at the end of
https://golang.org/cl/78870047/ for related issues.

Owner changed to ---.

Status changed to Accepted.

bradfitz commented Apr 3, 2014

Comment 5:

I never saw discussion of accepting 78870047 for Go 1.3 and reverting it in Go 1.4 when
the issue is fixed properly.
If it gets us smaller binaries (a goal for Go 1.3) and doesn't hurt our already-broken
support for dropping Glagolitic when we only want Greek, it seems worth considering?

josharian commented Sep 2, 2014

Comment 6:

Labels changed: added release-none, removed release-go1.4.

rsc added this to the Unplanned milestone Apr 10, 2015

rsc removed release-none labels Apr 10, 2015
