all: binaries too big and growing #6853

Open
robpike opened this issue Nov 30, 2013 · 159 comments

@robpike robpike commented Nov 30, 2013

As an experiment, I built "hello, world" at the release points for Go 1.0,
1.1, and 1.2. Here are the binary sizes:

% ls -l x.1.?
-rwxr-xr-x  1 r  staff  1191952 Nov 30 10:25 x.1.0
-rwxr-xr-x  1 r  staff  1525936 Nov 30 10:20 x.1.1
-rwxr-xr-x  1 r  staff  2188576 Nov 30 10:18 x.1.2
% size x.1.?
__TEXT  __DATA  __OBJC  others  dec hex
880640  33682096    0   4112    34566848    20f72c0 x.1.0
1064960 94656   0   75952   1235568 12da70  x.1.1
1429504 147896  0   177440  1754840 1ac6d8  x.1.2
% 

A near-doubling of the binary size in two releases is a bug of a kind. I will hold on to
the files so they can be analyzed more, but am filing this issue to get the topic
registered. We need to develop a better understanding of the problem and how to address
it.

Marking this 1.3 (not maybe) because I consider it a priority.


A few months ago I exchanged mail with Russ about this topic regarding a different, much
larger binary. To avoid him having to redo the analysis, here is what he said at the
time:

====
i sent CL 13722046 to make the nm -S output a bit more useful.
for the toy binary i now get

  4a2280  1898528 D symtab
  26f3a0  1405936 D type.*
  671aa0  1058432 D pclntab
  3c6790   598056 D go.string.*
  4620c0    49600 D gcbss
  7a7c20    45496 B runtime.mheap
  46e280    21936 D gcdata
  7a29e0    21056 b bufferList
  1ed600    16480 T crypto/tls.(*Conn).clientHandshake
  79eb20    16064 b semtable
  1b3d90    14224 T net/http.init

that seems plausible to me. some notes:

symtab is the plan 9 symbol table. it is in the binary but never referenced at run time. it
supports things like nm -S only. it needs to move into an unmapped section of the
binary, but it is only costing at most 8k at run time right now due to fragmentation and
it just wasn't worth the effort to try to move. the new linker will make this easier. of
course, moving it in the file doesn't shrink the file.

the thing named pclntab is a reencoding of the original pclntab and the parts of the
plan 9 symbol table that we did need at run time (mostly just a list of functions and
their names and addresses). as you can see, it is much smaller than the old form (the
symbol table dominates).

type.* is the reflect types and go.string.* is the static go string data. the *
indicates that i coalesced many symbols into one, to avoid useless individual names
bloating the symbol table. if we tried we could probably cut the reflect types by 2-4x.
it would mean packing the data a bit more compactly than an ordinary go data structure
would and then using unsafe to get it back out.

gcbss and gcdata are garbage collection bits for the bss and data segments. that's what
atom symbol did, and it's not clear whether it will last (probably not) and whether what
will replace it will be smaller. time will tell. i have a meeting with dmitriy, carl,
and keith next week to figure out what the plan is.

runtime.mheap, bufferList, and semtable are bss.

you're not seeing the gdb dwarf debug information here, because it's not a runtime
symbol.
 
g% otool -l $(which toy) | egrep '^  segname|filesize'
  segname __PAGEZERO
 filesize 0
  segname __TEXT
 filesize 7811072
  segname __DATA
 filesize 126560
  segname __LINKEDIT
 filesize 921772
  segname __DWARF
 filesize 2886943
g% 

there's another 3 MB. you can build with -ldflags -w to get rid of that at least.
if you read the full otool -l output you will find

Load command 6
     cmd LC_SYMTAB
 cmdsize 24
  symoff 10825728
   nsyms 22559
  stroff 11186924
 strsize 560576

looks like another 1 MB or so (560576+11186924-10825728 or 22559*16+560576) for the
mach-o symbol table.

when we do the new linker we can make recording this kind of information in a useful
form a priority.
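
The two estimates for the Mach-O symbol table can be checked directly. The sketch below only redoes the arithmetic from the LC_SYMTAB fields quoted above; the 16-byte entry size is the one used in the mail (it matches a 64-bit nlist entry), and the function names are made up for illustration.

```go
package main

import "fmt"

// Values copied from the LC_SYMTAB load command quoted above.
const (
	symoff    = 10825728 // file offset of the symbol entries
	nsyms     = 22559    // number of symbol entries
	stroff    = 11186924 // file offset of the string table
	strsize   = 560576   // size of the string table in bytes
	entrySize = 16       // bytes per symbol entry, as assumed in the mail
)

// symtabSpan is the distance from the first symbol entry to the end of
// the string table (the first formula in the mail).
func symtabSpan() int { return strsize + stroff - symoff }

// symtabPayload counts only the entries and the strings themselves
// (the second formula in the mail).
func symtabPayload() int { return nsyms*entrySize + strsize }

func main() {
	fmt.Println("span:   ", symtabSpan())
	fmt.Println("payload:", symtabPayload())
}
```

The first figure comes to 921772 bytes, which is exactly the __LINKEDIT filesize reported by otool above; the second is a few hundred bytes smaller because it ignores whatever sits between the entries and the strings.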

@robpike robpike commented Nov 30, 2013

Comment 1:

Note: the binaries were built on amd64 10.7.5 (Lion), with gcc -version
i686-apple-darwin11-llvm-gcc-4.2
For the record, I couldn't do this experiment on 10.9 with Xcode 5 because the older
releases wouldn't build due to gcc/clang skew.

@robpike robpike commented Nov 30, 2013

Comment 2:

Labels changed: added priority-later, removed priority-triage.

@gopherbot gopherbot commented Dec 2, 2013

Comment 3 by jlourenco27:

Just for added reference, these are the sizes on go 1.1.2 vs 1.2 with OS X 10.9 and Xcode
5 (darwin gcc llvm 5.0 x86_64):
$ ls -l *1.*
-rwxr-xr-x  1 j      staff  1525984  2 Dez 21:44 hello_1.1.2
-rwxr-xr-x  1 j      staff  2192672  2 Dez 21:40 hello_1.2
$ size *1.*
__TEXT  __DATA  __OBJC  others  dec hex
1064960 94720   0   76000   1235680 12dae0  hello_1.1.2
1433600 147896  0   177440  1758936 1ad6d8  hello_1.2

@randall77 randall77 commented Dec 3, 2013

Comment 4:

This issue was updated by revision f238049.

R=golang-dev, rsc
CC=golang-dev
https://golang.org/cl/35940047

@rsc rsc commented Dec 4, 2013

Comment 5:

Labels changed: added release-go1.3.

@rsc rsc commented Dec 4, 2013

Comment 6:

Labels changed: removed go1.3.

@rsc rsc commented Dec 4, 2013

Comment 7:

Labels changed: added repo-main.

@zephyr zephyr commented Dec 7, 2013

Comment 8:

In this context, please reevaluate if the golang binaries can be changed to work with
UPX (ultimate packer for executables).¹²
For a small amount of computing power, upx can reduce the size of a binary down to a
quarter of its original size. You can verify this using an existing example
›fixer‹ program for golang binaries on linux/amd64.³
While this approach doesn't fix the root of the problem – only the symptoms – it
would be nice to have this possibility always on hand.
For technical background of this problem (PT_LOAD[0].p_offset==0), please look at the
UPX bugtracker⁴.
¹ http://upx.sourceforge.net/
² https://en.wikipedia.org/wiki/UPX
³ https://github.com/pwaller/goupx
⁴ http://sourceforge.net/p/upx/bugs/195/

@minux minux commented Dec 7, 2013

Comment 9:

re #8, I don't think it's Go's problem. upx should be made more flexible to handle this.

@leo-liu leo-liu commented Dec 8, 2013

Comment 10:

minux:
We may figure out why the binaries are growing fast first (larger runtime, optimization,
etc.), before we claim that it isn't Go's problem.

@minux minux commented Dec 8, 2013

Comment 11:

re #10, my #9 reply is to #8, which is about an entirely different problem.
i'm not saying that the ever-growing binaries is not our problem, only that i
don't believe that upx not accepting our binaries is our problem.
it's clear that upx isn't able to handle all possible and correct ELF files (i.e.
if the kernel can execute our binaries just fine, it's upx's problem to not be
able to compress them).

@robpike robpike commented Feb 19, 2014

Comment 12:

More detail. The Plan 9 symbol table is about to be deleted. Here is a reference point,
adding one new entry to the list above:
$ ls -l *1.*
-rwxr-xr-x  1 j      staff  1525984  2 Dez 21:44 hello_1.1.2
-rwxr-xr-x  1 j      staff  2192672  2 Dez 21:40 hello_1.2
-rwxr-xr-x  1 j      staff  2474512 Feb 18 20:27 hello_1.2.x
$ size *1.*
__TEXT  __DATA  __OBJC  others  dec hex
1064960 94720   0   76000   1235680 12dae0  hello_1.1.2
1433600 147896  0   177440  1758936 1ad6d8  hello_1.2
1699840 160984  0   188944  2049768 1f46e8     hello_1.2.x
Text has grown substantially, as has data. At least some of this is due to new
annotations for the garbage collector.

@robpike robpike commented Feb 19, 2014

Comment 13:

More detail. The Plan 9 symbol table is about to be deleted. Here is a reference point,
adding one new entry to the list above:
$ ls -l *1.*
-rwxr-xr-x  1 r  staff  1191952 Nov 30 10:25 x.1.0
-rwxr-xr-x  1 r  staff  1525936 Nov 30 10:20 x.1.1
-rwxr-xr-x  1 r  staff  2188576 Nov 30 10:18 x.1.2
-rwxr-xr-x  1 r  staff  2474512 Feb 18 20:27 hello_1.2.x
$ size *1.*
__TEXT  __DATA  __OBJC  others  dec hex
880640  33682096     0  4112    34566848     20f72c0    x.1.0
1064960 94656   0   75952   1235568 12da70  x.1.1
1429504 147896  0   177440  1754840 1ac6d8  x.1.2
1699840 160984  0   188944  2049768 1f46e8     hello_1.2.x
Text has grown substantially, as has data. At least some of this is due to new
annotations for the garbage collector.

@robpike robpike commented Feb 19, 2014

Comment 14:

More detail. The Plan 9 symbol table is about to be deleted. Here is a reference point,
adding one new entry to the list above:
$ ls -l *1.*
-rwxr-xr-x  1 r  staff  1191952 Nov 30 10:25 x.1.0
-rwxr-xr-x  1 r  staff  1525936 Nov 30 10:20 x.1.1
-rwxr-xr-x  1 r  staff  2188576 Nov 30 10:18 x.1.2
-rwxr-xr-x  1 r  staff  2474512 Feb 18 20:27 hello_1.2.x
$ size *1.*
__TEXT  __DATA  __OBJC  others  dec hex
880640  33682096     0  4112    34566848     20f72c0    x.1.0
1064960 94656   0   75952   1235568 12da70  x.1.1
1429504 147896  0   177440  1754840 1ac6d8  x.1.2
1699840 160984  0   188944  2049768 1f46e8     x.1.2.x
Text has grown substantially, as has data. At least some of this is due to new
annotations for the garbage collector.

@rsc rsc commented Feb 19, 2014

Comment 15:

This issue was updated by revision 964f6d3.

Nothing reads the Plan 9 symbol table anymore.
The last holdout was 'go tool nm', but since being rewritten in Go
it uses the standard symbol table for the binary format
(ELF, Mach-O, PE) instead.
Removing the Plan 9 symbol table saves ~15% disk space
on most binaries.
Two supporting changes included in this CL:
debug/gosym: use Go 1.2 pclntab to synthesize func-only
symbol table when there is no Plan 9 symbol table
debug/elf, debug/macho, debug/pe: ignore final EOF from ReadAt
LGTM=r
R=r, bradfitz
CC=golang-codereviews
https://golang.org/cl/65740045

@bradfitz bradfitz commented Feb 19, 2014

Comment 16:

After revision 737767dd81fd, I see a 25% reduction:
Before:
-rwxr-xr-x  1 bradfitz  staff  23556028 Feb 18 20:47 bin/camlistored
After:
-rwxr-xr-x  1 bradfitz  staff  17727420 Feb 18 20:48 bin/camlistored

@robpike robpike commented Feb 19, 2014

Comment 17:

For my test case before/after deleting the Plan 9 symbol table:
% ls -l ...
-rwxr-xr-x  1 r  staff  2474512 Feb 18 20:27 hello_1.2.x
-rwxr-xr-x  1 r  staff  2150928 Feb 18 22:28 hello_1.2.y
% size ...
__TEXT  __DATA  __OBJC  others  dec hex
1699840 160984  0   188944  2049768 1f46e8     hello_1.2.x
1376256 160984  0   188944  1726184 1a56e8    hello_1.2.y
% 
So deleting the Plan 9 symbol table pretty close to exactly compensates for the GC
information. We're back at Go 1.2 levels, still far too large but it's a start.
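
As a cross-check of the two listings above, the bytes saved on disk and the bytes removed from __TEXT can be compared. This is only the subtraction spelled out in Go; the helper name `saved` is made up for the sketch.

```go
package main

import "fmt"

// saved returns how many bytes were removed going from before to after.
func saved(before, after int) int { return before - after }

func main() {
	file := saved(2474512, 2150928) // ls -l: hello_1.2.x vs hello_1.2.y
	text := saved(1699840, 1376256) // size: __TEXT before vs after

	fmt.Println("file bytes saved:  ", file)
	fmt.Println("__TEXT bytes saved:", text)
	fmt.Printf("share of old file:  %.1f%%\n", 100*float64(file)/2474512)
}
```

Both differences come to 323584 bytes, so the whole saving is accounted for by __TEXT, and it is roughly 13% of the old file, in line with the ~15% estimate in the CL description.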

@rsc rsc commented Feb 19, 2014

Comment 18:

This issue was updated by revision 2541cc8.

Every function now has a gcargs and gclocals symbol
holding associated garbage collection information.
Put them all in the same meta-symbol as the go.func data
and then drop individual entries from symbol table.
Removing gcargs and gclocals reduces the size of a
typical binary by 10%.
LGTM=r
R=r
CC=golang-codereviews
https://golang.org/cl/65870044

@rsc rsc commented Feb 19, 2014

Comment 19:

This issue was updated by revision ae38b03.

For an ephemeral binary - one created, run, and then deleted -
there is no need to write dwarf debug information, since the
binary will not be used with gdb. In this case, instruct the linker
not to spend time and disk space generating the debug information
by passing the -w flag to the linker.
Omitting dwarf information reduces the size of most binaries by 25%.
We may be more aggressive about this in the future.
LGTM=bradfitz, r
R=r, bradfitz
CC=golang-codereviews
https://golang.org/cl/65890043

@robpike robpike commented Feb 19, 2014

Comment 20:

After removing gcargs from the symbol table (stepping across CL 65870044)
% ls -l x.1.2.[yz]
-rwxr-xr-x  1 r  staff  2150928 Feb 18 22:28 hello_1.2.y
-rwxr-xr-x  1 r  staff  1932880 Feb 19 08:14 hello_1.2.z
% size x.1.2.[yz] 
__TEXT  __DATA  __OBJC  others  dec hex
1376256 160984  0   188944  1726184 1a56e8    hello_1.2.y
1376256 160984  0   110160  1647400 192328 hello_1.2.z
% 
It's now smaller than at 1.2 but still much bigger than 1.1, let alone 1.0.

@gopherbot gopherbot commented Mar 4, 2014

Comment 21:

I would like to take a look at compressing pclntab. Is it compressible?

@ianlancetaylor ianlancetaylor commented Mar 4, 2014

Comment 22:

The pclntab data should be reasonably compact, and note that fast access to the data is
important, since it is used for runtime.Callers and friends.  If you can find a
significant reduction in size that would be great, but small tweaks are probably not
desirable at this point.

@gopherbot gopherbot commented Mar 4, 2014

Comment 23 by fuzxxl:

Is there any documentation for what the pclntab contains and for what constraints its
data structure must fulfill?

@ianlancetaylor ianlancetaylor commented Mar 4, 2014

Comment 24:

Let's not use this issue as a discussion list.  Please ask questions on golang-dev. 
Thanks.

@gopherbot gopherbot commented Mar 6, 2014

Comment 25 by allard.guy.m:

From the peanut gallery, AFAICT this breaks pprof interactive command 'list'.  At tip I
get, e.g.:
(pprof) list runner
Total: 6424 samples
objdump: syminit: Success
no filename found in main.runner<400c40>
Which works as expected with 1.2.1.

@rsc rsc commented Apr 3, 2014

Comment 26:

Let's not use this issue as a discussion list.  Please ask questions on golang-dev. 
Thanks.
pprof not working is issue #7452.

Labels changed: added restrict-addissuecomment-commit.

@rsc rsc commented Apr 3, 2014

Comment 27:

This is as fixed as it is going to be for Go 1.3.
Right now at tip + CL 80370045 on darwin/amd64, compiling this program:
package main
import "fmt"
func main() {
    fmt.Println("hello, world")
}
I get 1830352 bytes for the binary. Assuming this is the same case for which Rob's
numbers are reported, by this metric Go 1.3 will roll back more than half the size
increase caused by Go 1.2 (relative to Go 1.1). Will leave further improvement for Go
1.4.

Labels changed: added release-go1.4, removed release-go1.3.
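
The "more than half" claim can be checked against the sizes posted earlier in the thread. The sketch assumes, as the comment itself does, that the 1830352-byte binary is comparable to Rob's x.1.1 and x.1.2 builds; the constant and function names are made up for illustration.

```go
package main

import "fmt"

// Sizes in bytes, taken from the listings in this thread.
const (
	sizeGo11  = 1525936 // x.1.1
	sizeGo12  = 2188576 // x.1.2
	sizeGoTip = 1830352 // tip + CL 80370045, as reported above
)

// rolledBack returns the fraction of the base->grown increase that has
// been undone once the binary is back down to cur bytes.
func rolledBack(base, grown, cur int) float64 {
	return float64(grown-cur) / float64(grown-base)
}

func main() {
	fmt.Printf("growth 1.1 -> 1.2:    %d bytes\n", sizeGo12-sizeGo11)
	fmt.Printf("recovered at tip:     %d bytes\n", sizeGo12-sizeGoTip)
	fmt.Printf("fraction rolled back: %.2f\n", rolledBack(sizeGo11, sizeGo12, sizeGoTip))
}
```

The growth is 662640 bytes and 358224 of them are recovered, a fraction of about 0.54.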

@rsc rsc commented Apr 3, 2014

Comment 28:

This issue was updated by revision a26c01a.

LGTM=khr
R=khr
CC=golang-codereviews
https://golang.org/cl/80370045

@rsc rsc commented Sep 15, 2014

Comment 29:

Labels changed: added release-go1.5, removed release-go1.4.

@techtonik techtonik commented Apr 19, 2021

@creker I don't argue. I just want to see the proof. The visualizer that @jart made helps me to understand how C code maps to binaries. If there was such instrument for inspecting Go blobs, that would help to understand the issue better.

@randall77 randall77 commented Apr 19, 2021

I think it would be interesting to understand where the requirements @techtonik listed come from. I think I understand why most of them are there, but not all. For instance, I don't think I know why Glob is there.
Likely they are all there for good reasons, but it would be an interesting investigation to understand why for all of them. And there's the possibility that once we understand why they are there, we could figure out a change that would allow us to drop one.

@josharian josharian commented Apr 19, 2021

Using Go 1.16.3, compiling func main() { os.Stdout.Write([]byte("hello world\n")) }, I don't see Glob or Readlink. I'm checking with go tool nm.

For folks who are interested in investigating "why is x in my binary", try running go build -ldflags=-dumpdep. It's a lot of data, but it'll give all symbol dependencies as seen by the linker, which is the entity responsible for removing unnecessary symbols.

@ianlancetaylor ianlancetaylor commented Apr 19, 2021

On tip on Linux I see os.Readlink coming in due to this global variable initialization in os/executable_procfs.go. I don't know why the linker doesn't discard these variables, which are not otherwise referenced.

var executablePath, executablePathErr = func() (string, error) {
	var procfn string
	switch runtime.GOOS {
	default:
		return "", errors.New("Executable not implemented for " + runtime.GOOS)
	case "linux", "android":
		procfn = "/proc/self/exe"
	case "netbsd":
		procfn = "/proc/curproc/exe"
	}
	return Readlink(procfn)
}()

@randall77 randall77 commented Apr 19, 2021

Init functions are a problem. We have to run them in case there are observable side effects. So just importing os is enough to force os.Readlink to remain. (And additionally the global variables remain because the results need to go somewhere - that problem may be easier to fix.)

Probably there's a better way to organize that helper so it is only called if functions that need executablePath are linked.

@jart jart commented Apr 20, 2021

How can you remove my concerns as off-topic and then ask for my assistance in fixing them? I understand that Go needs to embed a runtime. The C library I mentioned earlier has a runtime too. It also embeds an operating system and a bootloader into the binaries too. I've discovered it's possible to do this in a way that's 100x smaller than Go's Hello World example, which is very on-topic, considering that's the reason the creator of this language opened this thread. I would love to see the Go community benefit from my work, because it was never my intention to compete. However I can't help you do that if I'm not welcome here.

@ianlancetaylor ianlancetaylor commented Apr 20, 2021

@jart I'm sorry that you feel unwelcome. I agree that some of the comments here have been unnecessarily critical. Everybody, please remember the Go Code of Conduct: be charitable, be respectful, be friendly and welcoming. Thanks. @jart, you are definitely welcome.

That said, I think we all agree that a C runtime can be much much smaller than the Go runtime. In a sense this is because the POSIX kernels that most of us use have been designed to implement the C runtime directly. I don't think that looking at a C runtime will help us make the Go runtime smaller. I'm willing to be convinced otherwise.

@odeke-em odeke-em commented Apr 20, 2021

Howdy @jart, thank you for raising this. Your findings, when I read them last year, were astonishing, and I asked friends and colleagues how Go could benefit from them. Great work! My biggest apologies that your conversations got marked as off-topic. Whoever marked them as that wasn't doing so with bad intentions, but they perhaps thought that the back and forth was heated -- I too have had my discussions marked as off-topic and it didn't feel great. Despite not necessarily being in the same position, I can certainly tell you that your contributions are highly welcome; the Go community welcomes you. And how about this: if you are interested, we can start a couple of changes for debugging what's going on; myself and my team can help out, and my company can perhaps look at an arrangement with you to help with reducing the binary sizes. I think your work would be a good testament to what can be done for linkers and paying attention to detail. If you are not interested, not a problem at all; it is still awesome that you published your findings, and whenever we bump into each other on the Go project or others, it'll still be awesome to interact with you and have your contributions! Thank you.

@josharian josharian commented Apr 20, 2021

> Probably there's a better way to organize that helper so it is only called if functions that need executablePath are linked.

FTR, doing that the obvious if slightly questionable way (using sync.Once and stripping a trailing (deleted) if present) reduces the size of the binary by 5653 bytes, or 0.41%.
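
A minimal sketch of that lazy shape follows. It is not the code from package os or from any CL: `lazyExecutable` is a hypothetical helper, the reader function is passed in so the sketch can be exercised without /proc, and the " (deleted)" suffix is the string discussed in this thread, which may not be the same everywhere.

```go
package main

import (
	"fmt"
	"os"
	"sync"
)

// deletedSuffix is the marker assumed to be appended to the link target
// when the executable has been removed.
const deletedSuffix = " (deleted)"

// lazyExecutable returns a function that resolves the executable path on
// first use and caches the result, instead of resolving it at init time.
func lazyExecutable(read func(string) (string, error)) func() (string, error) {
	var (
		once sync.Once
		path string
		err  error
	)
	return func() (string, error) {
		once.Do(func() {
			path, err = read("/proc/self/exe")
			// Trim the suffix by hand rather than via package strings.
			if n := len(path) - len(deletedSuffix); err == nil && n >= 0 && path[n:] == deletedSuffix {
				path = path[:n]
			}
		})
		return path, err
	}
}

func main() {
	executable := lazyExecutable(os.Readlink)
	p, err := executable() // on systems without /proc this reports an error
	fmt.Println(p, err)
}
```

With this shape nothing is read until the returned function is called, so a program that never asks for its own path never needs the reader at all.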

@egonelbre egonelbre commented Apr 20, 2021

@josharian I'm not following why that approach would be questionable -- seems completely reasonable to me.

@typeless typeless commented Apr 20, 2021

One low-hanging fruit is probably how the dead-code pass of the linker deals with a use case of reflect.
Currently, the Go linker will keep all exported symbols if it sees any reference to reflect.Method or reflect.MethodByName (IIUC).
Just ignoring this condition, I shrink a Go binary about 22%. (From 3211264 bytes to 2621440 bytes). And it still works.
In the case, I observed that a reflect.Value.MethodByName is used by text/template, which is in turn used by Cobra's inittask.
So, simply using Cobra for my simple CLI argument handling would make the linker lose all deadcode elimination abilities.
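
To illustrate why the linker has to be conservative here, this is a minimal sketch of a method lookup whose name is only known at run time. The `greeter` type and the `callByName` helper are made up for the example; only the reflect calls are real API.

```go
package main

import (
	"fmt"
	"os"
	"reflect"
)

type greeter struct{}

func (greeter) Hello() string   { return "hello" }
func (greeter) Goodbye() string { return "goodbye" }

// callByName invokes an exported method of v that takes no arguments and
// returns a single string. The method is chosen by name at run time, so
// nothing in the source text says which methods are actually reachable.
func callByName(v any, name string) (string, bool) {
	m := reflect.ValueOf(v).MethodByName(name)
	if !m.IsValid() || m.Type().NumIn() != 0 || m.Type().NumOut() != 1 {
		return "", false
	}
	out := m.Call(nil)
	if out[0].Kind() != reflect.String {
		return "", false
	}
	return out[0].String(), true
}

func main() {
	name := "Hello"
	if len(os.Args) > 1 {
		name = os.Args[1] // the method name can come from anywhere
	}
	s, ok := callByName(greeter{}, name)
	fmt.Println(s, ok)
}
```

Neither Hello nor Goodbye is called statically, yet either may be requested, which is the situation text/template creates for every type passed to a template.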

@egonelbre egonelbre commented Apr 20, 2021

@typeless I'm not sure how much can be done about keeping deadcode alive in those situations. I suspect, maybe the solution in those cases is to either:

  1. implement text/template alternative that doesn't use/allow MethodByName
  2. make text/template to cobra optional
  3. use something different from cobra

I suspect the reason your code still works is because the text/template doesn't hit the code-path that calls MethodByName on code that has been removed.

@typeless typeless commented Apr 20, 2021

@egonelbre
I think the default behavior of deadcode for reflect is a bit too conservative.
It doesn't seem insane to me to eliminate all statically unused methods by default and only to explicitly pin the methods that have to be kept. That's also how C does it with the compiler attributes.

@egonelbre egonelbre commented Apr 20, 2021

I agree that the behavior is very conservative. The issue is that there's no easy way of knowing, which methods need to be kept. If it were low-hanging fruit, it would have been already picked :).

> It doesn't seem insane to me to eliminate all statically unused methods by default and only to explicitly pin the methods that have to be kept. That's also how C does it with the compiler attributes.

This change would break quite a lot of code. Any html/template that invokes a method could break. If it were built from the start that way, then sure, it could have been a better strategy.

@clausecker clausecker commented Apr 20, 2021

I thought at first that restricting dead-code elimination to only those packages that are not imported by other packages that invoke reflect might work, but of course reflect can also be used through wrapper libraries or even function pointers getting method names from who knows where. It's really not easy.

@josharian josharian commented Apr 20, 2021

@egonelbre

> I'm not following why that approach would be questionable -- seems completely reasonable to me.

Seems fragile. Do all distros use the same string? Gets the wrong answer with a class of (absurdly named) executables.

I guess the current approach is also arguably fragile in that you could lose an init vs delete race.

Feel free to send a CL if you want. :) Note that you have to manually inline strings.TrimSuffix.

@gopherbot gopherbot commented Apr 20, 2021

Change https://golang.org/cl/311790 mentions this issue: os: reduce binary size

@egonelbre egonelbre commented Apr 20, 2021

@josharian ah, gotcha... I didn't read the comment near executablePath. Yes, then it makes sense why it's fragile.

I'm not sure whether it'll get accepted or not, but I sent the CL anyways.

gopherbot pushed a commit that referenced this issue Apr 22, 2021
Currently Readlink gets linked into the binary even when Executable is
not needed.

This reduces a simple "os.Stdout.Write([]byte("hello"))" by ~10KiB.

Previously the executable path was read during init time, because
deleting the executable would make "Readlink" return "(deleted)" suffix.
There's probably a slight chance that the init time reading would return
it anyways.

Updates #6853

Change-Id: Ic76190c5b64d9320ceb489cd6a553108614653d1
Reviewed-on: https://go-review.googlesource.com/c/go/+/311790
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Trust: Tobias Klauser <tobias.klauser@gmail.com>

@SeanTolstoyevski SeanTolstoyevski commented Jun 4, 2021

I've been following this issue for a long time.
But I don't know very well about compiler theory or assembly. I want to say some of my ideas and ask some questions.

Unlike the general community, I use Golang for systems programming (GUI, games).
I don't want the program deployed on these systems to retain too much information. For example I don't need traceback. This makes the code I distribute even more susceptible to reverse engineering. Ok, I'm not really obsessed with this, but I don't want everything to be there when the program crashes.
It seems that Golang holds a lot of information for traceback, even though no console screen has been created.
When I examine the executables with various executable analysis tools I can clearly see the names of all the variables, some panic strings etc. No disassembler is needed for this. They really are there.

Maybe a compiler parameter could be added to remove traceback information.
This is not required for Golang products produced as a web service. But not everyone writes Golang for these jobs. We install programs that contain sensitive data on people's computers.

@josharian josharian commented Jun 4, 2021

@SeanTolstoyevski you may find this interesting: https://commaok.xyz/post/no-line-numbers/

@ianlancetaylor ianlancetaylor commented Jun 4, 2021

@SeanTolstoyevski We're reluctant to provide a mechanism to remove traceback information because it will break runtime.Callers, which will in return break key libraries like logging libraries. There's no good way to remove only the undesirable traceback information, and the set of people who really truly want no traceback information at all is small.

@clausecker clausecker commented Jun 4, 2021

> @SeanTolstoyevski We're reluctant to provide a mechanism to remove traceback information because it will break runtime.Callers, which will in return break key libraries like logging libraries. There's no good way to remove only the undesirable traceback information, and the set of people who really truly want no traceback information at all is small.

Wouldn't it be possible to just remove symbol names, file, and line numbers, keeping the traceback data otherwise intact? This way consumers of runtime.Callers would still somewhat work while all sensitive data is removed.

@ianlancetaylor ianlancetaylor commented Jun 4, 2021

I'm sorry, I don't understand the suggestion. The traceback information is exactly symbol names, file names, and line numbers (see https://golang.org/pkg/runtime/#Frame). How can we remove those and leave the traceback data intact?

@clausecker clausecker commented Jun 4, 2021

The Frame structure already has provisions for an unknown function name (i.e. set the function name to the empty string). Similarly, File and Line could be set to the empty string and 0 if traceback information is stripped. What would break if this is done? If discarding function names is already too much, what about just discarding file names and line numbers?

Perhaps there was some sort of miscommunication. I assumed what you considered impossible is completely removing traceback information in the sense that runtime.Callers would be unable to retrieve an array of function pointers.
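
For reference, this is roughly what a consumer of runtime.Callers sees today; a minimal sketch using only the documented runtime API, with `callerFrames` as a made-up helper name. The Function, File and Line fields printed here are the ones under discussion.

```go
package main

import (
	"fmt"
	"runtime"
)

// callerFrames returns the stack frames starting at the caller of
// callerFrames.
func callerFrames() []runtime.Frame {
	pcs := make([]uintptr, 32)
	n := runtime.Callers(2, pcs) // skip runtime.Callers and callerFrames
	if n == 0 {
		return nil
	}
	frames := runtime.CallersFrames(pcs[:n])
	var out []runtime.Frame
	for {
		f, more := frames.Next()
		out = append(out, f)
		if !more {
			break
		}
	}
	return out
}

func main() {
	for _, f := range callerFrames() {
		// A logging library would typically format exactly these fields.
		fmt.Printf("%s\n\t%s:%d\n", f.Function, f.File, f.Line)
	}
}
```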

@SeanTolstoyevski SeanTolstoyevski commented Jun 4, 2021

Thanks everyone for the replies.

Data such as traceback and import path information takes up a considerable amount of space in executable files.
Since the topic of this issue is large executable files, I thought people could be allowed to remove this information from the executable if they wish.
In my opinion this should help with file size, and with the scenarios I mentioned.

@ianlancetaylor In fact, there is no need to remove it completely. For example, for module paths consider something simple like .../module/modulefile.go.
This also prevents leaking the Golang project structure on one's computer.
Currently, there is a traceback structure like this. The whole path is there, and when you consider that this is stored for every go file, it is natural that it takes up considerable space:
C/base_path/sub_path/gopath/x/x/x/modulefile.go

Sorry for the insufficient english. I tried to explain this simply. I may have chosen the wrong words.

@seankhliao seankhliao commented Jun 4, 2021

That is available as the -trimpath build flag

@mvdan mvdan commented Jun 4, 2021

> I don't want the program deployed on these systems to retain too much information. For example I don't need traceback. This makes the code I distribute even more susceptible to reverse engineering.

You might find https://github.com/burrowers/garble useful, which attempts to remove some source code information when building. It's particularly aggressive with the -tiny flag. I personally don't think that kind of "extra stripping" needs to be supported by Go's toolchain directly, given how it's possible to do it externally, for the most part.

@ianlancetaylor ianlancetaylor commented Jun 5, 2021

If we dropped the file/line/function information, then logging and profiling would break.

@clausecker clausecker commented Nov 12, 2021

> If we dropped the file/line/function information, then logging and profiling would break.

I think breaking profiling is acceptable if the user desires smaller binaries. As for logging, I'm not sure what a good trade off would be.
