cmd/compile: use less memory for large []byte literal #6643

Open
rsc opened this Issue Oct 23, 2013 · 6 comments

@rsc
Contributor

rsc commented Oct 23, 2013

[]byte literals take up a lot of memory inside the compiler, because each byte in the
literal is a separate syntax Node and, worse, each byte is represented by a
multiprecision integer constant.

Probably a trick is required during parsing to turn []byte{...} into an actual byte
array holding the constant values + a list of index and value for the non-constant data.
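
As a rough illustration of the representation described above, the sketch below folds a parsed element list into one flat byte array plus a list of (index, expression) pairs. Every type and function name here (`elem`, `dynElem`, `byteLit`, `compact`) is hypothetical and not taken from the compiler sources; expressions are modeled as plain strings, and keyed elements such as `[]byte{5: 1}` are ignored.

```go
package main

import "fmt"

// elem is one parsed element of the literal: either a constant byte
// or an arbitrary expression. (Hypothetical stand-in for a syntax node.)
type elem struct {
	IsConst bool
	Val     byte   // valid when IsConst is true
	Expr    string // source text of the expression otherwise
}

// dynElem records one element whose value is not a compile-time constant.
type dynElem struct {
	Index int
	Expr  string
}

// byteLit is a hypothetical compact form of a []byte{...} literal:
// one byte per element for the constants, plus a short side list
// for the non-constant elements.
type byteLit struct {
	Const []byte
	Dyn   []dynElem
}

// compact folds the parsed elements into a byteLit. Slots that hold
// non-constant elements stay zero in Const and are recorded in Dyn.
func compact(elems []elem) byteLit {
	lit := byteLit{Const: make([]byte, len(elems))}
	for i, e := range elems {
		if e.IsConst {
			lit.Const[i] = e.Val
			continue
		}
		lit.Dyn = append(lit.Dyn, dynElem{Index: i, Expr: e.Expr})
	}
	return lit
}

func main() {
	// Models the literal []byte{0x01, x, 0xff}.
	lit := compact([]elem{
		{IsConst: true, Val: 0x01},
		{Expr: "x"},
		{IsConst: true, Val: 0xff},
	})
	fmt.Println(len(lit.Const), "bytes,", len(lit.Dyn), "non-constant element(s)")
}
```

With this layout a fully constant literal costs about one byte per element, and only the non-constant elements carry per-element bookkeeping.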
@rsc

Contributor

rsc commented Dec 4, 2013

Comment 1:

Labels changed: added release-none, removed go1.3maybe.

@rsc

Contributor

rsc commented Dec 4, 2013

Comment 2:

Labels changed: added repo-main.

@odeke-em

Member

odeke-em commented Oct 29, 2017

@mdempsky might you be interested in this?

@Kingwl

Kingwl commented Aug 27, 2018

I'd like to try starting my first PR on this.
I may need some help :)

@josharian

Contributor

josharian commented Aug 27, 2018

Help is always welcome. The first step would be to reproduce the issue and convince ourselves that it is worth fixing. This issue was originally filed in 2013, and a lot has changed in the compiler since then. A good way to demonstrate this would be to write some realistic code (presumably autogenerated, maybe from go-bindata, cc @kevinburke) demonstrating that the byte slices are in fact still a major memory factor. And if it turns out that they aren't, that's also really useful to know. Thanks!
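
One possible starting point for a reproduction is a small generator that emits a Go file containing a []byte literal of a chosen size, similar in shape to what asset-embedding tools produce. This is only a sketch: the package name `blob`, the variable name `Data`, and the function `genSource` are made up for illustration, and the byte values are an arbitrary repeating pattern.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// genSource returns Go source that declares a package-level []byte
// literal with n elements, 16 elements per line.
func genSource(n int) string {
	var b strings.Builder
	b.WriteString("package blob\n\nvar Data = []byte{")
	for i := 0; i < n; i++ {
		if i%16 == 0 {
			b.WriteString("\n\t")
		} else {
			b.WriteString(" ")
		}
		fmt.Fprintf(&b, "0x%02x,", byte(i))
	}
	b.WriteString("\n}\n")
	return b.String()
}

func main() {
	// Element count comes from the first argument; default to a small
	// literal so that running without arguments stays cheap.
	n := 32
	if len(os.Args) > 1 {
		v, err := strconv.Atoi(os.Args[1])
		if err != nil || v < 0 {
			fmt.Fprintln(os.Stderr, "usage: gen [element-count]")
			os.Exit(2)
		}
		n = v
	}
	fmt.Print(genSource(n))
}
```

Redirecting the output into a file in its own directory and building that package while watching peak memory (for example with GNU `time -v`) would give one data point per literal size.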

@pacew

pacew commented Nov 5, 2018

Hi. I'm new here, and not ready to dig into the compiler sources yet, but I can report on some data I collected relevant to the question of "what is the current situation for compiling big bytes literals?".

The results in short: On Ubuntu 18.04 with Go 1.11.2 linux/amd64, []byte literals of up to 128k elements need no appreciable extra compiler memory, and neither do string literals of at least 2 million elements. Larger []byte literals require about 865 extra bytes of compiler memory for every additional element, so a 2 million element literal needs about 1.8 gigabytes of compiler memory.

Here's a graph: https://github.com/pacew/goissue6643/blob/master/goissue6643.png

Perhaps this will give the compiler stewards the information needed to decide whether to pursue making bytes literals as efficient as strings.

My approach is to use the setrlimit(2) system call to squeeze down the data segment size for the compiler until it fails for a test file with a given size literal, then plotting the results for "string" and "bytes" literals.
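
The search part of that approach can be sketched in Go as a binary search over the limit. This is not the helper from the repository (which is described as a C program); `minLimit` and the `compiles` predicate are hypothetical names, and the predicate below is a stand-in that simply pretends the compile needs 200 MiB. In a real measurement it would run the compiler under the given data-segment limit and report whether it succeeded. The search assumes the outcome is monotonic: failure below some threshold, success at or above it.

```go
package main

import "fmt"

// minLimit returns the smallest limit in [lo, hi] for which ok reports
// success, assuming ok fails below some threshold and succeeds at or
// above it. If even hi fails, it returns hi+1.
func minLimit(lo, hi int64, ok func(limit int64) bool) int64 {
	if !ok(hi) {
		return hi + 1
	}
	for lo < hi {
		mid := lo + (hi-lo)/2
		if ok(mid) {
			hi = mid // mid is enough; try smaller limits
		} else {
			lo = mid + 1 // mid is too small
		}
	}
	return lo
}

func main() {
	// Stand-in predicate: pretend the compile needs exactly 200 MiB.
	const need = int64(200) << 20
	compiles := func(limit int64) bool { return limit >= need }

	got := minLimit(1<<20, 4<<30, compiles)
	fmt.Printf("smallest working limit: %d bytes\n", got)
}
```

Each probe costs one full compile, so the logarithmic number of probes matters when the test file is large.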

It appears the compiler requires a data segment size of 200 megabytes to do anything, and this stays pretty much constant for string literals up to at least 2 megabytes.

Using []byte literals with up to 128k elements doesn't cause the compiler data requirement to change. At 256k elements, the compiler requires 306 megabytes, and from there the growth is linear, with about 865 bytes of compiler data needed for each additional element of the literal. I tested as far as a 2 million element literal, which needed 1.8 gigabytes of compiler RAM.
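
The linear part of those measurements can be written down as a small model to check the arithmetic. The constants come from the figures above, with two assumptions that the text does not settle: "256k" is taken as 262144 elements and "306 megabytes" as 306,000,000 bytes. The function name `estimate` is made up, and the model only applies at or above 256k elements.

```go
package main

import "fmt"

const (
	baseElems    int64 = 256 << 10   // assumption: "256k" means 262144 elements
	baseBytes    int64 = 306_000_000 // assumption: 306 MB counted as 10^6-byte megabytes
	bytesPerElem int64 = 865         // reported growth per additional element
)

// estimate returns the approximate compiler data requirement, in bytes,
// for an n-element []byte literal under the reported linear fit.
// It is only meaningful for n >= baseElems.
func estimate(n int64) int64 {
	return baseBytes + (n-baseElems)*bytesPerElem
}

func main() {
	for _, n := range []int64{baseElems, 1_000_000, 2_000_000} {
		fmt.Printf("%8d elements: about %.2f GB\n", n, float64(estimate(n))/1e9)
	}
}
```

For 2,000,000 elements the model gives roughly 1.8 GB, in line with the largest measurement reported.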

The repository https://github.com/pacew/goissue6643 contains the helper programs I wrote (mainly a C program that does a binary search on ulimits), along with a graph of the results.

@randall77 randall77 modified the milestones: Go1.12, Go1.13 Nov 27, 2018
