Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: static init maps should never generate code #2559

Open
rsc opened this issue Dec 12, 2011 · 46 comments

Comments

@rsc
Copy link
Contributor

commented Dec 12, 2011

so that unreferenced maps can be dropped by linker
@rsc

This comment has been minimized.

Copy link
Contributor Author

commented Jan 8, 2012

Comment 1:

Labels changed: added priority-go1, removed priority-medium.

@robpike

This comment has been minimized.

Copy link
Contributor

commented Jan 13, 2012

Comment 2:

Owner changed to builder@golang.org.

@rsc

This comment has been minimized.

Copy link
Contributor Author

commented Feb 11, 2012

Comment 4:

This is too subtle a change for Go 1 at this point.  It will have to wait.

Labels changed: added priority-later, removed priority-go1.

@rsc

This comment has been minimized.

Copy link
Contributor Author

commented Sep 12, 2012

Comment 5:

Labels changed: added go1.1maybe.

@rsc

This comment has been minimized.

Copy link
Contributor Author

commented Mar 12, 2013

Comment 6:

[The time for maybe has passed.]

Labels changed: removed go1.1maybe.

@rsc

This comment has been minimized.

Copy link
Contributor Author

commented Jul 30, 2013

Comment 7:

Labels changed: added go1.2maybe.

@rsc

This comment has been minimized.

Copy link
Contributor Author

commented Jul 30, 2013

Comment 8:

Labels changed: added feature.

@robpike

This comment has been minimized.

Copy link
Contributor

commented Aug 16, 2013

Comment 9:

Labels changed: added go1.3maybe, removed go1.2maybe.

@robpike

This comment has been minimized.

Copy link
Contributor

commented Aug 20, 2013

Comment 10:

Labels changed: removed go1.3maybe.

@rsc

This comment has been minimized.

Copy link
Contributor Author

commented Nov 27, 2013

Comment 11:

Labels changed: added go1.3maybe.

@rsc

This comment has been minimized.

Copy link
Contributor Author

commented Nov 27, 2013

Comment 12:

Labels changed: removed feature.

@rsc

This comment has been minimized.

Copy link
Contributor Author

commented Dec 4, 2013

Comment 13:

Labels changed: added release-none, removed go1.3maybe.

@rsc

This comment has been minimized.

Copy link
Contributor Author

commented Dec 4, 2013

Comment 14:

Labels changed: added repo-main.

@remyoudompheng

This comment has been minimized.

Copy link
Contributor

commented Jan 10, 2014

Comment 15:

Is this actually possible ? The current binaries can use a different hash function
depending on the runtime system (AESNI instructions or not), and they also randomize
hash seed. In 2011 it was not the case.

@rsc rsc added accepted labels Jan 10, 2014

@rsc rsc added this to the Unplanned milestone Apr 10, 2015

@rsc rsc changed the title cmd/gc: static init maps should never generate code cmd/compile: static init maps should never generate code Jun 8, 2015

@gopherbot

This comment has been minimized.

Copy link

commented Jul 31, 2018

Change https://golang.org/cl/127075 mentions this issue: html: lazily populate Unescape tables

gopherbot pushed a commit that referenced this issue Jul 31, 2018
html: lazily populate Unescape tables
Saves ~105KB of heap for callers who don't use html.UnescapeString.
(EscapeString is much more common).

Also saves 70KB of binary size, because now the linker can do dead
code elimination. (because #2559 is still open and global maps always
generate init code)

Fixes #26727
Updates #6853

Change-Id: I18fe9a273097e2c7e0cb7f88205cae1bb60fa89b
Reviewed-on: https://go-review.googlesource.com/127075
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
@aclements

This comment has been minimized.

Copy link
Member

commented Aug 1, 2018

This may in fact not be possible because the hash function and even the hash seed can be selected at runtime. However, even if we're generating init code for maps, the linker could still understand that that init code exists solely to populate that map, so if the map is unreferenced, the init code could be eliminated, too.

@aclements

This comment has been minimized.

Copy link
Member

commented Sep 10, 2018

However, even if we're generating init code for maps, the linker could still understand that that init code exists solely to populate that map, so if the map is unreferenced, the init code could be eliminated, too.

Going back to this idea, @thanm and I were discussing a similar issue today and came up with a possible clean approach. One way of viewing the problem is that we have a chain of relocations from package to init function to the global map variable, which means we can't eliminate the global map variable even if this is the only reference to it.

We could twist this reference graph: make the global map variable have a no-op relocation to its init function and make the relocation from the package to the init function weak. This way, if the global variable is referenced, it will make its init function reachable, which would in turn resolve the weak reference to the init function so it gets called.

To my knowledge, we don't have a notion of weak relocations in the linker right now, but this is at least a well-defined and general-purpose mechanism we could add. Then the linker doesn't need to know anything special about maps or init functions; it just becomes another relocation to resolve.

@aclements

This comment has been minimized.

Copy link
Member

commented Nov 25, 2018

However, even if we're generating init code for maps, the linker could still understand that that init code exists solely to populate that map, so if the map is unreferenced, the init code could be eliminated, too.

I had another thought on this approach. If we were to solve #14840 by detecting pure functions (which has a lot of other benefits) and marking variables that are initialized by calling a pure function as dead-codable, then we could eliminate initializers of unused maps by simply rewriting these map initializations as

var globalMap = compilerGeneratedFunction()

Even if we don't automatically detect pure functions, we could simply mark the generated function as pure (there are other uses for well-known pure functions, e.g., #22971) and do the deadcode elimination from there.

@bradfitz

This comment has been minimized.

Copy link
Member

commented Nov 26, 2018

@aclements, that's basically what I said in #14840 (comment) (which you linked to), right?

@aclements

This comment has been minimized.

Copy link
Member

commented Nov 26, 2018

@bradfitz, it's almost the same, but with a subtle but important difference. The way I read your suggestion was to use the current construction and detect that init is side-effect-free:

var globalMap map[...]...
func init() {
  globalMap = ...
}

But init is not side-effect-free. In order to eliminate init, we'd have to detect that init's only side-effect is initializing globalMap and that nothing could have read or assigned to globalMap before init ran. If init is written by the compiler, then it can know that a priori, which is perhaps fine, but it's not about pure functions at that point.

OTOH, if the construction is

var globalMap = createGlobalMap()
func createGlobalMap() map[...]... {
  return ...
}

Then createGlobalMap is genuinely side-effect-free, which makes reasoning about dead-code elimination much easier.

@bradfitz

This comment has been minimized.

Copy link
Member

commented Nov 26, 2018

Yeah, I was assuming there would be some new association between an init and what it initialized to make the linker's job easier.

Because even in your code example,

var globalMap = createGlobalMap()
func createGlobalMap() map[...]... {
  return ...
}

Isn't that really:

var globalMap map[K]V
func init() {
    globalMap = createGlobalMap()
}
func createGlobalMap() map[K]V {
  return ...
}

... and that init.15 or whatever I assume would need some metadata associated with it in the object file for the linker to have an dependency edge with globalMap, no?

In any case, exciting.

@bradfitz

This comment has been minimized.

Copy link
Member

commented Jan 11, 2019

@randall77 @aclements @griesemer @ianlancetaylor, I think we should prioritize this for Go 1.13. Our binary sizes have only grown in the past 7 years. It might be time.

@mvdan

This comment has been minimized.

Copy link
Member

commented Mar 9, 2019

I also think this could help with #29382. Lots of packages use static global maps, which I think is reasonable. However, this results in cmd/go's init cost roughly being 8% runtime.mapassign_faststr, 4% runtime.makeBucketArray, 2% runtime.mapassign_fast32, and 3% runtime.makemap, among others.

I'm not sure if the consensus is to still generate code for these global maps, but even avoiding calling the code in some unused cases would help noticeably, I think.

@aclements

This comment has been minimized.

Copy link
Member

commented Mar 19, 2019

Was chatting with @rsc about this. There are several technical reasons why it's hard to statically generate a fully-functioning runtime map. Above we talked a bit about statically laying out just read-only maps, but that's limited by statically proving a map is read-only. Russ and I came up with a variation of this that's more practical: statically lay out maps for read-only use, but dynamically promote them to regular runtime maps if they are written to. The static layout could be the runtime's map structure (with a fixed hash and seed), but it would be better if it were something really simple. On the first write, the runtime would acquire a lock, dump the map's contents into the runtime's regular map structure, and point the map to the newly allocated map.

This could be combined with or done as follow-up to weak init functions.

@seebs

This comment has been minimized.

Copy link
Contributor

commented Mar 19, 2019

clearly, you need to make maps a valid form of constant

const predeclaredTypes = map[...] {...}

then you have an unambiguous compiler hint that the map must be computable at compile time, and can be created in advance, have no init code, and not be modifiable.

(i am not yet sure how much i'm joking and how much that's an actual lucid idea. someone implement it, and if it works out, then i was totally serious, and if it's unusable, i was joking.)

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Mar 19, 2019

@seebs That is #6386.

@seebs

This comment has been minimized.

Copy link
Contributor

commented Mar 19, 2019

That looks well-written enough that I must have been serious because it sounds like a good idea now. :)

@josharian

This comment has been minimized.

Copy link
Contributor

commented Mar 19, 2019

The solution sketched by @aclements is nice.

it would be better if it were something really simple

Agreed. The question is how. This gets at something i have long wanted (and @bradfitz too, I believe), which is the ability to have multiple underlying map implementations, and switch between them. This is hard to accomplish without either adding new indirections or growing the size of a map structure.

Also worth noting is that all uses of maps end up paying the cost of the “is this the first write to a readonly map” check. Probably not a dealbreaker, but maps have been optimized to the point that individual checks like this are noticeable.

@aclements

This comment has been minimized.

Copy link
Member

commented Mar 19, 2019

Also worth noting is that all uses of maps end up paying the cost of the “is this the first write to a readonly map” check. Probably not a dealbreaker, but maps have been optimized to the point that individual checks like this are noticeable.

In this particular case, i think we could put a bit in h.flags. I think all of the operations that matter for this already always check h.flags (e.g., for hashWriting). We could make that check for other flags at the same time at no extra cost, and then check more specific flags.

@josharian

This comment has been minimized.

Copy link
Contributor

commented Mar 28, 2019

I have a working toy prototype of using multiple underlying map implementations. I now also have a decent sense for where the challenges are and what the options are. I will write something up (new issue? golang-dev?) soonish.

@josharian

This comment has been minimized.

Copy link
Contributor

commented Mar 31, 2019

Apologies for the delay in sharing my prototype and notes. It has taken more cleanup than anticipated. I'm still working on it. If folks like the direction, I might be game to take on doing an initial readonly-promote-to-readwrite map implementation, but I want to start with discussing the prototype.

@bcmills bcmills added the Performance label Apr 1, 2019

@gopherbot

This comment has been minimized.

Copy link

commented Apr 1, 2019

Change https://golang.org/cl/170317 mentions this issue: net/textproto: simplify commonHeader initialization

@bcmills

This comment has been minimized.

Copy link
Member

commented Apr 1, 2019

we talked a bit about statically laying out just read-only maps, but that's limited by statically proving a map is read-only.

Statically-initialized maps are necessarily stored in package variables, and in my experience most package variables of map types are unexported and trivially non-aliasing (such as the one in CL 170317).

It seems like it would actually be pretty straightforward to prove that such maps are read-only, at least with the (IMO reasonable) assumption that they are not exposed and mutated via a //go:linkname directive.

@aclements

This comment has been minimized.

Copy link
Member

commented Apr 1, 2019

@bcmills, given that we would need two map implementations anyway to have statically initialized maps, is there a disadvantage to dynamically upgrading a read-only map to a read/write map on first write?

(I think the canonical example of exported, static maps is the unicode package, though you're right that most are unexported, at least in std.)

@bcmills

This comment has been minimized.

Copy link
Member

commented Apr 1, 2019

is there a disadvantage to dynamically upgrading a read-only map to a read/write map on first write?

The branch for the dynamic mode-check is either hard to predict, or needs to consume resources in the branch predictor specific to each map (or map access site), which could slow down map accesses throughout the program.

(I suppose we could only even check the branch for maps that may be in read-only mode, but if we do enough analysis to figure that out, then it seems like we've done nearly enough of the work to eliminate the branch entirely.)

@gopherbot

This comment has been minimized.

Copy link

commented Apr 1, 2019

Change https://golang.org/cl/170280 mentions this issue: runtime: add toy specialized map implementation

@bcmills

This comment has been minimized.

Copy link
Member

commented Apr 1, 2019

There are several technical reasons why it's hard to statically generate a fully-functioning runtime map.

Could you give a little more detail about those reasons? Adding a whole separate map implementation seems pretty gnarly too.

The difficulties I would expect to encounter are:

  • linker relocations within the maps
  • runtime hash-salting

Am I missing some others?

@randall77

This comment has been minimized.

Copy link
Contributor

commented Apr 1, 2019

Runtime hash salting is the only one I can think of. The compiler already understands most of the rest of the map layout.

Note that the per-map salt is easy to fix. It's the per-process salt that's the challenge.

@aclements

This comment has been minimized.

Copy link
Member

commented Apr 1, 2019

Also we select the whole hash function based on what the hardware supports at runtime (e.g., AES or not). We'd have to use the lowest-common-denominator hash function and record that that's what's being used for a given map. Then we're also stuck with that until, say, we grow the map.

@josharian

This comment has been minimized.

Copy link
Contributor

commented Apr 6, 2019

@randall77 @aclements any feedback about the approach sketched in https://golang.org/cl/170280? Particularly the treatment of maps and map iterators as union data types. (Sorry, it is a lot of words and code.) I’m considering starting on readonly maps following that CL as a very rough map.

@randall77

This comment has been minimized.

Copy link
Contributor

commented Apr 6, 2019

I looked through that CL.
I'm worried about adding so much complexity. There's going to be an awful lot of new code that's very tricky to get right, and test. Iterators, reflection, converting between representations, etc.

I'm much more inclined to detect that a map is readonly at compile time, and somehow just special case the hash function so that we can build the map at compile time. That seems a much smaller change to get the same effect.

Of course, your change is much more general, and might make small runtime-built maps faster also. In addition, it does solve the exported-but-never-modified map problem, which we can't do at compile time. But we also have to switch on the map type for every call, and I think most of the cases in the stdlib at least are unexported global maps (TODO: anyone have actual data on this?).

In my mind, the decision comes down to how much Josh's general map setup can buy us beyond just improving startup times. If it's just startup times, I don't think it is worth it - there's a much simpler way to achieve that. If there's more to be had, and we can demonstrate where that improvement might come from, then I'm still open to the idea.

josharian added a commit to josharian/go that referenced this issue May 7, 2019
runtime: add toy specialized map implementation
DO NOT SUBMIT

[the new implementation is a toy]

This change is an experiment to learn what is involved in adding additional
underlying map implementations.

(The existing implementation has some specialized routines selected
at compile-time for performance, but there is only one underlying
data structure.)

There are a few alternative map implementations that might be useful. Examples:

* a plain slice, for small maps with cheap equality checks
* a sorted slice or trie, for small, readonly maps with cheap comparisons
* a hash-based readonly map, for large readonly maps
* a map implementation that uses open addressing, for maps with small keys and values, such as a map[int]struct{}

The choice of implementation may occur at compile-time, run-time, or both.

It would also be nice to be able to make the map structure itself smaller;
many of these alternative implementations require less.

This change sketches what it looks like to go from one map implementation
to more-than-one. If we add a new map implementations, I would expect that
we will add a few more slowly over time, so I have attempted to keep this
change relatively general.

---

Based on the experience of working on this change, the sticking points
for multiple map implementations are as follows.


Because in general we can't know what map implementation we will
encounter, and because we need the flexibility to migrate between them
at runtime, all map implementations must be the same size
and look the same to the GC.

There are three options for accomplishing this:

* Add an indirection. This has the usual costs and benefits.
* Have the map structure hold all possible fields side by side.
  This is helpful when implementing things like promotion and iteration,
  but makes the map structure unacceptably large.
* Make the map structure a union type, as I have done here.
  There is extra bookkeeping and danger, but it offers flexibility
  without unnecessary indirection.

If we end up specializing all small map types, we might be able to
adopt a mixture of the union type and indirection approaches:
Make the union structure big enough to hold all small map implementations,
and use some of the fields to hold an indirection for larger map implementations
(much as hmap has a mapextra field now used only in some cases).
For the larger maps, the extra allocation and indirection may not be
detectable costs, thus allowing us to have the best of both worlds.



Reflection provides yet another entry point to the map routines,
which means that any attempt to specialize map implementations
entirely at compilation time will fail. Or to be more precise,
it triples the surface area, since you must teach the compiler,
reflect, and reflectlite to dispatch correctly.



Ever the child's crayon scribbled on the critical part of the map,
iteration was the trickiest part of this change to get right.
The problem is what happens when you want to change map implementations
during iteration. The solution adopted here is to completely snapshot
the map for iteration. This isn't a great solution, and it is only
even halfway feasible because smallmaps are small.

Solutions here will depend intimately on the details of the map structure.
Fortunately, the new map implementations mooted above mostly admit of reasonable
approaches. Still, this is almost certainly the number one area where new
map implementations will founder.



Promoting from one map type to another is expensive: It involves
copying all the data in the map. You thus don't want to promote
back and forth repeatedly. This change finessing this issue by
only promoting in one direction. Something similar works for readonly maps.



Dispatch code and indirection can be expensive.

Though this map implementation is entirely unoptimized,
with a focus purely on correctness and exploration,
a quick benchmark run confirms ones intuitions:

name                  old time/op    new time/op    delta
MapPopulate/1-8         76.4ns ± 0%    77.4ns ± 1%    +1.33%  (p=0.000 n=15+14)
MapPopulate/10-8         463ns ± 1%     699ns ± 1%   +50.92%  (p=0.000 n=15+14)
MapPopulate/100-8       6.25µs ± 1%    6.51µs ± 0%    +4.19%  (p=0.000 n=15+15)
MapPopulate/1000-8      73.8µs ± 1%    74.3µs ± 0%    +0.68%  (p=0.000 n=15+12)
MapPopulate/10000-8      649µs ± 0%     650µs ± 0%      ~     (p=0.252 n=15+14)
MapPopulate/100000-8    6.95ms ± 1%    6.94ms ± 1%      ~     (p=0.943 n=14+13)

name                  old alloc/op   new alloc/op   delta
MapPopulate/1-8           144B ± 0%      120B ± 0%   -16.67%  (p=0.000 n=15+15)
MapPopulate/10-8          323B ± 0%      443B ± 0%   +37.15%  (p=0.000 n=15+15)
MapPopulate/100-8       3.52kB ± 0%    3.64kB ± 0%    +3.40%  (p=0.000 n=15+15)
MapPopulate/1000-8      53.5kB ± 0%    53.6kB ± 0%    +0.22%  (p=0.000 n=15+15)
MapPopulate/10000-8      428kB ± 0%     428kB ± 0%    +0.03%  (p=0.000 n=15+15)
MapPopulate/100000-8    3.62MB ± 0%    3.62MB ± 0%      ~     (p=0.497 n=15+14)

name                  old allocs/op  new allocs/op  delta
MapPopulate/1-8           2.00 ± 0%      3.00 ± 0%   +50.00%  (p=0.000 n=15+15)
MapPopulate/10-8          3.00 ± 0%      6.00 ± 0%  +100.00%  (p=0.000 n=15+15)
MapPopulate/100-8         19.0 ± 0%      22.0 ± 0%   +15.79%  (p=0.000 n=15+15)
MapPopulate/1000-8        75.0 ± 0%      77.7 ± 1%    +3.56%  (p=0.000 n=12+15)
MapPopulate/10000-8        321 ± 0%       325 ± 0%    +0.95%  (p=0.000 n=15+15)
MapPopulate/100000-8     4.00k ± 0%     4.00k ± 0%    +0.10%  (p=0.010 n=15+14)

For small maps, a specialized small map implementation is already cheaper
in alloc/op, could easily be made as cheap in allocs/op, and could almost
certainly be map cheaped in CPU time.

Map size 10 is the worst case scenario: We put all the data in a small map,
then copy it all into a regular map.

By the time we get to much larger sizes, the cost differential shrinks again.


Updates golang#2559

Change-Id: I542f6bf606e70e9f08f25e6a44da25f710611689
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
You can’t perform that action at this time.