New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: runtime: optionally allow callers to intern strings #5160

Open
bradfitz opened this Issue Mar 29, 2013 · 23 comments

Comments

Projects
None yet
7 participants
@bradfitz
Member

bradfitz commented Mar 29, 2013

Duplicate strings can waste memory and cause lots of allocations, but may be cheaper at
allocation time.

Interning strings has a lookup cost, but can save memory and even CPU by reduced number
of GCs.

I'd like to have this choice, for when I know my strings are large and likely duplicates.

Manual interning in user code,

  var internedStrings struct{
     sync.RWMutex
     m map[string]string
  }

... leads to lock contention and a map that grows forever.

The runtime could have a less contentious map and could integrate with the garbage
collector, such that strings that are only referenced from the intern table to
themselves are still collected and removed from the internal map.
@bradfitz

This comment has been minimized.

Member

bradfitz commented Mar 29, 2013

Comment 1:

And I'd like to be able to lookup the interned strings from a []byte or string.
Related: issue #3512
@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Mar 30, 2013

Comment 2:

You could do this in user space if the runtime provided weak pointers: pointers that are
set to nil if the object to which they point is garbage collected.
(I don't see why the runtime package has any advantage in making access to the map less
contentious, but perhaps I am missing something.)
@bradfitz

This comment has been minimized.

Member

bradfitz commented Mar 30, 2013

Comment 3:

Even if the runtime provided weak pointers, I'm still not exactly sure how I would
implement it.  What would my map key be?  I might also need the runtime to give me the
string's hash value, since doing my own hash function would probably be too slow.
The runtime could do a better job since it could use a less contentious hashmap
(non-iterable, pushing down locks further, aware of resizes, etc) without having to
offer that to the language.
@dvyukov

This comment has been minimized.

Member

dvyukov commented Mar 31, 2013

Comment 4:

Agree with Brad, even if it will use weak pointers inside it's very difficult in
implement and optimize.
It may not necessary live in runtime, it can be in strings/bytes but use some runtime
helpers.
Btw wrt []byte, do you mean that there will be bold caution comment saying that you must
not mutate them?
@bradfitz

This comment has been minimized.

Member

bradfitz commented Mar 31, 2013

Comment 5:

Answered []byte question here: https://golang.org/issue/3512?c=5
@bradfitz

This comment has been minimized.

Member

bradfitz commented Apr 3, 2013

Comment 6:

Labels changed: added performance.

@bradfitz

This comment has been minimized.

Member

bradfitz commented Apr 3, 2013

Comment 7:

Labels changed: removed optimization.

@bradfitz

This comment has been minimized.

Member

bradfitz commented Apr 26, 2013

Comment 8:

Labels changed: added garbage.

@rsc

This comment has been minimized.

Contributor

rsc commented Nov 27, 2013

Comment 9:

Labels changed: added go1.3maybe.

@rsc

This comment has been minimized.

Contributor

rsc commented Dec 4, 2013

Comment 10:

Labels changed: added release-none, removed go1.3maybe.

@rsc

This comment has been minimized.

Contributor

rsc commented Dec 4, 2013

Comment 11:

Labels changed: added repo-main.

@josharian

This comment has been minimized.

Contributor

josharian commented Feb 23, 2015

@bradfitz I made you a thing: https://github.com/josharian/intern/. It's not exactly what you asked for, and it's a kinda horrible abuse of sync.Pool, and it relies on the gc-specific []byte map key optimization...but it's short and simple and is kinda somewhat like what you asked for. :)

I fiddled with a bunch of non-short, non-simple alternatives and didn't like any of them.

@CAFxX

This comment has been minimized.

Contributor

CAFxX commented Sep 27, 2018

@josharian thanks for https://github.com/josharian/intern! in one of our internal apps that has to parse a lot of json (with a lot of repeated string values) and process it, we were able to slash memory usage by a significant amount (30% on the whole app, >75% just for the in-memory data coming from the parsed json) by using it.

I know I'm jumping the gun on this one, but +1 for having interning - ideally even as an option when unmarshaling serialized payloads, i.e. something along the lines of:

type Collection struct {
  Objects []Object `json:"objects"`
}

type Object struct {
  GUID string `json:"guid"`
  ParentGUID string `json:"parent_guid,intern"` // perform interning during unmarshaling
  // ...
}

(the rationale for doing interning during unmarshaling is this: consider the case of the Objects array containing many elements; if interning is done after unmarshaling we have to do a lot of allocations that could be avoided)

@josharian

This comment has been minimized.

Contributor

josharian commented Oct 9, 2018

@CAFxX I'm pleasantly surprised that it ended up being useful.

I think prior to exposing any API, it’d be worth experimenting with doing this automatically in the runtime. Do it:

  • per-P
  • only for very small strings (to avoid pinning large chunks of memory for arbitrary lengths of time)
  • using a cheap hash (or go fully associative)
  • use something simple like random replacement for the cache

I can't predict offhand how that would turn out, but it seems a simple enough experiment to run.

@CAFxX

This comment has been minimized.

Contributor

CAFxX commented Oct 9, 2018

I'm pleasantly surprised that it ended up being useful

Full disclosure: those JSON structures we parse, and that contain significant amounts of repeated strings, are kept in memory as a form of local LRU cache to avoid lookups on a remote API. So this is probably a very favorable scenario for interning. At the same time, it's a scenario that is common enough to probably warrant consideration.

I think prior to exposing any API, it’d be worth experimenting with doing this automatically in the runtime

Absolutely agree, as long as this won't significantly impact regular string ops. I'm not an expert in GC/runtime, but it would be neat (not for the PoC) if this form of automatic interning was done only for strings that are either likely to survive GC or likely to be already in the interning table. And ideally the interning table should not keep strings alive during GC (like you do in intern).

Random idea for future consideration if this ever gets implemented: maybe interning could be done in multiple steps:

  • first, opportunistically (small, per-P table with random replacement), during string creation
  • then, more aggressively, during concurrent GC for non-garbage strings

only for very small strings

Just as a data point, in our case most of the strings were GUIDs... so it would have helped if the upper limit for "very small" was at least 36 bytes.

@gopherbot

This comment has been minimized.

gopherbot commented Oct 13, 2018

Change https://golang.org/cl/141641 mentions this issue: runtime: intern some small strings

@josharian

This comment has been minimized.

Contributor

josharian commented Oct 13, 2018

CL 141641 is a quick-and-dirty experiment with doing this in the runtime.

@CAFxX

This comment has been minimized.

Contributor

CAFxX commented Oct 13, 2018

@josharian I actually also tried (and got a little bit farther: I also hooked up the GC in cleanpools and implemented interning in concatstrings; btw I went instead for a hash-based approach to allow bigger per-P tables) but tested it only on the go1 benchmarks and they were not bad but not too good either. I was planning to work on it more today. Maybe I can submit an early CL in the meantime for feedback.

@josharian

This comment has been minimized.

Contributor

josharian commented Oct 13, 2018

Cool!

I recommend trying a handful of std packages benchmarks as well as go1; the go1 benchmarks can be narrow in scope. And of course definitely run the runtime string benchmarks. :)

@josharian

This comment has been minimized.

Contributor

josharian commented Oct 14, 2018

@CAFxX

This comment has been minimized.

Contributor

CAFxX commented Oct 15, 2018

I have pushed my current experiment on my fork (see master...CAFxX:intern). It is not pretty as it's meant to be an experiment; specifically, there's a lot of duplication that would need to be sorted out. I'm also not 100% sure the GC part is done correctly (specifically, is it correct to assume that we don't need locking in clearpools?).

What is implemented:

  • per-P interning tables with 1024 slots, hashing via memhash and per-P seeds, random replacement policy (specifically, replacement on hash collision)
  • intern all escaping strings shorter than 64 bytes created either from []byte, []rune or by concatenating strings
  • per-P tables are cleared and per-P seeds are updated at start of GC

The magic numbers 1024 and 64 are arbitrary, haven't really attempted any tuning.

Performance is a mixed bag; some benchmarks improve, other slow down. Allocation benchmarks strictly improve. I suspect some of the regressions could be avoided/mitigated either by refactoring some logic in the standard library (e.g. in strings.Builder), by changes in the compiler or by deferring interning to the GC mark phase. I will post the current benchmark results ASAP.

I read @randall77's comments and I can say that yes, we can try to start smaller (e.g. only do interning in encoding/json and then move from there). I still stand by what I suggested above, i.e. that if the runtime does the right thing with no knobs, it's better.

@CAFxX

This comment has been minimized.

Contributor

CAFxX commented Oct 16, 2018

I managed to improve things a bit (removed code duplication and lowered the overhead for when interning is not supposed to happen) and now there are almost no regressions apart from 3 benchmarks that I still need to analyze in detail (they could very well be artifacts). These are the current results for the strings, strconv, unicode/utf8 and encoding/json benchmarks (comparing master to my branch; benchmarks run on my laptop): https://gist.github.com/CAFxX/5b057d40e9d61dab54ffd26f54181f0d

I plan to run some additional benchmarks simulating our production workloads to validate the results. If you have suggestions about additional benchmarks to run, please let me know. It would be advisable if also somebody else ran benchmarks on different hardware.

I pushed the latest version here: master...CAFxX:intern

Assuming we manage to contain all regressions, would such an approach (that has the benefit of exposing new APIs or modifying existing ones) be considered for inclusion?

@josharian josharian modified the milestones: Unplanned, Go1.13 Nov 10, 2018

@josharian josharian changed the title from runtime: optionally allow callers to intern strings to proposal: runtime: optionally allow callers to intern strings Dec 14, 2018

@josharian

This comment has been minimized.

Contributor

josharian commented Dec 14, 2018

Given the recent flurry of activity, and given that this is a significant change, and given that there are a variety of options (automatic interning, package runtime API, etc.), I've changed this to a proposal so that we can get some input from @golang/proposal-review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment