proposal: runtime: optionally allow callers to intern strings #5160

bradfitz · 2013-03-29T23:44:50Z

Duplicate strings can waste memory and cause lots of allocations, but may be cheaper at
allocation time.

Interning strings has a lookup cost, but can save memory and even CPU by reduced number
of GCs.

I'd like to have this choice, for when I know my strings are large and likely duplicates.

Manual interning in user code,

  var internedStrings struct{
     sync.RWMutex
     m map[string]string
  }

... leads to lock contention and a map that grows forever.

The runtime could have a less contentious map and could integrate with the garbage
collector, such that strings that are only referenced from the intern table to
themselves are still collected and removed from the internal map.

bradfitz · 2013-03-29T23:51:30Z

Comment 1:

And I'd like to be able to lookup the interned strings from a []byte or string.
Related: issue #3512

ianlancetaylor · 2013-03-30T00:10:44Z

Comment 2:

You could do this in user space if the runtime provided weak pointers: pointers that are
set to nil if the object to which they point is garbage collected.
(I don't see why the runtime package has any advantage in making access to the map less
contentious, but perhaps I am missing something.)

bradfitz · 2013-03-30T17:43:49Z

Comment 3:

Even if the runtime provided weak pointers, I'm still not exactly sure how I would
implement it.  What would my map key be?  I might also need the runtime to give me the
string's hash value, since doing my own hash function would probably be too slow.
The runtime could do a better job since it could use a less contentious hashmap
(non-iterable, pushing down locks further, aware of resizes, etc) without having to
offer that to the language.

dvyukov · 2013-03-31T18:44:36Z

Comment 4:

Agree with Brad, even if it will use weak pointers inside it's very difficult in
implement and optimize.
It may not necessary live in runtime, it can be in strings/bytes but use some runtime
helpers.
Btw wrt []byte, do you mean that there will be bold caution comment saying that you must
not mutate them?

bradfitz · 2013-03-31T18:58:23Z

Comment 5:

Answered []byte question here: https://golang.org/issue/3512?c=5

bradfitz · 2013-04-03T17:44:37Z

Comment 6:

Labels changed: added performance.

bradfitz · 2013-04-03T17:45:33Z

Comment 7:

Labels changed: removed optimization.

bradfitz · 2013-04-26T03:44:07Z

Comment 8:

Labels changed: added garbage.

rsc · 2013-11-27T18:49:07Z

Comment 9:

Labels changed: added go1.3maybe.

rsc · 2013-12-04T01:28:17Z

Comment 10:

Labels changed: added release-none, removed go1.3maybe.

rsc · 2013-12-04T01:49:19Z

Comment 11:

Labels changed: added repo-main.

josharian · 2015-02-23T10:27:20Z

@bradfitz I made you a thing: https://github.com/josharian/intern/. It's not exactly what you asked for, and it's a kinda horrible abuse of sync.Pool, and it relies on the gc-specific []byte map key optimization...but it's short and simple and is kinda somewhat like what you asked for. :)

I fiddled with a bunch of non-short, non-simple alternatives and didn't like any of them.

CAFxX · 2018-09-27T07:33:05Z

@josharian thanks for https://github.com/josharian/intern! in one of our internal apps that has to parse a lot of json (with a lot of repeated string values) and process it, we were able to slash memory usage by a significant amount (30% on the whole app, >75% just for the in-memory data coming from the parsed json) by using it.

I know I'm jumping the gun on this one, but +1 for having interning - ideally even as an option when unmarshaling serialized payloads, i.e. something along the lines of:

type Collection struct {
  Objects []Object `json:"objects"`
}

type Object struct {
  GUID string `json:"guid"`
  ParentGUID string `json:"parent_guid,intern"` // perform interning during unmarshaling
  // ...
}

(the rationale for doing interning during unmarshaling is this: consider the case of the Objects array containing many elements; if interning is done after unmarshaling we have to do a lot of allocations that could be avoided)

josharian · 2018-10-09T00:31:03Z

@CAFxX I'm pleasantly surprised that it ended up being useful.

I think prior to exposing any API, it’d be worth experimenting with doing this automatically in the runtime. Do it:

per-P
only for very small strings (to avoid pinning large chunks of memory for arbitrary lengths of time)
using a cheap hash (or go fully associative)
use something simple like random replacement for the cache

I can't predict offhand how that would turn out, but it seems a simple enough experiment to run.

CAFxX · 2018-10-09T04:03:44Z

I'm pleasantly surprised that it ended up being useful

Full disclosure: those JSON structures we parse, and that contain significant amounts of repeated strings, are kept in memory as a form of local LRU cache to avoid lookups on a remote API. So this is probably a very favorable scenario for interning. At the same time, it's a scenario that is common enough to probably warrant consideration.

I think prior to exposing any API, it’d be worth experimenting with doing this automatically in the runtime

Absolutely agree, as long as this won't significantly impact regular string ops. I'm not an expert in GC/runtime, but it would be neat (not for the PoC) if this form of automatic interning was done only for strings that are either likely to survive GC or likely to be already in the interning table. And ideally the interning table should not keep strings alive during GC (like you do in intern).

Random idea for future consideration if this ever gets implemented: maybe interning could be done in multiple steps:

first, opportunistically (small, per-P table with random replacement), during string creation
then, more aggressively, during concurrent GC for non-garbage strings

only for very small strings

Just as a data point, in our case most of the strings were GUIDs... so it would have helped if the upper limit for "very small" was at least 36 bytes.

gopherbot · 2018-10-13T02:24:58Z

Change https://golang.org/cl/141641 mentions this issue: runtime: intern some small strings

josharian · 2018-10-13T02:25:57Z

CL 141641 is a quick-and-dirty experiment with doing this in the runtime.

CAFxX · 2018-10-13T06:31:29Z

@josharian I actually also tried (and got a little bit farther: I also hooked up the GC in cleanpools and implemented interning in concatstrings; btw I went instead for a hash-based approach to allow bigger per-P tables) but tested it only on the go1 benchmarks and they were not bad but not too good either. I was planning to work on it more today. Maybe I can submit an early CL in the meantime for feedback.

josharian · 2018-10-13T13:24:20Z

Cool!

I recommend trying a handful of std packages benchmarks as well as go1; the go1 benchmarks can be narrow in scope. And of course definitely run the runtime string benchmarks. :)

josharian · 2018-10-14T02:43:52Z

@CAFxX see also discussion with @randall77 in
https://go-review.googlesource.com/c/go/+/141641.

CAFxX · 2018-10-15T02:51:40Z

I have pushed my current experiment on my fork (see master...CAFxX:intern). It is not pretty as it's meant to be an experiment; specifically, there's a lot of duplication that would need to be sorted out. I'm also not 100% sure the GC part is done correctly (specifically, is it correct to assume that we don't need locking in clearpools?).

What is implemented:

per-P interning tables with 1024 slots, hashing via memhash and per-P seeds, random replacement policy (specifically, replacement on hash collision)
intern all escaping strings shorter than 64 bytes created either from []byte, []rune or by concatenating strings
per-P tables are cleared and per-P seeds are updated at start of GC

The magic numbers 1024 and 64 are arbitrary, haven't really attempted any tuning.

Performance is a mixed bag; some benchmarks improve, other slow down. Allocation benchmarks strictly improve. I suspect some of the regressions could be avoided/mitigated either by refactoring some logic in the standard library (e.g. in strings.Builder), by changes in the compiler or by deferring interning to the GC mark phase. I will post the current benchmark results ASAP.

I read @randall77's comments and I can say that yes, we can try to start smaller (e.g. only do interning in encoding/json and then move from there). I still stand by what I suggested above, i.e. that if the runtime does the right thing with no knobs, it's better.

CAFxX · 2018-10-16T12:14:33Z

I managed to improve things a bit (removed code duplication and lowered the overhead for when interning is not supposed to happen) and now there are almost no regressions apart from 3 benchmarks that I still need to analyze in detail (they could very well be artifacts). These are the current results for the strings, strconv, unicode/utf8 and encoding/json benchmarks (comparing master to my branch; benchmarks run on my laptop): https://gist.github.com/CAFxX/5b057d40e9d61dab54ffd26f54181f0d

I plan to run some additional benchmarks simulating our production workloads to validate the results. If you have suggestions about additional benchmarks to run, please let me know. It would be advisable if also somebody else ran benchmarks on different hardware.

I pushed the latest version here: master...CAFxX:intern

Assuming we manage to contain all regressions, would such an approach (that has the benefit of exposing new APIs or modifying existing ones) be considered for inclusion?

josharian · 2018-12-14T00:21:50Z

Given the recent flurry of activity, and given that this is a significant change, and given that there are a variety of options (automatic interning, package runtime API, etc.), I've changed this to a proposal so that we can get some input from @golang/proposal-review.

rsc · 2018-12-19T19:08:52Z

@josharian What is the exact API proposal? I see a very long issue thread here and lots of links, but could you please post a note here with the API in the text of the note?

rsc · 2018-12-19T19:10:50Z

OK, it looks like https://github.com/josharian/intern/blob/master/intern.go is the API but I don't see what the proposed implementation is. (Probably not what's in that file.) Please help me.

josharian · 2018-12-26T22:23:31Z

@rsc there are several different paths we could take here. I'm not confident that I have clarity about optimal path(s) forward; I'm hoping to restart discussion.

One option is to automatically intern some strings. This has the benefit of no new API, but the downside of making program performance harder to predict. @CAFxX has experimented with this a bunch (above).

If we are going to do manual interning, this looks a lot like free lists and sync.Pool. If you have a way to handle string interning with the correct lifetime and control over concurrency, you should just do it there (much like with free lists). The harder case is where you don't have a good way to manage the string pool's lifetime or concurrency.

For the easy case, we might consider adding to package strings:

type Interner struct {
  // ...
}

func (i *Interner) Intern(s string) string
func (i *Interner) InternBytes(b []byte) string

The argument for adding API for the easy case is that, unlike with free lists, the implementation is non-obvious and non-portable (it requires specific compiler optimization #3512). (Maybe that means it belongs in package runtime as runtime.StringInterner, although that is uglier.)

The harder case needs to handle lifetime as well as concurrency. The analogy with sync.Pool suggests that package sync might a good home for it, with the API spelled the same as above in package strings. We document that interning is best effort, just like re-using objects from sync.Pool. When sync.Pools get flushed during GC, so do the set of interned strings (possibly using a similar mechanism to #22950). And the sync version provides concurrency-safety, perhaps just using a plain mutex. (Or maybe using #18802; this is another reason to make interning best-effort.)

Does this help, Russ?

DO NOT SUBMIT Updates golang#5160 This is the simplest thing I could think of to experiment with. It's incomplete: It doesn't cover all the ways to create strings. It's not optimized at all. I picked the magic numbers haphazardly. Nevertheless, it doesn't do too badly, when applied to the compiler: name old time/op new time/op delta Template 187ms ± 3% 184ms ± 2% -1.47% (p=0.000 n=97+95) Unicode 86.9ms ± 5% 87.3ms ± 4% ~ (p=0.065 n=99+94) GoTypes 658ms ± 2% 659ms ± 2% ~ (p=0.614 n=99+97) Compiler 2.94s ± 2% 2.94s ± 2% ~ (p=0.945 n=95+95) SSA 8.53s ± 1% 8.54s ± 1% ~ (p=0.276 n=97+98) Flate 125ms ± 3% 124ms ± 4% -0.78% (p=0.000 n=99+97) GoParser 149ms ± 4% 149ms ± 3% ~ (p=0.595 n=100+95) Reflect 410ms ± 3% 412ms ± 4% +0.48% (p=0.047 n=97+96) Tar 167ms ± 3% 166ms ± 5% ~ (p=0.078 n=99+98) XML 227ms ± 3% 227ms ± 4% ~ (p=0.723 n=96+95) [Geo mean] 388ms 387ms -0.17% name old user-time/op new user-time/op delta Template 223ms ± 3% 221ms ± 3% -0.78% (p=0.000 n=99+95) Unicode 109ms ± 8% 111ms ± 6% +1.36% (p=0.001 n=99+99) GoTypes 846ms ± 2% 848ms ± 2% ~ (p=0.092 n=97+97) Compiler 3.91s ± 2% 3.91s ± 2% ~ (p=0.666 n=97+96) SSA 12.1s ± 2% 12.1s ± 1% ~ (p=0.128 n=92+96) Flate 145ms ± 3% 144ms ± 4% ~ (p=0.157 n=93+99) GoParser 180ms ± 5% 181ms ± 5% +0.63% (p=0.004 n=90+94) Reflect 522ms ± 3% 524ms ± 4% ~ (p=0.055 n=95+96) Tar 203ms ± 5% 203ms ± 5% ~ (p=0.880 n=100+99) XML 280ms ± 4% 281ms ± 4% ~ (p=0.170 n=99+98) [Geo mean] 487ms 488ms +0.19% name old alloc/op new alloc/op delta Template 36.3MB ± 0% 36.2MB ± 0% -0.23% (p=0.008 n=5+5) Unicode 29.7MB ± 0% 29.7MB ± 0% -0.07% (p=0.008 n=5+5) GoTypes 126MB ± 0% 126MB ± 0% -0.27% (p=0.008 n=5+5) Compiler 537MB ± 0% 536MB ± 0% -0.21% (p=0.008 n=5+5) SSA 2.00GB ± 0% 2.00GB ± 0% -0.12% (p=0.008 n=5+5) Flate 24.6MB ± 0% 24.6MB ± 0% -0.28% (p=0.008 n=5+5) GoParser 29.4MB ± 0% 29.4MB ± 0% -0.20% (p=0.008 n=5+5) Reflect 87.3MB ± 0% 86.9MB ± 0% -0.52% (p=0.008 n=5+5) Tar 35.6MB ± 0% 35.5MB ± 0% -0.28% (p=0.008 n=5+5) XML 48.4MB ± 0% 48.4MB ± 0% -0.16% (p=0.008 n=5+5) [Geo mean] 83.3MB 83.1MB -0.23% name old allocs/op new allocs/op delta Template 352k ± 0% 347k ± 0% -1.16% (p=0.008 n=5+5) Unicode 341k ± 0% 339k ± 0% -0.76% (p=0.008 n=5+5) GoTypes 1.28M ± 0% 1.26M ± 0% -1.48% (p=0.008 n=5+5) Compiler 4.97M ± 0% 4.90M ± 0% -1.38% (p=0.008 n=5+5) SSA 15.6M ± 0% 15.3M ± 0% -2.11% (p=0.016 n=4+5) Flate 233k ± 0% 228k ± 0% -1.92% (p=0.008 n=5+5) GoParser 294k ± 0% 290k ± 0% -1.32% (p=0.008 n=5+5) Reflect 1.04M ± 0% 1.03M ± 0% -1.73% (p=0.008 n=5+5) Tar 343k ± 0% 337k ± 0% -1.62% (p=0.008 n=5+5) XML 432k ± 0% 426k ± 0% -1.28% (p=0.008 n=5+5) [Geo mean] 813k 801k -1.48% Change-Id: I4cd95bf4a74479b0e8a8d339d77e248d1467a6e0

rsc · 2019-01-09T18:56:03Z

Now that the compiler is better about not copying for string(b) in various cases, is there any reason this needs to be in the standard library? Is there any runtime access it really needs?

@bradfitz says he'd be OK with closing this.

josharian · 2019-01-11T20:27:50Z

If we are going to automatically intern strings in the runtime, then this obvious needs runtime access. But maybe that doesn't need a proposal?
The argument for adding std API for assistance interning strings is given explicitly in my comment above. See "The argument for adding API for the easy case is...". However, I can see that this is not particularly compelling, and a third party package would be fine. Maybe I'll write one.
The hard case above needs runtime support for the same reason sync.Pool does: the ability to respond to GC needs and not grow forever. (sync.Pool also benefits from per-P storage, which interning would as well, but that seems less critical.) Maybe we can address that independently, which would benefit everyone. I'll write up a new issue to discuss that, although I anticipate that there will be tricky design issues.

rsc · 2019-01-23T19:07:19Z

It still seems too specialized. You can do interning with a plain map[string]string today, but then you have to know when to drop the map. It sounds like this API contributes "you don't have to think about when to drop the map". But then the runtime does. That's a lot of work and it's unclear that we know how to do it well. We're not even managing sync.Pool that well. This is much more complicated, since it essentially depends on weak reference logic to decide whether a string is needed or not. I'm skeptical we should spend the time to get this right versus other work we could be doing..

Is it really worth the cost of a general solution here, versus people who have this need just using a map[string]string?

Brad suggests maybe there's some way to "just mark these intern-string pools last" in order to automatically evict strings from the pool once all the references are gone, but it's not obvious to me how to implement that with the distributed mark termination detection and lazy sweeping. It seems like possibly adding a lot of complexity that would otherwise go unused.

/cc @RLH @aclements

josharian · 2019-01-23T19:22:28Z

Re-reading this, in an ideal world, we'd be able to build something sophisticated and sync.Pool-like as a third party package. I tried this, using the finalizer hack from @CAFxX to detect GC cycles. The performance was an order of magnitude worse than my sync.Pool-based implementation, due to mutex overhead; we need #18802.

I'm going to close this proposal.

For the simple case, it is easy enough to build yourself or have as a third party package.

For the harder case, I think we should provide the tools for people to build this themselves, well: #18802 and #29696.

As for doing it automatically in the runtime, I guess we'll just decide on a CL by CL basis.

bradfitz added accepted Performance GarbageCollector labels Dec 4, 2013

rsc added this to the Unplanned milestone Apr 10, 2015

rsc removed release-none labels Apr 10, 2015

douglas-reid mentioned this issue Dec 22, 2016

Generic string interning package to help memory & perf. istio/old_mixer_repo#96

Merged

yulvil mentioned this issue Nov 13, 2017

Reduce the number of allocations to reduce garbage collection elliotchance/c2go#348

Open

CAFxX mentioned this issue Sep 28, 2018

Optionally intern strings during unmarshaling mailru/easyjson#191

Closed

josharian added the NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. label Nov 10, 2018

josharian modified the milestones: Unplanned, Go1.13 Nov 10, 2018

josharian changed the title ~~runtime: optionally allow callers to intern strings~~ proposal: runtime: optionally allow callers to intern strings Dec 14, 2018

josharian mentioned this issue Jan 11, 2019

proposal: runtime: add way for applications to respond to GC backpressure #29696

Open

josharian closed this as completed Jan 23, 2019

josharian mentioned this issue Jun 13, 2019

proposal: strings: a string factory to convert bytes to string #32593

Closed

mvdan mentioned this issue Jun 26, 2019

encoding/json: memoize strings during decode #32779

Open

golang locked and limited conversation to collaborators Jan 23, 2020

gopherbot added the FrozenDueToAge label Jan 23, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

proposal: runtime: optionally allow callers to intern strings #5160

proposal: runtime: optionally allow callers to intern strings #5160

bradfitz commented Mar 29, 2013

bradfitz commented Mar 29, 2013

ianlancetaylor commented Mar 30, 2013

bradfitz commented Mar 30, 2013

dvyukov commented Mar 31, 2013

bradfitz commented Mar 31, 2013

bradfitz commented Apr 3, 2013

bradfitz commented Apr 3, 2013

bradfitz commented Apr 26, 2013

rsc commented Nov 27, 2013

rsc commented Dec 4, 2013

rsc commented Dec 4, 2013

josharian commented Feb 23, 2015

CAFxX commented Sep 27, 2018 •

edited

Loading

josharian commented Oct 9, 2018

CAFxX commented Oct 9, 2018 •

edited

Loading

gopherbot commented Oct 13, 2018

josharian commented Oct 13, 2018

CAFxX commented Oct 13, 2018 •

edited

Loading

josharian commented Oct 13, 2018

josharian commented Oct 14, 2018

CAFxX commented Oct 15, 2018 •

edited

Loading

CAFxX commented Oct 16, 2018 •

edited

Loading

josharian commented Dec 14, 2018

rsc commented Dec 19, 2018

rsc commented Dec 19, 2018

josharian commented Dec 26, 2018

rsc commented Jan 9, 2019

josharian commented Jan 11, 2019

rsc commented Jan 23, 2019

josharian commented Jan 23, 2019

proposal: runtime: optionally allow callers to intern strings #5160

proposal: runtime: optionally allow callers to intern strings #5160

Comments

bradfitz commented Mar 29, 2013

bradfitz commented Mar 29, 2013

ianlancetaylor commented Mar 30, 2013

bradfitz commented Mar 30, 2013

dvyukov commented Mar 31, 2013

bradfitz commented Mar 31, 2013

bradfitz commented Apr 3, 2013

bradfitz commented Apr 3, 2013

bradfitz commented Apr 26, 2013

rsc commented Nov 27, 2013

rsc commented Dec 4, 2013

rsc commented Dec 4, 2013

josharian commented Feb 23, 2015

CAFxX commented Sep 27, 2018 • edited Loading

josharian commented Oct 9, 2018

CAFxX commented Oct 9, 2018 • edited Loading

gopherbot commented Oct 13, 2018

josharian commented Oct 13, 2018

CAFxX commented Oct 13, 2018 • edited Loading

josharian commented Oct 13, 2018

josharian commented Oct 14, 2018

CAFxX commented Oct 15, 2018 • edited Loading

CAFxX commented Oct 16, 2018 • edited Loading

josharian commented Dec 14, 2018

rsc commented Dec 19, 2018

rsc commented Dec 19, 2018

josharian commented Dec 26, 2018

rsc commented Jan 9, 2019

josharian commented Jan 11, 2019

rsc commented Jan 23, 2019

josharian commented Jan 23, 2019

CAFxX commented Sep 27, 2018 •

edited

Loading

CAFxX commented Oct 9, 2018 •

edited

Loading

CAFxX commented Oct 13, 2018 •

edited

Loading

CAFxX commented Oct 15, 2018 •

edited

Loading

CAFxX commented Oct 16, 2018 •

edited

Loading