
encoding/json: memoize strings during decode #32779

Open
rsc opened this issue Jun 25, 2019 · 50 comments · May be fixed by #33416

Comments

@rsc (Contributor) commented Jun 25, 2019

Part of the motivation for #32593 is the observation in json-iterator/go#376 that when you json decode, many of the same strings appear over and over in the JSON and get copied and reconstructed in the result over and over as well.

We do not want to make the JSON decoder coalesce all the allocated strings into one giant buffer, because then holding on to just one of them keeps the entire buffer alive.

But it might make sense to add to the decoder state a simple map[[]byte]string and remember which specific byte slices we've already converted to string and reuse those allocated strings instead of doing the same conversions again in a particular Decode or Unmarshal operation.

It's unclear to me whether the map in a Decoder should persist between Decode operations. Probably? That will affect users who put Decoders in a pool or something like that, though. (Solution: don't put decoders in pools.)
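As a sketch of what this could look like (illustrative only; decodeState and the field name are assumptions, not the actual encoding/json internals): Go has no map[[]byte]string, so the memo table would be keyed by string, relying on the compiler's optimization that a m[string(b)] lookup does not allocate:

```go
// Sketch of per-decode string memoization; not the real decoder state.
type decodeState struct {
	stringMemo map[string]string
}

// literal converts b to a string, reusing an earlier allocation when
// the same bytes were already seen in this decode.
func (d *decodeState) literal(b []byte) string {
	if s, ok := d.stringMemo[string(b)]; ok { // lookup key does not allocate
		return s
	}
	s := string(b) // allocate once; later occurrences reuse this copy
	if d.stringMemo == nil {
		d.stringMemo = make(map[string]string)
	}
	d.stringMemo[s] = s
	return s
}
```

Whether the map should be cleared between Decode calls is exactly the open question raised here.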

@bserdar commented Jun 25, 2019

A while ago I wrote some code to test this exact thing with the json decoder, but I could not detect any significant difference in memory usage between a decoder with memoized strings and the regular json decoder, which made me think maybe this was already done somewhere. Apparently not. I can try to dig up my test code...

@bserdar commented Jun 25, 2019

Here's a link to my test code that does this for decoding json into interface{}:

https://github.com/bserdar/jsdecodertest

My initial comment was wrong: it does make a difference to store strings in a map. My test code stores only field keys in a map, because those are the strings most likely to repeat.

The json/ package is a copy of the stdlib encoding/json package modified to include a map[string]string in the decoder state.

@martisch (Contributor) commented Jun 26, 2019

Nit: I assume the proposal meant to use a map[string]string with a map[string(b)] lookup, since map[[]byte]string is not a supported map type ([]byte is not comparable).

@mvdan (Member) commented Jun 26, 2019

Also cc @josharian, who had a very similar idea a while ago. That was briefly discussed in #5160.

As a thought, we probably don't want to reuse all strings. For example, in a map the keys probably tend to repeat much more often than the values.

Also, do we want to make this configurable? Someone who writes a program likely knows whether they'll read somewhat static/repeated data or not.

bserdar added a commit to bserdar/go that referenced this issue Aug 1, 2019
The existing decoder creates a new string for every key.
With this change, decoder keeps keys in a map[string]string
and uses a single copy for repeated keys.

Fixes golang#32779
@bserdar bserdar linked a pull request that will close this issue Aug 1, 2019

@gopherbot commented Aug 1, 2019

Change https://golang.org/cl/188500 mentions this issue: encoding/json: memoize keys during decode

@rsc (Contributor, author) commented Aug 6, 2019

@mvdan, are you concerned about the overhead of the map lookup in your Jun 26 comment? From the outside there can be no possible downside to reusing all strings that are constructed.

bserdar added a commit to bserdar/go that referenced this issue Aug 15, 2019

@bserdar commented Aug 27, 2019

The CL mentioned above has an implementation of this for the JSON decoder. It introduces some overhead when decoding into an interface{}, because that's the only case where the keys read during decoding are kept after decoding is done.

However, I think there is a better way to do this. What I propose is to write an encoding/LiteralStore type (a map[string]string) with a Get(string) string method that retrieves the shared instance of a string. Then this can be used from the decoders in encoding/xml and encoding/json. To use the LiteralStore, the caller assigns an instance to the JSON/XML decoder. This way, the caller has the option to use a literal store for large input where it makes a difference, use the same store across multiple documents, or avoid it completely.

What do you think?
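For concreteness, the proposed type might look like this (a hypothetical sketch; the name, package placement, and Get signature are all taken from the comment above, not an accepted API):

```go
// LiteralStore returns a single shared instance for each distinct
// string passed to Get. Hypothetical sketch of the proposal above.
type LiteralStore map[string]string

func (ls LiteralStore) Get(s string) string {
	if shared, ok := ls[s]; ok {
		return shared
	}
	ls[s] = s
	return s
}
```

A caller would construct one store and attach it to any number of JSON/XML decoders, sharing key strings across documents.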

@bcmills (Member) commented Aug 28, 2019

This is one of the conventional use-cases for weak references in GC'd languages.

Ideally, we should reuse strings that remain reachable from the last Decode invocation, and let the collector clean up any strings that have since become completely unreachable. That way, the frequently-used strings will remain in the map, but the infrequent strings can still be collected (so that the map doesn't grow without bound if the same Decoder is used on many different inputs over time).

@bcmills (Member) commented Aug 28, 2019

Alternatives that do not require weak references include:

  • Make the interning behavior an explicit opt-in using a Decoder method (perhaps func (*Decoder) InternStrings(func([]byte)bool)).
  • Make the interning behavior an explicit annotation on the struct field tag.
    • But this gets awkward to describe the behavior for map keys vs. map values.

@bserdar commented Aug 28, 2019

@bcmills, I propose to make the interning behavior explicit opt-in using a Decoder field instead of a method. After reading #32593, I believe one way to make this acceptable in terms of performance and usage is this:

  • Add a strings/LiteralStore type (a map[string]string), with an Add(string) string method that will return a shared copy for string
  • Add Decoder.KeyLiterals field of type LiteralStore to both json and xml decoders.
  • Use the KeyLiterals if it is non-nil during decoding to store keys

This implementation does not change the existing behavior and makes the use of a literal store explicit opt-in.

A literal store like this is only useful for larger json/xml input, and only if it is manually decoded, or unmarshaled into an interface{}. If the input is unmarshaled/decoded into a struct, then there is no need to keep the keys as they will be lost.

Re: weak references, afaik there is no way to implement this using finalizers, am I wrong?

@ianlancetaylor (Contributor) commented Aug 28, 2019

@bserdar I don't think there is any way to do this with finalizers because existing users will be using string values, not pointers, so there is nothing to hang a finalizer on. For that matter weak references wouldn't work either: you would have to change all the users to use the weak reference type rather than string.

In general finalizers and weak references are equivalent in power, so anything you can do with one you can do with the other. But you have to be able to control the types that people will use.

@bserdar commented Aug 28, 2019

@ianlancetaylor thanks for the clarification.

If there is interest in this, I can modify https://golang.org/cl/188500 to implement this. I already have some wrapper code around json/xml decoder to do this because reusing just the key strings makes a lot of difference for large docs.

cc @mvdan

@bcmills (Member) commented Aug 29, 2019

For that matter weak references wouldn't work either: you would have to change all the users to use the weak reference type rather than string.

No? We would need some sort of language-supported “weak string” type, but then only the json.Decoder would hold a weak reference — the decoded messages would use a regular string so that the data doesn't disappear out from beneath them.

(But that assumes a language change that seems pretty unlikely to ever happen.)

@bcmills (Member) commented Aug 29, 2019

In general finalizers and weak references are equivalent in power, so anything you can do with one you can do with the other. But you have to be able to control the types that people will use.

Ah, I think I see what you're saying. If we could control the types in use, then we could add a pointer indirection, and require users to keep the pointer alive in order to retain the string value in the decoder's cache.

But then you can't use those pointers in the same way as ordinary strings: they can't be map keys, they won't work with the normal built-in string operations, and they can't be passed to the strings package without falling back to manual memory management (via runtime.KeepAlive).

It may be true that they are technically equivalent in power, but they are not equivalently usable.

@bcmills (Member) commented Aug 29, 2019

@bserdar, strings.LiteralStore seems unfortunate given that we're also discussing adding generics to the language, because there is a much more general “interning table” API that can be used with any type that supports hashing and equality.

The general form could have an API like:

package intern

type Table(type T HashEqualer) […]

func (t Table(T)) Get(x T) (canonicalX T) {
	[…]
}

Such an API is also useful for building things like abstract syntax trees, where it can be useful to compress trees by coalescing occurrences of a common subexpression to the same canonical instance.

(A long time ago I maintained a server that evaluated large numbers of boolean expressions for ad targeting, and it used a similar approach — to great effect — to compress the targeting expressions, including in our wire protocol.)

@bserdar commented Aug 29, 2019

@bcmills, interning table would be nice, but it won't be available any time soon. Until that becomes available, we can keep interning internal to the Decoder, but expose an InternKeys(bool) method for explicit opt-in. Or, implement the strings.LiteralStore, and change it to use the intern package when generics are done.

bserdar added a commit to bserdar/go that referenced this issue Sep 7, 2019
The existing decoder creates a new string for every key.
For large JSON input where keys are repeated, we can avoid
this by interning keys. This change adds a Decoder.InternKeys
function that causes the decoder to store keys in a
map[string]string and uses a single copy for repeated keys.
The default behavior is not to intern keys.

This change only affects Decoder when decoding to an interface{}
or when tokenizing the input. It does not affect Unmarshal or
decoding to a struct.

Fixes golang#32779
bserdar added a commit to bserdar/go that referenced this issue Sep 11, 2019

@bserdar commented Oct 2, 2019

xml.Decoder can benefit from interning strings as well. What is the opinion about that? Decoder.InternKeys() would be the explicit opt-in for interning keys during decode. Something similar to this: https://go-review.googlesource.com/c/go/+/188500

@rsc rsc modified the milestones: Go1.14, Backlog Oct 9, 2019

@rsc (Contributor, author) commented Apr 1, 2020

The proposal is to only memoize in a single json.Decoder, not globally.
If you want to avoid the memoization, you could use multiple json.Decoders.
Note that each call to json.Unmarshal uses its own Decoder, so all those would be independent.
(If you decoded an object list into a []map[string]int, then all the map keys in one decoding could be shared, but not across decodings.)

Another possibility is to do the memoization by default but allow a hook in the Decoder (a method SetSaveString(func([]byte) string) or something like that) that would let you override the policy to have a global pool, or a pool only for small strings.

But it seems like if there's a win-win here, then we should take that win by default.

Does anyone object to marking this accepted assuming that there is a win-win?
Obviously if the benchmarks say there isn't a win then we wouldn't take the CL.
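The hook floated above might take this shape (purely illustrative; neither the field nor the methods exist in encoding/json):

```go
// Hypothetical Decoder hook for overriding string-construction policy.
type Decoder struct {
	saveString func([]byte) string
}

// SetSaveString installs a custom []byte-to-string policy, e.g. a
// global pool or a pool restricted to small strings.
func (d *Decoder) SetSaveString(save func([]byte) string) {
	d.saveString = save
}

// makeString is what the decoding loop would call for each string.
func (d *Decoder) makeString(b []byte) string {
	if d.saveString != nil {
		return d.saveString(b)
	}
	return string(b) // default: plain conversion
}
```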

@rsc (Contributor, author) commented Apr 1, 2020

I think this still seems like a likely accept, but I'm leaving it for another week to make sure I'm reading the thread right.

@mvdan (Member) commented Apr 5, 2020

If you want to avoid the memoization, you could use multiple json.Decoders.

My point is precisely that a program keeping a long-lived json.Decoder isn't doing anything wrong, but could silently gain a "memory leak" of sorts when upgrading its Go version. That's what I tried to explain in my earlier comment. I guess the changelog could tell them to redo their code to not reuse a decoder forever, but that seems a bit weird.

Also, even though this proposal is marked as a likely accept, I'm still not clear on this detail which I think is pretty important:

It's unclear to me if the current proposal is to memoize all strings, or just keys when decoding into map[string]T.

I don't object to the proposal being accepted per se, but I don't support it either, given that for now it's just a "maybe, we'll see how the CL turns out" :)

@bserdar commented Apr 5, 2020

Would adding an API to json.Decoder to disable (or enable) interning help? There's already a CL (though quite old) with optional interning.

@CAFxX (Contributor) commented Apr 8, 2020

Just as a data point, I actually included an experience report of when we did this via josharian/intern in #5160 (comment). (At the time it was just about reducing memory usage and not memory allocations, since allocations happened during unmarshaling, i.e. before we could dedup the strings.)

In the same comment I actually proposed as well to make it configurable on a per-field basis using the json field tag, so +1 from me for making it configurable:

type Collection struct {
  Objects []Object `json:"objects"`
}

type Object struct {
  GUID string `json:"guid"`
  ParentGUID string `json:"parent_guid,intern"` // perform interning during unmarshaling
  // ...
}

FWIW I'm implementing proper support for this - with the syntax above - in mailru/easyjson#279.

@rsc (Contributor, author) commented Apr 8, 2020

No change in consensus here, so accepting.

Let's start with the simple per-Decoder interning on by default and go from there.
But let's plan to land this at the start of a cycle, so not for Go 1.15.
That will give time to soak and maybe discover reasons it's a bad idea.

@rsc rsc moved this from Likely Accept to Accepted in Proposals Apr 8, 2020

@bserdar commented Apr 8, 2020

I can rebase this old CL unless there are other plans: https://go-review.googlesource.com/c/go/+/188500

@rsc rsc modified the milestones: Proposal, Backlog Apr 8, 2020
@rsc rsc changed the title proposal: encoding/json: memoize strings during decode? encoding/json: memoize strings during decode Apr 8, 2020

@mvdan (Member) commented Apr 9, 2020

@rsc I think we should answer the two questions I raised again in #32779 (comment) before getting into a CL. To repeat my current understanding:

  1. Would this only affect map keys, or all strings? This is unclear, and IMO should be decided before we jump to an implementation.

  2. The change, as proposed right now, would silently introduce memory leaks in programs that currently keep a Decoder alive for a long time. You say:

If you want to avoid the memoization, you could use multiple json.Decoders.

In my opinion, this is a subtle and dangerous breaking change. Even if we strongly marked this change in behavior in the release notes, discouraging the use of global or long-lived decoders, it still seems like a change that could bring pain to our users since it's easy not to notice a leak until a service runs out of memory a week later.

@frioux commented Apr 10, 2020

Maybe this is a dumb question, but would the strings being memoized come from the types, or from strings that just flew by? If it's the types themselves I would be comfortable running this in prod, but if it's strings that happened to be passed from a user (like the values in a map or in a list, or "unknown" keys to a struct), this sounds like a possible DoS.

I don't know how you could keep a decoder around for a long time in a web context, other than maybe web sockets?

@bcmills (Member) commented Apr 10, 2020

@mvdan, what if we added some sort of decay behavior or LRU caching, so that infrequently-used strings would expire instead of being memoized forever?

That would at least prevent unbounded memory leaks for infrequent keys, while still providing most of the memory reduction of memoization.

@CAFxX (Contributor) commented Apr 12, 2020

@mvdan, what if we added some sort of decay behavior or LRU caching, so that infrequently-used strings would expire instead of being memoized forever?

That would at least prevent unbounded memory leaks for infrequent keys, while still providing most of the memory reduction of memoization.

Agreed, this is similar in spirit to what you get with josharian/intern, and it worked well in practice for us. Worth noting that we were interning only the field values, not the field keys (we didn't have maps in the decoded struct, but we had quite a few string fields whose values were likely to repeat).

Would this only affect map keys, or all strings?

Given what I wrote above, I think we should do the latter. If the worry is keeping too many strings alive, it should be easy enough to cap the size of the interning map. Even just random replacement would likely be enough, and is trivial to implement; since the current proposal is an interning map per Decoder, a better eviction policy will likely not make much of a difference, as multiple Decoders will allocate the same strings anyway.
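A size-capped interning map with random replacement, as suggested, could be as small as this (the cap and names are assumptions; eviction leans on Go's randomized map iteration order):

```go
// cappedIntern interns strings but never holds more than maxInterned
// entries; on overflow it evicts an arbitrary entry. Sketch only.
const maxInterned = 1024

type cappedIntern map[string]string

func (m cappedIntern) intern(s string) string {
	if shared, ok := m[s]; ok {
		return shared
	}
	if len(m) >= maxInterned {
		for k := range m { // map iteration order is randomized,
			delete(m, k) // so this is (roughly) random replacement
			break
		}
	}
	m[s] = s
	return s
}
```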

@mvdan (Member) commented Apr 20, 2020

Some sort of LRU, or mechanism to limit the amount of memory we hold onto at once, seems like an OK tradeoff to me. I guess we can work this out on the CL itself, but it seemed clearly backwards incompatible to me to make this entirely unbounded by design.

@dsnet (Member) commented Nov 21, 2020

I've been experimenting with several possible implementations of this.

Here are some of my thoughts:

  • As a string gets longer, the probability of a cache hit drops and the cache lookup cost grows. This makes sense since longer strings have an exponentially larger number of possible representations and also take longer to compare or hash. Thus, we should only cache small strings (e.g., <= 16B).
  • The Go memory allocator is already fairly fast at allocating small strings. For example, a 16B string takes 35ns on my machine. Granted, this is the up-front CPU cost, and doesn't include the asynchronous costs associated with GC heap scanning and memory freeing. Any cache lookup cost should be comparable to or faster than 35ns.
  • For most realistic JSON data, strings in JSON object names have a high degree of locality. That is, seeing a specific object name implies a high probability of seeing it again within the next few dozen object names. This is especially true for a JSON array containing homogeneous JSON objects, but also holds for tree-like representations of JSON objects where each node has a similar set of fields. Thus, a small cache can be highly beneficial for JSON object names.
  • For most realistic JSON data, identical strings in general JSON values exhibit poor locality. Repeated strings in data tend to occur either for enum-like values (e.g., "SUCCESS") or for what are functionally "pointers" that create a graph out of JSON data (e.g., an ID reference to some other part of the JSON data). In order to detect repeated usages, we would need a relatively large cache. Also, since most strings will be a cache miss, there is the wasted cost of looking up a string in the common case where there is no benefit.
  • A naive map[string]string is not a good data structure for a string cache, since it 1) is hard to keep from growing without bound, 2) doubles memory use (one slot for the key and one for the value, even though they hold the same string), and 3) holds a reference to a string value that may no longer be used by the application.

Here's a simple data structure that I tried that does seem sufficiently effective to use by default:

type stringCache [64]string

func (c *stringCache) make(b []byte) string {
	if len(b) < 2 || len(b) > 16 {
		return string(b)
	}
	h := hashBytes(b) % len(*c)
	if s := (*c)[h]; s == string(b) {
		return s
	}
	s := string(b)
	(*c)[h] = s
	return s
}

It's a small, fixed-size cache. Thus it avoids any concerns about unbounded memory growth or overtly wasting too much space holding onto old, unused strings. At worst, the memory usage for this is len(stringCache)*(unsafe.Sizeof(string) + maxStringSize) where maxStringSize is the maximum string length we try caching (16B in the above implementation). The effectiveness of this is highly dependent on the speed of hashBytes. The size of stringCache should be determined based on how often we expect to see the same strings repeat themselves and according to the birthday problem.
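hashBytes is left abstract in the snippet above (the benchmarks below use runtime.memhash). For experimentation, a portable stand-in — an assumption, not what was actually benchmarked — is FNV-1a folded to a non-negative int so the % len(*c) reduction above works directly:

```go
// hashBytes is a portable FNV-1a stand-in for runtime.memhash.
// It returns a non-negative int so it can be reduced with % len(cache).
func hashBytes(b []byte) int {
	const prime64 = 1099511628211
	h := uint64(14695981039346656037) // FNV-64 offset basis
	for _, c := range b {
		h ^= uint64(c)
		h *= prime64
	}
	return int(h % (1 << 31)) // fits in int on 32- and 64-bit platforms
}
```

A real implementation would want something faster and seeded, which is exactly why the author reached for runtime.memhash.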

Using runtime.memhash as the implementation (see #42710) of hashBytes, I get the following numbers:

BenchmarkMalloc/FrequentNames    	48031	 25150 ns/op	 9600 B/op	1000 allocs/op
BenchmarkCache/FrequentNames     	66622	 18019 ns/op	   96 B/op	  10 allocs/op // 0.7x the time of Malloc (faster)
BenchmarkMap/FrequentNames       	40252	 31818 ns/op	  678 B/op	  11 allocs/op // 1.3x slower than Malloc

BenchmarkMalloc/PathalogicalNames	76096	 15348 ns/op	 8000 B/op	 500 allocs/op
BenchmarkCache/PathalogicalNames 	48894	 24289 ns/op	 8000 B/op	 500 allocs/op // 1.6x slower than Malloc
BenchmarkMap/PathalogicalNames   	10000	123864 ns/op	91250 B/op	 524 allocs/op // 8.1x slower than Malloc

This tests three different approaches:

  • Malloc is using a naive []byte to string conversion.
  • Cache is using stringCache.
  • Map is using a map[string]string as the cache.

This tests two extreme sets of data:

  • FrequentNames represents the best-case scenario where the same 10 strings occur repeatedly.
  • Pathological represents the worst-case scenario where every string is exactly 16B long, but all different, resulting in a 100% cache miss. This is pathological for several reasons: 1) every entry in the cache will be populated, 2) the string in every entry is also 16B, so we must do a byte-for-byte comparison, 3) only the last few bytes of the string differ, 4) every entry results in a miss. Thus, for every string we do O(n) work to hash the string, and O(n) work to compare the string (where n is the string length).

Some notes:

  • A map[string]string rarely out-performs malloc and seems to be a net loss.
  • Assuming that most data look more like FrequentNames than PathologicalNames, the use of stringCache is probably a net win.
  • At least the pathological case for stringCache is only 1.6x slower (versus a surprising 8x slower for map[string]string).

@josharian (Contributor) commented Nov 21, 2020

@dsnet nice analysis. One more thing to try: use a fixed size array of strings, iterate instead of hashing for lookups, and do random replacement. Something like josharian@ab074c9. (There are many variations on this theme available.)

@randall77 (Contributor) commented Nov 21, 2020

Move-to-front is also a good caching scheme when there are a few very common keys.
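A move-to-front variant of the fixed-size linear cache could look like this (a sketch; the capacity of 64 and the method shape mirror the earlier stringCache example, not any proposed API):

```go
// mtfCache is a fixed-capacity cache searched linearly; hits are moved
// to the front so frequent keys are found in the first few comparisons.
type mtfCache struct {
	entries []string
}

func (c *mtfCache) make(b []byte) string {
	for i, s := range c.entries {
		if s == string(b) { // comparison does not allocate
			copy(c.entries[1:i+1], c.entries[:i]) // shift earlier entries back
			c.entries[0] = s                      // move hit to front
			return s
		}
	}
	s := string(b)
	if len(c.entries) < 64 {
		c.entries = append(c.entries, "")
	}
	copy(c.entries[1:], c.entries) // evict the least-recently-used tail
	c.entries[0] = s
	return s
}
```

The trade-off discussed below applies: misses cost a full scan, but the front of the list self-organizes around the hottest keys.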

@bserdar commented Nov 23, 2020

@dsnet (Member) commented Nov 23, 2020

@josharian and @randall77: I tried a simple linear search in some of my earlier approaches, and it does decently when there is a hit. However, it performed poorly in the cases where there were never any matches, since the miss cost is proportional to the maximum number of elements we linear-search over.

One advantage of a linear search (with move-to-front) is that it provides a heuristic for which entry to evict. This avoids the collision problem that my naive hashing approach exhibits (due to the birthday problem). In order to have a decent hit rate on a list of strings with only N unique entries, the cache size needs to be approximately 2*N.

@randall77 (Contributor) commented Nov 23, 2020

We could consider any of the frequent-item algorithms described here: http://dimacs.rutgers.edu/~graham/pubs/papers/freq.pdf

It might be worth dividing the set of keys up into buckets, hashing to the bucket, and performing one of the above algorithms on a single bucket.

@josharian (Contributor) commented Nov 23, 2020

...with the cheapest possible “hash” being string length, since we are only considering lengths 2..16.

@dsnet (Member) commented Nov 30, 2020

@randall77 and @josharian. Thanks for the idea. I haven't yet applied any of the specific algorithms in the paper you referenced, but I suspect they won't perform well in this specific application since allocation of small strings is already so dang fast that any book-keeping is often just as expensive as malloc itself (defeating the point of string interning).

I incorporated some of the ideas above and some new ideas:

  • Using the data provided by @bserdar, I computed a histogram of the string lengths of all unique object names:
 1 |> 0
 2 |=> 1
 3 |======> 6
 4 |==========> 10
 5 |===========> 11
 6 |====================> 20
 7 |==============> 14
 8 |================> 16
 9 |===============> 15
10 |=========> 9
11 |========> 8
12 |========> 8
13 |=====> 5
14 |=======> 7
15 |=======> 7
16 |===> 3
  • Combining the above data with the idea to bucketize based on the length, it seems that 8 buckets using the lowest 3 bits of the length might provide an okay uniform distribution across all buckets (i.e., we bucketize 8B and 16B together, 1B and 9B together, 2B and 10B together, etc). This produces a histogram as follows:
 1, 9 |===============> 15
 2,10 |==========> 10
 3,11 |==============> 14
 4,12 |==================> 18
 5,13 |================> 16
 6,14 |===========================> 27
 7,15 |=====================> 21
 8,16 |===================> 19

Bucketizing based on length has the advantage of isolating the effects of a given string length from affecting another bucket. However, it has the disadvantage that certain lengths may be under-utilized (i.e., the 2B/10B bucket for the specific data I'm playing with), leading to wasted space.

  • I also observed that most general-purpose hashing algorithms perform slowly for small strings since 1) they are often not inlineable, 2) may not benefit from any wide-integer instructions (at least from my observation). Thus, I tried performing a hash of a short string by relying on wide-integer loads and hashing the resulting integers.

The current algorithm I came up with is:

import "encoding/binary"

type stringCache [256]string

func (c *stringCache) make(b []byte) string {
	if len(b) < 2 || len(b) > 16 {
		return string(b)
	}

	var h uint64
	{
		// Read the string in two parts using wide-integer loads.
		// The prefix and suffix may overlap, which is fine.
		switch {
		case len(b) >= 8:
			h ^= uint64(binary.LittleEndian.Uint64(b[:8]))
			h *= 0x00000100000001B3 // inspired by FNV-64
			h ^= uint64(binary.LittleEndian.Uint64(b[len(b)-8:]))
		case len(b) >= 4:
			h ^= uint64(binary.LittleEndian.Uint32(b[:4]))
			h *= 0x01000193 // inspired by FNV-32
			h ^= uint64(binary.LittleEndian.Uint32(b[len(b)-4:]))
		default:
			h ^= uint64(binary.LittleEndian.Uint16(b[:2]))
			h *= 0x010f // inspired by hypothetical FNV-16
			h ^= uint64(binary.LittleEndian.Uint16(b[len(b)-2:]))
		}

		// Collapse a 64-bit, 32-bit, or 16-bit hash into an 8-bit hash.
		h ^= h >> 32
		h ^= h >> 16
		h ^= h >> 8

		// The cache has 8 buckets based on the lower 3-bits of the length.
		h = ((h << 3) | uint64(len(b)&0b111)) % uint64(len(*c))
	}

	if s := (*c)[h]; s == string(b) {
		return s
	}
	s := string(b)
	(*c)[h] = s
	return s
}

Relative to my earlier algorithm, this is decently faster and results in fewer misses:

BenchmarkMalloc/Best	41395	26997 ns/op	9600 B/op	1000 allocs/op
BenchmarkCache1/Best	66477	18359 ns/op	  96 B/op	  10 allocs/op // 1.47x faster
BenchmarkCache2/Best	88276	13699 ns/op	  96 B/op	  10 allocs/op // 1.97x faster
BenchmarkMap/Best   	39099	30543 ns/op	 678 B/op	  11 allocs/op // 0.88x faster

BenchmarkMalloc/Real	2503	505818 ns/op	161456 B/op	18529 allocs/op
BenchmarkCache1/Real	2769	440636 ns/op	 28960 B/op	 2475 allocs/op	// 1.15x faster
BenchmarkCache2/Real	4138	302386 ns/op	 17712 B/op	 1139 allocs/op	// 1.67x faster
BenchmarkMap/Real   	1668	741258 ns/op	 31706 B/op	  467 allocs/op	// 0.68x faster

BenchmarkMalloc/Worst	72957	 16689 ns/op	 8000 B/op	500 allocs/op
BenchmarkCache1/Worst	46144	 26646 ns/op	 8000 B/op	500 allocs/op // 1.60x slower
BenchmarkCache2/Worst	49250	 23897 ns/op	 8000 B/op	500 allocs/op // 1.43x slower
BenchmarkMap/Worst   	10000	135081 ns/op	91233 B/op	524 allocs/op // 8.09x slower

where Cache2 is the algorithm in this post, and Cache1 is the earlier algorithm. The Real benchmark was switched to using object names from the dataset provided by @bserdar.

@gopherbot commented Dec 12, 2020

Change https://golang.org/cl/277376 mentions this issue: WIP: runtime: add string interning profile
