Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: encoding/json: memoize strings during decode? #32779

Open
rsc opened this issue Jun 25, 2019 · 4 comments

Comments

Projects
None yet
5 participants
@rsc
Copy link
Contributor

commented Jun 25, 2019

Part of the motivation for #32593 is the observation in json-iterator/go#376 that when you json decode, many of the same strings appear over and over in the JSON and get copied and reconstructed in the result over and over as well.

We do not want to make the JSON coalesce all the allocated strings into one giant buffer, because then holding on to just one of them holds onto the entire thing.

But it might make sense to add to the decoder state a simple map[[]byte]string and remember which specific byte slices we've already converted to string and reuse those allocated strings instead of doing the same conversions again in a particular Decode or Unmarshal operation.

It's unclear to me whether the map in a Decoder should persist between Decode operations. Probably? That will affect users who put Decoders in a pool or something like that, though. (Solution: don't put decoders in pools.)

@bserdar

This comment has been minimized.

Copy link

commented Jun 25, 2019

A while ago I wrote some code to test this exact same thing with the json decoder, but could not detect any significant difference between the memory usage of the decoder with memoized strings and the regular json decoder, which made me think maybe this is already done somewhere? Apparently not. I can try to find out my test code...

@bserdar

This comment has been minimized.

Copy link

commented Jun 25, 2019

Here's a link to my test code that does this for decoding json into interface{}:

https://github.com/bserdar/jsdecodertest

My initial comment was wrong, it does make a difference to store strings in a map. My test code stores only field keys in a map because those are the strings that are likely to repeat.

The json/ package is a copy of the stdlib encoding/json package modified to include a map[string]string in the decoder state.

@martisch

This comment has been minimized.

Copy link
Member

commented Jun 26, 2019

Nit: I assume the proposal meant to use a map[string]string with map[string([]byte)] lookup as map[[]byte]string is not a supported map type.

@mvdan

This comment has been minimized.

Copy link
Member

commented Jun 26, 2019

Also cc @josharian, who had a very similar idea a while ago. That was briefly discussed in #5160.

As a thought, we probably don't want to reuse all strings. For example, in a map the keys probably tend to repeat much more often than the values.

Also, do we want to make this configurable? Someone who writes a program likely knows whether they'll read somewhat static/repeated data or not.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.