When using the encoding/json package, if I encode a struct that contains a large list of names and then decode it into a smaller struct that omits that list, it will still allocate memory for the list of names, even though it just gets thrown away. This becomes an annoyance when I have several multigigabyte JSON files like this. It would be neat if the JSON parser could identify what fields it cares about, or somehow be told what fields to ignore, and chuck them.
I don't believe this is true. Specifically, I don't believe it allocates any memory for fields being discarded. If you think it does, please explain why you think that or point to the allocation. Thanks.
I wrote a small test that writes a json file of Data and then (as a new process) reads it into SmallData and prints runtime.MemStats.TotalAlloc: http://play.golang.org/p/5CB3FUL86m
I ran it with --make=10 through --make=1e8, stepping by powers of ten.
The resulting plot indicates that the larger the JSON file, the more memory is consumed reading into SmallData. It's not obvious, but each data point is actually a collection of 3 runs; the variance between runs was very small.
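For readers who cannot open the playground link, here is a rough in-process sketch of the same kind of measurement. It is not the original test: the field names `ID` and `Names` and the helper `allocatedWhileDecoding` are hypothetical, and it measures the change in `runtime.MemStats.TotalAlloc` around a single `Decode` call instead of starting a new process.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"runtime"
)

// Data and SmallData are stand-ins for the structs discussed in the thread.
// The field names are assumptions; the original definitions are not shown.
type Data struct {
	ID    int
	Names []string
}

type SmallData struct {
	ID int
}

// allocatedWhileDecoding encodes a Data value holding n names, decodes the
// result into a SmallData, and returns how many bytes were allocated
// between the start and the end of the decode step.
func allocatedWhileDecoding(n int) (uint64, error) {
	names := make([]string, n)
	for i := range names {
		names[i] = "name"
	}
	input, err := json.Marshal(Data{ID: 1, Names: names})
	if err != nil {
		return 0, err
	}

	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	var small SmallData
	err = json.NewDecoder(bytes.NewReader(input)).Decode(&small)
	runtime.ReadMemStats(&after)

	// TotalAlloc is cumulative, so the difference covers the decode only.
	return after.TotalAlloc - before.TotalAlloc, err
}

func main() {
	for _, n := range []int{10, 1000, 100000} {
		got, err := allocatedWhileDecoding(n)
		if err != nil {
			fmt.Println("decode failed:", err)
			return
		}
		fmt.Printf("names=%d allocated=%d bytes\n", n, got)
	}
}
```

The absolute numbers depend on the Go version and platform; only the trend across input sizes is meaningful.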
The title says "Decoder internally buffers full input" but it might be better phrased as "Decoder buffers an entire value at a time".
We introduced Decoder.Token last cycle so it is technically possible for the user to use that and avoid buffering a whole value at once. Admittedly that would take a bunch of code.
It would also be possible for the decoder to stop decoding a whole value at once and instead read from the stream into the target structure incrementally. That would be a big refactoring of the decoder. Is that what this bug requires, or is there some simpler option I'm overlooking?
The new Decoder.Token should let people build incremental parsers customized to a particular use case. We cannot change the default behavior: right now if encoding/json consumes a very large but ultimately malformed JSON value, nothing is written to the destination. Incremental decoding would change those semantics by writing to the destination before realizing the value was malformed.
It might be possible to have a different opt-in mode in the Decoder, but certainly not at this point in the Go 1.7 cycle.