encoding/json: Decoder internally buffers full input #11046

Open
kurin opened this Issue Jun 3, 2015 · 8 comments

@kurin

kurin commented Jun 3, 2015

When using the JSON package, if I encode a struct like

type Data struct {
    Count int
    Names []string
}

and then decode it into

type SmallData struct {
    Count int
}

it will still allocate memory for the list of names, even though that data is just thrown away. This becomes an annoyance when I have several multigigabyte JSON files like this. It would be neat if the JSON parser could identify which fields it cares about, or somehow be told which fields to ignore, and skip them.

@bradfitz bradfitz added the Performance label Jun 3, 2015

@bradfitz bradfitz added this to the Go1.6 milestone Jun 3, 2015

@bradfitz bradfitz changed the title from Decoding JSON allocates memory for fields that aren't used. to encoding/json: decoding allocates memory for fields that aren't used. Jun 3, 2015

@rsc

Contributor

rsc commented Nov 5, 2015

I don't believe this is true. Specifically, I don't believe it allocates any memory for fields being discarded. If you think it does, please explain why you think that or point to the allocation. Thanks.

@kurin

kurin commented Nov 5, 2015

I wrote a small test that writes a json file of Data and then (as a new process) reads it into SmallData and prints runtime.MemStats.TotalAlloc: http://play.golang.org/p/5CB3FUL86m

I ran it with --make=10 to --make=1e8, stepping powers of ten.

The resulting plot is here; it indicates that the larger the JSON file, the more memory is consumed reading into SmallData. It's not obvious from the plot, but each data point is actually a collection of 3 runs; the variance between runs was very small.

@ALTree

Member

ALTree commented Nov 6, 2015

Pprof says 99% of bytes are allocated here.

@rsc

Contributor

rsc commented Nov 25, 2015

The memory here is for holding the JSON input as read in from the file, not for decoding unused fields.

@rsc rsc changed the title from encoding/json: decoding allocates memory for fields that aren't used. to encoding/json: Decoder internally buffers full input Nov 25, 2015

@rsc rsc modified the milestones: Go1.7, Go1.6 Nov 25, 2015

@cespare

Contributor

cespare commented Apr 13, 2016

The title says "Decoder internally buffers full input" but it might be better phrased as "Decoder buffers an entire value at a time".

We introduced Decoder.Token last cycle, so it is technically possible for the user to use that and avoid buffering a whole value at once. Admittedly that would take a bunch of code.

It would also be possible for the decoder to stop decoding a whole value at once and instead read from the stream into the target structure incrementally. That would be a big refactoring of the decoder. Is that what this bug requires, or is there some simpler option I'm overlooking?

@rsc rsc modified the milestones: Unplanned, Go1.7 Apr 13, 2016

@rsc

Contributor

rsc commented Apr 13, 2016

The new Decoder.Token API should let people build incremental parsers customized to a particular use case. We cannot change the default behavior: right now, if encoding/json consumes a very large but ultimately malformed JSON value, nothing is written to the destination. Incremental decoding would change those semantics by writing to the destination before discovering that the value was malformed.

It might be possible to have a different opt-in mode in the Decoder, but certainly not at this point in the Go 1.7 cycle.

@cespare

Contributor

cespare commented Apr 14, 2016

The corresponding change for an Encoder is #7872.

@ianlancetaylor

Contributor

ianlancetaylor commented Oct 26, 2017

Related to #14140 which mentions some possible API changes.
