encoding/json: Encoder internally buffers full output #7872

Open
extemporalgenome opened this Issue Apr 26, 2014 · 21 comments

@extemporalgenome

extemporalgenome commented Apr 26, 2014

What does 'go version' print?

go version devel +2f6b9f80be36 Fri Apr 25 09:46:07 2014 -0600 linux/amd64

What steps reproduce the problem?
If possible, include a link to a program on play.golang.org.

1. Use json.NewEncoder(writer).Encode(value)

What happened?

Each call to json.Encoder.Encode uses an internal bytes.Buffer to buffer all encoded
output prior to writing any of that output.

What should have happened instead?

Output should use little or no internal buffering. Encoder should be able to efficiently
encode (and incrementally output) very large inputs.
@bradfitz

Member

bradfitz commented Apr 26, 2014

Comment 1:

Yeah, suppose it could use a bufio.Writer instead of a bytes.Buffer.

Labels changed: added release-go1.4, repo-main.

Status changed to Accepted.

@gopherbot

gopherbot commented Jun 25, 2014

Comment 2 by nicolashillegeer:

> Yeah, suppose it could use a bufio.Writer instead of a bytes.Buffer.
This could indeed save a useless copy, as also noted in this issue:
https://golang.org/issue/5683
However, there are some small things. For example, I'm using an Encoder to encode
straight into a go.net websocket. Due to the current implementation, the Encoder issues
a Write once the entire message is completed. With a bufio.Writer, there could be
several writes. The go.net WebSocket implementation takes every write as an entire
message (as far as I can see from perusing the source). My JSON could then arrive in
arbitrary pieces.
Note that what I'm doing is not necessarily the appropriate usage of go.net WebSockets,
it has websocket.JSON.Receive/Send methods to send and receive entire messages at a
time. And I should be using that. But those methods use json.Marshal/json.Unmarshal
under the hood, which cause buffer churn when used a lot (my use case is relatively
high-throughput). Ideally the go.net WebSockets package would use Encoder/Decoder with a
bytes.Buffer under the hood for the JSON.(Send|Receive) calls, reusing the buffer every
time (either using a sync.Pool for the buffers or storing them in the websocket
connection itself, since I believe those don't support concurrent writing anyway).
I would not mourn too much if my current usage of go.net WebSockets stops working.
@rsc

Contributor

rsc commented Sep 15, 2014

Comment 3:

I don't believe we can change this today. The buffering ensures that if an error is
encountered, no partial JSON is emitted. The only thing we could do is add some method
to encoder to allow people to turn it off themselves.

Labels changed: added release-go1.5, removed release-go1.4.

@bradfitz bradfitz modified the milestone: Go1.5 Dec 16, 2014

@bradfitz bradfitz removed the release-go1.5 label Dec 16, 2014

@slimsag

slimsag commented Dec 22, 2014

I was rather shocked when I discovered this, actually. If my understanding is correct, this means that JSON cannot be streamed to a file without consuming large amounts of memory.

> The only thing we could do is add some method to encoder to allow people to turn it off themselves.

I highly agree with this approach.

@bradfitz

Member

bradfitz commented Dec 22, 2014

@slimsag, I don't understand. Each call to encoding/json.Encoder.Encode produces one Write. Even if Encode wrote more eagerly to its Writer, you still have the entire argument to Encode (the whatever interface{}) in memory. So it's at worst a linear factor of memory. This isn't the difference of being able to stream or not.

Realistically if you want to stream JSON, you'd do something like:

     e := json.NewEncoder(f)
     io.WriteString(f, "[")
     first := true
     for .... {
           dataToStream := ....
           if !first {
                 io.WriteString(f, ",")
           }
           first = false
           e.Encode(dataToStream)
     }
     io.WriteString(f, "]")

... and that pattern works fine even today.

Changing the internal buffering (this bug) has nothing to do with making it possible to programmatically generate the data to stream. That would be a much more invasive API change, and not obviously better than the loop above.

ydnar added a commit to ydnar/go that referenced this issue Jan 3, 2015

@ydnar

ydnar commented Jan 3, 2015

This should satisfy the existing requirement (truncation on error) while limiting buffering, enabling streaming of structures of unbounded/unknown size: https://github.com/ydnar/go/compare/jsonio

@slimsag

slimsag commented Jan 3, 2015

@bradfitz I can see now that I misunderstood what this bug is about, my apologies!

From the title "Encoder internally buffers full output" I had incorrectly thought all calls to Encode were buffered together. I have re-read the issue and understand now that only the data written by a single call to Encode is buffered, so streaming works fine.

@rsc rsc removed accepted labels Apr 14, 2015

@bradfitz bradfitz modified the milestones: Go1.6, Go1.5 Jul 7, 2015

@gopherbot

gopherbot commented Aug 22, 2015

CL https://golang.org/cl/13818 mentions this issue.

@freeformz

Contributor

freeformz commented Aug 22, 2015

I took a stab at this: https://golang.org/cl/13818

I don't think I'm really happy with the outcome though.

Review / comments welcome.

PS: This is my first attempt to contribute.

@cespare

Contributor

cespare commented Apr 14, 2016

Based on what @rsc wrote in https://golang.org/cl/13818, what we want to do here is to provide an opt-in mechanism for an Encoder to generate output incrementally.

The corresponding change for a Decoder is #11046.

@rsc rsc modified the milestones: Go1.8, Go1.7 May 18, 2016

@quentinmit quentinmit added the NeedsFix label Oct 6, 2016

@rsc rsc added this to the Go1.9 milestone Oct 26, 2016

@rsc rsc removed this from the Go1.8 milestone Oct 26, 2016

@bradfitz

Member

bradfitz commented May 9, 2017

Too late. Didn't happen for Go 1.9.

@bradfitz bradfitz modified the milestones: Go1.10, Go1.9 May 9, 2017

@ImJasonH

ImJasonH commented Jul 20, 2017

Would an API like json.NewEncoder(w).Buffer(numBytes).Encode(foo) be acceptable? Buffer (name TBD obviously) would tell the encoder to flush to w every numBytes bytes. The default behavior would be unchanged.

Buffer would return an Encoder to enable method chaining. Encoder's current SetIndent method doesn't facilitate chaining, but it could. I don't know if it was an explicit decision in the design not to facilitate this.

A similar API change could be made for json.NewDecoder(r).Buffer(numBytes).Decode(&foo).

@andybons

Member

andybons commented Aug 25, 2017

@rsc What do you think of Jason’s approach?

@andybons

Member

andybons commented Oct 20, 2017

@ianlancetaylor for further triage/thoughts.

@ianlancetaylor

Contributor

ianlancetaylor commented Oct 20, 2017

Related to #20169.

@ianlancetaylor

Contributor

ianlancetaylor commented Oct 20, 2017

Apart from the issues discussed in #20169 I don't see a reason to set a number of bytes to accumulate. I would just have an option along the lines of SetEscapeHTML that directs the Encoder to stream output data rather than accumulate it. Then instead of writing to a bytes.Buffer, we write directly to the writer. If the caller wants to buffer at a certain size, they can use the bufio package themselves.

Note that the SetIndent support would require a streaming version of Indent, which looks feasible to me.

@ImJasonH

ImJasonH commented Oct 20, 2017

Do you have opinions about enabling SetIndent, SetEscapeHTML and the future SetBuffer to be chained?

     enc := json.NewEncoder(out)
     enc.SetEscapeHTML(true)
     enc.SetBuffer(2<<20)
     _ = enc.Encode(foo)

vs

     _ = json.NewEncoder(out).
         SetEscapeHTML(true).
         SetBuffer(2<<20).
         Encode(foo)

Do any backward-compatibility concerns arise from changing SetIndent to return a *Encoder?

@rsc

Contributor

rsc commented Oct 20, 2017

There's no chaining today; please don't add any.

@rsc rsc modified the milestones: Go1.10, Go1.11 Nov 22, 2017

@gopherbot gopherbot modified the milestones: Go1.11, Unplanned May 23, 2018

@Xeoncross

Xeoncross commented Jul 30, 2018

> The buffering ensures that if an error is encountered, no partial JSON is emitted.

I have seen multiple developers guarding against partial JSON by using a buffer pool. I myself wasn't aware Go did this. Perhaps the documentation could be updated to mention that on error, no partial encoding is written.

Other languages have made us really defensive programmers, and it's really nice to see the forethought that Go has put into its implementation (while not becoming overly abstracted and superfluous).

@tv42

tv42 commented Jul 31, 2018

Note for anyone who thinks they don't have to defend against partial writes with JSON responses: if you set Content-Type: application/json, you've already committed to success and returning JSON.

@gopherbot

gopherbot commented Sep 15, 2018

Change https://golang.org/cl/135595 mentions this issue: encoding/json: add SetBufferSize to Encoder to limit memory usage
