
encoding/json: Encoder internally buffers full output #7872

Open
extemporalgenome opened this issue Apr 26, 2014 · 25 comments


@extemporalgenome extemporalgenome commented Apr 26, 2014

What does 'go version' print?

go version devel +2f6b9f80be36 Fri Apr 25 09:46:07 2014 -0600 linux/amd64

What steps reproduce the problem?
If possible, include a link to a program on play.golang.org.

1. Use json.NewEncoder(writer).Encode(value)

What happened?

Each call to json.Encoder.Encode uses an internal bytes.Buffer to buffer all encoded
output prior to writing any of that output.

What should have happened instead?

Output should use little or no internal buffering. Encoder should be able to efficiently
encode (and incrementally output) very large inputs.

@bradfitz bradfitz commented Apr 26, 2014

Comment 1:

Yeah, suppose it could use a bufio.Writer instead of a bytes.Buffer.

Labels changed: added release-go1.4, repo-main.

Status changed to Accepted.


@gopherbot gopherbot commented Jun 25, 2014

Comment 2 by nicolashillegeer:

> Yeah, suppose it could use a bufio.Writer instead of a bytes.Buffer.

This could indeed save a useless copy, as also noted in https://golang.org/issue/5683.

However, there are some small things. For example, I'm using an Encoder to encode straight into a go.net websocket. With the current implementation, the Encoder issues a single Write once the entire message is complete. With a bufio.Writer, there could be several writes. The go.net websocket implementation treats every Write as an entire message (as far as I can see from perusing the source), so my JSON could arrive in arbitrary pieces.

Note that what I'm doing is not necessarily the appropriate use of go.net websockets; the package has websocket.JSON.Receive/Send methods to send and receive entire messages at a time, and I should be using those. But those methods use json.Marshal/json.Unmarshal under the hood, which causes buffer churn when used a lot (my use case is relatively high-throughput). Ideally the go.net websockets package would use Encoder/Decoder with a bytes.Buffer under the hood for the JSON.(Send|Receive) calls, reusing the buffer every time (either via a sync.Pool or by storing the buffer in the websocket connection itself, since I believe those don't support concurrent writing anyway).

I would not mourn too much if my current usage of go.net websockets stops working.

@rsc rsc commented Sep 15, 2014

Comment 3:

I don't believe we can change this today. The buffering ensures that if an error is
encountered, no partial JSON is emitted. The only thing we could do is add some method
to encoder to allow people to turn it off themselves.

Labels changed: added release-go1.5, removed release-go1.4.

@bradfitz bradfitz modified the milestone: Go1.5 Dec 16, 2014
@bradfitz bradfitz removed the release-go1.5 label Dec 16, 2014

@slimsag slimsag commented Dec 22, 2014

I was rather shocked when I discovered this, actually. If my understanding is correct, this means that JSON cannot be streamed to a file without consuming large amounts of memory.

The only thing we could do is add some method to encoder to allow people to turn it off themselves.

I highly agree with this approach.


@bradfitz bradfitz commented Dec 22, 2014

@slimsag, I don't understand. Each call to encoding/json.Encoder.Encode produces one Write. Even if Encode wrote more eagerly to its Writer, you still have the entire argument to Encode (the whatever interface{}) in memory. So it's at worst a linear factor of memory. This isn't the difference of being able to stream or not.

Realistically if you want to stream JSON, you'd do something like:

     e := json.NewEncoder(f)
     io.WriteString(f, "[")
     for .... {
           dataToStream := ....
           e.Encode(dataToStream)
           io.WriteString(f, ",")
     }
     io.WriteString(f, "]")

... and that pattern works fine even today.

Changing the internal buffering (this bug) has nothing to do with making it possible to programmatically generate the data to stream. That would be a much more invasive API change, and not obviously better than the loop above.

ydnar added a commit to ydnar/go that referenced this issue Jan 3, 2015

@ydnar ydnar commented Jan 3, 2015

This should satisfy the existing requirements (truncation on error) while limiting buffering, enabling streaming of structures of unbounded/unknown size: https://github.com/ydnar/go/compare/jsonio


@slimsag slimsag commented Jan 3, 2015

@bradfitz I can see now that I misunderstood what this bug is about, my apologies!

From the title "Encoder internally buffers full output" I had incorrectly thought all calls to Encode were buffered. I have re-read the issue and understand now that only the data written by a single call to Encode is buffered, so streaming works fine.

@rsc rsc removed accepted labels Apr 14, 2015
@bradfitz bradfitz modified the milestones: Go1.6, Go1.5 Jul 7, 2015

@gopherbot gopherbot commented Aug 22, 2015

CL https://golang.org/cl/13818 mentions this issue.


@freeformz freeformz commented Aug 22, 2015

I took a stab at this: https://golang.org/cl/13818

I don't think I'm really happy with the outcome though.

Review / comments welcome.

PS: This is my first attempt to contribute.


@cespare cespare commented Apr 14, 2016

Based on what @rsc wrote in https://golang.org/cl/13818, what we want to do here is to provide an opt-in mechanism for an Encoder to generate output incrementally.

The corresponding change for a Decoder is #11046.

@rsc rsc modified the milestones: Go1.8, Go1.7 May 18, 2016
@quentinmit quentinmit added the NeedsFix label Oct 6, 2016
@rsc rsc added this to the Go1.9 milestone Oct 26, 2016
@rsc rsc removed this from the Go1.8 milestone Oct 26, 2016

@bradfitz bradfitz commented May 9, 2017

Too late. Didn't happen for Go 1.9.

@bradfitz bradfitz modified the milestones: Go1.10, Go1.9 May 9, 2017

@ImJasonH ImJasonH commented Jul 20, 2017

Would an API like json.NewEncoder(w).Buffer(numBytes).Encode(foo) be acceptable? Buffer (name TBD obviously) would tell the encoder to flush to w every numBytes bytes. The default behavior would be unchanged.

Buffer would return an Encoder to enable method chaining. Encoder's current SetIndent method doesn't facilitate chaining, but it could. I don't know if it was an explicit decision in the design not to facilitate this.

A similar API change could be made for json.NewDecoder(r).Buffer(numBytes).Decode(&foo).


@andybons andybons commented Aug 25, 2017

@rsc What do you think of Jason’s approach?


@andybons andybons commented Oct 20, 2017

@ianlancetaylor for further triage/thoughts.


@ianlancetaylor ianlancetaylor commented Oct 20, 2017

Related to #20169.


@ianlancetaylor ianlancetaylor commented Oct 20, 2017

Apart from the issues discussed in #20169, I don't see a reason to set a number of bytes to accumulate. I would just have an option along the lines of SetEscapeHTML that directs the Encoder to stream output data rather than accumulate it. Then instead of writing to a bytes.Buffer, we write directly to the writer. If the caller wants buffering at a certain size, they can use the bufio package themselves.

Note that the SetIndent support would require a streaming version of Indent, which looks feasible to me.


@ImJasonH ImJasonH commented Oct 20, 2017

Do you have opinions about enabling SetIndent, SetEscapeHTML and the future SetBuffer to be chained?

enc := json.NewEncoder(out)
enc.SetEscapeHTML(true)
enc.SetBuffer(2<<20)
_ = enc.Encode(foo)

vs

_ = json.NewEncoder(out).
    SetEscapeHTML(true).
    SetBuffer(2<<20).
    Encode(foo)

Do any backward-compatibility concerns arise from changing SetIndent to return an *Encoder?


@rsc rsc commented Oct 20, 2017

There's no chaining today; please don't add any.

@rsc rsc modified the milestones: Go1.10, Go1.11 Nov 22, 2017
@gopherbot gopherbot modified the milestones: Go1.11, Unplanned May 23, 2018

@Xeoncross Xeoncross commented Jul 30, 2018

The buffering ensures that if an error is encountered, no partial JSON is emitted.

I have seen multiple developers guarding against partial JSON by using a buffer pool. I myself wasn't aware Go did this. Perhaps the documentation could be updated to mention that on error, no partial encoding is written.

Other languages have made us really defensive programmers, and it's really nice to see the forethought that Go has put into its implementations (while not becoming overly abstracted and superfluous).


@tv42 tv42 commented Jul 31, 2018

Note for anyone who thinks they don't have to defend against partial writes with JSON responses: if you set Content-Type: application/json, you've already committed to success and returning JSON.


@gopherbot gopherbot commented Sep 15, 2018

Change https://golang.org/cl/135595 mentions this issue: encoding/json: add SetBufferSize to Encoder to limit memory usage


@ImJasonH ImJasonH commented Aug 21, 2019

There's no chaining today; please don't add any.

@rsc: Can you elaborate on why allowing chaining isn't preferred?

It can be nice for scoping down variables into only the scope where they're needed:

if err := json.NewEncoder(r).SetIndent("", "  ").Encode(i); err != nil {
  // uh oh
}

or

if err := json.NewEncoder(r).
    SetIndent("", "  ").
    Encode(i); err != nil {
  // uh oh
}

vs.

e := json.NewEncoder(r)
e.SetIndent("", "  ")
if err := e.Encode(i); err != nil {
  // uh oh
}
// e is in scope out here now :(

Chaining can also preserve vertical space in exchange for horizontal space, which most displays have more of anyway.

I agree it can be abused and taken to an unhelpful extreme, but for simple/common cases it seems worthwhile, backward-compatible, and fairly straightforward to implement.


@Xeoncross Xeoncross commented Aug 21, 2019

I believe one of the complaints is that method chaining adds cognitive complexity by combining multiple differing transforms, actions, or side effects on the same line.

In your example, you can also use block scoping.

{
	e := json.NewEncoder(r)
	e.SetIndent("", "  ")
	if err := e.Encode(i); err != nil {
	  // uh oh
	}
}

// e is undefined

However, this is not normally an issue as both the compiler and tooling should yell at you.


@ImJasonH ImJasonH commented Aug 21, 2019

@Xeoncross That's a reasonable stance. If that's too much cognitive overhead for you (or your code reviewer), you can always just not chain the methods -- the existing behavior would work the same whether or not SetIndent returned its e.

If the stdlib enabled chaining, users who want it could use it in the scenarios where it makes sense to them. Users who don't want to use it, don't have to. 🤷‍♂


@flimzy flimzy commented Sep 20, 2019

I've been looking closely at https://golang.org/cl/135595 and found some drawbacks. It has one confirmed bug (easily fixed) and one likely bug (unconfirmed), introduces a drastic increase in allocations, and slows throughput.

I think all of these problems are easily fixed, and I'd be happy to contribute a CL, but the existing CL has some outstanding discussion points (not least of which led to #27735) and seems to have stalled. Would an additional CL be welcome at this point? Or should the existing CL be updated with the necessary fixes (which I can elaborate on)?
