encoding/json: incorrect usage of sync.Pool #27735

Open

dsnet opened this issue Sep 18, 2018 · 15 comments
@dsnet (Member) commented Sep 18, 2018

https://golang.org/cl/84897 introduces a denial-of-service attack on json.Marshal via a live-lock situation with sync.Pool.

Consider this snippet:

package main

import (
	"encoding/json"
	"fmt"
	"runtime"
	"time"
)

type CustomMarshaler int

func (c CustomMarshaler) MarshalJSON() ([]byte, error) {
	time.Sleep(500 * time.Millisecond) // simulate processing time
	b := make([]byte, int(c))
	b[0] = '"'
	for i := 1; i < len(b)-1; i++ {
		b[i] = 'a'
	}
	b[len(b)-1] = '"'
	return b, nil
}

func main() {
	processRequest := func(size int) {
		json.Marshal(CustomMarshaler(size))
		time.Sleep(1 * time.Millisecond) // simulate idle time
	}

	// Simulate a steady stream of infrequent large requests.
	go func() {
		for {
			processRequest(1 << 28) // 256MiB
		}
	}()

	// Simulate a storm of small requests.
	for i := 0; i < 1000; i++ {
		go func() {
			for {
				processRequest(1 << 10) // 1KiB
			}
		}()
	}

	// Continually run a GC and track the allocated bytes.
	var stats runtime.MemStats
	for i := 0; ; i++ {
		runtime.ReadMemStats(&stats)
		fmt.Printf("Cycle %d: %dB\n", i, stats.Alloc)
		time.Sleep(time.Second)
		runtime.GC()
	}
}

This is a variation of #23199 (comment), a situation suggested by @bcmills.

Essentially, we have a 1-to-1000 ratio of goroutines using 256MiB buffers to goroutines using 1KiB buffers. The 256MiB buffers occasionally inserted into the sync.Pool are then continually kept alive by the 1KiB goroutines. On my machine, after 300 GC cycles, the above program occupies 6GiB of heap, when I would expect roughly 256MiB in the worst case.

/cc @jnjackins @bradfitz

@gopherbot commented Sep 18, 2018

Change https://golang.org/cl/136035 mentions this issue: encoding/json: fix usage of sync.Pool

@bcmills bcmills added this to the Go1.12 milestone Sep 19, 2018
@ianlancetaylor ianlancetaylor modified the milestones: Go1.12, Go1.13 Dec 12, 2018
@andybons andybons modified the milestones: Go1.13, Go1.14 Jul 8, 2019
@flimzy commented Aug 25, 2019

It's been nearly a year, and this issue has received no real attention.

So what are the options for fixing this?

The simplest would be to set an arbitrary size limit for buffers to recycle. Whatever number is selected will hurt performance for someone, I'd wager.

When using a json.Encoder, an additional configuration parameter could be exposed (e.g. SetMaxBufferSize). That complicates sharing buffer pools across encoder instances, though.

Providing a package-level variable is an option, too, but an ugly one that should absolutely be avoided (I just mention it for completeness' sake).

Eliminating the need for internal buffering in the json encoder could be another option. It wouldn't solve this problem for everyone, but it would provide recourse for those who are affected enough to care. I believe this is the exact issue that brought this up (https://go-review.googlesource.com/c/go/+/135595), which makes it a bit ironic that this issue may be holding up that one :)
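
As an illustration of the simplest option above (the arbitrary size limit), here is a minimal sketch using a standalone pool rather than the real encodeStatePool; the 64KiB cutoff is an arbitrary example value, not a proposal:

import (
	"bytes"
	"sync"
)

var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

const maxRecycledSize = 64 << 10 // arbitrary example cutoff

// putBuf recycles small buffers and silently drops anything above the cap,
// so a single huge request can never pin a huge allocation inside the pool.
func putBuf(b *bytes.Buffer) {
	if b.Cap() > maxRecycledSize {
		return
	}
	b.Reset()
	bufPool.Put(b)
}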

@bcmills (Member) commented Aug 26, 2019

@flimzy, one approach would be to keep statistics on the output size, and only return a buffer to the pool if it isn't a statistical outlier. The problem there is that the statistical computations need to be cheap enough that they don't swamp out the benefit from using sync.Pool in the first place.

Another (simpler?) approach would be to pull a buffer from the pool, use it, and abandon it instead of returning it to the pool according to some probability distribution computed from the final high-water mark and the actual buffer size. (One simple distribution would be: 1.0 if the buffer is within 2x of the final high-water mark, or 0.0 otherwise.)
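
A minimal sketch of that second approach, using the simple 2x cutoff and assuming the buffer only grows by appending (so its final length is its high-water mark); the names are illustrative, not proposed API:

import "sync"

type poolBuf struct {
	buf []byte
}

var outlierPool = sync.Pool{
	New: func() interface{} { return new(poolBuf) },
}

// putPoolBuf recycles the buffer only when its capacity is within 2x of the
// final high-water mark (probability 1.0); otherwise it abandons the buffer
// to the GC (probability 0.0).
func putPoolBuf(p *poolBuf) {
	if cap(p.buf) > 2*len(p.buf) {
		return // statistical outlier for this workload; don't recycle
	}
	p.buf = p.buf[:0]
	outlierPool.Put(p)
}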

@dsnet (Member, Author) commented Aug 26, 2019

@flimzy, are you running into an issue related to this in production? Seeing how it behaves badly in production would be helpful data for coming up with a reasonable approach.

When I filed this issue, it was just a theoretical issue I saw. It was not a concrete problem I was experiencing.

@flimzy commented Aug 26, 2019

@dsnet: No, I'm not. I'm interested in resolving this issue because it seems to be a blocker for making headway on some other json-encoder related issues that are affecting me: https://go-review.googlesource.com/c/go/+/135595, and by extension #33714

@flimzy commented Aug 26, 2019

Thanks for the feedback, @bcmills.

I like the simplicity of your second approach. I think we need to be careful not to discard buffers too aggressively in the case of applications that naturally have a large disparity in JSON payload sizes. A simple example comes to mind: a REST API that responds with {"ok":true} for PUT requests, but with tens or hundreds of KB for a GET. Perhaps treating 10KB (or another arbitrary limit) as the high-water mark whenever the actual mark is smaller would guard against this.

I can still imagine certain applications where this would be sub-optimal, but maybe it's a reasonable starting point, to be tuned later.
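
For illustration, that guard could be bolted onto the 2x check from the sketch above (reusing its poolBuf and outlierPool); the 10KB floor is the arbitrary limit mentioned here, not a proposed constant:

const minHighWater = 10 << 10 // treat anything under ~10KB as if it were 10KB

func putPoolBuf(p *poolBuf) {
	highWater := len(p.buf)
	if highWater < minHighWater {
		highWater = minHighWater // small responses shouldn't evict medium buffers
	}
	if cap(p.buf) > 2*highWater {
		return // only genuinely oversized buffers are abandoned
	}
	p.buf = p.buf[:0]
	outlierPool.Put(p)
}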

@flimzy commented Aug 26, 2019

As I consider this further, I have to wonder: is this really a DoS attack vector? If it were in json.Unmarshal, I could see it being easily exploited (just send a huge payload to your favorite Go-powered REST API). But since this is json.Marshal, exploiting it would mean that several other validation opportunities had already been disregarded.

I don't mean to suggest this doesn't warrant some amount of attention, I just suspect the danger is over-stated.

@iand (Contributor) commented Aug 26, 2019

Another approach would be to remove the package level encodeStatePool and allow the caller to optionally supply one when creating an encoder. The reasoning here is that the caller has more knowledge about the distribution of payloads and the edge cases that a package-level statistical count would struggle with.

@flimzy commented Aug 26, 2019

> Another approach would be to remove the package level encodeStatePool and allow the caller to optionally supply one when creating an encoder.

This does seem like the only approach to cover all use cases optimally. Is there any precedent for something similar elsewhere in the stdlib to use for guidance?

@gopherbot commented Sep 13, 2019

Change https://golang.org/cl/195217 mentions this issue: Add a simple, optional 'keep buffer' capability to the Encoder type

@flimzy commented Sep 13, 2019

I have created https://golang.org/cl/195217 as the simplest change I could think of to address the concerns raised in #27735. My main concern with this implementation is that it would become possible for a third-party library to starve the buffer pool by, for instance, never returning anything to the pool.

The other realistic approach, IMO, is to expose the encodeStatePool object in the public API. This has a larger footprint on the public API, and would require all users of json.NewEncoder who wish to use the new functionality to propagate their own pool throughout their code (including to any third-party libraries that do JSON encoding) to avoid ending up with lots of fragmented pools.

Even with the drawbacks, the latter may be the better approach, but I started with the former because it is so simple, it seemed worth the minor effort to put it in front of reviewers.

@dsnet (Member, Author) commented Dec 5, 2020

I've been thinking of ways to guarantee bounds on the worst-case behavior, and this is a prototype idea:

// bufferPool is a pool of buffers.
//
// Example usage:
//
//	p := getBuffer()
//	defer putBuffer(p)
//	p.buf = appendFoo(p.buf, foo)        // may resize buffer to arbitrarily large sizes
//	return append([]byte(nil), p.buf...) // single copy of the buffer contents
//
var bufferPool = sync.Pool{
	New: func() interface{} { return new(pooledBuffer) },
}

type pooledBuffer struct {
	buf     []byte
	strikes int // number of times the buffer was under-utilized
}

func getBuffer() *pooledBuffer {
	return bufferPool.Get().(*pooledBuffer)
}

func putBuffer(p *pooledBuffer) {
	// Recycle large buffers if sufficiently utilized.
	// If a buffer is under-utilized enough times sequentially,
	// then it is discarded, ensuring that a single large buffer
	// won't be kept alive by a continuous stream of small usages.
	switch {
	case cap(p.buf) <= 1<<16: // always recycle buffers smaller than 64KiB
		p.strikes = 0
	case cap(p.buf)/2 <= len(p.buf): // at least 50% utilization
		p.strikes = 0
	case p.strikes < 4:
		p.strikes++
	default:
		return // discard the buffer; too large and too often under-utilized
	}
	p.buf = p.buf[:0]
	bufferPool.Put(p)
}

In the worst-case scenario, we can guarantee a lower bound on utilization based on the % threshold (e.g., 50% in the example above) and the maximum number of sequential recycles below that threshold (e.g., 4 in the example above).

For the constants chosen above, the worst case sequence of utilization would be:

  • 50%, 0%, 0%, 0%, 0%, 50%, 0%, 0%, 0%, 0%, 50%, 0%, 0%, 0%, 0%, ...

On average, that's a worst-case utilization of 10% (one 50% usage followed by four 0% usages: 50% / 5 = 10%), which is far better than the theoretical worst case of 0% if the code naively recycles buffers without any limits.

Advantages of the algorithm:

  • It's simple; easy to analyze and reason about.
  • It's fast; only requires basic integer arithmetic.
  • It doesn't depend on any global state where statistics are concurrently gathered.
  • For usages that are all of approximately the same size (regardless of how big or small), it ensures buffers are always recycled.

Happy to hear suggestions for improvements.

@iand (Contributor) commented Dec 5, 2020

@dsnet I like this. It's elegant and low cost. What is the reasoning behind dropping <64k buffers?

@josharian (Contributor) commented Dec 5, 2020

The idea of keeping statistics locally is a nice one. We do lose some amount of information every time we throw away an over-sized buffer. An alternative, which might be helpful if we kept more sophisticated statistics, would be to nil out buf and return the pooledBuffer object to the pool. Then on Get we could pre-allocate a new buf with a better initial size.

Even with these simple stats, it might be worth zeroing buf and returning the pooledBuffer to avoid an allocation. Maybe even include a small byte-array struct field in pooledBuffer that buf could initially use, providing an allocation-free path when a small buf is needed.
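
A rough sketch of that variant on top of dsnet's prototype above; the lastLen field is hypothetical, and the strikes logic is omitted for brevity:

type pooledBuffer struct {
	buf     []byte
	lastLen int // hypothetical: length observed on the most recent use
}

func putBuffer(p *pooledBuffer) {
	p.lastLen = len(p.buf)
	if cap(p.buf) > 1<<16 && cap(p.buf)/2 > len(p.buf) {
		// Under-utilized large buffer: drop the oversized backing array,
		// but keep the stats object in the pool.
		p.buf = nil
	} else {
		p.buf = p.buf[:0]
	}
	bufferPool.Put(p)
}

func getBuffer() *pooledBuffer {
	p := bufferPool.Get().(*pooledBuffer)
	if p.buf == nil {
		// Pre-allocate based on the recently observed size instead of
		// starting from zero capacity.
		p.buf = make([]byte, 0, p.lastLen)
	}
	return p
}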

@dsnet (Member, Author) commented Dec 5, 2020

> @dsnet I like this. It's elegant and low cost. What is the reasoning behind dropping <64k buffers?

It's not dropping buffers smaller than 64KiB; it's always choosing to recycle them in the pool. That specific code path isn't strictly necessary, but for buffers below a certain size it probably doesn't hurt to always recycle them regardless of utilization.

> An alternative, which might be helpful if we kept more sophisticated statistics, would be to nil out buf and return the pooledBuffer object to the pool. Then on Get we could pre-allocate a new buf with a better initial size.

Interesting idea. I think a useful statistic to obtain would perhaps be the maximum length encountered over the last N usages. If we evict a buffer due to low utilization, we can allocate a new one based on a recently seen maximum length.
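
For example, a small fixed-size ring of recent lengths would keep that statistic cheap, and it could be embedded in each pooledBuffer so no shared state is needed (N=8 and the names are arbitrary):

type recentLens struct {
	lens [8]int // lengths observed over the last N usages
	i    int
}

func (r *recentLens) record(n int) {
	r.lens[r.i] = n
	r.i = (r.i + 1) % len(r.lens)
}

func (r *recentLens) max() int {
	m := 0
	for _, n := range r.lens {
		if n > m {
			m = n
		}
	}
	return m
}

On eviction, a replacement buffer could then be allocated with make([]byte, 0, r.max()) instead of starting from zero capacity.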
