sync: Pool example suggests incorrect usage #23199

Open
dsnet opened this issue Dec 20, 2017 · 30 comments

@dsnet (Member) commented Dec 20, 2017

The operation of sync.Pool assumes that the memory cost of each element is approximately the same in order to be efficient. This property can be seen in the fact that Pool.Get returns an arbitrary element, not the one with "the greatest capacity" or the like. In other words, from the perspective of the Pool, all elements are more or less interchangeable.

However, the Pool example stores bytes.Buffer objects, which have an underlying []byte of varying capacity depending on how much of the buffer is actually used.

Dynamically growing unbounded buffers can cause a large amount of memory to be pinned and never freed, in a live-lock-like situation. Consider the following:

pool := sync.Pool{New: func() interface{} { return new(bytes.Buffer) }}

processRequest := func(size int) {
	b := pool.Get().(*bytes.Buffer)
	time.Sleep(500 * time.Millisecond) // Simulate processing time
	b.Grow(size)
	pool.Put(b)
	time.Sleep(1 * time.Millisecond) // Simulate idle time
}

// Simulate a set of initial large writes.
for i := 0; i < 10; i++ {
	go func() {
		processRequest(1 << 28) // 256MiB
	}()
}

time.Sleep(time.Second) // Let the initial set finish

// Simulate an un-ending series of small writes.
for i := 0; i < 10; i++ {
	go func() {
		for {
			processRequest(1 << 10) // 1KiB
		}
	}()
}

// Continually run a GC and track the allocated bytes.
var stats runtime.MemStats
for i := 0; ; i++ {
	runtime.ReadMemStats(&stats)
	fmt.Printf("Cycle %d: %dB\n", i, stats.Alloc)
	time.Sleep(time.Second)
	runtime.GC()
}

Depending on timing, the above snippet takes around 35 GC cycles for the initial set of large requests (2.5GiB) to finally be freed, even though each of the subsequent writes only uses around 1KiB. This can happen in a real server handling lots of small requests, where large buffers allocated by some prior request end up being pinned for a long time, since they are not in the Pool long enough to be collected.

The example claims to be based on fmt usage, but I'm not convinced that fmt's usage is correct. It is susceptible to the live-lock problem described above. I suspect this hasn't been an issue in most real programs since fmt.PrintX is typically not used to write very large strings. However, other applications of sync.Pool may certainly have this issue.

I suggest we fix the example to store elements of fixed size and document this.

\cc @kevinburke @LMMilewski @bcmills

@dsnet (Member, Author) commented Dec 20, 2017

I should also note that if #22950 is done, then usages like this will cause large buffers to be pinned forever since this example has a steady state of Pool usage, so the GC would never clear the pool.

@dsnet (Member, Author) commented Dec 20, 2017

Here's an even worse situation than earlier (suggested by @bcmills):

pool := sync.Pool{New: func() interface{} { return new(bytes.Buffer) }}

processRequest := func(size int) {
	b := pool.Get().(*bytes.Buffer)
	time.Sleep(500 * time.Millisecond) // Simulate processing time
	b.Grow(size)
	pool.Put(b)
	time.Sleep(1 * time.Millisecond) // Simulate idle time
}

// Simulate a steady stream of infrequent large requests.
go func() {
	for {
		processRequest(1 << 28) // 256MiB
	}
}()

// Simulate a storm of small requests.
for i := 0; i < 1000; i++ {
	go func() {
		for {
			processRequest(1 << 10) // 1KiB
		}
	}()
}

// Continually run a GC and track the allocated bytes.
var stats runtime.MemStats
for i := 0; ; i++ {
	runtime.ReadMemStats(&stats)
	fmt.Printf("Cycle %d: %dB\n", i, stats.Alloc)
	time.Sleep(time.Second)
	runtime.GC()
}

Rather than a single one-off large request, let there be a steady stream of occasional large requests intermixed with a large number of small requests. As this snippet runs, the heap keeps growing over time. The large request is "poisoning" the pool such that most of the small requests eventually pin a large capacity buffer under the hood.

@kevinburke (Contributor) commented Dec 20, 2017

Yikes. My goal in adding the example was to try to show the easiest-to-understand use case for a Pool. fmt was the best one I could find in the standard library.

@ulikunitz (Contributor) commented Dec 21, 2017

The solution is of course to put only buffers with small byte slices back into the pool.

if b.Cap() <= 1<<12 {
	pool.Put(b)
}

@dsnet (Member, Author) commented Dec 21, 2017

Alternatively, you could use an array of sync.Pools to bucketize the items by size: https://github.com/golang/go/blob/7e394a2/src/net/http/h2_bundle.go#L998-L1043
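For illustration, here is a rough sketch of that bucketing approach, assuming the usual "sync" import; the size classes and helper names are made up for the example and are not taken from the net/http code:

var bucketSizes = []int{1 << 10, 4 << 10, 16 << 10, 64 << 10}

var bucketPools = func() []*sync.Pool {
	pools := make([]*sync.Pool, len(bucketSizes))
	for i, size := range bucketSizes {
		size := size // capture the per-bucket size for the closure
		pools[i] = &sync.Pool{
			New: func() interface{} { return make([]byte, size) },
		}
	}
	return pools
}()

// getBuf returns a buffer with len(b) == n from the smallest bucket that
// fits, or a fresh allocation if n exceeds the largest size class.
func getBuf(n int) []byte {
	for i, size := range bucketSizes {
		if n <= size {
			return bucketPools[i].Get().([]byte)[:n]
		}
	}
	return make([]byte, n)
}

// putBuf returns a buffer to the pool matching its capacity and silently
// drops buffers that do not belong to any size class.
func putBuf(b []byte) {
	for i, size := range bucketSizes {
		if cap(b) == size {
			bucketPools[i].Put(b[:size])
			return
		}
	}
}

Because each pool only ever holds buffers of one capacity, the "all elements have roughly the same weight" assumption holds within each bucket.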

@bcmills (Member) commented Dec 21, 2017

The solution is of course …

There are many possible solutions: the important thing is to apply one of them.

A related problem can arise with goroutine stacks in conjunction with “worker pools”, depending on when and how often the runtime reclaims large stacks. (IIRC that has changed several times over the lifetime of the Go runtime, so I'm not sure what the current behavior is.) If you have a pool of worker goroutines executing callbacks that can vary significantly in stack usage, you can end up with all of the workers consuming very large stacks even if the overall fraction of large tasks remains very low.

@kevinburke (Contributor) commented Dec 21, 2017

Do you have any suggestions for better use cases we could include in the example that are reasonably compact?

Maybe the solution is not to recommend a sync.Pool at all anymore? This is my understanding from a comment I read about how the GC makes it more or less useless.

@jzelinskie (Contributor) commented Dec 22, 2017

Would changing the example to use an array (fixed size) rather than a slice solve this problem?
In Chihaya, this is how we've used sync.Pool (and our own implementation, before it was in the standard library).

Maybe the solution is not to recommend a sync.Pool at all anymore?

I legitimately don't think there ever was a time to generally recommend sync.Pool. I find it a pretty contentious addition to the standard library because of how careful and knowledgeable about the runtime you need to be in order to use it effectively. If you need optimization at this level, you probably know how to implement this best for your own use case.

Sorry to interject randomly, but I saw this thread on Twitter and have strong opinions on this feature.

@aclements (Member) commented Dec 22, 2017

Maybe the solution is not to recommend a sync.Pool at all anymore? This is my understanding from a comment I read about how GC makes this more or less useless

We would certainly like to get to this point, and the GC has improved a lot, but for high-churn allocations with obvious lifetimes and no need for zeroing, sync.Pool can still be a significant optimization. As @RLH likes to say, every use of sync.Pool is a bug report on the GC. But we're still taking those bug reports. :)

I should also note that if #22950 is done, then usages like this will cause large buffers to be pinned forever since this example has a steady state of Pool usage, so the GC would never clear the pool.

That's clearly true, but even right now it's partly by chance that these examples are eventually dropping the large buffers. And in the more realistic stochastic mix example, it's not clear to me that #22950 would make it any better or worse.

I agree with @dsnet's original point that we should document that sync.Pool treats all objects interchangeably, so they should all have roughly the same "weight". And it would be good to provide some suggestions for what to do in situations where this isn't the case, and perhaps some concrete examples of poor sync.Pool usage.

@FMNSSun commented Jul 22, 2018

We've used sync.Pool for dealing with network packets, and others (such as lucas-clemente/quic-go) do too, because for those use cases you gain performance. However, in those cases []byte is used instead of bytes.Buffer: the buffers all have the same capacity and are re-sliced as needed. Before putting them back they are re-sliced to MaxPacketSize, and before reading you Get a buffer and re-slice it to the size of the network packet.

We did the same even for structs, for example when parsing packets into a struct with func ParsePacket(data []byte) *Packet (which overwrites all fields of the struct anyway). This means that instead of allocating and freeing thousands of new structs each second, they can be re-used through sync.Pool, which makes things a little bit faster.
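A rough sketch of that struct-reuse pattern, with a made-up Packet type and the usual "sync" import assumed (illustrative only, not the actual parser):

type Packet struct {
	Header  byte
	Payload []byte
}

var packetPool = sync.Pool{
	New: func() interface{} { return new(Packet) },
}

// ParsePacket reuses a pooled Packet. Every field is overwritten, so no
// state leaks from the previous use. Assumes len(data) > 0.
func ParsePacket(data []byte) *Packet {
	p := packetPool.Get().(*Packet)
	p.Header = data[0]
	p.Payload = data[1:]
	return p
}

// ReleasePacket makes the Packet available for reuse once the caller is done.
func ReleasePacket(p *Packet) {
	p.Payload = nil // drop the reference so the payload's backing array is not pinned
	packetPool.Put(p)
}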

@FMNSSun commented Jul 23, 2018

I guess a minimal example of such a usage would be:

package main

import (
	"net"
	"sync"
)

const MaxPacketSize = 4096

var bufPool = sync.Pool{
	New: func() interface{} {
		return make([]byte, MaxPacketSize)
	},
}

func process(outChan chan []byte) {
	for data := range outChan {
		// process data

		// Re-slice to maximum capacity and return it
		// for re-use. This is important to guarantee that
		// all calls to Get() will return a buffer of
		// length MaxPacketSize.
		bufPool.Put(data[:MaxPacketSize])
	}
}

func reader(conn net.PacketConn, outChan chan []byte) {
	for {
		data := bufPool.Get().([]byte)
		
		n, _, err := conn.ReadFrom(data)
		
		if err != nil {
			break
		}
		
		outChan <- data[:n]
	}
	
	close(outChan)
}

func main() {
	N := 3
	var wg sync.WaitGroup
	
	outChan := make(chan []byte, N)
	
	wg.Add(N)
	
	for i := 0; i < N; i++ {
		go func() {
			process(outChan)
			wg.Done()
		}()
	}
	
	wg.Add(1)

	conn, err := net.ListenPacket("udp", "localhost:10001")

	if err != nil {
		panic(err.Error())
	}
	
	go func() {
		reader(conn, outChan)
		wg.Done()
	}()
	
	wg.Wait()
}

but of course, whether this is going to be faster than just relying on the GC depends on how many packets per second you have to deal with, what exactly you do with the data, and so on. In the real world you'd benchmark with and without sync.Pool and compare the two. At the time we wrote our code, a significant amount of time was spent allocating new buffers, and using a scheme like the above we managed to increase throughput. Of course, one should re-benchmark this with every update to the GC.
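As a sketch of that benchmarking step, assuming the "testing" import and the bufPool/MaxPacketSize declarations from the example above, placed in a _test.go file (the workload is a stand-in for real packet processing):

var sinkBuf []byte

// BenchmarkAlloc allocates a fresh buffer per packet and leaves reclamation to the GC.
func BenchmarkAlloc(b *testing.B) {
	for i := 0; i < b.N; i++ {
		data := make([]byte, MaxPacketSize)
		data[0] = 1    // stand-in for real packet processing
		sinkBuf = data // keep a reference so the allocation is not optimized away
	}
}

// BenchmarkPool reuses buffers through bufPool.
func BenchmarkPool(b *testing.B) {
	for i := 0; i < b.N; i++ {
		data := bufPool.Get().([]byte)
		data[0] = 1 // stand-in for real packet processing
		sinkBuf = data
		bufPool.Put(data[:MaxPacketSize])
	}
}

Running this with go test -bench=. -benchmem reports ns/op, B/op, and allocs/op, which is usually the clearer signal for this kind of comparison.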

@gopherbot commented Sep 18, 2018

Change https://golang.org/cl/136035 mentions this issue: encoding/json: fix usage of sync.Pool

@gopherbot commented Sep 18, 2018

Change https://golang.org/cl/136115 mentions this issue: sync: clarify proper Pool usage for dynamically sized buffers

@gopherbot commented Sep 18, 2018

Change https://golang.org/cl/136116 mentions this issue: fmt: fix usage of sync.Pool

@puellanivis commented Sep 24, 2019

We (HelloFresh) hit a similar issue to this with one of our services, where a number of large logs using zap end up hanging around for a long time.

@kevinburke (Contributor) commented Sep 25, 2019

I added the example here and frankly I think the best course of action is to remove it. I'm not happy with the number of errors that have been reported and this class of error is particularly difficult/frustrating to debug. I think you are right that we should make sync.Pool use more difficult by removing the example of how to use it. @dsnet what do you think?

@andig (Contributor) commented May 19, 2021

I guess a minimal example of such a usage would be:

bufPool.Put(data[:MaxPacketSize])

@FMNSSun sorry for the late comment, I've only just stumbled over the example. In terms of getting it right: wouldn't this approach still pin the original memory, since the underlying array is still kept around?

@puellanivis commented May 19, 2021

Yes, it would still pin the underlying memory. Even more specifically, data[:MaxPacketSize] keeps the capacity of the backing buffer as well, so putting a make([]byte, 0, 1024*1024*1024) into the pool would still pin a gibibyte of memory.
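A tiny illustration of that point (toy values, not from the example above):

big := make([]byte, 0, 1<<30)       // 1GiB backing array
small := big[:4096]                 // re-slicing changes the length only
fmt.Println(len(small), cap(small)) // prints "4096 1073741824": the 1GiB array is still reachable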

@FMNSSun commented Jul 20, 2021

@andig

That's exactly what you'd want anyway. You want to re-use the whole array.

@gopherbot commented Jul 26, 2021

Change https://golang.org/cl/337393 mentions this issue: sync: replace the incorrect Pool usage example

@peterbourgon commented Aug 18, 2021

I've found it's very difficult to correctly use sync.Pools of raw []byte. Growing the returned slice has many sharp edges that easily cause serious bugs when interacting with pools. I generally prefer using bytes.Buffer when possible — is that not possible here?

@dsnet (Member, Author) commented Aug 18, 2021

@peterbourgon, wouldn't using bytes.Buffer still have the same issue unless the user explicitly enforces some maximum bound on the buffer length? If so, presumably the same technique could be applied to a raw []byte?

I've been trying to find a way to use variable-length buffers with sync.Pool, and the approach that I've been taking is something like: #27735 (comment)

@peterbourgon commented Aug 18, 2021

wouldn't using bytes.Buffer still have the same issue unless the user explicitly enforces some maximum bounds on the buffer length?

Ah, so, the assumption in the example is that the returned []byte shall not be grown by the caller?

Edit:

I suggest we fix the example to store elements of fixed size and document this.

It seems unrealistic to expect or suggest that sync.Pool be used to store items of a fixed size. I can't recall a single time I've seen a sync.Pool "in the wild" where this constraint could be satisfied. Size classes, as in the bufio pools in net/http, sure.

@andig (Contributor) commented Aug 18, 2021

IMHO the point is to not persist a potentially ever-growing buffer without bounds, as seen in #46285.

@puellanivis commented Aug 18, 2021

Not only do we often accidentally end up pinning large buffers by naïvely putting []byte buffers into a sync.Pool, I have found at times that it acts really weird when putting []byte values in the pool directly, rather than in something like *bytes.Buffer or even a &struct{ []byte }{ b }.

In the end, for github.com/pkg/sftp, we switched to using a buffered chan []byte type to avoid the overhead of boxing/unboxing the []byte values through an interface{}, then we added some checks for buffers that were too small and too large. But then, we also have the great fortune there to have pretty well bounded packet sizes in that package. I’m not sure such a solution would work generically.
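For what it's worth, a rough sketch of that kind of channel-based free list follows; the names and the size check are illustrative, not the actual github.com/pkg/sftp code:

// bufFreeList is a bounded free list of packet buffers. A buffered channel
// avoids boxing []byte into interface{} and gives a hard cap on how many
// buffers are retained.
type bufFreeList struct {
	ch      chan []byte
	bufSize int
}

func newBufFreeList(n, bufSize int) *bufFreeList {
	return &bufFreeList{ch: make(chan []byte, n), bufSize: bufSize}
}

func (l *bufFreeList) Get() []byte {
	select {
	case b := <-l.ch:
		return b
	default:
		return make([]byte, l.bufSize) // free list empty: allocate
	}
}

func (l *bufFreeList) Put(b []byte) {
	// Drop buffers of the wrong size: too small to be reused safely,
	// or oversized and therefore likely to pin memory.
	if cap(b) != l.bufSize {
		return
	}
	select {
	case l.ch <- b[:l.bufSize]:
	default: // free list full: let the GC take it
	}
}

The buffered channel gives a hard upper bound on how many buffers are retained, at the cost of losing the GC-driven shrinking that sync.Pool provides.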

@storozhukBM commented Aug 19, 2021

@dsnet

What if we suggest that users probabilistically not return objects to the pool if their size is bigger than expected? Such an approach can help avoid the occasional huge object hanging around in a pool forever. But if you actually need big buffers more or less frequently, the pool will still work, just a bit less efficiently.

Here is your original example but with a 5% probability of not storing a big buffer:

pool := sync.Pool{New: func() interface{} { return new(bytes.Buffer) }}
randPool := sync.Pool{New: func() interface{} { return rand.New(rand.NewSource(rand.Int63())) }}

returnToPool := func(buffer *bytes.Buffer) {
	localRand := randPool.Get().(*rand.Rand)
	defer randPool.Put(localRand)
	expectedSize := 1 << 11 // 2KB
	if buffer.Cap() > expectedSize && localRand.Intn(100) < 5 {
		return
	}
	pool.Put(buffer)
}

processRequest := func(size int) {
	b := pool.Get().(*bytes.Buffer)
	time.Sleep(500 * time.Millisecond) // Simulate processing time
	b.Grow(size)
	returnToPool(b)
	time.Sleep(1 * time.Millisecond) // Simulate idle time
}

// Simulate a set of initial large writes.
for i := 0; i < 10; i++ {
	go func() {
		processRequest(1 << 28) // 256MiB
	}()
}

time.Sleep(time.Second) // Let the initial set finish

// Simulate an un-ending series of small writes.
for i := 0; i < 10; i++ {
	go func() {
		for {
			processRequest(1 << 10) // 1KiB
		}
	}()
}

// Continually run a GC and track the allocated bytes.
var stats runtime.MemStats
for i := 0; ; i++ {
	runtime.ReadMemStats(&stats)
	fmt.Printf("Cycle %d: %dB\n", i, stats.Alloc)
	time.Sleep(time.Second)
	runtime.GC()
}

With 5% it takes ~12 cycles to get rid of the initial state, and with 15% only 3 cycles, so by adjusting the probability we can make it more aggressive.

This approach also works reasonably well in your modified example with the pathological case of a constant stream of big buffers; it does not allocate nearly as much memory. With a 15% buffer-dropping rate, it ends up with ~800MB of heap after 100 cycles instead of 17GB. But having separate pools would be better in that case.

I understand that putting such examples in the standard library is a bit of overkill, but still, it may be worthwhile at least to mention different strategies for dealing with such problems.

@dsnet (Member, Author) commented Aug 19, 2021

@storozhukBM

But if you actually have more or less frequent necessities in big buffers, the pool will still work but will be a bit less efficient.

This is unfortunately the problem with a probabilistic model. I found it to be much less efficient for applications that did continually use a large buffer size. Ideally, we want perfect reuse if an application consistently uses a large buffer size.

I believe that returning a variable-length buffer to the pool needs to take into account the history of how well the buffer was utilized. With regard to history, you can either go with global history or local history. Global history is more accurate, but then you have to deal with coordination; local history avoids that, and you hope it's good enough. Did you see my code snippet in #27735 (comment)? It relies only on local history and addresses the problem of discarding large buffers when they are underutilized, while ensuring perfect reuse when large buffers are consistently used.
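A very rough sketch of that local-history idea, assuming "bytes" and "sync" imports (a paraphrase for illustration, not the code from the linked comment): each pooled buffer carries an estimate of how much of its capacity recent uses actually needed, and it is dropped rather than returned when that estimate says the capacity is mostly wasted.

// pooledBuffer pairs a bytes.Buffer with a local, per-buffer record of how
// much of its capacity recent uses actually needed.
type pooledBuffer struct {
	buf     bytes.Buffer
	avgUsed float64 // exponentially weighted average of bytes used per request
}

var bufPool = sync.Pool{
	New: func() interface{} { return new(pooledBuffer) },
}

func getBuffer() *pooledBuffer {
	return bufPool.Get().(*pooledBuffer)
}

// putBuffer records how much of the buffer this use needed and only returns
// it to the pool if its capacity is reasonably well utilized. A buffer that
// is consistently used at a large size keeps being reused; one that grew
// large once and is then used for small work decays and gets dropped.
func putBuffer(p *pooledBuffer) {
	used := float64(p.buf.Len())
	if p.avgUsed == 0 {
		p.avgUsed = used
	} else {
		p.avgUsed = 0.75*p.avgUsed + 0.25*used
	}
	if float64(p.buf.Cap()) > 4*p.avgUsed {
		return // capacity is mostly wasted: let the GC reclaim it
	}
	p.buf.Reset()
	bufPool.Put(p)
}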

@storozhukBM commented Aug 19, 2021

@dsnet, I'd missed that comment. This is a really good idea, thanks.

kzys added a commit to kzys/stargz-snapshotter that referenced this issue Sep 3, 2021
Most of bytes.Buffer in stargz-snapshotter is unbounded in terms of
size and sync.Pool's Get may choose to ignore the pool.

Resizing a buffer before returning to the pool may alleviate the
snapshotter's memory utilization issue.

golang/go#23199 has a long discussion about
the issue.

Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
@lesismal commented Sep 22, 2021

Something about use raw []byte: #48535 (comment)

@lesismal commented Sep 22, 2021

I've found it's very difficult to correctly use sync.Pools of raw []byte. Growing the returned slice has many sharp edges that easily cause serious bugs when interacting with pools. I generally prefer using bytes.Buffer when possible — is that not possible here?

It is indeed difficult to use raw []byte, but sometimes we must do that to optimize memory usage and GC; otherwise performance gets worse and stop-the-world pauses creep in.
