[PSA] closing output on context cancel #4592

Stebalien · 2018-01-18T21:13:39Z

@ipfs/go-team

Hello all,

I've noticed (and have fallen prey to) a bad pattern with contexts and channels that we all need to be aware of. It's non-obvious and rarely causes visible bugs, but when it does, they're nasty and hard to track down. For example, the following function has a bug:

// Count returns a channel that counts to `to` and then closes the channel.
func Count(ctx context.Context, to int) <-chan int {
	output := make(chan int)
	go func() {
		defer close(output)
		for i := 0; i < to; i++ {
			select {
			case output <- i:
			case <-ctx.Done():
				return
			}
		}
	}()
	return output
}

What's the bug? Well, let's say we don't trust the function to exit when it should (and want to bail as fast as we can) so we do the following:

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 1 * time.Millisecond)
	defer cancel()
	ch := Count(ctx, 3)
	for {
		select {
		case i, ok := <-ch:
			if !ok {
				fmt.Println("done!")
				return
			}
			fmt.Println("count:", i)
		case <-ctx.Done():
			fmt.Println("abort!")
			return
		}
	}
}

We'd expect this program to either print:

count: 0
count: 1
count: 2
done!

Or:

count: 0
count: 1
count: 2
abort!

Unfortunately, this program can actually print, e.g.:

count: 0
done!

How? Well, if something cancels the context (e.g., a timeout expires), Count could notice this, return, and close the output channel. Unfortunately, main could see the closed channel before it notices that the context has been canceled and think we're done.

One solution is to write an error to the output channel. However, that's kind of a pain and still expects the caller to not wait on the context itself.

The correct solution, IMO, is to simply not close the output channel when the context is canceled. The caller should deal with this case itself.

Correct solution:

// Count returns a channel that counts to `to` and then closes the channel.
// If ctx is canceled first, the channel is left open.
func Count(ctx context.Context, to int) <-chan int {
	output := make(chan int)
	go func() {
		for i := 0; i < to; i++ {
			select {
			case output <- i:
			case <-ctx.Done():
				return
			}
		}
		close(output)
	}()
	return output
}
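
On the caller side, this means a closed channel can be trusted to mean "finished", while cancellation is only ever observed through the context. A minimal sketch of such a caller follows; the `collect` and `run` helpers are made up for illustration and are not part of any existing API:

```go
package main

import (
	"context"
	"fmt"
)

// Count sends 0..to-1 on the returned channel and closes it only after
// every value has been delivered. On cancellation it returns without closing.
func Count(ctx context.Context, to int) <-chan int {
	output := make(chan int)
	go func() {
		for i := 0; i < to; i++ {
			select {
			case output <- i:
			case <-ctx.Done():
				return // leave output open: the caller watches ctx itself
			}
		}
		close(output)
	}()
	return output
}

// collect (hypothetical helper) drains Count. A nil error means the channel
// was closed, which with this Count implies every value was received.
func collect(ctx context.Context, to int) ([]int, error) {
	ch := Count(ctx, to)
	var got []int
	for {
		select {
		case i, ok := <-ch:
			if !ok {
				return got, nil
			}
			got = append(got, i)
		case <-ctx.Done():
			return got, ctx.Err()
		}
	}
}

// run (hypothetical helper) calls collect with a context that is either
// live or already canceled.
func run(cancelFirst bool, to int) ([]int, error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	if cancelFirst {
		cancel()
	}
	return collect(ctx, to)
}

func main() {
	got, err := run(false, 3)
	fmt.Println(got, err) // [0 1 2] <nil>

	// With a canceled context the number of values received varies from
	// run to run, but a nil error is only possible with the full count.
	got, err = run(true, 1000)
	fmt.Println(len(got), err)
}
```

The caller can still receive a partial result, but it always arrives together with a non-nil error, so "done" can no longer be confused with "aborted".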

Note: I've also noticed the following issue but, at least, that simply leaks a goroutine and doesn't cause any nasty silent bugs. However, please be careful of the following:

func Count(ctx context.Context, to int) <-chan int {
	output := make(chan int)
	go func() {
		defer close(output)
		for i := 0; i < to; i++ {
			output <- i // no select on the context.
		}
	}()
	return output
}


Stebalien · 2019-11-05T18:05:12Z

Looking at the docker source, it looks like a good way to do this is to:

1. Return both a result and an error channel.
2. Give the error channel a buffer of 1.
3. Never close the result channel.
4. Always close the error channel.

That means callers can just wait for all the error channels to close to know that the requests have completed.

https://github.com/moby/moby/blob/76dbd884d3f1a02dc193305d2ac5824bcd3e4f0f/client/events.go#L19-L70
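
Applied to the counting example, those four rules might look roughly like the sketch below. This is not copied from the moby source; `countStream`, `drain`, and `runStream` are hypothetical names, and the producer is assumed to be finite so that "the error channel closed" can double as the completion signal.

```go
package main

import (
	"context"
	"fmt"
)

// countStream (hypothetical) returns a result channel that is never closed
// and an error channel (buffer of 1) that is always closed on exit.
func countStream(ctx context.Context, to int) (<-chan int, <-chan error) {
	results := make(chan int)
	errs := make(chan error, 1)
	go func() {
		defer close(errs) // always close the error channel
		for i := 0; i < to; i++ {
			select {
			case results <- i:
			case <-ctx.Done():
				errs <- ctx.Err() // never blocks thanks to the buffer
				return
			}
		}
	}()
	return results, errs
}

// drain (hypothetical) reads results until the error channel yields a value
// or is closed. Receiving from the closed channel yields a nil error, which
// here means the producer finished normally.
func drain(results <-chan int, errs <-chan error) ([]int, error) {
	var got []int
	for {
		select {
		case v := <-results:
			got = append(got, v)
		case err := <-errs:
			return got, err
		}
	}
}

// runStream (hypothetical) wires the two together with a context that is
// either live or already canceled.
func runStream(cancelFirst bool, to int) ([]int, error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	if cancelFirst {
		cancel()
	}
	results, errs := countStream(ctx, to)
	return drain(results, errs)
}

func main() {
	got, err := runStream(false, 3)
	fmt.Println(got, err) // [0 1 2] <nil>

	got, err = runStream(true, 1000)
	fmt.Println(len(got), err) // count varies; nil error only if complete
}
```

Because the result channel is unbuffered, every value the producer sent has already been received by the time the error channel is closed, so the caller does not lose results when it stops at that point.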

mvdan · 2020-12-07T18:42:45Z

I fully agree that the current API is error-prone. In one codebase that had a goroutine leak, I ended up finding something like:

pins, err := x.CoreAPI.Pin().Ls(ctx)
if err != nil {...}
fmt.Printf("number of pins: %d\n", len(pins))

This will build fine, and most likely print something unhelpful like 0, as the length is just the size of the buffered channel. More worryingly, it leaked goroutines forever, even hours after the ctx above had been cancelled. This code went months without being noticed, because to the casual reader, pins just looks and behaves like a slice, in this code.

So, knowing that the API actually returns a channel, my fix was like:

pins, err := x.CoreAPI.Pin().Ls(ctx)
if err != nil {...}
count := 0
for range pins {
        count++
}

The leak is gone, but this code is wrong once again. I naively thought that just looking at the PinAPI.Ls godoc would be enough, and did not even notice that the iteration errors are passed through the Pin interface in each channel element.
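
A loop that also looks at the per-element error could be sketched as follows. The `pin` type, its `Err()` accessor, and `listPins` are stand-ins invented for this example; they only mimic the idea of an error carried inside each channel element and are not the real go-ipfs types.

```go
package main

import (
	"errors"
	"fmt"
)

// errListing is a sample error used only in this sketch.
var errListing = errors.New("listing failed")

// pin is a hypothetical stand-in for a channel element that can carry
// either a value or an iteration error.
type pin struct {
	path string
	err  error
}

// Err returns the iteration error carried by this element, if any.
func (p pin) Err() error { return p.err }

// countPins ranges over the channel and stops at the first element that
// carries an error, returning the count seen so far together with it.
func countPins(pins <-chan pin) (int, error) {
	count := 0
	for p := range pins {
		if err := p.Err(); err != nil {
			return count, err
		}
		count++
	}
	return count, nil
}

// listPins (hypothetical) emits the given elements on a buffered channel
// and closes it, so returning early from countPins leaks nothing here.
func listPins(items ...pin) <-chan pin {
	ch := make(chan pin, len(items))
	for _, it := range items {
		ch <- it
	}
	close(ch)
	return ch
}

func main() {
	n, err := countPins(listPins(pin{path: "a"}, pin{path: "b"}))
	fmt.Println(n, err) // 2 <nil>

	n, err = countPins(listPins(pin{path: "a"}, pin{err: errListing}))
	fmt.Println(n, err) // 1 listing failed
}
```

With a producer that sends on an unbuffered channel, returning early like this would additionally require canceling the context so the producer goroutine can exit.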

I personally think that using channels for iterator APIs is a clever hack that can work well when prototyping APIs, but it's got a number of sharp edges that are difficult to fix. You could add more documentation with warnings, but that's not the best solution. Adding a second channel to the function signature adds more ways to misuse the API, unfortunately.

I think that, for an API with such high exposure as go-ipfs, it's worth it to declare iterator types for each of these types to iterate on. The API could roughly be like:

for iter := x.CoreAPI.Pin().Ls(ctx); iter.More(); {
    pin, err := iter.Next()
    if err != nil {...}
    // use pin
}

Here, the error is returned alongside the pin, so forgetting to check it would most likely result in a "declared but not used" compiler error. We also don't have any channel, so there are no sharp edges like goroutine leaks.

This would require declaring a new named type for each type we want to iterate on, but I reckon it would just be 2-3 methods (as shown above) and at most 30 or so lines of boilerplate. I think that's definitely worth it to end up with a nicer API for the many end users. If multiple types need iterators we'll need a bit of copy-pasting, but I still think that's worth it. And it will get easier once generics are in place.
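
As a rough idea of the boilerplate involved, here is a sketch of such an iterator. It assumes Go 1.18+ generics, which were not yet available when this comment was written, and it is backed by a plain slice purely for illustration; `Iter`, `collect`, and `errBackend` are hypothetical names, and a real implementation would presumably pull items lazily from the underlying source.

```go
package main

import (
	"errors"
	"fmt"
)

// errBackend is a sample error used only in this sketch.
var errBackend = errors.New("backend failed")

// Iter is a hypothetical slice-backed iterator. If err is set, it is
// reported once by Next after all items have been returned.
type Iter[T any] struct {
	items []T
	err   error
	pos   int
	done  bool
}

// More reports whether Next still has an item or a pending error to return.
func (it *Iter[T]) More() bool {
	return !it.done && (it.pos < len(it.items) || it.err != nil)
}

// Next returns the next item, or the zero value and the pending error
// once the items are exhausted.
func (it *Iter[T]) Next() (T, error) {
	if it.pos < len(it.items) {
		v := it.items[it.pos]
		it.pos++
		return v, nil
	}
	var zero T
	it.done = true
	return zero, it.err
}

// collect shows the loop shape proposed above: the error comes back next to
// the value, so ignoring it takes a deliberate blank identifier.
func collect[T any](it *Iter[T]) ([]T, error) {
	var out []T
	for it.More() {
		v, err := it.Next()
		if err != nil {
			return out, err
		}
		out = append(out, v)
	}
	return out, nil
}

func main() {
	got, err := collect(&Iter[string]{items: []string{"a", "b"}})
	fmt.Println(got, err) // [a b] <nil>

	got, err = collect(&Iter[string]{items: []string{"a"}, err: errBackend})
	fmt.Println(got, err) // [a] backend failed
}
```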

cc @aschmahmann since we discussed this briefly on Slack

gammazero · 2020-12-07T21:33:41Z

@Stebalien Regarding the comment:

1. Return both a result and an error channel.
2. Give the error channel a buffer of 1.
3. Never close the result channel.
4. Always close the error channel.

I am not sure that 3 is best, since it prevents receiving results in a range loop, and more channels means slower select. I do think that an error channel is preferable, and as long as 1 and 4 are true, then the error channel can be checked after seeing that the results channel is closed.

With no error channel, the results channel must not be closed. Otherwise, when the context is canceled, then the caller cannot be certain if the goroutine finished before or after the cancellation. Downside is that results cannot be ranged over, and both results and ctx.Done() must be examined in select.

With an error channel, the results channel should be closed. That will allow ranging over results and examining the error channel afterwards. Downside of this is that behavior must be well documented so the caller can trust that results channel is closed in the event of an error/cancel.

The pin objects sent over a channel also embed an error value within them. If we see an error, we must handle it and stop. The upstream issue ipfs/kubo#4592 covers some of the rough edges of this channel-based API.

Stebalien · 2020-12-16T00:34:14Z

@mvdan

I personally think that using channels for iterator APIs is a clever hack that can work well when prototyping APIs, but it's got a number of sharp edges that are difficult to fix. You could add more documentation with warnings, but that's not the best solution. Adding a second channel to the function signature adds more ways to misuse the API, unfortunately.

This isn't a clever hack in this case. The API returns a stream, not just an iterator. Using a channel allows one to select/loop over the channel normally.

There would need to be a very strong motivation for making such an API the default choice in situations like this, given the added complexity, loss of select, etc.

It may make sense for these external APIs given their infrequency, but I'd take a thorough survey of existing projects to get a sense for some consensus/common design pattern.

@gammazero

I am not sure that 3 is best, since it prevents receiving results in a range loop, and more channels means slower select. I do think that an error channel is preferable, and as long as 1 and 4 are true, then the error channel can be checked after seeing that the results channel is closed.

I agree. Take a look at ipfs/interface-go-ipfs-core#62.

Basically, the resulting pattern is:

res, err := MakeRequest(ctx)

var results []stuff
for r := range res {
    results = append(results, r)
    // do stuff
}

return results, <-err

Not reading off of the buffered err channel is fine.
Returning without finishing reading from res is fine (everything will get canceled/cleaned up when the context is canceled).
Range works.

etc...
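
For the producer side of this pattern, one possible shape is sketched below. The `makeRequest` signature (with its extra `n` parameter), `collectAll`, and `runRequest` are assumptions made for the example and are not taken from ipfs/interface-go-ipfs-core#62.

```go
package main

import (
	"context"
	"fmt"
)

// makeRequest (hypothetical) streams 0..n-1 on res. Both channels are always
// closed when the goroutine exits; errs has a buffer of 1.
func makeRequest(ctx context.Context, n int) (<-chan int, <-chan error) {
	res := make(chan int)
	errs := make(chan error, 1)
	go func() {
		// Deferred calls run last-in-first-out: res is closed first, then errs.
		defer close(errs)
		defer close(res)
		for i := 0; i < n; i++ {
			select {
			case res <- i:
			case <-ctx.Done():
				errs <- ctx.Err() // queued before res is closed
				return
			}
		}
	}()
	return res, errs
}

// collectAll ranges over the results and then reads the error channel once.
// A receive from the closed, empty error channel yields nil.
func collectAll(ctx context.Context, n int) ([]int, error) {
	res, errs := makeRequest(ctx, n)
	var results []int
	for r := range res {
		results = append(results, r)
	}
	return results, <-errs
}

// runRequest (hypothetical) calls collectAll with a context that is either
// live or already canceled.
func runRequest(cancelFirst bool, n int) ([]int, error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	if cancelFirst {
		cancel()
	}
	return collectAll(ctx, n)
}

func main() {
	got, err := runRequest(false, 3)
	fmt.Println(got, err) // [0 1 2] <nil>

	got, err = runRequest(true, 1000)
	fmt.Println(len(got), err) // count varies; nil error only if complete
}
```

Because the error is placed in the buffer before the result channel is closed, a caller that sees the range loop end is guaranteed to find the error (if there was one) waiting on the error channel.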

Stebalien mentioned this issue Jan 18, 2018

Potential go routine leak in pin/gc/gc.go #4593

Closed

momack2 added this to Inbox in ipfs/go-ipfs May 9, 2019

Stebalien mentioned this issue May 6, 2020

feat: use error channels ipfs/interface-go-ipfs-core#62

Closed

aschmahmann mentioned this issue Oct 25, 2023

feat: blockstore: GetMany blockstore method ipfs/boxo#492

Open
