Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PSA] closing output on context cancel #4592

Open
Stebalien opened this issue Jan 18, 2018 · 4 comments
Open

[PSA] closing output on context cancel #4592

Stebalien opened this issue Jan 18, 2018 · 4 comments

Comments

@Stebalien
Copy link
Member

@ipfs/go-team

Hello all,

I've noticed (and have fallen prey to) a bad pattern with contexts and channels that we all need to be aware of (it's non-obvious and rarely causes visible bugs but can cause nasty, hard to track down bugs). For example, the following function has a bug:

// Count returns a channel that counts to `to` and then closes the channel.
func Count(ctx context.Context, to int) <-chan int {
	output := make(chan int)
	go func() {
		defer close(output)
		for i := 0; i < to; i++ {
			select {
			case output <- i:
			case <-ctx.Done():
				return
			}
		}
	}()
	return output
}

What's the bug? Well, let's say we don't trust the function to exit when it should (and want to bail as fast as we can) so we do the following:

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 1 * time.Millisecond)
	defer cancel()
	ch := Count(ctx, 3)
	for {
		select {
		case i, ok := <-ch:
			if !ok {
				fmt.Println("done!")
				return
			}
			fmt.Println("count: ", i)
		case <-ctx.Done():
			fmt.Println("abort!")
			return
		}
	}
}

We'd expect this program to either print:

count: 0
count: 1
count: 2
done!

Or:

count: 0
count: 1
count: 2
abort!

Unfortunately, this program can actually print, e.g.:

count: 0
done!

How? Well, if we something cancels the context (e.g., a timeout expires), Count could notice this, return, and close the output channel. Unfortunately, main could notice this before it notices that the context has been canceled and think we're done.

One solution is to write an error to the output channel. However, that's kind of a pain and still expects the caller to not wait on the context itself.

The correct solution, IMO, is to simply not close the output channel when the context is canceled. The Caller should deal with this case itself.

Correct solution:

// Count returns a channel that counts to `to` and then closes the channel.
func Count(ctx context.Context, to int) <-chan int {
	output := make(chan int)
	go func() {
		for i := 0; i < to; i++ {
			select {
			case output <- i:
			case <-ctx.Done():
				return
			}
		}
                close(output)
	}()
	return output
}

Note: I've also noticed the following issue but, at least, that simply leaks a goroutine and doesn't cause any nasty silent bugs. However, please be careful of the following:

func Count(ctx context.Context, to int) <-chan int {
	output := make(chan int)
	go func() {
		defer close(output)
		for i := 0; i < to; i++ {
			output <- i // no select on the context.
		}
	}()
	return output
}
@Stebalien
Copy link
Member Author

Stebalien commented Nov 5, 2019

Looking at the docker source, it looks like a good way to do this is to:

  1. Return both a result and an error channel.
  2. Give the error channel a buffer of 1.
  3. Never close the result channel.
  4. Always close the error channel.

That means callers can just wait for all the error channels to close to know that the requests have completed.

https://github.com/moby/moby/blob/76dbd884d3f1a02dc193305d2ac5824bcd3e4f0f/client/events.go#L19-L70

@mvdan
Copy link
Contributor

mvdan commented Dec 7, 2020

I fully agree that the current API is error-prone. In one codebase that had a goroutine leak, I ended up finding something like:

pins, err := x.CoreAPI.Pin().Ls(ctx)
if err != nil {...}
fmt.Printf("number of pins: %d\n", len(pins))

This will build fine, and most likely print something unhelpful like 0, as the length is just the size of the buffered channel. More worryingly, it leaked goroutines forever, even hours after the ctx above had been cancelled. This code went months without being noticed, because to the casual reader, pins just looks and behaves like a slice, in this code.

So, knowing that the API actually returns a channel, my fix was like:

pins, err := x.CoreAPI.Pin().Ls(ctx)
if err != nil {...}
count := 0
for range pins {
        count++
}

The leak is gone, but this code is wrong once again. I naively thought that just looking at the PinAPI.Ls godoc would be enough, and did not even notice that the iteration errors are passed through the Pin interface in each channel element.

I personally think that using channels for iterator APIs is a clever hack that can work well when prototyping APIs, but it's got a number of sharp edges that are difficult to fix. You could add more documentation with warnings, but that's not the best solution. Adding a second channel to the function signature adds more ways to misuse the API, unfortunately.

I think that, for an API with such high exposure as go-ipfs, it's worth it to declare iterator types for each of these types to iterate on. The API could roughly be like:

for iter := x.CoreAPI.Pin().Ls(ctx); iter.More(); {
    pin, err := iter.Next()
    if err != nil {...}
    // use pin
}

Here, the error is returned alongside the pin, so forgetting to check it would most likely result in a "declared but not used" compiler error. We also don't have any channel, so there are no sharp edges like goroutine leaks.

This would require declaring a new named type for each type we want to iterate on, but I reckon it would just be 2-3 methods (as shown above) and at most 30 or so lines of boilerplate. I think that's definitely worth it to end up with a nicer API for the many end users. If multiple types need iterators we'll need a bit of copy-pasting, but I still think that's worth it. And it will get easier once generics are in place.

cc @aschmahmann since we discussed this briefly on Slack

@gammazero
Copy link
Contributor

@Stebalien Regarding the comment:

  1. Return both a result and an error channel.
  2. Give the error channel a buffer of 1.
  3. Never close the result channel.
  4. Always close the error channel

I am not sure that 3 is best, since it prevents receiving results in a range loop, and more channels means slower select. I do think that an error channel is preferable, and as long as 1 and 4 are true, then the error channel can be checked after seeing that the results channel is closed.

With no error channel, the results channel must not be closed. Otherwise, when the context is canceled, then the caller cannot be certain if the goroutine finished before or after the cancellation. Downside is that results cannot be ranged over, and both results and ctx.Done() must be examined in select.

With an error channel, the results channel should be closed. That will allow ranging over results and examining the error channel afterwards. Downside of this is that behavior must be well documented so the caller can trust that results channel is closed in the event of an error/cancel.

p4u pushed a commit to vocdoni/vocdoni-node that referenced this issue Dec 9, 2020
The pin objects sent over a channel also embed an error value within
them. If we see an error, we must handle it and stop.

The upstream issue ipfs/kubo#4592 covers
some of the rough edges of this channel-based API.
@Stebalien
Copy link
Member Author

@mvdan

I personally think that using channels for iterator APIs is a clever hack that can work well when prototyping APIs, but it's got a number of sharp edges that are difficult to fix. You could add more documentation with warnings, but that's not the best solution. Adding a second channel to the function signature adds more ways to misuse the API, unfortunately.

This isn't a cleaver hack in this case. The API returns a stream, not just an iterator. Using a channel allows one to select/loop over the channel normally.

There would need to be a very strong motivation for making such an API the default choice in situations like this, given the added complexity, loss of select, etc.

It may make sense for these external APIs given their infrequency, but I'd take a thorough survey of existing projects to get a sense for some consensus/common design pattern.


@gammazero

I am not sure that 3 is best, since it prevents receiving results in a range loop, and more channels means slower select. I do think that an error channel is preferable, and as long as 1 and 4 are true, then the error channel can be checked after seeing that the results channel is closed.

I agree. Take a look at ipfs/interface-go-ipfs-core#62.

Basically, the resulting pattern is:

res, err := MakeRequest(ctx)

var results []stuff
for r := res {
    results = append(results, r)
    // do stuff
}

return results, <-err
  • Not reading off of the buffered err channel is fine.
  • Returning without finishing reading from res is fine (everything will get canceled/cleaned up when the context is canceled).
  • Range works.

etc...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Development

No branches or pull requests

3 participants