Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: change standard library to check for io.EOF using errors.Is #39155

Open
tonistiigi opened this issue May 20, 2020 · 29 comments
Open

proposal: change standard library to check for io.EOF using errors.Is #39155

tonistiigi opened this issue May 20, 2020 · 29 comments
Labels
Projects
Milestone

Comments

@tonistiigi
Copy link
Contributor

@tonistiigi tonistiigi commented May 20, 2020

Go1.13 introduced errors.Is/As/Unwrap for adding typed information to errors. With this change, errors turn into chains of values instead of a plain string. I think this is a significant improvement, but one of the issues stopping us from getting the full benefit from this is that the error checks inside stdlib still use the old == equality check instead of errors.Is.

This means that whenever we wrap errors with extra type information, we need to make sure these errors are not returned by any interface that might be passed to stdlib.

The most common case for this is io.Reader that should return io.EOF when there is no more data. With a quick search, I found that there are over 200 places in stdlib today where EOF is checked with == condition. For a typical example, io.Copy uses a check like this to determine if reader error also becomes a copy error. If we would change all these places to use errors.Is(err, io.EOF) instead, custom implementations of io.Reader could return much more useful errors.

For example, http.RoundTripper implementation could associate the request/response body with the information about the request, so that when an error happens later, it contains information about what specific request produced the data that ended up failing. Currently, most of the time, we would just get an unexpected EOF or broken pipe in this case.

I'm not suggesting any of the current io.EOF errors returned from stdlib (eg. os.File) should be wrapped. That would not be backward compatible. But just changing the internal equality checks and documenting it should be harmless.

Initially, 3rd party libraries will still likely check io.EOF directly, but over time they will get updated as well. Before stdlib makes the change and provides an official recommendation, they don't have any incentive to do that.

@gopherbot gopherbot added this to the Proposal milestone May 20, 2020
@gopherbot gopherbot added the Proposal label May 20, 2020
@ianlancetaylor ianlancetaylor changed the title proposal: use errors.Is for typed errors checks on stdlib interfaces proposal: change standard library to check for io.EOF using errors.Is May 20, 2020
@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented May 20, 2020

I believe that io.EOF is the exception here, so retitling.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented May 20, 2020

The standard library says EOF is the error returned by Read when no more input is available.. I think we agree that that can't change without breaking the Go 1 guarantee. Since io.Copy is using Read, it seems OK that io.Copy checks using == io.EOF rather than errors.Is. The question is whether there are places in the standard library that use == io.EOF even though they are checking errors that are not returned from a Read method.

@tonistiigi
Copy link
Contributor Author

@tonistiigi tonistiigi commented May 20, 2020

I believe that io.EOF is the exception here, so retitling.

The issue I'm hitting is definitely with io.EOF but I haven't done a full scan. io.ErrUnexpectedEOF seems to have similar checks, although usually it is generated by stdlib itself (but doesn't need to be).

The standard library says EOF is the error returned by Read when no more input is available.

The semantic question in here is what does "is the error" mean after go1.13 . Is an error that is wrapped still the same error? From the method names (and design objectives) added in go1.13 it would strongly suggest that it is.

I think we agree that that can't change without breaking the Go 1 guarantee.

All the Read() methods in stdlib that generate io.EOF directly should continue to do that. Old code that used this and does io.EOF check with == will continue to work.

As I described, after this would ship, and a 3rd party library decides to wrap an IO error, they could break another 3rd party library that upgrades to use them. But 3rd party libraries are not under any go1 backward compatibility guarantees. Nobody can be sure they don't decide to not implement io.EOF at all. Semver can be used in these modules to signal changes.

It will definitely take time for all people to always use errors.Is when dealing with errors. My point is that the quicker we take some steps for transition, the sooner the messy period will be over. As this doesn't break anyone, it could be just considered following a best practice like the errors package suggests.

The question is whether there are places in the standard library that use == io.EOF even though they are checking errors that are not returned from a Read method.

My issue definitely is that there is no way to add typed information to errors that arise through io processing (at least without forking a significant portion of stdlib). If there are other cases they are less likely to actually solve the issues.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented May 20, 2020

The semantic question in here is what does "is the error" mean after go1.13 . Is an error that is wrapped still the same error? From the method names (and design objectives) added in go1.13 it would strongly suggest that it is.

Personally I don't think we have that leeway. I think that the contract of the standard and widely-used Read method is that it should return exactly io.EOF on end-of-file.

That said, I think you are suggesting that where the standard library calls Read, it should check for io.EOF using errors.Is rather than ==. I know you've mentioned other errors too, but I think we have to be specific about what is going to change here. Without looking at real code we don't know what other programs might break. For example, you are suggesting that we change io.ReadAtLeast to use errors.Is(err, EOF) rather than err == EOF.

I don't think that would do any real harm but I'm not personally convinced that it is necessary, given my reading of how I think Read is required to behave.

Happy to hear other opinions.

@tiborvass
Copy link

@tiborvass tiborvass commented May 20, 2020

I'm curious to understand how adjusting the docs about io.EOF would break compatibility between two 3rd-party codebases in a way that's not already possible today. Furthermore, if the error checking paradigm from Go 1.13 is fully embraced, shouldn't tooling such as linters or maybe even gofix also prefer errors.Is over equality?

@dsnet
Copy link
Member

@dsnet dsnet commented May 20, 2020

Unless we change all code to use errors.Is on io.EOF instead of performing a direct == comparison (which we cannot feasibly do), I believe having the standard library use errors.Is is more harmful than good. It provides the illusion that wrapping io.EOF in Read is permissible, when it is not. Some users will depend on that behavior and think that its okay to return wrapped io.EOF errors since it works with the standard library, but be surprised that it doesn't work with some other library that uses io.Reader, despite that library being fully compliant with the current specified behavior.

I'm curious to understand how adjusting the docs about io.EOF would break compatibility between two 3rd-party codebases in a way that's not already possible today. Furthermore, if the error checking paradigm from Go 1.13 is fully embraced, shouldn't tooling such as linters or maybe even gofix also prefer errors.Is over equality?

It's not an issue of whether tooling can migrate existing code. It's a question of whether correct code today will continue to function after the documented change. Most code today only do an err == io.EOF check. Changing the documentation makes it such that this correct code is now incorrect by redefinition.

@SamWhited
Copy link
Member

@SamWhited SamWhited commented May 20, 2020

TL;DR — I have a general rule of thumb that I don't return io.EOF and always return a new error if I need more context, or, more frequently, handle the io.EOF at the read call site because EOF isn't actually an error. We should not encourage checking that wrapped errors started out in life as an EOF, we should treat them like any other error.


I tend to think of io.EOF as a special case; a sentinel value indicating expected behavior, not an error. This already leads to the unfortunate "if err != nil && err != io.EOF" scattered throughout my codebase anywhere some code calls Read. It also frequently leads to bugs in my code where I either forget to check for and handle the EOF from a Read call, or I some piece of code returns EOF when I'm not expecting it and I can't always easily track down where such a generic error message is coming from. This all happens because EOF is not an error, it's expected behavior. The error comes later when I go to do something with my data and realize that the expected data wasn't completely read, or was the wrong data, etc.

Because of this, I don't think I've ever seen a place where checking for io.EOF further up the stack (not directly at the call site) would do anything but hide a bug where I wanted to handle io.EOF before, and I don't think this should be encouraged. There are occasions where we want to actually return an io.EOF (eg. if the function in question also has read like semantics), so we need a way to distinguish between an actual new error and an EOF while still keeping context around. EOF doesn't provide any context, and one read site that returned EOF is as good as another, so I generally don't think it should be wrapped with more context if we want to continue to treat it as a magical sentinel value.

This means that if I really want to check for io.EOF further up the stack, I should just return it and not wrap it so that it can be treated as a signal, not an error. If I do want to signal to a user of my code that the EOF was unexpected and therefore an actual error, I should return an error, not EOF (which does not indicate that an error occurred). If I'm a user of a library that has wrapped EOF, this to me indicates that this was an actual error and I should treat the wrapped error just like I would any other error (and it doesn't matter that the root error was an EOF signal).

@as
Copy link
Contributor

@as as commented May 20, 2020

I tend to think of io.EOF as a special case; a sentinel value indicating expected behavior, not an error.

This is only true if the "file" is not infinite. It is the user/developer that has the expectation that there will ever be an end-of-file, not the io.Reader interface. I see it as an expected error rather than a special case, hence why the concrete value is so ubiquitous.

@rsc rsc added this to Incoming in Proposals Jun 10, 2020
sayboras added a commit to sayboras/cilium that referenced this issue Jul 30, 2020
This PR is not only to correct existing error comparison (e.g. e1 ==
e1), but also to enable linting check.

Excetion:
- Skip the error comparision in test file
- Skip io.EOF as per golang/go#39155

Signed-off-by: Tam Mach <sayboras@yahoo.com>
sayboras added a commit to sayboras/cilium that referenced this issue Jul 30, 2020
This PR is not only to correct existing error comparison (e.g. e1 ==
e1), but also to enable linting check.

Excetion:
- Skip the error comparision in test file
- Skip io.EOF as per golang/go#39155

Signed-off-by: Tam Mach <sayboras@yahoo.com>
sayboras added a commit to sayboras/cilium that referenced this issue Jul 30, 2020
This PR is not only to correct existing error comparison (e.g. e1 ==
e1), but also to enable linting check.

Excetion:
- Skip the error comparision in test file
- Skip io.EOF as per golang/go#39155

Signed-off-by: Tam Mach <sayboras@yahoo.com>
nathanjsweet pushed a commit to cilium/cilium that referenced this issue Jul 30, 2020
…12707)

This PR is not only to correct existing error comparison (e.g. e1 ==
e1), but also to enable linting check.

Excetion:
- Skip the error comparision in test file
- Skip io.EOF as per golang/go#39155

Signed-off-by: Tam Mach <sayboras@yahoo.com>
@dpifke
Copy link
Contributor

@dpifke dpifke commented Aug 16, 2020

I was bitten by this, when implementing a Reader which wraps another Reader, and includes additional diagnostic information when passing through any errors. In order to make my Reader usable with archive/zip, I had to special-case io.EOF, and not return any diagnostics in that case.

I think the stdlib should use errors.Is() when checking error values, and I'd be happy to contribute patches to that effect. (Presumably the reviewers' preference would be to split these up by package, e.g. one for archive/zip, one for compress/bzip2, etc? Advice welcome.)

It's a valid concern that third-party packages may also be doing equality checks instead of errors.Is(), but this doesn't seem like a good reason the stdlib shouldn't follow best practice; in my case, I'm not using any third-party packages so it's just the stdlib holding me back. If I'm understanding this argument correctly, it seems a little like a chicken-and-egg situation, in that if wrapped io.EOF worked in the stdlib, package authors would be encouraged to follow suit.

I also think go vet should warn about checking equality instead of using errors.Is, but that is perhaps a separate issue.

@davecheney
Copy link
Contributor

@davecheney davecheney commented Aug 16, 2020

I don’t understand the association. io.EOF is a sentinel value, it’s defined in the various io interfaces that it must be returned on end of file. There is no need to use errors.Is because by definition io.EOF must not be wrapped.

@dpifke
Copy link
Contributor

@dpifke dpifke commented Aug 16, 2020

by definition io.EOF must not be wrapped.

The io.Reader documentation doesn't mention wrapping, and I don't think the error wrapping documentation calls out that some errors should never be wrapped. If this proposal is rejected, I propose updating the documentation to make this clear.

Is there a downside to using errors.Is(err, io.EOF) instead of err == io.EOF?

@randall77
Copy link
Contributor

@randall77 randall77 commented Aug 16, 2020

The downside is API confusion. Once we split the ecosystem, some clients of Read check with err == io.EOF and some check with errors.Is(err, io.EOF). We're never going to get rid of all of the people using err == io.EOF. So a robust implementation of Read can never return a wrapped io.EOF. But we might get rid of most of the err == io.EOF clients, at which point people might be tempted to use a wrapped io.EOF. And in the code they test with, it might work. But it will break some err == io.EOF clients they don't even know about, but who depend on their Read method.

It's better if the stdlib keeps using err == io.EOF, because people who try to use a wrapped io.EOF will likely fail immediately, and they will know not to do that.

Why do you want to wrap io.EOF anyway? I don't see why context would matter for that error value.

@dpifke
Copy link
Contributor

@dpifke dpifke commented Aug 16, 2020

It's not so much that I want to wrap io.EOF, it's that I don't want to special-case io.EOF in any place where it can be returned.

@dpifke
Copy link
Contributor

@dpifke dpifke commented Aug 16, 2020

If the consensus is that certain errors (including io.EOF) should never be wrapped, how can the compiler and/or stdlib and/or go vet programmatically enforce this, to prevent an entire class of subtle bugs? I can imagine a list of "banned" values that could be reported if used with fmt.Errorf("%w"), but this feels very brittle, and wouldn't catch cases where users are returning something which implements the Unwrap() interface.

The alternative, to encourage errors.Is() instead of err == everywhere (including in the stdlib), seems much more robust.

@davecheney
Copy link
Contributor

@davecheney davecheney commented Aug 16, 2020

It's not so much that I want to wrap io.EOF, it's that I don't want to special-case io.EOF in any place where it can be returned.

If you’re implementing an io reader or writer it’s unavoidable. If you’re working at a layer above then that is the place to normalise io.EOF handling, see bufio.Scanner as an example. At that point you or you’re caller doesn’t need to know anything more about the error than it was nil, indicating success, or non nil, a failure.

@dpifke
Copy link
Contributor

@dpifke dpifke commented Aug 16, 2020

It's better if the stdlib keeps using err == io.EOF, because people who try to use a wrapped io.EOF will likely fail immediately, and they will know not to do that.

Would you support a patch which causes fmt.Errorf to panic if the "%w" argument == io.EOF? This would have saved me a few hours of debugging today.

@davecheney
Copy link
Contributor

@davecheney davecheney commented Aug 16, 2020

Could you tell me more about your debugging process. In the case of io.Reader the logic usually goes

var buf []byte = ...
n, err := r.Read(buf)
buf = buf[:n]
// process buf
if err != nil {
    return err
}
return nil // can be shortened too return err

could you explain how a wrapped io.EOF caused you to loose time debugging? maybe the solution is not to outlaw err == io.EOF but elsewhere.

@dpifke
Copy link
Contributor

@dpifke dpifke commented Aug 16, 2020

I was using io.Copy, and found it useful to distinguish whether the error came from the underlying bufio-like Reader (which can wrap any other io.Reader, thus I have no definitive list of possible error values, but is typically a network socket), or from the Writer. So I introduced something like:

type ReadError struct {
    Filename string
    Err error
}

func (err ReadError) Error() string { return fmt.Sprintf("reading %q: %v", err.Filename, err.Err) }
func (err ReadError) Unwrap() error { return err.Err }

func (r *MyReader) Read(b []byte) (int, error) {
    n, err := r.underlying.Read(b)
    if err != nil {
        err = ReadError{Filename: r.Filename, Err: err}
    }
    return n, err
}

This produced corrupt output from archive/zip, without any indication as to why. It was not at all obvious that archive/zip wasn't recognizing io.EOF; I thought the problem was with the zipfile data coming from the network getting corrupted somehow.

The function now reads:

func (r *MyReader) Read(b []byte) (int, error) {
    n, err := r.underlying.Read(b)
    if err != nil && !errors.Is(err, io.EOF) {
        // Don't wrap io.EOF, because there's still a lot of
        // code out there (including in the standard library)
        // which checks err == io.EOF.
        err = ReadError{Err: err}
    }
    return n, err
}

I'm not sure this is correct, because other errors (io.UnexpectedEOF? others?) may also be unwrappable - that some errors are unwrappable is not documented anywhere, AFAIK. The comment is needed so that someone doesn't "optimize" out the io.EOF check down the road.

If fmt.Errorf is known to panic if passed an unwrappable error, I would replace ReadError in the above with it. (Edited to add: actually, that wouldn't work, because the point of ReadError is to determine whether I should attempt to download from a different source.)

@dpifke
Copy link
Contributor

@dpifke dpifke commented Aug 16, 2020

Heh, and in "fixing" it, I just realized I introduced a bug where err.Filename is no longer getting set. :)

@davecheney
Copy link
Contributor

@davecheney davecheney commented Aug 16, 2020

Note that

if err != nil && !errors.Is(err, io.EOF)

Can be written as

if err != nil && err != io.EOF

Which I realise is kind of the point you’re making, but it’s also my point — when you operate at this level — interposition your reader on top of another — you have to maintain the io.EOF invariant

@dpifke
Copy link
Contributor

@dpifke dpifke commented Aug 16, 2020

Which I realise is kind of the point you’re making, but it’s also my point — when you operate at this level — interposition your reader on top of another — you have to maintain the io.EOF invariant

Wouldn't it be easier if we didn't have to maintain that invariant? Saying "always use errors.Is, don't use err ==" is easily understood, and could be automated using tools like go vet. It's about a 200-300 line patch to fix it everywhere io.EOF is checked in the standard library.

Maintaining a list of "unwrappable" errors in documentation and/or tooling seems really hard.

I'm offering to do the work to make the standard library easier to use, by replacing err == io.EOF with errors.Is(err, io.EOF). If that work is unwelcome, I'm willing to spend an equivalent amount of effort elsewhere to prevent the next person from falling into this trap. Other suggestions as to better places to direct that effort are welcome.

@davecheney
Copy link
Contributor

@davecheney davecheney commented Aug 16, 2020

Wouldn't it be easier if we didn't have to maintain that invariant?

Possibly, but not backward compatible with go 1

@dpifke
Copy link
Contributor

@dpifke dpifke commented Aug 16, 2020

Possibly, but not backward compatible with go 1

I'm not sure I follow, especially after reading Ian's comment on May 19 which addressed this exact point.

I don't believe anyone is proposing changing the return value of Read. errors.Is(err, io.EOF) is true for all cases where err == io.EOF, so I can't think of existing code that would break if a wrapped io.EOF were accepted. Can you provide an example?

@davecheney
Copy link
Contributor

@davecheney davecheney commented Aug 16, 2020

Io.Reader implementations must return io.EOF on eof conditions (as you’ve found), so there is no reason to use errors.Is(err, io.EOF) when a simple comparison will work just fine. The go 1 constraft prohibits changing the former, so there’s no value to be realised from the latter.

@dpifke
Copy link
Contributor

@dpifke dpifke commented Aug 16, 2020

If that's the case, can we at least agree that fmt.Errorf should panic if "%w" == io.EOF? And that the documentation (both for Reader and regarding error wrapping) should be updated to make this explicit? Ideally, there'd be a way to check for other errors which can never be wrapped, or at least an authoritative list somewhere.

@davecheney
Copy link
Contributor

@davecheney davecheney commented Aug 16, 2020

I’d encourage you raise that as a separate proposal.

@dpifke
Copy link
Contributor

@dpifke dpifke commented Aug 16, 2020

I’d encourage you raise that as a separate proposal.

Done, thanks!

#40827

@andig
Copy link
Contributor

@andig andig commented Sep 5, 2020

If that's the case, can we at least agree that fmt.Errorf should panic if "%w" == io.EOF?

That's a horribly bad idea. I might use fmt.Errorf for any reason, not necessarily for passing errors down to further processing and I certainly don't want that code (which might also wrap any other error) to unexpectedly panic.

@Merovius
Copy link

@Merovius Merovius commented Sep 30, 2020

It might be possible to write a vet-check (or any of the gajillion third-party static analysis tools) that tries to find code that returns a wrapped io.EOF from a Read([]byte) (int, error) method. I agree with others that wrapping io.EOF is not incorrect in general, so shouldn't panic. And I also agree with others that replacing err == io.EOF with errors.Is(err, io.EOF) is counterproductive. But I don't see the harm in a vet-check, if it's selective enough to only look at Read-methods.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Proposals
Incoming
Linked pull requests

Successfully merging a pull request may close this issue.

None yet