Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bytes: bytes.Reader returns EOF on zero-byte Read, which doesn't conform with io.Reader interface documentation #40385

Open
metala opened this issue Jul 24, 2020 · 31 comments

Comments

@metala
Copy link

@metala metala commented Jul 24, 2020

What version of Go are you using (go version)?

$ go version

go version go1.14.4 linux/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env

GOARCH="amd64"
GOHOSTARCH="amd64"
GOHOSTOS="linux"

What did you do?

I was deserialising binary data and I hit an unexpected io.EOF when reading zero bytes. Here is a minimal example that illustrates the problem.

package main

import (
	"fmt"
	"bytes"
	"encoding/binary"
)

func main() {
    r := bytes.NewReader([]byte{0,0,0,0})
    
    var length uint32                                                               
    err := binary.Read(r, binary.BigEndian, &length)                   
    if err != nil{
        panic(err)
    }   
    fmt.Printf("length = %d\n", length)
    
    rest := make([]byte, length)
    _, err = r.Read(rest)
    fmt.Printf("error = %s\n", err)
}

What did you expect to see?

length = 0
error = %!s(<nil>)

What did you see instead?

length = 0
error = EOF

On conformity with io.Reader and other standards

An excerpt from the documentation of io.Reader with the important parts emboldened:

type Reader interface {
   Read(p []byte) (n int, err error)
}

Implementations of Read are discouraged from returning a zero byte count
with a nil error, except when len(p) == 0. Callers should treat a return of
0 and nil as indicating that nothing happened; in particular it does not
indicate EOF.

An excerpt from the man page of read(3):

This volume of IEEE Std 1003.1-2001 requires that no action be taken for read() or write() when nbyte is zero. This is not intended to take precedence over detection of errors (such as invalid buffer pointers or file descriptors). This is consistent with the rest of this volume of IEEE Std 1003.1-2001, but the phrasing here could be misread to require detection of the zero case before any other errors. A value of zero is to be considered a correct value, for which the semantics are a no-op.

Related issues

It looks like this issue is in stark contrast to Go2 proposal issue #27531.
There is also an interesting discussion (#5310) about returning (0, nil) from a Read(p []byte).

@metala
Copy link
Author

@metala metala commented Jul 24, 2020

@gopherbot please remove Documentation

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jul 24, 2020

It seems to me that the behavior of bytes.Reader is permitted by the documentation of io.Reader. The data is at EOF, after all. I don't see anything in the documentation of io.Reader that prohibits returning 0, io.EOF in such a case.

You are asking for a special case in bytes.Reader.Read: if the caller asks for zero bytes, always return 0, nil. And if we want to make this the required behavior for io.Reader, you are asking for a special check for a read of zero bytes in many other readers as well. I think we would need a pretty convincing argument to require existing readers to change in that way.

@metala
Copy link
Author

@metala metala commented Jul 24, 2020

I am hoping for a discussion on whether this behaviour is expected or natural. Getting an EOF error on zero-byte read looks like a replacement for a missing EOF() method.

However, I agree that changing multiple Read methods is not a favourable option, because it could break a lot of stuff.
If you think that there couldn't be any discussion on the topic, we can close this issue and leave it for future reference. It is a case when the result (0, nil) makes sense.

Edit.
The reason I even posted the issue, is because it felt wrong that I need to change only the last if-statement, by removing the err != nil || check and leaving only the len(field) != fieldLen, at the last field in the deserialisation function.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jul 24, 2020

I think there can be discussion on the topic. My comment above was my attempt at discussion, by pointing out the issues that I found relevant. Sorry if I seem to be preventing discussion.

@metala
Copy link
Author

@metala metala commented Jul 24, 2020

Is there a reason why there is no EOF() bool method in Reader structs in the standard library?
I think it is why zero-length Read would return an io.EOF, because it's the only way to check if you are at the end, except for pos, _ := Seek(0, current); pos == Len() or Size(), which is probably not available for all Reader structs.

Add.
I will reluctantly agree that the documentation does not prohibit (0, io.EOF) in this case, but it was my take on "except when len(p) == 0", which made me think that it's just natural to have a (0, nil) on a zero-byte Read().

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jul 24, 2020

On something like a Unix pipe, an EOF method can only be implemented by calling read, in which case we need to have somewhere to store the data and return it on a subsequent Read. Simpler to not require the Reader to handle that, and push the issue onto the caller.

@metala
Copy link
Author

@metala metala commented Jul 24, 2020

As far as I remember, read(fd, buf, len) = 0, when there is an EOF. The errno is not set and there is not even a EEOF errno.
But I see your point that performance-wise it's better to handle syscall-dependant streams this way.

@metala
Copy link
Author

@metala metala commented Jul 24, 2020

Read()-ing seems to be inconsistent in the standard library:

$ cat main.go 
package main

import (
	"os"
	"fmt"
)

func main() {
	buf := make([]byte, 0)
	n, err := os.Stdin.Read(buf)
	fmt.Printf("n=%d, err=%s\n", n, err)
}
$ go run main.go  < /dev/null
n=0, err=%!s(<nil>)

In the bytes.Reader the last position is interpreted as io.EOF, but os.Stdin.Read that makes the syscall read(0, buf, 0) = 0, is interpreted as nil error.

@davecheney
Copy link
Contributor

@davecheney davecheney commented Jul 24, 2020

The io.Reader interface is like no other in the Go ecosystem. Read is the only method where the caller must examine the other values returned from a function/method call before examining the error value.

@metala
Copy link
Author

@metala metala commented Jul 25, 2020

It is sad that I had to figure it out in runtime. When writing the unit tests, I was not expecting the last field to have zero length, but that's on me.

@as
Copy link
Contributor

@as as commented Jul 25, 2020

It seems incorrect to make the EOF condition directly a function of the input's length. Your example reads from a data source that contains no data. It reads 0 bytes from that data source and triggers the end of file condition because it knows that the next read will also return io.EOF.

To me it does not make sense for the first read to be successful on a data source that contains nothing. The same Read would not return io.UnexpectedEOF if your slice was 100 bytes. It would still return io.EOF. The length of the actual input slice provided to the reader has no effect on the returned error.

@metala
Copy link
Author

@metala metala commented Jul 25, 2020

It seems incorrect to make the EOF condition directly a function of the input's length. Your example reads from a data source that contains no data. It reads 0 bytes from that data source and triggers the end of file condition because it knows that the next read will also return io.EOF.

To me it does not make sense for the first read to be successful on a data source that contains nothing. The same Read would not return io.UnexpectedEOF if your slice was 100 bytes. It would still return io.EOF. The length of the actual input slice provided to the reader has no effect on the returned error.

I appreciate the way the discussion goes. Let's say we want the Reader to report io.EOF as soon as it is aware of it and let the caller handle it. But then we also care about consistency.
In Go a byte slice source in a bytes.Reader would trigger io.EOF on the last position. But reading from an empty Linux file or stdin will return (0, nil).

And then there is this:

$ cat main3.go 
package main
import "fmt"
func main() {
	data := []byte{0}
	s := data[1:1]
	fmt.Printf("%#v\n", s)
}

$ go run main3.go 
[]byte{}

A zero-length slice that is out-of-bound by a byte is just an empty slice, which I find to be similar to reading zero bytes from the end of a bytes buffer. It's kind of the no-op I expect, when making zero-length reads...

@cagedmantis cagedmantis added this to the Backlog milestone Jul 27, 2020
@metala
Copy link
Author

@metala metala commented Jul 30, 2020

I have decided to do some tests:

Case / Setup Operation(s) Linux 5.4.0 amd64 macOS Darwin 19.5.0 x86_64 Windows 10 Version 1909 (64-bit)
r := os.Stdin
n, err := r.Read(make([]byte, 0))
fmt.Printf("n=%d, err=%s\n", n, err)
n=0, err=%!s(<nil>)
// Empty file
f, _ := os.Open("./empty"
n, err := r.Read(make([]byte, 0))
fmt.Printf("n=%d, err=%s\n", n, err)
n=0, err=%!s(<nil>)
// Empty file, trigger EOF first
f, _ := os.Open("./empty")
n, err := f.Read(make([]byte, 1))
fmt.Printf("n=%d, err=%s\n", n, err)
n, err = f.Read(make([]byte, 0))
fmt.Printf("n=%d, err=%s\n", n, err)
n=0, err=EOF
n=0, err=%!s(<nil>)
// bytes.Buffer
r := bytes.NewBuffer([]byte{})
n, err := r.Read(make([]byte, 0))
fmt.Printf("n=%d, err=%s\n", n, err)
n=0, err=%!s(<nil>)
// bytes.Buffer, trigger EOF first
r := bytes.NewBuffer([]byte{})
n, err := r.Read(make([]byte, 1))
fmt.Printf("n=%d, err=%s\n", n, err)
n, err = r.Read(make([]byte, 0))
fmt.Printf("n=%d, err=%s\n", n, err)
n=0, err=EOF
n=0, err=%!s(<nil>)
// bytes.Reader
r := bytes.NewReader([]byte{})
n, err := r.Read(make([]byte, 0))
fmt.Printf("n=%d, err=%s\n", n, err)
n=0, err=EOF
// byte slice
data := []byte{0x00}
slice := data[1:1]
fmt.Printf("%#v\n", slice)
[]byte{}

Contrary to my belief, there were no discrepancies between OSes.
However, I found out that bytes.Reader behaviour is different compared to all other cases.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Aug 6, 2020

There are many different kinds of readers. The io.Reader type defines a contract for all different kinds of readers.

Currently that contract permits returning io.EOF on a read of zero bytes. This is not discussed explicitly, but nothing prohibits a reader from doing that.

The options I see here are:

  1. Change io.Reader to prohibit returning io.EOF for a read of zero bytes. Require it to always return 0, nil in such a case. This would break an unknown number of existing readers, including bytes.Reader. We would have to identify and fix all broken readers. In the standard library this would be straightforward, but of course any type defined by any package can implement a Read method.
  2. Explicitly document that on a read of zero bytes a Read method is permitted, but not required, to return 0, io.EOF if the input stream is at the end of the file.
  3. Do nothing.

Does anybody see any other options? Thanks.

@davecheney
Copy link
Contributor

@davecheney davecheney commented Aug 6, 2020

I vote for 2. /cc @minux who spent a lot of time arguing for this a few years back.

@metala
Copy link
Author

@metala metala commented Aug 18, 2020

It would be nice to have a warning in the bytes.Reader documentation that it returns io.EOF on zero-byte Read(), unlike bytes.Buffer. This way people can pick which structure to use as a io.Reader depending on their needs.

@davecheney
Copy link
Contributor

@davecheney davecheney commented Aug 18, 2020

Why would this need a warning? It seems like the correct behaviour.

@metala
Copy link
Author

@metala metala commented Aug 18, 2020

For me at least, bytes.Buffer behaves correctly and bytes.Reader does not. I have switched from bytes.Reader to bytes.Buffer to avoid running into bugs. When I wrote the code, I was expecting it behaved like bytes.Buffer.

@hrissan
Copy link

@hrissan hrissan commented Oct 1, 2020

I have a code which reads string in format #bytes, [bytes]

So, I read #bytes, create byte array of that size, then read into it.

Now, when string of length 0 appears in the middle of reader, it will read successfully, but when the string of length 0 is at the end of reader, it will not.

IMHO this is an indicator of a problem in design, reading of 0 bytes is not special and should always succeed independent of read position.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Oct 1, 2020

@hrissan As far as I can see the choices are as listed at #40385 (comment). What do you recommend?

@hrissan
Copy link

@hrissan hrissan commented Oct 2, 2020

@ianlancetaylor Format some_encoding(#bytes), [bytes], is common, and any well-tested parser (and all fuzz-tested parsers) definitely already have "if #bytes != 0" protection around reading [bytes] part, so they already behave as if approach 2) was implemented.

Any code which has no "if #bytes != 0" protection around reading of potentially zero bytes will behave differently at the middle and at the end of the bytes.Reader and all other readers which have this bug. It seems to me, most if not all code without such protection is already incorrect due to this.

So approach 1) will break already incorrect code, which might be actually good for that code.

BTW all similar code which uses readers which return error on every attempt to read 0 bytes, independent of stream position, already has "if #bytes != 0" protection and will not break from approach 1).

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Oct 2, 2020

Right now, today, the behavior of bytes.Reader is permitted according to the docs. If we change the docs as suggested in option 1 of #40385 (comment), then bytes.Reader will be broken. This is not a matter of code that checks #bytes != 0. This is a matter of bytes.Reader itself. So I don't agree that approach 1 will break already incorrect code. It will break already correct code. If that code is not changed to conform to the new requirements, then it will break future code that assumes that the documented requirements are implemented by existing readers.

@metala
Copy link
Author

@metala metala commented Oct 5, 2020

I wasn't expecting another user to have exactly the same issue like mine, so soon. I will invoke the timeless mantra WE DO NOT BREAK USERSPACE! and say that adding a warning to bytes.Reader should be enough. This way we can defer any discussions on whether io.Reader it should be able to return (0, nil) or not.

@davecheney
Copy link
Contributor

@davecheney davecheney commented Oct 5, 2020

@metala Ian explained that bytes.Reader is not broken as described by the io.Reader contract. What warning do you think should be added?

@metala
Copy link
Author

@metala metala commented Oct 5, 2020

@davecheney Yes, indeed. I am not arguing that bytes.Reader is broken according to io.Reader contract, but it feels inconsistent.

About the warning / notice, I am thinking of something like:
The bytes.Reader returns io.EOF as soon as it reaches the end of the byte slice, after which a read of zero bytes will yield an error. This may pose an issue in cases where you are deserialising data and the last field is length-prefixed with length equals zero.

@davecheney
Copy link
Contributor

@davecheney davecheney commented Oct 5, 2020

@metala I don't understand how bytes.Reader returning n = 0 could be mistaken for a n = 1 where the buffer contains []byte{ 0 }. Could you perhaps provide a code sample that illustrates the problem?

@metala
Copy link
Author

@metala metala commented Oct 6, 2020

@davecheney I am not sure what you want, but lets take those two cases:

// Deserialisation using bytes.Reader
https://play.golang.org/p/3qRu_LlDa6h

// Deserialisation using bytes.Buffer... the same code, single line changed.
https://play.golang.org/p/0Lr0_9rFi1O

The deserialisation of fields follow the structure, read field length, if necessary, then read content. If there is an error or the length is different, return an error.
The first example fails when the last fields is length-prefixed with zero length and the second example succeeds just because we are using bytes.Buffer, instead of bytes.Reader.

@davecheney
Copy link
Contributor

@davecheney davecheney commented Oct 6, 2020

There is an error in your code

	n, err := r.Read(header)
	if err != nil || n != int(headerLen)  {
		return fmt.Errorf("failed to read header: %w", err)
	}

The io.Reader contract states the caller must process n before inspecting the error value.

https://godoc.org/io#ReadFull might be a better choice for your application.

@metala
Copy link
Author

@metala metala commented Oct 6, 2020

You are probably referring to this paragraph, which is either ambiguous or it doesn't cover the case n = 0.

Callers should always process the n > 0 bytes returned before considering the error err. Doing so correctly handles I/O errors that happen after reading some bytes and also both of the allowed EOF behaviors.

Since the input is a byte slice, and not a stream, it doesn't make much sense to waste CPU cycles on io.ReadFull(), if bytes.Reader.Read() would do the same job.
However, it seems that io.ReadFull actually fixes the issue.

@davecheney
Copy link
Contributor

@davecheney davecheney commented Oct 6, 2020

Since the input is a byte slice, and not a stream, it doesn't make much sense to waste CPU cycles on io.ReadFull(), if bytes.Reader.Read() would do the same job.
However, it seems that io.ReadFull actually fixes the issue.

This argument is specious; if you're worried about the overhead of io.ReadFull when you have a []byte slice then you're probably also worried about the overhead of an interface call over just scrobbling in the []byte slice directly.

@metala
Copy link
Author

@metala metala commented Oct 6, 2020

You are right, but I did not think about that when I wrote the code. However, it's a bit cleaner to use a reader instead of incrementing and passing indices.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
7 participants