Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bytes: return EOF early in Reader.Read #21852

Closed
destel opened this issue Sep 12, 2017 · 18 comments
Assignees
Milestone

Comments

@destel
Copy link

@destel destel commented Sep 12, 2017

I have a bytes.Reader reader and I want copy it contents to bytes.Buffer using ReadFrom method. I create buffer with preallocated space and expect that additional allocations will not happen.

Instead buffer allocates twice more memory than required to store data. This happens because bytes.Reader doesn't return EOF immediately when it reaches end of stream. Instead it returns EOF only on the next call to Read causing bytes.Buffer to double it's underlying storage.

Link to example code https://play.golang.org/p/j7S3hNkEB5

Code consists of 3 blocks.
First one demonstrates bytes.Reader behaviour.
Second one demonstrates bytes.Buffer.ReadFrom behaviour with preallocated space
Third one avoids reallocation by preallocating slightly bigger buffer

My go version is 1.9

@mdempsky mdempsky changed the title Double memory allocation when copying data from bytes.Reader to bytes.Buffer bytes: double memory allocation when copying data from Reader to Buffer Sep 12, 2017
@dsnet dsnet added the Performance label Sep 12, 2017
@dsnet dsnet changed the title bytes: double memory allocation when copying data from Reader to Buffer bytes: return EOF early in Reader.Read Sep 12, 2017
@dsnet

This comment has been minimized.

Copy link
Member

@dsnet dsnet commented Sep 12, 2017

We can change bytes.Reader to return io.EOF the moment it hits the end. This is entirely within the contract of io.Reader.

@ianlancetaylor ianlancetaylor added this to the Go1.10 milestone Sep 12, 2017
@destel

This comment has been minimized.

Copy link
Author

@destel destel commented Sep 12, 2017

And by the way strings.Reader behaves the same

@destel

This comment has been minimized.

Copy link
Author

@destel destel commented Sep 12, 2017

And the last thing. buffer.ReadFrom always does reallocation when there is less than 512 bytes of free space. So currently it's absolutely useless to preallocate small amounts of memory manually.

buffer.ReadFrom function has loop inside, What do you guys think about skipping reallocation on the first iteration of the loop?

Just to give some context, I'll describe my use case.
I have function which does some stream processing (for example encryption) and I know how to calculate output size by input size. I want to create additional function which does the same thing but uses byte slices instead of streams. And I'd like to preallocate needed memory manually and avoid reallocations.
Code below:

func EncryptStream(data io.Reader) io.Reader {
	//do some useful stuff here and return reader of the known (same) size
	return ...
}

func Encrypt(data []byte) []byte {
	res := bytes.NewBuffer(make([]byte, 0, len(data)))

	res.ReadFrom(EncryptStream(bytes.NewReader(data)))

	return res.Bytes()
}
@dsnet

This comment has been minimized.

Copy link
Member

@dsnet dsnet commented Sep 12, 2017

If we fix Reader.Read to return io.EOF eagerly, Buffer.ReadFrom should be allocation free for most cases since b.grow(MinRead) only ensures that there is at least MinRead bytes available (which will be true most of the time). This is only problematic when io.EOF is not returned eagerly, and it must go through the loop a second time.

This will of-course always allocate if Reader.Len < MinRead, but I believe that situation is rare.

@cznic

This comment has been minimized.

Copy link
Contributor

@cznic cznic commented Sep 13, 2017

We can change bytes.Reader to return io.EOF the moment it hits the end. This is entirely within the contract of io.Reader.

Even though this is true, I'd think twice before fixing what's not broken. I guess there are/may be programs in the wild depending incorrectly on the current behavior.

@bradfitz

This comment has been minimized.

Copy link
Member

@bradfitz bradfitz commented Sep 13, 2017

I tried to change bytes and strings Reader once to do early EOF and a gazillion tests broke. I didn't deem it worth it. Hopefully the io.Reader signature would be different in Go 2

@bradfitz

This comment has been minimized.

Copy link
Member

@bradfitz bradfitz commented Sep 13, 2017

(that was years ago. Maybe it's gotten better but I doubt it. I suspect more failures)

@dsnet

This comment has been minimized.

Copy link
Member

@dsnet dsnet commented Sep 13, 2017

Sigh... incorrect usages of io.Reader is fairly annoying. I'll do a regression test and see what breaks if we change this.

@dsnet dsnet self-assigned this Sep 13, 2017
@bradfitz

This comment has been minimized.

Copy link
Member

@bradfitz bradfitz commented Sep 17, 2017

No, it wasn't incorrect usages. It was just tests expecting a certain golden behavior output from things.

@dsnet

This comment has been minimized.

Copy link
Member

@dsnet dsnet commented Sep 19, 2017

In my test, there were about 40 failures. By far, the most common failure is code doing the equivalent of:

b := make([]byte, n)
if _, err := r.Read(b); err != nil {
    return err // Assume err != nil implies something went drastically wrong.
}
// Use b assuming it read n bytes...

This is a straight up wrong use of io.Reader.

There were a small handful that I guess you could call "golden behavior" changes, and they have to deal with the use of iotest.DataErrReader(bytes.NewReader(...)).

@peterGo

This comment has been minimized.

Copy link
Contributor

@peterGo peterGo commented Oct 1, 2017

@bradfitz

I don't understand why this issue is still open. It's a golang-nuts question.

The issue is an XY problem: "I have a bytes.Reader reader and I want to copy its contents to bytes.Buffer using the ReadFrom method." "The XY problem is asking about your attempted solution rather than your actual problem (http://xyproblem.info)."

The issue is a performance question, but there are no Go benchmarks, only example code (https://play.golang.org/p/j7S3hNkEB5).

The solution to the actual problem, I have a bytes.Reader reader and I want to efficiently copy its contents to bytes.Buffer, is to use io.Copy (an implicit io.WriteTo) or io.WriteTo. Don't use io.ReadFrom.

For example, see BenchmarkCopy and BenchmarkWriteTo,

BenchmarkReadRaw-8            	 2000000     791 ns/op	   10240 B/op    1 allocs/op
BenchmarkReadFromExact-8      	  500000    2493 ns/op	   32160 B/op    4 allocs/op
BenchmarkReadFromExactMin-8   	 1000000    1001 ns/op	   11040 B/op    3 allocs/op
BenchmarkCopy-8               	 2000000     923 ns/op	   10400 B/op    3 allocs/op
BenchmarkWriteTo-8            	 2000000     858 ns/op	   10352 B/op    2 allocs/op

rf_test.go: https://play.golang.org/p/ZB-CtPqJHJ

@peterGo

This comment has been minimized.

Copy link
Contributor

@peterGo peterGo commented Oct 1, 2017

@dsnet

fix Reader.Read to return io.EOF eagerly

I disagree.

@dsnet

This comment has been minimized.

Copy link
Member

@dsnet dsnet commented Oct 1, 2017

@peterGo, I don't understand what your objection actually is.

The use of io.WriterTo is one possible solution, and I think it's a good one, but it doesn't invalidate other approaches that may be taken. There is a more general problem that this is demonstrating, and that is returning io.EOF the moment the end of a stream has been hit is a very useful signal for performance purposes (examples: [1], [2], [3]). Given that bytes.Reader is a very popular implementation of io.Reader, there are many applications that can benefit from this given that the
change is entirely within the contracts of io.Reader and how simple the change is.

That being said, I'm leaning against making any change to bytes.Reader, only because making the change (even though it should be legal) is going to result in breaking a bunch of code that makes poor assumptions about io.Reader. I haven't gotten around to doing an analysis of the cost of fixing these and classifying the extent of the "damage".

@dsnet

This comment has been minimized.

Copy link
Member

@dsnet dsnet commented Nov 8, 2017

Having analyzed the failures, most of them are obviously wrong similar to my example above:

b := make([]byte, n)
if _, err := r.Read(b); err != nil {
    return err // Assumes err != nil always means failure, and forgets to handle io.EOF specially
}

If making this change breaks code that does something similar to the above, then that's a good thing. It's exposing a bug that was always there.

Given that it is after the freeze and this is not a bug fix, I'm pushing this to the next milestone.

@dsnet dsnet modified the milestones: Go1.10, Go1.11 Nov 8, 2017
@bradfitz

This comment has been minimized.

Copy link
Member

@bradfitz bradfitz commented Nov 8, 2017

SGTM

@bradfitz bradfitz modified the milestones: Go1.11, Unplanned May 29, 2018
@pongad

This comment has been minimized.

Copy link
Contributor

@pongad pongad commented Sep 19, 2018

It looks like no one is working on this, so I'll give it a go. I have a draft CL; will mail after polishing it a little more.

I've been thinking perhaps we should randomly pick whether to return EOF with the last Read. Somehow that idea sounds simultaneously terrible and terribly tempting.

@gopherbot

This comment has been minimized.

Copy link

@gopherbot gopherbot commented Oct 1, 2018

Change https://golang.org/cl/138588 mentions this issue: bytes: make Reader.Read return EOF early

@rsc

This comment has been minimized.

Copy link
Contributor

@rsc rsc commented Nov 13, 2018

Sorry but I don't think we can reasonably change this. The bytes and strings Reader.Read behavior is what it is, and that is not returning EOF until there's nothing else to return (just like bufio.Reader.Read). Like Brad said back in 2017.

The original report was not about EOF. It was about trying to an efficient copy from a bytes.Reader into a bytes.Buffer. Don't use ReadFrom for that. Use WriteTo, which can do a better job. Even better, use io.Copy, which is clearer and knows tricks like "WriteTo is better than ReadFrom".

@rsc rsc closed this Nov 13, 2018
@golang golang locked and limited conversation to collaborators Nov 13, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
9 participants
You can’t perform that action at this time.