Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: io: CopyBuffer should avoid ReadFrom/WriteTo #16474

Open
dsnet opened this Issue Jul 22, 2016 · 19 comments

Comments

Projects
None yet
7 participants
@dsnet
Copy link
Member

dsnet commented Jul 22, 2016

Currently, io.CopyBuffer uses WriteTo if the src supports it and ReadFrom if the dst supports it, presumably to avoid an allocation.

The problem is that there exist certain io.ReaderFrom that are not efficient 100% of the time. For example, consider the following implementation of ReadFrom:

type myWriter struct {
    w io.Writer
}

func (mw *myWriter) ReadFrom(src io.Reader) (int64, error) {
    if r, ok := mw.w.(io.ReaderFrom); ok {
        return r.ReadFrom(src)
    }
    return io.Copy(struct{ io.Writer }{mw}, src)
}

This is unfortunate as myWriter satisfies the io.ReaderFrom interface, but it's implementation is no better than io.Copy (always incurring an allocation) in some circumstances.

In the case of io.CopyBuffer, I would argue that the user's expectation is that the buffer passed in is always used. It has the detriment of an extra copy, but I believe the cost of that is much lower than the allocation that is already saved by the user providing the buffer. It main benefit is that it avoids any shenanigans that occur because of inefficient implementations of io.ReaderFrom and io.WriterTo.

Alternative is to do anonymous struct value hack:

io.CopyBuffer(struct{ io.Writer }{dst}, struct{ io.Reader }{src}, buf)

But, this hack is not intuitive to most people.

Filed on behalf of @jkji

@dsnet dsnet changed the title io: CopyBuffer should not perform io.ReaderFrom and io.WriterTo check io: CopyBuffer should not perform ReaderFrom and WriterTo optimization Jul 22, 2016

@nhooyr

This comment has been minimized.

Copy link
Contributor

nhooyr commented Jul 26, 2016

Why would someone implement io.ReaderFrom or io.WriterTo if they are inefficient? Seems really backwards to me.

I see what you mean with the buf. It doesn't make much sense to pass in a buffer when io.CopyBuffer will just end up calling io.ReaderFrom. That is unintuitive.

I don't think we can change this because of the Go 1 compatibility anyway.

@jkji

This comment has been minimized.

Copy link

jkji commented Jul 26, 2016

ReaderFrom does not have to be horribly inefficient, but it can be inefficient enough for the CopyBuffer caller to care. In my case, this was severely reducing the file transfer rate but it might be totally fine for some other application. Someone probably added it for a different use case, without realizing the side effects on CopyBuffer.
Conceptually, it seems that ReaderFrom should have a buffer argument and ignore it if it is not needed or nil. Efficient buffer-less ReaderFrom implementation sounds like a rare beast to me.

@nhooyr

This comment has been minimized.

Copy link
Contributor

nhooyr commented Jul 26, 2016

Well we can't really change this now. Perhaps in Go 2?

@dsnet

This comment has been minimized.

Copy link
Member Author

dsnet commented Jul 26, 2016

@nhooyr, how do you see this as a Go 1 compatibility issue? Whether io.CopyBuffer chooses to use ReadFrom or WriteTo or not does not alter the correctness of the function. It's primarily a question of performance.

Part of the problem is that io.ReaderFrom and io.WriterTo doesn't document anything about needing to be "efficient" (however that is defined), so there are many implementations of them that are sub-par compared to the case where io.CopyBuffer just used the user provided buffer only.

@nhooyr

This comment has been minimized.

Copy link
Contributor

nhooyr commented Jul 26, 2016

He said that io.ReaderFrom should have a buffer argument that should be ignored if it is not needed or nil. That cannot happen without breaking Go 1 Compatibility. Perhaps it should be better documented but the whole point of io.ReaderFrom is to not need a buffer. To write it directly like say sendfile(2).

Furthermore, a lot of code relies on theio.Copy functions using the io.ReaderFrom or io.WriterTo interfaces to optimize. Its documented behavior that io.CopyBuffer behaves identically to io.Copy except it stages through the provided buffer rather than allocating a temporary one. By changing io.CopyBuffer we break this documented behaviour that go programs could rely on.

I definitely agree though, I find it unintuitive that io.CopyBuffer doesn't always use the passed buffer, but because io.CopyBuffer using io.ReaderFrom or io.WriterTo is documented behavior, it breaks the Go 1 compatibility promise.

@tv42

This comment has been minimized.

Copy link

tv42 commented Jul 26, 2016

CopyBuffer is identical to Copy except that it stages through the provided buffer (if one is required) rather than allocating a temporary one.

@dsnet

This comment has been minimized.

Copy link
Member Author

dsnet commented Jul 26, 2016

Its documented behavior that io.CopyBuffer behaves identically to io.Copy except it stages through the provided buffer rather than allocating a temporary one. By changing io.CopyBuffer we may break this documented behaviour that go programs could rely on.

Fair argument.

@dsnet dsnet added this to the Unplanned milestone Jul 26, 2016

@dsnet dsnet added the Go2 label Jul 26, 2016

@jkji

This comment has been minimized.

Copy link

jkji commented Jul 26, 2016

@nhooyr, thanks for the clarification. Backwards compatibility argument makes sense, that's why I said "conceptually" ;)

@dsnet

This comment has been minimized.

Copy link
Member Author

dsnet commented Aug 2, 2016

Just noting here that strings.Reader.WriteTo is another example of an implementation of io.WriterTo that is not efficient in all circumstances.

It relies on io.WriteString, which only calls WriteString if the underlying writer supports it. Otherwise, it causes a (potentially large) allocation to convert the string to a []byte.

Related to #13848, which is intended to help alleviate the issue.

@tv42

This comment has been minimized.

Copy link

tv42 commented Aug 2, 2016

@dsnet Is this an actual bottleneck in something that can't be fixed without stdlib changes?

@dsnet

This comment has been minimized.

Copy link
Member Author

dsnet commented Aug 2, 2016

My original post contained the hack to get around this issue:

io.CopyBuffer(struct{ io.Writer }{dst}, struct{ io.Reader }{src}, buf)

It's just not an intuitive hack to most people. Secondly, the need for it doesn't become obvious until a pprof indicates that io.CopyBuffer is spending significant amounts of time and allocations.

@tv42

This comment has been minimized.

Copy link

tv42 commented Aug 2, 2016

@dsnet Was it ever an observed bottleneck in a real, otherwise well-written, system?

@dsnet

This comment has been minimized.

Copy link
Member Author

dsnet commented Aug 2, 2016

Yes, #13848 was filed when I observed a large string was unnecessarily being copied.
This issue was filed on behalf of someone who noticed performance issues in an internal system.

@dsnet dsnet added the Performance label Oct 4, 2016

@nhooyr

This comment has been minimized.

Copy link
Contributor

nhooyr commented Oct 11, 2016

Haha, I just got bitten by this. In tlsmuxd I was copying between two TCP connections with io.CopyBuffer. I expected it to use my buffer from my bufferPool but it turns out that *net.TCPConn has the ReadFrom method for sendfile. However, since it was reading from a net.Conn sendfile did not apply and it fell back to io.Copy. Consequently, my buffer was unused and there was a buffer allocation in io.Copy.

Luckily, I read *net.TCPConn's source one day so I realized my mistake but I've retracted my original view that it a compat issue because I think that this is extremely subtle, unintuitive and I cannot imagine anyone who passes in a buffer to io.CopyBuffer but does not expect it to be used. I understand that it was documented to behave just like io.Copy, but why would you pass in a buffer if you aren't sure it will be used? It just doesn't make any sense.

@dsnet

This comment has been minimized.

Copy link
Member Author

dsnet commented Oct 17, 2016

Not performance related, but http://golang.org/cl/31174 is another case where the behavior of io.CopyBuffer was unexpected. The test relied upon copy being done in specific chunk sizes.

@brentp brentp referenced this issue Jan 2, 2017

Merged

Read full #56

@rsc rsc changed the title io: CopyBuffer should not perform ReaderFrom and WriterTo optimization io: CopyBuffer should avoid ReadFrom/WriteTo Jun 16, 2017

@rsc rsc changed the title io: CopyBuffer should avoid ReadFrom/WriteTo proposal: io: CopyBuffer should avoid ReadFrom/WriteTo Jun 17, 2017

@gopherbot gopherbot added the Proposal label Jun 17, 2017

@bcmills

This comment has been minimized.

Copy link
Member

bcmills commented Jan 2, 2018

Here's a different way to look at this issue, which generalizes to magic-method APIs in general:

For a magic-method API to work with wrapper types, the method's implementation must be able to indicate that the method is not actually supported.

io.ReaderFrom violates that principle: there is no standardized way for ReadFrom to return an error that means “fall back to Write instead”.

You could imagine retrofitting one, such as io.ErrNotSupported, but it would likely break callers on a massive scale. More likely, you'd need to rename the method to something like MaybeReadFrom so that callees can determine whether to return ErrNotSupported or press ahead with a less-efficient implementation.

@dsnet

This comment has been minimized.

Copy link
Member Author

dsnet commented Jan 2, 2018

the method's implementation must be able to indicate that the method is not actually supported.

This is one of the driving usages of #21904.

Part of the problem with "not implemented" is that it's obvious for cases like io.Seeker, where you either support it or you don't. It's not so obvious what you're supposed to do, when you do implement it and your implementation happens to be less efficient than if your caller had just used Write instead.

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

ianlancetaylor commented Jan 16, 2018

@bcmills I don't think this works in general. In the case discussed here, it sometimes makes sense for io.CopyBuffer to call the ReadFrom method, and sometimes it doesn't. It makes sense to use ReadFrom if ReadFrom does not require a buffer at all, as for example for (*net.TCPConn).ReadFrom on systems that support sendfile. It does not make sense for io.CopyBuffer to call ReadFrom if ReadFrom uses an internal buffer. But the fact that ReadFrom uses an internal buffer doesn't mean that it is not supported; it may be a completely appropriate use in some cases, just not for io.CopyBuffer with a larger buffer. So using a "not actually supported" mechanism would require specifying "not actually supported for this specific use" which is an N-x-N problem.

More generally, if you know the types involved, you will call the methods directly. If you don't know the types involved, there is no obvious general approach to know whether to use a magic method or not, short of actual measurement.

We wind up in this situation because there is existing code that uses io.Copy, and we can't or don't want to change that code. But we have a type that has a fast copy mechanism, and we want the existing code to use our fast mechanism. So we invent a magic method, put it on the type, and change io.Copy to use that magic method. In effect we are using a magic method as a communication mechanism across code that doesn't know about it.

It seems that the best way to use this technique effectively is for each magic method to mean exactly one thing. In this case we are using it for two things: io.Copy to copy with a generic buffer, and io.CopyBuffer to copy with a specific buffer. They should not both use the same magic method. With this approach a function like io.CopyBuffer would document clearly the set of magic methods that it supports. If multiple functions support the same magic method, we wind up with an N-x-N crossbar: trying to use the magic method for multiple different purposes, when we have to know which purpose is intended to know whether it makes sense.

On the other hand, if we have a magic method that is specific to io.CopyBuffer, then the code winds up having spooky action at a distance: methods are defined solely so that they can be detected at a completely different, seemingly unrelated, part of the code.

@bcmills

This comment has been minimized.

Copy link
Member

bcmills commented Mar 2, 2018

I'm beginning to think that this issue calls into question the value of CopyBuffer entirely. If callers want an unconditional buffer, they can wrap the Reader in a bufio.Reader of appropriate size, which implements io.WriterTo and thus always wins the magic-method contest in io.Copy.

io.CopyBuffer only adds value if it is conditional, but the fact that it is conditional is what makes it so subtle to use.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.
You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session.