Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

io: add an Err field to LimitedReader #51115

Open
DeedleFake opened this issue Feb 9, 2022 · 42 comments · May be fixed by #52866
Open

io: add an Err field to LimitedReader #51115

DeedleFake opened this issue Feb 9, 2022 · 42 comments · May be fixed by #52866
Labels
NeedsDecision Proposal Proposal-Accepted Proposal-FinalCommentPeriod release-blocker
Projects
Milestone

Comments

@DeedleFake
Copy link

@DeedleFake DeedleFake commented Feb 9, 2022

New Proposal

As per @ianlancetaylor's suggestion, this proposal is now to add an Err field to io.LimitedReader. If not nil, this field's value will be returned when trying to read past the limit imposed by the reader instead of the default io.EOF.

Original Proposal

For various reasons, I had a need to use MaxBytesReader() in a situation where a ResponseWriter wasn't easily available, so I looked through the code to see if it was safe to pass nil and found that all it does is check a type assertion and then call an unexported method on it:

type requestTooLarger interface {
	requestTooLarge()
}
if res, ok := l.w.(requestTooLarger); ok {
	res.requestTooLarge()
}

While it's true that because of this it's safe to pass it a nil ResponseWriter, it feels quite odd to rely on undocumentated behavior of this kind.

Proposal

Several ways to clean this situation up come to mind:

  1. Add an alternative in the http package that doesn't need a ResponseWriter. Because of how it works, it could actually just return the same implementation, but the function will be more obvious in its usage.
  2. Document how the implementation uses the ResponseWriter and explicitly state that a nil ResponseWriter is valid.
  3. Add a reimplementation of MaxBytesReader() in io that doesn't use a ResponseWriter at all and isn't tied to http behavior, but includes the other differences. This could actually be done, for the most part, by just adding an Err error field to io.LimitedReader that, if non-nil, is the error returned when the limit is reached instead of io.EOF.

In whichever case, the documentation for MaxBytesReader() should probably be filled in a bit. It is quite odd that something called Reader requires a type that is primarily an io.Writer implementation for non-obvious reasons and doesn't mention it anywhere in the documentation.

@gopherbot gopherbot added this to the Proposal milestone Feb 9, 2022
@jimmyfrasche
Copy link
Member

@jimmyfrasche jimmyfrasche commented Feb 9, 2022

I have forked io.LimitedReader to return a custom error at least three times.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Feb 9, 2022

If the proposed function doesn't use http.ResponseWriter, then I can't see any reason that it should live in the net/http package. That seems to reduce this to

type MyLimitedReader struct {
    r   io.LimitedReader
    err error
}

func (mlr *MyLimitedReader) Read(p []byte) (int, error) {
    n, err := mlr.Read(p)
    if err == io.EOF {
        err = mlr.err
    }
    return n, er
}

Is that correct?

If so, perhaps this proposal should be rewritten to add an Err field io.LimitedReader.

@ianlancetaylor ianlancetaylor added this to Incoming in Proposals Feb 9, 2022
@DeedleFake DeedleFake changed the title proposal: net/http: add an alternative to MaxBytesReader that doesn't require a ResponseWriter proposal: io: add an Err field to LimitedReader Feb 10, 2022
@carlmjohnson
Copy link
Contributor

@carlmjohnson carlmjohnson commented Feb 10, 2022

I like the new proposal. Returning io.EOF makes io.LimitedReader pretty unhelpful. But I think it would also be good to document that http.MaxBytesReader accepts nil ResponseWriter.

@bcmills
Copy link
Member

@bcmills bcmills commented Feb 11, 2022

I have forked io.LimitedReader to return a custom error at least three times.

FWIW, my moreio.LimitedWriter has an Err field too:

@carlmjohnson
Copy link
Contributor

@carlmjohnson carlmjohnson commented Mar 26, 2022

Can someone tag this proposal as under review?

@Xeoncross
Copy link

@Xeoncross Xeoncross commented Mar 26, 2022

As an alternative to defining your own error, I just created a package and export the error so the caller can check against it.

// Throws an error if the reader is bigger than limit.
var ErrSizeExceeded = errors.New("stream size exceeded")

type MaxBytesReader struct {
	io.ReadCloser       // reader object
	N             int64 // max bytes remaining.
}

func NewMaxBytesReader(r io.ReadCloser, limit int64) *MaxBytesReader {
	return &MaxBytesReader{r, limit}
}

func (b *MaxBytesReader) Read(p []byte) (n int, err error) {
	if b.N <= 0 {
		return 0, ErrSizeExceeded
	}

	if int64(len(p)) > b.N {
		p = p[0:b.N]
	}

	n, err = b.ReadCloser.Read(p)
	b.N -= int64(n)
	return
}

@carlmjohnson
Copy link
Contributor

@carlmjohnson carlmjohnson commented Mar 26, 2022

That would break current users of io.LimitReader/LimitedReader.

I misunderstood the proposal.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Mar 26, 2022

@carlmjohnson It is already so tagged. We'll get to it soon, I think.

@gopherbot
Copy link

@gopherbot gopherbot commented Mar 28, 2022

Change https://go.dev/cl/396215 mentions this issue: io: add an Err field to LimitedReader

@rsc
Copy link
Contributor

@rsc rsc commented Mar 30, 2022

This seems reasonable. Thanks for the proposal.

@rsc
Copy link
Contributor

@rsc rsc commented Mar 30, 2022

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@rsc rsc moved this from Incoming to Active in Proposals Mar 30, 2022
@rsc
Copy link
Contributor

@rsc rsc commented Apr 6, 2022

Does anyone object to adding this?

@rsc rsc moved this from Active to Likely Accept in Proposals Apr 13, 2022
@rsc
Copy link
Contributor

@rsc rsc commented Apr 13, 2022

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

@rsc rsc moved this from Likely Accept to Accepted in Proposals May 4, 2022
@rsc
Copy link
Contributor

@rsc rsc commented May 4, 2022

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

@rsc rsc changed the title proposal: io: add an Err field to LimitedReader io: add an Err field to LimitedReader May 4, 2022
@rsc rsc removed this from the Proposal milestone May 4, 2022
@rsc rsc added this to the Backlog milestone May 4, 2022
@dmitshur dmitshur removed this from the Backlog milestone May 4, 2022
@dmitshur dmitshur added this to the Go1.19 milestone May 4, 2022
@carlmjohnson
Copy link
Contributor

@carlmjohnson carlmjohnson commented May 12, 2022

Thanks. Working on it.

@carlmjohnson
Copy link
Contributor

@carlmjohnson carlmjohnson commented May 12, 2022

Seemed like a natural job for fuzzing, but I couldn't figure out how to make fuzzing work with the standard library, so I did it in a separate module and just copied some interesting cases back.

@gopherbot
Copy link

@gopherbot gopherbot commented May 12, 2022

Change https://go.dev/cl/405854 mentions this issue: io: fix for LimitedReader sentinel when N == stream length

@bcmills
Copy link
Member

@bcmills bcmills commented May 12, 2022

@rogpeppe, I'm looking at your example again, but I don't really understand why it's a problem with LimitedReader, given that:

  1. io.Reader doesn't provide a general mechanism for indicating "at EOF”, and
  2. strings.Reader seems to prefer not to return io.EOF when it technically could (https://go.dev/play/p/6F5tnjDaQre?v=gotip).

I think if the caller really wants to get EOF instead of the LimtedReader.Err when either error is valid, they ought to be using a Reader that returns the EOF as soon as it is reached instead of deferring it for the next call.
(Separately, perhaps we should file a proposal to change strings.Reader to do so, since its documentation promises only that it “implements the io.Reader interface” which allows either.)

@rogpeppe
Copy link
Contributor

@rogpeppe rogpeppe commented May 12, 2022

@bcmills the point I'm trying to make is that when the limit isn't exceeded, we shouldn't see an error from io.Copy at all.

In my eyes, it makes most sense for the error to be returned when the limit is exceeded, not when it's reached. Of course, with the old LimitReader, there is no distinction between the two cases when the number of bytes exactly equals the limit, but with the new error there is.

@carlmjohnson
Copy link
Contributor

@carlmjohnson carlmjohnson commented May 12, 2022

Options:

  1. Drop the new sentinel behavior as unworkable.
  2. Leave things as is, document weirdness around N == Reader len.
  3. New behavior (in the CL) of giving the Reader one last chance to return EOF.

Any other ideas?

@rogpeppe
Copy link
Contributor

@rogpeppe rogpeppe commented May 12, 2022

The more I think about it, the more I think that the original goal isn't nicely attainable just by adding an Err field to the LimitedReader struct.

LimitReader has the property that it will read at most n bytes from the underlying reader. The originally stated goal of this proposal is to return a special error (requestTooLarge) when the limit is exceeded, implying AFAICS that when we're exactly at the limit, that error would not be returned. Unfortunately those two cases aren't possible to distinguish without reading an extra byte beyond the limit to find out whether the data limit really has been exceeded, which will break the above-stated property.

One possibility might be to document that an extra byte will be read when Err is non-nil, precisely to distinguish the two cases.

Another might be to add another entry point rather than modifying LimitedReader. For the record, here's the code for a LimitReader version I've been involved with in the past: https://go.dev/play/p/xqH4fdMPONa. We could add a LimitReaderWithError function, for example.

@bcmills
Copy link
Member

@bcmills bcmills commented May 12, 2022

@rogpeppe, when the limit isn't exceeded, we don't see an error from io.CopyN:
https://go.dev/play/p/gqEn3aREOJE?v=gotip
That is: if you don't try to read past the limit at all, then you don't get the “limit exceeded” error.

There is no general way to force a reader to tell you that it is at EOF other than trying to read a nonzero number of bytes. (A 0-length read could legitimately return 0, nil, or perhaps 0, io.ErrShortBuffer, even if the Reader is at EOF.)
But trying to read a nonzero number of bytes beyond the limit is exactly what LimitedReader prevents.

@rogpeppe
Copy link
Contributor

@rogpeppe rogpeppe commented May 12, 2022

That is: if you don't try to read past the limit at all, then you don't get the “limit exceeded” error.

That's fine AFAICS. In that case, the limit hasn't been exceeded so (to use the original use case) returning a "request too large" error would not be appropriate.

@bcmills
Copy link
Member

@bcmills bcmills commented May 12, 2022

I think for that use-case, if you want to accept a request of up to N bytes then you'd want to set the LimitedReader's initial limit to N+1. Then you check the n returned from io.Copy against the actual limit.

(But I guess at that point it doesn't really matter what you set the Err field to, because all the information you need comes from the Copy call. 🤔)

@rogpeppe
Copy link
Contributor

@rogpeppe rogpeppe commented May 12, 2022

Then you check the n returned from io.Copy against the actual limit

But that means that you can exceed the limit when copying, which surely isn't great?

@carlmjohnson
Copy link
Contributor

@carlmjohnson carlmjohnson commented May 12, 2022

FWIW, I find the naming around io.LimitReader vs io.LimitedReader confusing (it makes sense if I think about it, verb vs noun, but I have to think), so I would be happy with a new name. MaxBytesReader(r Reader, limit int64, err error) Reader?

@bcmills
Copy link
Member

@bcmills bcmills commented May 12, 2022

Then you check the n returned from io.Copy against the actual limit

But that means that you can exceed the limit when copying, which surely isn't great?

...and that is exactly why I have a LimitedWriter. 😉

@rogpeppe
Copy link
Contributor

@rogpeppe rogpeppe commented May 12, 2022

MaxBytesReader sounds exactly like the behaviour that LimitReader currently implements, I think (the whole point would be that it doesn't restrict the number of bytes read from the underlying reader to exactly limit).

LimitErrorReader ?

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented May 12, 2022

I don't see how it makes sense to use io.Copy with io.LimitedReader with a specified error. io.Copy copies until io.EOF is returned. io.LimitedReader with a specified error does not return io.EOF.

@carlmjohnson
Copy link
Contributor

@carlmjohnson carlmjohnson commented May 13, 2022

I suppose one scenario is that for example, I'm copying a http.Response to disk, but I want to bail out after some size to prevent abuse, and in that case I want to know that it was a short write and delete what's been written so far and possibly signal to some operator or user that the file couldn't be saved.

However, another way to tackle that, which might make more sense is to use a LimitWriter. We could drop the changes to LimitedReader and create that instead.

@elagergren-spideroak
Copy link

@elagergren-spideroak elagergren-spideroak commented May 16, 2022

like @jimmyfrasche, i've also forked LimitedReader a handful of times to return a specific error. But when I think about my forks, though, I think they'd actually be better served as LimitedWriters that return Err if the write would exceed N.

Given all these new questions it seems prudent to put this off until after Go 1.19 rather than get stuck with this less-than-great behavior forever.

@carlmjohnson
Copy link
Contributor

@carlmjohnson carlmjohnson commented May 16, 2022

This should be tagged with needs-decision / release blocker.

@bcmills bcmills added the NeedsDecision label May 16, 2022
@rogpeppe
Copy link
Contributor

@rogpeppe rogpeppe commented May 17, 2022

I don't see how it makes sense to use io.Copy with io.LimitedReader with a specified error. io.Copy copies until io.EOF is returned. io.LimitedReader with a specified error does not return io.EOF.

I don't understand this assertion. The implementation that I've seen will return the specified error only when the limit is reached. In all other circumstances (and importantly when EOF is reached before the limit) the error from the underlying reader is returned.

AIUI the question that we're debating here is what error to return when the size of the data available from the underlying reader is exactly the limit.

It is my contention that in all the use cases I've seen so far, we want the special error to apply when the limit has been exceeded, not just reached, because otherwise the contract isn't clear: we either allow too many bytes to be transferred or we return a "limit exceeded" error when the limit hasn't actually been exceeded.

It's not possible to know whether the limit has been reached until at least the LimitedReader has attempted to read at least one more byte than the limit, but that breaks the current contract.

For the record, the specific situation I used this last was for stream-parsing the body of an HTTP request. The parser itself isn't aware of the request size limit (and neither should it be). The limit is documented, so it's reasonable for clients to send a request with exactly the limit number bytes and it's reasonable for the reader to assume that the number of bytes won't be more than that

@DeedleFake
Copy link
Author

@DeedleFake DeedleFake commented May 17, 2022

AIUI the question that we're debating here is what error to return when the size of the data available from the underlying reader is exactly the limit.

Close. The problem is the case of where the underlying data is exactly the limit, but the underlying reader does not return io.EOF if a read goes exactly to EOF. io.Reader makes no guarantees about whether io.EOF is returned simultaneously with the final set of data or not, meaning that you could get n > 0 && err == io.EOF or you could not get err == io.EOF until n == 0. Either implementation is fine. The issue is that if io.EOF isn't returned by the underlying reader until the next read, there's no possible way for io.LimitedReader to know that io.EOF was reached and it will return the custom error instead.

For example, here's the scenario without io.LimitedReader:

r := someReaderThatReads100Bytes()
var buf [100]byte
n, err := r.Read(buf[:])
// Right now, n == 100 and err == nil.
n, err = r.Read(buf[:])
// And now n == 0 and err == io.EOF.

And here's the same scenario with it:

r := &io.LimitedReader{
  R: someReaderThatReads100Bytes(),
  N: 100,
  Err: ErrCustom,
}
var buf [100]byte
n, err := r.Read(buf[:])
// Right now, n == 100 and err == nil.
n, err = r.Read(buf[:])
// And now n == 0 and err == ErrCustom, despite the fact that
// EOF was actually reached with the first call to Read().

The only solution is to do an extra read after the limit has been reached, but the whole point of io.LimitedReader is to read exactly the limit number of bytes.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented May 17, 2022

I don't see how it makes sense to use io.Copy with io.LimitedReader with a specified error. io.Copy copies until io.EOF is returned. io.LimitedReader with a specified error does not return io.EOF.

I don't understand this assertion. The implementation that I've seen will return the specified error only when the limit is reached. In all other circumstances (and importantly when EOF is reached before the limit) the error from the underlying reader is returned.

It's true that if you know for sure that the Reader underlying the LimitedReader has fewer than N bytes, then the LimitedReader will reliably return io.EOF and io.Copy will behave as expected. But in that unusual circumstance there doesn't seem to much reason to use LimitedReader. If you don't know that for certain, then whenever there are N or more bytes available, LimitedReader will not return io.EOF and io.Copy will get confused. I don't see why we need worry about the == N case or the > N case; you are in a strange scenario regardless.

That aside, clearly LimitedReader may not read more than N bytes. So if this is a real problem, then the only answer is to roll back this proposal.

@rogpeppe
Copy link
Contributor

@rogpeppe rogpeppe commented May 18, 2022

@ianlancetaylor

io.Copy will get confused

I'm not sure what you mean by "get confused" here. Returning a non-EOF error when the underlying reader returns one is standard and documented behaviour for io.Copy - there's no confusion AFAICS, and I don't understand why anything in any possible scenario (regardless of the number of bytes that could be read from the underlying reader) is "strange".

@rsc
Copy link
Contributor

@rsc rsc commented May 18, 2022

If we have read N bytes and get called for Read again, and Err != io.EOF and Err != nil, is it wrong to read 1 byte from the underlying reader to find out whether to use io.EOF or Err?

In this case the LimitedReader would be limiting the amount returned by its own Read method to N bytes and limiting the amount read from below to N+1 bytes (but only if Err != io.EOF and Err != nil).

Since this is entirely new behavior, reading N+1 bytes underneath seems OK. Does anyone have any reasons otherwise? Thanks.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented May 18, 2022

@rogpeppe Sorry for the wording. Let me try it this way: why would you want to use io.Copy with a LimitedReader that uses a custom error? What is the use case? Thanks.

@rogpeppe
Copy link
Contributor

@rogpeppe rogpeppe commented May 21, 2022

@rsc I think that sounds like a very reasonable way forward. FWIW I wouldn't suggest actually issuing an extra 1-byte read - instead just truncate the read on the underlying reader by one less byte (and don't return that extra byte). That way, in the usual case we'll issue exactly the same number of underlying read calls.

@ianlancetaylor The issue applies to anything that's using a reader. io.Copy is just an easy example that I used for illustration.

@carlmjohnson
Copy link
Contributor

@carlmjohnson carlmjohnson commented May 21, 2022

@rsc I think that sounds like a very reasonable way forward. FWIW I wouldn't suggest actually issuing an extra 1-byte read - instead just truncate the read on the underlying reader by one less byte (and don't return that extra byte). That way, in the usual case we'll issue exactly the same number of underlying read calls.

IIUC, then the fix is just to change the doc to “If the underlying reader can read exactly N bytes, whether LimitedReader returns the sentinel or io.EOF is undefined.”

@rogpeppe
Copy link
Contributor

@rogpeppe rogpeppe commented May 21, 2022

@rsc I think that sounds like a very reasonable way forward. FWIW I wouldn't suggest actually issuing an extra 1-byte read - instead just truncate the read on the underlying reader by one less byte (and don't return that extra byte). That way, in the usual case we'll issue exactly the same number of underlying read calls.

IIUC, then the fix is just to change the doc to “If the underlying reader can read exactly N bytes, whether LimitedReader returns the sentinel or io.EOF is undefined.”

That's not my understanding. The behaviour should be defined - if the exactly N bytes can be read from the underlying reader, the final error should be io.EOF. If more than N bytes can be read, exactly N bytes should be returned and the final error should be Err.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsDecision Proposal Proposal-Accepted Proposal-FinalCommentPeriod release-blocker
Projects
Proposals
Accepted
Development

Successfully merging a pull request may close this issue.