proposal: io: add ReadPeeker and implement Peek in bytes.Reader #63548

mhr3 · 2023-10-14T19:39:57Z

Is there any reason why bytes.Reader doesn't have a Bytes() or Next(int) method similar to bytes.Buffer?

Such a method would allow zero-copy usage of bytes.Reader, cause currently all public methods require a separate buffer to copy the data into. Plus these methods already exist on bytes.Buffer, so gophers are already familiar with them.

ianlancetaylor · 2023-10-14T21:43:59Z

The point of bytes.Reader is that you have a function that takes an io.Reader, and you have a []byte, and you want to pass the []byte to the function. Therefore, bytes.Reader has all the methods that that function might want to call. That function isn't going to call a Bytes or Next method, because those aren't implemented by other io.Reader values.

To put it another way, if you feel a need to call a Bytes or Next method, why not just work with the original []byte directly?

mhr3 · 2023-10-14T23:38:48Z

It's a common optimisation pattern to check for specific types inside functions that work with interfaces, therefore any function which accepts an io.Reader could have a fast-path for bytes.Reader if it were possible to get to the byte slice from the Reader.

ianlancetaylor · 2023-10-14T23:53:08Z

Can you point us to some existing code that would take advantage of this optimization? Thanks.

mhr3 · 2023-10-15T02:14:37Z

Pretty much any streaming parser, even stdlib's bufio.Reader could have a special case for the reader being bytes.Reader avoiding copying everything.

Jorropo · 2023-10-15T04:00:33Z

You could implement this today using .WriteTo, it has a few ergonomics issue, mainly require moving some processing in a callback and does not work if you don't plan on consuming the full data. However it is already widely deployed solution for zero-copy.
Depending how disruptive you allow it to be you can always refactor your data pipeline to be inverted and fit in there.

mhr3 · 2023-10-15T15:35:12Z

So I've tried to showcase this in master...mhr3:go:bytes-reader-next

I've added bytes.Reader.Next and adapted bufio.Reader to use the byte slice directly, while not exactly pretty (and I'm not proposing stdlib should do this), it shows how such an API could be used and mainly this is not possible without this new API.

Also result of benchstat of a very simple benchmark in the bufio package (master vs this branch):

name                old time/op    new time/op    delta
ReaderReadString-8    50.8ns ± 0%    44.9ns ± 0%  -11.74%  (p=0.000 n=10+10)

name                old alloc/op   new alloc/op   delta
ReaderReadString-8      144B ± 0%      144B ± 0%     ~     (all equal)

name                old allocs/op  new allocs/op  delta
ReaderReadString-8      1.00 ± 0%      1.00 ± 0%     ~     (all equal)

While preparing this benchmark result, I also realised strings.Reader should have the same Next() method.

dsnet · 2023-10-16T04:07:48Z

The github.com/dsnet/compress module looks for types that implement Buffered, Peek, and Discard methods to perform zero-copy decompression from some input. This provided quite a substantial speed-up for decompression.

At present, the bufio.Reader and bytes.Buffer types provide some mechanism to at least implement zero-copy optimizations, but bytes.Reader provides no API to get at the underlying []byte. Adding it would make the API less safe, but not any less safe than bytes.Buffer and bufio.Reader. Also, it would make the bytes.Reader API inconsistent with strings.Reader.

That said, it'd be nice if there was a unified API for copy-less reads. Since Buffered, Peek, and Discard methods already exist, maybe we should rally around those?

mhr3 · 2023-10-16T13:28:33Z

I think the Next() method suits better for this (given no IO needs to happen, so there's no potential for an error happening, plus the simplicity), but ultimately I don't have a strong opinion, as long as there's a way to do this - Bufferred() + Peek() + Discard() would do the trick too.

rsc · 2023-10-27T03:29:52Z

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

rsc · 2023-11-02T15:00:47Z

@dsnet, is Buffered really necessary? Can we use just Peek and Discard?

dsnet · 2023-11-02T17:27:57Z

Looking through my code, I couldn't convince myself Buffered was strictly necessary, but it's useful for performance since it tells us how much we can peek. Without Buffered, callers will just have to make a guess at how much is peekable, but then what's the right amount?

The Discard method is not strictly necessary if the implementation of Read when passed with a buffer retrieved from Peek is essentially a noop. This is a similar optimization technique to the AvailableBuffer+Write pattern.

Thus, there is more utility to Buffered then there is to Discard.

dsnet · 2023-11-02T17:33:10Z

We could alter the behavior of bufio.Reader.Peek to allow -1 to mean "peek at everything that is already buffered and don't read any additional bytes; this never fails" and codify this behavior in the interface.

dsnet · 2023-11-02T17:52:16Z

Here's a concrete proposal:

We define a new io.ReadPeeker interface type:

type ReadPeeker interface {
    io.Reader

    // Peek returns the next n bytes without advancing the reader.
    // To advance the reader, call Read with the bytes retrieved from Peek.
    // The returned bytes is only valid until the next call
    // that changes the reader state, such as by calling a Read method.
    //
    // If Peek returns fewer than n bytes,
    // it also returns an error explaining why the read was short.
    // The error is ErrBufferFull if n is larger than b's buffer size.
    // If the provided n is negative, then it returns the entire contents of the buffer
    // without performing any read operations on any possible underlying Reader.
    // Peek never returns an error for negative counts.
    Peek(n int) ([]byte, error)
}

We add io.ErrBufferFull which is copied over from today's bufio.ErrBufferFull.
We make bufio.ErrBufferFull identical to io.ErrBufferFull.
We modify bufio.Reader.Peek to support the new negative count semantics.
We add bytes.Buffer.Peek.
We add bytes.Reader.Peek.

I decided to put io.Reader in the io.ReadPeeker interface since the documentation of Peek references how it behaves relative to Read.

mhr3 · 2023-11-02T18:52:29Z

I believe Peek should definitely be coupled with Discard (/Advance/Skip), special behaviour for passing a peeked buffer into Read sounds very strange. Would rather have the interface specify that a peeked buffer is only valid until next Read().

josharian · 2023-11-06T13:21:21Z

Related: https://dave.cheney.net/paste/gophercon-sg-2023.html / https://github.com/pkg/json/blob/main/reader.go which draws on https://philpearl.github.io/post/reader/

I wonder whether, in addition to new bytes.Reader methods and an io interface, there should be something in package io that uses the optimized path if it exists and if not has a good fallback, maybe along the lines of Dave Cheney's byteReader.

rsc · 2023-11-07T00:08:01Z

It doesn't feel like we've found the right path forward yet. This all feels very kludgy.

josharian · 2023-11-07T00:39:48Z

@rsc agreed re: the proposed solutions. But do you think that the problem at least is clear?

mhr3 · 2023-11-08T09:12:06Z

As @dsnet pointed out, the two methods that are necessary for these use cases are:

knowing the length of the peekable buffer
consuming part (or all) of it

bufio.Reader satisfies these in the form of three methods: Buffered and Peek+Discard
bytes.Buffer allows this with Len and Next
pkg/json.byteReader (see above) can use len(window()) and release()
bytes.Reader already has Len, but is missing the consume part

Therefore, how about we scope this down to just adding bytes.Reader.Next(int) []byte. No new interfaces for peeked reads for now, just that one method, which aligns it closer with bytes.Buffer.

And one last point - strings.Reader also has Len, so why not add Next there too? The purpose here is to have zero-copy access to the slice - given that strings.Reader references a string and not a byte slice, for Next to be safe it would have to make a copy in the first place, that's why this proposal is NOT seeking changes to strings.Reader.

dsnet · 2023-11-08T16:57:08Z

I don't think a next call is sufficient.

Some form of peek+discard pattern is needed by many applications beyond just optimization purposes.

For example, in compress/flate the implementation tries to only decompress exactly the DEFLATE stream and not one byte further. Similar type of problems exist for JSON as well, where the json.Decoder.Buffered API exist only exists because we may have read beyond the last JSON token. Ideally, we can avoid that API to begin with since we can discard exactly the right number of bytes.

mhr3 · 2023-11-08T17:42:51Z

Well, bytes.Reader is also an io.Seeker.

dsnet · 2023-11-08T17:45:56Z

An io.Seeker is unfortunately too powerful, which means it's not possible to wrap a number of readers and convert it efficiently into an io.Seeker. For example, you can't wrap a gzip.Reader and convert it into an io.Seeker since decompression moved in a forward direction unless you buffer everything read so far.

On the other hand, most implementation of an io.Reader already have some form of internal buffer on hand. The peek+discard pattern is exposing that fact.

mhr3 · 2023-11-08T17:55:08Z

My point was that Len+Next+io.Seeker definitely satisfies what you'd need to do. I understand that you'd still want to have a proper interface for these PeekReaders, but a consensus on that wasn't reached, so I'm suggesting something simpler.

dsnet · 2023-11-08T18:03:41Z

I think the discrepancy stems from the fact that I'm trying to tackle this problem for io.Readers in general, while this issue was originally narrowly scoped to just the bytes.Reader concrete type (which supports io.Seeker).

I'm arguing against bytes.Reader.Next because it's too limited in flexibility. Next becomes redundant if we provide some Peek+Discard-like interface. I'd rather see us come up with a generalized approach rather than ad-hoc solutions for each concrete type.

rsc · 2023-12-06T18:15:46Z

I agree we need an established convention here.
In #63548 (comment), @dsnet suggested adding only Peek, and have the Read implementation optimize away copying from a buffer to itself. That keeps the method count small, and if the Read implementation doesn't optimize away the copying, no real harm done. It's just an optimization. This seems reasonable to me.

Does anyone see any strong objections to ReadPeeker?

dsnet · 2023-12-12T02:11:23Z

Should we rename this issue if we're thinking more broadly about a generic interface?

rsc · 2023-12-13T18:19:15Z

Looks like bufio.Reader.Read does not do the Peek optimization. That's fine - a future version can. Are there any Readers with the Peek optimization now?

rsc · 2024-01-10T18:24:31Z

This discussion has died out a bit over the holidays. Does ReadPeeker address the original need raised in the proposal? Is it good enough?

Also maybe it should be PeekReader.

Merovius · 2024-01-10T22:48:33Z

TBH I'm not super happy with the idea of adding Peek to bytes.Reader. Currently, if I pass a bytes.Reader to a function, I can be somewhat confident that the underlying bytes don't get modified. That guarantee would be lost. And I don't really understand why use cases where the performance gains of Peek, where required, can't just be gained by passing a bytes.Buffer instead of a bytes.Reader. Having two separate types, one which is immutable and one which isn't, seems good enough?

dsnet · 2024-01-10T23:03:54Z

IIUC, you're arguing against the addition of Peek on bytes.Reader, but not necessarily the interface method itself, right?

I also feel uncomfortable exposing a degree of type unsafety to bytes.Reader that didn't exist before [*] and makes the API asymmetrical with regard to strings.Reader. I wouldn't mind seeing Peek on bytes.Buffer since it already exposes the underlying bytes through various methods.

That said, I'd still like to see a common interface for copy-less and exact read operations. There is both a performance and semantic benefit to this API. A number of APIs (e.g., "compress/flate") assert for io.ByteReader to ensure it never reads past the end of a variable-length segment. Yet other APIs (e.g., "encoding/json") expose API to access the buffered data after the end of a JSON value. The later API is faster, but annoying to use when piecing readers together. The Peek method would make such usages both faster and cleaner.

[*] Technically, you can access the underlying the []byte by calling bytes.Reader.WriteTo with an io.Writer that leaks the provided buffer. You'd need to jump through a few hoops to do this though and it'd be an obvious code smell.

Merovius · 2024-01-10T23:23:54Z

IIUC, you're arguing against the addition of Peek on bytes.Reader, but not necessarily the interface method itself, right?

Correct.

Jorropo · 2024-01-11T02:17:12Z

@Merovius we can already get a reference to the buffer of a bytes.Reader without using unsafe:

type badWriter struct{}

func (badWriter) Write(b []byte) (int, error) {
 return copy(b, "bonjour"), nil
}

func main() {
 b := []byte("hello !")
 r := bytes.NewReader(b)
 _, err := r.WriteTo(badWriter{})
 if err != nil { panic(err) }
 fmt.Println(string(b))
}

I'm not arguing this code is somehow valid, but that Peek doesn't do anything new.

rsc · 2024-01-18T21:22:13Z

Looking back at #63548 (comment), it is a bit annoying for clients that Peek can return ErrBufferFull or whatever for a "too big" n where "too big" is not defined. So code trying to use this seems like it has to always wrap Peek in a potential fall back to Read. That seems awkward. What if an implementation can only Peek 2 bytes? Or 1024 bytes but you need 1536?

If we had a way to get the maximum Peek value maybe that would be less awkward, but looking back at the original proposal with Buffered and Discard, in bufio.Reader, Buffered is not an upper bound on the amount you can Peek, because Peek will try to refill the buffer if it needs to. So Buffered wouldn't work as that method.

This all seems not quite there yet.

rsc · 2024-02-07T18:18:33Z

It seems like maybe we should decline this and wait for more inspiration, maybe try again in a year or so.

dsnet · 2024-02-14T00:39:02Z

So code trying to use this seems like it has to always wrap Peek in a potential fall back to Read. That seems awkward.

Looking through my programs using a PeekReader-like API, this isn't a problem since the public API just takes in an io.Reader and then type-asserts to a PeekReader, which is either used for performance and/or guarantees on how far can be read. In either case, a fall back implementation was always present regardless.

If we had a way to get the maximum Peek value maybe that would be less awkward

I'm not sure what the utility of this API would be as an implementation usually doesn't care about what the maximum peek value may be, but just that it can peek enough bytes (perhaps everything available) and lets the underlying PeekReader deal with buffering.

Also, being overly concerned with the "maximum" Peek amount seems misguided as it somewhat implies that the maximum is static. Suppose there was an API to obtain the maximum, but the underlying PeekReader might reduce its buffer size dynamically due to memory pressure. How is that change signaled after the fact? Thus, I think the inability to get the maximum Peek amount is a feature, not a bug.

rsc · 2024-02-14T18:18:05Z

So code trying to use this seems like it has to always wrap Peek in a potential fall back to Read. That seems awkward.

This still seems awkward. I don't think we've found the right API yet.

rsc · 2024-02-14T23:50:54Z

Based on the discussion above, this proposal seems like a likely decline.
— rsc for the proposal review group

rsc · 2024-03-01T17:44:19Z

No change in consensus, so declined.
— rsc for the proposal review group

mhr3 added the Proposal label Oct 14, 2023

gopherbot added this to the Proposal milestone Oct 14, 2023

mhr3 changed the title ~~proposal: bytes: Reader should allow zero-copy working with the byte slice~~ proposal: bytes: Reader should allow zero-copy operations with its byte slice Oct 15, 2023

rsc changed the title ~~proposal: bytes: Reader should allow zero-copy operations with its byte slice~~ proposal: bytes: add Reader methods Buffered, Peek, and Discard, matching bufio.Reader Nov 2, 2023

rsc changed the title ~~proposal: bytes: add Reader methods Buffered, Peek, and Discard, matching bufio.Reader~~ proposal: bytes: add Reader methods Buffered, Peek, and Discard Nov 2, 2023

ianlewis mentioned this issue Nov 11, 2023

Add a Buffered method ianlewis/runeio#51

Closed

rsc changed the title ~~proposal: bytes: add Reader methods Buffered, Peek, and Discard~~ proposal: io: add ReadPeeker and implement Peek in bytes.Reader Dec 13, 2023

rsc added the Proposal-FinalCommentPeriod label Feb 14, 2024

rsc closed this as completed Mar 1, 2024

rsc removed the Proposal-FinalCommentPeriod label Mar 1, 2024

AlexanderYastrebov mentioned this issue Oct 15, 2024

protocol: limit use of buffered reader pires/go-proxyproto#120

Open

proposal: io: add ReadPeeker and implement Peek in bytes.Reader #63548

proposal: io: add ReadPeeker and implement Peek in bytes.Reader #63548

Comments

mhr3 commented Oct 14, 2023

ianlancetaylor commented Oct 14, 2023

mhr3 commented Oct 14, 2023

ianlancetaylor commented Oct 14, 2023

mhr3 commented Oct 15, 2023

Jorropo commented Oct 15, 2023 • edited Loading

mhr3 commented Oct 15, 2023 • edited Loading

dsnet commented Oct 16, 2023

mhr3 commented Oct 16, 2023

rsc commented Oct 27, 2023

rsc commented Nov 2, 2023

dsnet commented Nov 2, 2023 • edited Loading

dsnet commented Nov 2, 2023

dsnet commented Nov 2, 2023 • edited Loading

mhr3 commented Nov 2, 2023

josharian commented Nov 6, 2023

rsc commented Nov 7, 2023

josharian commented Nov 7, 2023

mhr3 commented Nov 8, 2023 • edited Loading

dsnet commented Nov 8, 2023

mhr3 commented Nov 8, 2023

dsnet commented Nov 8, 2023

mhr3 commented Nov 8, 2023

dsnet commented Nov 8, 2023 • edited Loading

rsc commented Dec 6, 2023

dsnet commented Dec 12, 2023

rsc commented Dec 13, 2023

rsc commented Jan 10, 2024

Merovius commented Jan 10, 2024

dsnet commented Jan 10, 2024

Merovius commented Jan 10, 2024

Jorropo commented Jan 11, 2024

rsc commented Jan 18, 2024

rsc commented Feb 7, 2024

dsnet commented Feb 14, 2024 • edited Loading

rsc commented Feb 14, 2024

rsc commented Feb 14, 2024

rsc commented Mar 1, 2024

Jorropo commented Oct 15, 2023 •

edited

Loading

mhr3 commented Oct 15, 2023 •

edited

Loading

dsnet commented Nov 2, 2023 •

edited

Loading

dsnet commented Nov 2, 2023 •

edited

Loading

mhr3 commented Nov 8, 2023 •

edited

Loading

dsnet commented Nov 8, 2023 •

edited

Loading

dsnet commented Feb 14, 2024 •

edited

Loading