-
Notifications
You must be signed in to change notification settings - Fork 17.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
proposal: io: flatten nested SectionReaders #63673
Comments
See race condition issues too: https://go-review.googlesource.com/c/go/+/500476 ✨ I think the simplest way to do that is to always return a new section reader and try the unwrapping in |
Can you explain where the race is in your change? I can't see the semantic subtlety, it seems to me that if the original nested SectionReader logic is race-free, then the change should be as well? It only touches |
https://go-review.googlesource.com/c/go/+/500476/comments/aef8d7b0_b285c620 However now that I checked, |
I don't think this is possible due to this: const s = "hello world"
b := strings.NewReader(s)
s1 := io.NewSectionReader(b, 3, len(s)-3)
s2 := io.NewSectionReader(b, 1, len(s)-4)
s1.Seek(2, io.SeekStart)
s2.Read(buf) // reads "world" because s1.base (3) + s1.off (2) + s2.base (1) = 6, so "hello " is chopped off. |
I don't think that's right. SectionReader only ever calls ReadAt on the underlying In particular, in the example you gave, s2's read is not influenced by s1.off, because s2.Read invokes s1.ReadAt with an absolute position, and s1.ReadAt doesn't incorporate s1.off into its calculations. |
I don't think the two necessarily interact. Isn't it enough to just add an unexported field to store the original/non-flattened reader, and use that in Basically, the way it could work is that in |
We could do that, with the obvious cost of making the structure fatter for a relatively niche semantic difference. If we're forced to maintain compatibility, that's certainly an option, but I think I would advocate for adjusting the semantics of Outer to make the more efficient internals possible. (That is, assuming I'm right that For my particular use case in Matroska parsing, I think the two options are broadly equivalent because I expect the intermediate SectionReaders to have approximately the same lifetime as the child SectionReaders, so it's all about reducing processing cost rather than memory cost. That said, for other parsing use cases, it may be nice to not force the pinning of intermediate readers, so that the programmer is free to optimize for readability, even if it means throwing around a few more SectionReaders during the parse. |
What
Make io.NewSectionReader check if the provided ReaderAt is a SectionReader, and if so return a SectionReader that operates directly on the underlying ReaderAt. In other words, this code would result in a single SectionReader over bytes 120-130 of
f
, rather than 3 nested SectionReaders:Why
io.SectionReader is a way to carve up an io.ReaderAt into subsections. This is very handy when parsing file formats like Matroska, which heavily incentivize random seeking and contain a tree of size-delimited elements. A reasonable incremental parsing API for Matroska's basic framing looks like:
Leaving aside the gubbins of parsing Matroska framing, NextElement slices the provided ReaderAt into a pair of io.SectionReaders, one with the element's bytes and one for the remainder of the file. Each of these SectionReaders can be fed back into NextElement to either parse the children of the returned Element, or the Element's next sibling.
However, if you do this, currently you end up with a deep tree of nested SectionReaders, as each returned SectionReader get wrapped in further SectionReaders by subsequent calls. Each layer of the tree adds a small amount of overhead as it does offset and EOF math on progressively smaller chunks of the original ReaderAt.
Compatibility
This may be too trivial a change to warrant a proposal, since it doesn't make an obvious API change. However, it does interact with #61870 : once SectionReader.Outer is released to stable, flattening out intermediate SectionReaders is a semantic change:
Without this proposal,
sr.Outer()
returnsr
, whereassr2.Outer()
returnssr
. With this proposal, both returnr
(with appropriately adjusted offset/limit, of course). Further, in 1ce8a3f,Outer
explicitly documents that it always returns the ReaderAt that was passed to NewSectionReader.So, if this flattening change seems worth doing, it should happen before the next stable release, while Outer's semantics aren't yet set in stone by the compatibility promise.
Arguing against myself, I will also note that
SectionReader.Outer
combined with type assertions is sufficient for the caller (e.g. my Matroska parser) to do this un-nesting itself, with no stdlib changes. I'm making this proposal despite that, because it feels preferable to me for the stdlib to do this simplification by default, rather than require callers to watch out for pathological nesting of SectionReaders. There is precedent for this inbufio.NewReader
, which has special handling forbufio.Reader
inputs to avoid similar inefficient nesting.The text was updated successfully, but these errors were encountered: