proposal: io: flatten nested SectionReaders #63673

@danderson

What

Make io.NewSectionReader check if the provided ReaderAt is a SectionReader, and if so return a SectionReader that operates directly on the underlying ReaderAt. In other words, this code would result in a single SectionReader over bytes 120 through 129 of f (offset 120, length 10), rather than three nested SectionReaders:

var f *os.File

sr1 := io.NewSectionReader(f, 100, 100)
sr2 := io.NewSectionReader(sr1, 0, 50)
sr3 := io.NewSectionReader(sr2, 20, 10)

// equivalent to:
//   sr3 := io.NewSectionReader(f, 120, 10)

Why

io.SectionReader is a way to carve up an io.ReaderAt into subsections. This is very handy when parsing file formats like Matroska, which heavily incentivize random seeking and contain a tree of size-delimited elements. A reasonable incremental parsing API for Matroska's basic framing looks like:

type Element struct {
  ID uint64
  Size uint64
  Data io.ReaderAt
}

func NextElement(r io.ReaderAt) (elem *Element, remaining io.ReaderAt, err error)

Leaving aside the gubbins of parsing Matroska framing, NextElement slices the provided ReaderAt into a pair of io.SectionReaders, one with the element's bytes and one for the remainder of the file. Each of these SectionReaders can be fed back into NextElement to either parse the children of the returned Element, or the Element's next sibling.

However, if you do this, currently you end up with a deep tree of nested SectionReaders, as each returned SectionReader gets wrapped in further SectionReaders by subsequent calls. Each layer of the tree adds a small amount of overhead as it does offset and EOF math on progressively smaller chunks of the original ReaderAt.

Compatibility

This may be too trivial a change to warrant a proposal, since it doesn't make an obvious API change. However, it does interact with #61870: once SectionReader.Outer is released to stable, flattening out intermediate SectionReaders is a semantic change:

sr := io.NewSectionReader(r, 42, 100)
sr2 := io.NewSectionReader(sr, 50, 10)

Without this proposal, sr.Outer() returns r, whereas sr2.Outer() returns sr. With this proposal, both return r (with appropriately adjusted offset/limit, of course). Further, in 1ce8a3f, Outer explicitly documents that it always returns the ReaderAt that was passed to NewSectionReader.

So, if this flattening change seems worth doing, it should happen before the next stable release, while Outer's semantics aren't yet set in stone by the compatibility promise.

Arguing against myself, I will also note that SectionReader.Outer combined with type assertions is sufficient for the caller (e.g. my Matroska parser) to do this un-nesting itself, with no stdlib changes. I'm making this proposal despite that, because it feels preferable to me for the stdlib to do this simplification by default, rather than require callers to watch out for pathological nesting of SectionReaders. There is precedent for this in bufio.NewReader, which has special handling for bufio.Reader inputs to avoid similar inefficient nesting.
