Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

io: add OffsetWriter, NewOffsetWriter #45899

Open
nilsocket opened this issue May 1, 2021 · 20 comments
Open

io: add OffsetWriter, NewOffsetWriter #45899

nilsocket opened this issue May 1, 2021 · 20 comments

Comments

@nilsocket
Copy link

@nilsocket nilsocket commented May 1, 2021

CopyAt(dst WriterAt, src Reader, off int64) (written int64, err error)
CopyAtBuffer(dst WriterAt, src Reader, off int64, buf []byte) written int64, err error)

CopyFileRange(dst WriterAt, woff int64, src ReaderAt, roff int64) (written int64, err error)

or

func NewSectionWriter(r WriterAt, off int64, n int64) *SectionWriter
func (s *SectionWriter) Write(p []byte) (n int, err error)
func (s *SectionWriter) WriteAt(p []byte, off int64) (n int, err error)
func (s *SectionWriter) Seek(offset int64, whence int) (int64, error)
func (s *SectionWriter) Size() int64

Helpful for concurrent writes.

@gopherbot gopherbot added this to the Proposal milestone May 1, 2021
@nilsocket nilsocket changed the title proposal: io: Provide concurrent helper functions io: Provide concurrent helper functions May 1, 2021
@seankhliao seankhliao changed the title io: Provide concurrent helper functions proposal: io: Provide concurrent helper functions May 1, 2021
@seankhliao
Copy link
Contributor

@seankhliao seankhliao commented May 1, 2021

Where would this be used? does it come up often enough to be in the stdlib?

@nilsocket
Copy link
Author

@nilsocket nilsocket commented May 1, 2021

@seankhliao

  • When downloading file concurrently using http ranges.
  • Merging multiple files into one.
  • Any use case where multiple go routines try to write concurrently at specific locations of a file.

Inconvenience with current implementations:

  • WriteAt() needs to be wrapped in for loop with temporary buffer and need to keep track to check if everything is written.
  • WriteAt() with buffer length equal to data being read, uses too much of memory.
  • Seek(), using seek with multiple go routines isn't a valid option.

Better Performance for concurrent writes.

  • Zero Copy functions like ReadFrom(), and WriteTo() can be used for concurrent writes, which is not possible now.

or another option would be to povide ReadFromAt() and WriteToAt()

In-case of unix copyfilerange is a system call, which exactly does this.

Thank you.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented May 2, 2021

Note that we use the copy_file_range system call with io.Copy or more generally with os.File.ReadFrom. But it's true that we currently always pass the offsets arguments as nil.

In CopyAt it's not immediately clear whether the off argument applies to the reader or the writer. I can infer that it's the writer because the type is WriterAt. But in general I think if we pass an offset for one we should pass an offset for both, which just gives us the CopyFileRange function.

I think in general having both io.Copy and io.CopyBuffer was a mistake, and I wouldn't want to repeat that mistake.

We already have io.WriterAt with a WriteAt method, and os.File already has a WriteAt method. And you can already call that method concurrently from multiple goroutines. So I don't have a clear handle on what new functionality we get from these new functions.

@nilsocket
Copy link
Author

@nilsocket nilsocket commented May 3, 2021

We already have WriteAt

Yes, But some book-keeping is needed. WriteAt() is similar to write().

I don't have a clear handle on what new functionality we get from these new functions.

Like io.Copy() for Write(), io.CopyAt() for WriteAt()

// CopyBufferAt , copies `src` to `dst` at `off` using `buf`
//
// copied from io.copyBuffer
func CopyBufferAt(dst io.WriterAt, src io.Reader, off int64, buf []byte) (written int64, err error) {
	for {
		nr, er := src.Read(buf)
		if nr > 0 {
			nw, ew := dst.WriteAt(buf[0:nr], off+written)
			if nw < 0 || nr < nw {
				nw = 0
				if ew == nil {
					ew = errors.New("invalid write result")
				}
			}
			written += int64(nw)
			if ew != nil {
				err = ew
				break
			}
			if nr != nw {
				err = io.ErrShortWrite
				break
			}
		}
		if er != nil {
			if er != io.EOF {
				err = er
			}
			break
		}
	}
	return written, err
}

As for my limited understanding goes,
Currently it's not possible to do zero copy operations at specific locations.
If this feature is added, it would be good.

Thank you.

@rsc
Copy link
Contributor

@rsc rsc commented May 5, 2021

Adding more variants of Copy seems like a mistake, as others have noted.
Adding SectionWriter to match SectionReader seems plausible.
Does anyone object to adding SectionWriter?

@rsc
Copy link
Contributor

@rsc rsc commented May 5, 2021

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@rsc rsc moved this from Incoming to Active in Proposals May 5, 2021
@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented May 5, 2021

Does a call to NewSectionWriter(w, off, n) imply that the size of w will always become at least off + n? Or does it depend on subsequent writes? Or is it an error if the size is not already at least off + n?

@nilsocket
Copy link
Author

@nilsocket nilsocket commented May 6, 2021

@ianlancetaylor

I don't think n is necessary here.

I just copied definitions from SectionReader and changed it to SectionWriter.

Edit:

Taking cue from NewSectionReader():

  • For NewSectionReader(), off represents the starting point and n ending point, on trying to read further than n results in io.EOF. Thus representing a section from off to n.
  • For NewSectionWriter(), off represents the starting point and n ending point, on trying to write further than n results in error x. Thus representing a section from off to n.

If n is to stay, then what error should NewSectionWriter() return, should a new error be defined?

If n is removed from NewSectionWriter(), and allows writes to go as further as user needs it.
Should it still be called SectionWriter, as it doesn't represent section anymore.

I think n should stay, and a new error be defined.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented May 7, 2021

If n should stay, then I think you need to answer my earlier questions:

Does a call to NewSectionWriter(w, off, n) imply that the size of w will always become at least off + n? Or does it depend on subsequent writes? Or is it an error if the size is not already at least off + n?

Thanks.

@nilsocket
Copy link
Author

@nilsocket nilsocket commented May 7, 2021

It depends on subsequent writes.

@opennota
Copy link

@opennota opennota commented May 7, 2021

It depends on subsequent writes.

How would that be helpful for concurrent writes, then?

@nilsocket
Copy link
Author

@nilsocket nilsocket commented May 7, 2021

@opennota

Isn't this going to work?

func main() {
	var length, ps, i int64 = 1000000, 100000, 0
	f, _ := os.Create("file")

	for ; i < length; i += ps {

		go func(from, to int64) {
			req, _ := http.NewRequest(http.MethodGet, "https://someurl.com/file", nil)

			setRange(req, from, to)
			resp, _ := http.DefaultClient.Do(req)

			nw := io.NewSectionWriter(f, from, to)
			io.Copy(nw, resp.Body)
			resp.Body.Close()
		}(i, i+ps)
	}
}

func setRange(req *http.Request, start, end int64) {
	req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", start, end-1))
}
@rsc
Copy link
Contributor

@rsc rsc commented May 12, 2021

It's true, you don't seem to need n (the section length). But that makes it not really a SectionWriter but more like an OffsetWriter. That's fine of course.

@rsc rsc changed the title proposal: io: Provide concurrent helper functions proposal: io: add OffsetWriter, NewOffsetWriter May 12, 2021
@rsc
Copy link
Contributor

@rsc rsc commented May 12, 2021

OK so it sounds like the API is:

// An OffsetWriter maps writes at offset o to offset o+off in the underlying writer.
type OffsetWriter struct { }

func NewOffsetWriter(r WriterAt, off int64) *OffsetWriter
func (s *OffsetWriter) Write(p []byte) (n int, err error)
func (s *OffsetWriter) WriteAt(p []byte, off int64) (n int, err error)
func (s *OffsetWriter) Seek(offset int64, whence int) (int64, error)

Do I have that right? Does anyone object to this?

@nilsocket
Copy link
Author

@nilsocket nilsocket commented May 12, 2021

Does anyone object to this?

Personally no.

@rsc
Copy link
Contributor

@rsc rsc commented May 19, 2021

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

@rsc rsc moved this from Active to Likely Accept in Proposals May 19, 2021
@kortschak
Copy link
Contributor

@kortschak kortschak commented May 19, 2021

Just clairfying, the offsets in these methods end up being on top of the original offset used to create the *OffsetWriter, correct?

func (s *OffsetWriter) WriteAt(p []byte, off int64) (n int, err error)
func (s *OffsetWriter) Seek(offset int64, whence int) (int64, error)
@nilsocket
Copy link
Author

@nilsocket nilsocket commented May 19, 2021

@kortschak

on top of the original offset?

I'm not sure what do mean by on top of but if you mean relatively, yes.

NewOffsetWriter(x, 10)

WriteAt(y, 10) // y is written at offset 20 of x.
@kortschak
Copy link
Contributor

@kortschak kortschak commented May 20, 2021

but if you mean relatively, yes.

Yes. Thanks.

@rsc rsc moved this from Likely Accept to Accepted in Proposals May 26, 2021
@rsc
Copy link
Contributor

@rsc rsc commented May 26, 2021

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

@rsc rsc changed the title proposal: io: add OffsetWriter, NewOffsetWriter io: add OffsetWriter, NewOffsetWriter May 26, 2021
@rsc rsc modified the milestones: Proposal, Backlog May 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Proposals
Accepted
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
7 participants