ObjectOpenFile.Read always reads 4096 bytes? #132

Closed
dmolesUC opened this issue Jan 11, 2019 · 4 comments

Comments

@dmolesUC

dmolesUC commented Jan 11, 2019

With Amazon's S3 API, I can use HTTP Range: to read objects in chunks of arbitrary size (in this case 5 MiB), as seen in this code.

But when I try something similar in swift, I always get 4096 bytes, regardless of the buffer size.

I tried setting the Range: header explicitly in ObjectOpen, and getting a new ObjectOpenFile for each 5-MiB chunk, but this didn't help.

Currently I'm just reading the whole thing at whatever rate ObjectOpenFile.Read returns it, but I'm concerned about the overhead. If I'm actually making a new HTTP request every 4 KiB, on a multi-gigabyte file, that adds up. Also, it seems like it would add more opportunities for dropped connections, retries, etc. Though that may not be true in practice.

(That said, I'm not sure whether there's actually a new behind-the-scenes HTTP request every 4 KiB, or if that's just io.ReadSeeker trying to be helpful.)

Is there a way to specify/increase the chunk size?

@ncw
Owner

ncw commented Jan 11, 2019

I'm not 100% sure why this is happening. ObjectOpenFile.Read is a thin wrapper around http.Response.Body.Read if checkHash is false, which it is in your case.

Note the last sentence from the io.Reader docs:

Read reads up to len(p) bytes into p. It returns the number of bytes read (0
<= n <= len(p)) and any error encountered. Even if Read returns n < len(p),
it may use all of p as scratch space during the call. If some data is
available but not len(p) bytes, Read conventionally returns what is
available instead of waiting for more.

It is quite possible that there was only 4k of data available right then.

I think therefore that ObjectOpenFile.Read is acting correctly.

You can use this little wrapper function to fill the buffer

// ReadFill reads as much data from r into buf as it can
//
// It reads until the buffer is full or r.Read returned an error.
//
// This is io.ReadFull but when you just want as much data as
// possible, not an exact size of block.
func ReadFill(r io.Reader, buf []byte) (n int, err error) {
	var nn int
	for n < len(buf) && err == nil {
		nn, err = r.Read(buf[n:])
		n += nn
	}
	return n, err
}

You can also use io.ReadFull but read its docs really carefully as there are a number of gotchas!

@dmolesUC
Author

Yeah, I saw that same section in the docs and thought maybe there was an internal buffer just waiting for the Swift server to push out 4K worth of response body. Thanks for the code snippet, that looks helpful, as does the pointer to io.ReadFull.

Am I right in thinking I'm only making one HTTP request per Connection.ObjectOpen?

@ncw
Owner

ncw commented Jan 12, 2019

Am I right in thinking I'm only making one HTTP request per Connection.ObjectOpen?

Yes, each Open should make one HTTP request.

@dmolesUC
Author

I updated my code to make a separate ObjectOpen for each ranged request, and, since I already know exactly how many bytes to expect, use io.ReadFull to fill the buffer. Works like a charm. Thanks again!
