ObjectOpenFile.Read always reads 4096 bytes? #132

dmolesUC · 2019-01-11T00:55:12Z

With Amazon's S3 API, I can use HTTP Range: to read objects in chunks of arbitrary size (in this case 5 MiB), as seen in this code.

But when I try something similar in swift, I always get 4096 bytes, regardless of the buffer size.

I tried setting the Range: header explicitly in ObjectOpen, and getting a new ObjectOpenFile for each 5-MiB chunk, but this didn't help.

Currently I'm just reading the whole thing at whatever rate ObjectOpenFile.Read returns it, but I'm concerned about the overhead. If I'm actually making a new HTTP request every 4 KiB, on a multi-gigabyte file, that adds up. Also, it seems like it would add more opportunities for dropped connections, retries, etc. Though that may not be true in practice.

(That said, I'm not sure whether there's actually a new behind-the-scenes HTTP request every 4 KiB, or if that's just io.ReadSeeker trying to be helpful.)

Is there a way to specify/increase the chunk size?


ncw · 2019-01-11T10:52:04Z

I'm not 100% sure why this is happening. ObjectOpenFile.Read is a thin wrapper around http.Response.Body.Read if checkHash is false, which it is in your case.

Note the last sentence from the io.Reader docs:

Read reads up to len(p) bytes into p. It returns the number of bytes read (0
<= n <= len(p)) and any error encountered. Even if Read returns n < len(p),
it may use all of p as scratch space during the call. If some data is
available but not len(p) bytes, Read conventionally returns what is
available instead of waiting for more.

It is quite possible that there was only 4k of data available right then.

I think therefore that ObjectOpenFile.Read is acting correctly.
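To illustrate the point about short reads, here is a minimal sketch using only the standard library. The `chunkReader` type and the `readSizes` and `demo` helpers are hypothetical names made up for this example; `chunkReader` merely imitates a body that delivers at most 4096 bytes per call, and is not how the swift library or net/http work internally.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// chunkReader is a hypothetical reader that returns at most max bytes per
// Read call, imitating a network body that delivers data in small pieces.
type chunkReader struct {
	r   io.Reader
	max int
}

func (c *chunkReader) Read(p []byte) (int, error) {
	if len(p) > c.max {
		p = p[:c.max]
	}
	return c.r.Read(p)
}

// readSizes calls Read repeatedly with a buffer of bufSize bytes and
// records how many bytes each call actually returned.
func readSizes(r io.Reader, bufSize int) ([]int, error) {
	buf := make([]byte, bufSize)
	var sizes []int
	for {
		n, err := r.Read(buf)
		if n > 0 {
			sizes = append(sizes, n)
		}
		if err == io.EOF {
			return sizes, nil
		}
		if err != nil {
			return sizes, err
		}
	}
}

// demo reads 10000 bytes through a 4096-byte chunkReader using a 1 MiB buffer.
func demo() ([]int, error) {
	src := strings.NewReader(strings.Repeat("x", 10000))
	return readSizes(&chunkReader{r: src, max: 4096}, 1<<20)
}

func main() {
	fmt.Println(demo()) // [4096 4096 1808] <nil>
}
```

Even with a 1 MiB buffer, each call returns only what the underlying reader has ready, which matches the io.Reader contract quoted above.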

You can use this little wrapper function to fill the buffer

// ReadFill reads as much data from r into buf as it can
//
// It reads until the buffer is full or r.Read returned an error.
//
// This is io.ReadFull but when you just want as much data as
// possible, not an exact size of block.
func ReadFill(r io.Reader, buf []byte) (n int, err error) {
	var nn int
	for n < len(buf) && err == nil {
		nn, err = r.Read(buf[n:])
		n += nn
	}
	return n, err
}

You can also use io.ReadFull but read its docs really carefully as there are a number of gotchas!
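As a rough illustration of the io.ReadFull gotchas, the sketch below shows its three common outcomes: a full buffer, a short read reported as io.ErrUnexpectedEOF, and an empty source reported as io.EOF. The `readFullOutcome` helper and its labels are made up for this example.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"strings"
)

// readFullOutcome reads data into a buffer of bufSize bytes with io.ReadFull
// and reports the byte count plus a label describing how the call ended.
func readFullOutcome(data string, bufSize int) (int, string) {
	buf := make([]byte, bufSize)
	n, err := io.ReadFull(strings.NewReader(data), buf)
	switch {
	case err == nil:
		return n, "full" // the buffer was filled completely
	case errors.Is(err, io.ErrUnexpectedEOF):
		return n, "short" // some bytes arrived, but fewer than len(buf)
	case errors.Is(err, io.EOF):
		return n, "empty" // no bytes were read at all
	default:
		return n, "error"
	}
}

func main() {
	fmt.Println(readFullOutcome("abcdef", 4)) // 4 full
	fmt.Println(readFullOutcome("ab", 4))     // 2 short
	fmt.Println(readFullOutcome("", 4))       // 0 empty
}
```

The "short" case matters for the final chunk of an object: the n bytes returned alongside io.ErrUnexpectedEOF are still valid data, so treating that error as fatal would drop the tail unless the buffer is sized to the exact number of bytes expected.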

dmolesUC · 2019-01-11T22:40:14Z

Yeah, I saw that same section in the docs and thought maybe there was an internal buffer just waiting for the Swift server to push out 4K worth of response body. Thanks for the code snippet, that looks helpful, as does the pointer to io.ReadFull.

Am I right in thinking I'm only making one HTTP request per Connection.ObjectOpen?

ncw · 2019-01-12T17:07:24Z

Am I right in thinking I'm only making one HTTP request per Connection.ObjectOpen?

Yes, each Open should make one HTTP request.

dmolesUC · 2019-01-14T17:46:01Z

I updated my code to make a separate ObjectOpen for each ranged request, and, since I already know exactly how many bytes to expect, use io.ReadFull to fill the buffer. Works like a charm. Thanks again!
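The approach described in that last comment might look roughly like the sketch below. It is not the actual code from the thread. `rangeOpener` is a hypothetical stand-in for "open the object with a Range header"; in real use it would presumably wrap Connection.ObjectOpen, whose exact call is deliberately not shown here. `memOpener` and `demo` are likewise invented so the example runs without a Swift server.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
)

// rangeOpener is a hypothetical stand-in for opening the object with a
// Range header covering bytes start..end (end inclusive, as in HTTP).
type rangeOpener func(start, end int64) (io.ReadCloser, error)

// readInChunks fetches size bytes in chunkSize pieces, one open per chunk,
// filling each chunk with io.ReadFull before passing it to handle.
func readInChunks(open rangeOpener, size, chunkSize int64, handle func([]byte) error) error {
	if chunkSize <= 0 {
		return errors.New("chunkSize must be positive")
	}
	buf := make([]byte, chunkSize) // reused for every chunk
	for start := int64(0); start < size; start += chunkSize {
		end := start + chunkSize - 1
		if end >= size {
			end = size - 1 // the last chunk may be shorter
		}
		chunk := buf[:end-start+1]
		body, err := open(start, end)
		if err != nil {
			return err
		}
		// The expected length is known, so io.ReadFull is safe here.
		_, err = io.ReadFull(body, chunk)
		closeErr := body.Close()
		if err != nil {
			return fmt.Errorf("range %d-%d: %w", start, end, err)
		}
		if closeErr != nil {
			return closeErr
		}
		if err := handle(chunk); err != nil {
			return err
		}
	}
	return nil
}

// memOpener serves ranges from an in-memory slice (illustration only).
func memOpener(data []byte) rangeOpener {
	return func(start, end int64) (io.ReadCloser, error) {
		return io.NopCloser(bytes.NewReader(data[start : end+1])), nil
	}
}

// demo reads an 11-byte object in 4-byte chunks and returns the chunk
// sizes together with the reassembled contents.
func demo() ([]int, string, error) {
	data := []byte("hello world")
	var sizes []int
	var out []byte
	err := readInChunks(memOpener(data), int64(len(data)), 4, func(p []byte) error {
		sizes = append(sizes, len(p))
		out = append(out, p...)
		return nil
	})
	return sizes, string(out), err
}

func main() {
	fmt.Println(demo()) // [4 4 3] hello world <nil>
}
```

Because the final chunk is sized to the bytes remaining, io.ReadFull never sees a short read on a healthy connection, so any io.ErrUnexpectedEOF can be treated as a genuine error.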

dmolesUC mentioned this issue Jan 11, 2019

Swift objects are read and hashed in 4K chunks, using a lot of CPU dmolesUC/cos#1

Closed

dmolesUC closed this as completed Jan 14, 2019
