Join GitHub today
GitHub is home to over 20 million developers working together to host and review code, manage projects, and build software together.
swift: readahead support when reading objects #53
Conversation
jrwren
approved these changes
Jul 6, 2017
LGTM as long as it works and performs as good or better than previous.
| + err error | ||
| +} | ||
| + | ||
| +const infinity = int64(1 << 62) |
jrwren
Jul 6, 2017
Contributor
naming quip: I prefer MaxInt64 or VeryLargeInt to infinity here. Its confusing to read because I think it is referring to float inf.
| +// expected. If a server doesn't support range requests for whatever | ||
| +// reason, this should catch it. | ||
| +// | ||
| +// It returns the length of the entire |
|
My quick tests show it as performing much better, but this is using the behavior of io.Copy and io.Discard which has a certain pattern. Other patterns may vary. Old method:
New Method: Reading a small 1193B file, 1000 times takes: 11.413s reading a larger 111062826B file, 1 time takes: 12.6s |
|
Thanks for the measurements? What value for readAhead were you using? It could make a big difference (and readAhead=-1) is essentially equivalent to the old GetReader except you can still seek if you want to. |
mhilton
approved these changes
Jul 7, 2017
This is fine, with a few changes. However I think it's probably more complex than it needs to be. I'd be interested how much less overhead this has than just dropping the connection on every seek. Also it might be worth just buffering the whole object when its sufficiently small so that seeking has very little cost.
| + | ||
| +var errOutOfRange = errors.New("read out of range") | ||
| + | ||
| +func Open(c Client, readAhead int64) (*File, http.Header, error) { |
rogpeppe
Jul 7, 2017
Member
doc comment done. the headers are returned because the goose client returns the headers from its GetReader method for some odd reason.
| + length: -1, // Unknown. | ||
| + readAhead: readAhead, | ||
| + } | ||
| + f.reader0 = f.newReaderAhead(0, readAhead) |
mhilton
Jul 7, 2017
Member
I suspect that if you always did a HEAD request here to get the length and Etag etc. then a number of the special cases in the code below would disappear making things simpler.
rogpeppe
Jul 7, 2017
Member
You're right that it would make the code simpler, but I think it's worth pursuing the goal of issuing a single request when reasonable.
| + err error | ||
| +} | ||
| + | ||
| +const infinity = int64(1 << 62) |
| + }() | ||
| +} | ||
| + | ||
| +func (f *File) newRequest(p0, p1 int64) *Request { |
mhilton
Jul 7, 2017
Member
This could do with a doc comment, especially with the behavior of p1 which is non-obvious.
| +// expected. If a server doesn't support range requests for whatever | ||
| +// reason, this should catch it. | ||
| +// | ||
| +// It returns the length of the entire |
rogpeppe
Jul 7, 2017
Member
I refactored this function into contentRangeFromResponse, which seems to
make the semantics clearer. The comment has gone.
| + } | ||
| + r, ok := parseContentRange(got) | ||
| + if !ok { | ||
| + return 0, fmt.Errorf("bad Content-Length header %q", got) |
| + if !ok { | ||
| + return contentRange{}, false | ||
| + } | ||
| + // Use the usual Go convention for half-open ranges. |
mhilton
Jul 7, 2017
Member
This should not be buried this far down. please put a comment on contentRange making it clear what p1 is.
| + if !ok { | ||
| + return contentRange{}, false | ||
| + } | ||
| + r.length, s, ok = parseInt(s) |
mhilton
Jul 7, 2017
Member
The length can legitimately be "*" under some circumstances, although probably not any we'll see.
| + return f, h, nil | ||
| +} | ||
| + | ||
| +// GetObject retrieves the specified object's data. |
| - | ||
| -func (o *object) Close() error { | ||
| - return nil | ||
| + //log.Printf("%#v -> %#v", req, resp) |
| + | ||
| +// newRequest returns a new Request to read the data | ||
| +// in the interval [p0, p1). If p1 is unlimited, an | ||
| +// unlimited of data is requested. |
|
$$merge$$ |
|
$$merge$$ |
|
Status: merge request accepted. Url: http://juju-ci.vapour.ws:8080/job/github-merge-goose |
jujubot
merged commit 83a3054
into
go-goose:v2
Jul 25, 2017
|
Build failed: Merging failed |
hmlanigan
reviewed
Jul 31, 2017
•
goose.v2 is currently listed as experimental. I was in the process of making compatibility breaking changes... which I'd like to get back to soonish. Given that, there is no need to keep methods like GetRead() if we don't want.
LGTM
rogpeppe commentedJul 6, 2017
We make it more efficient for streaming and seeking
readers by issuing reads ahead of time, meaning we
can be processing some data while the HTTP request
is being read.
Control over the amount of readahead is governed by
the readAhead parameter to the new OpenObject
method, which subsumes GetReadSeeker.