swift: readahead support when reading objects #53

Merged
jujubot merged 1 commit into go-goose:v2 on Jul 25, 2017

Member

rogpeppe commented Jul 6, 2017

We make it more efficient for streaming and seeking
readers by issuing reads ahead of time, meaning we
can be processing some data while the HTTP request
is being read.

Control over the amount of readahead is governed by
the readAhead parameter to the new OpenObject
method, which subsumes GetReadSeeker.

jrwren approved these changes Jul 6, 2017

LGTM as long as it works and performs as well as or better than the previous implementation.

internal/httpfile/httpfile.go
+ err error
+}
+
+const infinity = int64(1 << 62)
@jrwren

jrwren Jul 6, 2017

Contributor

Naming quibble: I prefer MaxInt64 or VeryLargeInt to infinity here. It's confusing to read because I think it is referring to float inf.

internal/httpfile/httpfile.go
+// expected. If a server doesn't support range requests for whatever
+// reason, this should catch it.
+//
+// It returns the length of the entire
@jrwren

jrwren Jul 6, 2017

Contributor

... cliffhanger. I'm on the edge of my seat. entire...

Contributor

jrwren commented Jul 6, 2017

My quick tests show it performing much better, but they rely on the behavior of io.Copy and ioutil.Discard, which produces a particular read pattern. Results for other access patterns may vary.

Old method:
reading a small 1193B file, 1000 times takes: 19.111059 seconds
reading a larger 111062826B file, 1 time takes: 204.638050 seconds

	start := time.Now()
	n := int64(0)
	for i := 0; i < *count; i++ {
		req2, _, err := s.OpenObject(*container, *name, -1)
		if err != nil {
			fmt.Printf("ERROR:%s\n", err)
			return
		}
		m, err := io.Copy(ioutil.Discard, req2)
		if err != nil {
			fmt.Printf("ERROR:%s\n", err)
			return
		}
		n += m
		req2.Close()
	}
	fmt.Printf("%d bytes read in %f seconds\n", n, time.Since(start).Seconds())

New method:

Reading a small 1193B file, 1000 times takes: 11.413s
1193000 bytes read in 11.413108 seconds

reading a larger 111062826B file, 1 time takes: 12.6s
111062826 bytes read in 12.678619 seconds

Member

rogpeppe commented Jul 6, 2017

Thanks for the measurements! What value for readAhead were you using? It could make a big difference (and readAhead=-1 is essentially equivalent to the old GetReader, except you can still seek if you want to).

mhilton approved these changes Jul 7, 2017

This is fine, with a few changes. However, I think it's probably more complex than it needs to be. I'd be interested to know how much less overhead this has than just dropping the connection on every seek. Also, it might be worth just buffering the whole object when it's sufficiently small, so that seeking has very little cost.

internal/httpfile/httpfile.go
+
+var errOutOfRange = errors.New("read out of range")
+
+func Open(c Client, readAhead int64) (*File, http.Header, error) {
@mhilton

mhilton Jul 7, 2017

Member

doc comment

@mhilton

mhilton Jul 7, 2017

Member

also why are you returning headers here?

@rogpeppe

rogpeppe Jul 7, 2017

Member

Doc comment done. The headers are returned because the goose client returns the headers from its GetReader method for some odd reason.

internal/httpfile/httpfile.go
+ length: -1, // Unknown.
+ readAhead: readAhead,
+ }
+ f.reader0 = f.newReaderAhead(0, readAhead)
@mhilton

mhilton Jul 7, 2017

Member

I suspect that if you always did a HEAD request here to get the length and Etag etc. then a number of the special cases in the code below would disappear making things simpler.

@rogpeppe

rogpeppe Jul 7, 2017

Member

You're right that it would make the code simpler, but I think it's worth pursuing the goal of issuing a single request when reasonable.

internal/httpfile/httpfile.go
+ err error
+}
+
+const infinity = int64(1 << 62)
@mhilton

mhilton Jul 7, 2017

Member

this really needs a comment.

@rogpeppe

rogpeppe Jul 7, 2017

Member

renamed to "unlimited" and added a comment.

internal/httpfile/httpfile.go
+ }()
+}
+
+func (f *File) newRequest(p0, p1 int64) *Request {
@mhilton

mhilton Jul 7, 2017

Member

This could do with a doc comment, especially with the behavior of p1 which is non-obvious.

internal/httpfile/httpfile.go
+// expected. If a server doesn't support range requests for whatever
+// reason, this should catch it.
+//
+// It returns the length of the entire
@mhilton

mhilton Jul 7, 2017

Member

entire what?

@rogpeppe

rogpeppe Jul 7, 2017

Member

I refactored this function into contentRangeFromResponse, which seems to
make the semantics clearer. The comment has gone.

internal/httpfile/httpfile.go
+ }
+ r, ok := parseContentRange(got)
+ if !ok {
+ return 0, fmt.Errorf("bad Content-Length header %q", got)
@mhilton

mhilton Jul 7, 2017

Member

Content-Range?

+ if !ok {
+ return contentRange{}, false
+ }
+ // Use the usual Go convention for half-open ranges.
@mhilton

mhilton Jul 7, 2017

Member

This should not be buried this far down. please put a comment on contentRange making it clear what p1 is.

+ if !ok {
+ return contentRange{}, false
+ }
+ r.length, s, ok = parseInt(s)
@mhilton

mhilton Jul 7, 2017

Member

The length can legitimately be "*" under some circumstances, although probably not any we'll see.

@rogpeppe

rogpeppe Jul 7, 2017

Member

added a comment
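
For reference, a Content-Range value has the form "bytes first-last/length", where length may be "*" when the total is unknown. The sketch below is an independent illustration, not the parseContentRange function from this PR: the names byteRange and parseContentRangeSketch are hypothetical, and an unknown length is represented as -1 by assumption.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// byteRange is a hypothetical type. It describes the half-open
// interval [p0, p1); length is the total object size, or -1 if the
// server sent "*".
type byteRange struct {
	p0, p1 int64
	length int64
}

// parseContentRangeSketch parses values such as "bytes 0-499/1234" and
// "bytes 0-499/*". It rejects the unsatisfied-range form "bytes */1234".
func parseContentRangeSketch(s string) (byteRange, bool) {
	const prefix = "bytes "
	if !strings.HasPrefix(s, prefix) {
		return byteRange{}, false
	}
	s = s[len(prefix):]
	slash := strings.IndexByte(s, '/')
	if slash < 0 {
		return byteRange{}, false
	}
	rng, total := s[:slash], s[slash+1:]
	dash := strings.IndexByte(rng, '-')
	if dash < 0 {
		return byteRange{}, false
	}
	first, err1 := strconv.ParseInt(rng[:dash], 10, 64)
	last, err2 := strconv.ParseInt(rng[dash+1:], 10, 64)
	if err1 != nil || err2 != nil || first < 0 || last < first {
		return byteRange{}, false
	}
	// The header is inclusive at both ends; convert to half-open.
	r := byteRange{p0: first, p1: last + 1, length: -1}
	if total != "*" {
		n, err := strconv.ParseInt(total, 10, 64)
		if err != nil || n <= last {
			return byteRange{}, false
		}
		r.length = n
	}
	return r, true
}

func main() {
	for _, h := range []string{"bytes 0-499/1234", "bytes 0-499/*", "bytes */1234"} {
		r, ok := parseContentRangeSketch(h)
		fmt.Printf("%q -> %+v %v\n", h, r, ok)
	}
}
```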

swift/swift.go
+ return f, h, nil
+}
+
+// GetObject retrieves the specified object's data.
@mhilton

mhilton Jul 7, 2017

Member

GetReader?

swift/swift.go
-
-func (o *object) Close() error {
- return nil
+ //log.Printf("%#v -> %#v", req, resp)
@mhilton

mhilton Jul 7, 2017

Member

I don't understand this comment.

@rogpeppe

rogpeppe Jul 7, 2017

Member

removed

mhilton approved these changes Jul 7, 2017

LGTM with a small nitpick

internal/httpfile/httpfile.go
+
+// newRequest returns a new Request to read the data
+// in the interval [p0, p1). If p1 is unlimited, an
+// unlimited of data is requested.
@mhilton

mhilton Jul 7, 2017

Member

unlimited amount ?
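
The non-obvious part of p1 is that Go-style intervals are half-open while HTTP byte ranges are inclusive. A minimal sketch of that conversion follows; rangeHeader is a hypothetical helper rather than the PR's newRequest, and the constant reuses the 1 << 62 value from the diff under the "unlimited" name mentioned in the review.

```go
package main

import "fmt"

// unlimited stands for "no upper bound". The value follows the diff
// (1 << 62); the name follows the rename mentioned in the review.
const unlimited = int64(1 << 62)

// rangeHeader is a hypothetical helper that formats an HTTP Range
// header value for the half-open interval [p0, p1). Callers are
// assumed to pass p1 > p0.
func rangeHeader(p0, p1 int64) string {
	if p1 >= unlimited {
		// Open-ended range: everything from p0 to the end.
		return fmt.Sprintf("bytes=%d-", p0)
	}
	// HTTP byte ranges include the last byte, so subtract one.
	return fmt.Sprintf("bytes=%d-%d", p0, p1-1)
}

func main() {
	fmt.Println(rangeHeader(0, 10))
	fmt.Println(rangeHeader(100, unlimited))
}
```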

Member

rogpeppe commented Jul 25, 2017

$$merge$$

Member

mhilton commented Jul 25, 2017

$$merge$$

Member

jujubot commented Jul 25, 2017

Status: merge request accepted. Url: http://juju-ci.vapour.ws:8080/job/github-merge-goose

@jujubot jujubot merged commit 83a3054 into go-goose:v2 Jul 25, 2017

Member

jujubot commented Jul 25, 2017

Build failed: Merging failed
build url: http://juju-ci.vapour.ws:8080/job/github-merge-goose/49

goose.v2 is currently listed as experimental. I was in the process of making compatibility-breaking changes, which I'd like to get back to soonish. Given that, there is no need to keep methods like GetRead() if we don't want to.

LGTM
