
retry if file download failed #1454

Merged · 3 commits · Aug 24, 2016

Conversation

larsxschneider (Member)

A file download can fail for multiple reasons, e.g. due to various
network errors. Use the TransferQueue retry mechanism to try it again.

Possible errors due to spotty network connections:
https://github.com/golang/go/blob/64214792e214bbacb8c00ffea92a7131e30fa59e/src/io/io.go#L26-L47

@larsxschneider (Member, Author)

This should address #1433
But I need to perform a large number of clones to trigger the bad behavior. Therefore I don't know yet if it really solves the issue. But I think it should help 😄
Would that be an acceptable solution for you or is it too broad?

@technoweenie (Contributor)

Niiiice. I'd rather distinguish between network and local disk errors though. Maybe they're so infrequent that it makes sense to retry everything?

@ttaylorr (Contributor) · Aug 16, 2016

I think distinguishing between the two would definitely be a nice-to-have 👍

@larsxschneider (Member, Author) · Aug 16, 2016

Alright, how would you do it? Would it be OK for you if we wrap the retriable error here:
https://github.com/github/git-lfs/blob/32b91777a2f57534738b7962d770b4d198cab647/tools/iotools.go#L43
https://github.com/github/git-lfs/blob/32b91777a2f57534738b7962d770b4d198cab647/tools/iotools.go#L51

This would mean all errors happening here are actually wrapped:
https://github.com/golang/go/blob/64214792e214bbacb8c00ffea92a7131e30fa59e/src/io/io.go#L376-L414

I haven't dug deeper, but I think this will emit more than just network errors?!

@technoweenie (Contributor)

The body is read from the request through this hashing reader. Maybe we can wrap that res.Body argument with a Reader implementation that wraps all errors as retriable errors?

diff --git a/tools/iotools.go b/tools/iotools.go
index 1b99f07..8fdbc74 100644
--- a/tools/iotools.go
+++ b/tools/iotools.go
@@ -7,6 +7,7 @@ import (
        "io"

        "github.com/github/git-lfs/progress"
+       "github.com/github/git-lfs/errutil"
 )

 type readSeekCloserWrapper struct {
@@ -85,3 +86,19 @@ func (r *HashingReader) Read(b []byte) (int, error) {

        return w, err
 }
+
+type retriableReader struct {
+       reader io.Reader
+}
+
+func NewRetriableReader(r io.Reader) io.Reader {
+       return &retriableReader{r}
+}
+
+func (r *retriableReader) Read(b []byte) (int, error) {
+       n, err := r.reader.Read(b)
+       if err != nil {
+               return n, errutil.NewRetriableError(err)
+       }
+       return n, nil
+}
diff --git a/transfer/basic_download.go b/transfer/basic_download.go
index 8c83b1b..e523ccc 100644
--- a/transfer/basic_download.go
+++ b/transfer/basic_download.go
@@ -178,11 +178,12 @@ func (a *basicDownloadAdapter) download(t *Transfer, cb TransferProgressCallback
        }

        var hasher *tools.HashingReader
+       httpReader := tools.NewRetriableReader(res.Body)
        if fromByte > 0 && hash != nil {
                // pre-load hashing reader with previous content
-               hasher = tools.NewHashingReaderPreloadHash(res.Body, hash)
+               hasher = tools.NewHashingReaderPreloadHash(httpReader, hash)
        } else {
-               hasher = tools.NewHashingReader(res.Body)
+               hasher = tools.NewHashingReader(httpReader)
        }

        if dlFile == nil {

@larsxschneider (Member, Author)

@technoweenie Elegant solution! I fixed the PR based on your idea. I am still testing the approach locally here...

@technoweenie (Contributor)

@larsxschneider Cool, let me know if it helps. I want to do a release this week, and would love to get this in :)

@larsxschneider (Member, Author)

It is very hard for me to reproduce the error because:

  • only happens on Windows
  • only with our company VPN
  • only on repos with a large number of LFS files
  • only sporadically

I really would like to get this fix in, but I also want to be sure we really fix the issue 😄
Apparently one of our engineers saw the exact same problem even with this fix... still investigating. I just built a special version with my name in the debug output to ensure the correct git-lfs.exe binary is in use 😉

@larsxschneider changed the title from "[DO NOT MERGE] retry if file download failed" to "retry if file download failed" · Aug 22, 2016
@larsxschneider (Member, Author)

@technoweenie Please review.

The fix seems to substantially improve the situation for us. However, if we stress test it (multiple clones over a bad VPN connection) then we still see failures. I assume the reason is that we only run a single retry right now.

if errutil.IsRetriableError(err) {
return errutil.NewRetriableError(wrappedErr)
}
return wrappedErr
@larsxschneider (Member, Author)

This looks cluttered. @technoweenie @ttaylorr @sinbad do you see a nice golang-ish solution?

Contributor

👋 Thanks for the ping. Handling of errors has changed significantly since v1.3.1; most of the changes are contained in #1463 and #1466. The gist of it is that we're now using Dave Cheney's errors package, which is described here.

How does this one feel?

written, err := tools.CopyWithCallback(...)
if err != nil {
        err = errors.Wrapf(err, "cannot write data to tempfile %q", dlfilename)

        return errors.NewRetriableError(err)
}

// ...

Contributor

I like that... as long as NewRetriableError won't rewrap an existing retriable error :)

@larsxschneider (Member, Author)

@ttaylorr Is your suggestion equivalent to my original code (re: @technoweenie's comment)?

@ttaylorr (Contributor) · Aug 23, 2016

Right now you'll have to check to make sure that the error isn't already retriable, like so:

written, err := tools.CopyWithCallback(...)
if err != nil {
        err = errors.Wrapf(err, "cannot write data to tempfile %q", dlfilename)

        if errutil.IsRetriableError(err) {
                return err
        }
        return errors.NewRetriableError(err)
}

// ...

@larsxschneider (Member, Author)

After looking carefully at it I think your initial suggestion was correct. Please double check the changed error handling here: a941805

Other than that I think the PR is ready from my point of view 😄

RetriableReader wraps an error response of a reader as a RetriableError()
A file download can fail for multiple reasons, e.g. due to various
network errors. Use a `RetriableReader` to perform a limited number of
retries in case of a read error.
if err == nil || err == io.EOF {
return n, err
}
return n, errors.NewRetriableError(err)
Contributor

Technically this could re-wrap an already retriable error in the case that r's reader has another RetriableReader underneath, but I don't think it's a huge deal 😄

@ttaylorr (Contributor)

@larsxschneider This looks good. Eventually I'd like to look at being more flexible at the transfer layer by supporting things like multiple retries per OID, backoffs, and other resiliency measures.

@ttaylorr ttaylorr merged commit 89f37f4 into git-lfs:master Aug 24, 2016
@ttaylorr ttaylorr mentioned this pull request Aug 24, 2016