retry if file download failed #1454
Conversation
This should address #1433
Force-pushed b307819 to 97c65a8
Niiiice. I'd rather distinguish between network and local disk errors, though. Maybe they're so infrequent that it makes sense to retry everything?
I think distinguishing between the two would definitely be a nice-to-have 👍
Alright. How would you do it? Would it be OK for you if we wrap the retriable error here? This would mean all errors happening here are actually wrapped. I haven't dug deeper, but I think this will emit more than just network errors?!
The body is read from the request through this hashing reader. Maybe we can wrap that:

```diff
diff --git a/tools/iotools.go b/tools/iotools.go
index 1b99f07..8fdbc74 100644
--- a/tools/iotools.go
+++ b/tools/iotools.go
@@ -7,6 +7,7 @@ import (
 	"io"
 
 	"github.com/github/git-lfs/progress"
+	"github.com/github/git-lfs/errutil"
 )
 
 type readSeekCloserWrapper struct {
@@ -85,3 +86,19 @@ func (r *HashingReader) Read(b []byte) (int, error) {
 
 	return w, err
 }
+
+type retriableReader struct {
+	reader io.Reader
+}
+
+func NewRetriableReader(r io.Reader) io.Reader {
+	return &retriableReader{r}
+}
+
+func (r *retriableReader) Read(b []byte) (int, error) {
+	n, err := r.reader.Read(b)
+	if err != nil {
+		return n, errutil.NewRetriableError(err)
+	}
+	return n, nil
+}
diff --git a/transfer/basic_download.go b/transfer/basic_download.go
index 8c83b1b..e523ccc 100644
--- a/transfer/basic_download.go
+++ b/transfer/basic_download.go
@@ -178,11 +178,12 @@ func (a *basicDownloadAdapter) download(t *Transfer, cb TransferProgressCallback
 	}
 
 	var hasher *tools.HashingReader
+	httpReader := tools.NewRetriableReader(res.Body)
 	if fromByte > 0 && hash != nil {
 		// pre-load hashing reader with previous content
-		hasher = tools.NewHashingReaderPreloadHash(res.Body, hash)
+		hasher = tools.NewHashingReaderPreloadHash(httpReader, hash)
 	} else {
-		hasher = tools.NewHashingReader(res.Body)
+		hasher = tools.NewHashingReader(httpReader)
 	}
 
 	if dlFile == nil {
```
Force-pushed 97c65a8 to 1e1fa43
@technoweenie Elegant solution! I fixed the PR based on your idea. I am still testing the approach locally here... |
@larsxschneider Cool, let me know if it helps. I want to do a release this week, and would love to get this in :) |
Force-pushed ffe306a to bf4cf79
It is very hard for me to reproduce the error because:
I really would like to get this fix in, but I also want to be sure we really fix the issue 😄 |
@technoweenie Please review. The fix seems to substantially improve the situation for us. However, if we stress test it (multiple clones over a bad VPN connection) then we still see failures. I assume the reason is that we only run a single retry right now. |
```go
if errutil.IsRetriableError(err) {
	return errutil.NewRetriableError(wrappedErr)
}
return wrappedErr
```
This looks cluttered. @technoweenie @ttaylorr @sinbad do you see a nice golang-ish solution?
👋 thanks for the ping. Handling of errors has changed significantly since v1.3.1; most of the changes are contained in #1463 and #1466. The gist of it is that we're now using Dave Cheney's errors package, which is described here.
How does this one feel?
```go
written, err := tools.CopyWithCallback(...)
if err != nil {
	err = errors.Wrapf(err, "cannot write data to tempfile %q", dlfilename)
	return errors.NewRetriableError(err)
}
// ...
```
I like that... as long as `NewRetriableError` won't rewrap an existing retriable error :)
@ttaylorr Is your suggestion equivalent to my original code (re: @technoweenie's comment)?
Right now you'll have to check to make sure that the error isn't already retriable, like so:
```go
written, err := tools.CopyWithCallback(...)
if err != nil {
	err = errors.Wrapf(err, "cannot write data to tempfile %q", dlfilename)
	if errutil.IsRetriableError(err) {
		return err
	}
	return errors.NewRetriableError(err)
}
// ...
```
After looking carefully at it I think your initial suggestion was correct. Please double check the changed error handling here: a941805
Other than that I think the PR is ready from my point of view 😄
RetriableReader wraps an error response of the reader as a RetriableError()
A file download can fail for multiple reasons, e.g. due to various network errors. Use a `RetriableReader` to perform a limited number of retries in case of a read error.
Force-pushed bf4cf79 to a941805
```go
if err == nil || err == io.EOF {
	return n, err
}
return n, errors.NewRetriableError(err)
```
Technically this could re-wrap an already retriable error in the case that `r`'s `reader` has another `RetriableReader` underneath, but I don't think it's a huge deal 😄
@larsxschneider This looks good. Eventually I'd like to look at being more flexible at the
A file download can fail for multiple reasons, e.g. due to various network errors. Use the `TransferQueue` retry mechanism to try it again.

Possible errors due to spotty network connections: https://github.com/golang/go/blob/64214792e214bbacb8c00ffea92a7131e30fa59e/src/io/io.go#L26-L47