[JUJU-1488] Retry when we get "EOF" error from Charmhub API #14369
We get these weird errors from Charmhub every so often, for example:

ERROR resolving with preferred channel: Post "https://api.charmhub.io/v2/charms/refresh": EOF

These aren't valid HTTP responses, so we can't use the existing juju/http retry logic as it only works for valid HTTP responses with certain 40x and 50x status codes. This is an empty TCP response.

So retry 3 times (4 attempts) in this EOF case. We need to read in POST bodies up-front because the http library will be reading/copying them to the network more than once if a retry occurs. But they shouldn't be huge, so this seems reasonable.

I used a test client and net.Listen server to reproduce this exact case, to ensure the io.EOF test works on the real responses we got. Here is the code for that:

* Client: https://pastebin.canonical.com/p/4RGNKggrK6/
* Server: https://pastebin.canonical.com/p/jtFfNP3GFF/
If we think this will help, there's no harm in landing it. Perhaps start with just 1 retry given the evidence is anecdotal. We can take a view once we see if CI improves.
This creates a retry strategy that wraps another retry strategy used in the juju/http package.
Have we decided to stop using the retry package?
@hmlanigan Thanks for the comments.
True, but the errors they retry on are mutually exclusive: this PR handles EOF at the TCP level, and the
No. I initially did this as a simple
/merge
Checklist

* Integration tests, with comments saying what you're testing

QA steps

Run unit tests with go test ./charmhub. Deploy a charm/bundle and use the Juju Charmhub commands to exercise this code path, for example: