Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JUJU-1488] Retry when we get "EOF" error from Charmhub API #14369

Merged
merged 4 commits into from
Jul 27, 2022

Conversation

benhoyt
Copy link
Member

@benhoyt benhoyt commented Jul 25, 2022

We get these weird errors from Charmhub every so often, for example:

ERROR resolving with preferred channel: Post "https://api.charmhub.io/v2/charms/refresh": EOF

These aren't valid HTTP responses, so we can't use the existing
juju/http retry logic as it only works for valid HTTP responses with
certain 40x and 50x status codes. This is an empty TCP response.

So retry 3 times (4 attempts) in this EOF case. We need to read in POST
bodies up-front because the http library will be reading/copying them
to the network more than once if a retry occurs. But they shouldn't be
huge, so this seems reasonable.

I used a test client and net.Listen server to reproduce this exact
case, to ensure the io.EOF test works on the real responses we got.
Here is the code for that:

Checklist

  • Code style: imports ordered, good names, simple structure, etc
  • Comments saying why design decisions were made
  • Go unit tests, with comments saying what you're testing
  • Integration tests, with comments saying what you're testing
  • doc.go added or updated in changed packages

QA steps

Run unit tests with go test ./charmhub. Deploy a charm/bundle and use the Juju Charmhub commands to exercise this code path, for example:

# Basic deploy of charm and bundle
$ juju bootstrap lxd
$ time juju deploy postgresql
$ time juju deploy charmed-kubernetes

# Other commands that hit the Charmhub API
$ juju info postgresql --logging-config='#charmhub=TRACE' --show-log
$ juju find postgres --logging-config='#charmhub=TRACE' --show-log
$ juju download postgresql --logging-config='#charmhub=TRACE' --show-log --series=xenial
$ juju download, juju refresh etc.

We get these weird errors from Charmhub every so often, for example:

ERROR resolving with preferred channel: Post "https://api.charmhub.io/v2/charms/refresh": EOF

These aren't valid HTTP responses, so we can't use the existing
juju/http retry logic as it only works for valid HTTP responses with
certain 40x and 50x status codes. This is an empty TCP response.

So retry 3 times (4 attempts) in this EOF case. We need to read in POST
bodies up-front because the http library will be reading/copying them
to the network more than once if a retry occurs. But they shouldn't be
huge, so this seems reasonable.

I used a test client and net.Listen server to reproduce this exact
case, to ensure the io.EOF test works on the real responses we got.
Here is the code for that:

* Client: https://pastebin.canonical.com/p/4RGNKggrK6/
* Server: https://pastebin.canonical.com/p/jtFfNP3GFF/
Copy link
Member

@wallyworld wallyworld left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we think this will help, there's no harm in landing it. Perhaps start with just 1 retry given the evidence is anecdotal. We can take a view once we see if CI improves.

charmhub/http.go Outdated Show resolved Hide resolved
charmhub/http.go Outdated Show resolved Hide resolved
charmhub/http.go Outdated Show resolved Hide resolved
Copy link
Member

@hmlanigan hmlanigan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This create a retry strategy which wraps another retry strategy used in the juju/http package.

Copy link
Member

@hmlanigan hmlanigan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have we decided to stop using the retry package?

@benhoyt
Copy link
Member Author

benhoyt commented Jul 27, 2022

@hmlanigan Thanks for the comments.

This create a retry strategy which wraps another retry strategy used in the juju/http package.

True, but the errors they retry on are mutually exclusive: this PR handles EOF at the TCP level, and the juju/http thing handles 40x/50x errors at the HTTP level.

Have we decided to stop using the retry package?

No. I initially did this as a simple for loop as I thought it didn't warrant juju/retry, but it's about the same number of lines of code, and the juju/retry version is slightly clearer (and more consistent with the rest of our code), so I switched to that. Good call.

@benhoyt
Copy link
Member Author

benhoyt commented Jul 27, 2022

/merge

@jujubot jujubot merged commit 51766b4 into juju:develop Jul 27, 2022
@benhoyt benhoyt deleted the retry-charmhub-eof branch July 27, 2022 02:21
@benhoyt benhoyt changed the title [JUJU-1488] Do 3 retries when we get "EOF" error from Charmhub API [JUJU-1488] Retry when we get "EOF" error from Charmhub API Jul 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants