Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

client: fail fast on non-retryable errors #10875

Merged
merged 3 commits into from Oct 7, 2021

Conversation

MiguelPires
Copy link
Contributor

The snap client retries all requests for a long time, regardless of the kind of error (e.g., snap list retries for 2min, if it fails to read auth.json). This is unnecessary and a bit unexpected since not all errors are transient so it obfuscates the error a bit. This commit prevents internal and auth errors from retrying since they're probably not transient.
You can test this by running snap list or version with an incorrect auth.json file, like: touch ~/.snap/auth.json; snap list (assuming you don't already have one).

The snap client would always retry requests for a long time,
regardless of error type (e.g., snap list retries for 2min,
if it fails to read auth.json).
This commit prevents retries caused by internal and auth errors.
client/client.go Outdated
@@ -200,6 +206,17 @@ func (e ConnectionError) Unwrap() error {
return e.Err
}

type InternalError struct{ Err error }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this usage of the new errors.Is etc a lot, but I'm not sure on the name, InternalError is very vague, and it's also not clear which component the error is "internal to", maybe instead this could be ClientInternalError to make it clear this type of error is only for internal errors on the client side and not i.e. Internal Server Error 500 from the snapd

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point, I renamed it so it's clearer

client/client.go Outdated
Comment on lines 416 to 420
func notRetryable(err error) bool {
return errors.Is(err, AuthorizationError{}) ||
errors.Is(err, InternalError{})
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very nice!

@codecov-commenter
Copy link

codecov-commenter commented Oct 5, 2021

Codecov Report

Merging #10875 (cd80af2) into master (81766df) will increase coverage by 0.02%.
The diff coverage is 93.75%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #10875      +/-   ##
==========================================
+ Coverage   78.29%   78.31%   +0.02%     
==========================================
  Files         900      901       +1     
  Lines      102217   102286      +69     
==========================================
+ Hits        80032    80107      +75     
+ Misses      17193    17186       -7     
- Partials     4992     4993       +1     
Flag Coverage Δ
unittests 78.31% <93.75%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
client/client.go 89.14% <93.75%> (+1.34%) ⬆️
gadget/install/install.go 13.63% <0.00%> (-32.91%) ⬇️
store/cache.go 69.23% <0.00%> (-1.93%) ⬇️
overlord/devicestate/remodel.go 81.33% <0.00%> (-0.87%) ⬇️
daemon/api_connections.go 93.04% <0.00%> (-0.54%) ⬇️
cmd/snap/cmd_debug_state.go 70.18% <0.00%> (-0.46%) ⬇️
boot/debug.go 0.00% <0.00%> (ø)
daemon/response.go 74.86% <0.00%> (ø)
overlord/backend.go 100.00% <0.00%> (ø)
osutil/disks/disks.go 83.33% <0.00%> (ø)
... and 22 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 81766df...cd80af2. Read the comment docs.

@MiguelPires MiguelPires added the Simple 😃 A small PR which can be reviewed quickly label Oct 5, 2021
Copy link
Contributor

@mardy mardy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

client/client.go Outdated
@@ -396,6 +413,11 @@ func (client *Client) do(method, path string, query url.Values, headers map[stri
return rsp.StatusCode, nil
}

func notRetryable(err error) bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the sake of a little bikeshedding ;), I wonder if it would make sense to reverse the semantics here and follow the naming of the existing ShouldRetryError(err) bool from httputil, this would make it more easy to find all these helpers later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opted not to do that because there were two options:
a) have the function determine which errors are retryable instead of which are not retryable. If we do that and a new error is added later or I forget to include one now, then it's not retried (which is worse than unnecessarily retrying).
b) write the function in the negative (like: !(errors.Is(err, AuthorizationError{}) || errors.Is(err, InternalClientError{}))) and then again reverse the result again when calling. It's safer but it seemed more convoluted than just changing the function name.
But I can change the naming to "shouldNotRetryError", to make it consistent with existing code

Copy link
Collaborator

@bboozzoo bboozzoo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -200,6 +206,17 @@ func (e ConnectionError) Unwrap() error {
return e.Err
}

type InternalClientError struct{ Err error }

func (e InternalClientError) Error() string {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking that maybe we should use pointer receivers, but I see that there's a couple of other errors here that already use values in Error(), so it's probably not worth it.

Copy link
Member

@anonymouse64 anonymouse64 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still lgtm

@mvo5 mvo5 merged commit 1365428 into snapcore:master Oct 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Simple 😃 A small PR which can be reviewed quickly
Projects
None yet
7 participants