Make retrying transport and http errors configurable #1122

DennisDenuto · 2021-09-09T00:31:04Z

Related to #1114

Adds the following 4 options:

WithRetryHTTPBackoff
WithRetryHTTPPredicate
WithRetryTransportBackoff
WithRetryTransportPredicate

I went with a finer-grained API for configuring what I see as two groups of retry logic: low-level transport errors and higher-level HTTP errors.

I also aimed to keep the default retry behavior unchanged when these options are not provided.

jonjohnsonjr · 2021-09-09T15:48:28Z

So I don't love how many options this adds, but I get the need for this to be more configurable.

For the transport pieces, I think I'd like to revisit this: #740

Ideally, you could supply your own transport via remote.WithTransport and use transport.NewRetry to configure it. The only issue is around double-wrapping because of how we wrap transports by default. One option here is to just not wrap the supplied transport if you pass in a custom transport, but then you'd lose out on some "niceness". If we exposed this stuff in a nice way, you could claw that back by opting into it, e.g.:

// Wraps with retries and useragent and debug logging
remote.Image(..., remote.WithTransport(remote.Transport{t}))

// No wrapping of t
remote.Image(..., remote.WithTransport(t))

// Configurable backoff stuff.
remote.Image(..., remote.WithTransport(transport.NewRetry(t, ...)))

My only issue with that approach is that it's probably a breaking change :/

Another approach might be exposing more on the errors we return from transport, e.g. the request:

go-containerregistry/pkg/v1/remote/transport/error.go

Line 52 in 8388fde

request *http.Request

That would allow you to write a predicate that checked the request method, URL, and status code yourself to determine if you should retry it.


codecov-commenter · 2021-09-10T01:06:44Z

Codecov Report

Merging #1122 (4e8dccb) into main (8388fde) will increase coverage by 0.22%.
The diff coverage is 83.95%.

@@            Coverage Diff             @@
##             main    #1122      +/-   ##
==========================================
+ Coverage   75.16%   75.39%   +0.22%     
==========================================
  Files         108      108              
  Lines        7724     7827     +103     
==========================================
+ Hits         5806     5901      +95     
  Misses       1363     1363              
- Partials      555      563       +8

Impacted Files	Coverage Δ
pkg/v1/remote/options.go	`69.23% <65.78%> (+7.98%)`	⬆️
pkg/v1/google/list.go	`70.81% <100.00%> (+0.64%)`	⬆️
pkg/v1/remote/multi_write.go	`62.92% <100.00%> (+3.72%)`	⬆️
pkg/v1/remote/transport/error.go	`100.00% <100.00%> (ø)`
pkg/v1/remote/transport/retry.go	`100.00% <100.00%> (ø)`
pkg/v1/remote/transport/transport.go	`100.00% <100.00%> (ø)`
pkg/v1/remote/write.go	`64.82% <100.00%> (+1.96%)`	⬆️
pkg/v1/daemon/image.go	`75.51% <0.00%> (-24.49%)`	⬇️
pkg/legacy/tarball/write.go	`67.18% <0.00%> (-1.16%)`	⬇️
pkg/v1/mutate/image.go	`69.72% <0.00%> (+0.49%)`	⬆️
... and 4 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8388fde...4e8dccb.

DennisDenuto · 2021-09-10T01:21:47Z

@jonjohnsonjr

Yeah, I think iterating on the API change to keep it as minimal as possible makes a lot of sense, from both a maintainability and a usability perspective! The changes above sound like a trade-off between a simpler API and breaking changes to existing APIs.

I played around with the idea of using WithTransport as a way to allow a consumer to configure retrying.

Another approach might be exposing more on the errors we return from transport

I did this by creating a new RetryError that wraps the transport error (if it exists), and exposes the status code and request.
I also made it implement the Temporary method so that it is compatible with the 'default' transport predicate. (IMO this part of the change feels the most awkward, since we are exposing higher-level errors (the HTTP status code) to a lower-level function call.)

I pushed the changes. I'd love some early feedback to make sure I'm properly understanding your idea to begin with!

This PR comes with breaking changes:

Using WithTransport(t) with a non remote.Transport no longer wraps retry, logging etc...
Retrying with transport.Retry returns a RetryError even if the RoundTripper did not return an error. It always returns a RetryError because that error carries HTTP data (the status code), which lets a caller-specified Predicate decide whether to retry.

Note:

Some tests are failing due to these breaking changes. But I wanted to get feedback before proceeding down this path.

jonjohnsonjr · 2021-09-10T20:32:53Z

I don't think we need RetryError if we just expose the existing request field on the transport.Error struct. Callers can check if an error is transport.Error and use that information. If it's some other kind of error, they can just do that in an "else" rather than using Inner.

Using WithTransport(t) with a non remote.Transport no longer wraps retry, logging etc...

This feels kinda backwards. I think if someone gives us a transport.Transport, we should avoid wrapping it at all. We could use that as a signal that the caller knows what they're doing and we should get out of their way.

It would still make sense to me to expose Backoff/Predicate options in remote for higher level things that cannot be retried at a request level, e.g. layer uploads. I'm not sure how I would want to structure those hooks, but there are a handful of places that might make sense to inject some kind of retry check.

I'd expect the default predicate to be somewhat conservative, but having the option would let callers do some pretty flexible things like only retry for certain methods or for certain paths.


DennisDenuto · 2021-09-10T23:12:46Z

if we just expose the existing request field on the transport.Error

Hmm. I don't see how transport.Error is ever returned by a RoundTripper. I can, however, see how it is returned at the HTTP request/response layer via CheckError(...).

or put another way...

I don't see how a consumer providing their own Retry Transport would ever get an error of type transport.Error

Although I see how exposing the Request on the transport.Error can be useful in conjunction with exposing Backoff/Predicate options in remote for higher level http retries.

I think if someone gives us a transport.Transport, we should avoid wrapping it at all.

Hmm, OK, I misunderstood your code snippet above then.

// Wraps with retries and useragent and debug logging
remote.Image(..., remote.WithTransport(remote.Transport{t}))

But thinking about it more, it does make sense that it should not wrap, and I think that means this is no longer a breaking change. I updated the PR with this.

It would still make sense to me to expose Backoff/Predicate options in remote

I think this helps clarify things a lot for me.

I was trying to have transport.Retry provide a way to retry both low level transport errors and http level errors. I updated the PR to put back Backoff/Predicate options in remote :-)

I'm not sure how I would want to structure those hooks,

I made a first attempt at it. Essentially, the writer now has predicate and backoff as configurable options.


jonjohnsonjr · 2021-09-10T23:18:13Z

Hmm, OK, I misunderstood your code snippet above then.

Yeah I didn't love my code snippet, but what you've got here is actually a better solution, I think.

I think this helps clarify things a lot for me.

I was trying to have transport.Retry provide a way to retry both low level transport errors and http level errors. I updated the PR to put back Backoff/Predicate options in remote :-)

Ah yeah, sorry this wasn't more clear. There were two categories of retry stuff going on, but we kind of lumped them together. Current PR is pretty close to optimal, I think.

jonjohnsonjr · 2021-09-10T23:19:50Z

pkg/v1/remote/transport/transport.go

+
+// Transport results in *not* wrapping supplied transport with additional logic such as retries, useragent and debug logging
+// Consumers are opt-ing into providing their own transport without any additional wrapping.
+type Transport struct {


I think all that's left is to make the transport package actually return these.

I don't love the transport.Transport name. Maybe transport.Wrapper or something?

Another issue is that it's kind of a pain to actually construct a transport outside of the remote package (you need to know where layers came from, where you're pushing, etc). We could make that a lot more ergonomic by introducing a new transport implementation that defers those decisions until they're needed, like containerd (#666 (comment)).

Yep! Changed it to transport.Wrapper.

make the transport package actually return these.

I wasn't 100% sure what you meant here. I took this to mean NewWithContext should return transport.Wrapper. I pushed the change.

Another issue is that it's kind of a pain to actually construct a transport outside of the remote package (you need to know where layers came from, where you're pushing, etc). We could make that a lot more ergonomic by introducing a new transport implementation that defers those decisions until they're needed, like containerd (#666 (comment)).

hmm ok, is it possible to split introducing a new deferring transport implementation into a separate PR? Or do you see this as a blocker to this PR?

I wasn't 100% sure what you meant here. I took this to mean NewWithContext should return transport.Wrapper. I pushed the change.

Yep, that's perfect.

hmm ok, is it possible to split introducing a new deferring transport implementation into a separate PR? Or do you see this as a blocker to this PR?

Not a blocker, just thinking out loud.

jonjohnsonjr · 2021-09-10T23:20:13Z

pkg/v1/remote/options.go

-	if logs.Enabled(logs.Debug) {
-		o.transport = transport.NewLogger(o.transport)
-	}
+	if _, ok := o.transport.(*transport.Transport); !ok {


Add a comment here and in google describing why we're doing things this way.

Also for the WithTransport option, we'll want to describe this workaround.

pkg/v1/remote/options.go

- add test asserting that using a transport.Wrapper results in no additional wrapping (such as retries)
refactoring:
- add comments
- rename transport.Transport -> transport.Wrapper
- make transport package return transport.Wrapper
Authored-by: Dennis Leon <leonde@vmware.com>

DennisDenuto · 2021-09-16T17:18:32Z

@jonjohnsonjr Just checking in to see if there's anything else needed to get this merged?

jonjohnsonjr · 2021-09-16T18:07:17Z

Sorry I missed your follow-up commits!

One final thing I forgot to mention. If we're dealing with a transport.Wrapper, we probably also want to skip all of this:

go-containerregistry/pkg/v1/remote/transport/transport.go

Lines 53 to 68 in 8388fde

    
pr, err := ping(ctx, reg, t)
if err != nil {
	return nil, err
}
// Wrap t with a useragent transport unless we already have one.
if _, ok := t.(*userAgentTransport); !ok {
	t = NewUserAgent(t, "")
}
// Wrap t in a transport that selects the appropriate scheme based on the ping response.
t = &schemeTransport{
	scheme:   pr.scheme,
	registry: reg,
	inner:    t,
}

We can do a little introspection to determine if the given transport can be reused by looking at reg to see if it's the same as what t already pinged. Then we can skip some pinging. For the bearer case, we probably want to merge the existing scopes with the given scopes, like in:

go-containerregistry/pkg/v1/remote/transport/bearer.go

Lines 99 to 105 in 8388fde

    
// Add any scopes that we don't already request.
got := stringSet(bt.scopes)
for _, want := range scopes {
	if _, ok := got[want]; !ok {
		bt.scopes = append(bt.scopes, want)
	}
}

DennisDenuto · 2021-09-21T00:59:00Z

Thanks I appreciate the pointers @jonjohnsonjr

We can do a little introspection to determine if the given transport can be reused by looking at reg to see if it's the same as what t already pinged

I did this by saving state ('pingedRegistries') in transport.Wrapper

For the bearer case, we probably want to merge the existing scopes with the given scopes like with:

I figured that the scopes should be tied to the registry that the transport pinged against. Hence scopes being a map on the wrapped transport.
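A sketch of that bookkeeping, assuming a hypothetical `wrapperState` with a set of pinged registries and per-registry scope lists; the merge follows the same skip-what-we-already-have logic as the bearer.go excerpt above. It is not safe for concurrent use as written.

```go
package main

import "fmt"

// wrapperState records which registries a transport has already pinged and
// the token scopes requested for each registry.
type wrapperState struct {
	pinged map[string]bool
	scopes map[string][]string
}

func newWrapperState() *wrapperState {
	return &wrapperState{pinged: map[string]bool{}, scopes: map[string][]string{}}
}

// needsPing reports whether reg still has to be pinged and marks it as pinged.
func (s *wrapperState) needsPing(reg string) bool {
	if s.pinged[reg] {
		return false
	}
	s.pinged[reg] = true
	return true
}

// addScopes appends any scopes not already requested for reg and returns
// the merged list for that registry.
func (s *wrapperState) addScopes(reg string, scopes ...string) []string {
	got := map[string]bool{}
	for _, sc := range s.scopes[reg] {
		got[sc] = true
	}
	for _, want := range scopes {
		if !got[want] {
			s.scopes[reg] = append(s.scopes[reg], want)
			got[want] = true
		}
	}
	return s.scopes[reg]
}

func main() {
	s := newWrapperState()
	fmt.Println(s.needsPing("gcr.io"), s.needsPing("gcr.io")) // true false
	s.addScopes("gcr.io", "repository:foo:pull")
	fmt.Println(s.addScopes("gcr.io", "repository:foo:pull", "repository:bar:push"))
	// [repository:foo:pull repository:bar:push]
	fmt.Println(s.addScopes("index.docker.io", "repository:library/busybox:pull"))
	// [repository:library/busybox:pull]
}
```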

jonjohnsonjr · 2021-09-21T23:34:22Z

Okay so this might make you hate me, but I'm not thrilled with how this turned out after my suggestion (I know... I'm sorry).

Do you mind backing this out to the previous revision? We can just submit that because it's useful on its own, then I can revisit the ping optimizations when I have more time to look at it.

Again, sorry, I appreciate your patience.

DennisDenuto · 2021-09-22T17:06:48Z

@jonjohnsonjr lol, no worries. I've reverted that commit.

FWIW, I have enjoyed working on this PR, and we get a ton of value from using this library, so I'm more than happy to be patient and get things done right! 😄

pkg/v1/remote/options.go

pkg/v1/remote/transport/transport.go

- Consumers should construct a transport.Wrapper via the constructor transport.NewWithContext
- options retryBackoff and retryPredicate should only apply to HTTP errors, not lower-level transport errors (consumers can still provide a transport with the retry behavior they want)
Authored-by: Dennis Leon <leonde@vmware.com>

jonjohnsonjr

Thanks for sticking with this. This looks good to me -- let's merge and let it bake for a bit while I try to figure out how we can get rid of extra pings :)

DennisDenuto force-pushed the make-retry-configurable branch from cc132f2 to 822e1a1 September 9, 2021 00:34

jonjohnsonjr mentioned this pull request Sep 9, 2021

User-Agent is still set to "go-containerregistry" in override transport #1107

Closed

pivotaljohn mentioned this pull request Sep 9, 2021

Allow specifying the number of retries carvel-dev/imgpkg#231

Closed


Only wrap transport if it is a transport.Transport

6a5f6b3

Note: This is a breaking change Authored-by: Dennis Leon <leonde@vmware.com>

DennisDenuto force-pushed the make-retry-configurable branch from 822e1a1 to 4cdd7d1 September 10, 2021 01:06

DennisDenuto force-pushed the make-retry-configurable branch from 4cdd7d1 to cfea68c September 10, 2021 01:21

Provide additional information in transport.Error

504619a

- Useful by consumers providing their own Predicate to determine whether to retry or not Authored-by: Dennis Leon <leonde@vmware.com>

DennisDenuto force-pushed the make-retry-configurable branch from cfea68c to bf0d731 September 10, 2021 23:11

Add options to configure predicate/backoff when handling higher level…

a09fa36

… http retries Authored-by: Dennis Leon <leonde@vmware.com>

DennisDenuto force-pushed the make-retry-configurable branch from bf0d731 to a09fa36 September 10, 2021 23:14

jonjohnsonjr reviewed Sep 10, 2021

DennisDenuto force-pushed the make-retry-configurable branch 2 times, most recently from b591fe2 to 8178e55 September 11, 2021 00:01

jonjohnsonjr reviewed Sep 11, 2021

pkg/v1/remote/options.go

DennisDenuto force-pushed the make-retry-configurable branch from 8178e55 to c3c7f34 September 11, 2021 01:07

DennisDenuto force-pushed the make-retry-configurable branch from c3c7f34 to 13a1b0b September 11, 2021 01:21

DennisDenuto requested a review from jonjohnsonjr September 16, 2021 17:18

DennisDenuto mentioned this pull request Sep 16, 2021

Revert "Remove retry logic from registry.go" carvel-dev/imgpkg#246

Merged

DennisDenuto force-pushed the make-retry-configurable branch 2 times, most recently from 788b195 to 4635e06 September 21, 2021 00:55

DennisDenuto force-pushed the make-retry-configurable branch from 4635e06 to 226b7ec September 21, 2021 01:08

DennisDenuto force-pushed the make-retry-configurable branch from 226b7ec to 13a1b0b September 22, 2021 17:04

jonjohnsonjr reviewed Sep 22, 2021

pkg/v1/remote/options.go (outdated)

pkg/v1/remote/transport/transport.go (outdated)

jonjohnsonjr approved these changes Sep 22, 2021

jonjohnsonjr merged commit 34b7f00 into google:main Sep 22, 2021

hasheddan mentioned this pull request Jan 5, 2022

Update go-containerregistry to v0.7.0 crossplane/crossplane#2814

Merged


hasheddan mentioned this pull request Feb 2, 2022

Update go-containerregistry to v0.8.0 upbound/up#147

Merged
