Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Give the ping context a timeout. #1163

Merged
merged 1 commit into from
Nov 4, 2021
Merged

Conversation

mattmoor
Copy link
Collaborator

@mattmoor mattmoor commented Nov 4, 2021

Debugging why requests to an insecure registry seem to hang, I noticed that there are 3+ requests to the https endpoint before it falls back to http (at that point my logs truncate because the test was over). The time between the request being issued and completing in the roundtripper was ~30s each.

This change gives each ping a generous 5s timeout to complete each request.

Debugging why requests to an insecure registry seem to hang, I noticed that there are 3+ requests to the https endpoint before it falls back to http (at that point my logs truncate because the test was over).  The time between the request being issued and completing in the roundtripper was ~30s each.

This change gives each ping a generous 5s timeout to complete each request.
@codecov-commenter
Copy link

codecov-commenter commented Nov 4, 2021

Codecov Report

Merging #1163 (ff62435) into main (dd49079) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##             main    #1163   +/-   ##
=======================================
  Coverage   75.22%   75.22%           
=======================================
  Files         108      108           
  Lines        7850     7852    +2     
=======================================
+ Hits         5905     5907    +2     
  Misses       1379     1379           
  Partials      566      566           
Impacted Files Coverage Δ
pkg/v1/remote/transport/ping.go 91.76% <100.00%> (+0.19%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update dd49079...ff62435. Read the comment docs.

@mattmoor
Copy link
Collaborator Author

mattmoor commented Nov 4, 2021

I'm testing another mechanism for this, where I pass a custom http.Transport (based on the default transport) with a shorter dial timeout:

remote.WithTransport(&http.Transport{
	Proxy: http.ProxyFromEnvironment,
	DialContext: (&net.Dialer{
		Timeout:   5 * time.Second,
		KeepAlive: 30 * time.Second,
	}).DialContext,
	ForceAttemptHTTP2:     true,
	MaxIdleConns:          100,
	IdleConnTimeout:       90 * time.Second,
	TLSHandshakeTimeout:   10 * time.Second,
	ExpectContinueTimeout: 1 * time.Second,
})

If this works, we may want to consider something like this as a better default transport for GGCR since the experience with a repeated dial timeout on the https ping retried 5(!!!) times is miserable.

I haven't confirmed, but I believe I'm hitting the same thing publishing stuff with cosign as well and I'm sure all the other tools will be affected (ko, buildpacks, kaniko).

I'll report back as to whether this actually works, and if so we can discuss which change we'd rather have.

@mattmoor
Copy link
Collaborator Author

mattmoor commented Nov 4, 2021

The custom transport works. Given our built-in retries, I'd be comfortable moving the direction of the transport, but I'm open to other options as well.

Copy link
Collaborator

@jonjohnsonjr jonjohnsonjr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ok to just merge this until we can find a better solution.

@imjasonh imjasonh merged commit 080751a into google:main Nov 4, 2021
mattmoor added a commit to mattmoor/go-containerregistry that referenced this pull request Nov 4, 2021
mattmoor added a commit to mattmoor/go-containerregistry that referenced this pull request Nov 4, 2021
mattmoor added a commit to mattmoor/go-containerregistry that referenced this pull request Nov 10, 2021
imjasonh pushed a commit that referenced this pull request Nov 11, 2021
* Revert "Give the ping context a timeout. (#1163)"

This reverts commit 080751a.

* Define a new `remote.DefaultTransport`.

Previously we used `http.DefaultTransport` (on which this is based),
but this uses a default dial timeout of 30s, which when we (by default)
wrap things in 5x retries can lead to ~150s delays simply pinging
an http-based registry.

If folks encounter issues with this via the library they can restore
the current behavior with:
```go
remote.WithTransport(http.DefaultTransport)
```

If folks encounter issues with this via `crane` they can restore the
current behavior with:
```
--dial-timeout 30s
```

* Switch to an impossible domain name

* Back out the crane flag
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants