Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remote: disable http2 for transports by default #799

Closed
wants to merge 4 commits into from
Closed

remote: disable http2 for transports by default #799

wants to merge 4 commits into from

Conversation

steved
Copy link

@steved steved commented Oct 27, 2020

Image pushes are seemingly bottlenecked on http2 flow control that isn't tunable at the moment. Found through moby/buildkit#1420.

This is causing some slowness pushing images from rules_docker.

Bit of contrived example, but the following Dockerfile pushing to quay.io (and this might actually be quay-specific?):

FROM ubuntu
RUN head -c 200m </dev/urandom > rand
$ time ./crane-upstream push /var/folders/qn/89b8d26x7_g9n8dk67bvf02w0000gq/T/tmp.XXnIZ1xjSB/fb4c28b88869.tar quay.io/...
2020/10/27 13:45:20 existing blob: sha256:2def2f19de92c6f5b8ab88ff6b6412ef6b707a57c21b7ed70b551599b99c17c6
2020/10/27 13:45:20 existing blob: sha256:67b07e4ef43acc7e7006ee1f89eca4efaf2f6ff90dedaadce1fd6d2048f544c1
2020/10/27 13:45:20 existing blob: sha256:d020da9bf8302f8f33abeb47848b066c228a292e099510763fb6ba457966e45b
2020/10/27 13:45:23 pushed blob: sha256:fb4c28b8886938f849679b8c70759955354e56033ee8522d305fbfe5f98b49d5
2020/10/27 13:51:38 pushed blob: sha256:eb264d9b5081a6be9899f5dc781af7449312d92005088b24a57f148828b734e9
2020/10/27 13:51:41 quay.io/...: digest: sha256:d1d8a4e6dda28397f0711ef47eed98d6a1f6c4b3a3c7cbb1ce41dfa1d9658233 size: 915

real	6m23.588s
user	0m7.649s
sys	0m2.857s

$ time ./crane push /var/folders/qn/89b8d26x7_g9n8dk67bvf02w0000gq/T/tmp.XXnIZ1xjSB/48d9788dad89.tar quay.io/...
2020/10/27 13:52:46 existing blob: sha256:67b07e4ef43acc7e7006ee1f89eca4efaf2f6ff90dedaadce1fd6d2048f544c1
2020/10/27 13:52:46 existing blob: sha256:d020da9bf8302f8f33abeb47848b066c228a292e099510763fb6ba457966e45b
2020/10/27 13:52:46 existing blob: sha256:2def2f19de92c6f5b8ab88ff6b6412ef6b707a57c21b7ed70b551599b99c17c6
2020/10/27 13:52:49 pushed blob: sha256:48d9788dad897d9c5bbc793aeda5ee7f7c0b759d07645cc709cdf3edaaf3cf9b
2020/10/27 13:53:14 pushed blob: sha256:625043d53457e5d3ec269ddb6b452df166391bda5b93b62a2db6f5728630a4de
2020/10/27 13:53:16 quay.io/...: digest: sha256:a9d5990d9c39d471d4aeff849a0e2033092cc2ce5a49fb50c19612ddd743f83c size: 915

real	0m33.215s
user	0m5.972s
sys	0m4.029s

Image pushes are seemingly bottlenecked on http2 flow control that isn't
tunable at the moment.

Found through moby/buildkit#1420.

"github.com/google/go-containerregistry/pkg/authn"
"github.com/google/go-containerregistry/pkg/logs"
v1 "github.com/google/go-containerregistry/pkg/v1"
"github.com/google/go-containerregistry/pkg/v1/remote/transport"
)

// DefaultTransport is a an extension of the default net/http DefaultTransport with http2 disabled.
// According to moby/buildkit#1420, net/http lack of tunable flow control prevents full throughput on push.
var DefaultTransport = &http.Transport{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm hesitant to just expose this as a global. In buildkit it looks like they create a new one of these whenever it's needed, so I'm worried that this isn't safe to reuse.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I found that odd since the net/http one is a statically defined like here. Happy to switch it up.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a DefaultTransport function to pkg/internal that returns a new one of these (like they do in buildkit) and use that instead of this global. I don't want to commit to exposing this as a public API, and I assume they're constructing a new one each time for a good reason.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess that won't work for cmd/crane/cmd/root.go but I think it's fine to just copy a private implementation into that package for now.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, added to both cmd/crane and pkg/internal/net (honestly not sure what to call it, I didn't want to shadow pkg/v1/remote/transport).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

honestly not sure what to call it

net's fine -- internal names don't matter to much.

@jonjohnsonjr
Copy link
Collaborator

Tests are failing unexpectedly, idk if it's related to jason's recent change @imjasonh

@steved
Copy link
Author

steved commented Oct 29, 2020

Nope, it's related to my change. I don't feel great about 88e4334, but I'm unsure if gcrane deserves a deeper refactor to support arbitrary options.

@steved
Copy link
Author

steved commented Oct 29, 2020

With the move to pkg/internal, this change also won't help with cases like:
https://github.com/bazelbuild/rules_docker/blob/a8b3c6c528271c406ab353de49c45e54a299e023/container/go/cmd/pusher/pusher.go#L189-L192

Duplicate code to copy the default transport to pusher.go is pretty minimal, but just figured I'd bring it up. Maybe worth a remote.WithTransportWrapper?

@codecov-io
Copy link

Codecov Report

Merging #799 into master will not change coverage.
The diff coverage is 81.81%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #799   +/-   ##
=======================================
  Coverage   75.60%   75.60%           
=======================================
  Files         102      102           
  Lines        4185     4185           
=======================================
  Hits         3164     3164           
  Misses        566      566           
  Partials      455      455           
Impacted Files Coverage Δ
pkg/gcrane/copy.go 85.50% <75.00%> (ø)
pkg/internal/legacy/copy.go 56.25% <100.00%> (ø)
pkg/v1/google/list.go 62.06% <100.00%> (ø)
pkg/v1/remote/options.go 78.78% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cd5ad27...88e4334. Read the comment docs.

@jonjohnsonjr
Copy link
Collaborator

Nope, it's related to my change. I don't feel great about 88e4334,

Oh yeah what a pain. Have you confirmed at all if this is a quay-specific thing? If so we can just ignore gcrane and google transport stuff (for now?).

but I'm unsure if gcrane deserves a deeper refactor to support arbitrary options.

I would prefer this if adding gcrane.WithTransport would be sufficient. If that's too much work for this PR I am happy to punt on that and file an issue to come back to it.

With the move to pkg/internal, this change also won't help with cases like:
https://github.com/bazelbuild/rules_docker/blob/a8b3c6c528271c406ab353de49c45e54a299e023/container/go/cmd/pusher/pusher.go#L189-L192

Duplicate code to copy the default transport to pusher.go is pretty minimal, but just figured I'd bring it up. Maybe worth a remote.WithTransportWrapper?

I'm in a weird spot where I don't entirely understand the problem being fixed or even the solution. This seems like something that a golang release might eventually fix, so I'm really reluctant to add a public API to do something that will hopefully be rendered unnecessary in the future. Ideally there'd be a separate "betterhttp.DefaultTransport" thing that we could import to fix various issues with http.DefaultTransport that handled different golang versions, but I haven't come across one...

tl;dr I am selfish and don't want to maintain things I don't understand, so I'd prefer just copying this around everywhere even though it's probably more burdensome in aggregate :)

@steved
Copy link
Author

steved commented Oct 30, 2020

Oh yeah what a pain. Have you confirmed at all if this is a quay-specific thing? If so we can just ignore gcrane and google transport stuff (for now?).

Yeah, doesn't appear to be the case for gcr.io.

I'm in a weird spot where I don't entirely understand the problem being fixed or even the solution.

Yeah, I'm in a similar boat. My current understanding is that http2 requires that the server tell the client when and how much buffer size is available to accept the image data and if the server does not advertise adequate buffer size the upload can be throttled on the bidirectional communication. Apparently the initial might be configurable on the client, but not in golang.

So gcr advertises 360k (byte?) buffers where quay only has 16k:

# gcr
2020/10/29 22:57:39 http2: Framer 0xc0002de380: read WINDOW_UPDATE stream=21 len=4 incr=360474
2020/10/29 22:57:39 http2: Transport received WINDOW_UPDATE stream=21 len=4 incr=360474

# quay
2020/10/29 22:58:10 http2: Framer 0xc00071c0e0: read WINDOW_UPDATE stream=21 len=4 incr=16339
2020/10/29 22:58:10 http2: Transport received WINDOW_UPDATE stream=21 len=4 incr=16339

For what it's worth, we have a workaround internally for pushing to quay and I can reach out to their support team and see if this is something for them to adjust.

@jonjohnsonjr
Copy link
Collaborator

Given that this seemingly only affects quay, and that you have a workaround, I am somewhat reluctant to merge (most of) this. Others should be able to similarly work around this by just passing an appropriate transport through to WithTransport if needed.

I'd be fine with keeping the changes in https://github.com/google/go-containerregistry/pull/799/files#diff-04d49325bf28b04678340481ee8de048890fc2fa5701460121459ada4b0d3081 because anyone using crane with quay will be unable to work around this problem otherwise.

On the other hand, I understand that any consumers of this package will see confusingly slow uploads to quay with no indication that they should be doing this... so I'm kind of torn. Of course I'm biased because I don't use quay, and if this was affecting a google registry I would be more inclined to merge it.

Let me know if you hear back from quay support. Ideally, they could just fix this on their end.

@steved steved marked this pull request as draft November 4, 2020 18:00
@steved
Copy link
Author

steved commented Nov 4, 2020

They ended up redirecting me back to the public mailing list so if you'd like to follow along: https://groups.google.com/g/quay-sig/c/lnJ4MAj8YZE.

I'll put this on pause for now and revisit it if they have an answer in the next few days. Thanks for the help / responding quickly!

@jonjohnsonjr
Copy link
Collaborator

Thank you!

Base automatically changed from master to main January 27, 2021 18:57
@steved steved closed this Apr 7, 2021
@jonjohnsonjr
Copy link
Collaborator

FWIW this seems like a reasonable hack to work around this:

But, exporting GODEBUG=http2client=0 fixes the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants