Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/net/http2 ReadIdleTimeout/PingTimeout not exposed if configured through http package #40201

Closed
JensErat opened this issue Jul 14, 2020 · 11 comments
Closed

Comments

@JensErat
Copy link

@JensErat JensErat commented Jul 14, 2020

What version of Go are you using (go version)?

$ go version
go version go1.14.4 linux/amd64

Does this issue reproduce with the latest release?

Yes, following code on golang/net master branch.

What operating system and processor architecture are you using (go env)?

not relevant

What did you do?

golang/net#55 brought a new feature in golang's http2 implementation that allows early detection of broken network connections. Setting this seems to resolve potentially several rare issues we're facing in production every now and then.

Now, most clients use the (I think also recommended?) way to instantiate an http.Transport, which is then "upgraded" to http2 capabilities. For example, in prometheus/common:

		var rt http.RoundTripper = &http.Transport{
                  // ...
		}
		err := http2.ConfigureTransport(rt.(*http.Transport))

or they set Transport.ForceAttemptHTTP2 = true.

Both ways entirely hide the http2 transport, and I do not see a way to configure http2.Transport.ReadIdleTimeout/http2.Transport.PingTimeout.

This also seems to affect k8s.io/apimachinery/pkg/util/net/http.go, where this golang feature is best bet so far to resolve kubernetes/kubernetes#87615 (it seems to me golang/net#55 was specifically build because of the Kubernetes issue). golang/net#74 proposes an interface to set those values and would resolve this issue.

What did you expect to see?

A possibility to configure http2.Transport from http.Transport where needed, for example by passing in an http2.Transport into the ConfigureTransport function?

What did you see instead?

No way to configure http2.Transport settings.

@gopherbot gopherbot added this to the Unreleased milestone Jul 14, 2020
@andybons andybons changed the title x/net: http2 ReadIdleTimeout/PingTimeout not exposed if configured through http package x/net/http2 ReadIdleTimeout/PingTimeout not exposed if configured through http package Jul 14, 2020
@andybons
Copy link
Member

@andybons andybons commented Jul 14, 2020

@JensErat
Copy link
Author

@JensErat JensErat commented Jul 14, 2020

And maybe @caesarxuchao if I can notify him at all this way.

@networkimprov
Copy link

@networkimprov networkimprov commented Jul 14, 2020

@fraenkel is the go-to guy for HTTP2.

@caesarxuchao
Copy link

@caesarxuchao caesarxuchao commented Jul 14, 2020

I created golang/net#74 but it was too late to get into 1.15.

@caesarxuchao
Copy link

@caesarxuchao caesarxuchao commented Jul 14, 2020

FWIW, in the test I added in golang/net#74, I used the unsafe package to configure the two fields. Obviously, it's not the right way, but can be the last resort.

@JensErat
Copy link
Author

@JensErat JensErat commented Jul 14, 2020

In prometheus/common, x/net/http2 is vendored so I could easily "hack" the field in there. At least we have somewhat good confidence it resolves our Prometheus issues now. I indeed missed golang/net#74 when searching for a solution. Since at least Prometheus seems to actually vendor the code, we might be fine over there as soon as your PR is merged (they pin some very recent master commit, anyway).

@roidelapluie
Copy link

@roidelapluie roidelapluie commented Jul 16, 2020

Do you know if using ForceAttemptHTTP2 (go 1.13+) instead of http2.ConfigureTransport would solve that?

@povsister
Copy link

@povsister povsister commented Jul 17, 2020

@roidelapluie
Noop. ForceAttemptHTTP2 only attempts to upgrade http1.1 to h2 and it's default enabled if my memory is right.
In current implement of h2. The http2 transport never dials tcp connection. It always let http1.1 dial and then upgrade it to h2.
The upgrade procedure is done by http2.ConfigureTransport.

Unfortunately, the h2 transport is not exposed by net/http/Transport which is the default transport for http client.

// nextProtoOnce guards initialization of TLSNextProto and
// h2transport (via onceSetNextProtoDefaults)
nextProtoOnce sync.Once
h2transport h2Transport // non-nil if http2 wired up
tlsNextProtoWasNil bool // whether TLSNextProto was nil when the Once fired

So basically, it's impossible to directly manipulate the h2 transport out of net/http package.
It also means you can not manually enable the h2 connection healthcheck in your application. If you really wish to, you need "hack" the Transport like @caesarxuchao does so.

@JensErat is right. Currently no options is provided for upgrading h1 to h2. We'd better make it available ASAP. It really happens rarely but leads to disaster if you suddenly knock into it. See #39750

go/src/net/http/h2_bundle.go

Lines 6632 to 6645 in fa98f46

// ConfigureTransport configures a net/http HTTP/1 Transport to use HTTP/2.
// It returns an error if t1 has already been HTTP/2-enabled.
func http2ConfigureTransport(t1 *Transport) error {
_, err := http2configureTransport(t1)
return err
}
func http2configureTransport(t1 *Transport) (*http2Transport, error) {
connPool := new(http2clientConnPool)
t2 := &http2Transport{
ConnPool: http2noDialClientConnPool{connPool},
t1: t1,
}
connPool.t = t2

@JensErat
Copy link
Author

@JensErat JensErat commented Aug 11, 2020

We can confirm this resolves our Prometheus and Kubernetes issues (and thus likely the issues of lots of others, some might not even be aware of this).

@brian-brazil from Prometheus already proposed the health check should be default anyway:

My point is more that the developer shouldn't even need to configure this, it should be enabled by default.

And I pretty much agree with him. This is something most applications will mess up. Lots of admin and developer days lost in debugging a hard to trace down issue. If I have long-lasting connections, those should be robust against network failures.

Could this be considered a breaking change or just a fix? Issues with broken servers that cannot handle whatever http2 calls are sent to run the ping?

A golang debug variable to set the value might also help in either case (force-enable or force-disable).

@gopherbot
Copy link

@gopherbot gopherbot commented Sep 25, 2020

Change https://golang.org/cl/236498 mentions this issue: http2: add configuration options when constructing http2 transport

@gopherbot
Copy link

@gopherbot gopherbot commented Oct 20, 2020

Change https://golang.org/cl/264017 mentions this issue: x/net/http2: add ConfigureTransports

airshipbot pushed a commit to airshipit/promenade that referenced this issue Oct 24, 2020
There are several kubernetes bugs [0,1,2] involving connection problems
that seem related to the Go net/http2 library, where the stream state
and connection state can get out of sync. This can manifest as a kubelet
issue, where the node status gets stuck in a NotReady state, but can
also happen elsewhere.

In newer versions of the Go libraries some issues are fixed [3,4], but
the fixes are not present in k8s 1.18.

This change disables http2 in kube-apiserver and webhook-apiserver. This
should be sufficient to avoid the majority of the issues, as disabling
on one side of the connection is enough, and apiserver is generally
either the client or the server.

0: kubernetes/kubernetes#87615
1: kubernetes/kubernetes#80313
2: kubernetes/client-go#374
3: golang/go#40423
4: golang/go#40201

Change-Id: Id693a7201acffccbc4b3db8f4e4b96290fd50288
roidelapluie added a commit to roidelapluie/common that referenced this issue Oct 25, 2020
Given that golang/go#39337 and
golang/go#40201 are fixed:

- We now have a function to configure HTTP/2 transport,
  therefore we have a "ping" every 5 minutes on the line
  if not traffic is doing on
- Dead HTTP/2 connections will be removed from the pool

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

7 participants
You can’t perform that action at this time.