Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/net/http2: http2: no cached connection was available #22091

Closed
sjenning opened this issue Sep 29, 2017 · 12 comments
Closed

x/net/http2: http2: no cached connection was available #22091

sjenning opened this issue Sep 29, 2017 · 12 comments
Assignees
Milestone

Comments

@sjenning
Copy link

@sjenning sjenning commented Sep 29, 2017

What version of Go are you using (go version)?

go version go1.8.3 linux/amd64

Does this issue reproduce with the latest release?

Unknown, checking now

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/sjennings/projects/go"
GORACE=""
GOROOT="/home/sjennings/.gvm/gos/go1.8.3"
GOTOOLDIR="/home/sjennings/.gvm/gos/go1.8.3/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build663365804=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"
PKG_CONFIG="pkg-config"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"

What did you do?

In Kubernetes 1.8, we are seeing $SUBJECT error message on pods that have both a readiness and liveness probe. In this case, the probes are nearly simultaneous (within microseconds). About 2-3 times per hour, there is a timing collision resulting in the failure of one of the two probes with http2: no cached connection was available

I confirmed the disabling http2 for the probe connections causes the issue not to occur.

I also confirmed that disabling one of the probes causes the issue not to occur.

This is not load related as the transport is only used for the probes and there are 2 probes every 5 seconds. It is a timing issue.

What did you expect to see?

The connection succeeds every time

What did you see instead?

http2: no cached connection was available

Additional References

#16582

kubernetes/kubernetes#49740

I'll try to write a tight recreator for this since it is difficult and time consuming to recreate on Kubernetes.

@smarterclayton @derekwaynecarr

@sjenning sjenning changed the title net/http: http2: no cached connection was available x/net/http2: http2: no cached connection was available Sep 29, 2017
@bradfitz

This comment has been minimized.

Copy link
Contributor

@bradfitz bradfitz commented Nov 2, 2017

/cc @tombergan

Tom, does this sound familiar? My memory's still fuzzy from my leave.

I see #16582 is marked fixed but later comments (#16582 (comment)) suggest it's still happening on Go 1.9.x.

@bradfitz bradfitz added this to the Go1.10 milestone Nov 2, 2017
@tombergan

This comment has been minimized.

Copy link
Contributor

@tombergan tombergan commented Nov 2, 2017

@WillSewell
I am still receiving this error on Go 1.9 when performing a large number of concurrent requests (400). Looking at the http2 verbose logs, I can see a large number of http2: no cached connection was available, but only a small % of these bubble up to http.Client.Do.

Do you have a repro you can share? I cannot repro with https://play.golang.org/p/zKthX7Y9RW (from #16582).

I was looking through the source code to see why these were not being caught by the fix in 7a62274. The only other place I can see RoundTripper.RoundTrip being called where http2ErrNoCachedConn is not being handled is on this line: https://github.com/golang/go/blob/master/src/net/http/transport.go#L371. I believe the concrete implementation here is noDialH2RoundTripper since we are configuring the client with https://github.com/golang/net/blob/master/http2/transport.go#L125. So my suspicion is this is where the error is being returned from. Is there a reason there is no retrying performed after a error on https://github.com/golang/go/blob/master/src/net/http/transport.go#L371?

Perhaps I misunderstood... noDialH2RoundTripper cannot return ErrNoCachedConn. It translates that error to ErrSkipAltProtocol. Did you mean s/ErrNoCachedConn/ErrSkipAltProtocol/ ?

According to the comments, there is a known possible race with CloseIdleConnections (https://github.com/golang/net/blob/master/http2/client_conn_pool.go#L221). @WillSewell and @sjenning, can you verify that you're not calling CloseIdleConnections?

@sjenning

This comment has been minimized.

Copy link
Author

@sjenning sjenning commented Nov 2, 2017

We are not calling CloseIdleConnections, at least not at the time the issue occurs.

FYI, we worked around this in kube by using seperate transports for the two probe types such that a timing collision doesn't occur kubernetes/kubernetes#53318

@tombergan

This comment has been minimized.

Copy link
Contributor

@tombergan tombergan commented Nov 2, 2017

I wonder if this happens because the transport uses DisableKeepAlives=true?
https://github.com/kubernetes/kubernetes/blob/master/pkg/probe/http/http.go#L41

Looks like yes. I cannot repro with DisableKeepAlives=false, but I can repro with DisableKeepAlives=true. For the record:
https://play.golang.org/p/8-KRho9KPH

My guess: the second request selects the same connection as the first request, but its RoundTrip doesn't execute until after the first request completes. By that point, the connection has gone idle and shut down because keep alives are disabled. This could be a manifestation of https://github.com/golang/net/blob/master/http2/client_conn_pool.go#L221

// TODO: don't close a cc if it was just added to the pool
// milliseconds ago and has never been used. There's currently
// a small race window with the HTTP/1 Transport's integration
// where it can add an idle conn just before using it, and
// somebody else can concurrently call CloseIdleConns and

@sjenning, do you know why you're disabling keep alives? (Wild guess: that flag was added with HTTP/1.1 in mind. Since these are infrequent probes, there's no reason to keep the connection open, so keep alives were disabled.)

@WillSewell, are you also also disabling keep alives?

@sjenning

This comment has been minimized.

Copy link
Author

@sjenning sjenning commented Nov 3, 2017

DisableKeepAlives=true was result of kubernetes/kubernetes#15733 which fixed kubernetes/kubernetes#15643

@WillSewell

This comment has been minimized.

Copy link

@WillSewell WillSewell commented Nov 20, 2017

Do you have a repro you can share? I cannot repro with https://play.golang.org/p/zKthX7Y9RW (from #16582).

I also can't reproduce it when running your example. I'm afraid I don't think I have time right now to boil it down to a minimal test case (I just tried and failed).

According to the comments, there is a known possible race with CloseIdleConnections (https://github.com/golang/net/blob/master/http2/client_conn_pool.go#L221). @WillSewell and @sjenning, can you verify that you're not calling CloseIdleConnections?

I'm not.

@WillSewell, are you also also disabling keep alives?

I'm not, although I do get the error when running your example with keep alives disabled.

@someonegg

This comment has been minimized.

Copy link

@someonegg someonegg commented Dec 1, 2017

If import "golang.org/x/net/http2" directly and use its ConfigureTransport function, the fix [7a62274] will not work.

Just like https://play.golang.org/p/8-KRho9KPH

The err is http2.ErrNoCachedConn instead of http2ErrNoCachedConn.

 func (pc *persistConn) shouldRetryRequest(req *Request, err error) bool {
 	if err == http2ErrNoCachedConn {
 		// Issue 16582: if the user started a bunch of
 		// requests at once, they can all pick the same conn
 		// and violate the server's max concurrent streams.
 		// Instead, match the HTTP/1 behavior for now and dial
 		// again to get a new TCP connection, rather than failing
 		// this request.
 		return true
 	}
 	...
}
@bradfitz

This comment has been minimized.

Copy link
Contributor

@bradfitz bradfitz commented Jan 10, 2018

@tombergan, I'll take this.

@gopherbot

This comment has been minimized.

Copy link

@gopherbot gopherbot commented Jan 10, 2018

Change https://golang.org/cl/87297 mentions this issue: http2: add internal function isNoCachedConnError to test for ErrNoCachedConn

@gopherbot

This comment has been minimized.

Copy link

@gopherbot gopherbot commented Jan 10, 2018

Change https://golang.org/cl/87298 mentions this issue: net/http: fix "http2: no cached connection..." error with x/net/http2

@bradfitz

This comment has been minimized.

Copy link
Contributor

@bradfitz bradfitz commented Jan 10, 2018

@someonegg, thanks!

gopherbot pushed a commit to golang/net that referenced this issue Jan 10, 2018
…hedConn

In a given program there may be two separate copies of
ErrNoCachedConn: the h2_bundle.go version in net/http, and the user's
golang.org/x/net/http2 version. We need to be able to detect either
in net/http.

This CL adds a function to report whether an error value represents
that type of error, and then a subsequent CL to net/http will use it
instead of ==.

Updates golang/go#22091

Change-Id: I86f1e20704eee29b8980707b700d7a290107dfd4
Reviewed-on: https://go-review.googlesource.com/87297
Reviewed-by: Tom Bergan <tombergan@google.com>
@gopherbot gopherbot closed this in bb4356d Jan 10, 2018
@xh3b4sd

This comment has been minimized.

Copy link

@xh3b4sd xh3b4sd commented Feb 2, 2018

For my interest, in which go version will this be fixed? How could I find out by myself?

c3mb0 pushed a commit to c3mb0/net that referenced this issue Apr 2, 2018
…hedConn

In a given program there may be two separate copies of
ErrNoCachedConn: the h2_bundle.go version in net/http, and the user's
golang.org/x/net/http2 version. We need to be able to detect either
in net/http.

This CL adds a function to report whether an error value represents
that type of error, and then a subsequent CL to net/http will use it
instead of ==.

Updates golang/go#22091

Change-Id: I86f1e20704eee29b8980707b700d7a290107dfd4
Reviewed-on: https://go-review.googlesource.com/87297
Reviewed-by: Tom Bergan <tombergan@google.com>
@golang golang locked and limited conversation to collaborators Feb 2, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
7 participants
You can’t perform that action at this time.