
x/net/http2: when StrictMaxConcurrentStreams enabled ClientConn.ReserveNewRequest() causes stalls in request processing #70809

Open
mmatczuk opened this issue Dec 12, 2024 · 4 comments · May be fixed by mmatczuk/xnet#1

@mmatczuk

Go version

go1.23.3

Output of go env in your module/workspace:

GO111MODULE='on'
GOARCH='arm64'
GOBIN=''
GOCACHE='/Users/michal/Library/Caches/go-build'
GOENV='/Users/michal/Library/Application Support/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='arm64'
GOHOSTOS='darwin'
GOINSECURE=''
GOMODCACHE='/Users/michal/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='darwin'
GOPATH='/Users/michal/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/Users/michal/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.23.3.darwin-arm64'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/Users/michal/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.23.3.darwin-arm64/pkg/tool/darwin_arm64'
GOVCS=''
GOVERSION='go1.23.3'
GODEBUG=''
GOTELEMETRY='local'
GOTELEMETRYDIR='/Users/michal/Library/Application Support/go/telemetry'
GCCGO='gccgo'
GOARM64='v8.0'
AR='ar'
CC='clang'
CXX='clang++'
CGO_ENABLED='1'
GOMOD='go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -ffile-prefix-map=/var/folders/7q/nyynjpwj5p19npby48ykjpx00000gn/T/go-build1425036568=/tmp/go-build -gno-record-gcc-switches -fno-common'

What did you do?

Please find a minimal reproducer below:

package reprod

import (
	"flag"
	"fmt"
	"net"
	"net/http"
	"sync"
	"testing"

	"golang.org/x/net/http2"
)

var reserve = flag.Bool("reserve", false, "Reserve new request")

type singleConnPool struct {
	cc *http2.ClientConn
}

func (s *singleConnPool) GetClientConn(req *http.Request, addr string) (*http2.ClientConn, error) {
	if (*reserve && s.cc.ReserveNewRequest()) || s.cc.CanTakeNewRequest() {
		return s.cc, nil
	}
	return nil, fmt.Errorf("no available connection")
}

func (s *singleConnPool) MarkDead(conn *http2.ClientConn) {
	panic("not implemented")
}

func TestReserveNewRequest(t *testing.T) {
	h := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})

	tr := &http2.Transport{
		AllowHTTP:                  true,
		StrictMaxConcurrentStreams: true,
		DisableCompression:         true,
	}

	c0, c1 := net.Pipe()

	s := http2.Server{
		MaxConcurrentStreams: 1,
	}
	go s.ServeConn(c0, &http2.ServeConnOpts{Handler: h})

	cc, err := tr.NewClientConn(c1)
	if err != nil {
		t.Fatal(err)
	}
	tr.ConnPool = &singleConnPool{cc: cc}
	client := http.Client{Transport: tr}

	var wg sync.WaitGroup
	for range 2 {
		wg.Add(1)
		go func() {
			defer wg.Done()
			t.Log("sending request")
			res, err := client.Get("http://foobar")
			if err != nil {
				t.Error(err)
				return
			}
			t.Log("got response", res.StatusCode)
		}()
	}
	wg.Wait()
}
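(For context: ClientConn.ReserveNewRequest reserves a concurrent-stream slot on the connection, and per its documentation the reservation is decremented on the next RoundTrip on that connection. With the server's MaxConcurrentStreams set to 1 and two concurrent requests, the -reserve path means each request holds a reservation until its headers are actually written.)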

What did you see happen?

Running it without the -reserve flag works as expected:

$ go test -race -v .
=== RUN   TestReserveNewRequest
    rep_test.go:61: sending request
    rep_test.go:61: sending request
    rep_test.go:66: got response 200
    rep_test.go:66: got response 200
--- PASS: TestReserveNewRequest (0.00s)
PASS

Running it with the -reserve flag set hangs:

$ go test -race -v . -reserve
=== RUN   TestReserveNewRequest
    rep_test.go:61: sending request
    rep_test.go:61: sending request

I tracked this bug down to this change: https://go-review.googlesource.com/c/net/+/617655

What did you expect to see?

I expect it to work as in x/net version 0.30.0.

@cagedmantis cagedmantis added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Dec 12, 2024
@cagedmantis cagedmantis added this to the Unreleased milestone Dec 12, 2024
@cagedmantis
Contributor

cc @neild @tombergan

@Choraden

Choraden commented Jan 7, 2025

I believe the deadlock happens in

func (cs *clientStream) writeRequest(req *http.Request, streamf func(*clientStream)) (err error) {

Following the reproducer above: the first request acquires cc.reqHeaderMu and cc.mu, decrements cc.streamsReserved, and then waits for an open slot.

select {
	case cc.reqHeaderMu <- struct{}{}:
	...
}

cc.mu.Lock()
...
cc.decrStreamReservationsLocked()
if err := cc.awaitOpenSlotForStreamLocked(cs); err != nil {

Looking further, we can see the request hanging on cc.cond.Wait() in awaitOpenSlotForStreamLocked. Although the cond wait releases cc.mu, cc.reqHeaderMu remains held.

// awaitOpenSlotForStreamLocked waits until len(streams) < maxConcurrentStreams.
// Must hold cc.mu.
func (cc *ClientConn) awaitOpenSlotForStreamLocked(cs *clientStream) error {
	for {
		if cc.closed && cc.nextStreamID == 1 && cc.streamsReserved == 0 {
			// This is the very first request sent to this connection.
			// Return a fatal error which aborts the retry loop.
			return errClientConnNotEstablished
		}
		cc.lastActive = cc.t.now()
		if cc.closed || !cc.canTakeNewRequestLocked() {
			return errClientConnUnusable
		}
		cc.lastIdle = time.Time{}
		if cc.currentRequestCountLocked() < int(cc.maxConcurrentStreams) {
			return nil
		}
		cc.pendingRequests++
		cc.cond.Wait()
		cc.pendingRequests--
		select {
		case <-cs.abort:
			return cs.abortErr
		default:
		}
	}
}

Since cc.reqHeaderMu is held by the first request, which is waiting on cc.cond, the second request hangs forever on:

select {
case cc.reqHeaderMu <- struct{}{}:
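To show the shape of this deadlock outside x/net, here is a minimal, self-contained sketch (hypothetical names; this illustrates the pattern, not the actual x/net internals). One goroutine holds a channel-based semaphore standing in for cc.reqHeaderMu while waiting on a sync.Cond for capacity, and the only goroutine that could release that capacity first needs the same semaphore. Running it trips Go's runtime deadlock detector:

package main

import "sync"

func main() {
	var (
		mu        sync.Mutex
		cond      = sync.NewCond(&mu)
		reqHeader = make(chan struct{}, 1) // stands in for cc.reqHeaderMu
		active    = 0                      // open streams, i.e. len(cc.streams)
		reserved  = 1                      // the second request's reservation
		limit     = 1                      // MaxConcurrentStreams
	)

	var wg sync.WaitGroup
	wg.Add(2)

	go func() { // "first request"
		defer wg.Done()
		reqHeader <- struct{}{} // take the header lock
		mu.Lock()
		// Reservation-aware count: active (0) + reserved (1) >= limit (1),
		// so we wait. cond.Wait releases mu, but reqHeader stays held.
		for active+reserved >= limit {
			cond.Wait()
		}
		mu.Unlock()
		<-reqHeader
	}()

	go func() { // "second request"
		defer wg.Done()
		reqHeader <- struct{}{} // blocks forever: the first request holds it
		mu.Lock()
		reserved-- // would release the reservation and wake the waiter...
		cond.Broadcast()
		mu.Unlock()
		<-reqHeader
	}()

	wg.Wait() // fatal error: all goroutines are asleep - deadlock!
}

With StrictMaxConcurrentStreams and MaxConcurrentStreams: 1, the reproducer reaches exactly this state.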

The client connection state confirms this:

=== RUN   TestReserveNewRequest
    rep_test.go:61: sending request
    rep_test.go:61: sending request
{Closed:false Closing:false StreamsActive:0 StreamsReserved:1 StreamsPending:1 MaxConcurrentStreams:1 LastIdle:0001-01-01 00:00:00 +0000 UTC}
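In other words, no stream is actually active; the pending request is blocked solely by the other request's reservation (StreamsReserved:1), and that reservation can never be released because releasing it requires the reqHeaderMu held by the waiting request.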

This behaviour appears to have existed before; it was exposed by golang/net@f35fec9#diff-e9bd9b4a514c2960ad85405e4a827d83c2decaaac70e9e0158a59db2d05807a7R1686, which (as far as I can tell) made the wait condition count reserved streams as well.

@mmatczuk
Author

mmatczuk commented Jan 7, 2025

Thanks @Choraden, your help is much appreciated.

The following patch seems to fix the issue.

diff --git a/http2/transport.go b/http2/transport.go
index 5d198ba..194f9d3 100644
--- a/http2/transport.go
+++ b/http2/transport.go
@@ -1683,7 +1683,7 @@ func (cc *ClientConn) awaitOpenSlotForStreamLocked(cs *clientStream) error {
                        return errClientConnUnusable
                }
                cc.lastIdle = time.Time{}
-               if cc.currentRequestCountLocked() < int(cc.maxConcurrentStreams) {
+               if int64(len(cc.streams)+cc.pendingResets) < int64(cc.maxConcurrentStreams) {
                        return nil
                }
                cc.pendingRequests++

The awaitOpenSlotForStreamLocked() function should not count reserved (waiting) requests, as they have no presence in the connection yet.
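For reference, a sketch of the accounting involved, paraphrasing the post-CL 617655 source (treat the exact body as an approximation):

// currentRequestCountLocked is the count that awaitOpenSlotForStreamLocked
// compares against maxConcurrentStreams. It includes streams that are merely
// reserved, which is how a queued request's reservation can block a request
// that already holds cc.reqHeaderMu.
func (cc *ClientConn) currentRequestCountLocked() int {
	return len(cc.streams) + cc.streamsReserved + cc.pendingResets
}

The patch keeps len(cc.streams)+cc.pendingResets in the comparison and drops cc.streamsReserved, so a reservation held by a queued request can no longer starve the request that already owns cc.reqHeaderMu.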

mmatczuk added a commit to mmatczuk/xnet that referenced this issue Jan 8, 2025
…bled

Fix regression introduced in CL 617655. The awaitOpenSlotForStreamLocked()
should not take into consideration reserved requests as they have no presence
in the connection.

Fixes golang/go#70809

Change-Id: Ia23968189cbdb44dae860a1de9d32700115b28b4