x/net/http2: client can hang forever if headers' size exceeds connection's buffer size and server hangs past request time #23559

Open
gwik opened this issue Jan 25, 2018 · 9 comments

@gwik (Contributor) commented Jan 25, 2018

What version of Go are you using (go version)?

go version go1.9.3 darwin/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/<redacted>/Dev/go"
GORACE=""
GOROOT="/usr/local/go"
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/cd/_7rcv5812531s2lswhn6kp680000gp/T/go-build436968147=/tmp/go-build -gno-record-gcc-switches -fno-common"
CXX="clang++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"

Also happened on linux/amd64, see below.

What did you do?

A bug in production showed an http2 client hanging for more than 250 seconds, ignoring the request context, which was set to time out after 5 seconds.
All possible timeouts are set on the connection and the TLS handshake, and I didn't see many dials (they are monitored).
Latency graph in ms: [screenshot from 2018-01-26]

Clients are using net/http2.Transport directly.

What did you expect to see?

The requests should have timed out after 5s.

What did you see instead?

No timeout, or a very long one (I believe the server reset the connection or TCP timed out).

The synchronous cc.writeHeaders write to the connection in ClientConn.roundTrip does not set any deadline, so it can block forever (or until TCP times out) if the server or network hangs:
https://github.com/golang/net/blob/0ed95abb35c445290478a5348a7b38bb154135fd/http2/transport.go#L833

I wrote a test that demonstrates this:
gwik/net@e4c191a

// TestTransportTimeoutServerHangs demonstrates that the client can hang forever,
// ignoring the context's deadline, when the server hangs and the headers exceed
// the conn buffer size (forcing a flush).
func TestTransportTimeoutServerHangs(t *testing.T) {
	clientDone := make(chan struct{})
	ct := newClientTester(t)
	ct.client = func() error {
		defer ct.cc.(*net.TCPConn).CloseWrite()
		defer close(clientDone)

		buf := make([]byte, 1<<19)
		_, err := rand.Read(buf)
		if err != nil {
			return fmt.Errorf("failed to generate random data: %v", err)
		}
		headerVal := hex.EncodeToString(buf)

		req, err := http.NewRequest("PUT", "https://dummy.tld/", nil)
		if err != nil {
			return err
		}

		ctx, cancel := context.WithTimeout(context.Background(), 1*time.Second)
		defer cancel()
		req = req.WithContext(ctx)
		req.Header.Add("Authorization", headerVal)
		_, err = ct.tr.RoundTrip(req)
		if err == nil {
			return errors.New("error should not be nil")
		}
		if ne, ok := err.(net.Error); !ok || !ne.Timeout() {
			return fmt.Errorf("error should be a net.Error with Timeout() == true, was: %+v", err)
		}
		return nil
	}
	ct.server = func() error {
		ct.greet()
		select {
		case <-time.After(5 * time.Second):
		case <-clientDone:
		}
		return nil
	}
	ct.run()
}

Setting the write deadline fixes the test:
gwik/net@052de95
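
Conceptually, the change is along these lines (a sketch only; the identifiers mirror the net/http2 internals quoted elsewhere in this issue, and the actual commit may differ):

// Sketch: arm a write deadline from the request context before the blocking
// header write, and disarm it afterwards so later writes are unaffected.
if deadline, ok := req.Context().Deadline(); ok {
	cc.tconn.SetWriteDeadline(deadline)
}
werr := cc.writeHeaders(cs.ID, endStream, int(cc.maxFrameSize), hdrs)
cc.tconn.SetWriteDeadline(time.Time{}) // zero value clears the deadline
if werr != nil {
	// surface the write error / timeout to the caller
}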

It seems to fail when the header value exceeds 1 MB. I might be missing something here: the default buffer size of bufio.Writer is 4096 bytes, so I had expected it to fail around that value; maybe compression and/or TCP buffers explain the difference.
Also, I don't think it sent 1 MB of headers when it failed in production; something else must have filled the buffer.
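
For what it's worth, the threshold is easy to probe outside http2: a writer to a peer that never reads only blocks once the bufio buffer and the kernel send/receive buffers are full. A minimal experiment (assumptions: localhost TCP, default socket buffer sizes; unrelated to the production service):

package main

import (
	"fmt"
	"log"
	"net"
	"time"
)

func main() {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		log.Fatal(err)
	}
	// Accept the connection but never read from it, like the hung server.
	go func() {
		conn, err := ln.Accept()
		if err != nil {
			return
		}
		_ = conn
		select {} // block forever
	}()

	conn, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		log.Fatal(err)
	}
	chunk := make([]byte, 64<<10)
	total := 0
	for {
		conn.SetWriteDeadline(time.Now().Add(2 * time.Second))
		n, err := conn.Write(chunk)
		total += n
		if err != nil {
			// Reports how much data the buffers absorbed before blocking.
			fmt.Printf("write blocked after ~%d KB: %v\n", total>>10, err)
			return
		}
	}
}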

The buffer is on the connection and shared among streams, so it can be filled by other requests on the same connection.

Besides this particular call, which is synchronous, no write or read on the connection has a deadline set. Can't this lead to goroutine leaks and http2 streams being stuck in the background?

@odeke-em odeke-em changed the title x/net/http2: client can hang forever x/net/http2: client can hang forever if headers' size exceeds connection's buffer size and server hangs past request time Jan 26, 2018
@gwik (Contributor, Author) commented Jan 26, 2018

Working on a patch to fix this, I realized that using the context deadline to time out the I/O calls may not be appropriate.
Timing out the I/O might leave the framing protocol in an inconsistent state, and the only safe option would be to close the whole connection. That is probably not what the caller intends when the context deadline expires. It would work if the context deadline were set just before doing the request, solely for this purpose, but that's usually not the case.
I think there should be an I/O timeout option on the transport dedicated to timing out the I/O calls: every I/O call would set the deadline to now + timeout and reset it after the call to Read or Write.
I'm going to implement this for now. Let me know what you think about it.
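
A minimal sketch of what I have in mind (illustrative only; ioTimeout is a hypothetical option, and this is not the actual patch): a net.Conn wrapper that arms a deadline before every Read/Write and disarms it afterwards. With x/net/http2, such a conn could be supplied through Transport.DialTLS.

package conntimeout

import (
	"net"
	"time"
)

// deadlineConn arms a per-call deadline around every Read and Write so a
// blocked I/O call eventually returns a timeout error.
type deadlineConn struct {
	net.Conn
	ioTimeout time.Duration
}

func (c *deadlineConn) Read(p []byte) (int, error) {
	if err := c.Conn.SetReadDeadline(time.Now().Add(c.ioTimeout)); err != nil {
		return 0, err
	}
	n, err := c.Conn.Read(p)
	c.Conn.SetReadDeadline(time.Time{}) // disarm after the call
	return n, err
}

func (c *deadlineConn) Write(p []byte) (int, error) {
	if err := c.Conn.SetWriteDeadline(time.Now().Add(c.ioTimeout)); err != nil {
		return 0, err
	}
	n, err := c.Conn.Write(p)
	c.Conn.SetWriteDeadline(time.Time{}) // disarm after the call
	return n, err
}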

gwik added a commit to gwik/net that referenced this issue Feb 3, 2018
Addresses the transport hanging on blocking I/O. There are many scenarios where
the round trip hangs on a write or read and won't be unblocked by the current
cancelation systems (context, Request.Cancel, ...).

This adds read and write deadline support.
The writer disables the read deadline and enables the write deadline; after
the write succeeds, it disables the write deadline and re-enables the read
deadline.
The read loop also sets its read deadline after a successful read, since the next
frame is not predictable.
This guarantees that an I/O call will not time out before IOTimeout, and that a
completely blocked connection will time out after at least IOTimeout.

See issue: golang/go#23559
gwik added a commit to gwik/net that referenced this issue Feb 3, 2018
@bradfitz bradfitz added this to the Go1.11 milestone Feb 3, 2018
gwik added a commit to gwik/net that referenced this issue Mar 9, 2018
gwik added a commit to znly/golang-x-net that referenced this issue Mar 26, 2018
ikenchina referenced this issue in gwik/net May 14, 2018
The transport can hang forever if the server hangs after accepting the
connection and the request headers (or control frames) exceed the connection
write buffer, without respect to the context's deadline.

The ClientConn writes to the connection without setting a write deadline and
thus blocks forever once the buffer is flushed to the socket.

go test -v -run TestTransportTimeoutServerHangs ./http2
@bradfitz (Member) commented Jul 12, 2018

Any update here?

Keep in mind that there are 3 goroutines per Transport connection to a server:

  • the writer (which might block forever)
  • the reader (blocked by default, waiting for frames from server)
  • the control one, which never blocks, and owns all state

I think the behavior we want is:

  • if the user said 5 seconds timeout on a request, the control channel should notice and return an error to the user's RoundTrip. I'd be surprised if this part isn't already working?
  • don't accept new streams (new RoundTrips) on that ClientConn if the conn has been stuck in a write for "too long", for some definition TBD
  • if the write appears totally hung, nuke the whole connection and all streams still attached to it
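
For the second point, one possible shape (a sketch with made-up names, not a proposal for the actual ClientConn code): have the writer record when a blocking write started, and have the admission check refuse new streams once that write has been in flight longer than some threshold.

package h2guard

import (
	"sync"
	"time"
)

// stuckWriteGuard tracks whether a blocking write has been in flight too long.
type stuckWriteGuard struct {
	mu           sync.Mutex
	writeStarted time.Time // zero when no write is in flight
}

func (g *stuckWriteGuard) beginWrite() {
	g.mu.Lock()
	g.writeStarted = time.Now()
	g.mu.Unlock()
}

func (g *stuckWriteGuard) endWrite() {
	g.mu.Lock()
	g.writeStarted = time.Time{}
	g.mu.Unlock()
}

// canTakeNewRequest reports whether the connection should admit a new stream.
func (g *stuckWriteGuard) canTakeNewRequest(stuckAfter time.Duration) bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	return g.writeStarted.IsZero() || time.Since(g.writeStarted) < stuckAfter
}
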
@bradfitz bradfitz modified the milestones: Go1.11, Unplanned Jul 12, 2018
@gwik (Contributor, Author) commented Jul 16, 2018

if the user said 5 seconds timeout on a request, the control channel should notice and return an error to the user's RoundTrip. I'd be surprised if this part isn't already working?

There are cases where this isn't working. I believe the graph and the tests demonstrate it.

don't accept new streams (new RoundTrips) on that ClientConn if the conn has been stuck in a write for "too long", for some definition TBD

Unfortunately, the writes almost never time out. I don't know if this is specific to my particular use case or if buffering makes it take very long to happen.

I run this CL in production for a very specific service that only makes requests to the APNS and Firebase push gateways.

The patch helps recover from stuck connections and kills them, but it's slow to converge, is incomplete, and may not work properly in the general case.

What happens is that there are situations where the APNS gateway blocks the connection; it behaves like a firewall that drops packets and never resets the connection. Most of the time the requests time out correctly: the context unblocks the client stream goroutine and returns to the caller. However, the connection read loop never times out nor receives any other error, causing all requests to time out. (Also note that the APNS gateway doesn't allow pings.)

What I see most often is that the writes are not timing out. I believe the only way to guarantee timing out the connection is to have a way to unblock the read loop. The reads (ClientConn) are asynchronous to the writes (ClientStream), so we have to find a way to add a deadline on the reads that follow the writes (requests), or that follow a read (e.g. reading the rest of the body), from the client stream side.

The CL takes a simple approach and avoids synchronization between the client stream and the connection read loop. It clears the read deadline and sets a write deadline before every write from the client stream; after the write is done, it sets the read deadline.
This kind of works: connections are no longer stuck forever, but it is slow to time out the connection, since resetting the read deadline before every write continuously pushes the deadline back.
Also, there is no read timeout when waiting for subsequent body reads, for example.

I wanted to experiment with the following algorithm (rough sketch below):

  1. Upon waiting for something from the network on the ClientStream / caller side, push a deadline into a heap data structure (probably under the ClientConn.wmu lock), keeping track of the stream that pushed it.
  2. If there was no deadline, unblock the read loop by setting one in the past, and set the deadline before the next read.
  3. Before every read, look at the root of the deadline heap and set it.
  4. After every read, remove the deadline of the stream that successfully completed the read. We need some sort of reverse index to find the entry in the heap and remove it (maybe every client stream can track its own read-deadline entry in the heap).
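
A rough sketch of the bookkeeping I have in mind (hypothetical types, not net/http2 code): each waiting stream registers its deadline in a heap with a reverse index, and the read loop always arms the earliest one before reading.

package h2deadlines

import (
	"container/heap"
	"sync"
	"time"
)

type entry struct {
	streamID uint32
	when     time.Time
	index    int // position in the heap, maintained via Swap
}

type deadlineHeap []*entry

func (h deadlineHeap) Len() int           { return len(h) }
func (h deadlineHeap) Less(i, j int) bool { return h[i].when.Before(h[j].when) }
func (h deadlineHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i]; h[i].index = i; h[j].index = j }

func (h *deadlineHeap) Push(x interface{}) {
	e := x.(*entry)
	e.index = len(*h)
	*h = append(*h, e)
}

func (h *deadlineHeap) Pop() interface{} {
	old := *h
	e := old[len(old)-1]
	*h = old[:len(old)-1]
	return e
}

// tracker keeps the pending read deadlines, one per waiting stream.
type tracker struct {
	mu       sync.Mutex
	heap     deadlineHeap
	byStream map[uint32]*entry // reverse index: stream ID -> heap entry
}

func newTracker() *tracker {
	return &tracker{byStream: make(map[uint32]*entry)}
}

// add registers a deadline when a stream starts waiting on the network.
func (t *tracker) add(streamID uint32, when time.Time) {
	t.mu.Lock()
	defer t.mu.Unlock()
	e := &entry{streamID: streamID, when: when}
	heap.Push(&t.heap, e)
	t.byStream[streamID] = e
}

// remove drops a stream's deadline once its read has completed.
func (t *tracker) remove(streamID uint32) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if e, ok := t.byStream[streamID]; ok {
		heap.Remove(&t.heap, e.index)
		delete(t.byStream, streamID)
	}
}

// next returns the earliest pending deadline; the read loop sets it on the
// conn before every Read (or sets one in the past to unblock a blocked Read).
func (t *tracker) next() (time.Time, bool) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if len(t.heap) == 0 {
		return time.Time{}, false
	}
	return t.heap[0].when, true
}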

What do you think?

gwik added a commit to znly/golang-x-net that referenced this issue Nov 14, 2018
@stevenh (Contributor) commented May 28, 2019

This is actually worse than just a single connection hanging: it can cause the entire app to hang.

The blocking calls cc.bw.Flush() and cc.fr.WriteHeaders(...) inside cc.writeHeaders(...) are made while holding cc.mu, which is required for cc.idleState(), called by p.getClientConn() while holding the pool lock.

This means a single connection can block all new requests potentially forever.
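
A self-contained illustration of that lock chain (made-up names, not the actual net/http code): once the writer blocks on the network while holding the per-connection mutex, the pool's admission path blocks behind it while still holding the pool mutex, and every other request then queues on the pool mutex.

package lockchain

import (
	"net"
	"sync"
)

var (
	poolMu sync.Mutex // guards the connection pool
	connMu sync.Mutex // guards per-connection state
)

// writeHeaders mimics cc.writeHeaders: it holds the connection mutex across a
// network write that has no deadline, so it can block indefinitely.
func writeHeaders(conn net.Conn, frame []byte) error {
	connMu.Lock()
	defer connMu.Unlock()
	_, err := conn.Write(frame) // may block forever
	return err
}

// getClientConn mimics the pool lookup: it needs the connection mutex (for the
// idleness check) while already holding the pool mutex. If the writer above is
// blocked, this blocks too, and with it every caller waiting on poolMu.
func getClientConn() {
	poolMu.Lock()
	defer poolMu.Unlock()
	connMu.Lock()
	defer connMu.Unlock()
	// ... idleness check would go here ...
}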

@gwik (Contributor, Author) commented May 29, 2019

@stevenh Could you please share a bit more context? What does your app do, which kind of services does it talk to, and when does this kind of behavior happen?

@stevenh (Contributor) commented May 29, 2019

I'd have to defer to a colleague to answer that, but the high-level basics are: it's making http2 calls and the entire app hung due to this.

The relevant stacks from the hung app are:

-- goroutine blocked in writeHeaders -> Flush
1 @ 0x43bddc 0x436eea 0x4364e7 0x4a201e 0x4a210d 0x4a3a04 0x577812 0x589dbd 0x5d1887 0x5d1da0 0x5d2cbb 0x662cea 0x6ad1d4 0x52bdce 0x6678dc 0x66680b 0x663215 0x662dda 0x6476b9 0x690a17 0x63b819 0x63b4ad 0x63cb4d 0xb1d000 0xe641df 0xe6213e 0xe8245c 0xe7f42e 0xe7ad28 0xe61e87 0xf598ed 0xf59637
#    0x4364e6    internal/poll.runtime_pollWait+0x56                                                    /usr/local/go/src/runtime/netpoll.go:173
#    0x4a201d    internal/poll.(*pollDesc).wait+0xad                                                    /usr/local/go/src/internal/poll/fd_poll_runtime.go:85
#    0x4a210c    internal/poll.(*pollDesc).waitWrite+0x3c                                                /usr/local/go/src/internal/poll/fd_poll_runtime.go:94
#    0x4a3a03    internal/poll.(*FD).Write+0x243                                                        /usr/local/go/src/internal/poll/fd_unix.go:227
#    0x577811    net.(*netFD).Write+0x51                                                            /usr/local/go/src/net/fd_unix.go:220
#    0x589dbc    net.(*conn).Write+0x6c                                                            /usr/local/go/src/net/net.go:188
#    0x5d1886    crypto/tls.(*Conn).write+0x1a6                                                        /usr/local/go/src/crypto/tls/conn.go:832
#    0x5d1d9f    crypto/tls.(*Conn).writeRecordLocked+0x35f                                                /usr/local/go/src/crypto/tls/conn.go:909
#    0x5d2cba    crypto/tls.(*Conn).Write+0x1ba                                                        /usr/local/go/src/crypto/tls/conn.go:1068
#    0x662ce9    net/http.http2stickyErrWriter.Write+0x79                                                /usr/local/go/src/net/http/h2_bundle.go:6803
#    0x6ad1d3    net/http.(*http2stickyErrWriter).Write+0x73                                                <autogenerated>:1
#    0x52bdcd    bufio.(*Writer).Flush+0x7d                                                        /usr/local/go/src/bufio/bufio.go:567
#    0x6678db    net/http.(*http2ClientConn).writeHeaders+0x1eb                                                /usr/local/go/src/net/http/h2_bundle.go:7436
#    0x66680a    net/http.(*http2ClientConn).RoundTrip+0x46a                                                /usr/local/go/src/net/http/h2_bundle.go:7289
#    0x663214    net/http.(*http2Transport).RoundTripOpt+0x144                                                /usr/local/go/src/net/http/h2_bundle.go:6858
#    0x662dd9    net/http.(*http2Transport).RoundTrip+0x39                                                /usr/local/go/src/net/http/h2_bundle.go:6820
#    0x6476b8    net/http.http2noDialH2RoundTripper.RoundTrip+0x38                                            /usr/local/go/src/net/http/h2_bundle.go:990
#    0x690a16    net/http.(*Transport).RoundTrip+0xd46                                                    /usr/local/go/src/net/http/transport.go:371
#    0x63b818    net/http.send+0x1a8                                                            /usr/local/go/src/net/http/client.go:249
#    0x63b4ac    net/http.(*Client).send+0xfc                                                        /usr/local/go/src/net/http/client.go:173
...

-- goroutine waiting for connection lock, held above, while holding the pool lock
1 @ 0x43bddc 0x43bece 0x44d2c4 0x44cfdd 0x48115e 0x66546f 0x6457d8 0x646dce 0x6631c7 0x662dda 0x6476b9 0x690a17 0x63b819 0x63b4ad 0x63cb4d 0xb1d000 0xe641df 0xe6213e 0xe8245c 0xe7f42e 0xe7ad28 0xe61e87 0xf598ed 0xf59637 0x46b1f1
#    0x44cfdc    sync.runtime_SemacquireMutex+0x3c                                                    /usr/local/go/src/runtime/sema.go:71
#    0x48115d    sync.(*Mutex).Lock+0xed                                                            /usr/local/go/src/sync/mutex.go:134
#    0x66546e    net/http.(*http2ClientConn).CanTakeNewRequest+0x3e                                            /usr/local/go/src/net/http/h2_bundle.go:7083
#    0x6457d7    net/http.(*http2clientConnPool).getClientConn+0xe7                                            /usr/local/go/src/net/http/h2_bundle.go:739
#    0x646dcd    net/http.http2noDialClientConnPool.GetClientConn+0x4d                                            /usr/local/go/src/net/http/h2_bundle.go:927
#    0x6631c6    net/http.(*http2Transport).RoundTripOpt+0xf6                                                /usr/local/go/src/net/http/h2_bundle.go:6852
#    0x662dd9    net/http.(*http2Transport).RoundTrip+0x39                                                /usr/local/go/src/net/http/h2_bundle.go:6820
#    0x6476b8    net/http.http2noDialH2RoundTripper.RoundTrip+0x38                                            /usr/local/go/src/net/http/h2_bundle.go:990
#    0x690a16    net/http.(*Transport).RoundTrip+0xd46
...

-- goroutines blocked waiting for the pool lock held above
13033 @ 0x43bddc 0x43bece 0x44d2c4 0x44cfdd 0x48115e 0x645754 0x646dce 0x6631c7 0x662dda 0x6476b9 0x690a17 0x63b819 0x63b4ad 0x63cb4d 0xb1d000 0xe641df 0xe6213e 0xe8245c 0xe7f42e 0xe7ad28 0xe61e87 0xf598ed 0xf59637 0x46b1f1
#   0x44cfdc    sync.runtime_SemacquireMutex+0x3c                                                   /usr/local/go/src/runtime/sema.go:71
#   0x48115d    sync.(*Mutex).Lock+0xed                                                         /usr/local/go/src/sync/mutex.go:134
#   0x645753    net/http.(*http2clientConnPool).getClientConn+0x63                                          /usr/local/go/src/net/http/h2_bundle.go:737
#   0x646dcd    net/http.http2noDialClientConnPool.GetClientConn+0x4d                                           /usr/local/go/src/net/http/h2_bundle.go:927
#   0x6631c6    net/http.(*http2Transport).RoundTripOpt+0xf6                                                /usr/local/go/src/net/http/h2_bundle.go:6852
#   0x662dd9    net/http.(*http2Transport).RoundTrip+0x39                                               /usr/local/go/src/net/http/h2_bundle.go:6820
#   0x6476b8    net/http.http2noDialH2RoundTripper.RoundTrip+0x38                                           /usr/local/go/src/net/http/h2_bundle.go:990
#   0x690a16    net/http.(*Transport).RoundTrip+0xd46                                                   /usr/local/go/src/net/http/transport.go:371
#   0x63b818    net/http.send+0x1a8                                                         /usr/local/go/src/net/http/client.go:249
#   0x63b4ac    net/http.(*Client).send+0xfc                                                        /usr/local/go/src/net/http/client.go:173
#   0x63cb4c    net/http.(*Client).Do+0x28c                                                     /usr/local/go/src/net/http/client.go:602
@evanj (Contributor) commented Jul 28, 2019

I believe we just hit this same issue, or an extremely similar one, with go1.12.7. I have thread stacks from the "blocked" threads from kill -ABRT below. To be clear: the caller of http.Client.Do is stuck in http2ClientConn.roundTrip on cc.wmu.Lock() so that it can call cc.writeHeaders. The holder of that lock is scheduleBodyWrite / http2clientStream.writeRequestBody, which is doing an actual network write that is hung.

As a result, Client.Do ignores any context deadline in this code path.

Possible fixes:

  • The RoundTrip goroutine can't try to lock cc.wmu directly, since it can get stuck on network I/O, so writing headers needs to be done from a separate goroutine so it can use something like awaitRequestCancel.
  • Don't attempt to reuse an HTTP/2 connection while another request is using it to send.

Details

Stripe Veneur periodically sends a set of HTTP2 POSTs and waits for them to complete. When it is CPU overloaded, we have observed these calls to Client.Do getting "stuck" for ~10 minutes, even though the HTTP requests set a 10-second deadline. I think the following is happening:

  1. One thread calls http.Client.Do, but then times out waiting for the response. However, it has already called scheduleBodyWrite.
  2. scheduleBodyWrite's goroutine holds cc.wmu while trying to send the POST body. Because the system is overloaded, this is now stuck waiting for some sort of TCP timer.
  3. Another thread starts http.Client.Do. It gets the same connection from the pool, since it is an HTTP2 connection that supports multiple streams.
  4. This eventually calls:
    cc.wmu.Lock()
    endStream := !hasBody && !hasTrailers
    werr := cc.writeHeaders(cs.ID, endStream, int(cc.maxFrameSize), hdrs)
    cc.wmu.Unlock()
  5. The client gets stuck waiting on cc.wmu.Lock() and will wait until the I/O operation completes, ignoring the context.
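
The root cause of step 5 can be shown in isolation: sync.Mutex.Lock has no notion of a context, so a goroutine queued behind a mutex that is held across a hung network write cannot be cancelled. A standalone illustration (not net/http code):

package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

func main() {
	var wmu sync.Mutex // stands in for cc.wmu

	// Body writer: holds the lock across a "network write" that hangs.
	wmu.Lock()
	go func() {
		time.Sleep(30 * time.Second) // simulates the hung conn.Write
		wmu.Unlock()
	}()

	// roundTrip for another request on the same connection: its context
	// expires after one second, but Lock cannot observe that.
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()

	start := time.Now()
	wmu.Lock() // blocks ~30s, ignoring ctx
	wmu.Unlock()
	fmt.Printf("acquired lock after %v; ctx err: %v\n", time.Since(start), ctx.Err())
}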

Stacks

The first thread is the writer, holding cc.wmu. The second is the OTHER writer, blocked waiting on cc.wmu. The third is the read loop, for completeness (this lets us see that the http2Framer belongs to the connection, so these are in fact the threads involved in this issue).

internal/poll.runtime_pollWait(0x7f2818062bb0, 0x77, 0xffffffffffffffff)
    /usr/local/go/src/runtime/netpoll.go:182 +0x56
internal/poll.(*pollDesc).wait(0xc003c06298, 0x77, 0x100, 0x19f, 0xffffffffffffffff)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:87 +0x9b
internal/poll.(*pollDesc).waitWrite(...)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:96
internal/poll.(*FD).Write(0xc003c06280, 0xc004ed7000, 0x401d, 0x401d, 0x0, 0x0, 0x0)
    /usr/local/go/src/internal/poll/fd_unix.go:276 +0x26e
net.(*netFD).Write(0xc003c06280, 0xc004ed7000, 0x401d, 0x401d, 0x401d, 0x4018, 0x0)
    /usr/local/go/src/net/fd_unix.go:220 +0x4f
net.(*conn).Write(0xc000013460, 0xc004ed7000, 0x401d, 0x401d, 0x0, 0x0, 0x0)
    /usr/local/go/src/net/net.go:189 +0x69
crypto/tls.(*Conn).write(0xc005b38e00, 0xc004ed7000, 0x401d, 0x401d, 0xc009306000, 0x4000, 0x40000)
    /usr/local/go/src/crypto/tls/conn.go:896 +0x160
crypto/tls.(*Conn).writeRecordLocked(0xc005b38e00, 0x15f4217, 0xc009306000, 0x10009, 0x40000, 0x3, 0x3, 0xc000c24daa)
    /usr/local/go/src/crypto/tls/conn.go:945 +0x2a1
crypto/tls.(*Conn).Write(0xc005b38e00, 0xc0092fe000, 0x10009, 0x48000, 0x0, 0x0, 0x0)
    /usr/local/go/src/crypto/tls/conn.go:1114 +0x184
net/http.http2stickyErrWriter.Write(0x7f281c2393f0, 0xc005b38e00, 0xc00427bad8, 0xc0092fe000, 0x10009, 0x48000, 0x1, 0xc01f3b20c0, 0xc000c24d40)
    /usr/local/go/src/net/http/h2_bundle.go:6792 +0x68
bufio.(*Writer).Write(0xc004686480, 0xc0092fe000, 0x10009, 0x48000, 0xc000052a70, 0xc000052a00, 0x0)
    /usr/local/go/src/bufio/bufio.go:622 +0x145
net/http.(*http2Framer).endWrite(0xc00330fb20, 0x0, 0x0)
    /usr/local/go/src/net/http/h2_bundle.go:1605 +0xab
net/http.(*http2Framer).WriteDataPadded(0xc00330fb20, 0x3bccb, 0xc0084ce000, 0x10000, 0x80000, 0x0, 0x0, 0x0, 0x1525493, 0xc000c24f88)
    /usr/local/go/src/net/http/h2_bundle.go:1927 +0x25d
net/http.(*http2Framer).WriteData(...)
    /usr/local/go/src/net/http/h2_bundle.go:1885
net/http.(*http2clientStream).writeRequestBody(0xc003362000, 0x7f281c239488, 0xc000084d70, 0x7f281c2394a8, 0xc000084d70, 0x0, 0x0)
    /usr/local/go/src/net/http/h2_bundle.go:7690 +0x337
net/http.(*http2Transport).getBodyWriterState.func1()
    /usr/local/go/src/net/http/h2_bundle.go:8878 +0xc2
created by net/http.http2bodyWriterState.scheduleBodyWrite
    /usr/local/go/src/net/http/h2_bundle.go:8925 +0xa3

goroutine 3263511 [semacquire, 10 minutes]:
sync.runtime_SemacquireMutex(0xc00427bad4, 0x40df00)
    /usr/local/go/src/runtime/sema.go:71 +0x3d
sync.(*Mutex).Lock(0xc00427bad0)
    /usr/local/go/src/sync/mutex.go:134 +0x109
net/http.(*http2ClientConn).roundTrip(0xc00427b980, 0xc005448300, 0x0, 0x0, 0x0, 0x0)
    /usr/local/go/src/net/http/h2_bundle.go:7444 +0x467
net/http.(*http2Transport).RoundTripOpt(0xc002094180, 0xc005448300, 0xc01ea24600, 0x6dedb6, 0x0, 0xc000914240)
    /usr/local/go/src/net/http/h2_bundle.go:6867 +0x159
net/http.(*http2Transport).RoundTrip(...)
    /usr/local/go/src/net/http/h2_bundle.go:6829
net/http.http2noDialH2RoundTripper.RoundTrip(0xc002094180, 0xc005448300, 0xc000838000, 0x5, 0xc0009142c8)
    /usr/local/go/src/net/http/h2_bundle.go:8959 +0x3f
net/http.(*Transport).roundTrip(0xc00038c480, 0xc005448300, 0xc02eee6240, 0xc029bcd9bc, 0xc029bcd9c0)
    /usr/local/go/src/net/http/transport.go:430 +0xe90
net/http.(*Transport).RoundTrip(0xc00038c480, 0xc005448300, 0xc00038c480, 0xbf4650ff70c64558, 0x459b8945e32c2)
    /usr/local/go/src/net/http/roundtrip.go:17 +0x35
net/http.send(0xc005448200, 0x1791b80, 0xc00038c480, 0xbf4650ff70c64558, 0x459b8945e32c2, 0x23e7c00, 0xc029e7a3a8, 0xbf4650ff70c64558, 0x1, 0x0)
    /usr/local/go/src/net/http/client.go:250 +0x461
net/http.(*Client).send(0xc0003aa660, 0xc005448200, 0xbf4650ff70c64558, 0x459b8945e32c2, 0x23e7c00, 0xc029e7a3a8, 0x0, 0x1, 0x0)
    /usr/local/go/src/net/http/client.go:174 +0xfb
net/http.(*Client).do(0xc0003aa660, 0xc005448200, 0x0, 0x0, 0x0)
    /usr/local/go/src/net/http/client.go:641 +0x279
net/http.(*Client).Do(...)
    /usr/local/go/src/net/http/client.go:509
github.com/stripe/veneur/http.PostHelper(0x17c4ce0, 0xc000492ed0, 0xc0003aa660, 0xc000440400, 0x1515d91, 0x4, 0xc000838000, 0x50, 0x135a860, 0xc000493200, ...)
    /go/src/github.com/stripe/veneur/http/http.go:229 +0xc94
github.com/stripe/veneur/sinks/datadog.(*DatadogMetricSink).flushPart(0xc0001e2240, 0x17c4ce0, 0xc000492ed0, 0xc034f00000, 0x4ebf, 0x4ebf, 0xc00821a850)
    /go/src/github.com/stripe/veneur/sinks/datadog/datadog.go:356 +0x30f
created by github.com/stripe/veneur/sinks/datadog.(*DatadogMetricSink).Flush
    /go/src/github.com/stripe/veneur/sinks/datadog/datadog.go:144 +0x40a

goroutine 1319 [IO wait, 13 minutes]:
internal/poll.runtime_pollWait(0x7f2818062bb0, 0x72, 0xffffffffffffffff)
    /usr/local/go/src/runtime/netpoll.go:182 +0x56
internal/poll.(*pollDesc).wait(0xc003c06298, 0x72, 0x1900, 0x199a, 0xffffffffffffffff)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:87 +0x9b
internal/poll.(*pollDesc).waitRead(...)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:92
internal/poll.(*FD).Read(0xc003c06280, 0xc003c1af80, 0x199a, 0x199a, 0x0, 0x0, 0x0)
    /usr/local/go/src/internal/poll/fd_unix.go:169 +0x19b
net.(*netFD).Read(0xc003c06280, 0xc003c1af80, 0x199a, 0x199a, 0x203003, 0x0, 0x198d)
    /usr/local/go/src/net/fd_unix.go:202 +0x4f
net.(*conn).Read(0xc000013460, 0xc003c1af80, 0x199a, 0x199a, 0x0, 0x0, 0x0)
    /usr/local/go/src/net/net.go:177 +0x69
crypto/tls.(*atLeastReader).Read(0xc00e6373e0, 0xc003c1af80, 0x199a, 0x199a, 0xc0126b6068, 0x2, 0xc0005929e0)
    /usr/local/go/src/crypto/tls/conn.go:761 +0x60
bytes.(*Buffer).ReadFrom(0xc005b39058, 0x178fae0, 0xc00e6373e0, 0x40b805, 0x1397300, 0x14c5ee0)
    /usr/local/go/src/bytes/buffer.go:207 +0xbd
crypto/tls.(*Conn).readFromUntil(0xc005b38e00, 0x1791ae0, 0xc000013460, 0x5, 0xc000013460, 0x1)
    /usr/local/go/src/crypto/tls/conn.go:783 +0xf8
crypto/tls.(*Conn).readRecordOrCCS(0xc005b38e00, 0x15f4200, 0xc005b38f38, 0xc000592d58)
    /usr/local/go/src/crypto/tls/conn.go:590 +0x125
crypto/tls.(*Conn).readRecord(...)
    /usr/local/go/src/crypto/tls/conn.go:558
crypto/tls.(*Conn).Read(0xc005b38e00, 0xc006189000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
    /usr/local/go/src/crypto/tls/conn.go:1236 +0x137
bufio.(*Reader).Read(0xc0003f3380, 0xc00330fb58, 0x9, 0x9, 0x407e54, 0xc028298480, 0xc000592d58)
    /usr/local/go/src/bufio/bufio.go:223 +0x23e
io.ReadAtLeast(0x178f920, 0xc0003f3380, 0xc00330fb58, 0x9, 0x9, 0x9, 0x178fcc0, 0xc02e520ad0, 0xc000084060)
    /usr/local/go/src/io/io.go:310 +0x88
io.ReadFull(...)
    /usr/local/go/src/io/io.go:329
net/http.http2readFrameHeader(0xc00330fb58, 0x9, 0x9, 0x178f920, 0xc0003f3380, 0x0, 0x0, 0xbf4650d0587ebd97, 0x4598cb66a5319)
    /usr/local/go/src/net/http/h2_bundle.go:1476 +0x88
net/http.(*http2Framer).ReadFrame(0xc00330fb20, 0xc03755a150, 0x0, 0x0, 0x1)
    /usr/local/go/src/net/http/h2_bundle.go:1734 +0xa1
net/http.(*http2clientConnReadLoop).run(0xc000592fb8, 0x15f3a90, 0xc0005427b8)
    /usr/local/go/src/net/http/h2_bundle.go:8102 +0x8f
net/http.(*http2ClientConn).readLoop(0xc00427b980)
    /usr/local/go/src/net/http/h2_bundle.go:8030 +0x76
created by net/http.(*http2Transport).newClientConn
    /usr/local/go/src/net/http/h2_bundle.go:7093 +0x637
@evanj (Contributor) commented Jul 28, 2019

It looks like CL https://go-review.googlesource.com/c/net/+/181457/ mentioned on issue #32388 will probably address my issue.
