
net/http: EOF returned from http.Transport #53472

Open

mtrmac opened this issue Jun 21, 2022 · 13 comments
Labels: NeedsInvestigation (Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.)

@mtrmac commented Jun 21, 2022

What version of Go are you using (go version)?

$ go version
go version go1.18.3 darwin/amd64

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/mitr/Library/Caches/go-build"
GOENV="/Users/mitr/Library/Application Support/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/mitr/Go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/mitr/Go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
GOVCS=""
GOVERSION="go1.18.3"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/Users/mitr/Go/src/github.com/containers/image/go.mod"
GOWORK=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -arch x86_64 -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/tp/yfcwvlb55vx8lkv5gppb43cm0000gn/T/go-build3737075247=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

Given an HTTP server that reads the request, but (cleanly) closes the connection without producing any response:

package main

import (
	"fmt"
	"log"
	"net"
	"net/http"
)

func server(ln net.Listener) {
	for {
		conn, err := ln.Accept()
		if err != nil {
			return
		}
		log.Printf("%v: Accepted", conn.RemoteAddr())
		var buf [4096]byte          // Hopefully enough for a full header
		n, err := conn.Read(buf[:]) // Completely read and ignore the header
		if err != nil {
			panic(err)
		}
		log.Printf("%v: Read %d", conn.RemoteAddr(), n)
		err = conn.Close()
		if err != nil {
			panic(err)
		}
		log.Printf("%v: Closed", conn.RemoteAddr())
	}
}

func main() {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	go server(ln)

	res, err := http.Get(fmt.Sprintf("http://%s/", ln.Addr().String()))
	fmt.Printf("res=%#v, err=%v (%#v)", res, err, err)
}

What did you expect to see?

An error saying something about an unexpectedly closed connection.

What did you see instead?

2022/06/21 02:28:30 127.0.0.1:64135: Accepted
2022/06/21 02:28:30 127.0.0.1:64135: Read 96
2022/06/21 02:28:30 127.0.0.1:64135: Closed
res=(*http.Response)(nil), err=Get "http://127.0.0.1:64134/": EOF (&url.Error{Op:"Get", URL:"http://127.0.0.1:64134/", Err:(*errors.errorString)(0xc000098060)})

i.e. the error is io.EOF, which seems inconsistent with the official definition of that value:

Functions should return EOF only to signal a graceful end of input. If the EOF occurs unexpectedly in a structured data stream, the appropriate error is either ErrUnexpectedEOF or some other error giving more detail.

Notes

I appreciate that this might not be possible to change due to the compatibility promise.

The immediate cause is the 1-byte peek in net/http/transport.go, _, err := pc.br.Peek(1); if that returns io.EOF, it is wrapped as transportReadFromServerError{err}, but later only unwrapped back into a raw io.EOF, with no logic anywhere to turn it into a "this was unexpected" error.

@seankhliao (Member) commented:
that's not a raw EOF, it's a net/url.Error

@mtrmac (Author) commented Jun 21, 2022

Oops, you’re completely correct. I was too focused on the user-visible text rather than on the type.

@mtrmac (Author) commented Jun 21, 2022

At least using the http.Transport implementation directly (keeping the same server function), the call really does return a raw io.EOF:

func main() {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	go server(ln)

	req, err := http.NewRequest(http.MethodGet, fmt.Sprintf("http://%s/", ln.Addr().String()), nil)
	if err != nil {
		panic(err)
	}
	res, err := http.DefaultTransport.RoundTrip(req)
	fmt.Printf("res=%#v, err=%v (%#v) isEOF=%v\n", res, err, err, err == io.EOF)
}

prints

res=(*http.Response)(nil), err=EOF (&errors.errorString{s:"EOF"}) isEOF=true

@mtrmac (Author) commented Jun 21, 2022

To be specific, my primary concern is the lack of a clear user-visible error message. Right now, error messages in logs just say "EOF", with little to allow diagnosing the failure further: it doesn't indicate the relevant component (client / server / proxy / hypothetically something else, like a corrupt local (certificate?) file).

I can, of course, wrap the http.Do call with a check for a *url.Error wrapping io.EOF, and replace the text, but … what's a reasonable long-term replacement text? Hard-coding knowledge about this particular code path might not be accurate if such an "EOF" string could be returned by any other error path, and it is brittle with respect to any future changes to the net/http implementation.

So I’d prefer receiving a more descriptive error message from the net/http client.

@seankhliao changed the title from "net/http: EOF returned from http.Client.Do" to "net/http: EOF returned from http.Transport" Jun 22, 2022
@seankhliao (Member) commented:

cc @neild

@seankhliao added the NeedsInvestigation label Jun 22, 2022
@seankhliao added this to the Unplanned milestone Aug 20, 2022
@ckilb commented Nov 23, 2022

I'd just like to mention that I'm also struggling here.
My HTTP client is built like this:

tr := http.DefaultTransport.(*http.Transport).Clone()
tr.ExpectContinueTimeout = 10 * time.Second
tr.DisableKeepAlives = true
tr.IdleConnTimeout = 10 * time.Second

client := http.Client{
    Timeout:   10 * time.Second,
    Transport: tr,
}

In a loop I create a new request and pass it to the client:

req, err := http.NewRequest("GET", u, nil)
...
resp, err := c.client.Do(req)

Most of the time err is nil - which is expected behaviour because the URL is valid and a valid response should be delivered. I test this in parallel with my browser.

But in a few cases I get back an EOF error, without any further information.
I've already spent a few hours debugging, and the origin seems to be somewhere in
net/http/transport.go: (pc *persistConn) readLoop()

In there a transportReadFromServerError is created with closeErr = "http: persistConn.readLoop exiting" and err = EOF.

It's very likely somehow related to the host. In parallel I do exactly the same against another (local Go) server, and I never get that EOF error.

Still, I have no idea what's wrong here. I guess I have to dig even more into it.

I'd just like to mention that it would be really, really nice to get better error messages here, especially because the error chain doesn't carry a stack trace.

@neild self-assigned this Nov 28, 2022
@lpoli mentioned this issue Apr 24, 2023
UnseenWizzard added a commit to Dynatrace/dynatrace-configuration-as-code that referenced this issue Aug 23, 2023
…error

For some reason the stdlib http client reports an unexpected closed connection as an io.EOF with no
further information.
This resulted in connection problems (like an internal firewall/proxy/connection config blocking
connections to a Dynatrace SaaS URL) to just be reported as EOF to users.

This fix makes some assumptions about the internals of the http client and may break in the future,
but if the client is changed to no longer return io.EOF errors it will hopefully return a more
helpful error instead.

see also: golang/go#53472
UnseenWizzard added a commit to Dynatrace/dynatrace-configuration-as-code-core that referenced this issue Aug 23, 2023
…error

For some reason the stdlib http client reports an unexpected closed connection as an io.EOF with no
further information.
This results in connection problems (like an internal firewall/proxy/connection config blocking
connections to a Dynatrace SaaS URL) to just be reported as EOF to users.

This fix makes some assumptions about the internals of the http client and may break in the future,
but if the client is changed to no longer return io.EOF errors it will hopefully return a more
helpful error instead.

see also: golang/go#53472
@G-M-twostay commented Aug 12, 2024

I'm also experiencing this issue, even locally. It can happen when many connections are made.

var DefaultTransport = http.Transport{
	DisableCompression:     true,
	IdleConnTimeout:        4 * time.Second,
	ResponseHeaderTimeout:  1 * time.Second,
	MaxResponseHeaderBytes: 256,
	MaxIdleConns:           16,
}

The error message isn't very helpful because it's a plain "EOF". I read the server-side log, and it seems the client closes the connection when this "EOF" happens.

I was wondering if it's related to the server side closing an idle connection just as the client side establishes the connection.

@dntam00 commented Nov 15, 2024

@G-M-twostay yes, based on the docs:

// transportReadFromServerError is used by Transport.readLoop when the
// 1 byte peek read fails and we're actually anticipating a response.
// Usually this is just due to the inherent keep-alive shut down race,
// where the server closed the connection at the same time the client
// wrote. The underlying err field is usually io.EOF or some
// ECONNRESET sort of thing which varies by platform. But it might be
// the user's custom net.Conn.Read error too, so we carry it along for
// them to return from Transport.RoundTrip.
type transportReadFromServerError struct {
	err error
}

You should make the client's idle connection timeout less than the value configured on the server (or load balancer).

@timoffex commented:

Copying over from my issue, which got marked as a duplicate of this one:


I maintain a library that has a portion written in Go. Every once in a while, our users encounter issues caused by HTTP requests failing with just the message "EOF", for instance

{"time":"2024-10-09T11:33:30.483338864-04:00","level":"DEBUG","msg":"[ERR] POST ---URL--- request failed: Post "---URL---": EOF"}

The outer portion of this message ("[ERR] POST ... failed:") comes from the hashicorp/retryablehttp package. The inner portion is from net/http.

The message is unhelpful and does not help me narrow down the issue, other than that it's a protocol problem (which is also not at all obvious). Users who experience this tend to have unique network configurations, and failures are consistent but don't happen for all kinds of HTTP requests made by the client (certain requests consistently work, and other requests consistently fail). Without more context, it's very hard to help users debug their configurations.

It would be massively helpful if net/http simply used fmt.Errorf("...: %w", err) in a few places. Likewise for x/net/http2 (I don't know how to tell which we're using).


It seems this issue has been open a long time. Are there any plans to fix this, or any way to contribute a fix? Also, is there any advice for how to debug this sort of problem?

@timoffex commented:

To add to this, we have a user who partially fixed the EOF by using a build of the library downgraded to Go 1.22. We don't see anything in the Go release notes or the GODEBUG history that looks relevant (the TLS changes are suspicious, but I've seen them result in better error messages, such as "tls: failed to parse certificate from server: x509: negative serial number"). If it's a regression in the Go standard library, then without a useful error message I can't file a useful bug report.

@ianlancetaylor (Member) commented:

@timoffex

any way to contribute a fix?

Patches are certainly welcome. See https://go.dev/doc/contribute. Thanks.

is there any advice for how to debug this sort of problem?

The first post in this issue points to some code that may be at fault.

@timoffex commented Dec 11, 2024

Did a little bit of digging through the source. Like @mtrmac identified, it looks like the error comes from the 1-byte Peek() in readLoop(). In fact, it doesn't look like readLoop() can return an io.EOF directly at all, and transportReadFromServerError is the only error it returns that could wrap an io.EOF.

persistConn.readLoop() is invoked from persistConn.roundTrip(). This maps the readLoop() error, but it does not unwrap it and also cannot return an io.EOF to the caller. Instead, the unwrapping happens in Transport.roundTrip():

	} else if !pconn.shouldRetryRequest(req, err) {
		// Issue 16465: return underlying net.Conn.Read error from peek,
		// as we've historically done.
		if e, ok := err.(nothingWrittenError); ok {
			err = e.error
		}
		if e, ok := err.(transportReadFromServerError); ok {
			err = e.err
		}
		if b, ok := req.Body.(*readTrackingBody); ok && !b.didClose {
			// Issue 49621: Close the request body if pconn.roundTrip
			// didn't do so already. This can happen if the pconn
			// write loop exits without reading the write request.
			req.closeBody()
		}
		return nil, err
	}

which references #16465. It appears that the code unwrapping transportReadFromServerError was added in ea2376f due to #16465. The related issue #13667 has a repro for the problem: a server that accepts all TCP connections and immediately closes them will result in io.EOF.


EDIT: It turned out the Go version was a red herring. The user was experiencing dropped connections because of a misconfigured "maximum transmission unit" (MTU) setting, causing packets beyond a certain size to get dropped. This manifested as slow network speeds and occasional EOF errors due to connections getting closed.

Why would this start happening for a user who upgraded from a version of our library compiled with Go 1.22 to one with Go 1.23? The TLS changes in the 1.22->1.23 version bump seem suspect. One more clue is that one particular user only sees io.EOF errors when they don't use an HTTP proxy. Our library connects to a server program that has a different release schedule, and it's possible the user is still running a server compiled with Go 1.22.

Maybe TLS differences between Go 1.22 and 1.23 could cause a Go 1.22 HTTP server to close TCP connections from a Go 1.23 HTTP client under some conditions?

Another path to explore is whether any net.Conn instance could return an io.EOF for any reason other than the remote server closing the connection.


Leaving these notes here in case it helps others debug this error. Unfortunately, wrapping the EOF may be impossible due to backward compatibility requirements. But if it only has a single meaning (an unexpectedly closed connection), this meaning could perhaps be documented on Transport.RoundTrip, along with debugging advice.

@timoffex commented:

Another way for an EOF to manifest is described here: 5dd372b

If we try to reuse a connection that the server is in the process of
closing, we may end up successfully writing out our request (or a
portion of our request) only to find a connection error when we try to
read from (or finish writing to) the socket. This manifests as an EOF
returned from the Transport's RoundTrip.

Our requests are not idempotent (we use POST), but we retry EOF errors and see more EOFs, so it's unlikely to be a connection reuse-related race.
