Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net: TCP RST handling is unreliable on OSX #50254

Open
marten-seemann opened this issue Dec 18, 2021 · 0 comments
Open

net: TCP RST handling is unreliable on OSX #50254

marten-seemann opened this issue Dec 18, 2021 · 0 comments

Comments

@marten-seemann
Copy link
Contributor

@marten-seemann marten-seemann commented Dec 18, 2021

What version of Go are you using (go version)?

$ go version
go version go1.17.5 darwin/amd64

Does this issue reproduce with the latest release?

Yes, tested with go.18beta1.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/marten/Library/Caches/go-build"
GOENV="/Users/marten/Library/Application Support/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/marten/src/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/marten/src/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/Cellar/go/1.17.5/libexec"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/Cellar/go/1.17.5/libexec/pkg/tool/darwin_amd64"
GOVCS=""
GOVERSION="go1.17.5"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/Users/marten/src/go/src/github.com/libp2p/go-libp2p/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -arch x86_64 -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/q0/b5ynf00142l7bl9sp8y098zr0000gn/T/go-build1068436186=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

I'm seeing occasional TCP connection timeouts, even though the other side has sent (and we have received) the TCP RST packet.

I managed to reproduce the failure using the following minimal working example. It reliably fails at least once when run 1000 times (go test -run TestLinger -count 1000 -v -failfast).

import (
	"fmt"
	"io"
	"math/rand"
	"net"
	"testing"
	"time"
)

func TestLinger(t *testing.T) {
	ln, err := net.ListenTCP("tcp4", nil)
	if err != nil {
		t.Fatal(err)
	}
	defer ln.Close()

	done := make(chan struct{})
	accepted := make(chan struct{})
	go func() {
		defer close(done)
		conn, err := ln.Accept()
		if err != nil {
			t.Error(err)
		}
		close(accepted)
		io.ReadAll(conn)
	}()

	conn, err := net.DialTCP("tcp4", nil, ln.Addr().(*net.TCPAddr))
	if err != nil {
		t.Fatal(err)
	}
	// This makes sure a TCP RST is sent when Close is called.
	if err := conn.SetLinger(0); err != nil {
		t.Fatal(err)
	}
	fmt.Printf("%s <-> %s\n", conn.LocalAddr(), conn.RemoteAddr())
	go func() {
		for {
			b := make([]byte, 1+rand.Intn(1000))
			_, err := conn.Write(b)
			if err != nil {
				return
			}
			time.Sleep(time.Duration(rand.Intn(1000)) * time.Microsecond)
		}
	}()

	<-accepted
	time.Sleep(time.Duration(rand.Intn(30)) * time.Millisecond)
	if err := conn.Close(); err != nil {
		t.Fatal(err)
	}

	select {
	case <-done:
	case <-time.After(10 * time.Second):
		t.Fatal("remote conn didn't close")
	}

What did you expect to see?

I expect the tcp.Conn to be reliably closed / reset when we receive the TCP RST packet.
This works reliably on Linux (incl. the code I posted above), but not on OSX.

What did you see instead?

Occasionally (the test case above fails in maybe 1 out of 100 runs), the TCP RST doesn't seem to have any effect on the connection at all. The connection stays open, and eventually (after a long time) runs into a connection timeout.

Here's a pcap of this transport:
tcprst.pcapng.gz, showing that the RST was actually (and received).

marten-seemann added a commit to libp2p/go-libp2p that referenced this issue Dec 18, 2021
TCP RST handling is flaky on OSX, see golang/go#50254.
We can avoid this test from randomly failing by using QUIC instead.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
2 participants