Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/net/internal/socket: race detector not detecting ipv4.PacketConn.ReadFrom as writer #35329

Closed
yath opened this issue Nov 3, 2019 · 7 comments
Labels
Milestone

Comments

@yath
Copy link

@yath yath commented Nov 3, 2019

Hi,

I’ve been tracking down a race condition in my code that was caused by sharing a []byte buffer (or a subslice of it) between one "golang.org/x/net/ipv4".(*PacketConn).ReadFrom reader goroutine and another one processing it. I was surprised by the race detector not complaining about this at all, it seems like the ReadFrom isn’t detected as a writer to that buffer(’s data).

The same buffer sharing misuse is being reported when I use a plain copy() instead. Below are two test cases.

What version of Go are you using (go version)?

$ go version
go version go1.13.1 linux/amd64

Does this issue reproduce with the latest release?

Yes (tested with go.13.4 on arm64).

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/yath/.cache/go-build"
GOENV="/home/yath/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/yath/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/lib/go-1.13"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/lib/go-1.13/pkg/tool/linux_amd64"
GCCGO="/usr/bin/gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/home/yath/lect/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build357863678=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Incorrectly sharing a []byte buffer between goroutines and running go test -race. Here’s the sample test code, the first goroutine sends "1\n" through "9\n" in 10ms intervals through UDP localhost:1234. The second goroutine receives this data from a listening socket to a buffer, sending buf[:1] over a channel to the main goroutine that receives in 20ms intervals:

func TestRacePacketConn(t *testing.T) {
        type res struct {
                buf []byte
        }

        go func() {
                conn, err := net.Dial("udp", "127.0.0.1:12345")
                if err != nil {
                        t.Fatalf("can't dial localhost: %v", err)
                }
                for i := 0; i < 10; i++ {
                        time.Sleep(10 * time.Millisecond)
                        if _, err := fmt.Fprintf(conn, "%d\n", i); err != nil {
                                t.Fatalf("can't write %d: %v", i, err)
                        }
                }
        }()

        c := make(chan res, 1)
        go func() {
                conn, err := net.ListenPacket("udp", "127.0.0.1:12345")
                if err != nil {
                        t.Fatalf("can't listen on localhost: %v", err)
                }

                pc := ipv4.NewPacketConn(conn)
                defer pc.Close()

                buf := make([]byte, 100) // <- This buffer’s data should not be shared
                for i := 0; i < 10; i++ {
                        _, _, _, err := pc.ReadFrom(buf)
                        if err != nil {
                                t.Fatalf("can't read: %v", err)
                        }
                        c <- res{buf[:1]}
                }
                close(c)
        }()

        i := []byte{'0'}
        for r := range c {
                if b := r.buf[0]; b != i[0] {
                        t.Errorf("buf[0] = %v, want %v", b, i[0])
                }
                i[0]++
                time.Sleep(20 * time.Millisecond)
        }
}

What did you expect to see?

The tests failing and the race detector detecting a race, such like in this example that uses a plain copy() instead of PacketConn.ReadFrom:

func TestRaceCopy(t *testing.T) {
        type res struct {
                buf []byte
        }

        c := make(chan res, 1)
        go func() {
                buf := make([]byte, 100)
                for i := 0; i < 10; i++ {
                        time.Sleep(10 * time.Millisecond)
                        copy(buf, fmt.Sprintf("%d", i))
                        c <- res{buf[:1]}
                }
                close(c)
        }()

        i := []byte{'0'}
        for r := range c {
                if b := r.buf[0]; b != i[0] {
                        t.Errorf("buf[0] = %v, want %v", b, i[0])
                }
                i[0]++
                time.Sleep(20 * time.Millisecond)
        }
}
% go test -v -race
=== RUN   TestRaceCopy                     
==================
WARNING: DATA RACE
Write at 0x00c00001a0e0 by goroutine 9:
  runtime.slicestringcopy()
      /usr/lib/go-1.13/src/runtime/slice.go:232 +0x0
  github.com/yath/lect/rctest.TestRaceCopy.func1()
      /home/yath/lect/rctest/race_test.go:22 +0x108

Previous read at 0x00c00001a0e0 by goroutine 8:
  github.com/yath/lect/rctest.TestRaceCopy()
      /home/yath/lect/rctest/race_test.go:30 +0x12a
  testing.tRunner()
      /usr/lib/go-1.13/src/testing/testing.go:909 +0x199

Goroutine 9 (running) created at:
  github.com/yath/lect/rctest.TestRaceCopy()
      /home/yath/lect/rctest/race_test.go:18 +0x7a
  testing.tRunner()
      /usr/lib/go-1.13/src/testing/testing.go:909 +0x199

Goroutine 8 (running) created at:
  testing.(*T).Run()
      /usr/lib/go-1.13/src/testing/testing.go:960 +0x651
  testing.runTests.func1()
      /usr/lib/go-1.13/src/testing/testing.go:1202 +0xa6
  testing.tRunner()
      /usr/lib/go-1.13/src/testing/testing.go:909 +0x199
  testing.runTests()
      /usr/lib/go-1.13/src/testing/testing.go:1200 +0x521
  testing.(*M).Run()
      /usr/lib/go-1.13/src/testing/testing.go:1117 +0x2ff
  main.main()
      _testmain.go:46 +0x223
==================
--- FAIL: TestRaceCopy (0.21s)
    race_test.go:31: buf[0] = 51, want 50
    race_test.go:31: buf[0] = 52, want 51
    race_test.go:31: buf[0] = 53, want 52
    race_test.go:31: buf[0] = 54, want 53
    race_test.go:31: buf[0] = 55, want 54
    race_test.go:31: buf[0] = 56, want 55
    race_test.go:31: buf[0] = 57, want 56
    testing.go:853: race detected during execution of test

What did you see instead?

Just the tests failing, the race detector not having any issues.

--- FAIL: TestRacePacketConn (0.22s)
    race_test.go:80: buf[0] = 51, want 50
    race_test.go:80: buf[0] = 52, want 51
    race_test.go:80: buf[0] = 53, want 52
    race_test.go:80: buf[0] = 54, want 53
    race_test.go:80: buf[0] = 55, want 54
    race_test.go:80: buf[0] = 56, want 55
    race_test.go:80: buf[0] = 57, want 56
@randall77

This comment has been minimized.

Copy link
Contributor

@randall77 randall77 commented Nov 3, 2019

My guess is that there is some internal synchronization in (*PacketConn).ReadFrom that is masking the data race.
Probably x/net/ipv4/packet_go1_9.go:17, the c.rawOpt.RLock().

@yath

This comment has been minimized.

Copy link
Author

@yath yath commented Nov 4, 2019

I’m somehow ending up in ipv4.(*payloadHandler).ReadFrom (ipv4/payload_cmsg.go) in my test case, but that one has a similar locking that I have commented out for testing, without any change.

What did make the race detector trigger, though, was not passing the buffer directly to the syscall but allocating a temporary one for the struct iovec:

diff --git a/ipv4/payload_cmsg.go b/ipv4/payload_cmsg.go
index e761466..7250ae5 100644
--- a/ipv4/payload_cmsg.go
+++ b/ipv4/payload_cmsg.go
@@ -27,10 +27,12 @@ func (c *payloadHandler) ReadFrom(b []byte) (n int, cm *ControlMessage, src net.
        c.rawOpt.RUnlock()
        switch c.PacketConn.(type) {
        case *net.UDPConn:
-               m.Buffers = [][]byte{b}
+               buf := make([]byte, len(b), cap(b))
+               m.Buffers = [][]byte{buf}
                if err := c.RecvMsg(&m, 0); err != nil {
                        return 0, nil, nil, &net.OpError{Op: "read", Net: c.PacketConn.LocalAddr().Network(), Source: c.PacketConn.LocalAddr(), Err: err}
                }
+               copy(b, buf)
        case *net.IPConn:
                h := make([]byte, HeaderLen)
                m.Buffers = [][]byte{h, b}

Is it maybe the syscall in combination with the scatter-gather iovec that is masking the write?

@yath

This comment has been minimized.

Copy link
Author

@yath yath commented Nov 4, 2019

After digging through how I think -race works together with TSan, I wonder if the recvmsg() isn’t supposed to be hooked by lib/sanitizer_common/sanitizer_common_interceptors.inc somehow, but since socket.recvmsg() ends up doing syscall.Syscall(syscall.SYS_RECVMSG, …), there is no way for TSan to intercept this (or any?) syscall because it’s not going through libc?

@randall77

This comment has been minimized.

Copy link
Contributor

@randall77 randall77 commented Nov 4, 2019

Ok, I can reproduce by putting a buf[0]=0 statement just before the c.RecvMsg(&m, 0) call (@net/ipv4/payload_cmsg.go:31).
Without that statement, the race detector passes.
So you're right, RecvMsg somehow isn't marked as writing into its argument buffers.

@dvyukov : What is the right way to tell the race detector about reads and writes done by a syscall?

@dvyukov

This comment has been minimized.

Copy link
Member

@dvyukov dvyukov commented Nov 5, 2019

race.WriteRange(unsafe.Pointer(&p[0]), n)

@gopherbot

This comment has been minimized.

Copy link

@gopherbot gopherbot commented Nov 5, 2019

Change https://golang.org/cl/205461 mentions this issue: internal/socket: tell race detector about syscall reads and writes

@FiloSottile FiloSottile added the NeedsFix label Nov 5, 2019
@FiloSottile FiloSottile added this to the Go1.14 milestone Nov 5, 2019
@FiloSottile FiloSottile changed the title Race detector not detecting ipv4.PacketConn.ReadFrom as writer internal/socket: race detector not detecting ipv4.PacketConn.ReadFrom as writer Nov 5, 2019
@FiloSottile FiloSottile changed the title internal/socket: race detector not detecting ipv4.PacketConn.ReadFrom as writer x/net/internal/socket: race detector not detecting ipv4.PacketConn.ReadFrom as writer Nov 5, 2019
@FiloSottile FiloSottile modified the milestones: Go1.14, Unreleased Nov 5, 2019
@yath

This comment has been minimized.

Copy link
Author

@yath yath commented Nov 6, 2019

I can confirm that this CL fixes the issue for me and the race detector now also reports the bug in my non-test program correctly. Thanks for the quick reponse! 🙂

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.