Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net: data race in (*TCPConn).File #24483

Closed
twmb opened this issue Mar 21, 2018 · 9 comments

Comments

Projects
None yet
5 participants
@twmb
Copy link
Contributor

commented Mar 21, 2018

What version of Go are you using (go version)?

go version go1.7.6 linux/amd64
go version go1.8.3 linux/amd64
go version go1.9.2 linux/amd64
go version go1.10 linux/amd64
go version devel +7974f0815e Mon Mar 19 21:51:23 2018 +0000 linux/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOCACHE="/home/twmb/.cache/go-build"
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/twmb/go"
GORACE=""
GOROOT="/home/twmb/go/go"
GOTMPDIR=""
GOTOOLDIR="/home/twmb/go/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build795784354=/tmp/go-build -gno-record-gcc-switches"

What did you do?

https://play.golang.org/p/7nPiZfCjbgq

What did you expect to see?

starting
done

What did you see instead?

starting
==================
WARNING: DATA RACE
Read at 0x00c420112c90 by goroutine 73:
  net.(*netFD).dup()
      /home/twmb/go/go/src/net/fd_unix.go:307 +0x40
  net.(*conn).File()
      /home/twmb/go/go/src/net/net.go:294 +0x4d
  main.main.func2()
      /home/twmb/testing/junk.go:29 +0x3a

Previous write at 0x00c420112c90 by goroutine 75:
  internal/poll.(*FD).destroy()
      /home/twmb/go/go/src/internal/poll/fd_unix.go:72 +0xa4
  internal/poll.(*FD).decref()
      /home/twmb/go/go/src/internal/poll/fd_mutex.go:211 +0x55
  internal/poll.(*FD).Close()
      /home/twmb/go/go/src/internal/poll/fd_unix.go:93 +0x6b
  net.(*netFD).Close()
      /home/twmb/go/go/src/net/fd_unix.go:184 +0x60
  net.(*conn).Close()
      /home/twmb/go/go/src/net/net.go:200 +0x60
  net.(*TCPConn).Close()
      <autogenerated>:1 +0x43
  main.main.func3()
      /home/twmb/testing/junk.go:35 +0x3e

Goroutine 73 (running) created at:
  main.main()
      /home/twmb/testing/junk.go:28 +0x13f

Goroutine 75 (finished) created at:
  main.main()
      /home/twmb/testing/junk.go:34 +0x16b
==================
==================
WARNING: DATA RACE
Read at 0x00c4204078af by goroutine 296:
  internal/poll.(*FD).Close()
      /home/twmb/go/go/src/internal/poll/fd_unix.go:99 +0x91
  net.(*netFD).Close()
      /home/twmb/go/go/src/net/fd_unix.go:184 +0x60
  net.(*conn).Close()
      /home/twmb/go/go/src/net/net.go:200 +0x60
  net.(*TCPConn).Close()
      <autogenerated>:1 +0x43
  main.main.func3()
      /home/twmb/testing/junk.go:35 +0x3e

Previous write at 0x00c4204078af by goroutine 263:
  internal/poll.(*FD).SetBlocking()
      /home/twmb/go/go/src/internal/poll/fd_unix.go:121 +0x8f
  net.(*netFD).dup()
      /home/twmb/go/go/src/net/fd_unix.go:316 +0x7d
  net.(*conn).File()
      /home/twmb/go/go/src/net/net.go:294 +0x4d
  main.main.func2()
      /home/twmb/testing/junk.go:29 +0x3a

Goroutine 296 (running) created at:
  main.main()
      /home/twmb/testing/junk.go:34 +0x16b

Goroutine 263 (running) created at:
  main.main()
      /home/twmb/testing/junk.go:28 +0x13f
==================
done

This is different from #24481 because one of the two races is present back in old versions of Go as well. The first (destroy + dup) is old. The second is likely from 0f3ab149ec4.

@bcmills bcmills added this to the Go1.11 milestone Mar 21, 2018

@bcmills bcmills changed the title net: concurrent Close with File is racy net: data race in (*TCPConn).File Mar 21, 2018

@bcmills bcmills added the NeedsFix label Mar 21, 2018

@bcmills bcmills changed the title net: data race in (*TCPConn).File internal/poll: data race in (*FD).SetBlocking Mar 21, 2018

@bcmills bcmills added NeedsDecision and removed NeedsFix labels Mar 21, 2018

@bcmills

This comment has been minimized.

Copy link
Member

commented Mar 21, 2018

net.Conn methods are explicitly safe for concurrent use, but File is not strictly a method of net.Conn (you had to type-assert to *net.TCPConn to call it).

Ideally File should be safe for concurrent use too, but it's not obvious to me what kind of thorny side-effects that would bring.

@bcmills

This comment has been minimized.

Copy link
Member

commented Mar 21, 2018

(CC @ianlancetaylor for polling internals.)

@twmb

This comment has been minimized.

Copy link
Contributor Author

commented Mar 21, 2018

As an aside, #24481 is more specific to SetBlocking being racy. This issue has an additional race on *FD reference counting between a simultaneous File and Close.

@bcmills

This comment has been minimized.

Copy link
Member

commented Mar 21, 2018

Ah, ok.

@bcmills bcmills changed the title internal/poll: data race in (*FD).SetBlocking net: data race in (*TCPConn).File Mar 21, 2018

@bcmills

This comment has been minimized.

Copy link
Member

commented Mar 21, 2018

Let's use #24481 for SetBlocking and focus this one on the reference counting:

==================
WARNING: DATA RACE
Read at 0x00c420112c90 by goroutine 73:
  net.(*netFD).dup()
      /home/twmb/go/go/src/net/fd_unix.go:307 +0x40
  net.(*conn).File()
      /home/twmb/go/go/src/net/net.go:294 +0x4d
  main.main.func2()
      /home/twmb/testing/junk.go:29 +0x3a

Previous write at 0x00c420112c90 by goroutine 75:
  internal/poll.(*FD).destroy()
      /home/twmb/go/go/src/internal/poll/fd_unix.go:72 +0xa4
  internal/poll.(*FD).decref()
      /home/twmb/go/go/src/internal/poll/fd_mutex.go:211 +0x55
  internal/poll.(*FD).Close()
      /home/twmb/go/go/src/internal/poll/fd_unix.go:93 +0x6b
  net.(*netFD).Close()
      /home/twmb/go/go/src/net/fd_unix.go:184 +0x60
  net.(*conn).Close()
      /home/twmb/go/go/src/net/net.go:200 +0x60
  net.(*TCPConn).Close()
      <autogenerated>:1 +0x43
  main.main.func3()
      /home/twmb/testing/junk.go:35 +0x3e

Goroutine 73 (running) created at:
  main.main()
      /home/twmb/testing/junk.go:28 +0x13f

Goroutine 75 (finished) created at:
  main.main()
      /home/twmb/testing/junk.go:34 +0x16b
==================
@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Mar 22, 2018

This is apparently a race between the File method and the Close method. I'm likely missing something but I don't see why a real program would have such a race. This is not the same as a race between Read and Close, where Read blocks for some period of time and it might be reasonable for the program to call Close while some other goroutine is blocked in Read. Neither File nor Close block, so there is no good reason to call them concurrently.

That said this is easy enough to fix if we think it's worth fixing. We just need to copy the dup method from net/fd_unix.go to internal/poll/fd_unix.go, change the net one to call the new one, and have the new one call incref/decref, like Shutdown or SetBlocking.

@twmb

This comment has been minimized.

Copy link
Contributor Author

commented Mar 22, 2018

For realistic context, we have a program and we want to check the tcp queue depth of long lived connections occasionally. We will be updating it to use SyscallConn, but historically, we had to use File to get the file descriptor that we could run sys_ioctl on through syscall. Rather than keeping around some mysterious *os.File, we called .File on the connection whenever we wanted to check the depth. It's lifetime was managed concurrently, though, and this race could happen.

I would be satisfied if documentation warned of the issue -- fixing this in the runtime guts is not strictly necessary.

@rsc

This comment has been minimized.

Copy link
Contributor

commented Mar 26, 2018

@ianlancetaylor says that this will be solved by #24481, which is fine. Note that this wouldn't otherwise be worth fixing necessarily: race between File and Close seems worth reporting, since the code is profoundly confused. If we can't report it, oh well.

@gopherbot

This comment has been minimized.

Copy link

commented Jun 20, 2018

Change https://golang.org/cl/119955 mentions this issue: os, net: avoid races between dup, set-blocking-mode, and closing

@gopherbot gopherbot closed this in 00eac89 Jun 22, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.