Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

os, internal/poll: fd.Read with read deadline on fifo before writer has opened returns EOF prematurely #44176

Open
jaqx0r opened this issue Feb 9, 2021 · 9 comments

Comments

@jaqx0r
Copy link
Contributor

@jaqx0r jaqx0r commented Feb 9, 2021

What version of Go are you using (go version)?

$ go version
go version go1.15.6 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/jaq/.cache/go-build"
GOENV="/home/jaq/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/jaq/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/jaq/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/lib/go-1.15"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/lib/go-1.15/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build328231033=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Create a fifo with os.Mkfifo, then open the reader in nonblocking, RDONLY mode. Start a goroutine to read the pipe, with a SetReadDeadline,. in a loop until EOF is received.

Open the writer end and write a single line, then close the file.

A race condition exists in such code where the read is scheduled before the write, and in just the right conditions, the reader returns 0 bytes read and EOF before the write succeeds.

https://play.golang.org/p/cZ9A_FCur43 has a demonstration that uses wg2 to force the race condition. With the race disabled (the race is still there, just harder to trigger) the writer opens the file before the read goroutine is scheduled and the reader gets all the data.

In the errant case (race enabled), the reader gets scheduled first. The write end of the pipe has not been opened yet, and according to man pipe(7) and read(2) the reader should get an EAGAIN, not EOF. EOF is to signal to the reader that the writer has closed the pipe.

What did you expect to see?

When reading from a nonblocking named pipe on Linux before the writer has opened the pipe, one expects to get an EAGAIN error. https://man7.org/linux/man-pages/man2/read.2.html

I think that the correct behaviour for Go, having handled EAGAIN, is to return n = 0, err = nil from fd.Read when a read deadline is set, which would align with pipe(7) in as much as EOF remains reserved as a signal that the writer has closed the pipe.

Otherwise (as is presently the case) there is no way to correctly shut down a pipe reader and handle this race correctly.

What did you see instead?

The reader gets EOF too early. This looks like a problem with how fd.eofError() in internal/poll/fd_posix.go is implemented, and/or how ZeroReadIsEOF is set for nonblocking reads in os/file_unix.go.

@qingyunha
Copy link
Contributor

@qingyunha qingyunha commented Feb 9, 2021

openat(AT_FDCWD, "/tmp/fifo", O_RDONLY|O_NONBLOCK|O_CLOEXEC) = 3
read(3, "", 4096)                        = 0

The above strace log show that read(2) not return error when write end of the pipe has not been opened.

@jaqx0r
Copy link
Contributor Author

@jaqx0r jaqx0r commented Feb 9, 2021

I tried to reproduce your comment by copygin my play.golang.org example above into bug.go, commenting out the demo(false) line in main, then go build bug.go and strace ./bug > /dev/null.

I see:

write(1, "\nRace\n", 6)                 = 6
mknodat(AT_FDCWD, "/tmp/fifo", S_IFIFO|0600) = -1 EEXIST (File exists)
openat(AT_FDCWD, "/tmp/fifo", O_RDONLY|O_NONBLOCK|O_CLOEXEC) = 3
epoll_create1(EPOLL_CLOEXEC)            = 4
pipe2([5, 6], O_NONBLOCK|O_CLOEXEC)     = 0
epoll_ctl(4, EPOLL_CTL_ADD, 5, {EPOLLIN, {u32=5884208, u64=5884208}}) = 0
epoll_ctl(4, EPOLL_CTL_ADD, 3, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=3547467400, u64=139637229227656}}) = 0
fcntl(3, F_GETFL)                       = 0x8800 (flags O_RDONLY|O_NONBLOCK|O_LARGEFILE)
fcntl(3, F_SETFL, O_RDONLY|O_NONBLOCK|O_LARGEFILE) = 0
futex(0xc000080148, FUTEX_WAKE_PRIVATE, 1) = 1
read(3, "", 4096)                       = 0
write(1, "n: 0, err: EOF\n", 15)        = 15
write(1, "b: []\n", 6)                  = 6
futex(0xc000042548, FUTEX_WAKE_PRIVATE, 1) = 1
epoll_ctl(4, EPOLL_CTL_DEL, 3, 0xc0000a1dac) = 0
close(3)                                = 0
futex(0x56d9e8, FUTEX_WAIT_PRIVATE, 0, NULL

at the point it hangs on wg2.Wait() as expected. But what I see differently to you is that there's several epoll calls in between the openat and the read. I don't understand what is happening here but I can guess it is because of the SetReadDeadline and the implementation of file IO in Go for Linux being based on epoll. I can make a wild guess that in an epoll-triggered read, there must be data waiting on the fd, rather than my earlier assumption of calling read on a not-ready-yet fd which would trigger EAGAIN.

But the result is the same -- if the read did not error (strace shows it didn't return -1) then the fd is not closed, and thus the higher level implementation should not be returning io.EOF in this case.

@dmitshur dmitshur changed the title fd.Read with read deadline on named pipe before writer has opened returns EOF prematurely os, internal/poll: fd.Read with read deadline on named pipe before writer has opened returns EOF prematurely Feb 9, 2021
@dmitshur dmitshur added this to the Backlog milestone Feb 9, 2021
@dmitshur
Copy link
Contributor

@dmitshur dmitshur commented Feb 9, 2021

@ianlancetaylor ianlancetaylor removed this from the Backlog milestone Feb 9, 2021
@ianlancetaylor ianlancetaylor added this to the Go1.17 milestone Feb 9, 2021
@jaqx0r
Copy link
Contributor Author

@jaqx0r jaqx0r commented Mar 16, 2021

What sort of help do you need? :-) Would you like a change that implements what I suggested, or are you looking for an expert third party to approve of the proposed change first?

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Mar 16, 2021

A change that implements the suggestion, along with a test, would be fine. Thanks.

@jaqx0r
Copy link
Contributor Author

@jaqx0r jaqx0r commented Mar 19, 2021

I've drafted a test based on my playground code above, and so far have managed to figure out that this is going to be harder than I thought.

jaqx0r added a commit to google/mtail that referenced this issue Mar 19, 2021
The test was correct, but due to golang/go#44176 we must make sure the writer
end is open first.

Issue #486
@jaqx0r
Copy link
Contributor Author

@jaqx0r jaqx0r commented Mar 31, 2021

I'm stuck again, confusing myself in the depths of internal/poll and now runtime/netpoll.go

I'm reminded that golang/go/20280 is the reason I'm exploring this path. I have a fifo that I want to read from until either EOF or a context is cancelled. In order to notice that the context is cancelled, I use SetReadDeadline on the fifo, hence the strange construct in the playground example.

The earlier statements about expecting EAGAIN on read was incorrect, because Go runtime is using epoll in the runtime, so we should expect to see EPOLLHUP on the reader while the write end is still closed. I think the EPOLLHUP gets eaten in netpoll_epoll.go:164 and the intent of "the writer isn't there" never gets passed up, but I haven't verified yet.

(For the benefit of the tape, the best way to debug this that I've found is after having built a local go fresh from Git (~/src/go/go for me) use the new install's go to install delve (~/src/go/go/bin/go install github...delve@latest), and then cd into ~/src/go/go/src/os and run PATH=~/src/go/go/bin:$PATH GOROOT=~/src/go/go dlv test to run the current test with the correct toolchain and be able to recompile the test as you modify it to explore the code.)

Meanwhile I've noticed #20280 (comment) which I think is a better approach than setting a read deadline every time just to test for cancellation, so while I think there's still a bug here, I think I am no longer dependent on it.

@qingyunha
Copy link
Contributor

@qingyunha qingyunha commented Apr 1, 2021

Could you open fifo in blocking mode, that should avoid the race?

@jaqx0r jaqx0r changed the title os, internal/poll: fd.Read with read deadline on named pipe before writer has opened returns EOF prematurely os, internal/poll: fd.Read with read deadline on fifo before writer has opened returns EOF prematurely Apr 1, 2021
@jaqx0r
Copy link
Contributor Author

@jaqx0r jaqx0r commented Apr 1, 2021

The problem I have is nearly identical to #20280, where one does not want to open in blocking mode because one wants a context cancellation to interrupt the read. The suggestions from that bug report have indeed helped -- the magic is knowing you don't need to set a read deadline on the read before you start the read.

@ianlancetaylor ianlancetaylor removed this from the Go1.17 milestone May 4, 2021
@ianlancetaylor ianlancetaylor added this to the Backlog milestone May 4, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
4 participants