net: 0-byte read from TCP connection always return EOF #10940

Closed
huguesb opened this issue May 23, 2015 · 13 comments

@huguesb
Contributor

commented May 23, 2015

My understanding is that the recommended way to test whether a TCP connection has been closed by the remote peer is to do a 0-byte read from the socket, which would return EOF if the remote peer sent a FIN.

Unfortunately, that doesn't work in Go, as netFD.Read always returns EOF when 0 bytes are read, regardless of the requested read size.

Is that on purpose? Is there another way to detect reception of a FIN? This matters to me because I need to stream data to a write-only raw TCP endpoint that I don't control and the first Write on a closed connection often silently fails, potentially resulting in data loss.

@ianlancetaylor ianlancetaylor changed the title 0-byte read from TCP connection always return EOF net: 0-byte read from TCP connection always return EOF May 23, 2015

@ianlancetaylor

Contributor

commented May 23, 2015

Please write an example program to show what you mean. Also, you neglected to mention which version of Go and what operating system.

TCP connections are always in non-blocking mode, so they will never return a 0 byte read with no error except at EOF.

@huguesb

Contributor Author

commented May 23, 2015

I am using Go 1.4 on unix platforms (Linux and OSX). Below is the function I wrote to detect closed connections before attempting a Write:

func IsClosed(c net.Conn) bool {
    _, err := c.Read([]byte{})
    return err == io.EOF
}

Unfortunately, it always returns true. As far as I can tell, all versions of Go have a variation of the following:

func (fd *netFD) eofError(n int, err error) error {
    if n == 0 && err == nil && fd.sotype != syscall.SOCK_DGRAM && fd.sotype != syscall.SOCK_RAW {
        return io.EOF
    }
    return err
}

So when syscall.read returns n=0 bytes read, as it always will when 0 bytes are requested, TCPConn.Read will always return EOF.

Am I missing something?

edit: Interestingly enough, if the TCP connection is wrapped in a TLS connection, I have the reverse problem, i.e. I never get an EOF even after the connection is closed.

@minux

Member

commented May 24, 2015

I don't think a Read with a zero-length buffer is a valid way to determine
if you can write to a TCP connection (not just in Go). [Even though
it doesn't work in Go for another reason explained below.]

TCP connections are full duplex, and the read and write streams
can be shut down separately.

What if the remote end has shut down its write end, so you will
get an EOF on read, but the connection is still valid and you
can still write to it?

For example,
http://play.golang.org/p/MJ62fKYSAD
Even though the client has got an EOF, the connection is still
good for write.
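
The playground program itself is not reproduced in this thread. As a rough sketch of the same idea (not the linked code), the program below has a local server shut down only its write side with CloseWrite and keep reading. The helper name halfCloseDemo, the loopback address, and the message text are made up for illustration, and io.Discard assumes Go 1.16 or later.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"net"
)

// halfCloseDemo (hypothetical helper) returns the line the server received
// after the client had already read EOF from the same connection.
func halfCloseDemo() (string, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0") // port 0: let the kernel pick
	if err != nil {
		return "", err
	}
	defer ln.Close()

	type result struct {
		line string
		err  error
	}
	got := make(chan result, 1)

	go func() {
		c, err := ln.Accept()
		if err != nil {
			got <- result{err: err}
			return
		}
		defer c.Close()
		// Shut down only the server's write side. The client sees a FIN
		// (EOF on read), but the server keeps reading.
		if err := c.(*net.TCPConn).CloseWrite(); err != nil {
			got <- result{err: err}
			return
		}
		line, err := bufio.NewReader(c).ReadString('\n')
		got <- result{line: line, err: err}
	}()

	c, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		return "", err
	}
	defer c.Close()

	// Drain the connection; io.Copy returns a nil error once it hits EOF.
	if _, err := io.Copy(io.Discard, c); err != nil {
		return "", err
	}
	// The client has seen EOF, yet this write is still delivered.
	if _, err := c.Write([]byte("still writable\n")); err != nil {
		return "", err
	}
	r := <-got
	return r.line, r.err
}

func main() {
	line, err := halfCloseDemo()
	fmt.Printf("server received %q, err = %v\n", line, err)
}
```

Seeing EOF on the read side therefore says nothing about whether the peer will still accept data, which is the point being made here.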

The only way to determine whether you can write to a TCP
connection is to actually write data to it. Data loss should be
handled by higher level protocols.

BTW, according to Go's io.Reader definition, http://golang.org/pkg/io/#Reader
calling a reader with a zero-length buffer doesn't have well-defined
behavior: it could return 0, nil, or 0, EOF, or 0, some other error.

This is working as intended.

@minux minux closed this May 24, 2015

@huguesb

Contributor Author

commented May 24, 2015

Let me backtrack and give a little bit of background:

I have a service with an HTTP endpoint accepting JSON documents describing audit events to be fed to Splunk. That means I am stuck with TCP or TCP+SSL and do not have the option to punt detection of packet loss or broken connections to a higher-level application protocol.

This service is currently written in Java, using Netty and I was hoping I could rewrite it in go to reduce RAM and disk footprint.

My problem arises from the fact that Go does not offer any way for me to check the liveness of a TCP connection and will happily keep writing data into a closed socket.

You are asserting that such a behavior is normal because of shortcomings of TCP, yet Netty has no apparent trouble detecting dropped connections and failed writes, which makes your claim hard to believe. I understand that whatever Netty is doing is not 100% accurate but quite frankly, I'll take 90% any day.

Is low-level networking simply not a thing that go caters to?

@minux

Member

commented May 24, 2015

@minux

Member

commented May 24, 2015

@huguesb

Contributor Author

commented May 24, 2015

"But if that's correctly handled, why bother to check for close before the write?"

Because, as I've tried to explain, the write always succeeds. Or rather, it always silently fails, i.e. it returns a nil error and n == len(buf).

I have unit tests where a dummy endpoint closes the connection. I can see the FIN packet in wireshark. The client connection just doesn't seem to care. In some cases, if multiple subsequent sends are attempted, a RST may eventually be sent by the server and Go will finally realize that the connection is unusable.

The only reason I am looking for a way to manually check connection liveness (and in turn the only reason I asked a question about 0-byte read) is because of that surprising failure to honor the FIN in the first place.

As explained above, Netty, among others, is perfectly capable of detecting the FIN and reacting appropriately. I would expect go to be able to do something similar: at the very least, writes to a socket having received a FIN should fail. Ideally there should be a way to manually test for that condition before attempting the write but I can live without that as long as writes don't silently fail.

TOCTTOU is a race. The issue I am facing is not even close to being a race as evidenced by the fact that writes still silently fail several seconds after the socket is closed.

Is this behavior intentional? If so what is the rationale and is there any way to work around it without forking the net package?

@ianlancetaylor

Contributor

commented May 24, 2015

If a write succeeds with a non-zero-sized buffer on a closed connection, that is clearly a bug. Can you show a standalone program that demonstrates this?

@huguesb

Contributor Author

commented May 24, 2015

test code

package main

import (
    "bufio"
    "fmt"
    "net"
    "net/textproto"
    "sync"
    "testing"
    "time"
)

type Endpoint struct {
    l net.Listener
    c net.Conn
    w sync.WaitGroup
    d chan string
}

func NewEndpoint(addr string) *Endpoint {
    e := &Endpoint{}
    l, err := net.Listen("tcp", addr)
    if err != nil {
        panic(err)
    }
    e.l = l
    e.d = make(chan string)
    e.w.Add(1)
    go func() {
        defer e.w.Done()
        for {
            c, err := l.Accept()
            if err != nil {
                fmt.Println("closed listener")
                break
            }
            fmt.Println("accepted", c)
            e.c = c
            r := textproto.NewReader(bufio.NewReader(c))
            for {
                l, err := r.ReadLine()
                if err != nil {
                    fmt.Println("closed socket")
                    c.Close()
                    e.c = nil
                    break
                }
                e.d <- l
            }
        }
    }()
    return e
}

func (e *Endpoint) Stop() {
    fmt.Println("stop endpoint")
    e.l.Close()
    if e.c != nil {
        e.c.Close()
    }
    e.w.Wait()
}

func TestConn_Write_closed(t *testing.T) {
    addr := "127.0.0.1:8888"
    e := NewEndpoint(addr)

    c, err := net.Dial("tcp", addr)
    if err != nil {
        t.Log("failed to connect")
        t.FailNow()
    }
    c.(*net.TCPConn).SetNoDelay(true)

    _, err = c.Write([]byte("hello\n"))
    if err != nil {
        t.Log("write failed")
        t.Fail()
    }
    if "hello" != <-e.d {
        t.Log("corrupted message")
        t.Fail()
    }

    e.Stop()

    // wait to make sure client receives FIN
    time.Sleep(time.Second)

    _, err = c.Write([]byte("world\n"))
    if err == nil {
        t.Log("write to closed socket succeeded")
        t.Fail()
    }
}

output on my machine:

=== RUN TestConn_Write_closed
accepted &{{0xc2080101c0}}
stop endpoint
closed socket
closed listener
--- FAIL: TestConn_Write_closed (1.01s)
    bug_test.go:92: write to closed socket succeeded
FAIL
exit status 1
FAIL    aerofs.com/bug  1.010s

wireshark

@minux

Member

commented May 24, 2015

From the wireshark capture, the client doesn't know the server
has closed the connection; it just knows the server has closed its
write end.

For example, the tcpdump for my example
(http://play.golang.org/p/MJ62fKYSAD):

0.397171 IP client > server: Flags [S], seq 564126983, win 43690, options [..], length 0
0.397179 IP server > client: Flags [S.], seq 569645624, ack 564126984, win 43690, options [..], length 0
0.397187 IP client > server: Flags [.], ack 1, win 342, options [..], length 0
0.397227 IP server > client: Flags [P.], seq 1:10, ack 1, win 342, options [..], length 9
0.397231 IP client > server: Flags [.], ack 10, win 342, options [..], length 0
0.397241 IP server > client: Flags [F.], seq 10, ack 1, win 342, options [..], length 0
0.437071 IP client > server: Flags [.], ack 11, win 342, options [..], length 0
1.397365 IP client > server: Flags [P.], seq 1:13, ack 11, win 342, options [..], length 12
1.397371 IP server > client: Flags [.], ack 13, win 342, options [..], length 0
1.397767 IP client > server: Flags [F.], seq 13, ack 11, win 342, options [..], length 0
1.397773 IP server > client: Flags [.], ack 14, win 342, options [..], length 0

(For demonstration purposes, I added a one-second sleep to the
client code, so that the ACK to the server's FIN is not piggybacked
on the client's next write. I removed all the TCP options and
client/server IP:port in the dump for clarity.)

The first part looks almost exactly the same as your wireshark
capture. At time 1.397365, when the client is about to write, from
the client's standpoint, it really can't know whether the server can
accept the write or not (in your case, it can't, but in my case, it
can).

@huguesb

Contributor Author

commented May 24, 2015

Your example does not correspond to my use case: the server I have to deal with never writes to the client and your code doesn't close the read end of the server.

Even if I change my test such that

  • the server calls CloseWrite immediately after Accept (which, by the way, wouldn't be a solution even if it worked, because I have no control over the real endpoint)
  • the client calls Read until EOF before writing

the client still allows a write after the server calls Close.

@ianlancetaylor

Contributor

commented May 24, 2015

I don't see what Go can do here. Try using "go test -c" to build your test into an executable and then run "strace -f EXECUTABLE". You will see that the final write system call succeeds and the kernel reports it as succeeding. Go is just returning that success to the caller. The same thing would happen for a C program.

At the TCP level, you have closed one side of the TCP connection but not the other. The kernel will accept writes until both sides are closed. This is not as simple a matter as you seem to be expecting. I found a decent description of the issue at http://stackoverflow.com/questions/11436013/writing-to-a-closed-local-tcp-socket-not-failing .

You can get the result you want in your test program by adding this line just after the successful call to l.Accept:
c.(*net.TCPConn).SetLinger(0)

@golang golang locked and limited conversation to collaborators Jun 25, 2016

@bradfitz

Member

commented Sep 8, 2016

FWIW, the 0-byte-Read-always-returns-EOF was changed in Go 1.7 with 5bcdd63. But as of Go 1.7 it now always returns nil, which is at least a bit more Go-like, but still doesn't tell you whether FIN was ever seen arriving. There aren't great APIs anywhere (especially not portably) to know that without reading data.
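
Since a 0-byte read no longer signals anything, one workaround for a peer that never sends data is a 1-byte read with a short deadline. The sketch below is only an illustration under that assumption: peerClosed and probeDemo are hypothetical names, the timeouts are arbitrary, and any byte the peer does send is consumed by the probe. It also inherits the caveat raised earlier in the thread: EOF only means the peer closed its write side.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net"
	"time"
)

// peerClosed (hypothetical helper) reports whether a FIN has been seen on c,
// waiting at most `wait`. It assumes the peer never sends payload data.
func peerClosed(c net.Conn, wait time.Duration) (bool, error) {
	if err := c.SetReadDeadline(time.Now().Add(wait)); err != nil {
		return false, err
	}
	defer c.SetReadDeadline(time.Time{}) // clear the deadline afterwards

	var b [1]byte
	_, err := c.Read(b[:])
	var ne net.Error
	switch {
	case err == nil:
		return false, nil // a byte arrived (and was consumed): not closed
	case errors.Is(err, io.EOF):
		return true, nil // FIN received
	case errors.As(err, &ne) && ne.Timeout():
		return false, nil // nothing arrived within the deadline
	default:
		return true, err // e.g. connection reset: treat as unusable
	}
}

// probeDemo (hypothetical helper) probes a loopback connection before and
// after the server side closes it.
func probeDemo() (before, after bool, err error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return false, false, err
	}
	defer ln.Close()

	accepted := make(chan net.Conn, 1)
	go func() {
		c, err := ln.Accept()
		if err != nil {
			close(accepted)
			return
		}
		accepted <- c
	}()

	c, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		return false, false, err
	}
	defer c.Close()

	srv, ok := <-accepted
	if !ok {
		return false, false, errors.New("accept failed")
	}
	before, err = peerClosed(c, 50*time.Millisecond)
	srv.Close() // the server sends its FIN here
	if err != nil {
		return before, false, err
	}
	after, err = peerClosed(c, time.Second)
	return before, after, err
}

func main() {
	before, after, err := probeDemo()
	fmt.Println("closed before server Close:", before)
	fmt.Println("closed after server Close: ", after)
	fmt.Println("error:", err)
}
```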
