New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net: add mechanism to wait for readability on a TCPConn #15735

Open
bradfitz opened this Issue May 18, 2016 · 40 comments

Comments

Projects
None yet
@bradfitz
Member

bradfitz commented May 18, 2016

EDIT: this proposal has shifted. See #15735 (comment) below.

Old:

The net/http package needs a way to wait for readability on a TCPConn without actually reading from it. (See #15224)

http://golang.org/cl/22031 added such a mechanism, making Read(0 bytes) do a wait for readability, followed by returning (0, nil). But maybe that is strange. Windows already works like that, though. (See new tests in that CL)

Reconsider this for Go 1.8.

Maybe we could add a new method to TCPConn instead, like WaitRead.

@bradfitz bradfitz added this to the Go1.8 milestone May 18, 2016

@bradfitz bradfitz self-assigned this May 18, 2016

@bradfitz

This comment has been minimized.

Member

bradfitz commented May 18, 2016

@gopherbot

This comment has been minimized.

gopherbot commented May 18, 2016

CL https://golang.org/cl/23227 mentions this issue.

gopherbot pushed a commit that referenced this issue May 19, 2016

net: don't return io.EOF from zero byte reads
Updates #15735

Change-Id: I42ab2345443bbaeaf935d683460fc2c941b7679c
Reviewed-on: https://go-review.googlesource.com/23227
Reviewed-by: Ian Lance Taylor <iant@golang.org>

gopherbot pushed a commit that referenced this issue May 19, 2016

net: don't return io.EOF from zero byte reads on Plan 9
Updates #15735.
Fixes #15741.

Change-Id: Ic4ad7e948e8c3ab5feffef89d7a37417f82722a1
Reviewed-on: https://go-review.googlesource.com/23199
Run-TryBot: Mikio Hara <mikioh.mikioh@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
@RalphCorderoy

This comment has been minimized.

RalphCorderoy commented May 20, 2016

read(2) with a count of zero may be used to detect errors. Linux man page confirms, as does POSIX's read(3p) here. Mentioning it in case it influences this subverting of a Read(0 bytes) not calling syscall.Read.

@bradfitz

This comment has been minimized.

Member

bradfitz commented Oct 21, 2016

I found a way to do without this in net/http, so punting to Go 1.9.

@bradfitz bradfitz modified the milestones: Go1.9, Go1.8 Oct 21, 2016

@bradfitz

This comment has been minimized.

Member

bradfitz commented Dec 12, 2016

Actually, the more I think about this, I don't even want my idle HTTP/RPC goroutines to stick around blocked in a read call. In addition to the array memory backed by the slice given to Read, the goroutine itself is ~4KB of wasted memory.

What I'd really like is a way to register a func() to run when my *net.TCPConn is readable (when a Read call wouldn't block). By analogy, I want the time.AfterFunc efficiency of running a func in a goroutine later, rather than running a goroutine just to block in a time.Sleep.

My new proposal is more like:

package net

// OnReadable runs f in a new goroutine when c is readable;
// that is, when a call to c.Read will not block.
func (c *TCPConn) OnReadable(f func()) {
   // ...
}

Yes, maybe this is getting dangerously into event-based programming land.

Or maybe just the name ("OnWhatever") is offensive. Maybe there's something better.

I would use this in http, http2, and grpc.

/cc @ianlancetaylor @rsc

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Dec 12, 2016

Sounds like you are getting close to #15021.

I'm worried that the existence of such a method will encourage people to start writing their code as callbacks rather than as straightforward goroutines.

@bradfitz

This comment has been minimized.

Member

bradfitz commented Dec 12, 2016

Yeah. I'm conflicted. I see the benefits and the opportunity for overuse.

@dvyukov

This comment has been minimized.

Member

dvyukov commented Jan 6, 2017

If we do OnReadable(f func()), won't we need to fork half of standard library for async style? Compress, io, tls, etc readers all assume blocking style and require a blocked goroutine.
I don't see any way to push something asynchronously into e.g. gzip.Reader. Does this mean that I have to choose between no blocked goroutine + my own gzip impl and blocked goroutine + std lib?

@dvyukov

This comment has been minimized.

Member

dvyukov commented Jan 6, 2017

Re 0-sized reads.
It should work with level-triggered notifications, but netpoll uses epoll in edge-triggered mode (and kqueue iirc). I am concerned if cl/22031 works in more complex cases: waiting for already ready IO, double wait, wait without completely draining read buffer first, etc?

@bradfitz

This comment has been minimized.

Member

bradfitz commented Jan 6, 2017

@dvyukov, no, we would only use OnReadable in very high-level places, like the http1 and http2 servers where we know the conn is expected to be idle for long periods of time. The rest of the code underneath would remain in the blocking style.

@dvyukov

This comment has been minimized.

Member

dvyukov commented Jan 6, 2017

This looks like a half-measure. An http connection can halt in the middle of request...

@bradfitz

This comment has been minimized.

Member

bradfitz commented Jan 6, 2017

@dvyukov, but not commonly. This would be an optimization for the common case.

@dvyukov

This comment has been minimized.

Member

dvyukov commented Jan 7, 2017

An alternative interface can be to register a channel that will receive readiness notifications. The other camp wants this for packet-processing servers, and there starting a goroutine for every packet will be too expensive. However, if at the end you want a goroutine, then the channel will introduce unnecessary overhead.
Channel has a problem with overflow handling (netpoll can't block on send, on the other hand it is not OK to lose notifications).
For completeness, this API should also handle writes.

@DemiMarie

This comment has been minimized.

DemiMarie commented Jan 10, 2017

We need to make sure that this works with Windows IOCP as well.

@rsc

This comment has been minimized.

Contributor

rsc commented Jan 10, 2017

Not obvious to me why the API has to handle writes. The thing about reads is that until the data is ready for reading, you can use the memory for other work. If you're waiting to write data, that memory is not reusable (otherwise you'd lose the data you are waiting to write).

@dvyukov

This comment has been minimized.

Member

dvyukov commented Jan 11, 2017

@rsc If we do just 0-sized reads, then write support is not necessary. However, if we do Brad's "My new proposal is more like": func (c *TCPConn) OnReadable(f func()), then this equally applies to writes as well -- to avoid 2 blocked goroutines per connection.

@noblehng

This comment has been minimized.

noblehng commented Feb 21, 2017

If memory usage is the concern, it is possible to make long parked G use less memory instead of changing programming style? One main selling point of Go to me is high efficiency network servers without resorting to callbacks.

Something like shrink the stack or move the stack to heap by the GC using some heuristics, that will be littile different from spinning up a new goroutine on callback memory usage wise, and scheduling wise a callback is not much different than goready(). Also I assume the liveness change in Go1.8 could help here too.

For the backing array, if it is preallocated buffer, than a callback doesn't make much different than Read(), maybe it will make some different if it is allocated per-callback and use a pool.

Edit:
Actually we could have some GC deadline or gopark time in runtime.pollDesc, so we could get a list of long parked G from the poller, then GC can kick in, but more dance is still needed to avoid race and make it fast.

@noblehng

This comment has been minimized.

noblehng commented Feb 22, 2017

How about a epoll like interface for net.Listener:

type PollableListener interface {
   net.Listener
   // Poll will block till at least one connection been ready for read or write
   // reads and writes are special net.Conn that will not block on EAGAIN
   Poll() (reads []net.Conn, writes []net.Conn)
}

Then the caller of Poll() can has a small number of goroutines to poll for readiness and handle the reads and writes. This should also works well for packet-processing servers.

Note that this only needs to be implemented in the runtime for those Listeners that multiplexed in the kernel, like the net.TCPListener. For other protocol that multiplex in the userspace and doesn't attached to the runtime poller directly, like udp listener or multiplexing streams in a tcp connection, can be implemented outside the runtime. For example, for multiplexing in a tcp connection, we can implemented the epoll like behavior by read from/write to some buffers then poll from them or register callbacks on buffer size changed.

Edit:
To implement this, we can let users of the runtime poller, like socket and os.File, provide a callback function pointer when open the poller for a fd, to notify them the readiness of I/O. The callback should
looks like:

type IOReadyNotify func(mode int32)

And we store this in the runtime.pollDesc, then the runtime.netpollready() function should also call this callback if not nil besides give out the pending goroutine(s).

@aajtodd

This comment has been minimized.

aajtodd commented Feb 27, 2017

I'm fairly new to Go but seeing the callback interface is a little grating given the blocking API exposed everywhere else. Why not expose a public API to the netpoll interfaces?

Go provides no standard public facing event loop (correct me if I'm wrong please). I have need to wait for readability on external FFI socket(s) (given through cgo). It would be nice to re-use the existing netpoll abstraction to also spawn FFI sockets onto rather than having to wrap epoll/IOCP/select. Also I'm guessing wrapping (e.g) epoll from the sys package does not integrate with the scheduler which would also be a bummer.

@mjgarton

This comment has been minimized.

Contributor

mjgarton commented Mar 15, 2017

For a number of my use cases, something like this :

package net

// Readable returns a channel which can be read from whenever a call to c.Read
// would not block.
func (c *TCPConn) Readable() <-chan struct{} {
        // ...
}

.. would be nice because I can select on it. I have no idea whether it's practical to implement this though.

Another alternative (for some of my cases at least) might be somehow enabling reads to be canceled by using a context.

@gobwas

This comment has been minimized.

gobwas commented Apr 19, 2017

Hi! Great idea 🎆

In our go server with >2millions of alive WebSocket connections we were used epoll to bring the same functionality.

epoll.Handle(conn, EPOLLIN | EPOLLONESHOT, func(events uint) {
    // No more calls will be made for conn until we call epoll.Resume().
    ...
    epoll.Resume(conn)
})

That is, it could be useful to make callback call only once after registration.
In our case we've used EPOLLONESHOT to make this work.

@methane

This comment has been minimized.

Contributor

methane commented Nov 30, 2018

I want API for nonblocking Read, or nonblocking Readable test.
Neither blocking wait nor callback don't satisfy my use case.

My use case is DB driver (go-sql-driver/mysql).
Basically, it is request-response protocol. But sometime connection is closed by server or someone in the middle.
If we send query, we can't retry safely because query is not idempotent. So we want to check connection
is closed or not before sending query. Pseudo code is here.

func (mc mysqlConn) Query(query string, args... driver.Value) error {
    if mc.conn.Readable() {
        // Connection is closed from server.
        return mc.handleServerGone()
    }

    mc.sendQuery(query, args...)
    return mc.recvResult()
}

Adding one or more goroutine makes the driver more complicated and slower.

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Nov 30, 2018

@methane I'm sorry, I don't understand your example: if Readable returns true, I would not expect the connection to be closed. And, even if you omitted a !, if Readable returns false, that could just mean that the other side hasn't written any data; it doesn't mean that the connection is closed. Determining whether it is possible to read from a socket is not the same as determining whether the socket is closed.

@methane

This comment has been minimized.

Contributor

methane commented Nov 30, 2018

@ianlancetaylor Note that how we get EOF. We need to call conn.Read().

As I said above, what I want is "nonblocking Read, or nonblocking Readable test."

@dvyukov

This comment has been minimized.

Member

dvyukov commented Nov 30, 2018

Won't write on a closed/broken connection return an error? If yes, then you could just write the request always. If write succeeded, request was sent. If write failed, request wasn't send.
This has obvious races, but it's the same as for your code.

@methane

This comment has been minimized.

Contributor

methane commented Nov 30, 2018

@dvyukov No. Even though remote peer closed connection, conn.Write() will success.
It's because TCP allows "half close". (I hadn't tested it on Unix socket yet)

@dvyukov

This comment has been minimized.

Member

dvyukov commented Nov 30, 2018

If the database does not want to receive requests anymore, shouldn't it close the other half? The one that will fail write on the client?

@methane

This comment has been minimized.

Contributor

methane commented Nov 30, 2018

@dvyukov DB closes full connection at socket API level. But it doesn't affect TCP.
Until client closes connection (including half close of write side), Write() will success.

@methane

This comment has been minimized.

Contributor

methane commented Nov 30, 2018

@dvyukov

This comment has been minimized.

Member

dvyukov commented Nov 30, 2018

Yup, write fails as expected:

$ go run /tmp/test.go
0 Read 0 (EOF)
0 Written 6 (<nil>)
1 Read 0 (EOF)
1 Written 0 (write tcp 127.0.0.1:42246->127.0.0.1:39591: write: broken pipe)
2 Read 0 (EOF)
2 Written 0 (write tcp 127.0.0.1:42246->127.0.0.1:39591: write: broken pipe)

I would expect it to be the other way around. Readability test should always check local read buffer first. So if some data is already received, but remote end is closed, readability test will always succeed, but write will fail.

@methane

This comment has been minimized.

Contributor

methane commented Nov 30, 2018

Yup, write fails as expected:

0 Read 0 (EOF)
0 Written 6 (<nil>)

You send 6 bytes, even after you received EOF.

@dvyukov

This comment has been minimized.

Member

dvyukov commented Nov 30, 2018

Why does kernel do this? Is it a kernel bug? The connection is already in CLOSE_WAIT state. This makes read fail, but not write...

@methane

This comment has been minimized.

Contributor

methane commented Nov 30, 2018

Because there are no difference between half close (CloseWrite) and full close on TCP layer.
Both send just one FIN packet.
So client don't know remote peer will receive data or not.

When client send data packet, server will return RST packet.
That's why second send returns EPIPE. Here is tcpdump of the sample program.

tcpdump: listening on lo, link-type EN10MB (Ethernet), capture size 262144 bytes
19:45:09.516488 IP (tos 0x0, ttl 64, id 21231, offset 0, flags [DF], proto TCP (6), length 60)
    localhost.33704 > localhost.11451: Flags [SEW], cksum 0xfe30 (incorrect -> 0x751c), seq 30788114, win 43690, options [mss 65495,sackOK,TS val 4103978601 ecr 0,nop,wscale 7], length 0
19:45:09.516497 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    localhost.11451 > localhost.33704: Flags [S.E], cksum 0xfe30 (incorrect -> 0x0a34), seq 2698581878, ack 30788115, win 43690, options [mss 65495,sackOK,TS val 4103978601 ecr 4103978601,nop,wscale 7], length 0
19:45:09.516505 IP (tos 0x0, ttl 64, id 21232, offset 0, flags [DF], proto TCP (6), length 52)
    localhost.33704 > localhost.11451: Flags [.], cksum 0xfe28 (incorrect -> 0xdcb8), ack 1, win 342, options [nop,nop,TS val 4103978601 ecr 4103978601], length 0
19:45:09.516529 IP (tos 0x0, ttl 64, id 36746, offset 0, flags [DF], proto TCP (6), length 52)
    localhost.11451 > localhost.33704: Flags [F.], cksum 0xfe28 (incorrect -> 0xdcb7), seq 1, ack 1, win 342, options [nop,nop,TS val 4103978601 ecr 4103978601], length 0
19:45:09.616632 IP (tos 0x2,ECT(0), ttl 64, id 21234, offset 0, flags [DF], proto TCP (6), length 58)
    localhost.33704 > localhost.11451: Flags [P.], cksum 0xfe2e (incorrect -> 0x9869), seq 1:7, ack 2, win 342, options [nop,nop,TS val 4103978701 ecr 4103978601], length 6
19:45:09.616641 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40)
    localhost.11451 > localhost.33704: Flags [R], cksum 0x4929 (correct), seq 2698581880, win 0, length 0
@dvyukov

This comment has been minimized.

Member

dvyukov commented Nov 30, 2018

Interesting. Thanks.

But readability is also not what you want, right? Readability will say OK is connection is half closed the other way, or fully closed but have local read data queued.

@methane

This comment has been minimized.

Contributor

methane commented Nov 30, 2018

"Readability" meant "conn.Read() will return without block".
MySQL protocol is basically "request-response" protocol. Until we send a query, server don't send
anything in most case. But there are exceptions: server shut down, server (or Linux or router) closed idle connection, etc.

In such cases, we don't want send query because query may be not idempotent.
That's why I want non-blocking Read.

@DemiMarie

This comment has been minimized.

DemiMarie commented Dec 2, 2018

@methane

This comment has been minimized.

Contributor

methane commented Dec 2, 2018

@DemiMarie It's off topic.

@dbskccc

This comment has been minimized.

dbskccc commented Dec 8, 2018

wow, it takes so long to make such a decision,
for long connected tcp server which connections above c10k, the memory and cpu was wasted so much by one connection one goroutine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment