
Server timeouts fixed since #302, then people notice their bad configuration. #657

Closed
vmg opened this Issue Sep 6, 2017 · 24 comments

@vmg

vmg commented Sep 6, 2017

Hey everyone!

After more than two years of being blocked, PR #302 was merged just two weeks ago. It purportedly fixes issues where queries were wrongly "retried" (i.e. run twice by the database/sql client even though the original query went through). To accomplish this, it reduces the number of places where the driver returns a driver.ErrBadConn error: it only returns this value when we're 100% sure that the database has not received the original query, because returning ErrBadConn forces the database/sql client to retry the query altogether on a different connection, hence leading to redundant/repeated queries.

In practice, however, this breaks the driver in any situation where the MySQL server can kill connections on the server side. Let's analyse why and try to find a workaround for this issue.

Reproduction

We're gonna make the driver crash with a pretty simple reproduction recipe.

First, we need a MySQL server configured to time out idle connections. We're gonna use MySQL 5.7 and simply set global wait_timeout=3;. This will make mysqld kill any connection that hasn't been active (i.e. sending queries) for 3 seconds. This whole example also works with any maintained MySQL release, including 5.6, and all MariaDB releases. The 3 second timeout is a bit aggressive, but it's similar to what we run in our main MySQL cluster here at GitHub.

With this MySQL server in place, we can reproduce the issue with a few lines of Go code, using the master branch of go-sql-driver/mysql:

// Inside a Go test; assumes `dsn` points at the server configured above,
// and that the driver is blank-imported for registration:
//   import _ "github.com/go-sql-driver/mysql"
db, err := sql.Open("mysql", dsn)
assert.NoError(t, err)
defer db.Close()

assert.NoError(t, db.Ping())

// Wait for 5 seconds. This should be enough to time out the conn, since `wait_timeout` is 3s.
time.Sleep(5 * time.Second)

// Simply attempt to begin a transaction.
tx, err := db.Begin()
assert.NoError(t, err)
_ = tx

The result of pinging the DB, sleeping for 5 seconds, and then attempting to begin a transaction is a crash:

Error:          Received unexpected error: invalid connection

It is clear that this is not the right behaviour. The mysqld instance is still up throughout the test, so even if the initial connection got killed by the server, calling db.Begin() should create a new connection instead of returning an error. This was the behaviour before #302 was merged.

So, what's the underlying issue here, and why is database/sql not retrying the connection after the changes in #302?

Deep Dive

The changes in #302 are sensible and appear correct when reviewed. The "most important" modification happens in packets.go, in the code to handle writes:

mysql/packets.go

Lines 121 to 147 in 26471af

n, err := mc.netConn.Write(data[:4+size])
if err == nil && n == 4+size {
	mc.sequence++
	if size != maxPacketSize {
		return nil
	}
	pktLen -= size
	data = data[size:]
	continue
}

// Handle error
if err == nil { // n != len(data)
	mc.cleanup()
	errLog.Print(ErrMalformPkt)
} else {
	if cerr := mc.canceled.Value(); cerr != nil {
		return cerr
	}
	if n == 0 && pktLen == len(data)-4 {
		// only for the first loop iteration when nothing was written yet
		return errBadConnNoWrite
	}
	mc.cleanup()
	errLog.Print(err)
}
return ErrInvalidConn

The PR adds line 140, where the condition n == 0 && pktLen == len(data)-4 is checked. The rationale is as follows:

  1. Any interaction (i.e. query, exec) from the driver to the MySQL server happens in two phases: First, we write the packet with the query/exec command to the socket connected to the server. If this write succeeds, the server must write back an "answer" packet, so we immediately read the response from the socket.

  2. If this write doesn't succeed, it means the connection has gone bad (maybe the server killed it?), but we know for sure the server has not received our query, so we can return errBadConnNoWrite, which will be translated further down the driver into driver.ErrBadConn and force database/sql to retry the query on another connection.

  3. However: If the write succeeds but the follow-up read from the socket fails (i.e. we never receive the "answer" packet from MySQL), we cannot return ErrBadConn, because the server has received our query and may have executed it even though we received no answer. We cannot let database/sql retry this query.

This is perfectly sensible, but the previous reproduction recipe shows it doesn't work in practice. In our code, the following sequence of actions happens:

  1. We connect to the MySQL server (sql.Open)
  2. We ping the MySQL server by sending a "ping" packet
  3. We receive a "pong" packet response from MySQL -- db.Ping returns successfully
  4. We sleep for 5 seconds
  5. After just 3 seconds, the MySQL server is going to kill our driver's only open connection
  6. We wake up from sleep and attempt to begin a transaction, by execing a START TRANSACTION.
  7. We write an exec packet to the server with START TRANSACTION. The write succeeds
  8. We then read the response packet from MySQL: This read fails, because MySQL killed our connection -- but now the driver cannot return ErrBadConn to retry this operation, because our initial write succeeded! The server may have executed START TRANSACTION! So the driver's only option is to hard crash.

When written like this, the problem becomes really clear. #302 made an assumption about the TCP protocol that doesn't hold in practice: that after the MySQL server has killed our connection, trying to write a packet from the driver to the server will return an error. That's not how TCP works: let us prove it by looking at some packet captures. 😄

These are steps 1 through 8 as seen in Wireshark:

[Wireshark capture: screen shot 2017-09-06 at 16 43 42]

The beginning is straightforward. We open the connection with a TCP handshake, we send a ping packet and we get a reply back (you can actually see two pings + two replies in that capture). We then go to sleep.

...And just 3 seconds later, in frame 14, the MySQL server enacts the timeout we've configured. Our TCP connection gets closed on the server side and we receive a [FIN, ACK] pair... To which our Golang client simply responds with an ACK! There's no FIN response from the driver because we haven't closed our side of the connection, as we're sleeping.

Seven seconds later, at T+10 (frame 18), we wake up from our sleep and attempt to exec a START TRANSACTION. And (here's what throws us off), the Linux kernel accepts the write just fine. It returns no error to Golang, because this is how TCP works. TCP is a bidirectional protocol, and our side of the connection is still open, so the write goes through.

Of course, the MySQL server immediately replies -- with a RST. "Dude, this connection is over". But it's too late for our driver: the write of the exec packet already succeeded, and with this new logic, we assume that the query may have been executed by the server (even though the MySQL server was not even listening on our socket anymore).

And meanwhile, the kernel has received the RST packet from the MySQL server. The next time we attempt to read from our connection, we will definitely crash. But with this new logic, a retry won't be attempted! This is bad news, because our original exec was not even close to making it to the server -- let alone being executed.
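
The write-succeeds-then-read-fails sequence described above can be reproduced without MySQL at all. Here is a minimal stdlib-only sketch (the helper name halfCloseDemo is mine, not the driver's): a plain TCP server stands in for mysqld and closes its side of the connection, yet the client's next write still succeeds at the kernel level, and only the follow-up read surfaces the dead connection.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// halfCloseDemo (hypothetical helper) mirrors steps 5-8 above with a plain
// TCP server standing in for mysqld: the peer closes, we sleep, we write,
// and only the read reports the failure.
func halfCloseDemo() (writeErr, readErr error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	defer ln.Close()

	go func() {
		conn, _ := ln.Accept()
		conn.Close() // server-side FIN, like mysqld enforcing wait_timeout
	}()

	client, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		panic(err)
	}
	defer client.Close()

	time.Sleep(100 * time.Millisecond) // "sleep past the timeout": let the FIN arrive

	// The kernel accepts this write even though the peer is gone...
	_, writeErr = client.Write([]byte("START TRANSACTION"))

	// ...and only the follow-up read surfaces the dead connection.
	var buf [16]byte
	_, readErr = client.Read(buf[:])
	return writeErr, readErr
}

func main() {
	w, r := halfCloseDemo()
	fmt.Println("write error:", w) // write error: <nil>
	fmt.Println("read error:", r)  // non-nil: EOF or connection reset by peer
}
```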

What are the potential fixes?

There is, in theory, a "right way" to handle this general case of the server closing a connection on us: performing a zero-size read from the socket right before we attempt to write our query packet should let us know about that FIN packet we've received: "MySQL has closed this connection, buddy. Your side is open, so you can still write, and the kernel will swallow it... But it's pointless because MySQL isn't reading anymore and -- even if it were --, the FIN means it wouldn't be able to reply".

Here's the issue: the "zero read" semantics from Golang are a bit fuzzy right now. This commit from bradfitz special-cases this situation so the read never makes it to the kernel; before that, the read would always return io.EOF.

This closed issue in golang/go has a very similar analysis to this one (at the TCP level -- not really related to MySQL), with several Golang core devs chiming in.

Ian Lance Taylor comments:

I don't see what Go can do here. Try using "go test -c" to build your test into an executable and then run "strace -f EXECUTABLE". You will see that the final write system call succeeds and the kernel reports it as succeeding. Go is just returning that success to the caller. The same thing would happen for a C program.

At the TCP level, you have closed one side of the TCP connection but not the other. The kernel will accept writes until both sides are closed. This is not as simple a matter as you seem to be expecting. I found a decent description of the issue at http://stackoverflow.com/questions/11436013/writing-to-a-closed-local-tcp-socket-not-failing .

The "final write system call" that Ian talks about here is the same as our attempt to write START TRANSACTION after our sleep, and indeed, it succeeds at the kernel level.

Brad Fitzpatrick wraps up the issue with a comment on zero-size reads from Golang:

FWIW, the 0-byte-Read-always-returns-EOF was changed in Go 1.7 with 5bcdd63. But as of Go 1.7 it now always returns nil, which is at least a bit more Go-like, but still doesn't tell you whether FIN was ever seen arriving. There aren't great APIs anywhere (especially not portably) to know that without reading data.

Another still-open issue in golang/go explains why there is -- still -- no way to tell whether the socket is still good for reading (and hence whether writing to it makes sense).

In Conclusion

I hope this in-depth analysis wasn't too boring, and I also hope it serves as definitive proof that the concerns that some people (@xaprb, @julienschmidt) had on the original PR are indeed very real: we tried rolling out the master branch of this driver in one of our Golang services and it's unusable with any of our MySQL clusters here at GitHub.

As for a conclusive fix: I honestly have no idea. I would start by reverting the PR; I think the risk of duplicated queries is worrisome, but the driver as it is right now is unusable for large MySQL clusters that run tight on connections and do pruning.

Doing a Ping before attempting new queries on connections that could be "stale" seems like a feasible workaround. But how do we mark such connections as stale? What's the timeframe? Should we let the users configure it? What's the overhead going to be like?

I would love to hear from the maintainers on approaches to fix this. I have the time and, obviously, an interest in using the master branch of this driver in production, and I'll gladly write & PR any fix for this issue that you deem acceptable.

Cheers!

@methane


Contributor

methane commented Sep 6, 2017

First, we need a MySQL server configured to time out idle connections. We're gonna use MySQL 5.7 and simply set global wait_timeout=3;.

First of all, you should use DB.SetConnMaxLifetime() instead of wait_timeout.
Closing a connection from the client is always better than closing it from the server, because
the client may send a query just as the server starts closing the connection. In that case,
the client can't know whether the sent query was received or not.

Here's the issue: the "zero read" semantics from Golang are a bit fuzzy right now. This commit from bradfitz special-cases this situation so the read never makes it to the kernel; before that, the read would always return io.EOF.

Yes, I reached the same conclusion. And that's why I always strongly recommend using SetConnMaxLifetime
with a short lifetime.

If your MySQL cluster's wait_timeout can't be longer than 3 seconds, I think you should use
a lifetime of 1 second or less. In a real-world application processing thousands of requests per second,
1 second is long enough if the servers are on the same LAN.

@vmg


vmg commented Sep 6, 2017

First of all, you should use DB.SetConnMaxLifetime() instead of wait_timeout.

Thank you for the suggestion! That seems like a good workaround. I'm afraid it's not up to me how our MySQL clusters handle idle connections. We have many different clusters with different capacities and many different languages/clients connecting to them, so connections are gonna get killed -- either "statically" through wait_timeout or dynamically by monitoring processes. I would definitely appreciate it if the Golang driver could handle killed connections gracefully by retrying them. 😅

I think using an aggressive timeout on SetConnMaxLifetime is gonna work around the issue for now, but I'd love to discuss a proper long term fix so the driver always retries connections. I think we can all agree that the current behaviour as of #302 is not "correct" even if we can work around it by having the driver timeout connections before the server does.

@methane


Contributor

methane commented Sep 6, 2017

Would you file an issue on golang/go?
I think the proper solution is adding a new error which means BadConn, but isn't safe to retry.

Adding a new method to TCP (and maybe Unix sockets) to check whether the connection was killed by the peer
would also be nice. But all that discussion should happen in golang's forum or issue tracker.

@arnehormann


Member

arnehormann commented Sep 8, 2017

@vmg thank you for the report, it's amongst the best ones I've seen so far.
I still disagree that #302 is incorrect, but it definitely changes the driver's behaviour.

I'm in a hurry right now, but from memory:

In the end, https://golang.org/src/database/sql/sql.go?s=34900:34969#L630 is called.
What if the connection pool contains maxBadConnRetries connections which were all invisibly closed by the server and your query is retried on all of those? The query will fail and you'd get the same error.

Robust client code has to include retry logic to prevent this failure mode. This is very, very annoying, but I still see an improvement in #302 - we reduce the number of spec violations and increase the visibility of a potential problem, the one I mentioned above.

That was from memory and I might be wrong about the sql - driver - mysql interactions. If I am, I am very eager to roll back the change. If not, we should discuss how best to change the documentation or (I really don't want to) even introduce a DSN parameter for the old behaviour.

@vmg


vmg commented Sep 8, 2017

thank you for the report, it's amongst the best ones I've seen so far.

You're welcome! I'm glad it's of some use!

What if the connection pool contains maxBadConnRetries connections which were all invisibly closed by the server and your query is retried on all of those? The query will fail and you'd get the same error.

No, it will not: database/sql will force a new connection if, after maxBadConnRetries, all attempts return ErrBadConn. The logic is just 3 lines above the link you posted:

https://github.com/golang/go/blob/ab40107708042ded6bdc1fb841c7cf2c2ab002ab/src/database/sql/sql.go#L1286-L1288

This new connection will never fail. But, of course, to reach that logic the driver must return ErrBadConn -- which it no longer does as of #302.

I think you may indeed be wrong about the way the driver interacts with the standard library. To me, it seems like database/sql is designed to be resilient around the case of broken connections -- but for the retry logic to work, the driver needs to play along and return ErrBadConn.

Again, I'm aware that duplicate queries are a real issue, but the fix in #302 breaks the semantics in database/sql and causes queries that were previously succeeding (i.e. being retried) in all circumstances to be reliably broken in all circumstances.

It's a big regression from my point of view. Please reconsider! Cheers!

@arnehormann


Member

arnehormann commented Sep 8, 2017

@arnehormann


Member

arnehormann commented Sep 9, 2017

@methane


Contributor

methane commented Sep 10, 2017

I think you may indeed be wrong about the way the driver interacts with the standard library. To me, it seems like database/sql is designed to be resilient around the case of broken connections -- but for the retry logic to work, the driver needs to play along and return ErrBadConn.

No. We understand the database/sql design. Before #302 we violated that design, and #302 fixes it.
See #654 and golang/go#11978 (comment)

Again, I'm aware that duplicate queries are a real issue, but the fix in #302 breaks the semantics in database/sql and causes queries that were previously succeeding (i.e. being retried) in all circumstances to be reliably broken in all circumstances.

The retry was dangerous.

@methane


Contributor

methane commented Sep 10, 2017

Checking whether the connection was closed by the peer before sending a query may increase the "successful retry" rate.
But there are still timings where we can't retry safely: the server may close the connection while we are sending the query.

So the best approach is to shorten the connection lifetime enough. Close the connection from the client side, before
it is closed by the server side.

@julienschmidt


Member

julienschmidt commented Sep 10, 2017

One can't have automatic 'magic' retries without potentially executing queries twice.

Imagine a situation where the server wants to notify the client that an auto-committed query was successful.
We have a problem if the server assumes the query is done but the driver never received the notification that it was successful. The client does not send back any kind of ACK, so it can happen that the server successfully writes the notification to the socket, but it gets lost on the path or in an OS buffer.

Now one could argue that a good database design can prevent this. Unfortunately, not every user is a database expert, and the database/sql docs are clear that drivers should rather be "safe than sorry".

We could probably do what we always do - add a DSN flag 😆

@arnehormann


Member

arnehormann commented Sep 11, 2017

#302 fixes real bugs and improves the spec conformity. Still, as diagnosed above, it comes with its own set of problems.

If we look at the situations where the driver returns ErrBadConn, we traded false positives for false negatives and gained spec conformity (see the first comments in #302). Sadly, the false negatives were pretty rare (judging from report frequency); the false positives not so much.

I don't see a sensible way to fix both false positives and false negatives. Still, as methane mentioned, there is a workaround for the problems introduced by #302. That and the spec conformity makes me want to keep the current state - with the PR merged.

I do see that this causes problems for existing clients. I dislike the introduction of a new DSN parameter as it fragments the user base and complicates triaging bug reports in the future, but we have to deal with it anyway until everyone uses the current version (never, probably). That's why I'd reluctantly agree to it - in combination with additional documentation.

Also, I'm sorry for all the confusion - esp. concerning my unfounded response earlier.

@arnehormann


Member

arnehormann commented Sep 12, 2017

@vmg


vmg commented Sep 12, 2017

That sounds like a very good idea! Allowing users to customise the default BadConn error seems like the perfect trade-off.

@julienschmidt


Member

julienschmidt commented Sep 12, 2017

I prefer a DSN flag. Global state should be avoided when possible. Imagine for example two 3rd party packages that depend on a different value of the global state to work properly.

And I don't see any changes in #302 that require a global state. Simple if-else checking of the value of an attribute of mysqlConn should also work.

@mrsinham


Contributor

mrsinham commented Oct 16, 2017

just to add a humble little opinion on the subject: #302 broke the behaviour on our systems too.

After deploying it, the daemons that use one shared *sql.DB object began failing completely very quickly with invalid connections.

We are currently rolling it back. IMHO the sql.DB object is a cluster of connections that should handle the complexity of broken connections, and should only return invalid connection after x retries to reconnect.

In addition, can I suggest tagging a release before you merge heavy changes like that? Thank you very much ;)

@julienschmidt julienschmidt added the bug label Oct 27, 2017

@julienschmidt julienschmidt added this to the v1.4 milestone Oct 27, 2017

@julienschmidt julienschmidt added help wanted bug and removed bug labels Oct 27, 2017

@danielstjules


danielstjules commented Nov 22, 2017

Just upgraded a service to take advantage of rejectReadOnly for easy aurora failover, and was sad to see a breaking change requiring client code to handle reconnects. The logic behind the change makes sense, but in our case, we only perform reads and writes that are idempotent, so the concerns about exactly-once vs at-least-once delivery don't affect us.

Any update on this? Would a PR on a new DSN flag be accepted? :)

@methane


Contributor

methane commented Nov 22, 2017

Why don't you shorten ConnMaxLifetime?
Why don't you retry by yourself?

I don't think adding complexity to this lib only for special use cases
(e.g. all queries being idempotent) is a good idea.
Unless no other solution (or workaround) can solve this, I'm -1 on adding an option.

@danielstjules


danielstjules commented Nov 22, 2017

@methane

Why don't you shorten ConnMaxLifetime?

Unless we're reconnecting per query, no value passed to SetConnMaxLifetime will guarantee a good connection.

Why don't you retry by yourself?

Because this was a major breaking change. database/sql previously handled this behavior with alwaysNewConn and maxBadConnRetries. Wrapping database/sql with retry logic for Begin, BeginTx, Exec, ExecContext, Prepare, PrepareContext, Query, QueryContext, in our code, and more importantly for all dependencies that used to assume retry behavior, is a lot of work.

I don't think adding complexity into this lib only for special use cases (e.g. all queries are idempotant) is good idea.

I don't think having connections to a "read only" DB is a special use case. Primary-replica topologies are very common. All queries targeting replicas are read-only and idempotent.

@methane


Contributor

methane commented Nov 22, 2017

Unless we're reconnecting per query, no value passed to SetConnMaxLifetime will guarantee a good connection.

I don't know your specific case. But most people who see this error log have a broken setting:
the client-side timeout is longer than the server (or network) timeout.

Because this was a major breaking change.

If that is the only reason, you can use the old version.

I don't think having connections to a "read only" DB is a special use case. Primary-replica topologies are very common. All queries targeting replicas are read-only and idempotent.

If it's not so special, it should be fixed in the database/sql design.

@danielstjules


danielstjules commented Nov 22, 2017

I don't know your specific case. But most people who see this error log have a broken setting:
the client-side timeout is longer than the server (or network) timeout.

I'm trying to setup graceful automatic failover of the primary DB, and recovery from reader/replica failures without seeing a bunch of connection.go:372: invalid connection when each connection in the pool subsequently fails after #302. It's not a common occurrence, but it's an important enough failure case.

If it's not so special, it should be fixed by database/sql design.

That's fair. :) Appreciate the reply!

@methane


Contributor

methane commented Nov 29, 2017

but it's an important enough failure case.

Is seeing some error logs so important?
I don't think showing some errors is worse than silently breaking some data.

@julienschmidt julienschmidt modified the milestones: v1.4.0, v1.5.0 May 15, 2018

evanelias added a commit to skeema/tengo that referenced this issue Aug 20, 2018

Update driver, and limit connection lifetime
This commit updates dep github.com/go-sql-driver/mysql to its current HEAD of
master, bringing in the ability to use some of MySQL 8.0's newest auth plugins.

This commit also makes Instance.Connect() set a max conn lifetime on all
returned connection pools, to avoid problems with a change to driver behavior.
For background, see go-sql-driver/mysql#657.
@elaPa


elaPa commented Sep 14, 2018

Any updates on this?

The connection can be terminated by the MySQL server for many other reasons, not only because of the timeout, so setting ConnMaxLifetime to a value smaller than MySQL's wait_timeout doesn't really solve this problem.

@vmg / @danielstjules did you find any other solution?

@methane


Contributor

methane commented Sep 18, 2018

The connection can be terminated by the MySQL server for many other reasons, not only because of the timeout, so setting ConnMaxLifetime to a value smaller than MySQL's wait_timeout doesn't really solve this problem.

Yes, ConnMaxLifetime should be much shorter than wait_timeout. I recommend 10 seconds to 10 minutes.
That avoids most of the problems.

But the problem may happen even with a very short ConnMaxLifetime.
In that case, there is nothing we can do other than report the error.

@methane methane changed the title from Server timeouts broken since #302 to Server timeouts fixed since #302, then people notice their bad configuration. Oct 24, 2018

@methane


Contributor

methane commented Oct 24, 2018

I'm closing this issue because the discussion is hard to follow. I don't want to continue the discussion here.

My conclusion is:

  • A connection closed by a server / router / middleware is always dangerous. Users must avoid it by using DB.SetConnMaxLifetime().
  • On the other hand, when it does happen, it could be handled more safely.

I'll create new issues for them.
