Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

database/sql: no way to protect driver.Connector.Connect(), driver.Conn.Close(), driver.Stmt.Close() from blocking #38185

Closed
georgysavva opened this issue Mar 31, 2020 · 12 comments

Comments

@georgysavva
Copy link

@georgysavva georgysavva commented Mar 31, 2020

I was reading database/sql source code, looking into how it manages context when integrating with third party drivers and how context can be used to timeout operations that might block due to network failures. I think I found a few flaws:

  1. sql.DB calls driver.Connector.Connect(ctx) to establish a new connection via the driver, and it does pass a context into it. One of those calls happens inside db.openNewConnection() function, but this function is a part of the background goroutine db.connectionOpener() and the context is a background context and it takes the beginning there. So the driver won't receive a request bounded context from the user (as it would receive in this call) and the user won't be able to timeout connection establishment via context and the background goroutine db.connectionOpener() might block in case of a network error.
    The driver might have an independent connect_timeout setting that it will use in .Connect() to pass it to net.Dialer.Timeout, but it only works for the original connection dial and driver may need to exchange some system messages with the database and those read operations might block if the network is unreliable. A request bounded context with timeout set by the user would solve it. But regardless of the timeout I think it's a bit weird from the driver perspective, that sometimes it receives a background context from sql and sometimes it receives request bounded context from the user.
    My actionable here is to always pass a request bounded context from the user into driver.Connector.Connect(ctx).
  2. sql.DB calls driver.Conn.Close() to signal the driver to close the connection, this call might include network operation and in case of a failure will block the caller, some places there sql calls .Close():
    here, and here. If the first call blocks, it will block the background goroutine db.connectionOpener(), if the second call blocks, it will block the main user goroutine. Another affected place is .closeDBLocked function: it sequentially closes all connections in the pool on db.Close() and if one of those Conn.Close() blocks, it won't be able to close other connections in the pool.
    I think it would be better to close driver connection asynchronously (in a separate gorutine) with a reasonable, default timeout: e.g. 5 seconds. Or the original idea of the sql library, that driver itself must ensure that Conn.Close() doesn't block?
  3. The last thing is driver.Stmt.Close(): sql doesn't pass context into it and calls it in a lot of places, the most obvious here. Driver Stmt.Close() call most likely will involves network operations that might block and it will hang the main user goroutine.
    I think sql should pass a request bounded context from the user into Stmt.Close(), because Stmt.Close is a user facing operation and user should be able to timeout it.

Please let me know If I missing something.

What version of Go are you using (go version)?

go version go1.14 darwin/amd64


Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
GO111MODULE="on"
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/georgysavva/Library/Caches/go-build"
GOENV="/Users/georgysavva/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=" -mod="
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/georgysavva/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="..."
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/gd/lq1x2k352p3d74_5zby2jjbc0000gn/T/go-build959211670=/tmp/go-build -gno-record-gcc-switches -fno-common"

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Mar 31, 2020

@kardianos
Copy link
Contributor

@kardianos kardianos commented Apr 1, 2020

Correct on all counts.

Closing a connection is unlikely to involve a network round-trip. Or rather, I don't know of any that would require that, so I'm less worried about this.

Closing a Stmt will always require a network round trip, unless the entire connection is broken. The hard thing with this is the that the sql package doesn't expose a Stmt object, just a Stmt pool object. Which makes synchronous, context bounded closing harder.

Lastly, yes, the connection opener context improves the situation, but it doesn't address all the cases. I'm not 100% sure we can fix this with the current API, but you could try / make suggestions on how to best address this.


Some of these issues are caused by the original API that will be hard to change unless we issue a v2. For somethings, simply better documentation, stating that drivers should include a sane timeout for certain calls would also be an improvement.

@georgysavva
Copy link
Author

@georgysavva georgysavva commented Apr 1, 2020

If you don't expect Conn.Close() to block due to a network operation, I think it should be better reflected in the driver documentation. Because I found out, that pgx postgres driver does include both write and read network operation in Conn.Close(): pgx close. And also pq postgres driver, include a write operation (that can also block): pq close.
BTW (as a side question, sorry if it's a stupid one) regardless of separate write and read operations, driver will call the underlying net.TCPConn.Close() at least, can this call block? As far as I understand it should send some termination signal to the server?

Yeah, passing context to Stmt.Close() would require a significant change in sql code, because of the pool. So I think at least we could add some notes to the documentation, that the driver should use some default timeout for Stmt.Close()

Regarding the connection opener, I thought we could pass a request bounded context from the user into connector.Connect() the following way:
In .conn() function, instead of putting just a channel, we bound and put together both the result channel and the context into db.connRequests map. And after, in .maybeOpenNewConnection() we retrieve actual reqKeys and corresponding contexts from the db.connRequests map and send those contexts to db.openerCh instead of just an empty struct. And finally, In .connectionOpener() we receive the context and can pass it to the db.connector.Connect(ctx).
But I realized the it's wrong. Imagine the following situation:
Process A requests a connection and it has a really short timeout (0.5 seconds), we start opening a connection for it (with context from the process A) and after 0.1 seconds we get a free connection returned to the pool and give it to the process A, so it's no longer waiting and can use the connection. In the meantime process B requests a connection without timeout at all. But right after that, connection opening that we initiated for the process A (with context from the process A) with timeout 0.5, failed with a timeout error and we sending this error to process B here, because it waits for a result. So in the outcome process B will receive a timeout error in less than 0.5 seconds, without specifying timeout at all.
It appeared to me that this solution doesn't work and to improve the situation sql package could improve the docs and state, that Connector.Connect(ctx) can be called from a background goroutine with a background context that doesn't have a timeout. And the driver is responsible for making this call timeout and don't block with some configurable setting.

@andybons andybons added this to the Unplanned milestone Apr 1, 2020
@georgysavva
Copy link
Author

@georgysavva georgysavva commented Apr 22, 2020

Hi. Any update/resolution on this?

@kardianos
Copy link
Contributor

@kardianos kardianos commented Apr 22, 2020

@georgysavva I could look into updating the docs on the close. I'll also annotate the database/sql/v2 tracking issue.

I'll look into the connection opener. Thanks for the ping.

@georgysavva
Copy link
Author

@georgysavva georgysavva commented Apr 22, 2020

Thanks for the reply. Waiting for you feedback on connection opener.

@gopherbot
Copy link

@gopherbot gopherbot commented Apr 28, 2020

Change https://golang.org/cl/230438 mentions this issue: database/sql: document Connect and Close may need a timeout

@kardianos
Copy link
Contributor

@kardianos kardianos commented Apr 28, 2020

@georgysavva I agree that the background connection opener makes it difficult to provide a relevant context for all situations. Can you review the above CL to see if it addresses your concerns? If you have a gerrit account, please leave feedback on the CL itself, otherwise feedback here is fine too. Thanks.

@georgysavva
Copy link
Author

@georgysavva georgysavva commented Apr 28, 2020

Sorry, I forgot about the third thing. Stmt.Close() shouldn't we update the docs for it as well?

@georgysavva georgysavva reopened this Apr 28, 2020
@kardianos
Copy link
Contributor

@kardianos kardianos commented Apr 28, 2020

Goood point, I'll update in just a bit.

@kardianos
Copy link
Contributor

@kardianos kardianos commented Apr 28, 2020

@georgysavva I've updated and included Stmt.Close as well. (also, when merged, the CL will close this issue)

@georgysavva
Copy link
Author

@georgysavva georgysavva commented Apr 28, 2020

Thanks!

@gopherbot gopherbot closed this in ca854f3 Apr 28, 2020
xujianhai666 added a commit to xujianhai666/go-1 that referenced this issue May 21, 2020
Opening a connection with Connect should still create a derived
context with a timeout because some clients will not use a timeout
and the connection pool may open a connection asynchronously.

Likewise, if a connection close makes a network operation it should
provide some type of sane timeout for the operation.

Fixes golang#38185

Change-Id: I9b7ce2996c81c486170dcc84b12672a99610fa27
Reviewed-on: https://go-review.googlesource.com/c/go/+/230438
Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
5 participants
You can’t perform that action at this time.