
database/sql: a potential circular wait between db.openerCh and db.mu.Lock() #37178

Closed
lzhfromustc opened this issue Feb 11, 2020 · 2 comments
@lzhfromustc lzhfromustc commented Feb 11, 2020

What version of Go are you using (go version)?

I believe this issue has existed since go1.10 and is still present on the latest master branch.

$ go version
go version go1.12.9 darwin/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

I believe this issue is independent of the operating system and processor architecture.

go env Output
$ go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/me/Library/Caches/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/me/warehouse/myfork/go_proj_Gerrit"
GOPROXY=""
GORACE=""
GOROOT="/usr/local/go"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/3j/2hnb78gj2clc98lqrhd1s7_r0000gn/T/go-build966675892=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you see?

In /src/database/sql/sql.go, db.openerCh is used in the following way (simplified):

// This is the size of the connectionOpener request chan (DB.openerCh).
// This value should be larger than the maximum typical value
// used for db.maxOpen. If maxOpen is significantly larger than
// connectionRequestQueueSize then it is possible for ALL calls into the *DB
// to block until the connectionOpener can satisfy the backlog of requests.
var connectionRequestQueueSize = 1000000

db.openerCh = make(chan struct{}, connectionRequestQueueSize)

// Goroutine 1
db.mu.Lock()
for ... {
    db.openerCh <- struct{}{} // Blocks once the buffer is full
}
db.mu.Unlock()

// Goroutine 2, a.k.a. connectionOpener()
for {
    select {
    case <-ctx.Done():
        return
    case <-db.openerCh:
        db.mu.Lock() // Blocks while Goroutine 1 holds db.mu
        ...
        db.mu.Unlock()
        ...
    }
}

When the buffer is full, db.openerCh <- struct{}{} blocks. The comment above assumes this send will unblock once "the connectionOpener can satisfy the backlog of requests". However, connectionOpener (Goroutine 2) can itself block trying to acquire db.mu.Lock(), which is held by Goroutine 1. Each goroutine is waiting for the other, so a circular wait is present.
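To make the cycle concrete, here is a minimal standalone sketch of the pattern (not the real database/sql code): the buffer is shrunk from 1,000,000 to 1 so it fills immediately, and a timeout stands in for a deadlock detector so the demo can report the stall and exit cleanly instead of hanging. The names openerCh and mu mirror the fields in database/sql but are local variables here.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	var mu sync.Mutex
	openerCh := make(chan struct{}, 1) // stand-in for db.openerCh, buffer shrunk to 1
	done := make(chan struct{})

	// Goroutine 2: the "connectionOpener". It needs mu before it can
	// finish handling a request, mimicking connectionOpener's db.mu.Lock().
	go func() {
		for range openerCh {
			mu.Lock() // blocks while Goroutine 1 holds mu
			mu.Unlock()
		}
		close(done)
	}()

	// Goroutine 1: holds mu while sending. Once the opener is parked on
	// mu.Lock() and the buffer is full, the send blocks: circular wait.
	stalled := make(chan struct{})
	go func() {
		mu.Lock()
		for i := 0; i < 3; i++ { // the third send can never complete
			select {
			case openerCh <- struct{}{}:
			case <-time.After(200 * time.Millisecond):
				close(stalled) // send blocked: cycle observed
				mu.Unlock()    // break the cycle so the demo exits
				return
			}
		}
		mu.Unlock()
	}()

	<-stalled
	fmt.Println("circular wait detected: send blocked while opener waits on mu")
	close(openerCh)
	<-done
}
```

With the real 1,000,000-slot buffer the same cycle needs a large backlog before the send blocks, which is why the pattern is hard to hit in practice but still possible in principle.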

I suspect this problem has gone unnoticed because the buffer is large and rarely fills. However, as the comment itself acknowledges, the send can still block. I am not a security expert, but perhaps a DoS attack could force this to happen? I found an issue touching related code, but it does not seem to discuss this circular wait: #10886

To fix this problem, I have written a patch that keeps these goroutines from ever blocking on each other, but it is quite ugly... Do you think this is a problem? If so, shall I open a PR?

@josharian josharian commented Feb 12, 2020

@kardianos kardianos commented Feb 12, 2020

I don't think this is a problem. A connection pool with a million connections, or even a million active goroutines, seems unlikely. If someone actually hits this in practice, or is even able to hit this issue in a contrived test case with a real database, let me know.

@kardianos kardianos closed this Feb 12, 2020