-
Notifications
You must be signed in to change notification settings - Fork 17.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
database/sql: QueryContext() and Scan() data race #23519
Comments
cc @kardianos |
@felixge The fix that went into 1.9.2 was indeed a partial fix; it fixed the case that was actively causing problems. At the same time we put a complete fix in for 1.10. When testing locally I've been unable to trigger a data race with go1.10. |
@kardianos did you test the patch I provided against the go master branch? I'm not seeing a race without the patch, but once I add the My colleague is also able to reproduce this on his machine and will post his test output here shortly. |
@kardianos I use the same code from @felixge to test it on master, but it seems the same issue happened again. go env
go version
TestIssue18429 output
|
Git sha hash please. Which commit?
…On Wed, Jan 24, 2018, 02:23 robertdxy ***@***.***> wrote:
@kardianos <https://github.com/kardianos> I use the same code from
@felixge <https://github.com/felixge> to test it on master, but it seems
the same issue happened again.
go env
➜ sql git:(master) ✗ go env
GOARCH="amd64"
GOBIN="/Users/robertdeng/App/goroot/go/bin"
GOCACHE="/Users/robertdeng/Library/Caches/go-build"
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/robertdeng/App/go"
GORACE=""
GOROOT="/Users/robertdeng/App/goroot/go"
GOTMPDIR=""
GOTOOLDIR="/Users/robertdeng/App/goroot/go/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/7y/chtf56s51v13dwy77t2hb90w0000gn/T/go-build348294305=/tmp/go-build -gno-record-gcc-switches -fno-common"
go version
➜ sql git:(master) ✗ go version
go version devel +157d8cfbc1 Wed Jan 24 07:18:36 2018 +0000 darwin/amd64
TestIssue18429 output
➜ sql git:(master) ✗ while go test -v -race -run 'TestIssue18429'; do :; done
=== RUN TestIssue18429
--- PASS: TestIssue18429 (0.07s)
PASS
ok database/sql 1.083s
=== RUN TestIssue18429
--- PASS: TestIssue18429 (0.07s)
PASS
ok database/sql 1.089s
=== RUN TestIssue18429
==================
WARNING: DATA RACE
Write at 0x00c4201560f0 by goroutine 83:
database/sql.(*rowsCursor).Close()
/Users/robertdeng/App/goroot/go/src/database/sql/fakedb_test.go:1025 +0xda
database/sql.(*Rows).close.func1()
/Users/robertdeng/App/goroot/go/src/database/sql/sql.go:2920 +0x6e
database/sql.withLock()
/Users/robertdeng/App/goroot/go/src/database/sql/sql.go:3032 +0x74
database/sql.(*Rows).close()
/Users/robertdeng/App/goroot/go/src/database/sql/sql.go:2919 +0x1af
database/sql.(*Rows).awaitDone()
/Users/robertdeng/App/goroot/go/src/database/sql/sql.go:2588 +0x138
Previous read at 0x00c4201560f0 by goroutine 25:
runtime.slicebytetostring()
/Users/robertdeng/App/goroot/go/src/runtime/string.go:72 +0x0
database/sql.convertAssign()
/Users/robertdeng/App/goroot/go/src/database/sql/convert.go:239 +0x21f0
database/sql.(*Rows).Scan()
/Users/robertdeng/App/goroot/go/src/database/sql/sql.go:2886 +0x280
database/sql.TestIssue18429.func1()
/Users/robertdeng/App/goroot/go/src/database/sql/sql_test.go:3199 +0x27b
Goroutine 83 (running) created at:
database/sql.(*Rows).initContextClose()
/Users/robertdeng/App/goroot/go/src/database/sql/sql.go:2572 +0xda
database/sql.(*DB).queryDC()
/Users/robertdeng/App/goroot/go/src/database/sql/sql.go:1509 +0x51f
database/sql.(*Tx).QueryContext()
/Users/robertdeng/App/goroot/go/src/database/sql/sql.go:2128 +0x19f
database/sql.TestIssue18429.func1()
/Users/robertdeng/App/goroot/go/src/database/sql/sql_test.go:3193 +0x1fe
Goroutine 25 (running) created at:
database/sql.TestIssue18429()
/Users/robertdeng/App/goroot/go/src/database/sql/sql_test.go:3176 +0x19f
testing.tRunner()
/Users/robertdeng/App/goroot/go/src/testing/testing.go:777 +0x16d
==================
--- FAIL: TestIssue18429 (0.07s)
testing.go:730: race detected during execution of test
FAIL
exit status 1
FAIL database/sql 0.082s
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#23519 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAuFsd09NTggDCCcCbhgHTDkZua7pJ6Yks5tNwSigaJpZM4Rpyns>
.
|
@kardianos the git sha hashes are included in the version numbers we reported. For your convenience:
Basically we both tested against HEAD of master and were able to see the race once the Thank you so much for looking into this! Let me know if we can provide more information. |
I can see the mechanism for this race, even though I have been unable to reproduce on my Linux amd64 boxes. It seems to be caused by the fakedb close corrupting the buffer (on purpose for 2622) while a scan is concurrently taking place. The "scan" method that contains a convert call for the buffer doesn't explicitly call the driver, but it does touch the provided buffer (which is corrupted in fakedb, thus the race where Close is called concurrently with Scan in that test. I can see three ways of resolving this:
I think (3) would provide the most correct solution if we say the row buffer could be corrupted any time we call into the driver. (2) is tempting because it wouldn't add any additional locking (it would increase the lock duration), but still resolve this case. A part of me feels like this is a driver bug that we shouldn't try to protect against. |
@kardianos FWIW we're seeing this race with lib/pq in our application test suite, so it seems real-world drivers are behaving similar to fakedb. It seems like solution 2/3 would fix the problem without requiring the drivers to change (nice), but solution 1 sounds reasonable as well. |
@felixge Can you share the race you are seeing with lib/pq along with the revision of lib/pq? |
@kardianos sure. Here is the info: lib/pq version: 61fe37aa2ee24fabcdbe5c4ac1d4ac566f88f345 (just upgraded to latest master/HEAD before re-running the test)
|
Change https://golang.org/cl/89596 mentions this issue: |
Ah, in lib/pq it will handle a pre-mature close by reading all the remaining rows into the buffer. Interesting choice. If you have a chance, please try out the linked CL; it should resolve your situation (I was unable to make the unit test display the race). |
@kardianos thanks so much, it's getting late here, but I'll try the CI first thing tomorrow morning. |
@kardianos I tested your CL ( My only suggestion would be to include the Let me know if there is anything else you'd like me to check/verify on my end. |
If I had to guess: This is the only way to reuse the connection of a canceled query. |
Change https://golang.org/cl/89835 mentions this issue: |
Change https://golang.org/cl/89936 mentions this issue: |
@kardianos looks like you went with solution 1 in 651ddbd ? I.e. lib/pq will have to be changed in order to fix the race? |
@felixge I'll be opening some issues for various drivers. |
This test case shows a race condition we see in the test suite of our application. We already created an issue with the [Go project](golang/go#23519), but the conclusion was that the [db drivers should fix it](golang/go#23519 (comment)). I was able to reproduce this issue with the following go versions: - `go version go1.9.3 darwin/amd64` (output shown below) - `go version devel +5af1e7d78e Sun Jan 28 19:54:48 2018 +0000 darwin/amd64` (latest master) Please let me know if I can provide more information and/or if you'd like me to attempt to create a patch to fix the issue. ``` $ while go test -v -race -run 'TestContextCancelQueryWhileScan'; do :; done === RUN TestContextCancelQueryWhileScan --- PASS: TestContextCancelQueryWhileScan (0.13s) PASS ok github.com/lib/pq 1.160s === RUN TestContextCancelQueryWhileScan ================== WARNING: DATA RACE Read at 0x00c4201ae5ae by goroutine 23: runtime.slicebytetostring() /usr/local/Cellar/go/1.9.3/libexec/src/runtime/string.go:72 +0x0 database/sql.convertAssign() /usr/local/Cellar/go/1.9.3/libexec/src/database/sql/convert.go:244 +0x2333 database/sql.(*Rows).Scan() /usr/local/Cellar/go/1.9.3/libexec/src/database/sql/sql.go:2727 +0x295 github.com/lib/pq.TestContextCancelQueryWhileScan.func1() /Users/felixge/code/go/src/github.com/lib/pq/go18_test.go:188 +0x38a Previous write at 0x00c4201ae5a8 by goroutine 50: runtime.slicecopy() /usr/local/Cellar/go/1.9.3/libexec/src/runtime/slice.go:160 +0x0 bufio.(*Reader).Read() /usr/local/Cellar/go/1.9.3/libexec/src/bufio/bufio.go:224 +0x240 io.ReadAtLeast() /usr/local/Cellar/go/1.9.3/libexec/src/io/io.go:309 +0x93 io.ReadFull() /usr/local/Cellar/go/1.9.3/libexec/src/io/io.go:327 +0x72 github.com/lib/pq.(*conn).recvMessage() /Users/felixge/code/go/src/github.com/lib/pq/conn.go:971 +0x38a github.com/lib/pq.(*conn).recv1Buf() /Users/felixge/code/go/src/github.com/lib/pq/conn.go:1007 +0x46 github.com/lib/pq.(*rows).Next() /Users/felixge/code/go/src/github.com/lib/pq/conn.go:1399 +0x13b github.com/lib/pq.(*rows).Close() /Users/felixge/code/go/src/github.com/lib/pq/conn.go:1356 +0x8f database/sql.(*Rows).close.func1() /usr/local/Cellar/go/1.9.3/libexec/src/database/sql/sql.go:2761 +0x6e database/sql.withLock() /usr/local/Cellar/go/1.9.3/libexec/src/database/sql/sql.go:2873 +0x76 database/sql.(*Rows).close() /usr/local/Cellar/go/1.9.3/libexec/src/database/sql/sql.go:2760 +0x1ca database/sql.(*Rows).awaitDone() /usr/local/Cellar/go/1.9.3/libexec/src/database/sql/sql.go:2436 +0x14c Goroutine 23 (running) created at: github.com/lib/pq.TestContextCancelQueryWhileScan() /Users/felixge/code/go/src/github.com/lib/pq/go18_test.go:174 +0x163 testing.tRunner() /usr/local/Cellar/go/1.9.3/libexec/src/testing/testing.go:746 +0x16c Goroutine 50 (running) created at: database/sql.(*Rows).initContextClose() /usr/local/Cellar/go/1.9.3/libexec/src/database/sql/sql.go:2420 +0xdd database/sql.(*DB).queryDC() /usr/local/Cellar/go/1.9.3/libexec/src/database/sql/sql.go:1334 +0xb7f database/sql.(*DB).query() /usr/local/Cellar/go/1.9.3/libexec/src/database/sql/sql.go:1304 +0x193 database/sql.(*DB).QueryContext() /usr/local/Cellar/go/1.9.3/libexec/src/database/sql/sql.go:1281 +0xe2 github.com/lib/pq.TestContextCancelQueryWhileScan.func1() /Users/felixge/code/go/src/github.com/lib/pq/go18_test.go:183 +0x22f ================== --- FAIL: TestContextCancelQueryWhileScan (0.12s) testing.go:699: race detected during execution of test FAIL exit status 1 FAIL github.com/lib/pq 0.153s ```
I think Rows.close(err) should add the judgment: if err != nil && err != io.EOF {
rs.lasterr = err
return nil
} or if context.DeadlineExceeded == err {
rs.lasterr = err
return nil
} then code below: func (rs *Rows) close(err error) error {
rs.closemu.Lock()
defer rs.closemu.Unlock()
if err != nil && err != io.EOF {//throws context.DeadlineExceeded when QueryContext's context timeout
rs.lasterr = err
return nil
}
if rs.closed {
return nil
}
rs.closed = true
if rs.lasterr == nil {
rs.lasterr = err
}
withLock(rs.dc, func() {
err = rs.rowsi.Close()
})
if fn := rowsCloseHook(); fn != nil {
fn(rs, &err)
}
if rs.cancel != nil {
rs.cancel()
}
if rs.closeStmt != nil {
rs.closeStmt.Close()
}
rs.releaseConn(err)
return err
} because buffers provided to Rows.Next should not be modified by Rows.Close |
What version of Go are you using (
go version
)?I reproduced this issue with multiple versions:
Does this issue reproduce with the latest release?
Yes.
What operating system and processor architecture are you using (
go env
)?What did you do?
I have a test case involving a
QueryContext
query that is canceled in the middle of a call torows.Scan()
. The best minimal reproduction I was able to come up with was to apply this test patch to the Go core:The race detector doesn't always pick it up, so I was testing this in a loop:
What did you expect to see?
No race condition.
What did you see instead?
I strongly suspect that the work done in 1126d14 / #21117 is still missing some locking.
The text was updated successfully, but these errors were encountered: