Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

database/sql: segfault in connection pool when the driver frequently fails to connect #24445

Closed
pjm0616 opened this issue Mar 19, 2018 · 10 comments
Closed
Assignees
Milestone

Comments

@pjm0616
Copy link

@pjm0616 pjm0616 commented Mar 19, 2018

What version of Go are you using (go version)?

go version go1.10 linux/amd64

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOCACHE="/home/pjm0616/.cache/go-build"
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/pjm0616/go"
GORACE=""
GOROOT="/home/pjm0616/local/go"
GOTMPDIR=""
GOTOOLDIR="/home/pjm0616/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build045015591=/tmp/go-build -gno-record-gcc-switches"

What did you do?

All line numbers are from go1.10 release (https://github.com/golang/go/blob/bf86aec25972f3a100c3aa58a6abcbcc35bdea49/src/database/sql/sql.go)

1062 // Timeout the connection request with the context.
1063 select {
1064 case <-ctx.Done():
1065 	// Remove the connection request and ensure no value has been sent
1066 	// on it after removing.
1067 	db.mu.Lock()
1068 	delete(db.connRequests, reqKey)
1069 	db.mu.Unlock()
1070 	select {
1071 	default:
1072 	case ret, ok := <-req:
1073 		if ok {
1074 			db.putConn(ret.conn, ret.err, false)
1075 		}
1076 	}
1077 	return nil, ctx.Err()

Goroutine 1:
DB.conn() is called. Passes the first context check at line 1021~1026.

Goroutine 2:
Cancel the context that was passed to DB.conn() in Goroutine 1.

Goroutine 1:
Block on db.mu.Lock() on sql.go:1067

db.mu.Lock()

Goroutine 3:
DB.openNewConnection() calls db.putConnDBLocked(nil, err).
The comment for DB.putConnDBLocked() says If err != nil, the value of dc is ignored, so this is valid.

Goroutine 1:
In sql.go:1072 it receives connRequest{conn: nil, err: err} sent by Goroutine 3.
Calls db.putConn(nil, err).
db.putConn accesses dc.inUse where dc is nil, and panics.

The comment for DB.putConn() says err is optionally the last error that occurred on this connection,
so there should be a check before calling db.putConn(), but there aren't.

What did you expect to see?

It should return an error, ret.err or maybe ctx.Err().

What did you see instead?

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x58 pc=0x7f8492]

goroutine 304611707 [running]:
database/sql.(*DB).putConn(0xc42265f360, 0x0, 0xd10b60, 0xc422efbd60, 0xc425d19200)
	/home/pjm0616/local/go/src/database/sql/sql.go:1154 +0x42
database/sql.(*DB).conn(0xc42265f360, 0xd18000, 0xc4269259b0, 0x1, 0x0, 0xc425d19370, 0x5b038a)
	/home/pjm0616/local/go/src/database/sql/sql.go:1074 +0x9b6
database/sql.(*DB).begin(0xc42265f360, 0xd18000, 0xc4269259b0, 0xc426a38140, 0x1, 0x0, 0x0, 0x0)
	/home/pjm0616/local/go/src/database/sql/sql.go:1566 +0x4f
database/sql.(*DB).BeginTx(0xc42265f360, 0xd18000, 0xc4269259b0, 0xc426a38140, 0xc425d19498, 0x412169, 0xc42cd9cf30)
	/home/pjm0616/local/go/src/database/sql/sql.go:1548 +0x8a
@ALTree ALTree added this to the Go1.11 milestone Mar 19, 2018
@kardianos kardianos self-assigned this Mar 20, 2018
@kardianos
Copy link
Contributor

@kardianos kardianos commented Mar 20, 2018

Thanks for the report. I'll look into it.

@gopherbot
Copy link

@gopherbot gopherbot commented Mar 26, 2018

Change https://golang.org/cl/102477 mentions this issue: database/sql: check for nil connRequest.conn before use

@kardianos
Copy link
Contributor

@kardianos kardianos commented Mar 26, 2018

I'm having trouble reproducing, but here is what I have confirmed:

  • DB.putConnDBLocked(nil, err) is called, so dc *driverConn could be null when put into the connRequest
  • The above connRequest has a code path to put the dc into DB.putConn
  • DB.putConn does not check if dc is nil and thus the nil ref.

I can confirm so far when I insert a call to putConnDBLocked in conn.

@kardianos
Copy link
Contributor

@kardianos kardianos commented Mar 28, 2018

@andybons the cl needs a +2, but the linked cl would be a good candidate for a 1.10.1. I'm not sure it it a regression from 1.9.

@pjm0616
Copy link
Author

@pjm0616 pjm0616 commented Mar 28, 2018

Thanks for the fix. I've been running my program with the ret.conn != nil patch under the same conditions for 8 days and it's working okay so far.

The panic happened two times on different computers, and it took 3~4 days of continuously failing connection attempts.
I was also unable to separately trigger the bug without editing database/sql/sql.go.

@odeke-em
Copy link
Member

@odeke-em odeke-em commented May 3, 2018

/cc @bradfitz

@andybons
Copy link
Member

@andybons andybons commented May 3, 2018

@gopherbot please backport to 1.10 with CL https://golang.org/cl/102477

@gopherbot
Copy link

@gopherbot gopherbot commented May 3, 2018

Backport issue(s) opened: #25235 (for 1.10).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.

@andybons
Copy link
Member

@andybons andybons commented May 3, 2018

@kardianos once the CL lands, feel free to cherry-pick the change to release-branch.go1.10 per the instructions above. The release team will then determine whether it makes it in. Thanks!

@gopherbot gopherbot closed this in b98ffdf May 10, 2018
@gopherbot
Copy link

@gopherbot gopherbot commented Nov 1, 2018

Change https://golang.org/cl/146778 mentions this issue: [release-branch.go1.10] database/sql: check for nil connRequest.conn before use

gopherbot pushed a commit that referenced this issue Nov 1, 2018
…before use

The connRequest may return a nil conn value. However in a rare
case that is difficult to test for it was being passed to
DB.putConn without a nil check. This was an error as this
made no sense if the driverConn is nil. This also caused
a panic in putConn.

A test for this would be nice, but didn't find a sane
way to test for this condition.

Updates #24445
Fixes #25235

Change-Id: I827316e856788a5a3ced913f129bb5869b7bcf68
Reviewed-on: https://go-review.googlesource.com/102477
Run-TryBot: Daniel Theophanes <kardianos@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
(cherry picked from commit b98ffdf)
Reviewed-on: https://go-review.googlesource.com/c/146778
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Katie Hockman <katie@golang.org>
@golang golang locked and limited conversation to collaborators Nov 1, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
6 participants
You can’t perform that action at this time.