
database/sql: DB full resetterCh causes driver.ErrBadConn error #31480

Open · tuanpavn opened this issue Apr 16, 2019 · 13 comments

@tuanpavn commented Apr 16, 2019

What version of Go are you using (go version)?

$ go version
go version go1.12 linux/amd64

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

$ go env
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"

What did you do?

I am doing a stress test of a MySQL server using Go.
I create a sql.DB and set

db.SetMaxOpenConns(128)
db.SetMaxIdleConns(32)

Then I create 500 goroutines (500 clients) and send 1,000,000 queries to the MySQL server.
While the program runs, it sometimes reports the error "driver: bad connection" (driver.ErrBadConn).
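A minimal sketch of the reproduction; the DSN and the go-sql-driver/mysql import are assumptions, and any driver that implements SessionResetter should exercise the same path:

package main

import (
	"database/sql"
	"log"
	"sync"

	_ "github.com/go-sql-driver/mysql" // assumed driver
)

func main() {
	// Hypothetical DSN; adjust for your environment.
	db, err := sql.Open("mysql", "user:pass@tcp(127.0.0.1:3306)/test")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	db.SetMaxOpenConns(128) // larger than the hardcoded resetterCh capacity of 50
	db.SetMaxIdleConns(32)

	const workers = 500
	const queriesPerWorker = 2000 // ~1,000,000 queries total

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < queriesPerWorker; j++ {
				var one int
				if err := db.QueryRow("SELECT 1").Scan(&one); err != nil {
					log.Println(err) // sporadically prints "driver: bad connection"
				}
			}
		}()
	}
	wg.Wait()
}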

I found that sql.OpenDB creates a *sql.DB struct with:
resetterCh: make(chan *driverConn, 50)

In func (db *DB) putConn(dc *driverConn, err error, resetSession bool), if db.resetterCh is full, the connection is marked as bad:

select {
default:
	// If the resetterCh is blocking then mark the connection
	// as bad and continue on.
	dc.lastErr = driver.ErrBadConn
	dc.Unlock()
case db.resetterCh <- dc:
}

Later, when the number of open connections is at the max and the pool reuses an old connection that was marked bad, the caller gets driver.ErrBadConn.
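For context, the reuse path in Go 1.12's (*DB).conn checks that flag roughly like this (paraphrased, not an exact quote of the source):

// Paraphrased from (*DB).conn in Go 1.12: a cached connection whose
// lastErr was set when putConn found resetterCh full is discarded,
// and the error surfaces to the caller.
conn.Lock()
err := conn.lastErr
conn.Unlock()
if err == driver.ErrBadConn {
	conn.Close()
	return nil, driver.ErrBadConn
}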

I can work around it by setting the max connections to less than 50 (the size of db.resetterCh).
Why is the size of db.resetterCh hardcoded to 50?
Should it be sized to the max connections instead?
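For reference, the workaround in code; per the discussion below, keeping the limit at or below 50 (the channel's capacity in Go 1.12) avoids the full-channel path:

// Keep the pool no larger than resetterCh's capacity so putConn
// never finds the channel full.
db.SetMaxOpenConns(50)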

https://play.golang.org/p/phUILuRV3hJ

What did you expect to see?

All queries complete without spurious connection errors.

What did you see instead?

Intermittent "driver: bad connection" (driver.ErrBadConn) errors under load.

kardianos self-assigned this Apr 16, 2019
kardianos added the NeedsFix label Apr 16, 2019
@n-ozerov commented Apr 26, 2019

Experiencing the same issue.

@gopherbot commented Apr 26, 2019

Change https://golang.org/cl/174122 mentions this issue: database/sql: do not mark conn as bad when resetter is full

@kardianos (Contributor) commented Apr 26, 2019

I can see two solutions to this issue:

  • resize the resetter channel when the pool size is > 50
  • process the session reset synchronously when the fixed-size resetter is full (see the sketch below)

I'll look into this more. I'm not ready for a final fix (the referenced CL isn't quite right, I think). If someone wants to submit a CL in the next couple of days, that would be great. We are really close to the 1.13 freeze.
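Not the actual CL, but a rough sketch of the second option applied to the putConn snippet above; resetSessionLocked is a hypothetical helper name:

select {
case db.resetterCh <- dc:
	// Normal path: hand the connection to the background resetter.
default:
	// Synchronous fallback sketch: reset the session inline instead of
	// marking the connection bad. resetSessionLocked is hypothetical,
	// standing in for whatever the real fix would do here.
	if err := dc.resetSessionLocked(context.Background()); err != nil {
		dc.lastErr = err
	}
	dc.Unlock()
}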

@wekb commented Jul 15, 2019

I think you'd need to do both. This is a bit of a black-box issue that sporadically affects production services. Is there an ETA here, or would a submitted CL be processed fairly quickly? Thanks.

@kardianos (Contributor) commented Jul 15, 2019

I think it is too late in the cycle to submit the CL, but if you want to, you could cherry-pick the change locally and try it out.

@wekb commented Jul 15, 2019

Would 1.12.x be game, or does the 1.13 freeze affect 1.12.x updates?

@kardianos (Contributor) commented Jul 15, 2019

@bcmills I'm unsure of the release policy. I don't think 1.12.x would be game, and I believe 1.13 is too deeply frozen for this change. Can you provide feedback? Ping me on Hangouts or here if you need detail on the exact scope of the change; the CL is linked above.

@wekb Can you help validate the linked CL?

@bcmills (Member) commented Jul 16, 2019

@kardianos if the change is too invasive or subtle for 1.13 at this point, then it's almost certainly not a candidate for a 1.12 backport either, especially given that there is a workaround (namely, setting the connection limit ≤ 50).

@wekb commented Jul 16, 2019

This would be a problem for larger services where 50 concurrent connections isn't adequate, however.

The CL builds and tests out on my end under pretty good stress: 500 goroutines spamming local and remote MySQL instances. I patched 1.12.7, and the two versions of Go performed the same.

That said, I'm not able to reproduce the reported issue outside of prod, and it would be difficult to deploy this in our production environment without building a gamut of new Docker images, not to mention that we're trying not to have this issue affect us badly again. :)

@wekb commented Jul 17, 2019

I patched 1.12.7, and it builds and tests out on my end.

I put some fairly heavy load on both stock 1.12.7 and the patched 1.12.7, basically the OP's 500-worker use case. I wasn't able to reproduce the issue locally, but both versions performed fine and comparably. The built-in Go tests passed as well, obviously.

It would be difficult to deploy this in our production environment to reproduce the issue without undoing workaround code and building a gamut of new Docker images, not to mention that we're trying not to have this issue affect us badly again. :)

All that said, the CL seems worthy, IMO.

@wekb commented Sep 4, 2019

Any chance https://golang.org/cl/174122 can be pushed into 1.13.x before it's forgotten? :-)

@bcmills (Member) commented Sep 11, 2019

@wekb, last I checked CL 174122 was awaiting a bit of rework to avoid blocking indefinitely in ResetSession.

When the fix is ready, I suspect it will not be eligible for backporting: to my knowledge it is not a regression, and there is a workaround today (setting MaxOpenConns ≤ 50).
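For background, the resetter goroutine exists to serve drivers that implement driver.SessionResetter. This is the actual interface from database/sql/driver (Go 1.10+, doc comment abridged):

// SessionResetter may be implemented by Conn to allow drivers to reset
// the session state associated with the connection and to signal a bad
// connection.
type SessionResetter interface {
	// ResetSession is called while a connection is in the connection
	// pool. No queries will run on this connection until this method
	// returns.
	ResetSession(ctx context.Context) error
}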

@wekb commented Sep 12, 2019

Thanks. I just wanted to register that there's ongoing interest and to make sure this makes it into a release in due time.
