Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

database/sql: idle connections are not cleaned up properly #39471

Open
stevenh opened this issue Jun 9, 2020 · 7 comments
Open

database/sql: idle connections are not cleaned up properly #39471

stevenh opened this issue Jun 9, 2020 · 7 comments
Milestone

Comments

@stevenh
Copy link
Contributor

@stevenh stevenh commented Jun 9, 2020

What version of Go are you using (go version)?

go version devel +e64675a79f Mon Jun 8 22:06:39 2020 +0000 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
GO111MODULE=""
GOARCH="amd64"
GOBIN="/data/go/bin"
GOCACHE="/home/steveh/.cache/go-build"
GOENV="/home/steveh/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/data/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/data/go:/home/steveh/code"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/steveh/code/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/steveh/code/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build544771756=/tmp/go-build -gno-record-gcc-switches"

What did you do?

  1. Use a larger number of DB connections with SetConnMaxIdleTime configured.
  2. Reduce the connection number of down to a lower number.
  3. Monitor to check if idle connections were reduced to the lower number after ConnMaxIdleTime

What did you expect to see?

DB connections to drop to ConnMaxIdleTime

What did you see instead?

Very little change in used connections even though large number of connections were showing as idle.

Investigation

After investigating the issue I identified the problem is that connects are not reused and maintained in idle time order in db.freeConn slice. This causes connections to be randomly reused and hence prevents the ConnMaxIdleTime from identifying the unneeded connections.

@gopherbot
Copy link

@gopherbot gopherbot commented Jun 9, 2020

Change https://golang.org/cl/237337 mentions this issue: database/sql: Fix idle connection reuse

@stevenh
Copy link
Contributor Author

@stevenh stevenh commented Jun 9, 2020

@kardianos as author of SetConnMaxIdleTime I thought you might be interested in this.

@kardianos
Copy link
Contributor

@kardianos kardianos commented Jun 10, 2020

@stevenh Thank you for taking time to test this feature. There is quite a bit of change going on here. I'm open to this or something like this but it will take me a little bit to go through this change.

We are late in the Go release cycle: https://github.com/golang/go/wiki/Go-Release-Cycle (currently the tree is frozen) so I think the soonest it could get merged would be mid-July after the Go1.15 release.

I just want to confirm the issue you are trying to address. While the max idle time does work correctly today (technically), because some amount of connections get used (picked at random), then few connections actually idle out because one of them is always picked prior to idle timeout. Your solution was to pick connection from the free list starting with the most recent return time: select 1 conn from free_list order by return_at desc limit 1;. This will FIFO queue I think.

There is another issue #31708 which would like to make the free list a LIFO queue.

I'll need to inspect these two issues at the same time. Again, we can work on this with the aim of getting this ready to be merged at the top of the window when it opens in July. Just don't expect it to go into the Go1.15 release.

@stevenh
Copy link
Contributor Author

@stevenh stevenh commented Jun 10, 2020

What triggered my investigation was we saw a spike in load and connections never returned to normal levels even with our 2m ConnMaxIdleTime, as it turned the connection reuse was resetting returnedAt for all connections hence not allowing any connections to be closed down.

We ended up having to restart the app to bring connection limits and hence the DB memory usage back under control.

Given this I think its import to get this issue fixed before 1.15

@stevenh
Copy link
Contributor Author

@stevenh stevenh commented Jun 10, 2020

Here's a graph of what we saw with ConnMaxIdleTime = 2m, the vertical blue line is the spike in traffic which occurred for just a few seconds. As you can see idle connections didn't clear back down.
image

@stevenh
Copy link
Contributor Author

@stevenh stevenh commented Aug 12, 2020

Now go 1.15 is released I wanted to pick this back up as the current implementation which people will start using is impacted by this.

@kardianos what will it take to get https://golang.org/cl/237337 over the line and into the next 1.15 patch release?

@tzeejay
Copy link

@tzeejay tzeejay commented Sep 28, 2020

Wanted to quickly leave a note for you that I think we hit this by accident. I had written quite inefficient code that is run very frequently. Every once in a while we'd hit a spike and used the max connections limit basically instantly making every other query fail.

Inefficient code is cleaned up and taken care of but the randomness and the instant spikes look similar to the reported issue here.

For reference:

$ go version
go version go1.15.2 darwin/amd64

I am cross compiling the project to Linux/x86-64

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
5 participants
You can’t perform that action at this time.