The previous SQL connection pool behavior used a FIFO queue implemented as a slice, but as a side effect of 4f6d4bb it was changed to read the first entry in map iteration order (effectively random among the requests that had not yet timed out when execution started).
I believe we can do much better than this: we should go back to dequeuing pending requests in an ordered fashion, while continuing not to leak cancelled requests. We should also add instrumentation to see whether people get better performance out of LIFO, FIFO, or random selection.
See also: #22697 talks about greater pool observability and configurability
See also: #18080 talks about observability as well (Honeycomb has implemented our version of passing contexts into wrapped SQL calls, but loses visibility once they reach the SQL layer)
What version of Go are you using (go version)?
$ go version
go version go1.11.6 linux/amd64
Does this issue reproduce with the latest release?
What operating system and processor architecture are you using (go env)?
What did you do?
Saturated our connection pool with a higher rate of incoming requests than it could handle after our database temporarily got slow. Almost no requests succeeded, because the pool picked random requests to work on, many of which no longer had enough remaining time to complete without expiring after being pulled off the pending map.
What did you expect to see?
If requests come in 10% faster than they can be fulfilled, 90% of them should succeed and only 10% should time out. And we should have a separate waitDuration counter for successful requests that were dequeued and started running, versus cancelled requests that timed out before ever being serviced.
What did you see instead?
~100% of requests timed out, because most of the pending requests became dead while waiting to be serviced, and random selection wasn't trying first the requests most likely to succeed. And we couldn't tell why, because the waitDuration of successful requests wasn't separated from that of failed requests.
Among the ideas to try: the pool could be a heap where the key at insertion is the time left before expiry, so that, by the definition of a heap, we can service requests in order of how close they are to expiring.
My PR for the related issue #39471 (https://golang.org/cl/237337) implements LIFO by swapping the fast delete for a slower order-preserving one, and by reusing connections from the end of the list rather than the beginning.
@stevenh I think your PR is not related to this issue. This issue is about the fact that when the driver has a free connection, it picks a random request from the queue and hands the connection to it, which is usually not what you want! The more requests are pending, the more likely your oldest requests are to starve while waiting. Implementing FIFO seems like a good way to do it; LIFO has even more fairness issues.