
document thread starvation mitigation for connection pool #7679

Open
msg555 opened this issue Feb 8, 2022 · 19 comments
Labels
connection pool · documentation · performance (where performance can be improved; add "bug" only if it's a performance degradation) · PRs (with tests!) welcome (a fix or feature which is appropriate to be implemented by volunteers)


msg555 commented Feb 8, 2022

Describe the bug

The QueuePool connection pool does not fairly distribute connections to threads under high contention, leading to timeouts. This fundamentally goes back to the Queue implementation in sqlalchemy.util.queue.Queue, which allows the currently active Python thread to jump the wait queue and acquire resources from the queue ahead of threads that have been waiting.

The sequence of events looks something like this:

  1. Thread A acquires lock
  2. Thread A sees connection is available and checks it out
  3. Thread B acquires lock
  4. Thread B sees no connections available, waits
  5. Thread A returns connection, notifies B that it can wake up
  6. Thread A attempts to get a connection again without yielding execution
  7. Thread A acquires lock
  8. Thread A sees connection is available and checks it out, ahead of B
  9. Thread B wakes up, sees no connection is available, waits again

Below is some example code that demonstrates this problem. Here 100 threads are contending over 10 shared objects. In practice, what ends up happening is that the same 10 threads end up returning and then checking out the same object over and over again. While this configuration does create contention between threads over the shared objects, a fair scheduling would give each thread an object every 100ms or so, certainly not approaching the 10 second time limit here.

This problem isn't really specific to sqlalchemy. Indeed Python's queue.Queue implementation has the same starvation problem.

sqlalchemy==1.4.29
python=3.8.10

To Reproduce

import threading
import time

from sqlalchemy.util.queue import Queue

QUEUE_SIZE = 10
WAIT_TIMEOUT = 10.0
THREAD_COUNT = 100
OPS_PER_THREAD = 10000

q = Queue()
for i in range(QUEUE_SIZE):
    q.put(i)


def work():
    for _ in range(OPS_PER_THREAD):
        # Uncomment this to alleviate the issue, allowing other threads to access queue elements
        # time.sleep(1e-9)
        val = q.get(timeout=WAIT_TIMEOUT)
        time.sleep(0.005)
        q.put(val)


threads = [threading.Thread(target=work) for _ in range(THREAD_COUNT)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
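
For reference, the same starvation can be demonstrated with the standard library queue by swapping the import at the top of the script above (a sketch of the substitution only; the report itself uses sqlalchemy's Queue):

# hypothetical variation of the script above
from queue import Queue  # instead of: from sqlalchemy.util.queue import Queue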

Error

In practice we saw exceptions like the one below with a QueuePool configuration of pool_size=10, max_overflow=10, pool_timeout=30 and 100 threads.

  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/future/engine.py", line 406, in connect
    return super(Engine, self).connect()
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/base.py", line 3204, in connect
    return self._connection_cls(self, close_with_result=close_with_result)
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/base.py", line 96, in __init__
    else engine.raw_connection()
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/base.py", line 3283, in raw_connection
    return self._wrap_pool_connect(self.pool.connect, _connection)
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/base.py", line 3250, in _wrap_pool_connect
    return fn()
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/pool/base.py", line 310, in connect
    return _ConnectionFairy._checkout(self)
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/pool/base.py", line 868, in _checkout
    fairy = _ConnectionRecord.checkout(pool)
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/pool/base.py", line 476, in checkout
    rec = pool._do_get()
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/pool/impl.py", line 134, in _do_get
    raise exc.TimeoutError(
sqlalchemy.exc.TimeoutError: QueuePool limit of size 10 overflow 10 reached, connection timed out, timeout 30.00 (Background on this error at: https://sqlalche.me/e/14/3o7r) 

Versions

  • OS: ubuntu (docker) and macos
  • Python: 3.8.10
  • SQLAlchemy: 1.4.29
  • Database: N/A
  • DBAPI (eg: psycopg, cx_oracle, mysqlclient): N/A

Additional context

No response

@msg555 msg555 added the "requires triage" (New issue that requires categorization) label Feb 8, 2022
@zzzeek zzzeek added the "bug" (Something isn't working) and "performance" (where performance can be improved; add "bug" only if it's a performance degradation) labels and removed the "requires triage" (New issue that requires categorization) label Feb 9, 2022
@zzzeek zzzeek added this to the 1.x.xx milestone Feb 9, 2022
zzzeek (Member) commented Feb 9, 2022

hi, and thanks for doing this research.

I'm trying to get a characterization of your understanding of the issue, between these two phrases: "This fundamentally goes back to the Queue implementation" and "Uncomment this to alleviate the issue". One suggests a foundational design issue, and the other suggests a simple mitigation to a problem that seems to apply only to a specific situation: that of a worker thread which doesn't have any yield (is this a web application?), where the solution is just to make sure a yield is present, but otherwise "everyone waits on a lock to get the thing" is still the pattern in use.

I am open to mitigations here of the nature of "adding a sleep here fixes the issue", but I don't know that we have the resources / appetite for a complete rewrite of the pool focused on high-performance thread contention. Python doesn't even have thread parallelism and is routinely replaced by Go / Rust / Scala / etc. by teams which are measuring things to this degree, so it's very difficult to push for high efficiency in that space in the Python language - people usually don't notice, and in the past when I've published other libraries with various "for speed / efficiency" tricks, it would often turn out that the technique didn't actually work, or had some counter-productive effect within scenario X, but again because it's Python and nobody ever checks, it would just sit out there anyway and nobody really cared or noticed. This pool implementation was even used by Reddit for a time and no such contention issues were ever reported; they long ago moved on to Go.

In this case, you have a really clear major thing going wrong, that you can actually get some threads to timeout. I'll try to reproduce with an actual connection pool going and if all it needs is a sleep before the get hits the lock, that's easy enough to add.

@zzzeek zzzeek modified the milestones: 1.x.xx, 1.4.x Feb 9, 2022
zzzeek (Member) commented Feb 9, 2022

wow, just wow. there's nothing subtle about this at all, and a sleep is all it needs.

Here's a test w/ the engine:

from sqlalchemy import create_engine
import threading
import time

QUEUE_SIZE = 10
WAIT_TIMEOUT = 10.0
THREAD_COUNT = 100
OPS_PER_THREAD = 10000

e = create_engine("mysql://scott:tiger@localhost/test", pool_size=10, max_overflow=0)



def work(index):
    for _ in range(OPS_PER_THREAD):
        with e.connect() as conn:
            time.sleep(0.005)
            if _ % 100 == 0:
                print(f"thread {index} up to {_} operations")


threads = [threading.Thread(target=work, args=(_, )) for _ in range(THREAD_COUNT)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

when you run this test, you literally see just ten threads cranking through 100, 200, 300, 400 very fast. the other 90 threads, nothing.

apply this patch:

diff --git a/lib/sqlalchemy/pool/impl.py b/lib/sqlalchemy/pool/impl.py
index 7a422cd2ac..5051ab7c1a 100644
--- a/lib/sqlalchemy/pool/impl.py
+++ b/lib/sqlalchemy/pool/impl.py
@@ -12,6 +12,7 @@
 
 import threading
 import traceback
+import time
 import weakref
 
 from .base import _AsyncConnDialect
@@ -119,6 +120,7 @@ class QueuePool(Pool):
     def _do_get(self):
         use_overflow = self._max_overflow > -1
 
+        time.sleep(1e-9)
         try:
             wait = use_overflow and self._overflow >= self._max_overflow
             return self._pool.get(wait, self._timeout)

and you have what is effectively a different program: all 100 threads come in, and it takes much longer for each of the threads to reach OPS_PER_THREAD.

the main thing though is that you have this loop where, once the connection is returned to the thread, there's no yield. I think a program that is doing some kind of IO, like delivering a web response, would naturally yield, so I'd like to understand the application you have here, where in production you have threads that act this way while at the same time handling external requests of some kind.

zzzeek (Member) commented Feb 9, 2022

yeah flushing IO for example also yields, so that would be why the world of web apps doesn't notice this I guess?

import sys  # needed for the flush below

def work(index):
    for _ in range(OPS_PER_THREAD):
        with e.connect() as conn:
            time.sleep(0.005)
            if _ % 100 == 0:
                print(f"thread {index} up to {_} operations")
        sys.stdout.write("x")
        sys.stdout.flush()

zzzeek (Member) commented Feb 9, 2022

now to determine if calling sleep() is feasible. On linux, sleep(0) can be used, which seems ideal, as it adds no overhead to checkouts. any other value makes checkouts 3x slower. Given a timeit script:

from sqlalchemy import create_engine
import timeit

e = create_engine(
    "mysql://scott:tiger@localhost/test", pool_size=10, max_overflow=0
)

def work():
    for _ in range(10000):
        with e.connect() as conn:
            pass

print(timeit.timeit("work()", "from __main__ import work", number=1))

with no sleep:

0.3698992179997731

with sleep(0):

0.35297850600181846

with time.sleep(.00000000001):

0.9955972620009561

I'm inclined to do sleep(0). maybe someone can test these things on windows.

zzzeek (Member) commented Feb 9, 2022

windows users claiming sleep(0) doesn't yield: https://stackoverflow.com/a/790246/34549

msg555 (Author) commented Feb 9, 2022

Yeah, the sleep was there just to demonstrate one possible workaround that at least appears to mostly alleviate the issue. I don't mean to recommend it as a patch (although perhaps it's just fine). I think the ideal solution would be for a thread just entering get() to yield to other threads only if there is contention, and not give up the GIL otherwise. Honestly though, I think a primitive to do this should be in the standard library, which is why I'm surprised that queue.Queue has this same issue, along with using bare threading.Condition variables - but that's out of scope here I guess.

For whatever it's worth, in my tests on a debian host / ubuntu container, sleep(0) did not seem to relieve the contention issue. I'd also wonder if sleep(1e-9) is portable, or if that could just underflow and not cause a yield on some platforms.

For background on the situation that brought this up: there are many worker threads performing tasks from a shared queue, with each task checking out/returning a connection several times during its execution. I wouldn't be surprised if this frequently happened without any I/O between the connection checkouts. Increasing max_overflow and/or adding a sleep before the call to connect with a simple wrapper around QueuePool both seem to work as workarounds.
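
For illustration, a minimal sketch of the kind of wrapper I mean (not our exact code; it assumes overriding the internal _do_get() hook, and the name SleepBeforeGetQueuePool is just a placeholder):

import time

from sqlalchemy import create_engine
from sqlalchemy.pool import QueuePool


class SleepBeforeGetQueuePool(QueuePool):
    """Yield to threads already waiting on the pool before each checkout."""

    def _do_get(self):
        # a tiny sleep releases the GIL so a waiting thread has a chance to
        # take the connection that was just returned before we re-acquire it
        time.sleep(1e-9)
        return super()._do_get()


# usage sketch; the URL is a placeholder and the pool settings mirror the
# numbers quoted above
engine = create_engine(
    "mysql://scott:tiger@localhost/test",
    poolclass=SleepBeforeGetQueuePool,
    pool_size=10,
    max_overflow=10,
    pool_timeout=30,
)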

zzzeek (Member) commented Feb 9, 2022

I think the ideal solution would be for a thread just entering get() to yield to other threads only if there is contention and not give up the GIL otherwise.

That is usually exactly what sleep() is for.

For whatever it's worth, in my tests on a debian host/ubuntu container sleep(0) did not seem to relieve the contention issue. I'd also wonder if sleep(1e-9) is portable or if that just could underflow to not cause a yield in some platforms.

it added a little bit of overhead in speed tests here, so I'd rather not just throw it in. I think if someone is writing a multithreaded program, they need to make sure to use sleep() to prevent this kind of thing from happening; at least that's how I always used it back when I did a lot of Java (they seem to have Thread.yield() now too). I can also make this a configuration option of some kind, since you had to go through the trouble of creating a wrapper...

For background on the situation that brought this up, there are many worker threads performing tasks from a shared queue with each task checking out/returning a connection several times during its execution. I wouldn't be surprised if this frequently happened without any I/O between the connection checkouts. Increasing the max_overflow and/or adding a sleep before the call to connect with a simple wrapper around QueuePool both seem to work as work arounds.

If you have worker threads that aren't doing any IO, per the GIL your program might as well be serialized anyway - but also, it would be normal for a program structured like that to be subject to this thread starvation problem in any case, if your threads don't ever yield. it's just that in this case, it happened to be in a spot where there's a timeout that got in the way.

sqla-tester (Collaborator) commented:

Mike Bayer has proposed a fix for this issue in the main branch:

add pool_sleep_on_get parameter https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/3581

sqla-tester (Collaborator) commented:

Mike Bayer has proposed a fix for this issue in the rel_1_4 branch:

add pool_sleep_on_get parameter https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/3582

CaselIT (Member) commented Feb 9, 2022

IIRC on Windows the minimum sleep time is 16ms, so adding a sleep in all cases is not something that's viable.

Also note that if you have a program that does no IO other than the db, there is no point in creating 100 threads if the db parallelism is 20, since that will limit the parallelism to 20 anyway.

Honestly I'm not sure we need the parameter..

msg555 (Author) commented Feb 9, 2022

IIRC on Windows the minimum sleep time is 16ms, so adding a sleep in all cases is not something that's viable.

Also note that if you have a program that does no IO other than the db, there is no point in creating 100 threads if the db parallelism is 20, since that will limit the parallelism to 20 anyway.

To be clear that is not what is happening in my situation. There are large chunks of I/O happening on each thread but between these I/O calls a connection may be requested/returned many times without any intervening I/O.

Granted, the fact that there is constant contention over the 20 database connections is a sign that our thread-level parallelism is too high relative to the number of connections. Slightly oversaturating our connection pool probably shouldn't cause timeout errors, however. I'd worry, even if we tune the parameters better, that a transient change in workload characteristics could bring this issue back up.

Honestly I'm not sure we need the parameter..

CaselIT (Member) commented Feb 9, 2022

Have you by any chance tried different combinations of threads / db connections?

It may be that the 5-1 ratio is just too high here

msg555 (Author) commented Feb 9, 2022

Have you by any chance tried different combinations of threads / db connections?

It may be that the 5-1 ratio is just too high here

Yes, as I've mentioned it is slightly too high of a ratio.

However, the point of this issue is that it is unexpected (to me, at least) that mildly saturating load on your connection pool should lead to thread starvation/timeouts.

zzzeek (Member) commented Feb 9, 2022

I think what we might do here, since the starvation condition does cause an error to be raised, is to document this particular problem at the linked error document, which is at https://docs.sqlalchemy.org/en/14/errors.html#error-3o7r . There's already a bulleted list of conditions that cause this error, and we simply add this particular cause as well as the suggested mitigation.

However, the point of this issue is that it is unexpected (to me, at least) that mildly saturating load on your connection pool should lead to thread starvation/timeouts.

it's really not the connection pool here that is causing the problem; any time you write a multithreaded program in any language, if you create a thread that never yields, issues like these will occur when dealing with any kind of synchronized resource. I searched bugs.python.org for someone reporting this in terms of the built-in queue.Queue class but couldn't find anything. What we could do is post on the Python mailing list to get opinions on whether they think this is a bug in queue.Queue (to which I would expect the consensus would be "probably not").

the fact that the example we have here fails very quickly and obviously, combined with the fact that nobody has ever reported this issue in the 16 years that we've had this design, leads me to believe this is not generally a problem people have.

zzzeek (Member) commented Feb 9, 2022

reiterating that we can't really "fix" the problem for everyone, because there is no OS-independent, non-time-consuming API feature in Python by which we may instruct a thread to yield. I can also reproduce the issue where sleep(0) is not quite enough compared to sleep(1e-9) on fedora, if you really tighten down the parameters of threads vs. connections available vs. how long the thread keeps a connection held out.

@CaselIT the user though has a case where they are just doing a lot of checkouts, and it would be inconvenient for them to place a sleep() above every checkout, hence they made a "wrapper", which IMO is too inconvenient as well. might we keep the "sleep()" feature here but obviously have it turned off by default?

@zzzeek zzzeek modified the milestones: 1.4.x, 2.0 Feb 9, 2022
CaselIT (Member) commented Feb 9, 2022

since the gerrit change is abandoned, I'll copy the reply here:

An alternative to the proposed flag may be the ability to specify the queue implementation class.
This way a user can subclass our Queue and customize the get however they want.

Also, using time.sleep is not compatible with asyncio. Assuming there is the same issue there, we would need to address it using the asyncio sleep?

msg555 (Author) commented Feb 9, 2022

I think what we might do here, since the starvation condition does cause an error to be raised, is to document this particular problem at the linked error document, which is at https://docs.sqlalchemy.org/en/14/errors.html#error-3o7r . There's already a bulleted list of conditions that cause this error, and we simply add this particular cause as well as the suggested mitigation.

+1

it's really not the connection pool here that is causing the problem; any time you write a multithreaded program in any language, if you create a thread that never yields, issues like these will occur when dealing with any kind of synchronized resource. I searched bugs.python.org for someone reporting this in terms of the built-in queue.Queue class but couldn't find anything. What we could do is post on the Python mailing list to get opinions on whether they think this is a bug in queue.Queue (to which I would expect the consensus would be "probably not").

Yes, the lack of discussion about this issue has me a bit concerned that I'm missing something. I have found some OS-level discussions that regard fair mutexes as poor for performance due to increased context switching. I'm not sure whether that applies to shared Queues or how the Python GIL changes that thinking. I am planning on checking Python IRC and/or posting something about queue.Queue/Lock/Condition on a Python list to assess what the consensus is.

the fact that the example we have here fails very quickly and obviously, combined with the fact that nobody has ever reported this issue in the 16 years that we've had this design, leads me to believe this is not generally a problem people have.

I agree that it is probably a rare case. However, if you google the symptom error you'll be told to increase your pool size, which is both sound advice and will alleviate your symptoms, so I think it's possible it has gone unrecognized for a while. Also, real life is messier than that demonstration of the issue, and it may take some amount of unluckiness for a thread to get totally starved. In our case we only saw these errors occasionally despite relatively uniform load.

An alternative to the proposed flag may be the ability to specify the queue implementation class. This way a user can subclass our Queue and customize the get however they want.

+1; as a curiosity I made a "fair" queue, so it would be nice if I could plug it in to sqlalchemy.
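
To sketch what I mean by a "fair" queue (a simplified illustration of the idea, not my actual implementation), put() hands the item directly to the longest-waiting thread instead of returning it to a shared pool that the putting thread could immediately re-acquire:

import collections
import threading


class Empty(Exception):
    pass


class _Waiter:
    __slots__ = ("event", "item")

    def __init__(self):
        self.event = threading.Event()
        self.item = None


class FairQueue:
    """FIFO-handoff queue: waiters are served strictly in arrival order."""

    def __init__(self):
        self._mutex = threading.Lock()
        self._items = collections.deque()
        self._waiters = collections.deque()

    def put(self, item):
        with self._mutex:
            if self._waiters:
                # hand the item directly to the oldest waiter
                waiter = self._waiters.popleft()
                waiter.item = item
                waiter.event.set()
            else:
                self._items.append(item)

    def get(self, timeout=None):
        with self._mutex:
            # take an item directly only if nobody else is already waiting
            if self._items and not self._waiters:
                return self._items.popleft()
            waiter = _Waiter()
            self._waiters.append(waiter)
        if waiter.event.wait(timeout):
            return waiter.item
        with self._mutex:
            # a put() may have handed us an item just as we timed out
            if waiter.event.is_set():
                return waiter.item
            self._waiters.remove(waiter)
        raise Empty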

zzzeek (Member) commented Apr 3, 2022

correct me if I'm wrong, but at the moment we are just adding documentation to the list of items at https://docs.sqlalchemy.org/en/14/errors.html#error-3o7r . the condition here causes an easily identifiable symptom (i.e. it isn't some silent performance degradation, so this isn't really "performance") and has a straightforward mitigation. So... I think this is just docs.

@zzzeek zzzeek added the "documentation" label and removed the "bug" (Something isn't working) and "performance" (where performance can be improved; add "bug" only if it's a performance degradation) labels Apr 3, 2022
@zzzeek zzzeek modified the milestones: 2.0, 1.4.x Apr 3, 2022
@zzzeek zzzeek changed the title from "QueuePool connection allocation can starve threads" to "document thread starvation mitigation for connection pool" Apr 3, 2022
@zzzeek zzzeek modified the milestones: 1.4.x, 2.0.x Jan 27, 2023
@zzzeek zzzeek added the "performance" (where performance can be improved; add "bug" only if it's a performance degradation) label Mar 5, 2024
wbolster (Contributor) commented Mar 7, 2024

this seems related and may be interesting:

from python 3.10.8+ and 3.11.1+ onwards, os.sched_yield() releases the GIL; see python/cpython#96078
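
As a purely hypothetical sketch (not a SQLAlchemy feature; os.sched_yield() is POSIX-only and the class name is a placeholder), that call could serve as the yield point in a pool subclass instead of time.sleep():

import os

from sqlalchemy.pool import QueuePool


class SchedYieldQueuePool(QueuePool):
    """Yield the CPU (and, on 3.10.8+/3.11.1+, the GIL) before each checkout."""

    def _do_get(self):
        os.sched_yield()  # lets a thread already waiting on the pool run first
        return super()._do_get()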

@CaselIT CaselIT added the "PRs (with tests!) welcome" (a fix or feature which is appropriate to be implemented by volunteers) label Mar 11, 2024