
document thread starvation mitigation for connection pool #7679

Open
msg555 opened this issue Feb 8, 2022 · 19 comments
Labels
connection pool · documentation · performance (where performance can be improved; add "bug" only if it's a performance degradation) · PRs (with tests!) welcome (a fix or feature which is appropriate to be implemented by volunteers)


msg555 commented Feb 8, 2022

Describe the bug

The QueuePool connection pool does not fairly distribute connections to threads under high contention, leading to timeouts. This fundamentally goes back to the Queue implementation in sqlalchemy.util.queue.Queue, which allows the currently active Python thread to jump the wait queue and acquire resources from the queue ahead of threads that have been waiting.

The sequence of events looks something like this:

  1. Thread A acquires lock
  2. Thread A sees connection is available and checks it out
  3. Thread B acquires lock
  4. Thread B sees no connections available, waits
  5. Thread A returns connection, notifies B that it can wake up
  6. Thread A attempts to get a connection again without yielding execution
  7. Thread A acquires lock
  8. Thread A sees connection is available and checks it out, ahead of B
  9. Thread B wakes up, sees no connection is available, waits again

Below is some example code that demonstrates this problem. Here 100 threads are contending over 10 shared objects. In practice, what ends up happening is that the same 10 threads end up returning and then checking out the same object over and over again. While this configuration does create contention between threads over the shared objects, a fair scheduling would give each thread an object every 100ms or so, certainly not approaching the 10 second time limit here.

This problem isn't really specific to sqlalchemy. Indeed Python's queue.Queue implementation has the same starvation problem.

sqlalchemy==1.4.29
python=3.8.10

To Reproduce

import threading
import time

from sqlalchemy.util.queue import Queue

QUEUE_SIZE = 10
WAIT_TIMEOUT = 10.0
THREAD_COUNT = 100
OPS_PER_THREAD = 10000

q = Queue()
for i in range(QUEUE_SIZE):
    q.put(i)


def work():
    for _ in range(OPS_PER_THREAD):
        # Uncomment this to alleviate the issue, allowing other threads to access queue elements
        # time.sleep(1e-9)
        val = q.get(timeout=WAIT_TIMEOUT)
        time.sleep(0.005)
        q.put(val)


threads = [threading.Thread(target=work) for _ in range(THREAD_COUNT)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
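
For reference, the same starvation can be demonstrated with the standard library queue by swapping the import at the top of the script above (a sketch of the substitution only; the report itself uses sqlalchemy's Queue):

# hypothetical variation of the script above
from queue import Queue  # instead of: from sqlalchemy.util.queue import Queue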

Error

In practice we saw exceptions like the one below with a QueuePool configuration of pool_size=10, max_overflow=10, pool_timeout=30 and 100 threads.

  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/future/engine.py", line 406, in connect
    return super(Engine, self).connect()
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/base.py", line 3204, in connect
    return self._connection_cls(self, close_with_result=close_with_result)
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/base.py", line 96, in __init__
    else engine.raw_connection()
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/base.py", line 3283, in raw_connection
    return self._wrap_pool_connect(self.pool.connect, _connection)
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/base.py", line 3250, in _wrap_pool_connect
    return fn()
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/pool/base.py", line 310, in connect
    return _ConnectionFairy._checkout(self)
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/pool/base.py", line 868, in _checkout
    fairy = _ConnectionRecord.checkout(pool)
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/pool/base.py", line 476, in checkout
    rec = pool._do_get()
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/pool/impl.py", line 134, in _do_get
    raise exc.TimeoutError(
sqlalchemy.exc.TimeoutError: QueuePool limit of size 10 overflow 10 reached, connection timed out, timeout 30.00 (Background on this error at: https://sqlalche.me/e/14/3o7r) 

Versions

  • OS: ubuntu (docker) and macos
  • Python: 3.8.10
  • SQLAlchemy: 1.4.29
  • Database: N/A
  • DBAPI (eg: psycopg, cx_oracle, mysqlclient): N/A

Additional context

No response

@msg555 msg555 added the "requires triage" (New issue that requires categorization) label Feb 8, 2022
@zzzeek zzzeek added the "bug" (Something isn't working) and "performance" (where performance can be improved; add "bug" only if it's a performance degradation) labels and removed the "requires triage" (New issue that requires categorization) label Feb 9, 2022
@zzzeek zzzeek added this to the 1.x.xx milestone Feb 9, 2022
zzzeek (Member) commented Feb 9, 2022

hi, and thanks for doing this research.

I'm trying to get a characterization of your understanding of the issue, between these two phrases: "This fundamentally goes back to the Queue implementation" and "Uncomment this to alleviate the issue". One suggests a foundational design issue, and the other suggests a simple mitigation to a problem that seems to apply only to a specific situation: that of a worker thread which doesn't have any yield (is this a web application?), where the solution is just to make sure a yield is present, but otherwise "everyone waits on a lock to get the thing" is still the pattern in use.

I am open to mitigations here of the nature of "adding a sleep here fixes the issue", but I don't know that we have the resources / appetite for a complete rewrite of the pool focused on high-performance thread contention. Python doesn't even have thread parallelism and is routinely replaced by Go / Rust / Scala / etc. by teams which are measuring things to this degree, so it's very difficult to push for high efficiency in that space in the Python language - people usually don't notice, and in the past when I've published other libraries with various "for speed / efficiency" tricks, it would often turn out that the technique didn't actually work, or had some counter-productive effect within scenario X, but again because it's Python and nobody ever checks, it would just sit out there anyway and nobody really cared or noticed. This pool implementation was even used by Reddit for a time and no such contention issues were ever reported; they long ago moved on to Go.

In this case, you have a really clear major thing going wrong, that you can actually get some threads to timeout. I'll try to reproduce with an actual connection pool going and if all it needs is a sleep before the get hits the lock, that's easy enough to add.

@zzzeek zzzeek modified the milestones: 1.x.xx, 1.4.x Feb 9, 2022
zzzeek (Member) commented Feb 9, 2022

wow, just wow. there's nothing subtle about this at all, and a sleep is all it needs.

Here's a test w/ the engine:

from sqlalchemy import create_engine
import threading
import time

QUEUE_SIZE = 10
WAIT_TIMEOUT = 10.0
THREAD_COUNT = 100
OPS_PER_THREAD = 10000

e = create_engine("mysql://scott:tiger@localhost/test", pool_size=10, max_overflow=0)



def work(index):
    for _ in range(OPS_PER_THREAD):
        with e.connect() as conn:
            time.sleep(0.005)
            if _ % 100 == 0:
                print(f"thread {index} up to {_} operations")


threads = [threading.Thread(target=work, args=(_, )) for _ in range(THREAD_COUNT)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

when you run this test, you literally see just ten threads cranking through 100, 200, 300, 400 very fast. the other 90 threads, nothing.

apply this patch:

diff --git a/lib/sqlalchemy/pool/impl.py b/lib/sqlalchemy/pool/impl.py
index 7a422cd2ac..5051ab7c1a 100644
--- a/lib/sqlalchemy/pool/impl.py
+++ b/lib/sqlalchemy/pool/impl.py
@@ -12,6 +12,7 @@
 
 import threading
 import traceback
+import time
 import weakref
 
 from .base import _AsyncConnDialect
@@ -119,6 +120,7 @@ class QueuePool(Pool):
     def _do_get(self):
         use_overflow = self._max_overflow > -1
 
+        time.sleep(1e-9)
         try:
             wait = use_overflow and self._overflow >= self._max_overflow
             return self._pool.get(wait, self._timeout)

and you have what is effectively a different program: all 100 threads come in, and it takes much longer for each of the threads to reach OPS_PER_THREAD.

the main thing though is that you have this loop where, once the connection is returned to the thread, there's no yield. I think a program that is doing some kind of IO, like delivering a web response, would naturally yield, so I'd like to understand the application you have here, where in production you have threads that act this way while at the same time handling external requests of some kind.

zzzeek (Member) commented Feb 9, 2022

yeah flushing IO for example also yields, so that would be why the world of web apps doesn't notice this I guess?

import sys  # needed for the flush below

def work(index):
    for _ in range(OPS_PER_THREAD):
        with e.connect() as conn:
            time.sleep(0.005)
            if _ % 100 == 0:
                print(f"thread {index} up to {_} operations")
        sys.stdout.write("x")
        sys.stdout.flush()

zzzeek (Member) commented Feb 9, 2022

now to determine if calling sleep() is feasible. On linux, sleep(0) can be used, which seems ideal, as it adds no overhead to checkouts. any other value makes checkouts 3x slower. Given a timeit script:

from sqlalchemy import create_engine
import timeit

e = create_engine(
    "mysql://scott:tiger@localhost/test", pool_size=10, max_overflow=0
)

def work():
    for _ in range(10000):
        with e.connect() as conn:
            pass

print(timeit.timeit("work()", "from __main__ import work", number=1))

with no sleep:

0.3698992179997731

with sleep(0):

0.35297850600181846

with time.sleep(.00000000001):

0.9955972620009561

I'm inclined to do sleep(0). maybe someone can test these things on windows.

zzzeek (Member) commented Feb 9, 2022

windows users claiming sleep(0) doesn't yield: https://stackoverflow.com/a/790246/34549

msg555 (Author) commented Feb 9, 2022

Yeah, the sleep was there just to demonstrate one possible workaround that at least appears to mostly alleviate the issue. I don't mean to recommend it as a patch (although perhaps it's just fine). I think the ideal solution would be for a thread just entering get() to yield to other threads only if there is contention, and not give up the GIL otherwise. Honestly though, I think a primitive to do this should be in the standard library, which is why I'm surprised that queue.Queue has this same issue, along with using bare threading.Condition variables - but that's out of scope here I guess.

For whatever it's worth, in my tests on a debian host / ubuntu container, sleep(0) did not seem to relieve the contention issue. I'd also wonder if sleep(1e-9) is portable, or if that could just underflow and not cause a yield on some platforms.

For background on the situation that brought this up: there are many worker threads performing tasks from a shared queue, with each task checking out/returning a connection several times during its execution. I wouldn't be surprised if this frequently happened without any I/O between the connection checkouts. Increasing max_overflow and/or adding a sleep before the call to connect with a simple wrapper around QueuePool both seem to work as workarounds.
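
For illustration, a minimal sketch of the kind of wrapper I mean (not our exact code; it assumes overriding the internal _do_get() hook, and the name SleepBeforeGetQueuePool is just a placeholder):

import time

from sqlalchemy import create_engine
from sqlalchemy.pool import QueuePool


class SleepBeforeGetQueuePool(QueuePool):
    """Yield to threads already waiting on the pool before each checkout."""

    def _do_get(self):
        # a tiny sleep releases the GIL so a waiting thread has a chance to
        # take the connection that was just returned before we re-acquire it
        time.sleep(1e-9)
        return super()._do_get()


# usage sketch; the URL is a placeholder and the pool settings mirror the
# numbers quoted above
engine = create_engine(
    "mysql://scott:tiger@localhost/test",
    poolclass=SleepBeforeGetQueuePool,
    pool_size=10,
    max_overflow=10,
    pool_timeout=30,
)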

zzzeek (Member) commented Feb 9, 2022

I think the ideal solution would be for a thread just entering get() to yield to other threads only if there is contention and not give up the GIL otherwise.

That is usually exactly what sleep() is for.

For whatever it's worth, in my tests on a debian host/ubuntu container sleep(0) did not seem to relieve the contention issue. I'd also wonder if sleep(1e-9) is portable or if that just could underflow to not cause a yield in some platforms.

it added a little bit of overhead in speed tests here, so I'd rather not just throw it in. I think if someone is writing a multithreaded program, they need to make sure to use sleep() to prevent this kind of thing from happening; at least that's how I always used it back when I did a lot of Java (they seem to have Thread.yield() now too). I can also make this a configuration option of some kind, since you had to go through the trouble of creating a wrapper...

For background on the situation that brought this up, there are many worker threads performing tasks from a shared queue with each task checking out/returning a connection several times during its execution. I wouldn't be surprised if this frequently happened without any I/O between the connection checkouts. Increasing the max_overflow and/or adding a sleep before the call to connect with a simple wrapper around QueuePool both seem to work as work arounds.

If you have worker threads that aren't doing any IO, per the GIL your program might as well be serialized anyway - but also, it would be normal for a program structured like that to be subject to this thread starvation problem in any case, if your threads don't ever yield. it's just that in this case, it happened to be in a spot where there's a timeout that got in the way.

sqla-tester (Collaborator) commented:

Mike Bayer has proposed a fix for this issue in the main branch:

add pool_sleep_on_get parameter https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/3581

sqla-tester (Collaborator) commented:

Mike Bayer has proposed a fix for this issue in the rel_1_4 branch:

add pool_sleep_on_get parameter https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/3582

CaselIT (Member) commented Feb 9, 2022

IIRC on Windows the minimum sleep time is 16ms, so adding a sleep in all cases is not something that's viable.

Also note that if you have a program that does no IO other than the db, there is no point in creating 100 threads if the db parallelism is 20, since that will limit the parallelism to 20 anyway.

Honestly I'm not sure we need the parameter..

msg555 (Author) commented Feb 9, 2022

IIRC on Windows the minimum sleep time is 16ms, so adding a sleep in all cases is not something that's viable.

Also note that if you have a program that does no IO other than the db, there is no point in creating 100 threads if the db parallelism is 20, since that will limit the parallelism to 20 anyway.

To be clear that is not what is happening in my situation. There are large chunks of I/O happening on each thread but between these I/O calls a connection may be requested/returned many times without any intervening I/O.

Granted, the fact that there is constant contention over the 20 database connections is a sign that our thread-level parallelism is too high relative to the number of connections. Slightly oversaturating our connection pool probably shouldn't cause timeout errors, however. I'd worry, even if we tune the parameters better, that a transient change in workload characteristics could bring this issue back up.

Honestly I'm not sure we need the parameter..

CaselIT (Member) commented Feb 9, 2022

Have you by any chance tried different combinations of threads / db connections?

It may be that the 5-1 ratio is just too high here

msg555 (Author) commented Feb 9, 2022

Have you by any chance tried different combinations of threads / db connections?

It may be that the 5-1 ratio is just too high here

Yes, as I've mentioned it is slightly too high of a ratio.

However, the point of this issue is that it is unexpected (to me, at least) that mildly saturating load on your connection pool should lead to thread starvation/timeouts.

zzzeek (Member) commented Feb 9, 2022

I think what we might do here, since the starvation condition does cause an error to be raised, is to document this particular problem at the linked error document, which is at https://docs.sqlalchemy.org/en/14/errors.html#error-3o7r . There's already a bulleted list of conditions that cause this error, and we simply add this particular cause as well as the suggested mitigation.

However, the point of this issue is that it is unexpected (to me, at least) that mildly saturating load on your connection pool should lead to thread starvation/timeouts.

it's really not the connection pool here that is causing the problem; any time you write a multithreaded program in any language, if you create a thread that never yields, issues like these will occur when dealing with any kind of synchronized resource. I searched bugs.python.org for someone reporting this in terms of the built-in queue.Queue class but couldn't find anything. What we could do is post on the Python mailing list to get opinions on whether they think this is a bug in queue.Queue (to which I would expect the consensus would be "probably not").

the fact that the example we have here fails very quickly and obviously, combined with the fact that nobody has ever reported this issue in the 16 years that we've had this design, leads me to believe this is not generally a problem people have.

zzzeek (Member) commented Feb 9, 2022

reiterating that we can't really "fix" the problem for everyone, because there is no OS-independent, non-time-consuming API feature in Python by which we may instruct a thread to yield. I can also reproduce the issue where sleep(0) is not quite enough compared to sleep(1e-9) on fedora, if you really tighten down the parameters of threads vs. connections available vs. how long the thread keeps a connection held out.

@CaselIT the user though has a case where they are just doing a lot of checkouts, and it would be inconvenient for them to place a sleep() above every checkout, hence they made a "wrapper", which IMO is too inconvenient as well. might we keep the "sleep()" feature here but obviously have it turned off by default?

@zzzeek zzzeek modified the milestones: 1.4.x, 2.0 Feb 9, 2022
CaselIT (Member) commented Feb 9, 2022

since the gerrit change is abandoned, I'll copy the reply here:

An alternative to the proposed flag may be the ability to specify the queue implementation class.
This way a user can subclass our Queue and customize the get however they want.

Also, using time.sleep is not compatible with asyncio. Assuming there is the same issue there, we would need to address it using the asyncio sleep?

msg555 (Author) commented Feb 9, 2022

I think what we might do here, since the starvation condition does cause an error to be raised, is to document this particular problem at the linked error document, which is at https://docs.sqlalchemy.org/en/14/errors.html#error-3o7r . There's already a bulleted list of conditions that cause this error, and we simply add this particular cause as well as the suggested mitigation.

+1

it's really not the connection pool here that is causing the problem; any time you write a multithreaded program in any language, if you create a thread that never yields, issues like these will occur when dealing with any kind of synchronized resource. I searched bugs.python.org for someone reporting this in terms of the built-in queue.Queue class but couldn't find anything. What we could do is post on the Python mailing list to get opinions on whether they think this is a bug in queue.Queue (to which I would expect the consensus would be "probably not").

Yes, the lack of discussion about this issue has me a bit concerned that I'm missing something. I have found some OS-level discussions that regard fair mutexes as poor for performance due to increased context switching. I'm not sure whether that applies to shared Queues or how the Python GIL changes that thinking. I am planning on checking Python IRC and/or posting something about queue.Queue/Lock/Condition on a Python list to assess what the consensus is.

the fact that the example we have here fails very quickly and obviously, combined with the fact that nobody has ever reported this issue in the 16 years that we've had this design, leads me to believe this is not generally a problem people have.

I agree that it is probably a rare case. However, if you google the symptom error you'll be told to increase your pool size, which is both sound advice and will alleviate your symptoms, so I think it's possible it has gone unrecognized for a while. Also, real life is messier than that demonstration of the issue, and it may take some amount of unluckiness for a thread to get totally starved. In our case we only saw these errors occasionally despite relatively uniform load.

An alternative to the proposed flag may be the ability to specify the queue implementation class. This way a user can subclass our Queue and customize the get however they want.

+1; as a curiosity I made a "fair" queue, so it would be nice if I could plug it in to sqlalchemy.
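
To sketch what I mean by a "fair" queue (a simplified illustration of the idea, not my actual implementation), put() hands the item directly to the longest-waiting thread instead of returning it to a shared pool that the putting thread could immediately re-acquire:

import collections
import threading


class Empty(Exception):
    pass


class _Waiter:
    __slots__ = ("event", "item")

    def __init__(self):
        self.event = threading.Event()
        self.item = None


class FairQueue:
    """FIFO-handoff queue: waiters are served strictly in arrival order."""

    def __init__(self):
        self._mutex = threading.Lock()
        self._items = collections.deque()
        self._waiters = collections.deque()

    def put(self, item):
        with self._mutex:
            if self._waiters:
                # hand the item directly to the oldest waiter
                waiter = self._waiters.popleft()
                waiter.item = item
                waiter.event.set()
            else:
                self._items.append(item)

    def get(self, timeout=None):
        with self._mutex:
            # take an item directly only if nobody else is already waiting
            if self._items and not self._waiters:
                return self._items.popleft()
            waiter = _Waiter()
            self._waiters.append(waiter)
        if waiter.event.wait(timeout):
            return waiter.item
        with self._mutex:
            # a put() may have handed us an item just as we timed out
            if waiter.event.is_set():
                return waiter.item
            self._waiters.remove(waiter)
        raise Empty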

zzzeek (Member) commented Apr 3, 2022

correct me if I'm wrong, but at the moment we are just adding documentation to the list of items at https://docs.sqlalchemy.org/en/14/errors.html#error-3o7r . the condition here causes an easily identifiable symptom (i.e. it isn't some silent performance degradation, so this isn't really "performance") and has a straightforward mitigation. So... I think this is just docs.

@zzzeek zzzeek added the "documentation" label and removed the "bug" (Something isn't working) and "performance" (where performance can be improved; add "bug" only if it's a performance degradation) labels Apr 3, 2022
@zzzeek zzzeek modified the milestones: 2.0, 1.4.x Apr 3, 2022
@zzzeek zzzeek changed the title from "QueuePool connection allocation can starve threads" to "document thread starvation mitigation for connection pool" Apr 3, 2022
@zzzeek zzzeek modified the milestones: 1.4.x, 2.0.x Jan 27, 2023
@zzzeek zzzeek added the "performance" (where performance can be improved; add "bug" only if it's a performance degradation) label Mar 5, 2024
wbolster (Contributor) commented Mar 7, 2024

this seems related and may be interesting:

from python 3.10.8+ and 3.11.1+ onwards, os.sched_yield() releases the GIL; see python/cpython#96078
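
As a purely hypothetical sketch (not a SQLAlchemy feature; os.sched_yield() is POSIX-only and the class name is a placeholder), that call could serve as the yield point in a pool subclass instead of time.sleep():

import os

from sqlalchemy.pool import QueuePool


class SchedYieldQueuePool(QueuePool):
    """Yield the CPU (and, on 3.10.8+/3.11.1+, the GIL) before each checkout."""

    def _do_get(self):
        os.sched_yield()  # lets a thread already waiting on the pool run first
        return super()._do_get()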

@CaselIT CaselIT added the "PRs (with tests!) welcome" (a fix or feature which is appropriate to be implemented by volunteers) label Mar 11, 2024