Thread starvation with threading.Condition #90968

Open
msg555 mannequin opened this issue Feb 21, 2022 · 3 comments
Labels
3.10 (only security fixes) · stdlib (Python modules in the Lib dir) · type-bug (An unexpected behavior, bug, or error)

Comments

@msg555
Mannequin

msg555 mannequin commented Feb 21, 2022

BPO 46812
Nosy @tim-one, @msg555

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2022-02-21.02:47:19.547>
labels = ['type-bug', 'library', '3.10']
title = 'Thread starvation with threading.Condition'
updated_at = <Date 2022-02-21.03:24:41.502>
user = 'https://github.com/msg555'

bugs.python.org fields:

activity = <Date 2022-02-21.03:24:41.502>
actor = 'tim.peters'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation = <Date 2022-02-21.02:47:19.547>
creator = 'msg555'
dependencies = []
files = []
hgrepos = []
issue_num = 46812
keywords = []
message_count = 2.0
messages = ['413629', '413630']
nosy_count = 2.0
nosy_names = ['tim.peters', 'msg555']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue46812'
versions = ['Python 3.10']

@msg555
Mannequin Author

msg555 mannequin commented Feb 21, 2022

When Condition variables are used to manage access to a shared resource, threads can be starved: the thread that just gave up the resource (and called notify/notify_all) effectively has priority in reacquiring it before any of the waiting threads get a chance. The issue appears to arise because, unlike the Lock implementation, Condition variables are implemented partly in Python, so a thread must hold the GIL when it reacquires the Condition's underlying lock.

Coupled with Python's predictable switch interval, this means that if a thread notifies others that a resource is available and shortly afterwards attempts to reacquire that resource, it will be able to do so, since it will have held the GIL the entire time.

This can lead to some threads being starved entirely (forever) of access to a shared resource. This came up in a real-world situation for me when I had multiple threads repeatedly trying to access a shared database connection without blocking between accesses. Some threads never got a connection, leading to unexpected timeouts. See sqlalchemy/sqlalchemy#7679

Here's a simple example of this issue using the queue.Queue implementation: https://gist.github.com/msg555/36a10bb5a0c0fe8c89c89d8c05d00e21

Similar example just using Condition variables directly: https://gist.github.com/msg555/dd491078cf10dbabbe7b1cd142644910

An analogous C++ implementation. On Linux 5.13 this is still not _that_ fair, but it does not completely starve threads: https://gist.github.com/msg555/14d8029b910704a42d372004d3afa465
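The following is a minimal sketch of the scenario described above, not the code from the linked gists. The names `run_contention` and `token_free`, the thread count, and the timings are all assumptions chosen for illustration. Several threads repeatedly take and return a single token guarded by a Condition, never blocking between accesses, and the function reports how often each thread got the token.

```python
import threading
import time


def run_contention(num_threads=4, duration=0.5):
    """Let several threads compete for one token guarded by a Condition.

    Returns a list with the number of times each thread got the token.
    """
    cond = threading.Condition()
    token_free = True                 # the single shared resource
    counts = [0] * num_threads
    deadline = time.monotonic() + duration

    def worker(idx):
        nonlocal token_free
        while time.monotonic() < deadline:
            with cond:
                # The timeout only ensures the demo always terminates.
                if not cond.wait_for(lambda: token_free, timeout=0.1):
                    continue
                token_free = False
            counts[idx] += 1          # "use" the resource without blocking
            with cond:
                token_free = True
                cond.notify()         # wake one waiter, then loop right back

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counts


if __name__ == "__main__":
    print(run_contention())
```

The per-thread counts vary from run to run and between interpreter versions. If the behavior reported here is present, the distribution may be very uneven, with some threads close to zero.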

Thoughts:

  • Is this something that's worth fixing? The behavior at the very least is surprising and I was unable to find discussion or documentation of it.
  • Can Condition variables be implemented using standard C libraries (e.g. pthreads)? Maybe this could at least happen when the standard threading.Lock is used as the Condition variable's lock?
  • I mocked up a fair Condition variable implementation at https://github.com/msg555/fairsync/blob/main/fairsync/condition.py. However, fairness comes with its own overhead of additional context switching.
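To make the fairness trade-off concrete, here is a small sketch of one possible approach: a direct handoff in arrival order. This is not the fairsync implementation linked above, and the class name `FairResource` is hypothetical. Each waiter parks on its own Event, and `release()` passes ownership straight to the oldest waiter, so the releasing thread cannot jump the queue.

```python
import threading
from collections import deque


class FairResource:
    """Grants a single resource to threads in strict arrival order."""

    def __init__(self):
        self._lock = threading.Lock()   # protects the fields below
        self._waiters = deque()         # one private Event per queued thread
        self._busy = False

    def acquire(self):
        with self._lock:
            if not self._busy:
                self._busy = True
                return
            event = threading.Event()
            self._waiters.append(event)
        # Ownership is handed over by release(); no re-check is needed.
        event.wait()

    def release(self):
        with self._lock:
            if self._waiters:
                # The resource stays busy; only its owner changes.
                self._waiters.popleft().set()
            else:
                self._busy = False
```

Because the resource is never marked free while someone is queued, a thread that releases and immediately calls `acquire()` again ends up behind the existing waiters. The cost is the one noted above: every release with waiters present forces a switch to another thread.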

Tested on Python 3.7-3.10

@msg555 msg555 mannequin added the 3.10 (only security fixes), stdlib (Python modules in the Lib dir), and type-bug (An unexpected behavior, bug, or error) labels Feb 21, 2022
@tim-one
Member

tim-one commented Feb 21, 2022

Unassigning myself - I have no insight into this.

I suspect the eternally contentious bpo-7946 is related.

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
@hallmeier

I just encountered this issue in Python 3.12 and want to summarize the problem:
While one thread is calling condition.wait(), the thread holding the condition may call

condition.notify()
condition.release()
condition.acquire()

to reacquire the condition before the waiting thread can do so. The waiting thread is always successfully woken from its waiting state, but from there it still has to execute a few lines before it acquires the condition, and whether that happens before another, non-waiting thread acquires the condition is non-deterministic. (The OP has even shown that non-waiting threads can acquire the condition over and over again.)

From how Condition is introduced and documented, I always assumed its purpose was to guarantee that a waiting thread takes over on condition.notify(); condition.release(). That this is not the case complicates using Conditions effectively. I hope the handoff I assumed is the intended behavior (I do not see any downsides) and that this can be fixed; otherwise, a clarification in the documentation would help others avoid falling for this unintuitive behavior.
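Until the behavior or the documentation changes, one way to get a guaranteed handoff with the existing Condition is to encode whose turn it is in the shared predicate. The sketch below is an assumption-laden illustration (the function `ping_pong` and the `turn` variable are made up here), not a fix for the issue. Even if the notifying thread reacquires the lock first, its own predicate is false, so it goes back to waiting and the other thread proceeds.

```python
import threading


def ping_pong(rounds=5):
    """Two threads strictly alternate by waiting on an explicit turn."""
    cond = threading.Condition()
    turn = 0          # id of the thread allowed to proceed next
    log = []

    def player(me, other):
        nonlocal turn
        for _ in range(rounds):
            with cond:
                # Winning the lock is not enough; it must be our turn.
                cond.wait_for(lambda: turn == me)
                log.append(me)
                turn = other
                cond.notify_all()

    threads = [
        threading.Thread(target=player, args=(0, 1)),
        threading.Thread(target=player, args=(1, 0)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


if __name__ == "__main__":
    print(ping_pong(3))   # [0, 1, 0, 1, 0, 1]
```

This only works when the set of participants and the order of turns are known in advance; it does not make Condition itself fair.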
