Description
Bug report
Bug description:
There is a race condition in the threading module where a parent thread calling Thread.start() can wait forever if the newly created thread crashes with a MemoryError during its internal bootstrap process.
Case Explanation
In case we have a "Serving Thread" that creates threads on demand (e.g., for each HTTP request):
- When this Serving Thread calls Thread.start(), the OS-level thread (pthread_create() [1] on Linux) is successfully created. The parent thread then waits for the new thread to signal that it has started correctly by calling self._started.wait() [2].
- The new thread starts, but before it can signal the parent thread (Serving Thread) that it is alive with self._started.set() [3], it encounters a MemoryError.
- This MemoryError can occur at the C level during the PyObject_Call to the _bootstrap method, or inside the _bootstrap_inner method before _started.set() is reached, often due to memory pressure from other threads (or a heap limit being reached).
- This exception is caught by the C-level entry point thread_run(), which calls _PyErr_WriteUnraisableMsg [4] and prints "Exception ignored in thread started by: ..."
The new thread then exits without ever signaling the _started event, and the parent thread waits indefinitely on the _started.wait().
This also leaves the threading module in an inconsistent state, as the "zombie" thread object may not be correctly cleaned up from the _limbo dict.
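The handshake described above can be modelled with a short sketch. This is not the CPython implementation: start_and_wait, healthy_bootstrap and failing_bootstrap are hypothetical names, the MemoryError is raised by hand instead of coming from real memory pressure, and the timeout exists only so that the demonstration terminates (the real Thread.start() waits without one).

```python
import threading
import _thread


def start_and_wait(bootstrap, timeout):
    """Simplified model of the start() handshake (hypothetical helper).

    Spawns a raw thread, then waits for it to set the `started` event.
    Returns True if the child signalled, False if the wait timed out.
    """
    started = threading.Event()

    def entry():
        try:
            bootstrap(started)
        except MemoryError:
            # Stand-in for thread_run() reporting the error as unraisable:
            # the exception is swallowed and the thread exits silently.
            pass

    _thread.start_new_thread(entry, ())
    # The real start() waits with no timeout; one is used here only
    # so that this demonstration cannot hang.
    return started.wait(timeout)


def healthy_bootstrap(started):
    # Normal case: the child signals the parent that it is alive.
    started.set()


def failing_bootstrap(started):
    # Simulated failure before the signal is sent.
    raise MemoryError("simulated allocation failure before started.set()")


signalled_ok = start_and_wait(healthy_bootstrap, timeout=1.0)
signalled_failed = start_and_wait(failing_bootstrap, timeout=0.2)
print(signalled_ok, signalled_failed)
```

In the failing case the child is already gone when the wait expires, so without the timeout the parent would have nothing left to wake it up.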
How to Reproduce
This has been observed in high-concurrency server applications under heavy, sustained load, where heap memory can be rapidly consumed and exhausted by concurrent threads [5].
This is a race condition that is difficult to reproduce reliably, as it requires triggering a MemoryError at a specific moment.
I found a (deterministic?) way to reproduce the issue by restricting the heap memory until it reaches a threshold where we can start a new thread, but this new thread won't get enough memory for its initialization.
On some machines, and depending on the Python version, it may be necessary to tweak HARD_LIMIT_START / LIMIT_REDUCTION. (I reproduced it on an Ubuntu-based machine with Python 3.11/3.12/3.13/3.14.)
import gc
import resource
import threading


def handler():
    pass


def serving():
    # These values may need tweaking (depending on Python version and system)
    HARD_LIMIT_START = 30_000_000
    LIMIT_REDUCTION = 5_000
    for _ in range(500_000):
        # Force memory to be reclaimed: seems to increase the determinism of the script
        gc.collect(2)
        # Limit the heap size available to this process
        resource.setrlimit(resource.RLIMIT_DATA, (HARD_LIMIT_START, HARD_LIMIT_START * 2))
        try:
            handler_thread = threading.Thread(target=handler)
            print(f'Start Thread: {handler_thread} - Heap size limit : {HARD_LIMIT_START}')
            handler_thread.start()  # Hangs here when the bug is triggered
            handler_thread.join()
            HARD_LIMIT_START -= LIMIT_REDUCTION
        except RuntimeError as r:  # If Python refused to launch a new Thread
            print(f'RuntimeError: {r} - Cannot start the thread at all => error not detected.')
            return


serving_thread = threading.Thread(target=serving)
serving_thread.start()
serving_thread.join()

Expected Behavior
I am not sure if this is an "accepted" limitation of (CPython) Thread or not. IMO, Thread.start() shouldn't hang indefinitely if the low-level thread is dead.
I didn't take the time to try to fix it yet (if possible). I would prefer to get your opinions on this first.
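To make the expected behavior concrete, here is a minimal sketch of one possible direction, not a proposed patch: the child always signals the parent, even when its startup phase fails, and the parent turns a recorded failure into an exception instead of blocking. StartupFailed, start_checked and failing_startup are hypothetical names, and the failure is simulated. In CPython itself the error can occur at the C level before any Python code runs, so an actual fix would presumably need the signalling to happen in or around thread_run(); that part is an assumption and is not shown here.

```python
import threading
import _thread


class StartupFailed(RuntimeError):
    """Hypothetical error: the child died before finishing its startup."""


def start_checked(startup, target):
    """Model of a start() that cannot hang on a dead child (hypothetical).

    Returns an Event that is set once `target` has finished running.
    """
    started = threading.Event()
    finished = threading.Event()
    failure = []

    def entry():
        try:
            startup()
        except BaseException as exc:
            # Record the failure so the parent can report it.
            failure.append(exc)
        finally:
            # Always wake the parent, whether startup succeeded or not.
            started.set()
        if not failure:
            try:
                target()
            finally:
                finished.set()

    _thread.start_new_thread(entry, ())
    started.wait()  # Safe without a timeout: entry() always sets the event
    if failure:
        raise StartupFailed("thread died during startup") from failure[0]
    return finished


def failing_startup():
    raise MemoryError("simulated allocation failure during startup")


results = []
done = start_checked(lambda: None, lambda: results.append("ran"))
done.wait(1.0)

try:
    start_checked(failing_startup, lambda: results.append("never"))
    outcome = "started"
except StartupFailed as exc:
    outcome = type(exc.__cause__).__name__

print(results, outcome)
```

The design choice illustrated is only that the wake-up sits in a finally block and the failure is recorded before the event is set, so the parent observes it as soon as it wakes.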
CPython versions tested on:
3.12, 3.13, 3.14
Operating systems tested on:
Linux
Footnotes
1. https://github.com/python/cpython/blob/25243b1461e524560639ebe54bab9b689b6cc31e/Python/thread_pthread.h#L284
2. https://github.com/python/cpython/blob/f463d05a0979aada4fadcd43ff721b1ff081d2aa/Lib/threading.py#L999
3. https://github.com/python/cpython/blob/f463d05a0979aada4fadcd43ff721b1ff081d2aa/Lib/threading.py#L1064
4. https://github.com/python/cpython/blob/89a79fc919419bfe817da13bc2a4437908d7fc07/Modules/_threadmodule.c#L1122
5. https://github.com/odoo/odoo/blob/350f7b10d4048b84d4a5f9e5aca9a88b8b971801/odoo/service/server.py#L273