_PyThreadState_Init and fork race leads to inconsistent key list #73826
The following crash is sporadically observed in RHEL 7 anaconda:

(gdb) f 0

The key list is protected by keymutex (except in PyThread_ReInitTLS), but there doesn't seem to be any protection against a concurrent fork(). What seems to happen is that fork() is called at a moment when the key list is inconsistent. For example, if I initialize each new key to 0xfe:

    // find_key with extra memset()
    static struct key *find_key(int key, void *value)
    {
        ...
        p = (struct key *)malloc(sizeof(struct key));
        if (p != NULL) {
            memset(p, 0xfe, sizeof(struct key));  /* poison the new entry (after the NULL check) */
            p->id = id;
            p->key = key;
            p->value = value;
            p->next = keyhead;
            keyhead = p;
        }
        ...
    }

Looking at the disassembly, the compiler reordered the last two writes (keyhead = p is stored before p->next = keyhead):

0x00000000004fcb50 <+272>: callq 0x413d10 <malloc@plt>

Now consider what happens when a different thread calls fork() between these two writes: keyhead has been updated, but keyhead->next has not been updated yet. Now when anaconda hangs, I get:

(gdb) list

Here's how I think we get into this state:

-------------------------> thread 1
-------------------------> thread 2
-------------------------> thread 3
-------------------------> thread 1
continuing Thread.start
self.__started.wait()
Event.wait()
self.__cond.wait
Condition.wait()
waiter = _allocate_lock()
waiter.acquire()
lock_PyThread_acquire_lock
Py_BEGIN_ALLOW_THREADS
PyEval_SaveThread
PyThread_release_lock(interpreter_lock);
-------------------------> thread 2
-------------------------> child

The attached patch for Python makes this easier to reproduce. It adds delays in a couple of places, which widens the window in which the key list is inconsistent. |
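The following minimal single-process sketch in C makes the window described above concrete. It is not CPython code. `struct key` is a simplified stand-in for the key list node. `insert_reordered_and_fork()` is a hypothetical helper that performs the two stores in the order the compiler emitted them. It calls fork() itself between the two stores, standing in for another thread that happens to fork at that instant. The child compares the raw bytes of `keyhead->next` with the 0xfe poison rather than following the pointer.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Simplified stand-in for the key list node (hypothetical layout). */
struct key {
    struct key *next;
    long id;
    int key;
    void *value;
};

static struct key *keyhead = NULL;

/* Publishes a new node before its next field is written and forks in
 * between. Returns 1 if the child saw the 0xfe poison in keyhead->next,
 * 0 if it saw a properly linked list, and -1 on error. */
int insert_reordered_and_fork(int keyid, void *value)
{
    struct key *old_head = keyhead;
    struct key *p = malloc(sizeof *p);
    if (p == NULL)
        return -1;
    memset(p, 0xfe, sizeof *p);   /* poison, as in the report */
    p->id = 1;                    /* hypothetical thread id */
    p->key = keyid;
    p->value = value;

    keyhead = p;                  /* store 1: node is published */
    pid_t pid = fork();           /* fork lands inside the window */
    if (pid == 0) {
        /* Child: inspect the bytes of keyhead->next without dereferencing it. */
        unsigned char poison[sizeof(struct key *)];
        memset(poison, 0xfe, sizeof poison);
        _exit(memcmp(&keyhead->next, poison, sizeof poison) == 0 ? 1 : 0);
    }
    p->next = old_head;           /* store 2: link completed, parent only */

    if (pid < 0)
        return -1;
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

In the parent, the link is completed a moment later. The child, however, keeps the snapshot taken at fork() time, so any later walk of the list in the child would follow the poisoned pointer. That is consistent with the crash reported above.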
Here is a proof of concept patch from Jaroslav Škarvada. It fixes the problem by holding the mutex used for PyThread_create_key while forking. To make it more than a PoC, it needs _PyThread_AcquireKeyLock and _ReleaseKeyLock functions (similar to _PyImport_AcquireLock() etc.) and calls to them. Other than that, does this approach look reasonable? |
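The idea of holding the key mutex across fork() can be sketched with plain pthreads. This is an illustration under assumptions, not the patch itself. `acquire_key_lock()` and `release_key_lock()` are hypothetical stand-ins for the proposed `_PyThread_AcquireKeyLock` and `_ReleaseKeyLock`. `add_key()` and `fork_with_key_lock()` are also hypothetical names. Because list updates and fork() both take the same mutex, the child cannot inherit a half-linked list.

```c
#include <pthread.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Simplified stand-in for the key list node (hypothetical layout). */
struct key {
    struct key *next;
    int key;
    void *value;
};

static pthread_mutex_t keymutex = PTHREAD_MUTEX_INITIALIZER;
static struct key *keyhead = NULL;

/* Hypothetical stand-ins for the proposed key-lock helpers. */
static void acquire_key_lock(void) { pthread_mutex_lock(&keymutex); }
static void release_key_lock(void) { pthread_mutex_unlock(&keymutex); }

/* Insert a node; the list is only modified while the lock is held. */
int add_key(int k, void *value)
{
    struct key *p = malloc(sizeof *p);
    if (p == NULL)
        return -1;
    p->key = k;
    p->value = value;
    acquire_key_lock();
    p->next = keyhead;
    keyhead = p;
    release_key_lock();
    return 0;
}

/* fork() with the key lock held, so no thread can be halfway through
 * add_key() when the address space is copied. The lock is released in
 * both processes: in the child, the only surviving thread is the copy
 * of the thread that took the lock. */
pid_t fork_with_key_lock(void)
{
    acquire_key_lock();
    pid_t pid = fork();
    release_key_lock();
    return pid;
}
```

The thread goes on to point out a limitation of this approach once other code needs the same lock around fork().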
In order to reproduce:

1. Apply python.patch from bz1268226_reproducer2.tar.gz.
2. Compile Python.
3. Run reproduce4.py from bz1268226_reproducer2.tar.gz.

As indicated by the reproducer, the status returned by os.wait() for the child is 139. I will refine the patch a bit and work on a PR. |
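The value 139 is a raw wait status, which is what os.wait() returns. On Linux, the low seven bits hold the terminating signal and bit 0x80 is the core-dump flag. So 139 (0x8b) means the child was killed by signal 11, SIGSEGV, and dumped core. The sketch below decodes the status with the standard `<sys/wait.h>` macros. `describe_status()` is a hypothetical helper. `WCOREDUMP` is not part of POSIX, so the sketch uses it only when the platform defines it.

```c
#include <signal.h>
#include <stdio.h>
#include <sys/wait.h>

/* Hypothetical helper: summarize a raw wait status into buf.
 * Returns the terminating signal number, or 0 if the child was not
 * killed by a signal. */
int describe_status(int status, char *buf, size_t len)
{
    if (WIFEXITED(status)) {
        snprintf(buf, len, "exited with code %d", WEXITSTATUS(status));
        return 0;
    }
    if (WIFSIGNALED(status)) {
        int core = 0;
#ifdef WCOREDUMP
        core = WCOREDUMP(status) ? 1 : 0;   /* non-POSIX, Linux/BSD only */
#endif
        snprintf(buf, len, "killed by signal %d%s", WTERMSIG(status),
                 core ? " (core dumped)" : "");
        return WTERMSIG(status);
    }
    snprintf(buf, len, "stopped or continued");
    return 0;
}
```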
Patch for protecting the key list while forking. |
Hey, any status update on this bug? I ran into a similar issue on a CentOS 6.5 kernel when spawning multiple processes in a Twisted environment. Is this PR targeted for inclusion into the source tree? Thanks, Tom |
I'm a bit out of my depth here. Victor, could you chime in?

The problem with Harris' patch is that, once fork() is protected by the thread lock, acquiring that lock (by e.g. calling

I thought that could be solved by doing the locking in an atfork handler, but that's not working out. CPython's pthread_atfork handler (which would call _PyThread_AcquireKeyLock and hold the lock for the duration of the fork) would need to be called *after* an extension's pthread_atfork handler (which needs the thread lock temporarily). |
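The ordering constraint comes from how pthread_atfork() runs handlers. POSIX specifies that prepare handlers run in the reverse order of registration, while parent and child handlers run in registration order. The minimal sketch below records that order. The handler names and `atfork_order()` are hypothetical. "a"/"A" stand for a handler set registered first, and "b"/"B" for one registered later.

```c
#include <pthread.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static char trace[16];
static size_t ntrace;

/* Append one character to the trace, keeping it NUL-terminated. */
static void note(char c)
{
    if (ntrace + 1 < sizeof trace) {
        trace[ntrace++] = c;
        trace[ntrace] = '\0';
    }
}

static void prepare_a(void) { note('a'); }   /* registered first */
static void prepare_b(void) { note('b'); }   /* registered second */
static void parent_a(void)  { note('A'); }
static void parent_b(void)  { note('B'); }

/* Register A then B, fork once, and return the order in which the
 * parent saw the handlers run. Handlers cannot be unregistered, so
 * call this only once per process. */
const char *atfork_order(void)
{
    ntrace = 0;
    trace[0] = '\0';
    pthread_atfork(prepare_a, parent_a, NULL);
    pthread_atfork(prepare_b, parent_b, NULL);
    pid_t pid = fork();
    if (pid == 0)
        _exit(0);
    if (pid > 0)
        waitpid(pid, NULL, 0);
    return trace;
}
```

Because prepare handlers run last-registered-first, a handler registered later gets its prepare step before one registered earlier.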
There is a more general issue for any lock and fork(): bpo-6721, "Locks in the standard library should be sanitized on fork". |
Python 3 is not affected by this issue because it uses native thread local storage (TLS):
I'm not sure that it's doable to backport such an enhancement, since Python 2.7 supports many thread implementations, not only NT (Windows) and pthread:
Maybe it's doable for a Linux vendor, but it's going to be a large change that has to be maintained downstream :-/ |
Gah! The more I look into locks & forks ... the more I learn, to put it mildly. Instead of piling on workarounds, I'll try my hand at using native thread-local storage for pthread, and avoid the locking altogether. Hopefully that can make it in as a bugfix? At least for this bug, it most likely *is* the most appropriate fix -- though I can only fix it for pthread. |
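Here is a minimal sketch of the native pthread TLS primitives such a fix would build on. `pthread_key_create()`, `pthread_setspecific()` and `pthread_getspecific()` are standard POSIX calls. `worker()` and `native_tls_demo()` are hypothetical names for this illustration. Each thread's value is kept by the threading library, so there is no interpreter-side linked list or keymutex for fork() to catch in an inconsistent state.

```c
#include <pthread.h>
#include <stdint.h>

/* One process-wide key; each thread gets its own slot for it. */
static pthread_key_t tls_key;

/* Thread body: store a per-thread value, then read it back. */
static void *worker(void *arg)
{
    pthread_setspecific(tls_key, arg);
    return pthread_getspecific(tls_key);
}

/* Returns 1 if each thread read back only its own value, 0 if not,
 * and -1 if setup failed. */
int native_tls_demo(void)
{
    pthread_t t1, t2;
    void *r1 = NULL, *r2 = NULL;
    int ok;

    if (pthread_key_create(&tls_key, NULL) != 0)
        return -1;
    if (pthread_create(&t1, NULL, worker, (void *)(uintptr_t)1) != 0) {
        pthread_key_delete(tls_key);
        return -1;
    }
    if (pthread_create(&t2, NULL, worker, (void *)(uintptr_t)2) != 0) {
        pthread_join(t1, NULL);
        pthread_key_delete(tls_key);
        return -1;
    }
    pthread_join(t1, &r1);
    pthread_join(t2, &r2);

    /* The main thread never stored a value, so its slot is still NULL. */
    ok = (uintptr_t)r1 == 1 && (uintptr_t)r2 == 2
         && pthread_getspecific(tls_key) == NULL;
    pthread_key_delete(tls_key);
    return ok;
}
```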
WIP pull request: #5141 |
Hi Petr, are you still working on this patch/issue? |
Not immediately, but it is on my TODO list. |
Oh, it seems like I was wrong in my previous comment. Python 2.7 code base is already designed to support native TLS. It's just that we only implement native TLS on Windows. So yeah, it seems doable to implement native TLS for pthread. History of Py_HAVE_NATIVE_TLS: commit 8d98d2c
commit 00f2df4
|
pthread_key_t is not generally compatible with int, so it can't be used with Python 2's API. |
PTHREAD_KEY_T_IS_COMPATIBLE_WITH_INT is defined on most (pthread) platforms, no? My understanding is that PEP 539 was mostly designed for Cygwin, a platform which is not officially supported by Python. At least, PTHREAD_KEY_T_IS_COMPATIBLE_WITH_INT is set to 1 on my Fedora 27 (Linux). I propose casting the pthread_key_create() result to int, but only defining PyThread_create_key() in Python/thread_pthread.h if PTHREAD_KEY_T_IS_COMPATIBLE_WITH_INT is defined. It means that the pthread implementation of Python would still have this bug (race condition) if PTHREAD_KEY_T_IS_COMPATIBLE_WITH_INT is not defined. But backporting PEP 539 to Python 2.7 doesn't seem worth it. What do you think? |
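Here is one way the proposal could look, sketched under assumptions. The `sketch_*` functions are hypothetical int-keyed wrappers with the same shape as the Python 2 API (an `int` key plus a `void *` value). The `_Static_assert` stands in for the configure-time PTHREAD_KEY_T_IS_COMPATIBLE_WITH_INT check. Keys that do not fit in an int are rejected rather than truncated.

```c
#include <errno.h>
#include <limits.h>
#include <pthread.h>

/* Stand-in for the configure-time PTHREAD_KEY_T_IS_COMPATIBLE_WITH_INT
 * check (assumption): refuse to build where pthread_key_t is not int-sized. */
_Static_assert(sizeof(pthread_key_t) == sizeof(int),
               "pthread_key_t must be int-sized for this sketch");

/* Create a native TLS key and return it as an int, or -1 on failure. */
int sketch_create_key(void)
{
    pthread_key_t key;
    if (pthread_key_create(&key, NULL) != 0)
        return -1;
    if (key > INT_MAX) {          /* not representable as a non-negative int */
        pthread_key_delete(key);
        errno = ENOMEM;
        return -1;
    }
    return (int)key;
}

/* Store value for the calling thread; returns 0 on success, -1 on failure. */
int sketch_set_value(int key, void *value)
{
    return pthread_setspecific((pthread_key_t)key, value) == 0 ? 0 : -1;
}

/* Fetch the calling thread's value, or NULL if none was stored. */
void *sketch_get_value(int key)
{
    return pthread_getspecific((pthread_key_t)key);
}

void sketch_delete_key(int key)
{
    pthread_key_delete((pthread_key_t)key);
}
```

On platforms where the assertion would fail, the proposal is to keep the existing key-list implementation unchanged.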
I don't think that's a good idea. Changing the API, even for platforms that aren't officially supported, sounds very harsh this late in the release cycle. But! I suppose we could fix the bug only for platforms with PTHREAD_KEY_T_IS_COMPATIBLE_WITH_INT. Other platforms would keep the current implementation; they'd still have the bug, but the API would stay unchanged. |
Yes, this is my proposal.
Which API change? I don't propose to modify the existing public C API "int PyThread_create_key(void)". I only propose to change its implementation to use the native pthread API when PTHREAD_KEY_T_IS_COMPATIBLE_WITH_INT is defined. |
As 2.7 is now EOL, I'm closing the issue. |