PyOS_AfterFork should reset socketmodule's lock #70108
On some platforms there's an exclusive lock in socketmodule, used for getaddrinfo, gethostbyname and gethostbyaddr. A thread can hold this lock while another thread forks, leaving it locked forever in the child process. Calls to these functions in the child process will then hang. (I wrote some more details here: https://emptysqua.re/blog/getaddrinfo-deadlock/ ) I propose that this is a bug, and that it can be fixed in PyOS_AfterFork, where a few similar locks are already reset.
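The failure mode described above can be reproduced outside Python with a small POSIX C sketch. It assumes a pthread mutex as a stand-in for socketmodule's lock (`demo_lock` and the helper names are hypothetical, not CPython code). A helper thread holds the lock as if it were inside a slow lookup, the main thread forks, and the child inspects the lock with `pthread_mutex_trylock()` instead of blocking on it, since a blocking acquire would hang.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for socketmodule's netdb_lock. */
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static int ready_pipe[2];   /* holder -> main: "I hold the lock" */
static int release_pipe[2]; /* main -> holder: "you may unlock now" */

/* Simulates a thread in the middle of a slow getaddrinfo() call. */
static void *holder(void *arg)
{
    char c = 'x';
    (void)arg;
    pthread_mutex_lock(&demo_lock);
    (void)write(ready_pipe[1], &c, 1);
    (void)read(release_pipe[0], &c, 1); /* block until told to release */
    pthread_mutex_unlock(&demo_lock);
    return NULL;
}

/* Returns 1 if the child saw the lock as still held (the bug),
   0 if the child could take it, -1 on setup failure. */
int lock_stuck_in_child(void)
{
    pthread_t t;
    pid_t pid;
    char c = 'x';
    int status = 0;

    if (pipe(ready_pipe) != 0 || pipe(release_pipe) != 0)
        return -1;
    if (pthread_create(&t, NULL, holder, NULL) != 0)
        return -1;
    (void)read(ready_pipe[0], &c, 1);   /* wait until the lock is held */

    pid = fork();
    if (pid == 0) {
        /* Child: only the forking thread exists here, but the lock is
           still marked as held, and no one will ever release it. */
        _exit(pthread_mutex_trylock(&demo_lock) == EBUSY ? 1 : 0);
    }
    if (pid > 0)
        waitpid(pid, &status, 0);

    (void)write(release_pipe[1], &c, 1); /* let the holder finish */
    pthread_join(t, NULL);
    close(ready_pipe[0]);
    close(ready_pipe[1]);
    close(release_pipe[0]);
    close(release_pipe[1]);

    if (pid < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

On Linux and macOS, `lock_stuck_in_child()` is expected to return 1: the child inherited a locked mutex whose owner thread does not exist there. Strictly, POSIX only guarantees async-signal-safe calls in the child of a multithreaded process, so the `trylock` is for illustration only.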
Maybe instead of releasing the lock in the forked child process, we should try to acquire the lock in the os.fork() implementation, and then release it? Otherwise, suppose that a call to getaddrinfo (call #1) takes a long time. In the middle of it we fork, and then immediately try to call getaddrinfo (call #2, while call #1 is still in progress) for some other address. At this point, since getaddrinfo isn't thread-safe, something bad will happen.
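The acquire-before-fork alternative maps onto POSIX `pthread_atfork()`: a prepare handler takes the lock in the forking thread, and the parent and child handlers release it afterwards. This is a minimal sketch with a hypothetical `demo_lock`, not how CPython's `os.fork()` is actually implemented.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for netdb_lock. */
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;

/* Before fork(): wait for any in-flight lookup to finish, so that no
   other thread holds the lock at the moment of forking. */
static void prepare(void) { pthread_mutex_lock(&demo_lock); }

/* After fork(), in the parent: give the lock back. */
static void parent(void) { pthread_mutex_unlock(&demo_lock); }

/* After fork(), in the child: the child's only thread is the one that
   took the lock in prepare(), so it releases it normally. */
static void child(void) { pthread_mutex_unlock(&demo_lock); }

int install_fork_handlers(void)
{
    return pthread_atfork(prepare, parent, child);
}

/* Forks and returns 1 if the child could acquire the lock, else 0. */
int child_can_lock(void)
{
    int status = 0;
    pid_t pid = fork();

    if (pid == 0)
        _exit(pthread_mutex_trylock(&demo_lock) == 0 ? 0 : 1);
    if (pid < 0 || waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return 0;
    return WEXITSTATUS(status) == 0;
}
```

The trade-off is that `fork()` now waits for any in-flight lookup to finish, but neither process ever observes the lock, or the resolver state it protects, in the middle of a call.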
bpo-25924 is related to this; I filed it after reading the blog post. The lock might not be necessary on OS X, and possibly not on the other systems either. Yury: resetting the lock in the child should be safe because after the fork the child has only a single thread, the one returning from fork(2). The thread that acquired the lock does not exist in the child process.
Does the example code (which should be posted here) still hang? If so, note that automated tests that hang indefinitely on failure are a nuisance; a revised example that fails after, say, one second would be better.
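One way to get a test that fails after a second instead of hanging is to run the suspect operation in a forked child and have the parent poll it against a deadline. The C sketch below assumes a hypothetical harness, `run_with_timeout()`; in a real reproducer, `child_fn` would be the lookup performed after forking while another thread holds the lock.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Runs child_fn in a forked child and waits at most timeout_ms.
   Returns the child's exit status, -1 if it had to be killed (it hung)
   or did not exit normally, or -2 if fork() failed. */
int run_with_timeout(void (*child_fn)(void), int timeout_ms)
{
    struct timespec tick = {0, 10 * 1000 * 1000}; /* poll every 10 ms */
    int status = 0;
    int waited;
    pid_t pid = fork();

    if (pid < 0)
        return -2;
    if (pid == 0) {
        child_fn();
        _exit(0);
    }

    for (waited = 0; waited < timeout_ms; waited += 10) {
        if (waitpid(pid, &status, WNOHANG) == pid)
            return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
        nanosleep(&tick, NULL);
    }

    /* Deadline passed: kill the child and reap it. If it happened to
       exit just before the kill, still report its real status. */
    kill(pid, SIGKILL);
    if (waitpid(pid, &status, 0) == pid && WIFEXITED(status))
        return WEXITSTATUS(status);
    return -1;
}
```

A test can then assert that the result is 0 and report a hang as an ordinary failure after `timeout_ms`, rather than blocking the whole test run.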
In bpo-40089, I added _PyThread_at_fork_reinit() for this purpose: reinitialize a lock to the unlocked state after a fork. Internally, it deliberately leaks memory and then creates a new lock, since there is no portable way to reset a lock after fork. The problem is how to register netdb_lock of Modules/socketmodule.c in a list of locks which should be reinitialized at fork, or maybe how to register a C callback called at fork. There is a *Python* API to register a callback after a fork: os.register_at_fork(). See also the meta-issue bpo-6721: "Locks in the standard library should be sanitized on fork".
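The leak-and-recreate idea behind _PyThread_at_fork_reinit() can be sketched in C as follows. Assumptions: the lock is held through a pointer so the child can swap it out, and `pthread_atfork()` stands in for whatever registration mechanism CPython would actually use to reach netdb_lock; `demo_lock`, `demo_lock_setup()` and `child_sees_free_lock()` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical lock, held by pointer so the child can replace it. */
static pthread_mutex_t *demo_lock;

/* Child-side handler: a mutex inherited in the locked state cannot be
   portably reset, so allocate and initialize a fresh one and leak the
   old one on purpose (never destroy or free it). */
static void reinit_lock_in_child(void)
{
    pthread_mutex_t *fresh = malloc(sizeof *fresh);
    if (fresh == NULL || pthread_mutex_init(fresh, NULL) != 0)
        abort();
    demo_lock = fresh;
}

int demo_lock_setup(void)
{
    demo_lock = malloc(sizeof *demo_lock);
    if (demo_lock == NULL || pthread_mutex_init(demo_lock, NULL) != 0)
        return -1;
    return pthread_atfork(NULL, NULL, reinit_lock_in_child);
}

/* The parent holds the lock across fork(); returns 1 if the child can
   still acquire it thanks to the reinit handler, else 0. */
int child_sees_free_lock(void)
{
    int status = 0;
    pid_t pid;

    pthread_mutex_lock(demo_lock);
    pid = fork();
    if (pid == 0)
        _exit(pthread_mutex_trylock(demo_lock) == 0 ? 0 : 1);
    pthread_mutex_unlock(demo_lock); /* parent still has the original */
    if (pid < 0 || waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return 0;
    return WEXITSTATUS(status) == 0;
}
```

Leaking one small allocation per fork is the price of not touching a mutex whose internal state belongs to a thread that no longer exists.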
On macOS, Python is only affected if "MAC_OS_X_VERSION_MIN_REQUIRED < MAC_OS_X_VERSION_10_5". Is that still the case in 2020? Copy/paste of socketmodule.c:

/* On systems on which getaddrinfo() is believed to not be thread-safe, [...]

   getaddrinfo is thread-safe on Mac OS X 10.5 and later. Originally it was [...]

   It's thread-safe in OpenBSD starting with 5.4, released Nov 2013: [...]

   It's thread-safe in NetBSD starting with 4.0, released Dec 2007:
   http://cvsweb.netbsd.org/bsdweb.cgi/src/lib/libc/net/getaddrinfo.c.diff?r1=1.82&r2=1.83
 */
#if ((defined(__APPLE__) && \
        MAC_OS_X_VERSION_MIN_REQUIRED < MAC_OS_X_VERSION_10_5) || \
    (defined(__FreeBSD__) && __FreeBSD_version+0 < 503000) || \
    (defined(__OpenBSD__) && OpenBSD+0 < 201311) || \
    (defined(__NetBSD__) && __NetBSD_Version__+0 < 400000000) || \
    !defined(HAVE_GETADDRINFO))
#define USE_GETADDRINFO_LOCK
#endif
The macOS test checks whether the binary targets macOS 10.4 or earlier. Those versions of macOS have been out of support for a very long time, and we haven't had installers targeting them for a long time either: 2.7 and 3.5 had installers targeting macOS 10.5; current installers target macOS 10.9. IMHO macOS 10.4 has moved into museum territory and I wouldn't bother supporting it anymore. USE_GETADDRINFO_LOCK is only enabled for very old OS releases; the last OS to stop requiring it was OpenBSD, in 2013 (7 years ago). The other OSes stopped requiring it by 2007 (13 years ago). I'd drop this code instead of fixing it.
Hum, the FreeBSD, OpenBSD and NetBSD versions which require the fix also look very old, so I agree that it has become safe to remove the fix. Would it make sense to fix it only in Python 3.10 and leave the other versions with the bug? Or should we fix all Python versions?
Technically this would be a functional change; I'd drop this code in 3.9 and trunk (although it is awfully close to the expected date for 3.9b1). Older versions would keep this code and the bug; that way the older Python versions can still be used on these ancient OS versions (but users might run into this race condition).
I wrote PR 20177 to avoid the netdb_lock in socket.getaddrinfo(), but the lock is still used on platforms which don't provide gethostbyname_r():

#if !defined(HAVE_GETHOSTBYNAME_R) && !defined(MS_WINDOWS)
# define USE_GETHOSTBYNAME_LOCK
#endif
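The remaining gethostbyname() lock exists because that function returns a pointer to static storage that the next call may overwrite. A hedged sketch of the compile-time gating pattern follows, with hypothetical names (`DEMO_USE_LOOKUP_LOCK`, `demo_lookup()`, `safe_lookup()`) and a fake resolver so it runs without a network; it is not the socketmodule.c code.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <string.h>

/* Hypothetical: pretend this platform lacks gethostbyname_r(), as in
   the USE_GETHOSTBYNAME_LOCK case above. */
#define DEMO_USE_LOOKUP_LOCK 1

#if DEMO_USE_LOOKUP_LOCK
static pthread_mutex_t demo_lookup_lock = PTHREAD_MUTEX_INITIALIZER;
#  define LOOKUP_LOCK()   pthread_mutex_lock(&demo_lookup_lock)
#  define LOOKUP_UNLOCK() pthread_mutex_unlock(&demo_lookup_lock)
#else
#  define LOOKUP_LOCK()   ((void)0)
#  define LOOKUP_UNLOCK() ((void)0)
#endif

/* Fake non-reentrant resolver: like gethostbyname(), it returns a
   pointer to a static buffer that the next call overwrites. */
static const char *demo_lookup(const char *name)
{
    static char buf[64];
    strncpy(buf, name, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';
    return buf;
}

/* Copies the result out while the lock is held, so no other thread can
   overwrite the static buffer in between. outlen must be > 0. */
void safe_lookup(const char *name, char *out, size_t outlen)
{
    const char *r;

    LOOKUP_LOCK();
    r = demo_lookup(name);
    strncpy(out, r, outlen - 1);
    out[outlen - 1] = '\0';
    LOOKUP_UNLOCK();
}
```

Where a reentrant variant exists, the macros compile away and callers pay nothing; that is also the case in which the fork hazard discussed in this issue disappears.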
If I understood correctly, the Python 3.8 and 3.9 binaries provided by python.org are *not* impacted by this issue. Only Python binaries built manually with explicit support for macOS 10.4 ("MAC_OS_X_VERSION_MIN_REQUIRED") were impacted. Python 3.9 and older are not fixed (they keep the lock). The workaround is to require macOS 10.5 or newer. macOS 10.4 was released in 2004; it may be time to stop supporting it :-) Python 3.7 (and newer) requires macOS 10.6 or newer (again, I'm talking about binaries provided by python.org).
I chose to leave the lock for gethostbyname(). Ronald wrote that this lock is no longer needed. Please open a separate issue for this lock.