Add a cross-interpreter-safe mechanism to indicate that an object may be destroyed. #77789
In order to keep subinterpreters properly isolated, objects
I favor #2, since it is more generally applicable. #3 would
As a lesser (IMHO) alternative, we could also modify Py_DECREF
"That is relatively straight-forward except in one case: indicating that the other interpreter doesn't need the object to exist any more (similar to PyBuffer_Release() but more general)" Why would an interpreter access an object from a different interpreter? Each interpreter should have its own object space, no?
Adding a low level callback based mechanism to ask another interpreter to do work seems like a good way to go to me. As an example of why that might be needed, consider the case of sharing a buffer exporting object with another subinterpreter: when the memoryview in the subinterpreter is destroyed, it needs to request that the buffer view be released in the source interpreter that actually owns the original object.
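As a toy illustration of the callback-based mechanism Nick describes (all names here are hypothetical; the real design would use the C-level pending-call machinery, not a polled Python queue): the consuming interpreter never touches the exporting object, it only schedules a release callback that the owning interpreter runs itself.

```python
import threading
from collections import deque

class Interpreter:
    """Toy model of an interpreter that owns its objects and accepts
    'please run this for me' callbacks from other interpreters."""
    def __init__(self, name):
        self.name = name
        self._pending = deque()
        self._lock = threading.Lock()

    def schedule(self, func):
        # May be called from another interpreter's thread; only this small
        # queue is shared, never the objects themselves.
        with self._lock:
            self._pending.append(func)

    def run_pending(self):
        # Runs in the owning interpreter's own thread, which is the only
        # place the exporting object may safely be touched.
        while True:
            with self._lock:
                if not self._pending:
                    return
                func = self._pending.popleft()
            func()

released = []
owner = Interpreter("owner")

# The consumer's memoryview is destroyed: instead of releasing the buffer
# on the owner's object directly, it schedules the release to happen in
# the owning interpreter.
owner.schedule(lambda: released.append("buffer released"))
owner.run_pending()
assert released == ["buffer released"]
```

The key property being modeled: the only cross-interpreter traffic is the small callback queue, so refcount and lifetime operations on the exporter stay confined to its owner.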
I would suggest that sharing of objects between interpreters should be stamped out. Could we have some #ifdef debug checking that would warn or assert so this doesn't happen? I know currently we share a lot of objects. However, in the long term, that does not seem like the correct design. Instead, each interpreter should have its own objects and any passing or sharing should be tightly controlled.
@neil, we're definitely on the same page. In fact, in a world where subinterpreters do not share a GIL, we can't ever use an object in one interpreter that was created in another (due to thread safety on refcounts). The role of "tightly controlling" passing/sharing objects (for a very loose definition of "sharing") falls to the channels described in PEP-554. [1] However, there are several circumstances where interpreters may collaborate that involve one holding a reference (but not using it) to an object owned by the other. For instance, see PyBuffer_Release(). [2] This issue is about addressing that situation safely. It is definitely not about safely using objects from other interpreters. [1] The low-level implementation, including channels, already exists in Modules/_xxsubinterpretersmodule.c.
#55826 introduced a slight performance regression which will need to be resolved. If I can't find the problem right away then I'll probably temporarily revert the commit. See https://mail.python.org/pipermail/python-dev/2019-February/156451.html.
At least, it *might* be a performance regression. Here are the two commits I tried: after: ef4ac96 (PR bpo-11617, from this issue). After building each (a normal build, not debug), I ran the micro-benchmark Raymond referred to ("./python Tools/scripts/var_access_benchmark.py") multiple times. There was enough variability between runs from the same commit that I'm uncertain at this point if there really is any performance regression. For the most part the results between the two commits are really close. Also, the results from the "after" commit fall within the variability I've seen between runs of the "before" commit. I'm going to try with the "performance" suite (https://github.com/python/performance) to see if it shows any regression.
Here's what I did (more or less):

$ python3 -m pip install --user perf
$ python3 -m perf system tune
$ git clone git@github.com:python/performance.git
$ git clone git@github.com:python/cpython.git
$ cd cpython
$ git checkout ef4ac967e2f3a9a18330cc6abe14adb4bc3d0465
$ ./configure
$ make -j8
$ ./python ../performance/pyperformance run --fast --output speed.after
$ git checkout HEAD^
$ make -j8
$ ./python ../performance/pyperformance run --fast --output speed.before
$ ./python ../performance/pyperformance compare speed.before speed.after -O table

Here are my results (speed.before and speed.after both report Performance version 0.7.0; the comparison table itself was lost in the migration).
...so it doesn't appear that my PR introduces a performance regression.
IMHO there is no performance regression at all. Just noise in the results, which are reported without a standard deviation.
FYI, I have a couple of small follow-up changes to land before I close this issue.
I suspect changes for this issue may be creating test_io failures on my Windows builders, most notably my x86 Windows 7 builder, where test_io has been failing both attempts on almost all builds. It fails with a lock failure during interpreter shutdown, and commit ef4ac96 appears the most plausible commit out of the set introduced at the first failing build on Feb 24. See https://buildbot.python.org/all/#/builders/58/builds/1977 for the first failure. test_io has failed both attempts on all but 3 of the subsequent 16 tests of the 3.x branch.

It might be worth noting that this builder is slow, so if there are timeouts involved or a race condition of any sort it might be triggering it more readily than other builders. I do, however, see the same failures - albeit less frequently and not usually on both tries - on the Win8 and Win10 builders.

For what it's worth, one other oddity: test_multiprocessing* failures are relatively common on the Win7 builder during the first round of tests due to exceeding the timeout, but they usually pass on the retry. Over this same time frame they have begun failing - not due to timeout - even on the second attempt, which is unusual. It might be coincidental, but similar failures are also showing up sporadically on the Win8/Win10 builders, where such failures are not common at all.
I've merged the PR for hoisting eval_breaker before the ceval loop starts. @Raymond, see if that addresses the performance regression you've been seeing. I'll close this issue after I've sorted out the buildbot issues David mentioned (on his Windows builders).
If I can help with testing or builder access or anything just let me know. It appears that I can pretty reliably trigger the error through just the individual tests (test_daemon_threads_shutdown_std{out,err}_deadlock) in isolation, which is definitely less tedious than a full buildbot run to test a change. BTW, while not directly related since it was only just merged, and I won't pretend to have any real understanding of the changes here, I do have a question about PR 12024 ... it appears HEAD_UNLOCK is used twice in _PyInterpreterState_LookUpID. Shouldn't one of those be HEAD_LOCK?
I created bpo-36177 to track this regression.
Hi Eric, I'm sorry but I had to revert your recent work. It introduced regressions in at least test_io and test_multiprocessing_spawn on Windows and FreeBSD. I simply don't have the bandwidth to investigate this regression yet, whereas we really need the CI to remain stable to catch other regressions (the master branch received multiple new features recently, and some other regressions arrived along the way ;-)).

test_multiprocessing_spawn is *very* hard to reproduce on FreeBSD, but I can reliably reproduce it. It just takes time. The issue is a crash producing a coredump. I consider that the bug is important enough to justify a revert. The revert is an opportunity to sit down and track the root cause rather than rushing a "temporary" quickfix. This bug seems to be very deep in the Python internals: thread state during Python shutdown. So it will take time to fully understand it and fix it (or redesign your recent changes, I don't know). Tell me if you need my help to reproduce the bug. The bug has been seen on FreeBSD but also Windows:

-- The Night's Watch
For curious people, Pablo Galindo spent a few hours investigating https://bugs.python.org/issue36114 and I spent a few more hours finishing the analysis. The bug indirectly crashed my laptop :-) That's what I mean by "I don't have the bandwidth": we already spent hours identifying the regression ;-)
That's okay, Victor. Thanks for jumping on this. I'll take a look when I get a chance.
From what I saw, your first commit was enough to reproduce the crash. If I recall correctly, Antoine Pitrou modified the GIL so threads exit immediately when Py_Finalize() is called. I'm thinking of:

void
PyEval_RestoreThread(PyThreadState *tstate)
{
...
take_gil(tstate);
/* _Py_Finalizing is protected by the GIL */
if (_Py_IsFinalizing() && !_Py_CURRENTLY_FINALIZING(tstate)) {
drop_gil(tstate);
PyThread_exit_thread();
Py_UNREACHABLE();
}
...
PyThreadState_Swap(tstate);
    PyThreadState_Swap(tstate);
}

Problem: this code uses tstate, whereas the crash occurred because tstate pointed to freed memory:

"""
(gdb) p *tstate
...
Thread 1 (LWP 100696): https://bugs.python.org/issue36114#msg337090

When this crash occurred, Py_Finalize() had already completed in the main thread!
"""
void _Py_NO_RETURN
Py_Exit(int sts)
{
if (Py_FinalizeEx() < 0) { /* <==== DONE! */
sts = 120;
}
}

Py_Finalize() is supposed to wait for threads before deleting Python thread states:

"""
int
Py_FinalizeEx(void)
{
...
/* Delete current thread. After this, many C API calls become crashy. */
PyThreadState_Swap(NULL);
PyInterpreterState_Delete(interp);
} The real problem for years are *deamon threads* which... BY DESIGN... remain alive after Py_Finalize() exit! But as I explained, they must exit as soon at they attempt to get GIL. |
Thanks for taking a look, Victor! That info is helpful. It will likely be a few days before I can sort this out. Once I have addressed the problem I'll re-merge. I plan on using the "buildbot-custom" branch to make sure the buildbots are happy with the change before making that PR.
FWIW, the failures before were in test_io and test_multiprocessing_spawn, not test_asyncio.
I had really hoped to get this in for 3.8, as I may need it. However, realistically I doubt it will take just a few minutes to resolve this. With the beta 1 targeted for today it makes sense to revert my change. :( (I may still try to convince our fearless RM to allow it in beta 2, but we'll see.)
I am not aware of a test_asyncio issue related to this issue, only crashes
What annoys me the most in this issue is that I am unable to reproduce it. Even directly on the FreeBSD CURRENT buildbot, I failed to reproduce the crash even when I stressed the buildbot and the test. On the other hand, when the CI runs the test suite, crashes are likely!? These bugs are the worst. IMHO we must make multiprocessing more deterministic. Python 3.8 is better, but there are still many known issues. The test_venv multiprocessing test misuses the multiprocessing API; it emits a ResourceWarning.
My bet is that multiprocessing triggers a bug in daemon threads. Multiprocessing is using and abusing daemon threads. For example, as I wrote, test_venv misuses the multiprocessing API and implicitly lets Python release (or maybe not!) "resources", where "resources" can be processes and threads, likely including daemon threads. Eric: I would love to give you access to a VM where I can reproduce the bug but... I failed to reproduce the bug today :-( I would suggest writing a stress test for daemon threads:
The race condition is that daemon threads may or may not run during Python finalization depending on the delay. Maybe you can make the crash more likely by adding a sleep of a few *seconds* before the final exit, for example at the exit of Py_RunMain(). Last time I saw the crash, it was a thread which used a Python structure whereas the memory of the structure had been freed (filled with a random pattern by the debug hooks on memory allocators). Ah, either use a debug build of Python, or use PYTHONMALLOC=debug, or the -X dev command line option, to fill freed memory with a specific pattern and trigger the crash. ... Good luck.
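A stress test along the lines Victor suggests might look like the following sketch (hypothetical; the thread count and delay are arbitrary). It exits the main thread while daemon threads are still executing Python code, which is exactly the window where the finalization race lives:

```python
# Sketch of a daemon-thread shutdown stress test (hypothetical values).
# Many daemon threads keep running Python code while the main thread exits,
# exercising the "daemon thread runs during finalization" window.
import subprocess
import sys
import textwrap

STRESS = textwrap.dedent("""
    import threading, time

    def spin():
        # Keep taking and releasing the GIL via small Python operations.
        while True:
            sum(range(100))

    for _ in range(16):
        threading.Thread(target=spin, daemon=True).start()

    # Let the daemon threads get going, then exit the main thread
    # while they are still running.
    time.sleep(0.1)
""")

def run_stress():
    # -X dev enables the debug hooks Victor mentions (freed memory is
    # filled with a recognizable pattern), making use-after-free crash
    # loudly instead of silently "working".
    proc = subprocess.run([sys.executable, "-X", "dev", "-c", STRESS],
                          capture_output=True, timeout=60)
    return proc.returncode

rc = run_stress()
print("stress run exit code:", rc)  # 0 means finalization survived
```

On a healthy build the child exits cleanly; a crash during finalization would surface as a non-zero (on POSIX, negative, i.e. killed-by-signal) return code.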
Well, for this PR I actually disabled the execution of pending calls during runtime finalization. I'd hoped it would help, but it did not. :( Regardless, I'll need to look even more closely at what is different during runtime finalization with my PR.
Also, thanks for the suggestions, Victor!
Based on Victor's info from https://bugs.python.org/issue36114#msg337090 I believe the crash is essentially what's reproduced in the attached program. From the root of a (built) cpython clone run:

gcc -c -o fini_crash.o -IInclude -I. fini_crash.c && gcc -o fini_crash fini_crash.o libpython3.9.a -lcrypt -lpthread -ldl -lutil -lm && ./fini_crash

The output should be:

MAIN: allow other thread to execute

And running it through valgrind:

$ valgrind --suppressions=Misc/valgrind-python.supp fini_crash
==266836== Memcheck, a memory error detector
==266836== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==266836== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==266836== Command: fini_crash
==266836==
MAIN: allow other thread to execute
OTHER: acquired GIL
OTHER: released GIL
MAIN: interpreter finalized
OTHER: attempt to acquire GIL...crash!
==266836== Thread 2:
==266836== Invalid read of size 8
==266836== at 0x15607D: PyEval_RestoreThread (ceval.c:389)
==266836== by 0x15479F: evil_main (in /home/phconnel/dev/cpython/fini_crash)
==266836== by 0x48B94CE: start_thread (in /usr/lib/libpthread-2.30.so)
==266836== by 0x4B232D2: clone (in /usr/lib/libc-2.30.so)
==266836== Address 0x4d17270 is 16 bytes inside a block of size 264 free'd
==266836== at 0x48399AB: free (vg_replace_malloc.c:540)
==266836== by 0x1773FF: tstate_delete_common (pystate.c:829)
==266836== by 0x1773FF: _PyThreadState_Delete (pystate.c:848)
==266836== by 0x1773FF: zapthreads (pystate.c:311)
==266836== by 0x1773FF: PyInterpreterState_Delete (pystate.c:321)
==266836== by 0x174920: finalize_interp_delete (pylifecycle.c:1242)
==266836== by 0x174920: Py_FinalizeEx.part.0 (pylifecycle.c:1400)
==266836== by 0x15487B: main (in /home/phconnel/dev/cpython/fini_crash)
==266836== Block was alloc'd at
==266836== at 0x483877F: malloc (vg_replace_malloc.c:309)
==266836== by 0x178D7C: new_threadstate (pystate.c:557)
==266836== by 0x178D7C: PyThreadState_New (pystate.c:629)
==266836== by 0x178D7C: PyGILState_Ensure (pystate.c:1288)
==266836== by 0x154759: evil_main (in /home/phconnel/dev/cpython/fini_crash)
==266836== by 0x48B94CE: start_thread (in /usr/lib/libpthread-2.30.so)
==266836== by 0x4B232D2: clone (in /usr/lib/libc-2.30.so)
==266836==
==266836== Invalid read of size 8
==266836== at 0x156081: PyEval_RestoreThread (ceval.c:389)
==266836== by 0x15479F: evil_main (in /home/phconnel/dev/cpython/fini_crash)
==266836== by 0x48B94CE: start_thread (in /usr/lib/libpthread-2.30.so)
==266836== by 0x4B232D2: clone (in /usr/lib/libc-2.30.so)
==266836== Address 0x4c3a0f0 is 16 bytes inside a block of size 2,960 free'd
==266836== at 0x48399AB: free (vg_replace_malloc.c:540)
==266836== by 0x174920: finalize_interp_delete (pylifecycle.c:1242)
==266836== by 0x174920: Py_FinalizeEx.part.0 (pylifecycle.c:1400)
==266836== by 0x15487B: main (in /home/phconnel/dev/cpython/fini_crash)
==266836== Block was alloc'd at
==266836== at 0x483877F: malloc (vg_replace_malloc.c:309)
==266836== by 0x177153: PyInterpreterState_New (pystate.c:205)
==266836== by 0x1732BF: pycore_create_interpreter (pylifecycle.c:526)
==266836== by 0x1732BF: pyinit_config.constprop.0 (pylifecycle.c:695)
==266836== by 0x1766B7: pyinit_core (pylifecycle.c:879)
==266836== by 0x1766B7: Py_InitializeFromConfig (pylifecycle.c:1055)
==266836== by 0x1766B7: Py_InitializeEx (pylifecycle.c:1093)
==266836== by 0x154801: main (in /home/phconnel/dev/cpython/fini_crash)
==266836==
Just to summarise, I'm fairly sure this is exactly what Victor saw: a daemon thread attempts to reacquire the GIL via Py_END_ALLOW_THREADS after interpreter finalisation. Obviously the threadstate pointer held by the thread is then invalid...so we crash. So I see basically two options:
The discussion so far assumes that we should support this, i.e. #1. Any thoughts on that? (I'll have a think about whether this is actually doable!)
The attached patch (wrap_threadstate.diff) is enough to stop the crash. It's a slightly dirty proof-of-concept, but equally could be the basis for a solution. The main functional issue is that there's still a race on the Py_BLOCK_THREADS side: it's possible that the "is threadstate still valid" check can pass, but the interpreter is finalised while the daemon thread is trying to reacquire the GIL in PyEval_RestoreThread. (The Py_UNBLOCK_THREADS side is non-racy as the GIL is held while the ts and wrapper updates are done.)
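The check-then-acquire race described above can be replayed deterministically in a pure-Python toy model (hypothetical names; the Lock stands in for the GIL and a `valid` flag stands in for the wrapped thread state):

```python
import threading

class TStateWrapper:
    """Toy model of a wrapped thread state: 'valid' is cleared by
    finalization, and 'lock' stands in for the GIL."""
    def __init__(self):
        self.valid = True
        self.lock = threading.Lock()

def finalize(ts):
    # Models Py_FinalizeEx invalidating/freeing the thread state (under the GIL).
    with ts.lock:
        ts.valid = False

def restore_checked_under_lock(ts):
    # Non-racy variant: validity is tested while holding the lock, so
    # finalization cannot sneak in between the check and the use.
    with ts.lock:
        return "running" if ts.valid else "exited"

# Deterministic replay of the racy interleaving on the Py_BLOCK_THREADS side:
ts = TStateWrapper()
check_passed = ts.valid      # daemon thread: "is the tstate still valid?" -> yes
finalize(ts)                 # main thread finalizes in the window after the check
with ts.lock:                # daemon thread proceeds on the stale answer...
    outcome = "running" if ts.valid else "use-after-free"
assert check_passed and outcome == "use-after-free"

# The same interleaving is harmless when the check happens under the lock:
ts2 = TStateWrapper()
finalize(ts2)
assert restore_checked_under_lock(ts2) == "exited"
```

The model suggests why the patch's remaining race is structural: any validity check made before acquiring the GIL can be stale by the time the GIL is held.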
Sorry for the delay, Phil. I'll try to take a look in the next couple of hours.
I'm out of time and this deserves some careful discussion. I'll get to it next Friday (or sooner if possible). Sorry!
Thanks for the detailed analysis, Phil. I think the results are pretty conclusive: daemon threads are the worst. :) But seriously, thanks. As you demonstrated, it isn't just Python "daemon" threads that cause the problem. It is essentially any external access of the C-API once runtime finalization has started. The docs [1] aren't super clear about it, but there are some fundamental assumptions we make about runtime finalization:
I guess the real question is what to do about this? Given that this is essentially a separate problem, let's move further discussion and effort related to sorting out problematic threads over to bpo-36476, "Runtime finalization assumes all other threads have exited." @phil, would you mind attaching those same two files to that issue? [1] https://docs.python.org/3/c-api/init.html#c.Py_FinalizeEx
I partially reimplemented commit ef4ac96 in the following issues:
I cannot give a list of commits, I made too many of them :-D The most important change is: commit 50e6e99
My plan is now to fix pending calls in subinterpreters. Currently, they are only executed in the "main thread"... _PyRuntimeState.main_thread must be moved to PyInterpreterState, as it was done in commit ef4ac96. This issue shows that it's dangerous to change too many things at once. Python internals are still very fragile: bpo-39877 and bpo-37127 are good examples of that. I'm trying to move *very slowly*, moving pieces one by one. I added the _Py_ThreadCanHandlePendingCalls() function in commit d831688 (bpo-40010): it should ease moving _PyRuntimeState.main_thread to PyInterpreterState. I also added _Py_ThreadCanHandleSignals() in a previous commit, so it's easier to change which thread is allowed to handle signals and pending calls. I also plan to revisit (remove) pending_calls.finalizing, added by commit 842a2f0 (bpo-33608). I plan to work on that in bpo-37127.
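The per-interpreter pending-calls design sketched above can be modeled in pure Python (hypothetical names; the real implementation lives in ceval.c and pystate.c): any thread may enqueue a call, but only a thread designated to handle them drains the queue.

```python
import threading
from collections import deque

class InterpreterState:
    """Toy model of PyInterpreterState's per-interpreter pending-calls queue."""
    def __init__(self):
        self._pending = deque()
        self._lock = threading.Lock()
        # The thread allowed to run pending calls for this interpreter
        # (models the _Py_ThreadCanHandlePendingCalls() check).
        self.handling_thread = threading.get_ident()

    def add_pending_call(self, func, arg):
        # Safe to call from any thread (models Py_AddPendingCall).
        with self._lock:
            self._pending.append((func, arg))

    def make_pending_calls(self):
        # Only the designated thread drains the queue (models ceval's check);
        # other threads are a no-op rather than an error.
        if threading.get_ident() != self.handling_thread:
            return 0
        ran = 0
        while True:
            with self._lock:
                if not self._pending:
                    break
                func, arg = self._pending.popleft()
            func(arg)
            ran += 1
        return ran

interp = InterpreterState()
results = []
t = threading.Thread(target=interp.add_pending_call,
                     args=(results.append, "from-other-thread"))
t.start()
t.join()
assert interp.make_pending_calls() == 1
assert results == ["from-other-thread"]
```

Making the queue a field of each interpreter state, rather than of the global runtime, is what lets subinterpreters execute their own pending calls instead of funneling everything through the runtime's "main thread".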
Just to add my 2 cents. I think this is a bad idea and is likely to be unsafe. IMO, objects should *never* be shared across interpreters (once interpreters are expected to be able to run in parallel). Any channel between two interpreters should consist of two objects, one per interpreter. The shared memory they use to communicate can be managed by the runtime. Likewise for inter-interpreter locks. The lock itself should be managed by the runtime, with each interpreter having its own lock object with a handle on the shared lock.
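A minimal sketch of the two-endpoint channel design described above (hypothetical names; the "runtime-managed" shared state is modeled with a deque and a lock, and only bytes are allowed to cross):

```python
import threading
from collections import deque

class _SharedBuffer:
    """Runtime-managed shared state; neither interpreter owns it directly."""
    def __init__(self):
        self.data = deque()
        self.lock = threading.Lock()

class ChannelEnd:
    """Per-interpreter endpoint object holding a handle on the shared state."""
    def __init__(self, shared):
        self._shared = shared

    def send(self, payload):
        # Only plain bytes cross the boundary, never object references.
        if not isinstance(payload, bytes):
            raise TypeError("only bytes may cross interpreters")
        with self._shared.lock:
            self._shared.data.append(payload)

    def recv(self):
        with self._shared.lock:
            return self._shared.data.popleft() if self._shared.data else None

def make_channel():
    shared = _SharedBuffer()
    # One endpoint object per interpreter; each wraps the same shared buffer.
    return ChannelEnd(shared), ChannelEnd(shared)

a, b = make_channel()
a.send(b"hello")
assert b.recv() == b"hello"
assert b.recv() is None
```

Because each interpreter only ever holds its own endpoint object, no Python object's refcount is ever manipulated by two interpreters, which is the invariant Mark is arguing for.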
FYI, in bpo-39984 Victor moved pending calls to PyInterpreterState, which was part of my reverted change. However, there are a few other pieces of that change that need to be applied before this issue is resolved. I'm not sure when I'll get to it, but hopefully soon.
To be honest, I don't understand the purpose of this issue. Some messages are talking about pending calls, some others are talking about channels, and a few are talking about "shared" objects. Can someone either clarify the exact intent of this issue, or close this one and create a new issue with a clear explanation of what should be done and why?
I created bpo-40231: Fix pending calls in subinterpreters.
Pavel Kostyuchenko:
Pavel: would you mind opening a separate issue suggesting adding synchronization and/or avoiding daemon threads in multiprocessing? The concurrent.futures module was recently modified to avoid daemon threads in bpo-39812.
I reworked this function in bpo-40010.
This issue has a long history. A change has been applied and then reverted three times in a row. Pending calls are now per-interpreter. The issue title is "Add a cross-interpreter-safe mechanism to indicate that an object may be destroyed." but I don't understand whether pending calls are expected to be used to communicate between two interpreters. Why not use a UNIX pipe and exchange bytes through it? Py_AddPendingCall() is a weird concept; I would prefer not to abuse it. Moreover, it's unclear if this issue attempts to *share* a same object between two interpreters. I would prefer to avoid that by any possible means.

I'm closing this issue with its complex history. If someone wants to continue to work on this topic, please open an issue with a very clear description of what should be done and how it is supposed to be used.
Yeah, there is more to do. I'll create a new issue.