Segmentation fault during garbage collection of the asyncio event loop #113566
Hello!
The repro you have so far is very complex, and we may not be able to debug it meaningfully even if we can reproduce it. Let's first start with the backtrace you have. Which Python version was that exactly? (And was it built in debug mode?) I can't even find a recent version of CPython where PyObject_Repr calls PyObject_CallOneArg (though I didn't search very hard). The immediate problem seems to be that the latter is being called with a NULL pointer, but without being able to read the exact source code I don't know where that is coming from. The rest of the stack trace makes me think that we're in some finalization stage where code invoked by a generator (or async def) finalization is trying to remove something from a deque that isn't there, and the crash happens while printing the resulting error message. I doubt that this is timing-related exactly, though it is quite possible that the root cause is that the thing being called has already been removed by the finalization code; that would be a bug in finalization order.
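The finalization path described above can be observed without crashing. When a still-pending task is garbage collected, asyncio reports "Task was destroyed but it is pending!" from the task's finalizer through `loop.call_exception_handler()`, and the default handler formats that context by calling `repr()` on the task, plausibly the same kind of `repr()` call seen in the backtrace. The sketch below (the helper name `collect_destroyed_pending_messages` is hypothetical) installs a handler that only records the message, so no `repr()` happens:

```python
import asyncio
import gc


def collect_destroyed_pending_messages() -> list[str]:
    """Hypothetical helper: GC a still-pending task and record what asyncio reports."""
    loop = asyncio.new_event_loop()
    messages: list[str] = []
    # Record only the message. The default handler would instead log every
    # context entry, calling repr() on the task while doing so.
    loop.set_exception_handler(lambda _loop, ctx: messages.append(ctx["message"]))

    async def wait_forever(fut: asyncio.Future) -> None:
        await fut  # nothing ever resolves this future

    fut = loop.create_future()
    task = loop.create_task(wait_forever(fut))
    loop.run_until_complete(asyncio.sleep(0))  # let the task start and block
    # Drop our references; the task now survives only through a reference cycle
    # (task -> coroutine -> future -> done callback -> task).
    del task, fut
    gc.collect()  # collecting the cycle runs the pending task's finalizer
    loop.close()
    return messages


if __name__ == "__main__":
    print(collect_destroyed_pending_messages())
```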
I am indeed having a hard time producing an MRE; my attempts to get a single file that exhibits the behavior without pulling in the library have not succeeded yet.
All right, line 433 in object.c is indeed in `res = (*Py_TYPE(v)->tp_repr)(v);` Presumably this calls some object's `return PyObject_CallOneArg(asyncio_future_repr_func, (PyObject *)fut);` which could conceivably have been tail-optimized so that the debugger sees its direct caller as And this leads to the hypothesis that I think to fix this, we may have to add a null pointer check for this variable to Please check whether you can reproduce the same result in 3.12, where those global variables have been moved to state attributes and the control flow is a bit different. And again, check whether you can reproduce it on the main branch, where the situation might be slightly different again (not sure). @kumaraditya303 Do you have any thoughts on the premature finalization of the
@gvanrossum I could narrow down the issue on Python 3.11 to doing a On the other hand, I also produced a (different) stack trace on Python 3.12.1 (main, Dec 19 2023, 20:14:15) [GCC 12.2.0] with the following backtrace, and where toggling that
OK, here is minimal code that reproduces the first issue, though it does seem that Python 3.12 has it fixed:

```python
import asyncio

loop = asyncio.get_event_loop()
q = asyncio.Queue()

async def consume():
    while True:
        await q.get()

async def coro():
    q.put_nowait(1)
    await asyncio.sleep(500)

async def main():
    asyncio.ensure_future(consume())
    asyncio.ensure_future(coro())

loop.run_until_complete(main())
```
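For comparison, here is the same program restructured around `asyncio.run()`, which cancels any tasks still pending when `main()` returns and then closes the loop, so no pending task is left for the garbage collector at interpreter shutdown. This is a sketch, not a verified fix: the assumption that it avoids the 3.11 crash rests only on the report that closing the loop early makes the issue disappear. The queue is also moved into `main()` and `main()` returns its tasks so they can be inspected; both are changes from the repro.

```python
import asyncio


async def consume(q: asyncio.Queue) -> None:
    # Waits forever; asyncio.run() cancels this task once main() returns.
    while True:
        await q.get()


async def coro(q: asyncio.Queue) -> None:
    q.put_nowait(1)
    await asyncio.sleep(500)


async def main() -> list[asyncio.Task]:
    q = asyncio.Queue()
    tasks = [asyncio.ensure_future(consume(q)), asyncio.ensure_future(coro(q))]
    await asyncio.sleep(0)  # let both tasks start before returning
    return tasks


if __name__ == "__main__":
    leftover = asyncio.run(main())
    # Both tasks were cancelled by asyncio.run() rather than left pending.
    print([t.cancelled() for t in leftover])
```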
Okay, your example fails for me too with 3.11 on my Mac. And with this diff it passes:

```diff
diff --git a/Modules/_asynciomodule.c b/Modules/_asynciomodule.c
index b2fef017050..a92feebcdbc 100644
--- a/Modules/_asynciomodule.c
+++ b/Modules/_asynciomodule.c
@@ -1377,6 +1377,9 @@ static PyObject *
 FutureObj_repr(FutureObj *fut)
 {
     ENSURE_FUTURE_ALIVE(fut)
+    if (asyncio_future_repr_func == NULL) {
+        return PyUnicode_FromFormat("<Future at %p>", fut);
+    }
     return PyObject_CallOneArg(asyncio_future_repr_func, (PyObject *)fut);
 }
```
I think this was fixed in 3.12 by the refactor that removed all the globals. I have no idea about your other segfault: I don't know that part of the code well, and looking at that line, nothing comes to mind. I recommend that you try to repro it on main and then file a separate bug report, which someone with more expat knowledge will have to address. (If it doesn't repro on main, it's still worth filing; just say so in your bug report.) I'll cook up a PR for the asyncio crash.
In 3.11 there only
Crash report
What happened?
The original details are at: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1040057. Despite what is written there, the issue also happens on Python 3.12, and it can be reliably reproduced when running the test from inside PDB, suggesting some kind of timing issue.
I could reproduce it independently on Arch Linux with Python 3.11, as well as on 3.12 in a Debian Docker image (https://ci.codeberg.org/repos/12939/pipeline/7/26).
`py-bt` does not catch anything, suggesting that it happens during interpreter shutdown, which would make sense. Our workaround is to register loop.close() with atexit, which makes the issue disappear, but my feeling is that this should not happen. I can provide the full backtrace, but most values are optimized out, which makes it much less useful.
The setup requires going through quite a bit of code, so we do not have a minimal test case (in terms of code), although running our test suite should not be too difficult.
CPython versions tested on:
3.10, 3.11, 3.12
Operating systems tested on:
Linux
Output from running 'python -VV' on the command line:
No response
Linked PRs