-
-
Notifications
You must be signed in to change notification settings - Fork 31.7k
SEGFAULT when running a given coroutine #72968
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Hi, I stumbled upon a SEGFAULT while trying Python 3.6.0 on a project using The bug appears between 3.6.0a4 (most recent version tested not affected) and I also produced a traceback using gdb (see bellow). The segfault happens around the "await" in the body of Cursor._read_data(), Cheers, Case (also attached as test.py) : loop = asyncio.get_event_loop()
class Connection:
def read_until(self, *args, **kwargs):
return self
async def all(self):
return b"\n"
class Cursor:
def __init__(self):
self._connection = Connection()
self._max_bytes = 100
self._data = bytearray()
async def _read_data(self):
# XXX segfault there, if I change anything in the code, it works...
while True:
data = await self._connection.read_until(
b'\n', max_bytes=self._max_bytes).all()
self._max_bytes -= len(data)
if data == b'\n':
break
self._data.extend(data)
async def main():
await Cursor()._read_data()
loop.run_until_complete(main()) Traceback extract (with Python3.6.0b4, --with-pydebug on Linux): Program received signal SIGSEGV, Segmentation fault. |
I'm marking this as a release blocker since it is a (new) segfault. |
FWIW, hg bisect finds: changeset 103444:a77756e480c2: bad Also, FWIW, the successful runs prior to this revision report: base_events.py:481: ResourceWarning: unclosed event loop <_UnixSelectorEventLoop running=False closed=False debug=False> |
Wow, Martin, this is a very interesting one. Thanks so much for tracking this down and reducing the test case. So far this doesn't seem to be related to async/await: the interpreter stack seems to enter some strange state under some conditions, and async/await just happens to be the thing that explodes. To those trying to debug this: you don't need asyncio, just replace |
Oh wow, the bug is tricky. _PyGen_yf() checks if the next instruction is YIELD_FROM using code[f_lasti+2]. The problem is that WORDCODE kept f_lasti == -1 for a frame not executed yet: f_lasti+2 is 1 in this case. Problem: code[1] is not an operation code, but the argment. Python 3.6 bytecode now uses the "wordcode" format: 16-bit units of (operation, argument). The obvious and simple fix is to add a special case for f_lasti==-1 in _PyGen_yf(): see attached patch. pygen_yf.patch fixes the crash. -- Another much larger change would be to change f_lasti to -2... In the rewiew of the huge WORDCODE patch, if I recall correctly, I asked Demur to use -1 for backward compatibility. Or maybe I asked to kept the backward compatibility at the Python level using a getter converting -2 to -1... I don't recall correctly. See http://bugs.python.org/issue26647 for wordcode. |
It would be nice if Demur Rumed and/or Serhiy Storchaka can review pygen_yf.patch since Demur wrote the wordcode patch and Serhiy reviewed the wordcode patch. |
Attached lasti.patch implements this idea. I consider that it makes the C code simpler because getting the next instruction (f_lasti + 2) doesn't require a special case anymore. My patch keeps f_lasti == -1 at the Python level for backward compatibility. lasti.patch is only a backward incompatible change at the C level. -- Between pygen_yf.patch and lasti.patch, I prefer lasti.patch even if 3.6 is at its last beta version before the final version. I prefer to fix the C API. Later it will be much harder to fix it. -- I read again the wordcode issue bpo-26647: I wrote on the review of wpy7.patch: "The overall change LGTM, but I'm no more 100% sure that starting f_lasti=-1 is safe." I wrote: "IMHO it's ok to break the C API, but I would prefer to keep the backward compatibility for the Python API (replace any negative number with -1 for the Python API)." Serhiy: "I think we should make yet few related changes: (...) * Change f_lasti, tb_lasti etc to count code units instead of bytes." |
I had considered this, for some reason I didn't realize code[1] could be equal to YIELD_FROM, felt it'd always be false f_lasti being -2 has always been my preference, lasti.patch lgtm |
How about we commit pygen_yf.patch to 3.6 and whatever else in 3.7? lasti.patch is too invasive for 3.6 at this point, -1 for it in 3.6. |
I prefer pygen_yf.patch (with addressing Yury's suggestions). It is much simpler, and in any case I want to change f_lasti to count words rather than bytes (bpo-27129). It would be again -1 at the start. |
Updated patch. I added a NEWS entry and the two assertions from lasti.patch. These assertions are basic sanity checks, not directly related to this issue, they shouldn't harm. A started frame must never go back to the not-started state (f_lasti < 0). Yury: "How about we commit pygen_yf.patch to 3.6 and whatever else in 3.7?" Wordcode is already a massive change in our bytecode, I would prefer to not touch f_lasti just as a late update of WORDCODE in 3.7. I'm ok to keep f_lasti == -1 for frames not started yet. Serhiy: "I prefer pygen_yf.patch (with addressing Yury's suggestions)." Ok, fine. I'm also more confident in smaller changes when we are very close to the final release! So let's forget lasti.patch ;-) (Sorry Demur!) -- FYI attached test.py reproduces the bug because Cursor._read_data() starts with the instruction "SETUP_LOOP 72" and 72 is the code of the operation YIELD_FROM :-) To reproduce the bug, you must have a code object where the second byte is 72. It's not easy to control the generated bytecode from the Python code, so I decided to not write an unit test for this bug. |
pygen_yf-2.patch LGTM. Agreed about tests. |
LGTM |
New changeset 303cedfb9e7a by Victor Stinner in branch '3.6': |
Thanks Martin Richard to continue to find (and report!) crazy corner cases ;-) |
Thank you all for fixing this so quickly, it's been done amazingly fast! |
Martin Richard: "Thank you all for fixing this so quickly, it's been done amazingly fast!" Even if I didn't write the commit a77756e480c2 which introduced the regression, I pushed it and so I felt responsible. It's a good motivation to fix it quickly. So nobody notice :-D |
(Victor also merged this into default branch for 3.7.0 in 6453ff3328b8) |
Misc/NEWS
so that it is managed by towncrier #552Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: