PEP 511: code.co_lnotab: use signed line number delta to support moving instructions in an optimizer #70295
Python doesn't store the original line numbers directly in the bytecode of a .pyc file. Instead, a compact table is used to find the line number from the current offset in the bytecode: code.co_lnotab.

Basically, it's a list of (offset_delta, line_number_delta) pairs where offset_delta and line_number_delta are unsigned 8-bit numbers. If an offset delta is larger than 255, (offset_delta % 255, line_number_delta) and (offset_delta // 255, 0) pairs are emitted. Same for line_number_delta. (In fact, more than two pairs can be created.) The format is described in Objects/lnotab_notes.txt.

I implemented an optimizer which can generate a *negative* line number delta. For example, the loop:

    for i in range(2):   # line 1
        print(i)         # line 2

is replaced with:

    i = 0      # line 1
    print(i)   # line 2
    i = 1      # line 1
    print(i)   # line 2

The third instruction has a negative line number delta.

I'm not the first one hitting the issue, but it's just that no one proposed a patch before. Previous projects bitten by this issue:
The attached patch changes the type of the line number delta from an unsigned 8-bit integer to a *signed* 8-bit integer. If a line number delta is smaller than -128 or larger than 127, multiple pairs are created (as before).

My code in Lib/dis.py is inefficient. Maybe unpack the full lnotab and *then* skip half of the bytes (instead of calling struct.unpack once for each byte)?

The patch also adds "assert(Py_REFCNT(lnotab_obj) == 1);" to PyCode_Optimize(). The assertion never fails, but it's just to be extra safe.

The patch renames variables in PyCode_Optimize() because I was confused between "offset" and "line numbers". IMHO the variables were badly named.

I changed the MAGIC_NUMBER of importlib, but it was already changed for f-strings:

    # Python 3.6a0 3360 (add FORMAT_VALUE opcode bpo-25483)

Is it worth modifying it again? You may have to recompile Python/importlib_external.h if it's not recompiled automatically (just touch the file before running make).

Note: this issue is related to PEP 511 (the PEP is not ready for review, but it gives a better overview of the use cases).
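The signed encoding proposed in the comment above can be illustrated with a small encoder. This is only a sketch under stated assumptions, not the code from the patch or from CPython's compiler: the function name `encode_lnotab` and the bytecode offsets (0, 6, 12, 18) are hypothetical, offsets are assumed to be non-decreasing, and the way oversized deltas are split is one plausible scheme.

```python
def encode_lnotab(entries, first_lineno):
    """Encode (offset, lineno) entries as (offset_delta, signed line_delta) bytes.

    Hypothetical helper for illustration; assumes offsets never decrease.
    """
    table = bytearray()
    prev_offset, prev_line = 0, first_lineno
    for offset, lineno in entries:
        offset_delta = offset - prev_offset
        line_delta = lineno - prev_line
        if offset_delta == 0 and line_delta == 0:
            continue  # nothing to record for this entry
        # Offset deltas stay unsigned: spill anything above 255 into (255, 0) pairs.
        while offset_delta > 255:
            table += bytes((255, 0))
            offset_delta -= 255
        # Line deltas are signed: spill anything outside [-128, 127].
        while line_delta > 127:
            table += bytes((offset_delta, 127))
            line_delta -= 127
            offset_delta = 0
        while line_delta < -128:
            table += bytes((offset_delta, 0x80))  # 0x80 is -128 as a signed byte
            line_delta += 128
            offset_delta = 0
        # Store the remaining delta as a two's-complement byte.
        table += bytes((offset_delta, line_delta & 0xFF))
        prev_offset, prev_line = offset, lineno
    return bytes(table)


# The unrolled loop from the example: lines 1, 2, 1, 2 at made-up offsets.
unrolled = encode_lnotab([(0, 1), (6, 2), (12, 1), (18, 2)], first_lineno=1)
print(list(unrolled))  # [6, 1, 6, 255, 6, 1]
```

The byte 255 in the second pair is -1 read as a signed 8-bit integer, which is exactly the backwards step that an unsigned table cannot express.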
New changeset c4a826184937 by Victor Stinner in branch 'default':
The main visible change is that .pyc files can be a little bit larger and so use more disk space. Well... in fact, only the .pyc files of modules using line number deltas larger than 127.
Patch version 2, taking Serhiy's review into account:
Note: it also avoids useless "if (x != NULL)" checks before calling PyMem_Free(). PyMem_Free(NULL) is well specified: it does nothing.
Patch version 3:
I don't directly see the link between this issue and bpo-17611, but cool if it helps to implement new optimizations :-) I compared my patch with the bpo-16956 patch:
Additionally, my patch uses better names in the peephole optimizer, but that's not directly related to the issue. By the way, this change should be committed in a separate patch. I prefer to push my recent patch. By the way, it's up to date, whereas the bpo-16956 patch requires a rebase.
Yes, your patch supersedes the bpo-16956 patch. I added new comments on Rietveld for lnotab_notes.txt. I'm afraid this patch can cause problems with code tracing, where it is assumed that line numbers increase monotonically and that *instr_lb <= frame->f_lasti < *instr_ub. We should carefully analyze the effect of the patch on tracing. Before committing, you must ask Guido for approval. AFAIK he had objections against code transformations that make debugging harder.
We have many unit tests in the Python test suite which rely on exact line numbers. Examples:
I know them because they were all broken when my fatoptimizer project had bugs related to line numbers :-) With my patch, the full Python test suite passes, even though my patch doesn't modify any test.
First, my patch has no impact on frame->f_lasti. The trace module and test_sys_settrace use frame.f_lineno, which calls PyFrame_GetLineNumber(). This function returns f->f_lineno if the frame has a trace function, or PyCode_Addr2Line() otherwise. PyCode_Addr2Line() and PyFrame_GetLineNumber() still work with my patch.

When you trace a program, "negative line delta" and "negative instruction offset" are not new in Python: when you compare two instructions seen by the tracer, they are a basic requirement to support loops.

To be clear, my patch does *not* introduce negative line number deltas in co_lnotab. It only *adds support* for negative line number deltas. If a tool decodes co_lnotab using 8-bit unsigned numbers for line number deltas, the tool still works even with the patch. It only starts to return wrong line numbers if you debug a program which has negative line number deltas.

If you use fatoptimizer, you get such negative deltas. But if you use an optimizer, you should be prepared for some subtle differences. The good practice is to disable all optimizers when you debug code. It's also really hard (or impossible) to debug C code optimized with -O3. I always use gcc -O0 to debug CPython.
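The point that a tracer already sees line numbers go backwards in any loop can be checked with sys.settrace. The following is a minimal sketch; the helper `traced_lines` and the sample function `loop` are made up for illustration, and line numbers are reported relative to the `def` line so the result does not depend on where the code is placed in a file.

```python
import sys


def traced_lines(func):
    """Run func under sys.settrace; return its line numbers relative to the def line."""
    code = func.__code__
    first = code.co_firstlineno
    seen = []

    def tracer(frame, event, arg):
        # Record only 'line' events that belong to the function under test.
        if event == "line" and frame.f_code is code:
            seen.append(frame.f_lineno - first)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func()
    finally:
        sys.settrace(previous)  # always restore the previous trace function
    return seen


def loop():
    for i in range(2):
        x = i


lines = traced_lines(loop)
# Consecutive events where the line number decreases (back to the 'for' line).
backward_steps = [(a, b) for a, b in zip(lines, lines[1:]) if b < a]
print(lines, backward_steps)
```

Each time the loop body finishes, the tracer reports the `for` line again, so a consumer of frame.f_lineno must already cope with decreasing line numbers between two events.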
Are you aware of tools that decode co_lnotab directly? Oh, I forgot the old Misc/gdbinit script which *does* decode co_lnotab directly. Does anyone still use it? If yes, it should also be updated. I failed to reproduce the bug with Misc/gdbinit, because the bug only occurs if you debug a program which uses negative line number deltas, and CPython doesn't produce negative line number deltas in co_lnotab by default... So it would be "nice" to also support negative line number deltas in Misc/gdbinit, but maybe it's ok to let this old script die? :-D
Ah! I found Ned Batchelder's coverage project, which has a _bytes_lines() method "adapted from dis.py in the standard library". The method uses co_lnotab directly to compute line numbers. Ok, *this project* will have to be updated if it wants to support fatoptimizer and other code transformers producing negative line number deltas. Maybe I can contribute to it with a patch if my change to CPython 3.6 is accepted ;-)
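For a tool that decodes co_lnotab by hand, the update discussed above amounts to reading the line delta as a signed byte. Here is a minimal sketch, assuming a hand-built table rather than a real code object; `lnotab_lines` is a hypothetical name, and this is neither the coverage project's code nor the code in Lib/dis.py.

```python
def lnotab_lines(lnotab, first_lineno, signed=True):
    """Return the (offset, lineno) list described by an lnotab byte string.

    With signed=False the line delta is read as an unsigned byte, which is
    how a tool written for the old format would interpret the table.
    """
    offset, lineno = 0, first_lineno
    result = [(offset, lineno)]
    # The table is a flat sequence of (offset_delta, line_delta) byte pairs.
    for offset_delta, line_delta in zip(lnotab[0::2], lnotab[1::2]):
        if signed and line_delta >= 0x80:
            line_delta -= 0x100  # reinterpret the byte as a signed 8-bit value
        offset += offset_delta
        lineno += line_delta
        result.append((offset, lineno))
    return result


# Hand-built table for lines 1, 2, 1, 2 at made-up offsets 0, 6, 12, 18.
table = bytes([6, 1, 6, 0xFF, 6, 1])
print(lnotab_lines(table, 1))                # [(0, 1), (6, 2), (12, 1), (18, 2)]
print(lnotab_lines(table, 1, signed=False))  # [(0, 1), (6, 2), (12, 257), (18, 258)]
```

The unsigned reading only goes wrong once a negative delta appears, which matches the observation that existing tools keep working on tables produced without an optimizer.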
I just wanted to comment on "don't change importlib MAGIC: we only change it between Python minor versions": that's actually not true. Feel free to up the number whenever you make a change that affects eval.c or bytecode. Otherwise .pyc files won't be regenerated. And that number is cheap anyway and isn't about to run out, so don't worry about updating it multiple times before the code sees a public release.
Brett Cannon added the comment:
Since my patch may break setup of multiple python developers, it can |
But there is no need to increase it by 10. I suppose the gap is added to allow updating the bytecode in maintained releases, but in the process of developing the next version we don't need this.
There's technically no need to worry about ranged values as the magic number is purely an equality check to see if the interpreter matches what the .pyc was created with. I guess there might be third-party code that does a range check, but that's bad as importlib checks the raw bytes only; using a number is mostly a convenience for changing it.
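The equality check described above can be reproduced from Python code. This is a minimal sketch rather than importlib's actual implementation: the helper `pyc_matches_interpreter` and the file names are made up, while `importlib.util.MAGIC_NUMBER` and `py_compile.compile()` are the standard library names.

```python
import importlib.util
import os
import py_compile
import tempfile


def pyc_matches_interpreter(pyc_path):
    """Return True if the .pyc starts with the running interpreter's magic number."""
    magic = importlib.util.MAGIC_NUMBER
    with open(pyc_path, "rb") as f:
        header = f.read(len(magic))
    # Raw byte equality only: no ordering or range comparison is involved.
    return header == magic


with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "demo.py")
    with open(source, "w") as f:
        f.write("x = 1\n")

    # A .pyc freshly compiled by this interpreter carries its magic number.
    fresh = py_compile.compile(source, cfile=os.path.join(tmp, "demo.pyc"), doraise=True)
    fresh_ok = pyc_matches_interpreter(fresh)

    # A file with any other leading bytes is treated as stale.
    stale = os.path.join(tmp, "stale.pyc")
    with open(stale, "wb") as f:
        f.write(b"\x00\x00\x00\x00" + b"\x00" * 12)
    stale_ok = pyc_matches_interpreter(stale)

print(fresh_ok, stale_ok)  # True False
```

Because only equality matters, any change to the number is enough to force .pyc files to be regenerated, whatever the size of the step.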
The launcher on Windows does a range check.
Patch version 4:
I don't expect my patch to be complete or perfect. IMHO it's fine to adjust the code later if needed. I would like to integrate FAT Python changes step by step. It looks like the general idea of AST optimizers is well accepted.
Nice work, my bpo-21385 is also related. Basically, transforming non-Python code into Python meant that all line number information, which otherwise would have been useful for debugging, had to be discarded, or debug builds of Python would dump core. So, bye "assert(d_lineno >= 0);", you won't be missed.
New changeset 775b74e0e103 by Victor Stinner in branch 'default':
Ok, it looks like there are multiple use cases for negative line number deltas, and the change doesn't really break anything in practice. I tried to explain exactly who is impacted and how to update the code in the Porting section of What's New in Python 3.6. Thanks for each review, Serhiy. I pushed the change to Python 3.6.
New changeset c6fb1651ea2e by Victor Stinner in branch 'default':
New changeset 16f60cd918e0 by Victor Stinner in branch 'default':