-
-
Notifications
You must be signed in to change notification settings - Fork 30.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
compiler emits EXTENDED_ARG + NOP sequence in 3.10 #89918
Comments
dis module incorrectly handles the instruction sequence where EXTENDED_ARG is followed by NOP, e.g.,: EXTENDED_ARG 0x01 The above sequence loads the constant from index 0x0129, whereas dis attempts to load it from index 0x010129, resulting in "IndexError: tuple index out of range error" when attempting to iterate over instructions returned by dis.get_instructions(). Complete example: # *** example.py begin ***
import types
import dis
# Create a test code object...
constants = [None] * (0x000129 + 1)
constants[0x000129] = "Hello world!"
code = types.CodeType(
0, # argcount
0, # posonlyargcount
0, # kwonlyargcount
0, # nlocals
1, # stacksize
64, # flags
bytes([
0x90, 0x01, # EXTENDED_ARG 0x01
0x09, 0xFF, # NOP 0xFF
0x90, 0x01, # EXTENDED_ARG 0x01
0x64, 0x29, # LOAD_CONST 0x29
0x53, 0x00, # RETURN_VALUE 0x00
]), # codestring=
tuple(constants), # constants
(), # names
(), # varnames
'<no file>', # filename
'code', # name
1, # firstlineno
b'' # linetable
)
# ... and eval it to show that NOP resets EXTENDED_ARG
print("Output:", eval(code))
# Try to list all instructions via dis
print(list(dis.get_instructions(code))) # *** example.py end *** Running the example gives us: Output: Hello world!
Traceback (most recent call last):
File "/home/rok/example.py", line 35, in <module>
print(list(dis.get_instructions(code)))
File "/usr/lib64/python3.10/dis.py", line 338, in _get_instructions_bytes
argval, argrepr = _get_const_info(arg, constants)
File "/usr/lib64/python3.10/dis.py", line 292, in _get_const_info
argval = const_list[const_index]
IndexError: tuple index out of range To fix the problem on dis side, the extended_arg in dis._unpack_opargs should be reset to 0 when NOP (or perhaps any opcode without argument) is encountered. I.e., by adding "extended_arg = 0" here: Line 525 in 9127520
The problematic behavior was reported by users of PyInstaller under python 3.10; it seems that python 3.10 generates bytecode involving EXTENDED_ARG + NOP for telegram.message.py: The original PyInstaller issue with preliminary analysis is here: pyinstaller/pyinstaller#6301 |
Thanks for the report. This should fix it, I'll create a patch: @@ -428,6 +428,7 @@ def _unpack_opargs(code):
extended_arg = (arg << 8) if op == EXTENDED_ARG else 0
else:
arg = None
+ extended_arg = 0
yield (i, op, arg) |
+ Mark for FYI. The behaviour of dis is not new, I wonder if generating code with EXTENDED_ARG + NOP is new? |
The EXTENDED_ARG + NOP sequence seems to be specific to python 3.10; we could not reproduce the original bug (dis failing on telegram.message) with earlier python versions. The combination makes little sense, so perhaps it is a bug on its own? |
I have a hunch that this is caused in optimize_basic_block. In general, we don't bother clearing the oparg when peepholing instructions into NOPs. We could either start becoming more principled about that and fix all of the existing NOPs, or just update instrsize and write_op_arg to ignore args for instructions that don't use them. The latter seems easier and less error-prone. |
We should also verify that the interpreter doesn’t have its own version of the dis bug. |
I don't think that it does, since oparg gets totally reassigned each time we load a new instruction. EXTENDED_ARG actually needs to hack around this by advancing the instruction itself, updating oparg, and jumping straight into the next opcode. |
The 3.9 backport had a conflict, but I think it's not worth worrying about because this scenario only showed up in 3.10. |
Right, it's like this: TARGET(EXTENDED_ARG) {
int oldoparg = oparg;
NEXTOPARG();
oparg |= oldoparg << 8;
PRE_DISPATCH_GOTO();
DISPATCH_GOTO();
} It's seems correct (the next arg will be NOP so nothing happens). But it's wasteful. |
Indeed. Do you plan on removing the extra EXTENDED_ARGs in instrsize and write_op_arg? I can take care of it if not. |
That sounds like a good idea. Please go ahead. |
Here's a (more) minimal reproducer I've been able to create:
It's not really that hard to create unused arguments... it's just hard to make them large enough to need an EXTENDED_ARG! For example, Given the pretty tight coupling (and speed/simplicity) of instrsize and write_op_arg, I now think the best fix would instead just be adding another pass immediately before assemble_jump_offsets that does something like this:
Even though it's quite straightforward to fix, I only really think it's worth doing if it annoys us enough that many non-argument-using instructions are indeed emitted with nonzero arguments. I'm starting to feel like it will only complicate things and possibly slow down compilation times (granted, probably by a very small amount, but it's not nothing). Besides the above example, (which was informed by the linked PyInstaller discussion, thanks Rok), I haven't actually been able to coax the compiler into emitting these extra EXTENDED_ARGs in *any* other situation. TL;DR: This is super rare, doesn't actually cause any problems, and incurs a small compile-time cost to "fix". Close as "won't fix"? |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: