Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

code.replace() fails to preserve CO_FAST_HIDDEN flag on locals #110543

Open
rokm opened this issue Oct 9, 2023 · 6 comments
Open

code.replace() fails to preserve CO_FAST_HIDDEN flag on locals #110543

rokm opened this issue Oct 9, 2023 · 6 comments
Assignees
Labels
3.12 bugs and security fixes 3.13 new features, bugs and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error

Comments

@rokm
Copy link

rokm commented Oct 9, 2023

Bug report

Bug description:

Support for inlining list/dict/set comprehensions in c3b595e introduced a CO_FAST_HIDDEN, which is applied in combination with a different type code, for example CO_FAST_LOCAL. However, when the code object is copied via code.replace() function call, this additional flag is lost; consequently, execution of the returned code object results in a bizarre-looking error.

Example:

Consider the following example program

# program.py
import sys

if len(sys.argv) != 2:
    print(f"usage: {sys.argv[0]} <dir|locals|globals>")
    sys.exit(1)
mode = sys.argv[1]

# The comprehension must use same variable name as the code that attempts `del`.
_allvalues = ''.join([myobj for myobj in ['a', 'b', 'c']])

myobj = None  # for del below
if mode == 'dir':
    print("DIR():", dir())
elif mode == 'locals':
    print("LOCALS():", locals())
elif mode == 'globals':
    print("GLOBALS():", globals())

del myobj

and the following script that compiles the program to byte-code .pyc:

# compile_script.py
import sys
import os
import struct
import marshal
import importlib.util

if len(sys.argv) < 3:
    print(f"usage: {sys.argv[0]} <source> <dest> [0|1]")
    sys.exit(1)

filename = sys.argv[1]
out_filename = sys.argv[2]

strip_co = False if len(sys.argv) < 4 else sys.argv[3] != '0'

with open(filename, 'rb') as fp:
    src = fp.read()

co = compile(src, filename, 'exec')
if strip_co:
    co = co.replace()  # In real use-case, we would be replacing filename here

with open(out_filename, 'wb') as fp:
    fp.write(importlib.util.MAGIC_NUMBER)
    fp.write(struct.pack('<I', 0b01))  # PEP-552: hash-based pyc, check_source=False
    fp.write(b'\00' * 8)  # Zero the source hash
    marshal.dump(co, fp)

For some context, the above example is a distilled reproduction of what is going in PyInstaller and scipy.stats._distn_infrastructure module in pyinstaller/pyinstaller#7992: the collected module is byte-compiled, and the absolute filename in the code-object is anonymized into environment-relative path via co.replace() (see here for details).

But in the above example, no replacement is done, and so one would expect of co.replace() to return an identical code object.

However, this is not the case (even though co == co.replace() in python claims that they are identical):

$ python3.12 compile_script.py program.py compiled-orig.pyc 0  # Compile without co.replace()
$ python3.12 compile_script.py program.py compiled-copy.pyc 1  # Compile with co.replace()
$ sha256sum *.pyc
2e03af03bcbb41b3a6cc6f592f5143acf7d82edc089913504c1f8446764795e1  compiled-copy.pyc
5034955819efba0dc7ff3ee94101c1f6dfe33b102d547efc77577d77a99f1732  compiled-orig.pyc

Running the original version:

$ python3.12 compiled-orig.pyc globals
GLOBALS(): {'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': <_frozen_importlib_external.SourcelessFileLoader object at 0x7fe7fb327830>, '__spec__': None, '__annotations__': {}, '__builtins__': <module 'builtins' (built-in)>, '__file__': '[...]/compiled-orig.pyc', '__cached__': None, 'sys': <module 'sys' (built-in)>, 'mode': 'globals', '_allvalues': 'abc', 'myobj': None}

$ python3.12 compiled-orig.pyc dir
DIR(): ['__annotations__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', '_allvalues', 'mode', 'myobj', 'sys']

$ python3.12 compiled-orig.pyc locals
LOCALS(): {'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': <_frozen_importlib_external.SourcelessFileLoader object at 0x7f2846527830>, '__spec__': None, '__annotations__': {}, '__builtins__': <module 'builtins' (built-in)>, '__file__': '[...]/compiled-orig.pyc', '__cached__': None, 'sys': <module 'sys' (built-in)>, 'mode': 'locals', '_allvalues': 'abc', 'myobj': None}

Running the version with co.replace():

$ python3.12 compiled-copy.pyc globals
GLOBALS(): {'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': <_frozen_importlib_external.SourcelessFileLoader object at 0x7fd7f1b27830>, '__spec__': None, '__annotations__': {}, '__builtins__': <module 'builtins' (built-in)>, '__file__': '[...]/compiled-copy.pyc', '__cached__': None, 'sys': <module 'sys' (built-in)>, 'mode': 'globals', '_allvalues': 'abc', 'myobj': None}

$ python3.12 compiled-copy.pyc dir
DIR(): ['__annotations__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', '_allvalues', 'mode', 'sys']
Traceback (most recent call last):
  File "program.py", line 20, in <module>
    del myobj
        ^^^^^
NameError: name 'myobj' is not defined

$ python3.12 compiled-copy.pyc locals
LOCALS(): {'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': <_frozen_importlib_external.SourcelessFileLoader object at 0x7f8a35d27830>, '__spec__': None, '__annotations__': {}, '__builtins__': <module 'builtins' (built-in)>, '__file__': '[...]/compiled-copy.pyc', '__cached__': None, 'sys': <module 'sys' (built-in)>, 'mode': 'locals', '_allvalues': 'abc'}
Traceback (most recent call last):
  File "program.py", line 20, in <module>
    del myobj
        ^^^^^
NameError: name 'myobj' is not defined

Comparing the compiled-orig.pyc and compiled-copy.pyc in a hex editor, there is one byte of difference; its position corresponds to marshaled co_localspluskinds, and the value is 0x30 (CO_FAST_LOCAL | CO_FAST_HIDDEN) in original and 0x20 (CO_FAST_LOCAL) in copy variant.

CPython versions tested on:

3.12

Operating systems tested on:

Linux, Windows

Linked PRs

@rokm rokm added the type-bug An unexpected behavior, bug, or error label Oct 9, 2023
@JelleZijlstra JelleZijlstra self-assigned this Oct 9, 2023
@JelleZijlstra JelleZijlstra added 3.12 bugs and security fixes 3.13 new features, bugs and security fixes labels Oct 9, 2023
@JelleZijlstra
Copy link
Member

code.replace ultimately calls PyUnstable_Code_NewWithPosOnlyArgs, which computes the localsplusflags but never sets CO_FASTHIDDEN. It's not obvious we have enough information at that point to set the flag, but I'll keep looking.

@JelleZijlstra
Copy link
Member

Possible solution: iterate over the bytecode and if we encounter a LOAD_FAST_AND_CLEAR and we're in a non-function scope, set CO_FAST_HIDDEN on that name.

Also note that the bizarre errors happen only if you call dir()/globals()/locals() after the listcomp; this is implicit in the above bug report but it confused me for a bit while I was trying to write a test. I believe it's because these functions end up calling _PyFrame_GetLocals, which can modify the locals dict in some cases.

@JelleZijlstra
Copy link
Member

@carljm fyi. gh-110586 is a working fix, though not the prettiest.

miss-islington pushed a commit to miss-islington/cpython that referenced this issue Nov 8, 2023
…ythonGH-110586)

(cherry picked from commit 0b718e6)

Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
carljm pushed a commit that referenced this issue Nov 8, 2023
…H-110586) (#111866)

gh-110543: Fix CodeType.replace in presence of comprehensions (GH-110586)
(cherry picked from commit 0b718e6)

Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
@hugovk
Copy link
Member

hugovk commented Nov 9, 2023

Thanks!

@hugovk hugovk closed this as completed Nov 9, 2023
@carljm
Copy link
Member

carljm commented Nov 9, 2023

Per discussion on the merged PR, I would like to leave this issue open to track a cleaner fix for main (that we weren't comfortable putting in 3.12.)

@carljm carljm reopened this Nov 9, 2023
@hugovk
Copy link
Member

hugovk commented Nov 9, 2023

Sure, sorry for the premature close.

@iritkatriel iritkatriel added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Dec 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.12 bugs and security fixes 3.13 new features, bugs and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

5 participants