Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve the C code for calling Python code: _PyEval_EvalCode() #87156

Closed
markshannon opened this issue Jan 21, 2021 · 25 comments
Closed

Improve the C code for calling Python code: _PyEval_EvalCode() #87156

markshannon opened this issue Jan 21, 2021 · 25 comments
Assignees
Labels
3.10 interpreter-core

Comments

@markshannon
Copy link
Member

@markshannon markshannon commented Jan 21, 2021

BPO 42990
Nosy @gvanrossum, @brettcannon, @rhettinger, @vstinner, @encukou, @markshannon, @serhiy-storchaka, @1st1
PRs
  • #24298
  • #24368
  • #24559
  • #24564
  • #24566
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/markshannon'
    closed_at = <Date 2021-03-17.21:00:33.189>
    created_at = <Date 2021-01-21.16:05:52.112>
    labels = ['interpreter-core', '3.10']
    title = 'Improve the C code for calling Python code: _PyEval_EvalCode()'
    updated_at = <Date 2021-03-17.21:00:33.189>
    user = 'https://github.com/markshannon'

    bugs.python.org fields:

    activity = <Date 2021-03-17.21:00:33.189>
    actor = 'vstinner'
    assignee = 'Mark.Shannon'
    closed = True
    closed_date = <Date 2021-03-17.21:00:33.189>
    closer = 'vstinner'
    components = ['Interpreter Core']
    creation = <Date 2021-01-21.16:05:52.112>
    creator = 'Mark.Shannon'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 42990
    keywords = ['patch']
    message_count = 25.0
    messages = ['385432', '385491', '385500', '385908', '386061', '387187', '387190', '387192', '387196', '387199', '387223', '387233', '387243', '387244', '387245', '387260', '387304', '387306', '387340', '387341', '387345', '387377', '387401', '387408', '388969']
    nosy_count = 8.0
    nosy_names = ['gvanrossum', 'brett.cannon', 'rhettinger', 'vstinner', 'petr.viktorin', 'Mark.Shannon', 'serhiy.storchaka', 'yselivanov']
    pr_nums = ['24298', '24368', '24559', '24564', '24566']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue42990'
    versions = ['Python 3.10']

    @markshannon
    Copy link
    Member Author

    @markshannon markshannon commented Jan 21, 2021

    Currently, to make a call to Python (modules, classes, etc, not just functions) from C has to use the monster that is _PyEval_EvalCode.
    As Python has adding features over the years, _PyEval_EvalCode has grown and grown. It is time for a refactor.

    _PyEval_EvalCode takes 16, yes sixteen parameters!
    Those 16 parameters fall into two main groups, one describing the function being called, and the other describing the arguments to the call.
    Due to the jumbo size and complexity of _PyEval_EvalCode, we also have specialised forms of it, e.g. _PyFunction_Vectorcall that handle common cases and then bail to _PyEval_EvalCode in other cases.

    In outline _PyEval_EvalCode performs the following actions:

    1. Make a new frame.
    2. Fill in that frame using the arguments and defaults provided.
    3. If the code object has flags set for a generator, or coroutine, return a new generator or coroutine.
    4. Otherwise call the interpreter with the newly created frame.

    As _PyEval_EvalCode or its earlier equivalents have gained arguments, they have a left of list of legacy functions. It is not clear what is the "correct" function to use in C extensions. Hopefully extension code uses the PyObject_Call API, but PyEval_EvalCodeEx is part of the C-API.

    To improve this situation, I propose:

    A new C struct, the "frame descriptor" which contains the code object, defaults, globals, and names that describe the code to be executed. PyFunctionObject would wrap this, to simplify the common case of calling a Python function.

    Split the Eval functions into vector and tuple/dict forms, both forms taking a "frame descriptor", as well as the arguments.

    Mark the older forms of the Eval functions as "legacy", creating a local "frame descriptor" and transforming the arguments in vector or tuple/dict forms, in the legacy functions.

    Factor out the frame creation part of the Eval functions.

    The above changes will be necessary for PEP-651, but I think would be a worthwhile improvement even if PEP-651 is not accepted.

    @markshannon markshannon self-assigned this Jan 21, 2021
    @markshannon markshannon added the interpreter-core label Jan 21, 2021
    @vstinner
    Copy link
    Member

    @vstinner vstinner commented Jan 22, 2021

    PyObject *
    _PyEval_EvalCode(PyThreadState *tstate,
    PyObject *_co, PyObject *globals, PyObject *locals,
    PyObject *const *args, Py_ssize_t argcount,
    PyObject *const *kwnames, PyObject *const *kwargs,
    Py_ssize_t kwcount, int kwstep,
    PyObject *const *defs, Py_ssize_t defcount,
    PyObject *kwdefs, PyObject *closure,
    PyObject *name, PyObject *qualname)

    Hum no, only 16 arguments, everything is fine! :-D

    More seriously, I already considered to rework this code.

    The pseudo code is:

    if (...) return <new generator>;
    frame = (...);
    retval = _PyEval_EvalFrame(tstate, f, 0);
    _PyObject_GC_TRACK(f);
    return retval;

    I like the idea of splitting these two parts:

      f = create_frame_or_gen(...);
      if (<is generator>) return f;
      retval = _PyEval_EvalFrame(tstate, f, 0);
      _PyObject_GC_TRACK(f);
      return retval;

    I see one advantage: stack memory consumation, we don't have to hold the 16 arguments on the stack, only 3 parameters (tstate, f, 0).

    Split the Eval functions into vector and tuple/dict forms, both forms taking a "frame descriptor", as well as the arguments.

    Hum, it seems like you have a different idea how to refactor the code.

    Would it be worth it to have more specialized create_frame_or_gen() functions for the common cases?

    --

    I would also be interested by the ability to not create a frame at all, but I guess that it's a way larger refactoring.

    @vstinner vstinner changed the title Improve the C code for calling Python code Improve the C code for calling Python code: _PyEval_EvalCode() Jan 22, 2021
    @markshannon
    Copy link
    Member Author

    @markshannon markshannon commented Jan 22, 2021

    Rather than:

      f = create_frame_or_gen(...);
      if (<is generator>) return f;
      retval = _PyEval_EvalFrame(tstate, f, 0);
      _PyObject_GC_TRACK(f);
      return retval;

    I was thinking:

      f = create_frame(...);
      if (<is generator>) return make_gen(f);
      retval = _PyEval_EvalFrame(tstate, f, 0);
      _PyObject_GC_TRACK(f);
      return retval;

    The complicated part is create_frame(...), so I want to clean that up first.

    @markshannon
    Copy link
    Member Author

    @markshannon markshannon commented Jan 29, 2021

    New changeset d6c33fb by Mark Shannon in branch 'master':
    bpo-42990: Introduce 'frame constructor' struct to simplify API for PyEval_CodeEval and friends (GH-24298)
    d6c33fb

    @markshannon
    Copy link
    Member Author

    @markshannon markshannon commented Feb 1, 2021

    New changeset 0332e56 by Mark Shannon in branch 'master':
    bpo-42990: Further refactoring of PyEval_ functions. (GH-24368)
    0332e56

    @gvanrossum
    Copy link
    Member

    @gvanrossum gvanrossum commented Feb 17, 2021

    +1 on exposing f,builtins.

    Of course, the thing I'd really want is a way to state that all references to builtins are meant to have the exact semantics of those builtins, so the compiler can translate e.g. len(x) into a new opcode that just calls PyObject_Size(). (I can dream, can't I?)

    Another dream: assume that globals that refer to modules, classes or functions don't change, so they can be cached more aggressively.

    I suppose enough checking of dict version tags can get us there, or at least close enough.

    @vstinner
    Copy link
    Member

    @vstinner vstinner commented Feb 17, 2021

    +1 on exposing f,builtins.

    This change is related to bpo-43228 "Regression in function builtins": cloudpickle is broken by this issue (new PyFunctionObject.func_builtins member).

    Of course, the thing I'd really want is a way to state that all references to builtins are meant to have the exact semantics of those builtins, so the compiler can translate e.g. len(x) into a new opcode that just calls PyObject_Size(). (I can dream, can't I?)

    I tried to implement such optimization in my old https://faster-cpython.readthedocs.io/fat_python.html project. I implemented guards to de-optimize the code if a builtin is overriden.

    I suppose enough checking of dict version tags can get us there, or at least close enough.

    The dict version comes from my FAT Python project: PEP-509. It is used to optimize method calls. See also PEP-510 and PEP-511 ;-)

    @gvanrossum
    Copy link
    Member

    @gvanrossum gvanrossum commented Feb 17, 2021

    Well maybe I'll be picking up those ideas again... Thanks for documenting
    so much of what you did then!

    @1st1
    Copy link
    Member

    @1st1 1st1 commented Feb 17, 2021

    I tried to implement such optimization in my old https://faster-cpython.readthedocs.io/fat_python.html project. I implemented guards to de-optimize the code if a builtin is overriden.

    FWIW the globals opcode cache handles all of this now. There's no point in specifically optimizing the builtins lookup since we optimize all global lookups for a code object that's hot enough.

    @gvanrossum
    Copy link
    Member

    @gvanrossum gvanrossum commented Feb 18, 2021

    FWIW the globals opcode cache handles all of this now. There's no point in specifically optimizing the builtins lookup since we optimize all global lookups for a code object that's hot enough.

    So you think that even a dedicated "LEN" opcode would not be any faster? (This is getting in Paul Sokolovsky territory -- IIRC he has a variant of Python that doesn't allow overriding builtins.)

    @vstinner
    Copy link
    Member

    @vstinner vstinner commented Feb 18, 2021

    New changeset a3c3ffa by Victor Stinner in branch 'master':
    bpo-42990: Add __builtins__ attribute to functions (GH-24559)
    a3c3ffa

    @vstinner
    Copy link
    Member

    @vstinner vstinner commented Feb 18, 2021

    I reopen the issue since there is bpo-43228 regression, caused by this issue, which is still under discussion, and Mark also proposed to add a new builtins parameter to the function constructor (FunctionType).

    I wrote PR 24564 to help fixing bpo-43228 regression: with this change, functions now inherit the current builtins if the globals namespace is overriden, but the new globals has no "__builtins__" key. This change is backward incompatible on purpose. If someone really wants to run a function in a different builtins namespace, globals['__builtins__'] must be set explicitly.

    Once PR 24564 will be merged, I plan to write a 3rd PR to add an optional builtins keyword-only parameter to the function constructor (FunctionType).

    @vstinner vstinner reopened this Feb 18, 2021
    @vstinner vstinner added the 3.10 label Feb 18, 2021
    @1st1
    Copy link
    Member

    @1st1 1st1 commented Feb 18, 2021

    So you think that even a dedicated "LEN" opcode would not be any faster? (This is getting in Paul Sokolovsky territory -- IIRC he has a variant of Python that doesn't allow overriding builtins.)

    Yeah, a dedicated LEN opcode could only be faster if it would not be possible to shadow builtins (or if there was a "len" operator in Python). If that's not the case, this hypothetical LEN opcode would still have to check if "len" was shadowed or not, and that's slower than the optimized LOAD_GLOBAL we have now.

    @gvanrossum
    Copy link
    Member

    @gvanrossum gvanrossum commented Feb 18, 2021

    Victor

    the new globals has no "__builtins__" key. This change is backward incompatible on purpose. If someone really wants to run a function in a different builtins namespace, globals['__builtins__'] must be set explicitly.

    You say it's on purpose, what's the purpose? Aren't you worried this is going to break stuff? And why is this necessary given the LOAD_GLOBAL cache?

    Yury

    this hypothetical LEN opcode would still have to check if "len" was shadowed or not, and that's slower than the optimized LOAD_GLOBAL we have now.

    It could use the same check though? Just check the version tags.

    @vstinner
    Copy link
    Member

    @vstinner vstinner commented Feb 18, 2021

    You say it's on purpose, what's the purpose? Aren't you worried this is going to break stuff?

    There is a subtle behavior difference between Python 3.9 and Python 3.10. func_builtins2.py of bpo-43228 works on Python 3.9 but fails on Python 3.10. With my PR 24564, func_builtins2.py works again on Python 3.10.

    See bpo-43228 for the details.

    And why is this necessary given the LOAD_GLOBAL cache?

    My PR 24564 is not related to LOAD_GLOBAL, but how a frame fills its f_builtins member from a function.

    LOAD_GLOBAL uses f_globals and f_builtins members of a frame.

    @vstinner
    Copy link
    Member

    @vstinner vstinner commented Feb 18, 2021

    New changeset 44085a3 by Victor Stinner in branch 'master':
    bpo-42990: Refactor _PyFrame_New_NoTrack() (GH-24566)
    44085a3

    @vstinner
    Copy link
    Member

    @vstinner vstinner commented Feb 19, 2021

    I rephrased PR 24564 to clarify the scope of the incompatible change: in practice, only the types.FunctionType constructor changes.

    Defining functions in Python using "def function(...): ...", eval(code, {}) and exec(code, {}) are not affected. eval() and exec() already inherit the current builtins if globals does not contain "__builtins__" key.

    @vstinner
    Copy link
    Member

    @vstinner vstinner commented Feb 19, 2021

    Updated PR documentation:
    ---
    The types.FunctionType constructor now inherits the current builtins
    if the globals parameter is used and the globals dictionary has no
    "__builtins__" key, rather than rather than using {"None": None} as
    builtins: same behavior than eval() and exec() functions.

    Defining a function with "def function(...): ..." in Python is not
    affected, globals cannot be overriden with this syntax: it also
    inherits the current builtins.
    ---

    This PR makes FunctionType makes more consistent with other Python functions.

    Also, it doesn't prevent people to attempt building a "sandbox", it remains possible to override __builtins__ in FunctionType, eval(), exec(), etc.

    Usally, such sandbox pass a modified builtins namespace to eval() and exec() and the functions simply inherit it, functions defines with "def function(...): ..." and functions created with types.FunctionType constructor: my PR only impacts a very specific case, when types.FunctionType is called with a different globals dictionary which has no "__builtins__" key.

    @gvanrossum
    Copy link
    Member

    @gvanrossum gvanrossum commented Feb 19, 2021

    Thanks, that's clearer.

    I'm still worried about the change in semantics where globals["__builtins__"] is assigned a different dict after the function object has been created (similar to https://bugs.python.org/file49816/func_builtins2.py).

    I.e.

    def foo(): return len("abc")
    code = foo.__code__
    g = {"__builtins__": {"len": len}}
    f = FunctionType(code, g)
    f()  # Succeeds
    g["__builtins__"] = {}
    f()  # Fails in 3.9 and before, passes in 3.10

    Assuming code uses len, does f() succeed or fail?

    I realize this is a pretty esoteric, but it does show the change in semantics (from later to earlier binding). Should we care? I like early binding because it allows more optimizations[1], but traditionally Python's semantics use late binding.

    [1] Not in this case, the user could still change the meaning of len() with e.g.

    g["__builtins__"]["len"] = lambda x: return 42

    @markshannon
    Copy link
    Member Author

    @markshannon markshannon commented Feb 19, 2021

    In Python 3.9 the binding is more late-ish binding, than true late binding.

    Because globals['__builtins__'] is cached for each function activation, executing functions don't see updates.

    Example:

    >>> def f():
    ...     print(len("test"))
    ...     bltns = f.__globals__["__builtins__"]
    ...     if hasattr(bltns, "__dict__"):
    ...         bltns = bltns.__dict__
    ...     new = bltns.copy()
    ...     new["len"] = lambda x : 7
    ...     f.__globals__["__builtins__"] = new
    ...     print(len("test"))
    ... 
    >>> 
    >>> f()
    4
    4
    >>> f()
    7
    7

    True late binding would print:

    >>> f()
    4
    7
    >>> f()
    7
    7

    @vstinner
    Copy link
    Member

    @vstinner vstinner commented Feb 19, 2021

    Guido: "I'm still worried about the change in semantics where globals["__builtins__"] is assigned a different dict after the function object has been created (...)"

    Well, there is a semantics change of Python 3.10 documented at:
    https://docs.python.org/dev/whatsnew/3.10.html#other-language-changes

    "Functions have a new __builtins__ attribute which is used to look for builtin symbols when a function is executed, instead of looking into __globals__['__builtins__']. (Contributed by Mark Shannon in bpo-42990.)"

    And the function __builtins__ attribute is read-only.

    Your example is not affected by PR 24564 because the globals has the "__builtins__" key.

    In Python 3.10, you can modify func.__builtins__ (new attribute):
    ---

    def foo(s): return len(s)
    code = foo.__code__
    FunctionType = type(foo)
    f = FunctionType(code, {"__builtins__": {"len": len}})
    print(f("abc"))
    f.__builtins__.clear()
    print(f("abc"))

    Output:
    ---

    3
    Traceback (most recent call last):
      (...)
    NameError: name 'len' is not defined

    Mark: "Because globals['__builtins__'] is cached for each function activation, executing functions don't see updates."

    In Python 3.10, if someone wants to hack builtins while the function is running, modifying the builtins namespace in-place works as expected:
    ---

    def f():
        print(len("test"))
        builtins_ns = f.__globals__['__builtins__'].__dict__
        #builtins_ns = f.__builtins__
        builtins_ns['len'] = lambda x: 7
        print(len("test"))
    
    f()

    Output:
    ---
    4
    7
    ---

    It also works with "builtins_ns = f.__builtins__".

    Guido: "I realize this is a pretty esoteric, but it does show the change in semantics (from later to earlier binding). Should we care? I like early binding because it allows more optimizations[1], but traditionally Python's semantics use late binding."

    Modifying built-in functions/types is commonly done in tests. Example:
    ---

    import unittest.mock
    
    def func():
        with unittest.mock.patch('builtins.chr', return_value='mock'):
            return chr(65)
    
    print(func())

    The expected output is: "mock". Overriding an attribute of the builtins module immediately updates func.__builtins__. It works because func.__builtins__ is builtins.__dict__.

    In FAT Python, I implemented an optimization which copies builtin functions to constants, replace LOAD_GLOBAL with LOAD_CONST:
    https://fatoptimizer.readthedocs.io/en/latest/optimizations.html#copy-builtin-to-constant

    This optimization breaks this Python semantics, it is no longer possible to override builtin functions in tests:
    https://fatoptimizer.readthedocs.io/en/latest/semantics.html#builtin-functions-replaced-in-the-middle-of-a-function

    @gvanrossum
    Copy link
    Member

    @gvanrossum gvanrossum commented Feb 20, 2021

    Thanks for the clarifications, Victor!

    So I now understand: the *identity* of the builtins dict used by a function's code is captured when the function object is created. And I understand why: it's so that in the fast path for the LOAD_GLOBAL opcode we won't have to do a dict lookup in globals to find the builtins.

    Regarding the text in What's New 3.10 about this at https://docs.python.org/dev/whatsnew/3.10.html#other-language-changes, I recommend adding there that func.__builtins__ is initialized from globals["__builtins__"], if it exists, else from the frame's builtins, when the function object is created; like you state in #24564. Or perhaps make one of these paragraphs refer to the other for details, since they are duplicate mentions of the same behavior change (once the latter PR lands).

    Also, thanks to you and Mark and everyone else who has worked on optimizations like the globals cache and the method cache.

    @vstinner
    Copy link
    Member

    @vstinner vstinner commented Feb 20, 2021

    Regarding the text in What's New 3.10 about this at https://docs.python.org/dev/whatsnew/3.10.html#other-language-changes, I recommend adding there that func.__builtins__ is initialized from globals["__builtins__"], if it exists, else from the frame's builtins, when the function object is created; like you state in #24564.

    Good idea, I updated my PR.

    Or perhaps make one of these paragraphs refer to the other for details, since they are duplicate mentions of the same behavior change (once the latter PR lands).

    IMO it's useful to have two different paragraphs. One about the new attribute which is not really a semantics change, and one about the semantics changes when globals["__builtins__"] doesn't exist. For people who only care about Python 3.10 incompatible changes, they can simply read the Porting to Python 3.10 > Changes in the Python API section ;-)

    @vstinner
    Copy link
    Member

    @vstinner vstinner commented Feb 20, 2021

    New changeset 46496f9 by Victor Stinner in branch 'master':
    bpo-42990: Functions inherit current builtins (GH-24564)
    46496f9

    @vstinner
    Copy link
    Member

    @vstinner vstinner commented Mar 17, 2021

    I checked manually cloudpickle_bug.py attached to bpo-43228: it works as expected. The bpo-43228 regression has been fixed. I close again the issue.

    It was proposed to add a builtins parameter to the types.FunctionType constructor. If someone wants to add it: please go ahead ;-)

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.10 interpreter-core
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants