Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disable automatic update of frame locals during tracing #86363

Closed
fabioz mannequin opened this issue Oct 29, 2020 · 19 comments
Closed

Disable automatic update of frame locals during tracing #86363

fabioz mannequin opened this issue Oct 29, 2020 · 19 comments
Labels
3.11 bug and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage

Comments

@fabioz
Copy link
Mannequin

fabioz mannequin commented Oct 29, 2020

BPO 42197
Nosy @ncoghlan, @vstinner, @fabioz, @nedbat, @markshannon
PRs
  • bpo-42197: Disable automatic update of frame locals during tracing #23028
  • bpo-42197: Don't create f_locals dictionary unless we actually need it. #32055
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2020-10-29.14:16:41.031>
    labels = ['interpreter-core', '3.11', 'performance']
    title = 'Disable automatic update of frame locals during tracing'
    updated_at = <Date 2022-03-28.06:57:42.353>
    user = 'https://github.com/fabioz'

    bugs.python.org fields:

    activity = <Date 2022-03-28.06:57:42.353>
    actor = 'vstinner'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Interpreter Core']
    creation = <Date 2020-10-29.14:16:41.031>
    creator = 'fabioz'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 42197
    keywords = ['patch']
    message_count = 19.0
    messages = ['379874', '385320', '385323', '385329', '385336', '396563', '396572', '396574', '396576', '411108', '411111', '411112', '411114', '411145', '411177', '411210', '415801', '415998', '416145']
    nosy_count = 6.0
    nosy_names = ['ncoghlan', 'vstinner', 'fabioz', 'nedbat', 'Mark.Shannon', 'python-dev']
    pr_nums = ['23028', '32055']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue42197'
    versions = ['Python 3.11']

    @fabioz
    Copy link
    Mannequin Author

    fabioz mannequin commented Oct 29, 2020

    Right now, when a debugger is active, the number of local variables can affect the tracing speed quite a lot.

    For instance, having tracing setup in a program such as the one below takes 4.64 seconds to run, yet, changing all the variables to have the same name -- i.e.: change all assignments to a = 1 (such that there's only a single variable in the namespace), it takes 1.47 seconds (in my machine)... the higher the number of variables, the slower the tracing becomes.

    import time
    t = time.time()
    
    def call():
        a = 1
        b = 1
        c = 1
        d = 1
        e = 1
        f = 1
    
    def noop(frame, event, arg):
        return noop
    
    import sys
    sys.settrace(noop)
    
    for i in range(1_000_000):
        call()
    
    print('%.2fs' % (time.time() - t,))
    

    This happens because PyFrame_FastToLocalsWithError and PyFrame_LocalsToFast are called inside the call_trampoline (https://github.com/python/cpython/blob/master/Python/sysmodule.c#L946).

    So, I'd like to simply remove those calls.

    Debuggers can call PyFrame_LocalsToFast when needed -- otherwise mutating non-current frames doesn't work anyways. As a note, pydevd already has such a call: https://github.com/fabioz/PyDev.Debugger/blob/0d4d210f01a1c0a8647178b2e665b53ab113509d/_pydevd_bundle/pydevd_save_locals.py#L57 and PyPy also has a counterpart.

    As for PyFrame_FastToLocalsWithError, I don't really see any reason to call it at all.

    i.e.: something as the code below prints the a variable from the main() frame regardless of that and I checked all pydevd tests and nothing seems to be affected (it seems that accessing f_locals already does this:

    frame_getlocals(PyFrameObject *f, void *closure)
    , so, I don't see much reason to call it at all).

    def call():
        import sys
        frame = sys._getframe()
        print(frame.f_back.f_locals)
       
    def main():
        a = 1
        call()
       
    if __name__ == '__main__':
        main()
    

    So, the idea is removing those lines (I expect that I'll be able to provide a pull request for this).

    @fabioz fabioz mannequin added 3.10 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage labels Oct 29, 2020
    @vstinner
    Copy link
    Member

    Fabio created a thread on python-dev:
    https://mail.python.org/archives/list/python-dev@python.org/thread/62WK3THUDNWZCDOMXXDZFG3O4LIOIP4W/

    In 2017, Nick Coghlan wrote the PEP-558 "Defined semantics for locals()".

    @vstinner
    Copy link
    Member

    Mark: do you think that it's an acceptable trade-off to ask debuggers and profilers to call explicitly PyFrame_FastToLocalsWithError() and PyFrame_LocalsToFast(), to optimize the common case, when locals are not needed?

    @markshannon
    Copy link
    Member

    I don't think so.
    Tracing is already somewhat fragile, see https://bugs.python.org/issue30744.
    Making it more complex is likely to add more bugs.

    PEP-558 should help, as f_locals becomes a proxy.
    There are some minor issues with PEP-558 (I'm not convinced that it is completely robust), but it should fix this issue.

    Ultimately, what we need is a decent debugger API, something along the lines of the JVMTI.

    @fabioz
    Copy link
    Mannequin Author

    fabioz mannequin commented Jan 20, 2021

    I agree that it can be made better, but I think most of the issues right now comes from CPython trying to automatically do something that's not bound to work (calling PyFrame_FastToLocals/PyFrame_LocalsToFast under the hood during the tracing call).

    https://bugs.python.org/issue30744 is a great example of why that can never work properly (debuggers need to manage that much more carefully, just doing it on all calls is really bound to not work).

    So, the current implementation besides being broken also makes things pretty slow.

    I agree that PEP-558 may fix things here (but it's much more work).

    As a disclaimer, pydevd actually uses a different tracer which doesn't do those calls and when possible uses the frame eval to modify the bytecode directly to minimize the overhead, so, in practice the current support given by CPython for debugger is pretty reasonable for doing a fast debugger (but there are a few caveats that it must work around -- such as the auto PyFrame_FastToLocals/PyFrame_LocalsToFast calls).

    @ncoghlan
    Copy link
    Contributor

    I've updated PEP-558 with a reference back to this ticket as additional motivation (the update also addresses several of the fragility concerns that Mark raised): python/peps#1787

    There's still work to be done to bring the reference implementation into line with the updated proposal, but I'm making it an explicit goal to get the proposal submitted for pronouncement in time for inclusion in 3.11.

    @ncoghlan ncoghlan added 3.11 bug and security fixes and removed 3.10 only security fixes labels Jun 27, 2021
    @fabioz
    Copy link
    Mannequin Author

    fabioz mannequin commented Jun 27, 2021

    @ncoghlan I took a quick look at the PEP...

    I'm a bit worried about:

    On optimised frames, the Python level f_locals API will become a direct read/write proxy for the frame's local and closure variable storage, and hence no longer support storing additional data that doesn't correspond to a local or closure variable on the underyling frame object.

    In particular, it's common for a debugger to evaluate expressions in a context that would change the locals to add new variables.

    i.e.: stopping at a breakpoint and doing 2 exec calls with frame.f_locals with:
    import some_module
    v = some_module.compute(xxx)

    So, it's expected that some_module and v would be in the locals at this point.

    Right now after each line of the evaluation, a PyFrame_FastToLocals must be called so things work as they should, but given that the PEP explicitly says that it should be deprecated, and this being a common feature for a debugger, what'd be the proper way to support that?

    p.s.: should I ask such questions regarding the PEP somewhere else instead of this issue or is this an appropriate place?

    @markshannon
    Copy link
    Member

    So, it's expected that some_module and v would be in the locals at this point.

    If a function does not have the local variables some_module and v, then the change wouldn't be visible to the debugee.
    So what difference does it make?

    @fabioz
    Copy link
    Mannequin Author

    fabioz mannequin commented Jun 27, 2021

    So, it's expected that some_module and v would be in the locals at this point.

    If a function does not have the local variables some_module and v, then the change wouldn't be visible to the debugee.
    So what difference does it make?

    Right now such changes are visible to the debugee in the locals frames if a user does the exec and calls PyFrame_FastToLocals right afterwards (even if they weren't initially there).

    So, it's the difference between being able to import a module and creating/manipulating new variables in an exec in any frame (as it works right now) or not being able to make it at all (if that feature is deprecated as is being implied in the PEP).

    @markshannon
    Copy link
    Member

    While the various frame and debugger PEPs that are open offer a better solution to this, they might not be accepted for 3.11.

    So I'd like to revisit this.

    Removing the calls to PyFrame_FastToLocals and friends cuts the overhead of tracing down a lot. A lot.

    Using Fabio's example I get the following slowdowns:
    No calls to PyFrame_FastToLocals`: x10
    Main branch with one local variable: x16
    Main branch with six locals variables: x50

    This is quite a compelling reason to remove the calls to PyFrame_FastToLocals.

    The questions is "what will we break doing this"?
    All the tests pass, so that seems promising.

    Ned, how would this impact coverage.py?

    @nedbat
    Copy link
    Member

    nedbat commented Jan 21, 2022

    Off the top of my head, I'm not sure this affects coverage.py at all, but I could be missing something. Does PR 23028 have all the changes, or is there some other python/cpython branch I can test with?

    @markshannon
    Copy link
    Member

    Yes the PR has all the changes.

    It is just the changes sysmodule.c that you need:
    https://github.com/python/cpython/pull/23028/files#diff-a3a5c73931235f7f344c072dc755d6508e13923db3f5d581c5e88652075871cb

    @markshannon
    Copy link
    Member

    @vstinner
    Copy link
    Member

    As I already wrote previously, I'm a supporter of requiring debuggers and profilers to explicitly call PyFrame_FastToLocalsWithError() and PyFrame_LocalsToFast(). It should only impact a minority of users, whereas the majority will benefit of way better performance, no?

    @nedbat
    Copy link
    Member

    nedbat commented Jan 21, 2022

    I ran the coverage.py test suite using https://github.com/faster-cpython/cpython/tree/dont-fast-to-locals-in-trampoline, and there were no failures.

    @nedbat
    Copy link
    Member

    nedbat commented Jan 21, 2022

    And btw, the tests with that branch ran at least twice as fast as with 3.10....!

    @vstinner
    Copy link
    Member

    See also bpo-47092: [C API] Add PyFrame_GetVar(frame, name) function.

    @markshannon
    Copy link
    Member

    New changeset d7163bb by Mark Shannon in branch 'main':
    bpo-42197: Don't create f_locals dictionary unless we actually need it. (GH-32055)
    d7163bb

    @vstinner
    Copy link
    Member

    Mark: Please add the new PyFrame_GetLocals() function to the C API > New Features doc:
    https://docs.python.org/dev/whatsnew/3.11.html#id1

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.11 bug and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage
    Projects
    None yet
    Development

    No branches or pull requests

    5 participants