Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-113939: Frame clear, clear locals #113940

Merged
merged 14 commits into from Jan 31, 2024
Merged

gh-113939: Frame clear, clear locals #113940

merged 14 commits into from Jan 31, 2024

Conversation

albertz
Copy link
Contributor

@albertz albertz commented Jan 11, 2024

frame.f_locals might still contain references to the local vars.

Fix #113939.

Note, for PyTorch and others, when you first do extended exception reporting which accesses f_locals in any way, this here fixes two arising problems. Related:


f_locals might still contain references to the local vars.

Fix python#113939.
@bedevere-app
Copy link

bedevere-app bot commented Jan 11, 2024

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@albertz

This comment was marked as resolved.

@bedevere-app
Copy link

bedevere-app bot commented Jan 12, 2024

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

Lib/test/test_frame.py Outdated Show resolved Hide resolved
except ZeroDivisionError as exc:
support.gc_collect()
self.assertIsNotNone(wr())
print(exc.__traceback__.tb_frame.f_locals)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

print?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually just want to access exc.__traceback__.tb_frame.f_locals here (trigger a getattr), which is important for the test. I just thought that also printing the result might be interesting for debugging purpose, but we could also leave away the print.

Copy link
Member

@markshannon markshannon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change to DELETE_FAST seems odd to me.

@@ -1508,6 +1508,7 @@ dummy_func(
PyObject *v = GETLOCAL(oparg);
ERROR_IF(v == NULL, unbound_local_error);
SETLOCAL(oparg, NULL);
Py_CLEAR(LOCALS());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason for this? DELETE_FAST should delete a single "fast" local variable, which should have nothing to do with f_locals.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rational is explained in the issue #113939. To summarize: I think you would expect that the exception object (and its linked traceback, the frames, and locals of the frames) would go out of scope when you leave the except block, and then deleted, when there are no other references to it. The DELETE_FAST then would delete the object (ref count becomes 0). However, if f_locals was accessed at some point, a DELETE_FAST would not trigger the delete, as another copy of the locals is in f_locals. It would stay alive as long as you access f_locals again which would resync the locals, or once the function leaves.

My use case was a training loop, where the function is running for a very long time, and the locals of some frames where an exception might have occured contained huge amount of memory, and it was very unexpected to me, aber making sure every reference to the exception was deleted, that the memory still was not deleted. Even more unexpectedly was, after getting another exception, then the memory was freed. Only then to find out that this f_locals dict was still holding a reference, and once I accessed f_locals again (in the next exception later), it was freed.

So, this change here is one possible simple fix, to get the expected behavior. But I understand that this is maybe not how you want it. The question is, how do you want it? Do you have a better suggestion? Or is this all expected behavior, and no change should be done here? (To me, it was unexpected behavior.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test test_locals_cleared_after_exception_handled tests exactly this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is a particular problem related to the lifetime of a handled exception, I think we should think of this as a problem with exceptions, and try to fix it within the exception handling mechanism rather than change DELETE_FAST.

Can we make POP_EXCEPT clear the locals, and move it to after the

               LOAD_CONST               0 (None)
               STORE_FAST               0 (e)
               DELETE_FAST              0 (e)

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make POP_EXCEPT clear the locals

Yes, I also thought about that. However, then I thought, maybe there are other situations where the lifetime of objects is unexpected? Basically, whenever there is a DELETE_FAST on some local variable, and f_locals was accessed before, it would result in this maybe unexpected behavior that the object stays alive until f_locals is accessed again, or until the function finishes.

So, you say, this is more important now for leaving except blocks, and less important for other cases of DELETE_FAST?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The semantics of f_locals is somewhat broken, see PEP 667. Fiddling with DELETE_FAST is not going to fix that unfortunately, but is likely to break other stuff.

Which of the test cases fail with just the change to frame_clear to delete f_locals?

Copy link
Contributor Author

@albertz albertz Jan 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which of the test cases fail with just the change to frame_clear to delete f_locals?

test_locals_cleared_after_exception_handled still fails with only the frame_clear change but not the DELETE_FAST change.

The semantics of f_locals is somewhat broken, see PEP-667.

Ah, yes. I was thinking the same when looking through the code, that this f_locals vs fast locals (and those sync functions) potentially has a lot of issues.

Fiddling with DELETE_FAST is not going to fix that unfortunately,

Well, it fixes the issue here, that there are still references to the object while there should not.

but is likely to break other stuff.

I'm not sure. Why? If DELETE_FAST executes but f_locals is not touched by the op, then in any case f_locals is in a wrong state (not in sync with the fast locals anymore). So this change seems to be more safe to me than not doing it.

Or do you want to allow that bytecode can lead to such inconsistent state, and only Python source code is required to give consistent behavior?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@albertz, I appreciate that you've come this far in tracking down your issue. But I don't think the solution is to change bytecodes. Surely if we change DELETE_FAST, which implements del x, we should also change STORE_FAST, which implements x = None and also loses a reference to whatever was originally in x. But we don't want to make STORE_FAST slower, not even by one memory read + a conditional jump.

I also don't want bytecodes to mess with f_locals because the latter logically belongs to a totally different part of the interpreter.

I really think that the way to fix this, eventually, is to change the was locals() and frame.f_locals are implemented. There are two PEPs about this, 667 and 558. My preference would be 667, but both are currently blocked, mostly on people's time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the response. Yes, I agree, PEP-667 would be much cleaner, and also solve this problem.

Instead of messing with bytecodes, do you have maybe another idea for a simple workaround until PEP-667 is implemented?

Or otherwise, for this PR here, should we remove the DELETE_FAST change, but still do the frame.clear() change? I think the change in frame.clear() should be fine, right?

In that case, remove also the test_locals_cleared_after_exception_handled, which would still fail then, or disable this test somehow?

@markshannon
Copy link
Member

Or otherwise, for this PR here, should we remove the DELETE_FAST change, but still do the frame.clear() change? I think the change in frame.clear() should be fine, right?

Yes, I think that's the way to go.

@albertz
Copy link
Contributor Author

albertz commented Jan 27, 2024

What about the test case? Should I also delete it? For reference, this one:

class LocalsTest(unittest.TestCase):
    """
    Tests for locals.
    """

    def test_locals_cleared_after_exception_handled(self):
        # see gh-113939
        class C:
            pass
        wr = None
        def inner():
            nonlocal wr
            c = C()
            wr = weakref.ref(c)
            1/0
        try:
            inner()
        except ZeroDivisionError as exc:
            support.gc_collect()
            self.assertIsNotNone(wr())
            print(exc.__traceback__.tb_frame.f_locals)
        support.gc_collect()
        self.assertIsNone(wr())

The other test case (test_clear_locals_after_f_locals_access) can stay, of course.

@albertz
Copy link
Contributor Author

albertz commented Jan 27, 2024

Or otherwise, for this PR here, should we remove the DELETE_FAST change, but still do the frame.clear() change? I think the change in frame.clear() should be fine, right?

Yes, I think that's the way to go.

I did that now. So, PR can be merged now?

Copy link
Member

@iritkatriel iritkatriel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add whitespace, otherwise LGTM.

Lib/test/test_frame.py Show resolved Hide resolved
Lib/test/test_frame.py Show resolved Hide resolved
@iritkatriel iritkatriel merged commit 78c2545 into python:main Jan 31, 2024
32 checks passed
@albertz albertz deleted the patch-1 branch January 31, 2024 20:31
aisk pushed a commit to aisk/cpython that referenced this pull request Feb 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

traceback.clear_frames does not clear locals when there have been previous access to f_locals
4 participants