Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Weakref callbacks running before finalizers in GC collection #84492

Open
a-feld mannequin opened this issue Apr 17, 2020 · 17 comments
Open

Weakref callbacks running before finalizers in GC collection #84492

a-feld mannequin opened this issue Apr 17, 2020 · 17 comments
Labels
3.7 (EOL) end of life 3.8 only security fixes 3.9 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error

Comments

@a-feld
Copy link
Mannequin

a-feld mannequin commented Apr 17, 2020

BPO 40312
Nosy @tim-one, @pablogsal, @a-feld
Files
  • 0001-Run-finalizers-before-invoking-weakref-callbacks.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2020-04-17.19:52:40.901>
    labels = ['interpreter-core', '3.8', 'type-bug', '3.7', '3.9']
    title = 'Weakref callbacks running before finalizers in GC collection'
    updated_at = <Date 2020-04-20.21:54:30.265>
    user = 'https://github.com/a-feld'

    bugs.python.org fields:

    activity = <Date 2020-04-20.21:54:30.265>
    actor = 'tim.peters'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Interpreter Core']
    creation = <Date 2020-04-17.19:52:40.901>
    creator = 'a-feld'
    dependencies = []
    files = ['49076']
    hgrepos = []
    issue_num = 40312
    keywords = ['patch']
    message_count = 17.0
    messages = ['366675', '366750', '366751', '366753', '366754', '366797', '366844', '366845', '366848', '366850', '366853', '366857', '366859', '366862', '366864', '366873', '366895']
    nosy_count = 4.0
    nosy_names = ['tim.peters', 'pablogsal', 'a-feld', 'deekay']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = None
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue40312'
    versions = ['Python 2.7', 'Python 3.5', 'Python 3.6', 'Python 3.7', 'Python 3.8', 'Python 3.9']

    @a-feld
    Copy link
    Mannequin Author

    a-feld mannequin commented Apr 17, 2020

    Our team is making use of a weakref.WeakValueDictionary() that is accessed through the finalizer of a class. We observed that in Python 3 occasionally values that are directly referenced by an object being finalized were missing from the WeakValueDictionary.

    Example:

        import weakref
        cache = weakref.WeakValueDictionary()
    
        class Foo(object):
            pass
    
    
        class Bar(object):
            def __init__(self, foo):
                self.foo = foo
                cache['foo'] = foo
    
            def __del__(self):
                del cache['foo']
    
        bar = Bar(Foo())
        del bar

    Upon further investigation, we realized that this had to do with the weakref callback within WeakValueDictionary being called (removing the key from the dict) before the finalizer for Foo was called.

    Reproduction:

        import gc
        import weakref
    
        cache = weakref.WeakValueDictionary()
    
    
        class Foo(object):
            pass
    
    
        class Bar(object):
            def __init__(self, foo):
                # Force a reference cycle to run del only on gc.collect
                self._self = self
                self.foo = foo
                cache["foo"] = foo
    
            def __del__(self):
                # foo is missing from the cache because the weakref callback has
                # already run. KeyError is raised.
                del cache["foo"]
    
    
        bar = Bar(Foo())
        del bar
    
        gc.collect()

    Expected behavior:

    The weakref callback should only be called when the object is known to be deleted (after the finalizer runs). Running weakref callbacks before then means that the weakref callback can run on objects being ressurected by the finalizer.

    Example:

        import gc
        import weakref
    
    
        class ForeverObject(object):
            def __init__(self, circular):
                # Introduce a circular reference so that gc must collect the object
                if circular:
                    self._self = self
    
            def __del__(self):
                global o
                o = self
    
    
        def callback(wr):
            print("callback running", wr)
    
    
        for circular in (True, False):
            print("------- Circular reference:", circular, "-------")
            o = ForeverObject(circular)
            wr = weakref.ref(o, callback)
            del o
            gc.collect()
            print("--------------")

    Note: Python 2.7 appears to have the opposite behavior - weakref callbacks are only invoked when dealloc occurs outside of gc. The Python 2.7 behavior hasn't yet been investigated.

    If the expected behavior above is confirmed, I would be happy to submit a patch for this issue!

    @a-feld a-feld mannequin added 3.7 (EOL) end of life 3.8 only security fixes 3.9 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error labels Apr 17, 2020
    @tim-one
    Copy link
    Member

    tim-one commented Apr 19, 2020

    Offhand I'm surprised because of this: if a weakref W refers to object O, and W and O are _both_ in cyclic trash, gc does not want to invoke W's callback (if any). In fact, it works hard not to call it. So I'm surprised that the callback is invoked at all, not by whether it's called before or after __del__ is called.

    @a-feld
    Copy link
    Mannequin Author

    a-feld mannequin commented Apr 19, 2020

    Thanks for the response!

    if a weakref W refers to object O, and W and O are _both_ in cyclic trash

    I believe that in the examples W is not in cyclic trash, but remains referenced as a global in the frame. Only O is in cyclic trash (O references itself).

    I would expect that W's callback would be invoked in this case, but only after O is guaranteed to be deleted. In some cases O can be resurrected in the finalizer.

    @tim-one
    Copy link
    Member

    tim-one commented Apr 19, 2020

    Ah, I missed that cache is global. So it will hold reachable strong refs to the weakrefs stored for the dict's values. So gc will run the callbacks associated with weakrefs' trash referents.

    I think you're out of luck. Nothing is defined about the order in which the stuff in cyclic trash is destroyed. gc has no knowledge of your intended semantics, and no way to be taught. You happened to create code that assumed (albeit indirectly & subtly) a finalizer would run before a relevant callback, but someone else could create code assuming the reverse.

    It so happens that gc forces all callbacks to run before it forces any finalizers to run, and I'm loathe to change that code - weakref callbacks in particular have been an historical segfault factory, so nobody will touch that code without extraordinarily strong reason to risk it. But I'll add Pablo here just in case he's feeling adventurous ;-)

    In any case, I'd say it's always _best_ practice to never delete a key from any kind of weak dict except under protection of a try/except block. The point of a weak dict is that entries can vanish "by magic". And in this particular case, by deeper magic than was anticipated ;-)

    @a-feld
    Copy link
    Mannequin Author

    a-feld mannequin commented Apr 19, 2020

    Reading more carefully, I may have jumped to conclusions here :)

    Looking at the weakref docs, I see the following description of the callback functionality:

    If callback is provided and not None, and the returned weakref object is still alive, the callback will be called when the object is about to be finalized; the weak reference object will be passed as the only parameter to the callback; the referent will no longer be available.

    This description seems to imply that even if an object is resurrected, the callback will be run.

    Using the ForeverObject example above, I see the weakref callback behavior is different when going through gc versus going through _Py_Dealloc.

    The behavior being different seems to imply that a contract is broken somewhere.

    In this case I think I assumed it was gc, but it looks like it might actually be that the contract (as currently defined) is broken by dealloc. Finalizers are always called before weakref callbacks on the reference counted path:

    if (PyObject_CallFinalizerFromDealloc(self) < 0) {

    Here is the output from the ForeverObject example above:

    ------- Circular reference: True -------
    callback running <weakref at 0x10fedea10; dead>
    --------------
    ------- Circular reference: False -------
    --------------

    For my own understanding, why is the API documented as running the callback prior to finalizers running? The referent is unavailable when the callback runs, so isn't it safe to run the weakref callbacks consistently after the finalizers?

    @tim-one
    Copy link
    Member

    tim-one commented Apr 19, 2020

    Things get complicated here because in older versions of Python an instance of ForeverObject(True) could "leak" forever: if an object in a trash cycle had a __del__ method, that method would never be called, and the object would never be collected.

    Starting in Python 3.4, that changed: __del__ no longer inhibits collection of objects in cyclic trash. However, __del__ is called no more than once starting in 3.4. If an object is resurrected by __del__, it's marked with a "__del__ was already called" bit, and __del__ is never called again by magic if/when the object becomes trash again.

    I don't think the weakref docs were changed, because nobody cares ;-)

    What it _intended_ to mean by "about to be finalized" is clear as mud. What it actually means is akin to "about to have its memory destroyed and recycled".

    In current CPython, for your ForeverObject(False), del o does not make the object trash "for real". del runs immediately (due to deterministic, synchronous reference counting) and resurrects it. That cuts off the "about to have its memory destroyed and recycled" part, so the callback doesn't run.

    But if you do

    del o
    

    again, _then_ the callback runs. __del__ isn't run again, so the object isn't resurrected again, so the "about to have its memory destroyed and recycled" part applies.

    In cyclic gc, there is no deterministic order in which end-of-life actions occur. There may well be thousands of objects in cyclic trash, or reachable only from cyclic trash. The order picked is more-or-less arbitrary, just trying like hell to ensure that no end-of-life action ever "sees" an object whose memory has already been destroyed and recycled.

    To make progress at all, it _assumes_ all the cyclic trash really will be reclaimed (memory destroyed and recycled). That's why it runs all weakref callbacks to trash objects (provided the weakref isn't also trash). It also runs all finalizers (except on objects with a __del__ that has already been called). Only after _all_ that is done does it even start to destroy and recycle memory.

    Although, along the way, memory _may_ be destroyed and recycled as a result of refcounts falling to 0 as end-of-life actions (callbacks and finalizers) are invoked.

    And, yup, it's as delicate as it sounds ;-)

    @a-feld
    Copy link
    Mannequin Author

    a-feld mannequin commented Apr 20, 2020

    Thanks for the explanation! I agree that "about to be finalized" is unclear in the docs :)

    I still believe that having different behavior for the ordering of finalizers versus weakref callbacks depending on whether the path is through gc versus reference counting is a bug.

    The callback should be able to assume that when it's running, the referent is actually dead. The execution of a weakref callback in our case results in items being dropped from a WeakValueDictionary prematurely (the object is still referenced, accessible, and alive at the time the weakref callback runs).

    I've attached a patch that would cause weakref callbacks to run consistently after finalizers. With the patch applied, all tests in cpython appear to pass, but the code examples above now work consistently.

    @a-feld
    Copy link
    Mannequin Author

    a-feld mannequin commented Apr 20, 2020

    Also I just noticed this statement:

    In current CPython, for your ForeverObject(False), del o does not make the object trash "for real". del runs immediately (due to deterministic, synchronous reference counting) and resurrects it. That cuts off the "about to have its memory destroyed and recycled" part, so the callback doesn't run.

    The problem is the callback _does_ run even though the object is resurrected! :)

    (Only if going through gc)

    @pablogsal
    Copy link
    Member

    I think this is a good summary of what you are referring to:

    >>> import gc, weakref
    >>> class Lazarus:
    ...    def __del__(self):
    ...       global x
    ...       x = self
    ...
    >>> def callback(*args):
    ...     print("DEAD")
    ...

    # No gc dead:

    >>> x = None
    >>> a = Lazarus()
    >>> w = weakref.ref(a, callback)
    >>> del a # Notice that the callback has not been executed
    >>> del x
    DEAD

    # Gc code:

    >>> a = Lazarus()
    >>> x = None
    >>> cycle = []
    >>> cycle.append(cycle)
    >>> cycle.append(a)
    >>> w = weakref.ref(a, callback)
    >>> del a,cycle
    >>> gc.collect()
    DEAD
    >>> x. #The callback has executed but the variable is still alive
    <__main__.Lazarus object at 0x1020e9910>

    The "problem" is that when going through the gc, the callback is executed when the object is resurrected but going through normal refcount the callback is executed only when the object dies.

    Notice that in both cases the fundamental property still holds: the callback is executed *once*. What you are discussing now is a matter of *when*.

    Not sure if we want to homogenize both code paths, but changing the one that goes through the GC is going to be challenging to say the least because as Tim mentioned, the weakrefs semantics are *very* delicate.

    @pablogsal
    Copy link
    Member

    One thing we could do is call the weakref callbacks after we call finalize_garbage and only on the "final_unreachable" set (the objects that do not resurrect). Notice that doing this still has one difference: the callback will be executed AFTER the finalizer while in the normal refcount path is executed BEFORE the finalizer.

    I have not given enough thought to the correctness of doing this but my gut feeling tells me something can go wrong if we do that. What do you think about this path, Tim?

    Here is the diff I am refering to for clarity:

    diff --git a/Modules/gcmodule.c b/Modules/gcmodule.c
    index 5727820f09..498ff927ab 100644
    --- a/Modules/gcmodule.c
    +++ b/Modules/gcmodule.c
    @@ -1252,9 +1252,6 @@ collect(PyThreadState *tstate, int generation,
    }
    }

    • /* Clear weakrefs and invoke callbacks as necessary. */
    • m += handle_weakrefs(&unreachable, old);
    • validate_list(old, collecting_clear_unreachable_clear);
      validate_list(&unreachable, collecting_set_unreachable_clear);

    @@ -1267,6 +1264,9 @@ collect(PyThreadState *tstate, int generation,
    PyGC_Head final_unreachable;
    handle_resurrected_objects(&unreachable, &final_unreachable, old);

    + /* Clear weakrefs and invoke callbacks as necessary. */
    + m += handle_weakrefs(&final_unreachable, old);
    +
    /* Call tp_clear on objects in the final_unreachable set. This will cause
    * the reference cycles to be broken. It may also cause some objects
    * in finalizers to be freed.

    @tim-one
    Copy link
    Member

    tim-one commented Apr 20, 2020

    Allan, we don't (at least not knowingly) write tests that rely on order of end-of-life actions, because the _language_ defines nothing about the order. So you can permute the order and it's unlikely any standard tests would fail.

    The only reason your example "works" outside of cyclic gc is because, as already noted, CPython (the implementation you're using, not Python the language) primarily relies on reference counting. That guarantees __del__ is called first, but that's not a language guarantee, it's a consequence of reference counting invoking __del__ IMMEDIATELY upon the refcount falling to 0. The _language_ not only doesn't guarantee that, it doesn't even guarantee that __del__ will ever be called.

    For CPython (this implementation), it's important that we don't introduce gratuitous breakage of programs that are "working" because of implementation details. How do you know that no program currently relies on the implementation detail that cyclic gc happens to run callbacks before finalizers now? CPython has done that for many years, and I personally wouldn't risk breaking "working" programs by flipping that order now, not without truly compelling reason (example: stopping a segfault would be compelling).

    Pablo, as above, I'm inclined to leave things alone unless we can "prove" no current code could possibly be relying (even by accident) on that gc currently runs callbacks before finalizers. Which may be the case! I don't know ;-)

    @pablogsal
    Copy link
    Member

    Pablo, as above, I'm inclined to leave things alone unless we can "prove" no current code could possibly be relying (even by accident) on that gc currently runs callbacks before finalizers. Which may be the case! I don't know ;-)

    I very much agree with this. Also, (I think you already mentioned this) over-specifying the order of things in the gc may be a great way to shoot ourselves in the foot if we need to fix bugs or some odd behaviour during finalization/destruction (think for instance the latest bug regarding tp_clear and weakref callbacks).

    I think we could at least improve somehow the docs, to say at least that the order is not specified so people that look at them do not rely on it.

    @a-feld
    Copy link
    Mannequin Author

    a-feld mannequin commented Apr 20, 2020

    I definitely understand the possibility that some code is relying on the current gc behavior of weakref callbacks being invoked after finalizers.

    That being said, the behavior is currently inconsistent between gc and reference counted paths. The language doesn't have to define the explicit ordering but the internal inconsistency was definitely unexpected for us.

    The result is that behavioral consistency becomes more difficult in application code when using language provided structures such as WeakValueDictionary.

    cache = weakref.WeakValueDictionary()
    
    class Bar:
        pass
    
    class Foo:
        def __init__(self):
            self._self = self
            self.value = Bar()
            cache[id(self.value)] = self.value
    
        def __del__(self):
            # the cache may or may not have self.value at this point
            # even though self.value is strongly referenced!
            print(list(cache.items()))

    From the weakref docs:

    Entries in the dictionary will be discarded when no strong reference to the value exists any more.

    But doesn't the code above imply that the entry is discarded even though there are strong references to the value?

    In any case, I definitely appreciate all the eyes on this Tim + Pablo! At the very least, documentation updates do sound like a good idea if we're moving forward with leaving the behavior of weakrefs as currently specified. In particular, it would be worth pointing out that weakrefs callbacks can run even when the object is referenced.

    @pablogsal
    Copy link
    Member

    The result is that behavioral consistency becomes more difficult in application code when using language provided structures such as WeakValueDictionary.

    Well, you are already in tricky territory using finalizers. Note this sentence from the docs (https://docs.python.org/3/reference/datamodel.html#object.\_\_del__):

    It is not guaranteed that __del__() methods are called for objects that still exist when the interpreter exits.

    So CPython does not even promise that __del__ will be called always!

    But doesn't the code above imply that the entry is discarded even though there are strong references to the value?

    No, because if there would be strong references then the refcount won't be 0 and the object would not have been finalized. If the object is finalized is because nobody has more strong references. A WeakValueDictionary holds weak references in the values, that is why is called WeakValueDictionary ;)

    In particular, it would be worth pointing out that weakrefs callbacks can run even when the object is referenced.

    That can be true but with a big caveat. There are two cases:

    • Normal refcount falls to 0: the callback is executed when the deallocator is going to run, so nobody has strong references.

    • GC: The gc calls the callback *before* the finalizer but at that point (before the finalizer) is unreachable from the outside, so all the references are from internal objects. Is still referenced but from unreachable objects that (unless resurrected) they are going to be destroyed by the gc. Notice that at the point the callback runs, all those objects are unreachable, so they cannot be referenced by anything in your program except themselves.

    @a-feld
    Copy link
    Mannequin Author

    a-feld mannequin commented Apr 20, 2020

    Yup agree with all the points above, just wanted to point out that I think self.value is strongly referenced (even though it's just internal references and will be collected by the gc) during Foo.__del__ execution (annotated code below), yet the WeakValueDictionary entry is cleared:

    import gc
    import sys
    import weakref
    
    cache = weakref.WeakValueDictionary()
    
    
    class Bar:
        pass
    
    
    class Foo:
        def __init__(self):
            self._self = self
            self.value = Bar()
            cache[id(self.value)] = self.value
    
        def __del__(self):
            print(sys.getrefcount(self.value))  # -> 2
            # the cache may or may not have self.value at this point
            # even though self.value is strongly referenced!
            print(list(cache.items()))  # -> []
    
    
    o = Foo()
    del o
    gc.collect()

    @tim-one
    Copy link
    Member

    tim-one commented Apr 20, 2020

    Well, the refcounts on _everything_ cyclic gc sees are greater than 0. Because if an object's refcount ever falls to 0 in CPython, reference counting deals with it immediately, and so it doesn't survive to participate in cyclic gc.

    IOW, absolutely everything cyclic gc deals with is "strongly referenced" in the sense you're using: absolutely everything cyclic gc deals with has a nonzero refcount. Indeed, in a debug build, gc asserts early on that everything it sees has a refcount > 0.

    It can be trash anyway. Finding those "just internal references" is the point of cyclic gc. In your last example, the instant after you did del o, the instance of Foo, and the instance of Bar, both became trash, period, despite that they both retained positive refcounts.

    So you may as well object that the refcount of self is also greater than 0 inside del. Indeed, add

                print(sys.getrefcount(self))

    inside your __del__, and you'll see it prints 4. So it's _twice_ as "strongly referenced" as self.value ;-) It's being destroyed anyway.

    Refcounts are simply irrelevant at this point, and so also is the concept of "strong reference" on its own. All that matters is whether a strong reference exists from outside the generation being collected. self and self.value are in exactly the same boat in this regard: there are no strong references to either from outside, but there are strong references to both from inside. They're both trash.

    There is a principled way to proceed that would get you what you want in this example, but CPython never pursued it because it would add major complexity for almost no apparent practical gain: build a graph of all the cyclic trash, compute the strongly connected components, build the induced DAG composed of the SCCs, then run end-of-life actions in order of a topological sort of that DAG. Then del would run first (self is in a self-cycle SCC with no predecessors in the DAG), and the weakref callback second (the Bar instance would be in its own SCC, with the self SCC as its predecessor in the DAG).

    But I'm not sure any real gc system does that.

    @tim-one
    Copy link
    Member

    tim-one commented Apr 20, 2020

    A simple (finalizer-only) example of what an SCC-based DAG topsort ordering would accomplish:

        import gc
    
        class C:
            def __init__(self, val):
                self.val = val
            def __del__(self):
                print("finalizing", self.val)
    c, b, a = map(C, "cba")
    a.next = b
    b.next = c
    
    #a.loop = a
    del c, b, a
    gc.collect()
    

    That finalizes in the order a, b, c. Refcount semantics force that. But, uncomment the "a.loop = a" line, and the order changes to c, b, a. They all look exactly the same to gc, so it runs finalizers in the order they happen to appear in the list gc is crawling over. A DAG topsort ordering would force a, b, c order again.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life 3.8 only security fixes 3.9 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error
    Projects
    Status: No status
    Development

    No branches or pull requests

    2 participants