py::keep_alive: nurse destructor can run before patient's #856

bmerry · 2017-05-16T18:03:49Z

Issue description

The py::keep_alive weakref trick doesn't always work. In some cases (at least in Python 2.7, haven't tried 3), the weakref callback can run before the referenced object is cleared. This happens in the cyclic garbage collector: first all weakref callbacks are handled (handle_weakrefs function), then the garbage is cleared. If the nurse is being destroyed on this path, then it can lead to the patient destructor running before the nurse's (leading to undefined behaviour if the destructor of the nurse references the patient). Note that the nurse doesn't itself have to be part of a reference cycle, it just has to be reachable from one. However, it needs to have Py_TPFLAGS_HAVE_GC. In the example code I've ensured that by specifying py::dynamic_attr, but I think when I originally encountered the problem it was set due to having a declared base.

I've had similar issues with Boost.Python before. I suspect the weakref approach is fundamentally untenable; it may be necessary to store a list of handles (with strong references) in the PyObject and decref them when the object is cleared.

Reproducible example code

keepalive.cpp:

#include <pybind11/pybind11.h>

namespace py = pybind11;

class B;

class A {
private:
    B &b;
public:
    A(B &b) : b(b) {}
    ~A() { py::print("In A::~A()"); }
};

class B {
public:
    ~B() { py::print("In B::~B()"); }
};

PYBIND11_PLUGIN(keepalive) {
    py::module m("keepalive");

    py::class_<A>(m, "A", py::dynamic_attr())
        .def(py::init<B &>(), py::keep_alive<1, 2>());

    py::class_<B>(m, "B").def(py::init<>());
    return m.ptr();
}

test.py:

from keepalive import A, B

lst = [A(B())]
lst.append(lst)
del lst

Output is

In B::~B()
In A::~A()

Expected output is the other way around.

The text was updated successfully, but these errors were encountered:

bmerry · 2017-05-16T18:59:28Z

Or maybe it would work for the callback function to call the C++ destructor if it hasn't already been called, and deal with that gracefully when the object is cleared - but I don't know nearly enough about the internals to know if that's sane or not.

jagerman · 2017-05-27T02:07:51Z

I suspect what's happening here is that both A and B are being destroyed at the same time ("same" being within a single garbage collection). Essentially keep_alive is a bit weaker than you're expecting: it really only guarantees that A won't be admitted for garbage collection before B, but doesn't guarantee that they can't be admitted together (and as far as I know Python doesn't provide any guarantees on the order of destruction of objects within a garbage collection pass). If they are admitted together, you can get out-of-order destruction, as you're seeing.

I'm not sure that there's a simple workaround that could be done to deal with this on the pybind11 side (especially since neither object actually has to be pybind-managed).

You ought to be able to work around it by storing by storing a shared_ptr<B> instead of a B & (and making B's holder shared_ptr<B>); that should let the C++ object live beyond the destruction of the wrapping Python object.

bmerry · 2017-05-27T07:05:45Z

it really only guarantees that A won't be admitted for garbage collection before B, but doesn't guarantee that they can't be admitted together

Not exactly. B isn't identified as garbage in the garbage collection pass, because its reference is held by pybind11. It's really about the order of clearing A vs calling the weakref callbacks on A's weakref, which isn't guaranteed. If A is cleared first, everything goes fine. If the weakref callback is called first, then it drops the last reference to B, immediately causing B to be cleared.

I'm not sure that there's a simple workaround that could be done to deal with this on the pybind11 side (especially since neither object actually has to be pybind-managed).

I hadn't considered that the objects aren't necessarily pybind-managed, but on the other hand if the nurse isn't a C++ object then I don't think this is an issue. I have some ideas for fixing it if it is pybind-managed, which I can experiment with. Is there a way to determine whether a given py::handle is pybind-managed?

You ought to be able to work around it by storing by storing a shared_ptr instead of a B & (and making B's holder shared_ptr); that should let the C++ object live beyond the destruction of the wrapping Python object.

I've already done exactly that, but I'm in the fortunate position of being able to modify the C++ API I'm wrapping.

jagerman · 2017-05-27T14:51:06Z

B isn't identified as garbage in the garbage collection pass, because its reference is held by pybind11.

B won't be as long as A is referenced elsewhere, but once A is identified for gc, the weakref is cleared which admits both the weakref itself and A, so the list, A, B, and the weakref all get deallocated in the same gc pass.

from keepalive import A, B
import gc

gc.collect()
gc.set_debug(gc.DEBUG_STATS | gc.DEBUG_COLLECTABLE)

lst = [A(B())]
lst.append(lst)
del lst

gc.collect()
gc.set_debug(0)

output:

gc: collecting generation 2...
gc: objects in each generation: 5 0 6532
gc: collectable <keepalive.A 0x7f6a4085a8b8>
gc: collectable <list 0x7f6a4099abc8>
In B::~B()
In A::~A()
gc: done, 3 unreachable, 0 uncollectable, 0.0006s elapsed

A fix would be nice, but I still suspect it will end up getting rather complicated.

Is there a way to determine whether a given py::handle is pybind-managed?

pybind11::detail::get_type_info(Py_TYPE(h.ptr())) returns a detail::type_info * if h is an instance of a pybind-registered type, nullptr if it isn't.

bmerry · 2017-05-27T14:54:12Z

B won't be as long as A is referenced elsewhere, but once A is identified for gc, the weakref is cleared which admits both the weakref itself and A, so the list, A, B, and the weakref all get deallocated in the same gc pass.

Yes. I think we're in agreement and just thinking about the same thing in different ways.

You ought to be able to work around it by storing by storing a shared_ptr instead of a B & (and making B's holder shared_ptr); that should let the C++ object live beyond the destruction of the wrapping Python object.

Thanks, I'll see what I can cook up and how hairy it gets. Failing that, we should at least put a warning in the user manual.

jagerman · 2017-05-27T15:01:54Z

Failing that, we should at least put a warning in the user manual.

Agreed. I'll wait for you to see if you can come up with something else, but if not, a note in the docs is definitely worthwhile.

This is a first attempt to address pybind#856. There is still a bit of work to do (e.g., some tests, and double-checking that I'm not leaking references). At this point I'm looking for feedback on whether the idea is sound and would be accepted. Instead of the weakref trick, every pybind-managed object holds a (Python) list of references, which gets cleared in `clear_instance`. The list is only constructed the first time it is needed. The one downside I can see is that every pybind object will get bigger by `sizeof(PyObject *)`, even if no keep_alive is used. On the other hand, it should significantly reduce the overhead of using keep_alive, since there is no need to construct a weakref object and a callback object. One thing I'm not sure about is whether the call to `Py_CLEAR(instance->patients);` belongs inside or outside the `has_value` test. At the moment it's inside, but that's cargo culting rather than from an understanding of the conditions under which an instance might not have a value.

This fixes pybind#856. Instead of the weakref trick, the internals structure holds an unordered_map from PyObject* to a vector of references. To avoid the cost of the unordered_map lookup for objects that don't have any keep_alive patients, a flag is added to each instance to indicate whether there is anything to do.

bmerry mentioned this issue May 19, 2017

with_custodian_and_ward doesn't guarantee destructor ordering boostorg/python#128

Open

bmerry mentioned this issue May 28, 2017

Make instance_essentials hold references to patients #880

Merged

jagerman closed this as completed in 9d698f7 Jun 24, 2017

wjakob mentioned this issue Oct 12, 2020

fix: test_factory_constructors.py failure triggered by test_register_duplicate_class #2564

Merged

YannickJadoul mentioned this issue Nov 1, 2020

Use weakref to clean up captured function object in def_buffer #2634

Merged

rwgk mentioned this issue Feb 9, 2023

FWD pybind11 google/pybind11clif#856

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

py::keep_alive: nurse destructor can run before patient's #856

py::keep_alive: nurse destructor can run before patient's #856

bmerry commented May 16, 2017

bmerry commented May 16, 2017

jagerman commented May 27, 2017

bmerry commented May 27, 2017

jagerman commented May 27, 2017

bmerry commented May 27, 2017

jagerman commented May 27, 2017

py::keep_alive: nurse destructor can run before patient's #856

py::keep_alive: nurse destructor can run before patient's #856

Comments

bmerry commented May 16, 2017

Issue description

Reproducible example code

bmerry commented May 16, 2017

jagerman commented May 27, 2017

bmerry commented May 27, 2017

jagerman commented May 27, 2017

bmerry commented May 27, 2017

jagerman commented May 27, 2017