Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cpyext: reference cycles are not handled (tp_traverse, tp_clear are never called) #3848

Open
gitlab-importer opened this issue Nov 10, 2022 · 6 comments

Comments

@gitlab-importer
Copy link

In Heptapod by @wjakob on Nov 10, 2022, 19:34

Dear PyPy team (& @mattip),

I'm chasing down a few remaining nanobind test suite issues. Here is reproducer 1/3:

This test sets up a wrapper type that contains a nested PyObject * pointer. When the pointer refers to the parent object, it creates a circular reference that will cause the object to leak using only normal reference counting. To handle cycles, CPython requires that developers implement the slots tp_traverse (to identify the cycle) and tp_clear (to break it).

Python test: https://github.com/wjakob/pypy_issues/blob/master/issue_5.py#L18

C code: https://github.com/wjakob/pypy_issues/blob/master/pypy_issues.c#L10

In PyPy, the associated functions are never called, and the associated objects leak.

My reproducer is a contrived example, but such cycles happen all the time and can be created quite easily. Consider for example Python bindings for a GUI library: a button widget (implemented in C) might have a callback method that can be set from Python. The problem is that the assigned python function object is actually a closure that captures any referenced variables. If the button callback closes the current window (e.g. window.close()) it creates a cycle because the closure depends on the window object, and window will in turn reference the button contained in the window. The tp_traverse and tp_clear slots make it possible to handle such inter-language cycles properly.

Thanks,
Wenzel

@gitlab-importer
Copy link
Author

In Heptapod by @mattip on Nov 15, 2022, 12:09

This is a big task. We need to implement a cpyext-aware garbage collector that can notice isolated cycles of cpyext objects. There was work done on the cpyext-gc-cycle branch. I merged py3.8 into it, and will try to move forward with this over the next couple of weeks, but I do not think we can hold up the release for it.

@gitlab-importer
Copy link
Author

In Heptapod by @cfbolz on Nov 15, 2022, 13:57

so to summarize a bit:

  • right now, C-level cycles are uncollected
  • we can support those by essentially copying CPython's cycle-collector (which is what the cpyext-gc-cycle branch is doing)
  • inter-language cycles (cpyext object->python->same cpyext object) are much harder to collect and need complex GC support. it's not clear to me whether we have the capacity for that
  • HPy solves these issues (even in its current form, inter-language cycles get collected properly)

@gitlab-importer
Copy link
Author

In Heptapod by @wjakob on Nov 15, 2022, 14:33

My reproducer demonstrates the problem using a cycle of only cpyext objects, but it was not my intention to only report that specific case. Inter-language cycles are common, for example where a function closure captures an object defined in an extension library (The cycle is cpyext instance -> function object -> cpyext instance). Another common case (though that could perhaps be handled specially) is where an object in an extension library defines tp_dictoffset which leads to circular references involving dictionaries and instances of the class. Is there any technical reason why the issue could be handled in HPy but not in cpyext?

@gitlab-importer
Copy link
Author

In Heptapod by @arigo on Nov 15, 2022, 15:57

Is there any technical reason why the issue could be handled in HPy but
not in cpyext?

Yes. HPy is designed for this kind of cross-implementation support, whereas
CPython's API is not at all.

@gitlab-importer
Copy link
Author

In Heptapod by @cfbolz on Nov 15, 2022, 21:05

I think it's basically a pretty difficult problem, and the APIs need to be in a certain shape to support it (and the CPython C-API isn't). As a data point, this is pretty similar to the problems that browsers have. the DOM is in C++ and typically uses refcounting, the Javascript heaps use precise GC. only recently did Chrome get reasonable cross-component GC: https://v8.dev/blog/high-performance-cpp-gc and that is certainly making use of the fact that Google has complete control over the APIs.

@gitlab-importer
Copy link
Author

In Heptapod by @wjakob on Nov 15, 2022, 21:11

What I don't understand is the difference between the HPy and cpyext APIs. To me, it seems like HPy simply wraps the tp_traverse and tp_clear functionality--in fact, it has little alternative because extensions must at the end of the day work on CPython that uses this interface. The example code in the HPy documentation looks near-identical: https://docs.hpyproject.org/en/latest/porting-example/index.html?highlight=traverse#step-02-transition-some-methods-to-hpy

Could that same mechanism not be also used for cpyext? What makes them so different?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant