New MultiRef object #26811

jdemeyer · 2018-12-04T13:06:25Z

Allow creating multiple references to a given object, where some are strong and some are weak. Only the number of strong and weak references is specified: any given reference is dynamically considered strong or weak. This is done greedily to maximize the amount of garbage that can be collected.

This allows for an object X to be referenced by two objects A and B such that A and B are both needed to keep X alive (one of the references is strong and the other is weak): if either A or B gets deleted, then also X gets deleted. But if both A and B are alive, then X remains alive.

To give a more concrete example, one can imagine a coercion map which is referenced by the domain and codomain such that both are needed to keep the map alive.

CC: @simon-king-jena

Component: coercion

Author: Jeroen Demeyer

Branch/Commit: u/jdemeyer/new_multiweakref_object @ 141bca8

Issue created by migration from https://trac.sagemath.org/ticket/26811

The text was updated successfully, but these errors were encountered:

jdemeyer · 2018-12-04T16:34:50Z

Branch: u/jdemeyer/new_multiweakref_object

sagetrac-git · 2018-12-04T16:41:48Z

Commit: 5d33561

sagetrac-git · 2018-12-04T16:41:48Z

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

`5d33561`	`MultiWeakref object`

sagetrac-git · 2018-12-04T22:38:35Z

Changed commit from 5d33561 to 330893d

sagetrac-git · 2018-12-04T22:38:35Z

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

`330893d`	`MultiWeakref object`

sagetrac-git · 2018-12-05T16:40:39Z

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

`5e0ddde`	`MultiWeakref object`

sagetrac-git · 2018-12-05T16:40:39Z

Changed commit from 330893d to 5e0ddde

sagetrac-git · 2018-12-06T12:36:31Z

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

`0cfc660`	`MultiWeakref object`

sagetrac-git · 2018-12-06T12:36:31Z

Changed commit from 5e0ddde to 0cfc660

simon-king-jena · 2018-12-06T13:22:06Z

comment:9

Can you give a reference to (I guess) CPython documentation, that explains why your code works as it is supposed to?

jdemeyer · 2018-12-06T13:32:07Z

comment:10

Replying to @simon-king-jena:

Can you give a reference to (I guess) CPython documentation, that explains why your code works as it is supposed to?

No. The only reference is the CPython source code.

sagetrac-git · 2018-12-06T16:04:55Z

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

`4597886`	`MultiWeakref object`

sagetrac-git · 2018-12-06T16:04:55Z

Changed commit from 0cfc660 to 4597886

sagetrac-git · 2018-12-06T16:24:45Z

Changed commit from 4597886 to e3ce528

sagetrac-git · 2018-12-06T16:24:45Z

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

`e3ce528`	`MultiWeakref object`

jdemeyer · 2018-12-06T16:25:31Z

comment:14

I think the implementation is more or less done now. I added plenty of documentation, I hope it is clear.

nbruin · 2018-12-06T21:47:18Z

comment:15

It looks like a neat idea and it may be worth experimenting with it to see if we can get some benefits out of it, but before we commit to accepting its use in sage, I think we need to establish a clearer picture of what the assumptions on CPython are and to what extent these are justified.

The main thing that worries me is the dependence on visiting order whether links are considered strong or not. As we've seen, the order of visiting can influence whether a certain cycle will be found reachable or not, so in addition to lifetime of objects being ill-defined due to when gc happens, we'll also have that it's ill-defined due to how gc proceeds.

What are exactly the assumptions we are making about visiting order that would allow us, given a reference graph, to decide which objects will be found reachable and which won't?

sagetrac-git · 2018-12-07T09:55:40Z

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

`4d112a7`	`MultiWeakref object`

sagetrac-git · 2018-12-07T09:55:40Z

Changed commit from e3ce528 to 4d112a7

jdemeyer · 2018-12-07T09:57:58Z

comment:17

We define a traverse loop as a loop of the form:

        for obj in set_of_objects:
            type(obj)->tp_traverse(obj, visit, arg)

where visit is constant during the loop (arg does not
matter). More precisely, a traverse loop is a sequence of
consecutive tp_traverse calls with the same visit.
Traverse loops can be detected by comparing the visit argument
to the last visit.

These are the main assumptions:

Any garbage collection involves at least two traverse loops.
Thefore, a traverse loop cannot span multiple garbage
collections. In CPython, a garbage collection needs two
traverse loops: one in subtract_refs and one in
move_unreachable.
The precise reference graph is allowed to change between traverse
loops, as long as refcounts do not change and the reference graph
is consistent within each individual traverse loop.
Considering the first-visited references in a traverse loop as
weak references maximizes the amount of garbage that can be
collected.

jdemeyer · 2018-12-07T09:58:36Z

comment:18

Replying to @nbruin:

As we've seen, the order of visiting can influence whether a certain cycle will be found reachable or not

Can you elaborate on this or give an example?

jdemeyer · 2018-12-07T10:02:41Z

comment:19

Replying to @nbruin:

we'll also have that it's ill-defined due to how gc proceeds.

The order-dependence here should not be a problem because the GC does not really keep track of references A -> B but only the number of times that B is referenced. You can see this in the signature of visit: only the visited object (B) is passed, not the object holding the reference (A).

So we really want to minimize the number of times that B is seen as reference, it doesn't matter which objects have B as strong reference.

sagetrac-git · 2018-12-07T10:49:55Z

Changed commit from 4d112a7 to af63c47

sagetrac-git · 2018-12-07T10:49:55Z

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

`af63c47`	`MultiWeakref object`

sagetrac-git · 2018-12-07T15:30:14Z

Changed commit from af63c47 to 141bca8

sagetrac-git · 2018-12-07T15:30:14Z

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

`141bca8`	`MultiWeakref object`

nbruin · 2018-12-07T18:34:45Z

comment:22

Replying to @jdemeyer:

Replying to @nbruin:

As we've seen, the order of visiting can influence whether a certain cycle will be found reachable or not

Can you elaborate on this or give an example?

If we have a multiref object M to an object C that needs two references to stay alive and has two references

<ROOT> -> A -> M -> C and B -> M -> C -> B

then it depends in what order these are checked: If the connection via is checked first, then the "strong" reference to C falls in an unreachable cycle, so C is collectible. In the other order, the "strong" reference comes from , so C lives on.

That's why order matters. Apparently you are assuming that all "reachable" trees are visited before other links are examined. In a full mark-and-sweep collector, this would be a fair assumption, because the unreachable nodes will not be visited. Because python uses a hybrid system with reference counting, it is not clear to me that this is a valid assumption, or that it will remain valid if it is now. In principle, you could decide that a cycle is unreachable by checking that all the reference counts can be accounted for by internal links.

I think you are also assuming is that the "VISIT" callback will be identical within an entire GC operation.

These assumptions should all be made explicit and corroborated if possible. They are definite drawbacks to this approach, because it ties the design of sage more closely to a particular implementation: You are introducing a primitive that is not easy to implement on memory models without detailed knowledge.

I think definite advantages need to be exhibited before we'd commit to using a primitive like this.

For instance, I'm not so sure that the penalty above is worth paying just to ditch the weak ref to the domain. Another, more transparent way to do this, is to clearly distinguish the lifetime implications for structures with coercions (with a coercion A -> B, should B keep A alive or the other way around? -- I think it is acceptable to have one or the other) and then cache the map on the shorter-lived one.

jdemeyer · 2018-12-11T10:08:57Z

comment:23

Replying to @nbruin:

That's why order matters. Apparently you are assuming that all "reachable" trees are visited before other links are examined.

Yes, and this is how the Python GC works. Admittedly, it's an assumption, but it's an assumption which is currently valid.

In principle, you could decide that a cycle is unreachable by checking that all the reference counts can be accounted for by internal links.

Yes, and this is part of what Python's GC does: it uses refcounts to determine which objects are certainly reachable and which are potentially unreachable. It then recursively checks all links from reachable objects to determine which potentially unreachable objects are reachable anyway.

So in your example (assuming that ROOT is not tracked by GC), A will be certainly reachable and all other objects will start out potentially unreachable.

I think you are also assuming is that the "VISIT" callback will be identical within an entire GC operation.

No, a single GC operation involves two traverse loops with two distinct VISIT values (one to count references and one to check potentially unreachable objects).

These assumptions should all be made explicit and corroborated if possible.

I think I'm doing that in the documentation, in the section starting with ASSUMPTIONS.

nbruin · 2018-12-11T21:01:52Z

comment:24

Replying to @jdemeyer:

Replying to @nbruin:

That's why order matters. Apparently you are assuming that all "reachable" trees are visited before other links are examined.

Yes, and this is how the Python GC works. Admittedly, it's an assumption, but it's an assumption which is currently valid.

In principle, you could decide that a cycle is unreachable by checking that all the reference counts can be accounted for by internal links.

Yes, and this is part of what Python's GC does: it uses refcounts to determine which objects are certainly reachable and which are potentially unreachable. It then recursively checks all links from reachable objects to determine which potentially unreachable objects are reachable anyway.

So in your example (assuming that ROOT is not tracked by GC), A will be certainly reachable and all other objects will start out potentially unreachable.

I think you are also assuming is that the "VISIT" callback will be identical within an entire GC operation.

No, a single GC operation involves two traverse loops with two distinct VISIT values (one to count references and one to check potentially unreachable objects).

These assumptions should all be made explicit and corroborated if possible.

I think I'm doing that in the documentation, in the section starting with ASSUMPTIONS.

Yes, although here already you are giving more information. Since these things are so sensitive, I think it would be good to include explicit references to the python source and udate your comments with the details about VISIT.

One of my main concerns is really the cost of fundamentally changing the rules of how references prevent garbage collection. For instance, at the moment gc.get_referrers and/or objgraph allow us to see if an object is prevented from being garbage collected. I think with the change here, this would fundamentally change. That's why I would urge you to only consider including this construct if you have established it gives significant benefits that are otherwise impossible or very difficult to effect. With MultiRef, the sage architecture would be even less like standard python.

jdemeyer added this to the sage-8.5 milestone Dec 4, 2018

jdemeyer added c: coercion labels Dec 4, 2018

This comment has been minimized.

Sign in to view

jdemeyer changed the title ~~New MultiWeakref object~~ New MultiRef object Dec 6, 2018

mkoeppe removed this from the sage-8.5 milestone Dec 29, 2022

jdemeyer mentioned this issue Dec 29, 2022

Use MultiRef to cache coercion maps #26852

Open

simon-king-jena mentioned this issue Jun 14, 2019

Make polynomial rings collectable after arithmetic operations #27095

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

New MultiRef object #26811

New MultiRef object #26811

jdemeyer commented Dec 4, 2018

This comment has been minimized.

This comment has been minimized.

jdemeyer commented Dec 4, 2018

sagetrac-git mannequin commented Dec 4, 2018

sagetrac-git mannequin commented Dec 4, 2018

sagetrac-git mannequin commented Dec 4, 2018

sagetrac-git mannequin commented Dec 4, 2018

sagetrac-git mannequin commented Dec 5, 2018

sagetrac-git mannequin commented Dec 5, 2018

sagetrac-git mannequin commented Dec 6, 2018

sagetrac-git mannequin commented Dec 6, 2018

simon-king-jena commented Dec 6, 2018

jdemeyer commented Dec 6, 2018

sagetrac-git mannequin commented Dec 6, 2018

sagetrac-git mannequin commented Dec 6, 2018

sagetrac-git mannequin commented Dec 6, 2018

sagetrac-git mannequin commented Dec 6, 2018

jdemeyer commented Dec 6, 2018

nbruin commented Dec 6, 2018

sagetrac-git mannequin commented Dec 7, 2018

sagetrac-git mannequin commented Dec 7, 2018

jdemeyer commented Dec 7, 2018

jdemeyer commented Dec 7, 2018

jdemeyer commented Dec 7, 2018

sagetrac-git mannequin commented Dec 7, 2018

sagetrac-git mannequin commented Dec 7, 2018

sagetrac-git mannequin commented Dec 7, 2018

sagetrac-git mannequin commented Dec 7, 2018

nbruin commented Dec 7, 2018

jdemeyer commented Dec 11, 2018

nbruin commented Dec 11, 2018

New MultiRef object #26811

New MultiRef object #26811

Comments

jdemeyer commented Dec 4, 2018

This comment has been minimized.

This comment has been minimized.

jdemeyer commented Dec 4, 2018

sagetrac-git mannequin commented Dec 4, 2018

sagetrac-git mannequin commented Dec 4, 2018

sagetrac-git mannequin commented Dec 4, 2018

sagetrac-git mannequin commented Dec 4, 2018

sagetrac-git mannequin commented Dec 5, 2018

sagetrac-git mannequin commented Dec 5, 2018

sagetrac-git mannequin commented Dec 6, 2018

sagetrac-git mannequin commented Dec 6, 2018

simon-king-jena commented Dec 6, 2018

jdemeyer commented Dec 6, 2018

sagetrac-git mannequin commented Dec 6, 2018

sagetrac-git mannequin commented Dec 6, 2018

sagetrac-git mannequin commented Dec 6, 2018

sagetrac-git mannequin commented Dec 6, 2018

jdemeyer commented Dec 6, 2018

nbruin commented Dec 6, 2018

sagetrac-git mannequin commented Dec 7, 2018

sagetrac-git mannequin commented Dec 7, 2018

jdemeyer commented Dec 7, 2018

jdemeyer commented Dec 7, 2018

jdemeyer commented Dec 7, 2018

sagetrac-git mannequin commented Dec 7, 2018

sagetrac-git mannequin commented Dec 7, 2018

sagetrac-git mannequin commented Dec 7, 2018

sagetrac-git mannequin commented Dec 7, 2018

nbruin commented Dec 7, 2018

jdemeyer commented Dec 11, 2018

nbruin commented Dec 11, 2018