New MultiRef object #26811
Comments
Branch pushed to git repo; I updated commit sha1. This was a forced push.
comment:9
Can you give a reference to (I guess) the CPython documentation that explains why your code works as it is supposed to?
comment:10
Replying to @simon-king-jena:
No. The only reference is the CPython source code.
comment:14
I think the implementation is more or less done now. I added plenty of documentation; I hope it is clear.
comment:15
It looks like a neat idea and it may be worth experimenting with it to see if we can get some benefits out of it, but before we commit to accepting its use in Sage, I think we need to establish a clearer picture of what the assumptions on CPython are and to what extent they are justified.

The main thing that worries me is that whether links are considered strong or weak depends on visiting order. As we've seen, the order of visiting can influence whether a certain cycle will be found reachable or not, so in addition to the lifetime of objects being ill-defined due to when gc happens, we'll also have it ill-defined due to how gc proceeds. What exactly are the assumptions we are making about visiting order that would allow us, given a reference graph, to decide which objects will be found reachable and which won't?
comment:17
We define a traverse loop as a loop of the form:

[code block lost in the Trac migration]

These are the main assumptions:

[list of assumptions lost in the Trac migration]
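The code block from the original comment did not survive the migration. For context, a "traverse loop" refers to the pattern in CPython's `gcmodule.c` where the collector walks every tracked container and calls its `tp_traverse` slot with some visit callback. A rough Python model of that C pattern (all names here are invented for illustration) looks like:

```python
class Node:
    """A toy container object; `referents` stands in for the
    references that tp_traverse would report."""
    def __init__(self, name):
        self.name = name
        self.referents = []

def traverse_loop(tracked, visit, arg):
    # Analogue of the C pattern: for each tracked container `op`,
    # call Py_TYPE(op)->tp_traverse(op, visit, arg).
    for op in tracked:
        for referent in op.referents:   # stand-in for tp_traverse
            visit(referent, arg)

a, b = Node("a"), Node("b")
a.referents.append(b)

seen = []
traverse_loop([a, b], lambda obj, acc: acc.append(obj.name), seen)
print(seen)   # ['b'] -- b is the only referent in this tiny graph
```

The GC runs such a loop several times per collection, swapping in a different `visit` callback each time; that is the mechanism the MultiRef code hooks into.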
comment:18
Replying to @nbruin:
Can you elaborate on this or give an example? |
comment:19
Replying to @nbruin:
The order-dependence here should not be a problem, because the GC does not really keep track of references [...]. So we really want to minimize the number of times that [...]
comment:22
Replying to @jdemeyer:
If we have a multiref object M to an object C that needs two references to stay alive and has two references [...], then it depends in what order these are checked: if the connection via [...] is checked first, then the "strong" reference to C falls in an unreachable cycle, so C is collectible. In the other order, the "strong" reference comes from [...], so C lives on. That's why order matters.

Apparently you are assuming that all "reachable" trees are visited before other links are examined. In a full mark-and-sweep collector this would be a fair assumption, because the unreachable nodes will not be visited. Because Python uses a hybrid system with reference counting, it is not clear to me that this is a valid assumption, or that it will remain valid if it is now. In principle, you could decide that a cycle is unreachable by checking that all the reference counts can be accounted for by internal links. I think you are also assuming that the "VISIT" callback will be identical within an entire GC operation. These assumptions should all be made explicit and corroborated if possible.

They are definite drawbacks to this approach, because it ties the design of Sage more closely to a particular implementation: you are introducing a primitive that is not easy to implement on memory models without detailed knowledge. I think definite advantages need to be exhibited before we'd commit to using a primitive like this. For instance, I'm not so sure that the penalty above is worth paying just to ditch the weak ref to the domain. Another, more transparent way to do this is to clearly distinguish the lifetime implications for structures with coercions (with a coercion A -> B, should B keep A alive or the other way around? I think it is acceptable to have one or the other) and then cache the map on the shorter-lived one.
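The scenario above can be made concrete with a toy model (everything here is invented for illustration, not Sage's actual code): suppose the collector greedily treats the *first* link it traverses to C as the strong one. Then whether C survives depends purely on traversal order:

```python
# Toy model of the order dependence: C needs its one "strong" slot
# to be claimed by a reachable referrer.  The collector assigns the
# slot greedily to whichever link it happens to traverse first.

def c_is_alive(visit_order):
    """visit_order: C's referrers in the order the collector visits
    them.  'reachable' is externally reachable; 'cycle' sits in an
    otherwise unreachable reference cycle."""
    strong_owner = visit_order[0]        # greedy: first visit wins
    # C survives only if the strong reference is held by an object
    # that is itself reachable.
    return strong_owner == "reachable"

print(c_is_alive(["reachable", "cycle"]))  # True  -> C lives on
print(c_is_alive(["cycle", "reachable"]))  # False -> C is collectible
```

Same reference graph, two different outcomes: this is exactly why the assumptions about visiting order need to be pinned down.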
comment:23
Replying to @nbruin:
Yes, and this is how the Python GC works. Admittedly, it's an assumption, but it's an assumption which is currently valid.
Yes, and this is part of what Python's GC does: it uses refcounts to determine which objects are certainly reachable and which are potentially unreachable. It then recursively checks all links from reachable objects to determine which potentially unreachable objects are reachable anyway. So in your example (assuming that [...]).
No, a single GC operation involves two traverse loops with two distinct VISIT values (one to count references and one to check potentially unreachable objects).
I think I'm doing that in the documentation, in the section starting with [...]
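The two traverse loops described above can be sketched in pure Python (the graph and refcounts below are invented for illustration; CPython does this in C in `gcmodule.c`, roughly via `subtract_refs` followed by the reachability scan):

```python
def collect(refcount, edges):
    """refcount: total refcount of each object; edges: obj -> list of
    referents inside the tracked set.  Returns the objects found
    reachable; everything else would be collected."""
    # Traverse loop 1: the VISIT callback decrements a *copy* of the
    # refcounts, once per reference internal to the tracked set.
    gc_refs = dict(refcount)
    for obj, refs in edges.items():
        for r in refs:
            gc_refs[r] -= 1
    # Objects whose count stayed positive are referenced from outside
    # the tracked set, hence certainly reachable.
    reachable = {o for o, n in gc_refs.items() if n > 0}
    # Traverse loop 2: the VISIT callback marks everything reachable
    # from an already-reachable object as reachable too.
    stack = list(reachable)
    while stack:
        for r in edges[stack.pop()]:
            if r not in reachable:
                reachable.add(r)
                stack.append(r)
    return reachable

# a <-> b is a cycle with no outside references: only c survives.
print(collect({"a": 1, "b": 1, "c": 1},
              {"a": ["b"], "b": ["a"], "c": []}))             # {'c'}
# Same cycle, but c (held from outside) references a: the second
# traverse loop rescues the whole cycle.
print(sorted(collect({"a": 2, "b": 1, "c": 1},
                     {"a": ["b"], "b": ["a"], "c": ["a"]})))  # ['a', 'b', 'c']
```

The point of the sketch is that the two loops use two distinct VISIT callbacks within one GC operation, which is the assumption under discussion.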
comment:24
Replying to @jdemeyer:
Yes, although here already you are giving more information. Since these things are so sensitive, I think it would be good to include explicit references to the Python source and update your comments with the details about VISIT.

One of my main concerns is really the cost of fundamentally changing the rules of how references prevent garbage collection. For instance, at the moment gc.get_referrers and/or objgraph allow us to see if an object is prevented from being garbage collected. I think with the change here, this would fundamentally change. That's why I would urge you to only consider including this construct if you have established that it gives significant benefits that are otherwise impossible or very difficult to effect. With MultiRef, the Sage architecture would be even less like standard Python.
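To illustrate the introspection mentioned above, here is how it works with ordinary references today (class names are invented for the example):

```python
import gc

class Domain:
    pass

class Coercion:
    pass

d = Domain()
m = Coercion()
d.coercion = m   # d's attribute dict now holds a strong reference to m

# With plain references, gc.get_referrers shows exactly which objects
# keep m alive: d's __dict__ appears among the referrers.  Under a
# MultiRef-style scheme, whether a given reference is "the" strong one
# would no longer be a static property visible this way.
referrers = gc.get_referrers(m)
print(any(r is d.__dict__ for r in referrers))  # True
```

Tools like objgraph build their reference graphs on exactly this API, which is why dynamically reassigned strong/weak status would make their output harder to interpret.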
Allow creating multiple references to a given object, where some are strong and some are weak. Only the number of strong and weak references is specified: any given reference is dynamically considered strong or weak. This is done greedily to maximize the amount of garbage that can be collected.
This allows for an object X to be referenced by two objects A and B such that A and B are both needed to keep X alive (one of the references is strong and the other is weak): if either A or B gets deleted, then also X gets deleted. But if both A and B are alive, then X remains alive.
To give a more concrete example, one can imagine a coercion map which is referenced by the domain and codomain such that both are needed to keep the map alive.
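For illustration only, the intended lifetime contract (though not the GC-level mechanism this ticket implements) can be approximated with today's weakref machinery. Everything below is a hypothetical sketch: unlike the ticket's approach, it cannot cooperate with cycle collection, since it reacts only to refcount-driven deallocation of an owner.

```python
import weakref

class Holder:
    """A stand-in owner (e.g. the domain or codomain of a map)."""

class X:
    """The jointly-owned object (e.g. a coercion map)."""

class BothNeeded:
    """Hold `target` strongly only while *all* owners are alive."""
    def __init__(self, target, *owners):
        self._target = target                              # one strong reference
        self._owners = [weakref.ref(o, self._drop) for o in owners]

    def _drop(self, _weakref):
        self._target = None                                # release the target

    def get(self):
        return self._target

a, b = Holder(), Holder()
ref = BothNeeded(X(), a, b)
print(ref.get() is not None)   # True: both owners alive, X stays alive
del a                          # one owner dies ...
print(ref.get() is None)       # True: ... and X is released
```

The MultiRef proposal achieves the same contract by dynamically choosing which of the references the GC sees as strong, so that X can also be reclaimed when the owners only survive inside unreachable cycles.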
CC: @simon-king-jena
Component: coercion
Author: Jeroen Demeyer
Branch/Commit: u/jdemeyer/new_multiweakref_object @ 141bca8
Issue created by migration from https://trac.sagemath.org/ticket/26811