New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Graphs.transitiveClosure adding self-loops is unintuitive #2778
Comments
I wonder if we might key this to whether the original graph permits self loops? |
We could certainly key this to whether the original graph permits self-loops; that's easy. The question of correctness is a bit tricky, though. Transitive closure is determined in terms of reachability (at least one reference calls a transitive closure a "reachability graph"), and the definition of reachability that seems to be the most prevalent IME is "w is reachable from u if there exists a path from u to w". Note that in Guava's definition of So the question is whether it makes sense to explicitly represent the reachability of u from itself as a self-loop or not. Some options for resolving this: There may be other options as well. Thoughts, anyone? |
I think requiring the user to specify whether they want self-loops or not is best, as it would be more explicit and help to prevent surprises. Requiring a public enum class SelfLoops {
ALLOW, DISALLOW
} |
Why does the right answer not turn on the standard definition of transitive closure, where a closure is the smallest relation that i. includes the original relation and ii. is transitive? This means that for a directed graph self loops should be included iff a node is reachable from itself along directed path of positive length. (See issue 3187.) |
@alan-isaac I think that's a reasonable point; I think that it's also worth noting that including all self-loops may tend to conflate "transitive" and "reflexive". (Although I have one quibble: I don't care what the length of the walk is from In any event, I think that the definition you're proposing is more or less where we're likely to end up. |
I don't think I understand the "quibble". Reflexive relations will of course have a shortest directed walk from |
The quibble is this: if the edges in the graph have weights, and the length of a (For the record, it's taken us a while to resolve this in part because this is one of the many definitional issues in which graph theory has not developed a consensus: some sources assert that a transitive closure should include self-loops for each node.) |
May I have a reference to a source that defines the transitive closure not to be a closure in the usual sense or somehow redefines closure to not require including the original relation. I've looked but have not found one. Are these sources somehow distinguishing between the transitive closure of a relation and its reachability relation, or is the latter also redefined to exclude cycles? |
@alan-isaac I think that there's a confusion here between two related concepts. No resource that I've found has suggested that the transitive closure of a relation (graph) should not be a superset of the original relation (graph). (That was the subject of the other bug you linked to, I think.) The point of controversy here is whether the transitive closure of a graph should always (or even optionally) induce self-loops incident to each node, based on the argument that any When I was investigating this last year, I found the following textbook definitions of transitive closure graphs:
[1] Udi Manber, Introduction to Algorithms (p. 186) |
OK, I see what you are talking about. But this is a matter of very odd choices of vocabulary. The transitive closure of a binary relation in a set X is by definition the smallest transitive relation in X that includes R. It is common enough to create another closure, often called the reflexive-transitive closure, which is the transitive closure of the reflexive closure. Some authors explicitly say they will call this the transitive closure for short (e.g., Computer Algorthms by Baase and Van Gelder), but they should not. In addition to these, some authors introduce a notion of an irreflexive transitive closure, but this is very bad terminology, since the idea of an irreflexive closure (i.e., superset) is inchoherent. Thank you for clarifying. In my view, if one wants all self loops in a relation that is not reflexive, one should produce the transitive closure of the reflexive closure. (However, providing an option for this seems fine to me.) For the transitive closure itself, a self loop should always indicate a positive-length cycle (possibly of length 1). To be a proper closure, any self loops in the original relation must be retained. In terms of the discussion above, therefore, the question really cannot be whether to "allow" self loops, since they may be required by a transitive closure. It is instead whether to add self loops (i.e., to create the reflexive transitive closure). Of course, one might offer an option to remove self loops, with the understanding that the result will not be a closure. |
I expect that most people that want the transitive closure of a graph have never heard of the reflexive property. :) In any event, I agree that the idea of calling the transitive closure of the reflexive closure a transitive closure per se is weird. You are correct that the question is whether to add self-loops or not. I agree that that can be viewed as a reflexive closure; it can also be viewed as a side effect of a transitive closure in which one declares that for each node Anyway, I think we're on the same page as to what is desirable, and hopefully we'll be able to get this to happen in the right way soon. |
I see this feature is still not implemented, for the reference see python networkx impl |
I still agree that we should address this. It's on my list of items to work out with the rest of the Guava team. :) |
I've been working on a project that centres around DAGs. I was very surprised to find cycles cropping up, so I tracked the issue down to Graphs.transitiveClosure.
I did a bit of reading (Wikipedia mostly) to see if this was expected behaviour. However, everything I saw pointed towards self-cycles not being a requirement of a transitive closure. I found two different images on Wikipedia that show a closure with no self-cycle, The description on the Transitive Closure wiki page itself uses the example that nodes are airports and edges are flights, and the transitive closure is the graph of everything reachable from a node in one or more hops.
Then, I thought to actually check the documentation more thoroughly. It looks like this is intended behaviour, but should this behaviour maybe be changed? (Maybe a flag parameter that can be passed to disable self-cycles)
If not, maybe the documentation could make this a little clearer, since the Guava notion of the transitive closure is slightly different from the usual definition (unless I've really misunderstood things) - reachability elsewhere is defined as within 1 or more hops, but in Guava it is 0 or more.
There's a one-line workaround at least (
graph.nodes().forEach(n -> graph.removeEdge(n, n));
), hopefully that doesn't fall afoul of any concurrent modification rules.The text was updated successfully, but these errors were encountered: