Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Graphs.transitiveClosure adding self-loops is unintuitive #2778

Open
meindratheal opened this issue Apr 5, 2017 · 13 comments
Open

Graphs.transitiveClosure adding self-loops is unintuitive #2778

meindratheal opened this issue Apr 5, 2017 · 13 comments
Assignees
Labels

Comments

@meindratheal
Copy link

I've been working on a project that centres around DAGs. I was very surprised to find cycles cropping up, so I tracked the issue down to Graphs.transitiveClosure.

I did a bit of reading (Wikipedia mostly) to see if this was expected behaviour. However, everything I saw pointed towards self-cycles not being a requirement of a transitive closure. I found two different images on Wikipedia that show a closure with no self-cycle, The description on the Transitive Closure wiki page itself uses the example that nodes are airports and edges are flights, and the transitive closure is the graph of everything reachable from a node in one or more hops.

Then, I thought to actually check the documentation more thoroughly. It looks like this is intended behaviour, but should this behaviour maybe be changed? (Maybe a flag parameter that can be passed to disable self-cycles)
If not, maybe the documentation could make this a little clearer, since the Guava notion of the transitive closure is slightly different from the usual definition (unless I've really misunderstood things) - reachability elsewhere is defined as within 1 or more hops, but in Guava it is 0 or more.

There's a one-line workaround at least (graph.nodes().forEach(n -> graph.removeEdge(n, n));), hopefully that doesn't fall afoul of any concurrent modification rules.

@lowasser
Copy link
Contributor

lowasser commented Apr 5, 2017

I wonder if we might key this to whether the original graph permits self loops?

@jrtom
Copy link
Member

jrtom commented Apr 5, 2017

We could certainly key this to whether the original graph permits self-loops; that's easy.

The question of correctness is a bit tricky, though. Transitive closure is determined in terms of reachability (at least one reference calls a transitive closure a "reachability graph"), and the definition of reachability that seems to be the most prevalent IME is "w is reachable from u if there exists a path from u to w". Note that in Guava's definition of Graphs.reachableNodes() we specify that u is reachable from itself (and this seems to be consistent with most definitions AFAICT).

So the question is whether it makes sense to explicitly represent the reachability of u from itself as a self-loop or not.

Some options for resolving this:
(1) Make the definition more explicit so people won't be surprised. (Right now the definition incorporates the definition of reachability by reference.)
(2) Let the addition of self-loops be contingent on whether the original graph permits them.
(3) Add an option (parallel method, boolean parameter, etc.) to allow (or require) the user to explicitly specify whether they want self-loops or not. (One option here: require the user to provide their own GraphBuilder instance.)

There may be other options as well.

Thoughts, anyone?

@jbduncan
Copy link
Contributor

jbduncan commented Apr 8, 2017

I think requiring the user to specify whether they want self-loops or not is best, as it would be more explicit and help to prevent surprises.

Requiring a GraphBuilder rather than a boolean parameter is a cool idea, but I think it may be too verbose in practice. How about an enum class like the following, which would provide more context than just a boolean param?

public enum class SelfLoops {
  ALLOW, DISALLOW
}

@alan-isaac
Copy link

alan-isaac commented Oct 9, 2018

Why does the right answer not turn on the standard definition of transitive closure, where a closure is the smallest relation that i. includes the original relation and ii. is transitive? This means that for a directed graph self loops should be included iff a node is reachable from itself along directed path of positive length. (See issue 3187.)

@jrtom
Copy link
Member

jrtom commented Oct 10, 2018

@alan-isaac I think that's a reasonable point; I think that it's also worth noting that including all self-loops may tend to conflate "transitive" and "reflexive".

(Although I have one quibble: I don't care what the length of the walk is from u to u, I just care whether it involves at least one edge.)

In any event, I think that the definition you're proposing is more or less where we're likely to end up.

@alan-isaac
Copy link

I don't think I understand the "quibble". Reflexive relations will of course have a shortest directed walk from u to u of length 1. For any element where reflexivity fails but a cycle exists, the shortest walk will be length > 1.
Anyway, thank you for your apparent willingness to better align the implementation of transitive_closure with the standard definition for the transitive closure of a directed graph.

@jrtom
Copy link
Member

jrtom commented Oct 11, 2018

The quibble is this: if the edges in the graph have weights, and the length of a u-v walk is defined as being the sum of the edge weights in that walk, then the (weighted) length of a u-u path can be less than 1, or even negative, but we'd still want to include the self-loop in the transitive closure.

(For the record, it's taken us a while to resolve this in part because this is one of the many definitional issues in which graph theory has not developed a consensus: some sources assert that a transitive closure should include self-loops for each node.)

@alan-isaac
Copy link

May I have a reference to a source that defines the transitive closure not to be a closure in the usual sense or somehow redefines closure to not require including the original relation. I've looked but have not found one. Are these sources somehow distinguishing between the transitive closure of a relation and its reachability relation, or is the latter also redefined to exclude cycles?

@jrtom
Copy link
Member

jrtom commented Oct 11, 2018

@alan-isaac I think that there's a confusion here between two related concepts.

No resource that I've found has suggested that the transitive closure of a relation (graph) should not be a superset of the original relation (graph). (That was the subject of the other bug you linked to, I think.)

The point of controversy here is whether the transitive closure of a graph should always (or even optionally) induce self-loops incident to each node, based on the argument that any u-v walk should induce a u-v edge, including vacuous walks (those with no edges).

When I was investigating this last year, I found the following textbook definitions of transitive closure graphs:

  • one reference that explicitly excludes self-loops based on no-edge vacuous paths [1]
  • two references whose definitions do not exclude such self-loops (and thus implicitly include them) [2, 3]
  • one reference that explicitly includes such self-loops [4]

[1] Udi Manber, Introduction to Algorithms (p. 186)
[2] Horowitz, Sahni, Rajasekaran, Computer Algorithms (p. 115)
[3] Cormen, Leiserson, Rivest, Introduction to Algorithms (p. 87)
[4] Ross and Wright, Discrete Mathematics (p. 115-120)

@alan-isaac
Copy link

OK, I see what you are talking about. But this is a matter of very odd choices of vocabulary. The transitive closure of a binary relation in a set X is by definition the smallest transitive relation in X that includes R. It is common enough to create another closure, often called the reflexive-transitive closure, which is the transitive closure of the reflexive closure. Some authors explicitly say they will call this the transitive closure for short (e.g., Computer Algorthms by Baase and Van Gelder), but they should not. In addition to these, some authors introduce a notion of an irreflexive transitive closure, but this is very bad terminology, since the idea of an irreflexive closure (i.e., superset) is inchoherent.

Thank you for clarifying. In my view, if one wants all self loops in a relation that is not reflexive, one should produce the transitive closure of the reflexive closure. (However, providing an option for this seems fine to me.) For the transitive closure itself, a self loop should always indicate a positive-length cycle (possibly of length 1). To be a proper closure, any self loops in the original relation must be retained. In terms of the discussion above, therefore, the question really cannot be whether to "allow" self loops, since they may be required by a transitive closure. It is instead whether to add self loops (i.e., to create the reflexive transitive closure). Of course, one might offer an option to remove self loops, with the understanding that the result will not be a closure.

@jrtom
Copy link
Member

jrtom commented Oct 12, 2018

I expect that most people that want the transitive closure of a graph have never heard of the reflexive property. :) In any event, I agree that the idea of calling the transitive closure of the reflexive closure a transitive closure per se is weird.

You are correct that the question is whether to add self-loops or not. I agree that that can be viewed as a reflexive closure; it can also be viewed as a side effect of a transitive closure in which one declares that for each node u there is always (at least) a zero-edge u-u walk.

Anyway, I think we're on the same page as to what is desirable, and hopefully we'll be able to get this to happen in the right way soon.

@cgdecker cgdecker added P3 status=triaged type=enhancement Make an existing feature better labels Jul 30, 2019
@tadamovsky
Copy link

I see this feature is still not implemented, for the reference see python networkx impl
https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.dag.transitive_closure.html
has additional parameter 'reflexive':
A reflexive transitive closure creates a self-loop for the path from v to v of length 0. The usual transitive closure creates a self-loop only if a cycle exists (a path from v to v with length > 0)

@jrtom
Copy link
Member

jrtom commented Oct 24, 2023

I still agree that we should address this. It's on my list of items to work out with the rest of the Guava team. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

8 participants