Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nx.dfs_labeled_edges does not return last visited edge if depth_limit is specified #6239

Closed
ikarsokolov opened this issue Nov 26, 2022 · 5 comments · Fixed by #6240
Closed

Comments

@ikarsokolov
Copy link

nx.dfs_labeled_edges does not return last (deepest) visited edge when traversing backwards if depth_limit is specified.

Current Behavior

graph = nx.path_graph(5, nx.DiGraph)
list(nx.dfs_labeled_edges(graph, source=0))

[(0, 0, 'forward'),
 (0, 1, 'forward'),
 (1, 2, 'forward'),
 (2, 3, 'forward'),
 (3, 4, 'forward'),
 (3, 4, 'reverse'),
 (2, 3, 'reverse'),
 (1, 2, 'reverse'),
 (0, 1, 'reverse'),
 (0, 0, 'reverse')]

list(nx.dfs_labeled_edges(graph, source=0, depth_limit=4))

[(0, 0, 'forward'),
 (0, 1, 'forward'),
 (1, 2, 'forward'),
 (2, 3, 'forward'),
 (3, 4, 'forward'),
 (2, 3, 'reverse'),
 (1, 2, 'reverse'),
 (0, 1, 'reverse'),
 (0, 0, 'reverse')]

Note that edge (3, 4, 'reverse') is missing.

Expected Behavior

graph = nx.path_graph(5, nx.DiGraph)
list(nx.dfs_labeled_edges(graph, source=0, depth_limit=4))

[(0, 0, 'forward'),
 (0, 1, 'forward'),
 (1, 2, 'forward'),
 (2, 3, 'forward'),
 (3, 4, 'forward'),
 (3, 4, 'reverse'),
 (2, 3, 'reverse'),
 (1, 2, 'reverse'),
 (0, 1, 'reverse'),
 (0, 0, 'reverse')]

Environment

Python version: 3.10
NetworkX version: 2.8.8 and 3.0rc1

@dschult
Copy link
Member

dschult commented Nov 26, 2022

I can verify that there is a bug in dfs_labeled_edges for the reporting of reverse edges at the requested depth_level. The fix is to add an else clause for the check on depth_now > 1. And there are almost no tests currently for "reverse" edge reporting in the test suite. So we should add those.

@dschult
Copy link
Member

dschult commented Nov 26, 2022

This may not be a bug afterall... more like a question about the definition of a "reverse" edge.

A "reverse" edge is one in which both u and v have been visited and the edge is in the DFS tree. But with the depth_limit, we don't actually visit node v when we move forward along (u, v)... we reach the depth limit and stop that branch. Because we haven't visited that node, some people could argue that we shouldn't report the "reverse" edge.

One result is that for a fixed depth_limit, the preorder of nodes reports one depth more of nodes than the post-order.

But we can define the labeled DFS edges in either manner. For depth_limit 2, we could report the forward and reverse edges that we would-have-seen without a depth_limit for the edges that we touch. Or we could report the reverse edges only if we actually visit the node at the far end of the edge. I can't find any good definition of what constitutes a reverse edge (also called a back-edge) when there is a depth_limit to the traversal.

@ikarsokolov can you describe the application you are working with and why you expect there to be a reverse edge for each forward edge when the depth_limit is set?

I can see an advantage to ensuring that "reverse" edge is reported for each "forward" edge. You gain information about when you stopped looking at a node -- but you don't have information about whether the DFS had finished visiting that node or you reached the depth_limit. We can easily limit the impact of this change to just this function. What is the right thing to report?

@ikarsokolov
Copy link
Author

ikarsokolov commented Nov 26, 2022

@dschult thanks for quick and detailed responses!

can you describe the application you are working with and why you expect there to be a reverse edge for each forward edge when the depth_limit is set?

Simple example:
I have tree and want to do DFS from the root. I also have blacklist of nodes that should be excluded from DFS result with their complete subtrees. Something like "I want filtered iterator over tree without these particular nodes and all their children".

I listen to traversal events:

  • "forward" edge pointed to blacklisted node => blacklisted subtree traversal is started, subsequent nodes should be dropped from the result.
  • Matching "reverse" edge => exit from blacklisted node, subtree traversal is finished, subsequent nodes should be kept in the result.

If depth_limit is set and blacklisted node happens to be one level above limit I will never receive signal that node + it's subtree traversal is complete.

A "reverse" edge is one in which both u and v have been visited and the edge is in the DFS tree. But with the depth_limit, we don't actually visit node v when we move forward along (u, v)... we reach the depth limit and stop that branch. Because we haven't visited that node, some people could argue that we shouldn't report the "reverse" edge.

I see your point. I think in this case we should have another label besides "forward". Something like "examine".
See how this is done in boost graph library.
They handled traversal with visitor pattern but the principle is the same.

vis.examine_edge(e, g) is invoked on every out-edge of each vertex after it is discovered.
vis.tree_edge(e, g) is invoked on each edge as it becomes a member of the edges that form the search tree. If you wish to record predecessors, do so at this event point.
vis.back_edge(e, g) is invoked on the back edges in the graph.

examine_edge is missing label in NetworkX implementation.

The only drawback I see here is wasted computational power. "Examined" nodes on last depth level will not be included in final result. At least on my naive implementation. My graph has about ~1M nodes so a lot of nodes can be enumerated here,

@dschult
Copy link
Member

dschult commented Nov 27, 2022

With our code, you can get the "examine" trait of an edges by looking for etype in ("forward", "nontree"). So, I'm not sure it helps all that much. The difference I see is that the boost code has a "finish_vertex" moment. That's really what you are looking for: a point when you can be sure the tree from that node has been explored completely. That's what we use "reverse" to indicate.

But we have to figure out whether we consider the subtree-from-that-node to have been fully explored when the depth_limit stopped the exploration. Currently, we don't indicate that we stopped exploring the tree-from-that-node. I think it would be good to report the edge as "reverse-depth_limit" to make a note that a reverse edge should be indicated (we are done exploring that node), but that it isn't the same as a normal "reverse" edge. I'll put together a PR for this.

In the meantime, a workaround is for you to track the depth yourself, and when a "forward" edge reaches the depth_limit, the corresponding "reverse" edge is guaranteed to immediately follow that "forward" edge. So, you don't have to enter blacklisted mode at all. You know that no subtree nodes will be reported.
Something like:

if etype == "forward":
    if v in excluded_nodes:
        if depth < depth_limit:  # no need to blacklist the subtree if depth == depth_limit
            blacklisted_subtree = True

@ikarsokolov
Copy link
Author

@dschult I tried your patch and it works exactly like I wanted.
Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging a pull request may close this issue.

2 participants