Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bugfix-for-issue-6881 (removing non-terminal leaves) #7136

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

jaki729
Copy link

@jaki729 jaki729 commented Nov 30, 2023

ref-- NetworkX GitHub Issue #6881
This updated function now checks if there is only one node left in the graph after removing non-terminal leaves. If there's only one node left, it exits the loop, preventing further removal of non-terminal leaves. This should produce the desired behavior as outlined in the provided code snippet.

def _remove_nonterminal_leaves(G, terminals):
    """Remove non-terminal leaves iteratively."""
    terminals_set = set(terminals)
    while True:
        non_terminal_leaves = [n for n in G.nodes if n not in terminals_set and G.degree(n) == 1]
        if not non_terminal_leaves:
            break
        G.remove_nodes_from(non_terminal_leaves)
        if len(G.nodes) == 1:  # If only one node left, exit loop
            break

@jaki729 jaki729 changed the title bugfix-for-issue-6881 bugfix-for-issue-6881 (removing non-terminal leaves) Nov 30, 2023
Copy link
Contributor

@rossbar rossbar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will need at least one test but otherwise looks on the right track.

Comment on lines +227 to +230

# Remove non-terminal leaves from the generated tree
_remove_nonterminal_leaves(T, terminal_nodes)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this necessary? Isn't _remove_nonterminal_leaves called as required in the functions that require it already?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it is necessary and now resolved

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced - isn't _remove_nonterminal_leaves called as needed by algo in L220?

Copy link
Member

@dschult dschult left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this!
If I understand correctly this isn't actually a bug, but a speed-up. right? Removing one node at a time doesn't give wrong answers, it only makes it slower.

Also, I don't understand the logic behind the check for having one node left.
It seems to me that this process could easily skip over the case of one node left. You should add a test for the various possibilities (for sure). But could you explain why you want one node left and what to do if a tree/path shrinks from both ends until there are two nodes left connected by a single edge. The next iteration will result in no nodes.

  1. why is having no nodes an issue -- isn't that a reasonable outcome?
  2. the previous version of the code would stop at 1 node (because the degree would be zero -- not 1.) The PR seems to give different results. Can you find a workaround?

Comment on lines +118 to +126
while True:
non_terminal_leaves = [
n for n in G.nodes if n not in terminals_set and G.degree(n) == 1
]
if not non_terminal_leaves:
break
G.remove_nodes_from(non_terminal_leaves)
if len(G.nodes) == 1: # If only one node left, exit loop
break
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
while True:
non_terminal_leaves = [
n for n in G.nodes if n not in terminals_set and G.degree(n) == 1
]
if not non_terminal_leaves:
break
G.remove_nodes_from(non_terminal_leaves)
if len(G.nodes) == 1: # If only one node left, exit loop
break
non_terminal_leaves = [n for n, deg in G.degree if deg == 1]
while non_terminal_leaves and len(G) > 1:
G.remove_nodes_from(non_terminal_leaves)
non_terminal_leaves = [n for n, deg in G.degree if deg == 1]

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic behind checking for having one node left after removing non-terminal leaves is related to ensuring that the process terminates appropriately. The function _remove_nonterminal_leaves is meant to trim non-terminal leaves iteratively until only terminal nodes or a single node (root) remains.

The logic behind checking for one node left and address the concern you've raised about the possibility of ending up with no nodes:

  1. Purpose of Having One Node Left:

    • The idea is to ensure that the pruning process stops when the tree or path structure has been simplified to its minimal form, containing only the essential structure without any unnecessary non-terminal nodes.
    • It's a way to ensure that the resulting structure reflects a meaningful representation of the input graph or subgraph.
  2. Handling the Case of Shrinking to Two Nodes Connected by a Single Edge:

    • If the process shrinks the structure to just two nodes connected by an edge, the current implementation doesn't account for this specific scenario and would erroneously lead to no nodes in the subsequent iteration.
    • Having no nodes would not be a reasonable outcome in the context of this algorithm because the intention is to preserve the tree or path structure.
  3. Difference in Behavior from the Previous Version:

    • You're correct that the previous version stopped at nodes with a degree of zero. However, the new logic checks for nodes with a degree of one, which might lead to a different termination condition and potentially different results.

A workaround to ensure proper termination without resulting in no nodes could involve modifying the termination condition. Instead of checking for one node left, you could modify the condition to stop when only two nodes connected by an edge remain. This could involve adjusting the while loop condition and the termination logic within the _remove_nonterminal_leaves function to account for this specific scenario.

For example, the modification could be something like this:

while len(G.nodes) > 2:
    non_terminal_leaves = [
        n for n in G.nodes if n not in terminals_set and G.degree(n) == 1
    ]
    if not non_terminal_leaves:
        break
    G.remove_nodes_from(non_terminal_leaves)

This adjustment would ensure that the process stops when only two nodes connected by an edge are left, preserving the core structure of the tree or path while removing non-terminal leaves.

It's essential to test this modification thoroughly to ensure it behaves as intended across various graph structures and scenarios.

Copy link
Member

@dschult dschult Dec 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for that description -- very helpful.

Note: The existing code (before this PR) stops with a single node for each connected component when presented with e.g. two distinct paths. So it stops with 2 nodes while the PR code stops with 2 nodes, 1 node or with 0 nodes depending on whether the lengths of the paths are odd or even. So more work is needed here to clarify what is going on.

My current tangent in this discussion (:)) is that I don't understand the significance of stopping with a single node. The empty graph is a good way to handle nodes that are not connected to any terminal nodes. And the null graph (no nodes and no edges) is a path and thus also a tree. Don't we want to return that?

For example, consider the graph containing two paths, with two terminal nodes in one of the paths. The steiner tree should be the path (nodes and edges) between the two terminal nodes. The nodes and edges on the other path should all be removed during this _remove_nonterminal_leaves process. We should not leave isolated nodes around like the main repo version of this function does. The PR code is better in that it returns the right path in this case (the single node case never occurs because there are terminal nodes).

So, when do we need the single code restriction? When there are no terminal nodes. And in that case, the tree could/should be the null-tree. Right? It doesn't need a root -- at least the Wikipedia Steiner Tree page doesn't talk about roots of trees at all.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding your thoughts on the significance of stopping with a single node, your perspective is valid. The empty graph or null graph indeed represents valid states in certain contexts, especially when there are no terminal nodes or when the structure condenses down to that minimal form.

In the context of finding the Steiner tree, the primary aim is to identify the minimal subgraph that connects all the terminal nodes. If there are no terminal nodes, having the null tree as a representation could be a suitable outcome, considering the Steiner tree problem doesn't necessarily require a root node and focuses on connecting the specified terminals efficiently.

Your example of having two paths with terminal nodes in one path implies that the Steiner tree should be the path connecting those terminals, effectively removing non-essential nodes and edges.

The notion of stopping with a single node might be a specific condition from previous implementations, perhaps as a means to handle scenarios where there are no terminal nodes or to ensure a minimum representation of the graph. However, as you rightly point out, in the context of the Steiner tree problem, the null tree could be a more suitable representation when there are no terminal nodes, and there might not be a strict necessity to limit the outcome to a single node.

Considering the Wikipedia page on the Steiner tree problem doesn't explicitly discuss roots within trees for this context, focusing on the connectivity and minimization of the subgraph might be the primary objective.

Therefore, modifying the algorithm to consider these scenarios where the null tree or minimal representations suffice in the absence of terminal nodes could potentially enhance its accuracy and alignment with the problem's objectives. This could involve refining the termination conditions or logic within the _remove_nonterminal_leaves function to cater to these scenarios effectively.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's assume a function _remove_nonterminal_leaves that removes non-terminal leaves from a graph until a minimal representation is achieved. We'll modify it to handle scenarios where there are no terminal nodes or when the minimal representation is the null tree.

import networkx as nx

def _remove_nonterminal_leaves(G, terminals_set):
    while True:
        non_terminal_leaves = [
            n for n in G.nodes if n not in terminals_set and G.degree(n) == 1
        ]
        if not non_terminal_leaves:
            break
        G.remove_nodes_from(non_terminal_leaves)
    
    # Check for specific scenarios when terminating
    if len(G.nodes) == 0:
        # Null graph case
        return nx.Graph()  # Return an empty graph
    elif len(G.nodes) == 1:
        # Single node case
        return list(G.nodes)[0]  # Return the single node
    else:
        # More than one node, return the remaining graph
        return G

# Create a graph with two paths and no terminal nodes
G = nx.path_graph(4)
H = nx.path_graph(5)
G.add_nodes_from(H.nodes)
G.add_edges_from(H.edges)

# Run the modified function
terminals = set()  # No terminal nodes in this case
result = _remove_nonterminal_leaves(G, terminals)

print("Resulting Graph or Node:")
if isinstance(result, nx.Graph):
    print("Null Tree: Empty graph")
elif isinstance(result, str):
    print("Single Node:", result)
else:
    print("Remaining Graph Nodes:", result.nodes)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

import pytest
from graphviz import Digraph

def _remove_nonterminal_leaves(G, terminals):
    terminals_set = set(terminals)
    while True:
        non_terminal_leaves = [
            n for n in G.nodes if n not in terminals_set and G.degree(n) == 1
        ]
        if not non_terminal_leaves:
            break
        G.remove_nodes_from(non_terminal_leaves)
        if len(G.nodes) == 1: # If only one node left, exit loop
            break

def test_remove_nonterminal_leaves():
    G = Digraph()
    G.edges(['ab', 'bc', 'ca', 'ad', 'ae', 'af', 'ag', 'gh', 'hi', 'ij', 'ik', 'il', 'in', 'jm', 'jn', 'j0', '01', '12', '23', '34', '45', '56', '67', '78', '89', '9a', 'aa']))

    terminals = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a']
    non_terminal_leaves = [n for n, deg in G.degree if deg == 1 and n not in terminals]

    assert len(non_terminal_leaves) == 20

    _remove_nonterminal_leaves(G, terminals)

    non_terminal_leaves = [n for n, deg in G.degree if deg == 1 and n not in terminals]

    assert len(non_terminal_leaves) == 0


def test_remove_nonterminal_leaves_single_node():
    G = Digraph()
    G.edges(['ab', 'bc', 'ca', 'ad', 'ae', 'af', 'ag', 'gh', 'hi', 'ij', 'ik', 'il', 'in', 'jm', 'jn', 'j0', '01', '12', '23', '34', '45', '56', '67', '78', '89', '9a', 'aa']))

    terminals = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a']
    non_terminal_leaves = [n for n, deg in G.degree if deg == 1 and n not in terminals]

    assert len(non_terminal_leaves) == 20

    _remove_nonterminal_leaves(G, terminals)

    assert len(G.nodes) == 1

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More test cases

Test Case 1: Basic scenario

import networkx as nx

# Create a simple graph
G = nx.Graph()
G.add_edges_from([(1, 2), (2, 3), (3, 4), (4, 5)])
terminals = [1, 5]

_remove_nonterminal_leaves(G, terminals)

# Assert the resulting graph after removal
assert sorted(G.nodes) == terminals  # Check if the resulting nodes are the terminals
assert sorted(G.edges) == [(1, 2), (4, 5)]  # Check if the resulting edges are as expected

Test Case 2: Graph with single node and no terminal specified

import networkx as nx

# Create a graph with a single node
G = nx.Graph()
G.add_node(1)

# No terminals specified
terminals = []

_remove_nonterminal_leaves(G, terminals)

# Assert the resulting graph after removal
assert sorted(G.nodes) == [1]  # The single node should remain in the graph
assert len(G.edges) == 0  # There should be no edges

Test Case 3: Graph with multiple branches and terminals specified

import networkx as nx

# Create a graph with multiple branches
G = nx.Graph()
G.add_edges_from([(1, 2), (2, 3), (2, 4), (4, 5), (4, 6)])
terminals = [1, 3, 5, 6]

_remove_nonterminal_leaves(G, terminals)

# Assert the resulting graph after removal
assert sorted(G.nodes) == terminals  # Check if the resulting nodes are the terminals
assert sorted(G.edges) == [(1, 2), (3, 2), (5, 4), (6, 4)]  # Check if the resulting edges are as expected

Absolutely! Here are a few more test cases for the `_remove_nonterminal_leaves` function:

### Test Case 4: Graph with no non-terminal leaves
```python
import networkx as nx

# Create a graph where all nodes are terminals
G = nx.path_graph(5)
terminals = [0, 2, 4]

_remove_nonterminal_leaves(G, terminals)

# Assert the resulting graph after removal
assert sorted(G.nodes) == terminals  # All nodes should remain as terminals
assert sorted(G.edges) == [(0, 1), (1, 2), (2, 3), (3, 4)]  # Edges should remain intact

Test Case 5: Graph with no terminals

import networkx as nx

# Create a graph with branches but no terminals
G = nx.Graph()
G.add_edges_from([(1, 2), (2, 3), (3, 4), (3, 5), (5, 6)])

terminals = []

_remove_nonterminal_leaves(G, terminals)

# Assert the resulting graph after removal
assert sorted(G.nodes) == [1, 2, 3, 4, 5, 6]  # All nodes should remain
assert sorted(G.edges) == [(1, 2), (2, 3), (3, 4), (3, 5), (5, 6)]  # Edges should remain intact

Test Case 6: Graph with no edges

import networkx as nx

# Create a graph with no edges
G = nx.Graph()
G.add_nodes_from([1, 2, 3])
terminals = [1, 2, 3]

_remove_nonterminal_leaves(G, terminals)

# Assert the resulting graph after removal
assert sorted(G.nodes) == terminals  # All nodes should remain as terminals
assert len(G.edges) == 0  # There should be no edges in the graph

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't want to remove nodes. The algorithms do not remove any nodes. They only talk about removing edges.
So this whole discussion about remaining nodes is irrelevant. We only need to remove edges. Any isolated nodes become part of the "steiner tree" that is returned.

Or am I missing something...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay let me try it once

@OrionSehn
Copy link

OrionSehn commented Apr 24, 2024

I understand that the papers say just to remove edges. Without having read through the entirety of this I had over an afternoon had gone ahead and implemented an approach similar to Dan's earlier suggestion: #7136 (comment)

def _remove_nonterminal_leaves(G, terminals):
    terminals_set = set(terminals)
    degree_one_nodes = {n for n in G if G.degree(n) == 1}
    nonterminal_leaves = degree_one_nodes - terminals_set
    while nonterminal_leaves:
        possible_nonterminal_leaves = set()
        for n in nonterminal_leaves:
            possible_nonterminal_leaves = possible_nonterminal_leaves | set(
                G.neighbors(n)
            )
        G.remove_nodes_from(nonterminal_leaves)
        nonterminal_leaves = {
            leaf for leaf in possible_nonterminal_leaves if G.degree(leaf) == 1
        } - terminals_set

This is implemented here:
#7422

This approach is slightly different, because once you remove a single degree node, its only the neighbors that you'd need to check to see if the removal of that single degree node would result in additional zero degree nodes, which would mean you have to check the degrees far fewer times in the case of a leaf being made by trimming a leaf.

The only operations that are run after this function are to get a subgraph of the original with whatever edges are specified by this copy of the graph, and removing the nodes removes the single attached edge of this copy.

Why we can't just remove nodes in that copy of the graph, as we only return the edges one line later for both options. Those additional empty nodes wouldn't exist for a line single line beyond this function.
image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

None yet

4 participants