fix: Include singleton/trivial paths in all_simple_paths & other functions #6694

plammens · 2023-05-16T21:36:21Z

The source being one of the targets shouldn't automatically exclude all other solutions.

Closes #6690, closes #6732

Waiting on a response to all_simple_paths returns empty generator when the source is one of the possible targets #6690 (comment) before marking as ready for review.
Report the single node paths (and show this in doc_string examples).
Add a test that these functions report single node paths.
Add tests to the other functions mentioned above to ensure that they treat single node paths as paths. Those tests may already exist. But if they don't it'd be good to add them.
For consistency, report empty edge paths in all_simple_edge_paths (pending discussion, see all_simple_paths returns empty generator when the source is one of the possible targets #6690 (comment))

plammens · 2023-06-04T16:06:55Z

While fixing some of the newly tested issues, I found it more convenient to refactor the existing code. Now both all_simple_paths and all_simple_edge_paths use the same core implementation, both for regular graphs and multigraphs, so that the maintenance effort is brought down from 3 different implementations (and 2 public functions with input sanitization) to just 1 (with sanitization only defined in one of the two functions to avoid repetition).

While this PR is relatively large, I think all of these changes belong together and that it would be quite difficult to break it down into completely independent, self-coherent PRs.

plammens · 2023-06-04T16:35:08Z

This last commit (0adbfc6) is optional.

It's so that the size of current_path and the stack are always the same, to avoid having to constantly check if current_path is non-empty before popping, making it marginally more efficient. But perhaps it makes it slightly more ugly in some regards.

dschult

Thanks for this!

I have two minor suggestions for readability of tests below.
We don't use typing in NetworkX, so could that be removed?
I'm still working through the code itself. Can you explain how current_path works. Is the incoming edge to node an edge between the previous node in the dict's order and node? I get that you are storing it as the value in this dict. But I'm trying to understand how the dict structure stores the path. And perhaps a related point: I think of finding/storing the path to be more efficient than storing the edge-path. Is there a reason you are forming the edge-path and then extracting the path, rather than constructing the path and then extracting the edge-path? (perhaps related to keys?)

networkx/algorithms/tests/test_simple_paths.py

plammens · 2023-06-26T12:07:10Z

I have two minor suggestions for readability of tests below.

We don't use typing in NetworkX, so could that be removed?

Done!

I'm still working through the code itself. Can you explain how current_path works.

It's a dictionary which maps the node in the path to the edge that was used to enter that node in the path. Moreover, since dictionaries preserve insertion order, it is also contains the sequence of edges that forms the path. For instance, in the path graph with two nodes, 0 and 1, the path from 0 to 1 would be represented in this dictionary as {0: (None, 0), 1: (0, 1)}. Except for a small detail I'm omitting: the dictionary always starts with the dummy pair None: None instead of source: None because of what the comments above say:

To avoid unnecessary checks, the loop is structured in a way such that a path is considered for yielding only after a new node/edge is added.
We bootstrap the search by adding a dummy iterator to the stack that only yields a dummy edge to source (so that the trivial path has a chance of being included).

Namely it allows the trivial path to be considered as part of the general loop instead of having to deal with it separately as a special case (a separate if statement). The dummy value also helps in keeping the size of the dictionary and the size of the iterators stack synchronized, so that if you can pop one you can always pop the other without having to check manually if it's empty.

So the path mentioned above would be represented as {None: None, 0: (None, 0), 1: (0, 1)}. To extract the edge path, we now just have to extract the sequence of values, removing the first two (the first because it's the dummy value, the second because there is no node that is used to enter the source, since we start at the source).

By the way, I added this dummy value in d7bd2f5 as an optional refactor, you can look at the commit diff to see the alternative (how it was before). I made this as a separate commit so that it can easily be dropped if you prefer it the other way.

Is the incoming edge to node an edge between the previous node in the dict's order and node?

Exactly.

And perhaps a related point: I think of finding/storing the path to be more efficient than storing the edge-path. Is there a reason you are forming the edge-path and then extracting the path, rather than constructing the path and then extracting the edge-path? (perhaps related to keys?)

I assume that you're asking why all_simple_paths is implemented in terms of all_simple_edge_paths and not the other way around:

networkx/networkx/algorithms/simple_paths.py

Lines 255 to 256 in e365e40

    
           for edge_path in all_simple_edge_paths(G, source, target, cutoff): 
        
               yield [source] + [edge[1] for edge in edge_path]

The reason for extracting the node path from the edge path rather than the other way around is, as you suspected, multigraphs: you can always extract the node path from an edge path, but you lose information if you try to extract the edge path from a node path because there might be more than one edge between nodes in the case of a multigraph. For instance, consider the multigraph with edges (0, 1, 'A'), (0, 1, 'B'). The only node path from 0 to 1 is [0, 1], but if we have to reconstruct the edge path it's not possible to know whether it was [(0, 1, 'A')] or [(0, 1, 'B')].

By doing it this way around it is possible to keep a single implementation for all cases, including multigraphs.

The reason for using a dictionary with nodes as keys (instead of e.g. storing just a list of edges), is so that when we're considering a new node to add to the path, we can check in O(1) time whether that node is already in the path.

Does this explanation make it clearer? Feel free to ask again if there is anything else that you'd like me to explain better. I'm open to suggestions if you think the code or the comments can be made clearer :)

plammens · 2023-06-26T12:29:19Z

There are some unrelated tests that are failing in CI that I don't know how to fix. I already rebased onto the latest master.

rossbar · 2023-06-26T12:40:20Z

There are some unrelated tests that are failing in CI that I don't know how to fix. I already rebased onto the latest master.

I suspect this is due to the latest release of scipy (which came out yesterday). No need to worry about it in this PR, thanks for trying a rebase though!

…tions

So that the size of current_path and the stack are always the same, to avoid having to check `if current_path` is non-empty before popping.K

Co-authored-by: Dan Schult <dschult@colgate.edu>

plammens · 2023-07-28T13:55:21Z

@dschult Any updates on this? I've rebased again to see if the CI fails disappear.

dschult

Sorry for the delay/silence. I've looked through this again and I think we're all set to merge. I approve this PR.

MridulS

Thanks @plammens !

…tions (networkx#6694)

plammens marked this pull request as ready for review May 21, 2023 08:49

plammens marked this pull request as draft May 21, 2023 18:24

dschult mentioned this pull request May 29, 2023

all_simple_paths returns empty generator when the source is one of the possible targets #6690

Closed

plammens mentioned this pull request Jun 3, 2023

all_simple_paths on MultiGraph fails when multiple targets are reached exactly at the cutoff #6732

Closed

plammens force-pushed the fix/all-simple-paths-source-in-targets branch 2 times, most recently from 4df2485 to b7aefce Compare June 4, 2023 13:30

plammens marked this pull request as ready for review June 4, 2023 13:33

plammens force-pushed the fix/all-simple-paths-source-in-targets branch from b7aefce to d75e92d Compare June 4, 2023 13:41

plammens changed the title ~~fix: all_simple_paths when source is in targets~~ fix: Include singleton/trivial paths in all_simple_paths & other functions Jun 4, 2023

dschult reviewed Jun 12, 2023

View reviewed changes

networkx/algorithms/tests/test_simple_paths.py Outdated Show resolved Hide resolved

networkx/algorithms/tests/test_simple_paths.py Outdated Show resolved Hide resolved

plammens force-pushed the fix/all-simple-paths-source-in-targets branch from c0d9dbd to e365e40 Compare June 26, 2023 11:41

plammens force-pushed the fix/all-simple-paths-source-in-targets branch 2 times, most recently from 9a8d8c2 to 997a9e3 Compare June 26, 2023 12:23

plammens and others added 10 commits July 28, 2023 15:53

test: Add test for networkx#6732

5b01716

refactor: Merge _all_simple_paths for Graph and MultiGraph

c94b10e

fix: Report singleton paths in all_simple_paths

440137f

refactor: Use yield from syntax and remove _empty_generator

0568155

tests: Add tests for singleton/trivial path to other path functions

8768e81

refactor: Merge all_simple_paths and all_simple_edge_paths implementa…

608e3fa

…tions

refactor: Add dummy node to start of current_path

0ac3a3e

So that the size of current_path and the stack are always the same, to avoid having to check `if current_path` is non-empty before popping.K

style: Apply readability suggestions from code review

d99477f

Co-authored-by: Dan Schult <dschult@colgate.edu>

style: Remove type annotations

957bb6e

docs: Add note regarding empty path to all_simple_edge_paths

56750af

plammens force-pushed the fix/all-simple-paths-source-in-targets branch from 90dd5f9 to 56750af Compare July 28, 2023 13:53

dschult approved these changes Nov 28, 2023

View reviewed changes

MridulS approved these changes Nov 28, 2023

View reviewed changes

MridulS added the type: Bug fix label Nov 28, 2023

MridulS added this to the 3.3 milestone Nov 28, 2023

MridulS merged commit 16480b1 into networkx:main Nov 28, 2023
38 checks passed

andrew-causalens mentioned this pull request Apr 10, 2024

Improvement: Bound networkx to less than 3.3 causalens/cai-causal-graph#66

Merged

17 tasks

cvanelteren pushed a commit to cvanelteren/networkx that referenced this pull request Apr 22, 2024

fix: Include singleton/trivial paths in all_simple_paths & other func…

ad09b66

…tions (networkx#6694)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: Include singleton/trivial paths in all_simple_paths & other functions #6694

fix: Include singleton/trivial paths in all_simple_paths & other functions #6694

plammens commented May 16, 2023 •

edited

plammens commented Jun 4, 2023

plammens commented Jun 4, 2023

dschult left a comment

plammens commented Jun 26, 2023 •

edited

plammens commented Jun 26, 2023

rossbar commented Jun 26, 2023

plammens commented Jul 28, 2023

dschult left a comment

MridulS left a comment

fix: Include singleton/trivial paths in all_simple_paths & other functions #6694

fix: Include singleton/trivial paths in all_simple_paths & other functions #6694

Conversation

plammens commented May 16, 2023 • edited

plammens commented Jun 4, 2023

plammens commented Jun 4, 2023

dschult left a comment

Choose a reason for hiding this comment

plammens commented Jun 26, 2023 • edited

plammens commented Jun 26, 2023

rossbar commented Jun 26, 2023

plammens commented Jul 28, 2023

dschult left a comment

Choose a reason for hiding this comment

MridulS left a comment

Choose a reason for hiding this comment

plammens commented May 16, 2023 •

edited

plammens commented Jun 26, 2023 •

edited