Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Include singleton/trivial paths in all_simple_paths & other functions #6694

Merged
merged 10 commits into from Nov 28, 2023

Conversation

plammens
Copy link
Contributor

@plammens plammens commented May 16, 2023

The source being one of the targets shouldn't automatically exclude all other solutions.

Closes #6690, closes #6732

@plammens plammens marked this pull request as ready for review May 21, 2023 08:49
@plammens plammens marked this pull request as draft May 21, 2023 18:24
@plammens plammens force-pushed the fix/all-simple-paths-source-in-targets branch 2 times, most recently from 4df2485 to b7aefce Compare June 4, 2023 13:30
@plammens plammens marked this pull request as ready for review June 4, 2023 13:33
@plammens plammens force-pushed the fix/all-simple-paths-source-in-targets branch from b7aefce to d75e92d Compare June 4, 2023 13:41
@plammens plammens changed the title fix: all_simple_paths when source is in targets fix: Include singleton/trivial paths in all_simple_paths & other functions Jun 4, 2023
@plammens
Copy link
Contributor Author

plammens commented Jun 4, 2023

While fixing some of the newly tested issues, I found it more convenient to refactor the existing code. Now both all_simple_paths and all_simple_edge_paths use the same core implementation, both for regular graphs and multigraphs, so that the maintenance effort is brought down from 3 different implementations (and 2 public functions with input sanitization) to just 1 (with sanitization only defined in one of the two functions to avoid repetition).

While this PR is relatively large, I think all of these changes belong together and that it would be quite difficult to break it down into completely independent, self-coherent PRs.

@plammens
Copy link
Contributor Author

plammens commented Jun 4, 2023

This last commit (0adbfc6) is optional.

It's so that the size of current_path and the stack are always the same, to avoid having to constantly check if current_path is non-empty before popping, making it marginally more efficient. But perhaps it makes it slightly more ugly in some regards.

Copy link
Member

@dschult dschult left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this!

  • I have two minor suggestions for readability of tests below.
  • We don't use typing in NetworkX, so could that be removed?
  • I'm still working through the code itself. Can you explain how current_path works. Is the incoming edge to node an edge between the previous node in the dict's order and node? I get that you are storing it as the value in this dict. But I'm trying to understand how the dict structure stores the path. And perhaps a related point: I think of finding/storing the path to be more efficient than storing the edge-path. Is there a reason you are forming the edge-path and then extracting the path, rather than constructing the path and then extracting the edge-path? (perhaps related to keys?)

networkx/algorithms/tests/test_simple_paths.py Outdated Show resolved Hide resolved
networkx/algorithms/tests/test_simple_paths.py Outdated Show resolved Hide resolved
@plammens plammens force-pushed the fix/all-simple-paths-source-in-targets branch from c0d9dbd to e365e40 Compare June 26, 2023 11:41
@plammens
Copy link
Contributor Author

plammens commented Jun 26, 2023

  • I have two minor suggestions for readability of tests below.
  • We don't use typing in NetworkX, so could that be removed?

Done!

  • I'm still working through the code itself. Can you explain how current_path works.

It's a dictionary which maps the node in the path to the edge that was used to enter that node in the path. Moreover, since dictionaries preserve insertion order, it is also contains the sequence of edges that forms the path. For instance, in the path graph with two nodes, 0 and 1, the path from 0 to 1 would be represented in this dictionary as {0: (None, 0), 1: (0, 1)}. Except for a small detail I'm omitting: the dictionary always starts with the dummy pair None: None instead of source: None because of what the comments above say:

To avoid unnecessary checks, the loop is structured in a way such that a path is considered for yielding only after a new node/edge is added.
We bootstrap the search by adding a dummy iterator to the stack that only yields a dummy edge to source (so that the trivial path has a chance of being included).

Namely it allows the trivial path to be considered as part of the general loop instead of having to deal with it separately as a special case (a separate if statement). The dummy value also helps in keeping the size of the dictionary and the size of the iterators stack synchronized, so that if you can pop one you can always pop the other without having to check manually if it's empty.

So the path mentioned above would be represented as {None: None, 0: (None, 0), 1: (0, 1)}. To extract the edge path, we now just have to extract the sequence of values, removing the first two (the first because it's the dummy value, the second because there is no node that is used to enter the source, since we start at the source).

By the way, I added this dummy value in d7bd2f5 as an optional refactor, you can look at the commit diff to see the alternative (how it was before). I made this as a separate commit so that it can easily be dropped if you prefer it the other way.

Is the incoming edge to node an edge between the previous node in the dict's order and node?

Exactly.

And perhaps a related point: I think of finding/storing the path to be more efficient than storing the edge-path. Is there a reason you are forming the edge-path and then extracting the path, rather than constructing the path and then extracting the edge-path? (perhaps related to keys?)

I assume that you're asking why all_simple_paths is implemented in terms of all_simple_edge_paths and not the other way around:

for edge_path in all_simple_edge_paths(G, source, target, cutoff):
yield [source] + [edge[1] for edge in edge_path]

The reason for extracting the node path from the edge path rather than the other way around is, as you suspected, multigraphs: you can always extract the node path from an edge path, but you lose information if you try to extract the edge path from a node path because there might be more than one edge between nodes in the case of a multigraph. For instance, consider the multigraph with edges (0, 1, 'A'), (0, 1, 'B'). The only node path from 0 to 1 is [0, 1], but if we have to reconstruct the edge path it's not possible to know whether it was [(0, 1, 'A')] or [(0, 1, 'B')].

By doing it this way around it is possible to keep a single implementation for all cases, including multigraphs.

The reason for using a dictionary with nodes as keys (instead of e.g. storing just a list of edges), is so that when we're considering a new node to add to the path, we can check in O(1) time whether that node is already in the path.


Does this explanation make it clearer? Feel free to ask again if there is anything else that you'd like me to explain better. I'm open to suggestions if you think the code or the comments can be made clearer :)

@plammens plammens force-pushed the fix/all-simple-paths-source-in-targets branch 2 times, most recently from 9a8d8c2 to 997a9e3 Compare June 26, 2023 12:23
@plammens
Copy link
Contributor Author

There are some unrelated tests that are failing in CI that I don't know how to fix. I already rebased onto the latest master.

@rossbar
Copy link
Contributor

rossbar commented Jun 26, 2023

There are some unrelated tests that are failing in CI that I don't know how to fix. I already rebased onto the latest master.

I suspect this is due to the latest release of scipy (which came out yesterday). No need to worry about it in this PR, thanks for trying a rebase though!

@plammens plammens force-pushed the fix/all-simple-paths-source-in-targets branch from 90dd5f9 to 56750af Compare July 28, 2023 13:53
@plammens
Copy link
Contributor Author

@dschult Any updates on this? I've rebased again to see if the CI fails disappear.

Copy link
Member

@dschult dschult left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay/silence. I've looked through this again and I think we're all set to merge. I approve this PR.

Copy link
Member

@MridulS MridulS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @plammens !

@MridulS MridulS added this to the 3.3 milestone Nov 28, 2023
@MridulS MridulS merged commit 16480b1 into networkx:main Nov 28, 2023
38 checks passed
cvanelteren pushed a commit to cvanelteren/networkx that referenced this pull request Apr 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
4 participants