ENH : added `sort_neighbors` to all functions in `depth_first_search.py` #7196

Schefflera-Arboricola · 2023-12-29T16:42:37Z

[UPDATED]
ref #7132

added sort_neighbors to all functions in depth_first_search.py
added tests
updated links in docs
updated sort_neighbors docs in breadth_first_search.py
removed usage of callable(used for ignoring invalid sort_neighbors values) in both bfs and dfs functions

Thank you :)

…in docs of dfs

dschult

This is a nice idea -- we have the feature in other places, but it fits well here too. !
Thanks!

Is the sort_children function supposed to return an iterator, or an iterable?
The docs say iterator, but the code looks built for an iterable in that it takes iter of the returned value. I'm not sure which is the better choice here -- what do the other functions that take a sorting neighbors function do?

Another approach for the default value is to make it change to an iterator.

    if sort_children is None:
        sort_children = iter
    ...
    stack = [(start, sort_children(G[start]))]

That approach kind of requires that the return type be an iterator and not an iterable.
Think about what makes most sense to you.

Schefflera-Arboricola · 2024-01-01T11:36:29Z

Thanks for the review.

what do the other functions that take a sorting neighbors function do?

All the functions with sort_neighbors as kwarg in breadth_first_search.py have in their documentation iterator but they don't give an error even when an iterable is returned by the sort_neighbors function. The generic_bfs_edges function converts the output from the sort_neighbors function into an iterator, and all the other functions in the file use this function only.

# from breadth_first_search.py
if sort_neighbors is not None:
  _neighbors = neighbors
  neighbors = lambda node: iter(sort_neighbors(_neighbors(node)))

But, converting an iterable to an iterator doesn't make much sense because the computation has already been done(while creating the iterable) and then we convert it into an iterator so that we can do the computations(again) later. But, we can say that it will be more memory efficient.

I couldn't find a direct way of checking if a function returns an iterator or an iterable. Please let me know your thoughts on this. Right now, I've made the dfs functions so that if sort_children returns an iterable it is converted into iterator, like in bfs functions. And the documentation says iterator only.

Please let me know if I should add a note to clarify that if it returns an iterable then it would be converted to an iterator internally but it will be more efficient if sort_children returns an iterator.

Thank you

rossbar

I'm generally +1 on this, and I think the implementation preserves generators as desired (see inline comments).

One high-level question I had: should the name be sort_neighbors instead of sort_children? I think this is actually a more precise description of what is being ordered, and would also be consistent with the bfs functions.

rossbar · 2024-01-05T23:55:32Z

networkx/algorithms/traversal/depth_first_search.py

+    get_children = G.neighbors
+    if sort_children is not None and callable(sort_children):
+        get_children = lambda node: iter(sort_children(G.neighbors(node)))


FWIW I think this makes sense. From the other discussion in the PR, it seems like there's a bit of confusion on whether get_children should be an iterable or an iterator. IIUC, the confusion seems to arise from the fact that iterators (e.g. generators) are iterables (but not all iterables are iterators!).

So - if sort_children here were e.g. a generator, then this code would generate a function get_children that returned a generator (i.e. iterable) as desired.

if the sort_children function returns an iterable(e.g. list) then when we call the get_children later, then sort_children(G.neighbors(node)) will first do all the computation for sorting and return an iterable, and then it will be converted into an iterator by iter() function. But, when the sort_children returns an iterator the computation will be done here for child in children:, when the children is used. So, that's why I think it would be slightly better if sort_children returns an iterator, as compared to when it will return an iterable.

Looking at the sort_neighbors inputs within the BFS module, and the way the code here works, I think sort_children should be called sort_neighbors here even though other parts of the code call the neighbor a child or children. Those are internal names (I hope) so don't impact outside the function, and in the broader literature child and parent are not consistently one direction or the other in a directed graph. So, neighbor matches the BFS code within NetworkX and avoids the confusion of the words child and parent.

For iterator vs iterable, I think the code is fine, but the docs don’t match what the code does. In particular:

sort_neighbors (counter to some doc_strings in BFS) is given an iterator as input -- not an iterable. The docs are pretty clear that it should return an iterator -- but its use allows an iterable. It seems like we should change the docs to match what the code enforces/assumes.

For both the BFS and DFS it seems good to have the docs say that the function takes an iterator of nodes as input. That is what we do when we apply the function to G.neighbors or G.predecessors.

For both the BFS and DFS it seems good to have the docs say that the function should return an iterable, since that is what we require. (we could ask for it to return an iterator and remove all our calls to iter(sort_... but that would be backward incompatible with only very marginal speedup of the code. Memory use is hard to know because thesort_neighbors function could easily construct a list from it's input, sort that list and then return an iterator over the list. So the whole "iterator" memory advantage is likely lost whenever sorting is involved.)

The important parts of this discussion (in my view) are:

the input to the function is an iterator (can only be passed through once after which it is exhausted).

we take iter(...) of the output so it can be an iterable (like a list) even though it allows memory use reduction if the function is "thin" and returns an iterator.

the docs should reflect items 1) and 2), and I believe they currently don't.

Schefflera-Arboricola · 2024-01-07T16:53:19Z

One high-level question I had: should the name be sort_neighbors instead of sort_children? I think this is actually a more precise description of what is being ordered, and would also be consistent with the bfs functions.

I chose to go with sort_children because all the variable names in dfs functions were like child, children etc.

dschult

This review adds a couple of small tweaks to what was said above.
Let me know if there are questions.

[Edit:
Two more ideas (of course after I have press the button for the review)

Is there a reason the parameter descriptions are different in BFS from those in DFS? Should we make them all the same?
It might be good to put in the docs the best/most common choice of function: sorted.
Maybe like:

        A function that takes an iterator over nodes as the input, and
        returns an iterable of the same nodes with a custom ordering.
        A common choice is `sorted`.

networkx/algorithms/traversal/breadth_first_search.py

networkx/algorithms/traversal/depth_first_search.py

dschult

Thanks!

rossbar

Generally LGTM, thanks @Schefflera-Arboricola ! My main comment is that I think we should make the newly added kwarg keyword-only!

rossbar · 2024-01-17T18:16:16Z

networkx/algorithms/traversal/depth_first_search.py

@@ -15,7 +15,7 @@


 @nx._dispatchable
-def dfs_edges(G, source=None, depth_limit=None):
+def dfs_edges(G, source=None, depth_limit=None, sort_neighbors=None):


Since these kwargs are new in the dfs fns, let's make them keyword only!

Suggested change

def dfs_edges(G, source=None, depth_limit=None, sort_neighbors=None):

def dfs_edges(G, source=None, depth_limit=None, *, sort_neighbors=None):

networkx/algorithms/traversal/depth_first_search.py

rossbar · 2024-01-17T18:20:49Z

networkx/algorithms/traversal/depth_first_search.py

@@ -101,7 +110,7 @@ def dfs_edges(G, source=None, depth_limit=None):


 @nx._dispatchable
-def dfs_tree(G, source=None, depth_limit=None):
+def dfs_tree(G, source=None, depth_limit=None, sort_neighbors=None):


Suggested change

def dfs_tree(G, source=None, depth_limit=None, sort_neighbors=None):

def dfs_tree(G, source=None, depth_limit=None, *, sort_neighbors=None):

rossbar · 2024-01-17T18:22:40Z

networkx/algorithms/traversal/depth_first_search.py

        d[s].append(t)
    return dict(d)


 @nx._dispatchable
-def dfs_postorder_nodes(G, source=None, depth_limit=None):
+def dfs_postorder_nodes(G, source=None, depth_limit=None, sort_neighbors=None):


Suggested change

def dfs_postorder_nodes(G, source=None, depth_limit=None, sort_neighbors=None):

def dfs_postorder_nodes(G, source=None, depth_limit=None, *, sort_neighbors=None):

rossbar · 2024-01-17T18:22:59Z

networkx/algorithms/traversal/depth_first_search.py

    return (v for u, v, d in edges if d == "reverse")


 @nx._dispatchable
-def dfs_preorder_nodes(G, source=None, depth_limit=None):
+def dfs_preorder_nodes(G, source=None, depth_limit=None, sort_neighbors=None):


Suggested change

def dfs_preorder_nodes(G, source=None, depth_limit=None, sort_neighbors=None):

def dfs_preorder_nodes(G, source=None, depth_limit=None, *, sort_neighbors=None):

rossbar · 2024-01-17T18:24:21Z

networkx/algorithms/traversal/depth_first_search.py

    return (v for u, v, d in edges if d == "forward")


 @nx._dispatchable
-def dfs_labeled_edges(G, source=None, depth_limit=None):
+def dfs_labeled_edges(G, source=None, depth_limit=None, sort_neighbors=None):


Suggested change

def dfs_labeled_edges(G, source=None, depth_limit=None, sort_neighbors=None):

def dfs_labeled_edges(G, source=None, depth_limit=None, *, sort_neighbors=None):

Co-authored-by: Ross Barnowski <rossbar@berkeley.edu>

Schefflera-Arboricola · 2024-01-21T17:28:23Z

My main comment is that I think we should make the newly added kwarg keyword-only!

@rossbar I don't think I understand the motivation for doing this. And will we let users use sort_neighbors as a positional arg after 2-3 version releases... when it's old?

Thanks!

rossbar · 2024-01-21T20:42:22Z

I don't think I understand the motivation for doing this.

Thanks for the question! The main motivation is to enforce a policy that improves readability of code. For example, without keyword only arguments a user could write the following:

my_edges = nx.dfs_edges(G, 0, 2, sorted)

which will execute just fine. However, to someone reading this code, they might be confused about what all these function parameters represent and would be forced to look them up in the docs to really understand what's going on.

However, with the keyword-only restriction, the last parameter will have to be passed in via keyword, which arguably makes the code more readable:

my_edges = nx.dfs_edges(G, 0, 2, sort_neighbors=sorted)

Of course, it'd be better if the other two parameters were passed in with keywords too, but if we were to force them to be keyword-only that would break backwards compatibility. Since the sort_neighbors parameter is new for all the DFS functions, we have an opportunity to enforce this policy without breaking any compatibility!

And will we let users use sort_neighbors as a positional arg after 2-3 version releases... when it's old?

No, the intent would be to keep this policy in perpetuity. Again the only reason we limit ourselves to the sort_neighbors parameter is because it's a new parameter, so we can make it keyword-only without breaking any existing code!

rossbar

Thanks @Schefflera-Arboricola , this LGTM!

Schefflera-Arboricola · 2024-01-24T18:12:02Z

Thanks @rossbar and @dschult for reviewing!

…py` (networkx#7196) Added sort_children to all dfs funcs, added dfs tests, updated links in docs of dfs. Also updated the parameter description for sort_neighbors in all traversal functions that support it --------- Co-authored-by: Ross Barnowski <rossbar@caltech.edu>

added sort_children to all dfs funcs, added dfs tests, updated links …

78201d7

…in docs of dfs

Schefflera-Arboricola changed the title ~~added sort_children to all functions in depth_first_search.py~~ ENH : added sort_children to all functions in depth_first_search.py Dec 29, 2023

dschult added the type: Enhancements label Dec 30, 2023

dschult reviewed Dec 30, 2023

View reviewed changes

Schefflera-Arboricola added 3 commits January 1, 2024 17:07

updated sorting of children nodes- using iter

03ce2f7

bug : fixed get_children

bc44d9a

added notes to docs

9f9c27f

rossbar reviewed Jan 6, 2024

View reviewed changes

Schefflera-Arboricola and others added 6 commits January 8, 2024 12:28

Merge branch 'networkx:main' into dfs_sort

39d7943

changed sort_children to sort_neighbors

a09da19

removed note and updated sort_neighbors desc

2b5b256

updated sort_neighbors desc in bfs

ba825a0

style fix and typo fix

c323edd

style fix and updated dfs tests

5bd671b

Schefflera-Arboricola changed the title ~~ENH : added sort_children to all functions in depth_first_search.py~~ ENH : added sort_neighbors to all functions in depth_first_search.py Jan 8, 2024

dschult reviewed Jan 10, 2024

View reviewed changes

networkx/algorithms/traversal/breadth_first_search.py Outdated Show resolved Hide resolved

networkx/algorithms/traversal/depth_first_search.py Outdated Show resolved Hide resolved

Schefflera-Arboricola and others added 4 commits January 13, 2024 20:39

Merge branch 'main' into dfs_sort

4e9bef6

rm callable

052113e

rm sorted_children

632d696

updated sort_neighbors docs

67f2d84

dschult approved these changes Jan 14, 2024

View reviewed changes

Cosmetic changes to added tests.

09a18b1

rossbar reviewed Jan 17, 2024

View reviewed changes

making get_children def. concise

e5649e9

Co-authored-by: Ross Barnowski <rossbar@berkeley.edu>

Schefflera-Arboricola and others added 3 commits January 21, 2024 23:05

Merge branch 'networkx:main' into dfs_sort

ebe17d8

making get_children def concise

9860cd6

style fix

0afea09

Schefflera-Arboricola added 3 commits January 22, 2024 14:11

style fix

3197aa8

converting sort_neighbors keyword arg only

0d72845

style fix

f650c2c

rossbar approved these changes Jan 24, 2024

View reviewed changes

rossbar merged commit d6569ad into networkx:main Jan 24, 2024
38 of 39 checks passed

jarrodmillman added this to the 3.3 milestone Jan 24, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ENH : added `sort_neighbors` to all functions in `depth_first_search.py` #7196

ENH : added `sort_neighbors` to all functions in `depth_first_search.py` #7196

Schefflera-Arboricola commented Dec 29, 2023 •

edited

dschult left a comment

Schefflera-Arboricola commented Jan 1, 2024 •

edited

rossbar left a comment

rossbar Jan 5, 2024

Schefflera-Arboricola Jan 7, 2024 •

edited

dschult Jan 7, 2024 •

edited

Schefflera-Arboricola commented Jan 7, 2024

dschult left a comment •

edited

dschult left a comment

rossbar left a comment

rossbar Jan 17, 2024

rossbar Jan 17, 2024

rossbar Jan 17, 2024

rossbar Jan 17, 2024

rossbar Jan 17, 2024

Schefflera-Arboricola commented Jan 21, 2024

rossbar commented Jan 21, 2024

rossbar left a comment

Schefflera-Arboricola commented Jan 24, 2024

	def dfs_edges(G, source=None, depth_limit=None, sort_neighbors=None):
	def dfs_edges(G, source=None, depth_limit=None, *, sort_neighbors=None):

	def dfs_tree(G, source=None, depth_limit=None, sort_neighbors=None):
	def dfs_tree(G, source=None, depth_limit=None, *, sort_neighbors=None):

	def dfs_postorder_nodes(G, source=None, depth_limit=None, sort_neighbors=None):
	def dfs_postorder_nodes(G, source=None, depth_limit=None, *, sort_neighbors=None):

	def dfs_preorder_nodes(G, source=None, depth_limit=None, sort_neighbors=None):
	def dfs_preorder_nodes(G, source=None, depth_limit=None, *, sort_neighbors=None):

	def dfs_labeled_edges(G, source=None, depth_limit=None, sort_neighbors=None):
	def dfs_labeled_edges(G, source=None, depth_limit=None, *, sort_neighbors=None):

ENH : added sort_neighbors to all functions in depth_first_search.py #7196

ENH : added sort_neighbors to all functions in depth_first_search.py #7196

Conversation

Schefflera-Arboricola commented Dec 29, 2023 • edited

dschult left a comment

Choose a reason for hiding this comment

Schefflera-Arboricola commented Jan 1, 2024 • edited

rossbar left a comment

Choose a reason for hiding this comment

rossbar Jan 5, 2024

Choose a reason for hiding this comment

Schefflera-Arboricola Jan 7, 2024 • edited

Choose a reason for hiding this comment

dschult Jan 7, 2024 • edited

Choose a reason for hiding this comment

Schefflera-Arboricola commented Jan 7, 2024

dschult left a comment • edited

Choose a reason for hiding this comment

dschult left a comment

Choose a reason for hiding this comment

rossbar left a comment

Choose a reason for hiding this comment

rossbar Jan 17, 2024

Choose a reason for hiding this comment

rossbar Jan 17, 2024

Choose a reason for hiding this comment

rossbar Jan 17, 2024

Choose a reason for hiding this comment

rossbar Jan 17, 2024

Choose a reason for hiding this comment

rossbar Jan 17, 2024

Choose a reason for hiding this comment

Schefflera-Arboricola commented Jan 21, 2024

rossbar commented Jan 21, 2024

rossbar left a comment

Choose a reason for hiding this comment

Schefflera-Arboricola commented Jan 24, 2024

ENH : added `sort_neighbors` to all functions in `depth_first_search.py` #7196

ENH : added `sort_neighbors` to all functions in `depth_first_search.py` #7196

Schefflera-Arboricola commented Dec 29, 2023 •

edited

Schefflera-Arboricola commented Jan 1, 2024 •

edited

Schefflera-Arboricola Jan 7, 2024 •

edited

dschult Jan 7, 2024 •

edited

dschult left a comment •

edited