Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH : added sort_neighbors to all functions in depth_first_search.py #7196

Merged
merged 22 commits into from Jan 24, 2024

Conversation

Schefflera-Arboricola
Copy link
Contributor

@Schefflera-Arboricola Schefflera-Arboricola commented Dec 29, 2023

[UPDATED]
ref #7132

  1. added sort_neighbors to all functions in depth_first_search.py
  2. added tests
  3. updated links in docs
  4. updated sort_neighbors docs in breadth_first_search.py
  5. removed usage of callable(used for ignoring invalid sort_neighbors values) in both bfs and dfs functions

Thank you :)

@Schefflera-Arboricola Schefflera-Arboricola changed the title added sort_children to all functions in depth_first_search.py ENH : added sort_children to all functions in depth_first_search.py Dec 29, 2023
Copy link
Member

@dschult dschult left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nice idea -- we have the feature in other places, but it fits well here too. !
Thanks!

Is the sort_children function supposed to return an iterator, or an iterable?
The docs say iterator, but the code looks built for an iterable in that it takes iter of the returned value. I'm not sure which is the better choice here -- what do the other functions that take a sorting neighbors function do?

Another approach for the default value is to make it change to an iterator.

    if sort_children is None:
        sort_children = iter
    ...
    stack = [(start, sort_children(G[start]))]

That approach kind of requires that the return type be an iterator and not an iterable.
Think about what makes most sense to you.

@Schefflera-Arboricola
Copy link
Contributor Author

Schefflera-Arboricola commented Jan 1, 2024

Thanks for the review.

what do the other functions that take a sorting neighbors function do?

All the functions with sort_neighbors as kwarg in breadth_first_search.py have in their documentation iterator but they don't give an error even when an iterable is returned by the sort_neighbors function. The generic_bfs_edges function converts the output from the sort_neighbors function into an iterator, and all the other functions in the file use this function only.

# from breadth_first_search.py
if sort_neighbors is not None:
  _neighbors = neighbors
  neighbors = lambda node: iter(sort_neighbors(_neighbors(node)))

But, converting an iterable to an iterator doesn't make much sense because the computation has already been done(while creating the iterable) and then we convert it into an iterator so that we can do the computations(again) later. But, we can say that it will be more memory efficient.

I couldn't find a direct way of checking if a function returns an iterator or an iterable. Please let me know your thoughts on this. Right now, I've made the dfs functions so that if sort_children returns an iterable it is converted into iterator, like in bfs functions. And the documentation says iterator only.

Please let me know if I should add a note to clarify that if it returns an iterable then it would be converted to an iterator internally but it will be more efficient if sort_children returns an iterator.

Thank you

Copy link
Contributor

@rossbar rossbar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm generally +1 on this, and I think the implementation preserves generators as desired (see inline comments).

One high-level question I had: should the name be sort_neighbors instead of sort_children? I think this is actually a more precise description of what is being ordered, and would also be consistent with the bfs functions.

Comment on lines 88 to 90
get_children = G.neighbors
if sort_children is not None and callable(sort_children):
get_children = lambda node: iter(sort_children(G.neighbors(node)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I think this makes sense. From the other discussion in the PR, it seems like there's a bit of confusion on whether get_children should be an iterable or an iterator. IIUC, the confusion seems to arise from the fact that iterators (e.g. generators) are iterables (but not all iterables are iterators!).

So - if sort_children here were e.g. a generator, then this code would generate a function get_children that returned a generator (i.e. iterable) as desired.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the sort_children function returns an iterable(e.g. list) then when we call the get_children later, then sort_children(G.neighbors(node)) will first do all the computation for sorting and return an iterable, and then it will be converted into an iterator by iter() function. But, when the sort_children returns an iterator the computation will be done here for child in children:, when the children is used. So, that's why I think it would be slightly better if sort_children returns an iterator, as compared to when it will return an iterable.

Copy link
Member

@dschult dschult Jan 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the sort_neighbors inputs within the BFS module, and the way the code here works, I think sort_children should be called sort_neighbors here even though other parts of the code call the neighbor a child or children. Those are internal names (I hope) so don't impact outside the function, and in the broader literature child and parent are not consistently one direction or the other in a directed graph. So, neighbor matches the BFS code within NetworkX and avoids the confusion of the words child and parent.

For iterator vs iterable, I think the code is fine, but the docs don’t match what the code does. In particular:

  • sort_neighbors (counter to some doc_strings in BFS) is given an iterator as input -- not an iterable. The docs are pretty clear that it should return an iterator -- but its use allows an iterable. It seems like we should change the docs to match what the code enforces/assumes.
  • For both the BFS and DFS it seems good to have the docs say that the function takes an iterator of nodes as input. That is what we do when we apply the function to G.neighbors or G.predecessors.
  • For both the BFS and DFS it seems good to have the docs say that the function should return an iterable, since that is what we require. (we could ask for it to return an iterator and remove all our calls to iter(sort_... but that would be backward incompatible with only very marginal speedup of the code. Memory use is hard to know because thesort_neighbors function could easily construct a list from it's input, sort that list and then return an iterator over the list. So the whole "iterator" memory advantage is likely lost whenever sorting is involved.)

The important parts of this discussion (in my view) are:

  1. the input to the function is an iterator (can only be passed through once after which it is exhausted).
  2. we take iter(...) of the output so it can be an iterable (like a list) even though it allows memory use reduction if the function is "thin" and returns an iterator.
  3. the docs should reflect items 1) and 2), and I believe they currently don't.

@Schefflera-Arboricola
Copy link
Contributor Author

One high-level question I had: should the name be sort_neighbors instead of sort_children? I think this is actually a more precise description of what is being ordered, and would also be consistent with the bfs functions.

I chose to go with sort_children because all the variable names in dfs functions were like child, children etc.

@Schefflera-Arboricola Schefflera-Arboricola changed the title ENH : added sort_children to all functions in depth_first_search.py ENH : added sort_neighbors to all functions in depth_first_search.py Jan 8, 2024
Copy link
Member

@dschult dschult left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This review adds a couple of small tweaks to what was said above.
Let me know if there are questions.

[Edit:
Two more ideas (of course after I have press the button for the review)

  • Is there a reason the parameter descriptions are different in BFS from those in DFS? Should we make them all the same?
  • It might be good to put in the docs the best/most common choice of function: sorted.
    Maybe like:
        A function that takes an iterator over nodes as the input, and
        returns an iterable of the same nodes with a custom ordering.
        A common choice is `sorted`.

networkx/algorithms/traversal/breadth_first_search.py Outdated Show resolved Hide resolved
networkx/algorithms/traversal/depth_first_search.py Outdated Show resolved Hide resolved
Copy link
Member

@dschult dschult left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Copy link
Contributor

@rossbar rossbar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM, thanks @Schefflera-Arboricola ! My main comment is that I think we should make the newly added kwarg keyword-only!

@@ -15,7 +15,7 @@


@nx._dispatchable
def dfs_edges(G, source=None, depth_limit=None):
def dfs_edges(G, source=None, depth_limit=None, sort_neighbors=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since these kwargs are new in the dfs fns, let's make them keyword only!

Suggested change
def dfs_edges(G, source=None, depth_limit=None, sort_neighbors=None):
def dfs_edges(G, source=None, depth_limit=None, *, sort_neighbors=None):

networkx/algorithms/traversal/depth_first_search.py Outdated Show resolved Hide resolved
@@ -101,7 +110,7 @@ def dfs_edges(G, source=None, depth_limit=None):


@nx._dispatchable
def dfs_tree(G, source=None, depth_limit=None):
def dfs_tree(G, source=None, depth_limit=None, sort_neighbors=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def dfs_tree(G, source=None, depth_limit=None, sort_neighbors=None):
def dfs_tree(G, source=None, depth_limit=None, *, sort_neighbors=None):

d[s].append(t)
return dict(d)


@nx._dispatchable
def dfs_postorder_nodes(G, source=None, depth_limit=None):
def dfs_postorder_nodes(G, source=None, depth_limit=None, sort_neighbors=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def dfs_postorder_nodes(G, source=None, depth_limit=None, sort_neighbors=None):
def dfs_postorder_nodes(G, source=None, depth_limit=None, *, sort_neighbors=None):

return (v for u, v, d in edges if d == "reverse")


@nx._dispatchable
def dfs_preorder_nodes(G, source=None, depth_limit=None):
def dfs_preorder_nodes(G, source=None, depth_limit=None, sort_neighbors=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def dfs_preorder_nodes(G, source=None, depth_limit=None, sort_neighbors=None):
def dfs_preorder_nodes(G, source=None, depth_limit=None, *, sort_neighbors=None):

return (v for u, v, d in edges if d == "forward")


@nx._dispatchable
def dfs_labeled_edges(G, source=None, depth_limit=None):
def dfs_labeled_edges(G, source=None, depth_limit=None, sort_neighbors=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def dfs_labeled_edges(G, source=None, depth_limit=None, sort_neighbors=None):
def dfs_labeled_edges(G, source=None, depth_limit=None, *, sort_neighbors=None):

Co-authored-by: Ross Barnowski <rossbar@berkeley.edu>
@Schefflera-Arboricola
Copy link
Contributor Author

My main comment is that I think we should make the newly added kwarg keyword-only!

@rossbar I don't think I understand the motivation for doing this. And will we let users use sort_neighbors as a positional arg after 2-3 version releases... when it's old?

Thanks!

@rossbar
Copy link
Contributor

rossbar commented Jan 21, 2024

I don't think I understand the motivation for doing this.

Thanks for the question! The main motivation is to enforce a policy that improves readability of code. For example, without keyword only arguments a user could write the following:

my_edges = nx.dfs_edges(G, 0, 2, sorted)

which will execute just fine. However, to someone reading this code, they might be confused about what all these function parameters represent and would be forced to look them up in the docs to really understand what's going on.

However, with the keyword-only restriction, the last parameter will have to be passed in via keyword, which arguably makes the code more readable:

my_edges = nx.dfs_edges(G, 0, 2, sort_neighbors=sorted)

Of course, it'd be better if the other two parameters were passed in with keywords too, but if we were to force them to be keyword-only that would break backwards compatibility. Since the sort_neighbors parameter is new for all the DFS functions, we have an opportunity to enforce this policy without breaking any compatibility!

And will we let users use sort_neighbors as a positional arg after 2-3 version releases... when it's old?

No, the intent would be to keep this policy in perpetuity. Again the only reason we limit ourselves to the sort_neighbors parameter is because it's a new parameter, so we can make it keyword-only without breaking any existing code!

Copy link
Contributor

@rossbar rossbar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Schefflera-Arboricola , this LGTM!

@rossbar rossbar merged commit d6569ad into networkx:main Jan 24, 2024
38 of 39 checks passed
@jarrodmillman jarrodmillman added this to the 3.3 milestone Jan 24, 2024
@Schefflera-Arboricola
Copy link
Contributor Author

Thanks @rossbar and @dschult for reviewing!

cvanelteren pushed a commit to cvanelteren/networkx that referenced this pull request Apr 22, 2024
…py` (networkx#7196)

Added sort_children to all dfs funcs, added dfs tests, updated links in docs of dfs.

Also updated the parameter description for sort_neighbors in all traversal
functions that support it



---------

Co-authored-by: Ross Barnowski <rossbar@caltech.edu>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

None yet

4 participants