New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH : added sort_neighbors
to all functions in depth_first_search.py
#7196
Conversation
sort_children
to all functions in depth_first_search.py
sort_children
to all functions in depth_first_search.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a nice idea -- we have the feature in other places, but it fits well here too. !
Thanks!
Is the sort_children
function supposed to return an iterator, or an iterable?
The docs say iterator, but the code looks built for an iterable in that it takes iter
of the returned value. I'm not sure which is the better choice here -- what do the other functions that take a sorting neighbors function do?
Another approach for the default value is to make it change to an iterator.
if sort_children is None:
sort_children = iter
...
stack = [(start, sort_children(G[start]))]
That approach kind of requires that the return type be an iterator and not an iterable.
Think about what makes most sense to you.
Thanks for the review.
All the functions with # from breadth_first_search.py
if sort_neighbors is not None:
_neighbors = neighbors
neighbors = lambda node: iter(sort_neighbors(_neighbors(node))) But, converting an iterable to an iterator doesn't make much sense because the computation has already been done(while creating the iterable) and then we convert it into an iterator so that we can do the computations(again) later. But, we can say that it will be more memory efficient. I couldn't find a direct way of checking if a function returns an iterator or an iterable. Please let me know your thoughts on this. Right now, I've made the dfs functions so that if sort_children returns an iterable it is converted into iterator, like in bfs functions. And the documentation says Please let me know if I should add a note to clarify that if it returns an iterable then it would be converted to an iterator internally but it will be more efficient if Thank you |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm generally +1 on this, and I think the implementation preserves generators as desired (see inline comments).
One high-level question I had: should the name be sort_neighbors
instead of sort_children
? I think this is actually a more precise description of what is being ordered, and would also be consistent with the bfs functions.
get_children = G.neighbors | ||
if sort_children is not None and callable(sort_children): | ||
get_children = lambda node: iter(sort_children(G.neighbors(node))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW I think this makes sense. From the other discussion in the PR, it seems like there's a bit of confusion on whether get_children
should be an iterable or an iterator. IIUC, the confusion seems to arise from the fact that iterators (e.g. generators) are iterables (but not all iterables are iterators!).
So - if sort_children
here were e.g. a generator, then this code would generate a function get_children
that returned a generator (i.e. iterable) as desired.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the sort_children
function returns an iterable(e.g. list) then when we call the get_children
later, then sort_children(G.neighbors(node))
will first do all the computation for sorting and return an iterable, and then it will be converted into an iterator by iter()
function. But, when the sort_children
returns an iterator the computation will be done here for child in children:
, when the children
is used. So, that's why I think it would be slightly better if sort_children
returns an iterator, as compared to when it will return an iterable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the sort_neighbors
inputs within the BFS module, and the way the code here works, I think sort_children
should be called sort_neighbors
here even though other parts of the code call the neighbor a child or children. Those are internal names (I hope) so don't impact outside the function, and in the broader literature child and parent are not consistently one direction or the other in a directed graph. So, neighbor matches the BFS code within NetworkX and avoids the confusion of the words child and parent.
For iterator vs iterable, I think the code is fine, but the docs don’t match what the code does. In particular:
- sort_neighbors (counter to some doc_strings in BFS) is given an iterator as input -- not an iterable. The docs are pretty clear that it should return an iterator -- but its use allows an iterable. It seems like we should change the docs to match what the code enforces/assumes.
- For both the BFS and DFS it seems good to have the docs say that the function takes an iterator of nodes as input. That is what we do when we apply the function to
G.neighbors
orG.predecessors
. - For both the BFS and DFS it seems good to have the docs say that the function should return an iterable, since that is what we require. (we could ask for it to return an iterator and remove all our calls to
iter(sort_...
but that would be backward incompatible with only very marginal speedup of the code. Memory use is hard to know because thesort_neighbors
function could easily construct a list from it's input, sort that list and then return an iterator over the list. So the whole "iterator" memory advantage is likely lost whenever sorting is involved.)
The important parts of this discussion (in my view) are:
- the input to the function is an iterator (can only be passed through once after which it is exhausted).
- we take
iter(...)
of the output so it can be an iterable (like a list) even though it allows memory use reduction if the function is "thin" and returns an iterator. - the docs should reflect items 1) and 2), and I believe they currently don't.
I chose to go with |
sort_children
to all functions in depth_first_search.py
sort_neighbors
to all functions in depth_first_search.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This review adds a couple of small tweaks to what was said above.
Let me know if there are questions.
[Edit:
Two more ideas (of course after I have press the button for the review)
- Is there a reason the parameter descriptions are different in BFS from those in DFS? Should we make them all the same?
- It might be good to put in the docs the best/most common choice of function:
sorted
.
Maybe like:
A function that takes an iterator over nodes as the input, and
returns an iterable of the same nodes with a custom ordering.
A common choice is `sorted`.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally LGTM, thanks @Schefflera-Arboricola ! My main comment is that I think we should make the newly added kwarg keyword-only!
@@ -15,7 +15,7 @@ | |||
|
|||
|
|||
@nx._dispatchable | |||
def dfs_edges(G, source=None, depth_limit=None): | |||
def dfs_edges(G, source=None, depth_limit=None, sort_neighbors=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since these kwargs are new in the dfs fns, let's make them keyword only!
def dfs_edges(G, source=None, depth_limit=None, sort_neighbors=None): | |
def dfs_edges(G, source=None, depth_limit=None, *, sort_neighbors=None): |
@@ -101,7 +110,7 @@ def dfs_edges(G, source=None, depth_limit=None): | |||
|
|||
|
|||
@nx._dispatchable | |||
def dfs_tree(G, source=None, depth_limit=None): | |||
def dfs_tree(G, source=None, depth_limit=None, sort_neighbors=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def dfs_tree(G, source=None, depth_limit=None, sort_neighbors=None): | |
def dfs_tree(G, source=None, depth_limit=None, *, sort_neighbors=None): |
d[s].append(t) | ||
return dict(d) | ||
|
||
|
||
@nx._dispatchable | ||
def dfs_postorder_nodes(G, source=None, depth_limit=None): | ||
def dfs_postorder_nodes(G, source=None, depth_limit=None, sort_neighbors=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def dfs_postorder_nodes(G, source=None, depth_limit=None, sort_neighbors=None): | |
def dfs_postorder_nodes(G, source=None, depth_limit=None, *, sort_neighbors=None): |
return (v for u, v, d in edges if d == "reverse") | ||
|
||
|
||
@nx._dispatchable | ||
def dfs_preorder_nodes(G, source=None, depth_limit=None): | ||
def dfs_preorder_nodes(G, source=None, depth_limit=None, sort_neighbors=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def dfs_preorder_nodes(G, source=None, depth_limit=None, sort_neighbors=None): | |
def dfs_preorder_nodes(G, source=None, depth_limit=None, *, sort_neighbors=None): |
return (v for u, v, d in edges if d == "forward") | ||
|
||
|
||
@nx._dispatchable | ||
def dfs_labeled_edges(G, source=None, depth_limit=None): | ||
def dfs_labeled_edges(G, source=None, depth_limit=None, sort_neighbors=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def dfs_labeled_edges(G, source=None, depth_limit=None, sort_neighbors=None): | |
def dfs_labeled_edges(G, source=None, depth_limit=None, *, sort_neighbors=None): |
Co-authored-by: Ross Barnowski <rossbar@berkeley.edu>
@rossbar I don't think I understand the motivation for doing this. And will we let users use Thanks! |
Thanks for the question! The main motivation is to enforce a policy that improves readability of code. For example, without keyword only arguments a user could write the following: my_edges = nx.dfs_edges(G, 0, 2, sorted) which will execute just fine. However, to someone reading this code, they might be confused about what all these function parameters represent and would be forced to look them up in the docs to really understand what's going on. However, with the keyword-only restriction, the last parameter will have to be passed in via keyword, which arguably makes the code more readable: my_edges = nx.dfs_edges(G, 0, 2, sort_neighbors=sorted) Of course, it'd be better if the other two parameters were passed in with keywords too, but if we were to force them to be keyword-only that would break backwards compatibility. Since the
No, the intent would be to keep this policy in perpetuity. Again the only reason we limit ourselves to the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Schefflera-Arboricola , this LGTM!
…py` (networkx#7196) Added sort_children to all dfs funcs, added dfs tests, updated links in docs of dfs. Also updated the parameter description for sort_neighbors in all traversal functions that support it --------- Co-authored-by: Ross Barnowski <rossbar@caltech.edu>
[UPDATED]
ref #7132
sort_neighbors
to all functions indepth_first_search.py
sort_neighbors
docs inbreadth_first_search.py
callable
(used for ignoring invalidsort_neighbors
values) in both bfs and dfs functionsThank you :)