simplify stack in dfs #6366

Merged
merged 5 commits into networkx:main on May 27, 2023

Conversation

@Tortar (Contributor) commented Jan 15, 2023

The stack in dfs only needs to hold parent and children; the depth can be controlled from outside of it. This makes the function slightly better in terms of time and space, and in my opinion also readability :-)
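
A minimal sketch of the proposed shape, for readers skimming the thread (this is not the PR's exact diff; the function name, the depth_limit default, and the behaviour at the depth limit are simplifications made here for illustration):

    import networkx as nx

    def dfs_edges_sketch(G, source, depth_limit=None):
        # The stack holds only (parent, children-iterator) pairs; the current
        # depth lives in one local counter instead of in every stack entry.
        if depth_limit is None:
            depth_limit = len(G)
        visited = {source}
        depth_now = 0
        stack = [(source, iter(G[source]))]
        while stack:
            parent, children = stack[-1]
            try:
                child = next(children)
            except StopIteration:
                stack.pop()          # this branch is exhausted, retreat a level
                depth_now -= 1
                continue
            if child not in visited:
                yield parent, child
                visited.add(child)
                if depth_now < depth_limit:
                    stack.append((child, iter(G[child])))
                    depth_now += 1
                # at the limit the edge is still yielded but not expanded
                # (one of several possible depth-limit conventions)

    G = nx.path_graph(5)
    print(list(dfs_edges_sketch(G, source=0)))  # [(0, 1), (1, 2), (2, 3), (3, 4)]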

@rossbar (Contributor) left a comment

I personally find it harder to reason about with the depth tracked separately outside of the stack, but that's admittedly pretty subjective. Out of curiosity what's the primary motivation for the proposed change?
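
(Roughly, the state-in-the-stack shape being referred to looks like the sketch below; this is a from-memory approximation for contrast, not necessarily the exact code on main.)

    import networkx as nx

    def dfs_edges_state_in_stack(G, source, depth_limit):
        # Each stack entry carries its own remaining depth alongside the node
        # and its children iterator, so no separate counter is needed.
        visited = {source}
        stack = [(source, depth_limit, iter(G[source]))]
        while stack:
            parent, depth_now, children = stack[-1]
            try:
                child = next(children)
            except StopIteration:
                stack.pop()
                continue
            if child not in visited:
                yield parent, child
                visited.add(child)
                if depth_now > 1:
                    stack.append((child, depth_now - 1, iter(G[child])))

    G = nx.path_graph(4)
    print(list(dfs_edges_state_in_stack(G, source=0, depth_limit=len(G))))
    # [(0, 1), (1, 2), (2, 3)]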

@Tortar (Contributor, Author) commented Jan 21, 2023

From the speed point of view I measured anywhere from a tiny 3% improvement to a somewhat more relevant 15% in the best case, so not that much. I think that memory-wise the stack this way consumes probably 1/3 less space, which is good given that it is the second data structure in this method in terms of memory occupied. But all in all it is not much of a difference; the primary motivation was that I found it interesting to think about this optimization :D
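
(For anyone who wants to repeat such a measurement, a rough timing sketch; the graph size and repeat counts below are arbitrary choices, not the setup actually used here:)

    # run once on main and once on this PR's branch, then compare the numbers
    import timeit

    import networkx as nx

    G = nx.path_graph(100_000)
    best = min(timeit.repeat(lambda: list(nx.dfs_edges(G, source=0)), number=10, repeat=5))
    print(f"best of 5 repeats: {best:.3f} s per 10 traversals")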

@rossbar (Contributor) commented Jan 21, 2023

I think that memory-wise the stack this way consumes probably 1/3 less space

Well, at least 1/3 the number of objects, which is not the same as memory, since depth_now is just an integer. I was curious what this would look like in practice, so I tested with path_graph(10_000_000)¹ and used memray for memory profiling:

On main: [memray flamegraph screenshot: mem_main]

This PR: [memray flamegraph screenshot: mem_patch8]

So that's a 400 MB reduction in total usage and 300 fewer allocations for this call (which should help with performance), but accounting for the other changes it ends up being about 300 MB saving for dfs_edges (out of 6.8 GB, so ~4%).

Sorry for the bad screengrab quality. To reproduce (though you may want to use fewer nodes on systems with <16 GB RAM):

  1. Create a script, e.g. dfs_memtest.py

    import networkx as nx
    G = nx.path_graph(10_000_000)
    e = list(nx.dfs_edges(G, source=0))
  2. Follow the memray instructions to create the flamegraph
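
For what it's worth, the memray capture can also be driven from Python through its Tracker API instead of the memray run CLI; a minimal sketch (the output filename is an arbitrary choice):

    import networkx as nx
    from memray import Tracker

    # record allocations made while the traversal runs, then render the
    # resulting capture file with memray's flamegraph reporter
    with Tracker("dfs_memtest.bin"):
        G = nx.path_graph(10_000_000)
        e = list(nx.dfs_edges(G, source=0))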

Footnotes

  1. The reasoning here was that this graph would maximize the memory saving, as each node only ever has one child, and the nodes are also integers.

@Tortar (Contributor, Author) commented Jan 22, 2023

That's interesting, thanks for the flamegraph and the explanation of how to reproduce it! Do you think it is worth it? I don't consider the code less readable; in my opinion it is actually clearer with this edit at what level of the dfs we are.

@rossbar (Contributor) commented Jan 22, 2023

it is actually clearer with this edit

This is inherently subjective - there is no "right" way to do it. The reason I find it more readable is that the state (i.e. the current depth) is tracked with the current node, whereas the proposed change tracks the state separately. Though again, this is entirely subjective, so we're unlikely to make much progress discussing it further :)

A minor improvement to memory consumption is a more concrete justification for the proposed change. I'd like to hear what others think!

@Tortar (Contributor, Author) commented Jan 22, 2023

Indeed, I was too assertive; I meant to say that it is easier to track when we change levels in the dfs. But anyway, I agree that the objective motivation for the change has to be weighed against the small improvement in memory/time!

@ImHereForTheCookies (Contributor)

This makes sense to me. 👍

@boothby (Contributor) commented May 26, 2023

FWIW I agree that a local value is preferable to loading the stack down with longer state tuples. I prefer a slightly different idiom than the one used here:

    while stack:
        parent, children = stack[-1]
        try:
            child = next(children)
            ....
        except StopIteration:
            stack.pop()

# without try/except

    while stack:
        parent, children = stack[-1]
        for child in children:
            ...
            break
        else:
            stack.pop()

By avoiding exceptions, the for/break/else idiom tends to be a little faster.
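
To make that concrete, a small micro-benchmark of the two idioms in isolation (names and numbers here are illustrative only; the real effect depends on the Python version and the graph shape):

    import timeit

    def next_child_try(children):
        # try/except idiom: pay for an exception when the iterator is exhausted
        try:
            return next(children)
        except StopIteration:
            return None  # the caller would stack.pop() here

    def next_child_for(children):
        # for/break/else idiom: the body runs at most once
        for child in children:
            return child  # plays the role of the break in the DFS loop
        return None       # body never ran: iterator exhausted, caller pops

    empty = ()
    print(timeit.timeit(lambda: next_child_try(iter(empty)), number=1_000_000))
    print(timeit.timeit(lambda: next_child_for(iter(empty)), number=1_000_000))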

@boothby (Contributor) commented May 27, 2023

@Tortar I have implemented my suggestion here: https://github.com/boothby/networkx/tree/tortar-patch-8 -- I would push the changes to your repo but the pre-commit hooks are currently faulty in your branch (not your fault; easily resolved with git rebase -i main) -- after rebasing I cannot push to your branch without --force which is awfully impolite.

Note that we only need to break when we grow the stack -- the for/break/else pattern is most effective in dfs_labeled_edges, where non-tree edges are yielded "for free" within the iterator, resulting in fewer touches on the stack:

        while stack:
            parent, children = stack[-1]
            for child in children:
                if child in visited:
                    yield parent, child, "nontree"
                else:
                    yield parent, child, "forward"
                    visited.add(child)
                    if depth_now < depth_limit:
                        stack.append((child, iter(G[child])))
                        depth_now += 1
                        break
                    else:
                        yield parent, child, "reverse-depth_limit"
            else:
                stack.pop()
                depth_now -= 1
                if stack:
                    yield stack[-1][0], parent, "reverse"
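
For reference, a quick way to see the labels dfs_labeled_edges produces (the graph choice is arbitrary; a cycle is used so that both tree and non-tree edges appear):

    import networkx as nx

    G = nx.cycle_graph(4)
    for u, v, label in nx.dfs_labeled_edges(G, source=0):
        print(u, v, label)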

@Tortar (Contributor, Author) commented May 27, 2023

Thanks @boothby for the improvements! I will push them in a moment and make you a co-author of the pull :-)

@dschult (Member) left a comment

There is also #6714 which looks to fix #6479...

I think we can/should go ahead and merge these changes as a first step -- and then address the strange behavior seen with depth-limited search.

I'm approving this and I'll have @boothby press the green button. There are a lot of PRs related to this and the order may impact merge conflicts. Best to have control in those cases I guess. :}

Thanks!

@boothby boothby merged commit 8410c37 into networkx:main May 27, 2023
34 checks passed
@jarrodmillman jarrodmillman added this to the 3.2 milestone Jun 4, 2023
Alex-Markham pushed a commit to Alex-Markham/networkx that referenced this pull request Oct 13, 2023
* simplify stack in dfs

* Update depth_first_search.py

* Update depth_first_search.py

* Add suggestions

Co-authored-by: boothby <kelly.r.boothby@gmail.com>

* Update depth_first_search.py

---------

Co-authored-by: boothby <kelly.r.boothby@gmail.com>
dschult pushed a commit to BrunoBaldissera/networkx that referenced this pull request Oct 23, 2023
cvanelteren pushed a commit to cvanelteren/networkx that referenced this pull request Apr 22, 2024