-
-
Notifications
You must be signed in to change notification settings - Fork 5.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ticket 1861 #460
Ticket 1861 #460
Conversation
@jakevdp: looks ok? |
This looks really nice -- thanks for the submission. It's great to see this being improved! I haven't yet pulled and tested it, but if it passes the current unit tests I think it's fine. @timleslie - I wonder if you'd be willing to implement the algorithm for an undirected graph as well? The current one is a bit of a place-holder, and is pretty slow. What it would take is to create another version of the Is this something you'd be willing to tackle and add to this pull request? I don't think it would be much added effort, and the payoff would be huge. Thanks for looking at this -- again, very nice work here! |
I'll have a look into the undirected version and see what I can come up with. It appears that it doesn't suffer from the same memory limitations due to a recursive algorithm that the directed version did, but it could potentially be simplified/consolidated to achieve better speed and memory performance. |
Right - it didn't use the same algorithm: currently it works by constructing a full depth-first tree from each node. This produces the correct result, but is definitely not optimal. I put the current algorithm in as an easy place-holder when I first wrote the module, but never ended up implementing a faster version. I think the algorithm you implemented would be significantly faster and more memory-efficient. Again, thanks for looking into this! |
@jakevdp I've just pushed code for the undirected graph. As expected it's faster than the previous version and has the nice property of not needing to allocate any new arrays (beyond the transposed matrix and labels). This removes the need for 4*N worth of integers that the previous algorithm used. |
Nice! I'll try to do a detailed review of this soon... I'm traveling and preparing for PyCon this week, so it might take a few days before I get to it. |
np.ndarray[ITYPE_t, ndim=1, mode='c'] labels): | ||
""" | ||
Uses an iterative version of Tarjan's algorithm to find the strongly connected components | ||
of a directed graph represented as a sparse matrix (scipy.sparse.csc_matrix or scipy.sparse.csr_matrix). | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PEP8 suggests limiting code lines to 79 characters, and documentation to 72 characters - these lines should be shortened
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also several places below
Thanks for the fast fixes! I was just trying out the code, and comparing the results to other implementations. It's producing the correct results, and it's fast! One quick question: why are there fewer layers of loops in the undirected version? I would have thought the two implementations would be identical, except for the need to loop over two child arrays for the undirected graph. Is there a detail that I'm not seeing? |
By the way - sorry for letting this sit for so long! |
There's the same number of loops, but there's less conditional statements. This is because we don't have to do any work on the back tracking step of the depth first search. We can assign each node a label as we're traversing downwards, since the fact that we hit the node means it's part of the weakly connected component. |
Great! I think this looks ready to go. I'm +1 for merge. @pv, what do you think? |
BUG: sparse.csgraph: make connected_components_directed work on large graphs
Looks good, merged. |
backported to 0.12.x in 2cfd985 |
This pull request addresses ticket 1861
http://projects.scipy.org/scipy/ticket/1861
All tests pass and memory performance is orders of magnitude better.