Ticket 1861 #460

timleslie · 2013-03-08T09:58:32Z

This pull request addresses ticket 1861

http://projects.scipy.org/scipy/ticket/1861

All tests pass and memory performance is orders of magnitude better.

pv · 2013-03-10T19:36:39Z

jakevdp · 2013-03-11T00:54:41Z

This looks really nice -- thanks for the submission. It's great to see this being improved! I haven't yet pulled and tested it, but if it passes the current unit tests I think it's fine.

@timleslie - I wonder if you'd be willing to implement the algorithm for an undirected graph as well? The current one is a bit of a place-holder, and is pretty slow. What it would take is to create another version of the connected_components function which accepts the transposed version of indices and indptr. The loops through the indptr array would also loop through the second indptr array, checking those nodes as well (keeping in mind there could be some repeats).

Is this something you'd be willing to tackle and add to this pull request? I don't think it would be much added effort, and the payoff would be huge.

Thanks for looking at this -- again, very nice work here!
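A rough illustration of the idea described above, as a minimal pure-Python sketch rather than the Cython code in this pull request: the traversal follows both the CSR structure of the graph and the CSR structure of its transpose, so edges are effectively treated as undirected. The function name `weak_components` and the argument names `indptr_t` and `indices_t` are made up for this example; repeated neighbours are harmless because already-labelled nodes are skipped.

```python
def weak_components(n, indptr, indices, indptr_t, indices_t):
    """Label weakly connected components of an n-node graph.

    (indptr, indices) is the CSR structure of the graph and
    (indptr_t, indices_t) is the CSR structure of its transpose.
    Both names for the transposed arrays are assumptions of this sketch.
    """
    labels = [-1] * n          # -1 marks an unvisited node
    n_components = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = n_components
        stack = [start]
        while stack:
            node = stack.pop()
            # Follow out-edges first, then in-edges (via the transpose).
            for ptr, idx in ((indptr, indices), (indptr_t, indices_t)):
                for k in range(ptr[node], ptr[node + 1]):
                    child = idx[k]
                    if labels[child] == -1:
                        labels[child] = n_components
                        stack.append(child)
        n_components += 1
    return n_components, labels


# Edges: 0 -> 1, 2 -> 1, 3 -> 4
indptr, indices = [0, 1, 1, 2, 3, 3], [1, 1, 4]
# Transposed structure: 1 <- 0, 1 <- 2, 4 <- 3
indptr_t, indices_t = [0, 0, 2, 2, 2, 3], [0, 2, 3]
result = weak_components(5, indptr, indices, indptr_t, indices_t)
```

Here node 2 is reached only through the transposed arrays, since its sole edge points into node 1.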

timleslie · 2013-03-11T01:58:19Z

I'll have a look into the undirected version and see what I can come up with. It doesn't appear to suffer from the memory limitations that recursion caused in the directed version, but it could potentially be simplified or consolidated to achieve better speed and memory performance.

jakevdp · 2013-03-11T02:12:32Z

Right - it didn't use the same algorithm: currently it works by constructing a full depth-first tree from each node. This produces the correct result, but is definitely not optimal. I put the current algorithm in as an easy place-holder when I first wrote the module, but never ended up implementing a faster version. I think the algorithm you implemented would be significantly faster and more memory-efficient.

Again, thanks for looking into this!

timleslie · 2013-03-11T05:10:45Z

@jakevdp I've just pushed code for the undirected graph. As expected it's faster than the previous version and has the nice property of not needing to allocate any new arrays (beyond the transposed matrix and labels). This removes the need for 4*N worth of integers that the previous algorithm used.

jakevdp · 2013-03-11T13:55:44Z

Nice! I'll try to do a detailed review of this soon... I'm traveling and preparing for PyCon this week, so it might take a few days before I get to it.

jakevdp · 2013-03-22T02:23:05Z

scipy/sparse/csgraph/_traversal.pyx

+                                        np.ndarray[ITYPE_t, ndim=1, mode='c'] labels):
+    """
+    Uses an iterative version of Tarjan's algorithm to find the strongly connected components
+    of a directed graph represented as a sparse matrix (scipy.sparse.csc_matrix or scipy.sparse.csr_matrix).



PEP8 suggests limiting code lines to 79 characters and docstrings and comments to 72 characters; these lines should be shortened.

Also several places below


jakevdp · 2013-03-22T03:02:57Z

Thanks for the fast fixes! I was just trying out the code, and comparing the results to other implementations. It's producing the correct results, and it's fast! One quick question: why are there fewer layers of loops in the undirected version? I would have thought the two implementations would be identical, except for the need to loop over two child arrays for the undirected graph. Is there a detail that I'm not seeing?

jakevdp · 2013-03-22T03:03:19Z

By the way - sorry for letting this sit for so long!

timleslie · 2013-03-22T03:26:59Z

There are the same number of loops, but there are fewer conditional statements. This is because we don't have to do any work on the backtracking step of the depth-first search. We can assign each node a label as we're traversing downwards, since the fact that we hit the node means it's part of the weakly connected component.
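For contrast, here is a minimal pure-Python sketch of an iterative Tarjan-style traversal for the directed (strongly connected) case. It is not the Cython code from this pull request, and the name `strong_components` and its internal variables are invented for illustration. The point is the `else` branch: the directed case has to update low-links and pop a completed component while backtracking, which is the extra conditional work the undirected version avoids.

```python
def strong_components(n, indptr, indices):
    """Iterative Tarjan-style SCC labelling on a CSR graph structure."""
    index = [-1] * n           # discovery order, -1 means unvisited
    lowlink = [0] * n
    on_stack = [False] * n
    labels = [-1] * n
    scc_stack = []
    counter = 0
    n_components = 0

    for root in range(n):
        if index[root] != -1:
            continue
        index[root] = lowlink[root] = counter
        counter += 1
        scc_stack.append(root)
        on_stack[root] = True
        # Each frame is (node, position of the next edge to examine).
        dfs = [(root, indptr[root])]
        while dfs:
            node, k = dfs.pop()
            if k < indptr[node + 1]:
                # Downward step: examine the next outgoing edge.
                dfs.append((node, k + 1))
                child = indices[k]
                if index[child] == -1:
                    index[child] = lowlink[child] = counter
                    counter += 1
                    scc_stack.append(child)
                    on_stack[child] = True
                    dfs.append((child, indptr[child]))
                elif on_stack[child]:
                    lowlink[node] = min(lowlink[node], index[child])
            else:
                # Backtracking step: every edge of `node` has been examined.
                if lowlink[node] == index[node]:
                    # `node` is the root of a component; pop its members.
                    while True:
                        member = scc_stack.pop()
                        on_stack[member] = False
                        labels[member] = n_components
                        if member == node:
                            break
                    n_components += 1
                if dfs:
                    parent = dfs[-1][0]
                    lowlink[parent] = min(lowlink[parent], lowlink[node])
    return n_components, labels


# Edges: 0 -> 1, 1 -> 2, 2 -> 0 (a cycle), plus 2 -> 3
scc_result = strong_components(4, [0, 1, 2, 4, 4], [1, 2, 0, 3])
```

Nodes 0, 1 and 2 end up sharing one label because of the cycle, while node 3 gets its own. The label numbering itself is arbitrary and depends on the order in which components are completed.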

jakevdp · 2013-03-22T05:26:39Z

Great! I think this looks ready to go. I'm +1 for merge. @pv, what do you think?

pv · 2013-03-22T08:20:03Z

Looks good, merged.

rgommers · 2013-03-28T22:46:16Z

backported to 0.12.x in 2cfd985

timleslie added 4 commits March 8, 2013 09:32

Replace recursive algorithm with iterative algorithm

90daaef

Merge remote-tracking branch 'upstream/master' into ticket-1861

12dfbca

Update inline documentation

b478919

Add an extra test of strongly connected components

e449edc

Add an optimized WCC algorithm

9381880

timleslie added 2 commits March 11, 2013 05:22

Add another test for weakly connected components

c845892

Merge branch 'master' into ticket-1861

856b58c

jakevdp reviewed Mar 22, 2013

timleslie added 2 commits March 22, 2013 02:52

Address review comments: Use ITYPE instead of np.int32. Shorten long lines

f746399

Merge branch 'master' into ticket-1861

5cec7d5

pv added a commit that referenced this pull request Mar 22, 2013

Merge pull request #460 from timleslie/ticket-1861

cc48790

BUG: sparse.csgraph: make connected_components_directed work on large graphs

pv merged commit cc48790 into scipy:master Mar 22, 2013

jakevdp mentioned this pull request Mar 26, 2013

add regression test for issue 1876 #484

Merged

This was referenced Apr 25, 2013

build fails because of non-existing files and auto generated files were not generated (Trac #449) #976

Closed

bug in csgraph.connected_components with connection='strong' (Trac #1876) #2395

Closed
