Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ticket 1861 #460

Merged
merged 9 commits into from
Mar 22, 2013
Merged

Ticket 1861 #460

merged 9 commits into from
Mar 22, 2013

Conversation

timleslie
Copy link
Contributor

This pull request addresses ticket 1861

http://projects.scipy.org/scipy/ticket/1861

All tests pass and memory performance is orders of magnitude better.

@pv
Copy link
Member

pv commented Mar 10, 2013

@jakevdp: looks ok?

@jakevdp
Copy link
Member

jakevdp commented Mar 11, 2013

This looks really nice -- thanks for the submission. It's great to see this being improved! I haven't yet pulled and tested it, but if it passes the current unit tests I think it's fine.

@timleslie - I wonder if you'd be willing to implement the algorithm for an undirected graph as well? The current one is a bit of a place-holder, and is pretty slow. What it would take is to create another version of the connected_components function which accepts the transposed version of indices and indptr. The loops through the indptr array would also loop through the second indptr array, checking those nodes as well (keeping in mind there could be some repeats).

Is this something you'd be willing to tackle and add to this pull request? I don't think it would be much added effort, and the payoff would be huge.

Thanks for looking at this -- again, very nice work here!

@timleslie
Copy link
Contributor Author

I'll have a look into the undirected version and see what I can come up with. It appears that it doesn't suffer from the same memory limitations due to a recursive algorithm that the directed version did, but it could potentially be simplified/consolidated to achieve better speed and memory performance.

@jakevdp
Copy link
Member

jakevdp commented Mar 11, 2013

Right - it didn't use the same algorithm: currently it works by constructing a full depth-first tree from each node. This produces the correct result, but is definitely not optimal. I put the current algorithm in as an easy place-holder when I first wrote the module, but never ended up implementing a faster version. I think the algorithm you implemented would be significantly faster and more memory-efficient.

Again, thanks for looking into this!

@timleslie
Copy link
Contributor Author

@jakevdp I've just pushed code for the undirected graph. As expected it's faster than the previous version and has the nice property of not needing to allocate any new arrays (beyond the transposed matrix and labels). This removes the need for 4*N worth of integers that the previous algorithm used.

@jakevdp
Copy link
Member

jakevdp commented Mar 11, 2013

Nice! I'll try to do a detailed review of this soon... I'm traveling and preparing for PyCon this week, so it might take a few days before I get to it.

np.ndarray[ITYPE_t, ndim=1, mode='c'] labels):
"""
Uses an iterative version of Tarjan's algorithm to find the strongly connected components
of a directed graph represented as a sparse matrix (scipy.sparse.csc_matrix or scipy.sparse.csr_matrix).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PEP8 suggests limiting code lines to 79 characters, and documentation to 72 characters - these lines should be shortened

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also several places below

@jakevdp
Copy link
Member

jakevdp commented Mar 22, 2013

Thanks for the fast fixes! I was just trying out the code, and comparing the results to other implementations. It's producing the correct results, and it's fast! One quick question: why are there fewer layers of loops in the undirected version? I would have thought the two implementations would be identical, except for the need to loop over two child arrays for the undirected graph. Is there a detail that I'm not seeing?

@jakevdp
Copy link
Member

jakevdp commented Mar 22, 2013

By the way - sorry for letting this sit for so long!

@timleslie
Copy link
Contributor Author

There's the same number of loops, but there's less conditional statements. This is because we don't have to do any work on the back tracking step of the depth first search. We can assign each node a label as we're traversing downwards, since the fact that we hit the node means it's part of the weakly connected component.

@jakevdp
Copy link
Member

jakevdp commented Mar 22, 2013

Great! I think this looks ready to go. I'm +1 for merge. @pv, what do you think?

pv added a commit that referenced this pull request Mar 22, 2013
BUG: sparse.csgraph: make connected_components_directed work on large graphs
@pv pv merged commit cc48790 into scipy:master Mar 22, 2013
@pv
Copy link
Member

pv commented Mar 22, 2013

Looks good, merged.

@rgommers
Copy link
Member

backported to 0.12.x in 2cfd985

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants