Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

_graph_is_connected consumes too much memory #5024

Closed
rudimeier opened this issue Jul 23, 2015 · 5 comments · Fixed by #5443
Closed

_graph_is_connected consumes too much memory #5024

rudimeier opened this issue Jul 23, 2015 · 5 comments · Fixed by #5443

Comments

@rudimeier
Copy link

Hi,

_graph_is_connected from sklearn.manifold.spectral_embedding_ consumes about 5 times as much memory as the input matrix. Could this be improved?

networkx.is_connected does not need any addional memory, maybe you may have a look at how they do it
https://networkx.github.io/documentation/latest/_modules/networkx/algorithms/components/connected.html#is_connected

(I haven't yet carefully checked if networkx.is_conneted does the same like sklearn!)

@jakevdp
Copy link
Member

jakevdp commented Jul 24, 2015

Did you observe this for a sparse input, or a dense input? The two lead to completely different code paths.

@rudimeier
Copy link
Author

I've observed this only for the dense case.

@jakevdp
Copy link
Member

jakevdp commented Jul 24, 2015

I believe this is the offending line:

_, node_to_add = np.where(graph[connected_components_matrix] != 0)

It allocates ~4 temporary arrays which are on the same order as the size of the input.
It's one of the hazards of using NumPy, unfortunately: the fast & terse way to do vectorized computations often obtains speed at the expense of memory use.

@amueller
Copy link
Member

The scipy stuff all works only on sparse matrices, right? A cython version for checking whether a dense matrix is connected is probably pretty short. Is it worth it?

@rudimeier is this really the bottleneck? How large is your matrix? I would imagine that even if we improve this particular part, there are other parts that will take more memory.

@AlexandreAbraham
Copy link
Contributor

I'm working on it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants