-
-
Notifications
You must be signed in to change notification settings - Fork 25.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
_graph_is_connected consumes too much memory #5024
Comments
Did you observe this for a sparse input, or a dense input? The two lead to completely different code paths. |
I've observed this only for the dense case. |
I believe this is the offending line: _, node_to_add = np.where(graph[connected_components_matrix] != 0) It allocates ~4 temporary arrays which are on the same order as the size of the input. |
The scipy stuff all works only on sparse matrices, right? A cython version for checking whether a dense matrix is connected is probably pretty short. Is it worth it? @rudimeier is this really the bottleneck? How large is your matrix? I would imagine that even if we improve this particular part, there are other parts that will take more memory. |
I'm working on it. |
Hi,
_graph_is_connected
fromsklearn.manifold.spectral_embedding_
consumes about 5 times as much memory as the input matrix. Could this be improved?networkx.is_connected
does not need any addional memory, maybe you may have a look at how they do ithttps://networkx.github.io/documentation/latest/_modules/networkx/algorithms/components/connected.html#is_connected
(I haven't yet carefully checked if
networkx.is_conneted
does the same like sklearn!)The text was updated successfully, but these errors were encountered: