_graph_is_connected consumes too much memory #5024

rudimeier · 2015-07-23T18:35:06Z

Hi,

_graph_is_connected from sklearn.manifold.spectral_embedding_ consumes about 5 times as much memory as the input matrix. Could this be improved?

networkx.is_connected does not need any additional memory; maybe you could have a look at how they do it:
https://networkx.github.io/documentation/latest/_modules/networkx/algorithms/components/connected.html#is_connected

(I haven't yet carefully checked whether networkx.is_connected does the same thing as sklearn!)

jakevdp · 2015-07-24T03:27:30Z

Did you observe this for a sparse input, or a dense input? The two lead to completely different code paths.

rudimeier · 2015-07-24T07:15:52Z

I've observed this only for the dense case.

jakevdp · 2015-07-24T22:52:23Z

I believe this is the offending line:

_, node_to_add = np.where(graph[connected_components_matrix] != 0)

It allocates ~4 temporary arrays, each on the order of the size of the input.
It's one of the hazards of using NumPy, unfortunately: the fast & terse way to do vectorized computations often obtains speed at the expense of memory use.
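To make the temporaries concrete, here is a rough sketch that splits the line above into its steps and adds up the sizes. The matrix size, the random data, and the all-True mask (the worst case, where every node is already marked) are assumptions chosen for illustration; they are not taken from the reporter's setup, and the exact count of temporaries inside NumPy may differ.

```python
import numpy as np

# Illustrative dense, symmetric "graph"; size and values are made up.
rng = np.random.RandomState(0)
n = 200
graph = rng.rand(n, n)
graph = (graph + graph.T) / 2

# Worst case for the expression: the boolean mask selects every row.
mask = np.ones(n, dtype=bool)

# Step 1: boolean indexing copies the selected rows (it is not a view).
rows = graph[mask]

# Step 2: the comparison builds a boolean array of the same shape.
nonzero = rows != 0

# Step 3: np.where returns two integer index arrays,
# each with one entry per nonzero element.
row_idx, col_idx = np.where(nonzero)

temp_bytes = rows.nbytes + nonzero.nbytes + row_idx.nbytes + col_idx.nbytes
print("input bytes:      ", graph.nbytes)
print("temporaries bytes:", temp_bytes)
print("ratio:            ", temp_bytes / graph.nbytes)
```

For a dense matrix with few zeros, the two index arrays alone can be larger than the input, since each holds one platform-sized integer per nonzero entry.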

amueller · 2015-07-30T21:01:22Z

The scipy stuff all works only on sparse matrices, right? A cython version for checking whether a dense matrix is connected is probably pretty short. Is it worth it?

@rudimeier is this really the bottleneck? How large is your matrix? I would imagine that even if we improve this particular part, there are other parts that will take more memory.
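For reference, a minimal pure-Python/NumPy sketch of what such a dense check could look like, assuming a symmetric (undirected) adjacency matrix where a nonzero entry means an edge. `dense_is_connected` is a hypothetical name, not an existing sklearn or scipy function, and this is not the code that was eventually merged. It walks the graph one row at a time, so the temporaries stay on the order of one row rather than the whole matrix.

```python
import numpy as np


def dense_is_connected(graph):
    """Hypothetical helper: check connectivity of a dense symmetric graph.

    Runs a depth-first search from node 0 and looks at one row per step,
    so extra memory is O(n_nodes) instead of O(n_nodes ** 2).
    """
    graph = np.asarray(graph)
    n_nodes = graph.shape[0]
    if n_nodes == 0:
        raise ValueError("graph must have at least one node")

    visited = np.zeros(n_nodes, dtype=bool)
    visited[0] = True
    n_visited = 1
    stack = [0]

    while stack:
        node = stack.pop()
        # graph[node] is a view on one row; flatnonzero allocates
        # only an index array of at most n_nodes entries.
        neighbors = np.flatnonzero(graph[node])
        new_nodes = neighbors[~visited[neighbors]]
        visited[new_nodes] = True
        n_visited += new_nodes.size
        stack.extend(new_nodes.tolist())

    return n_visited == n_nodes


# Usage: a 4-node path graph is connected, two separate edges are not.
path = np.zeros((4, 4))
for i in range(3):
    path[i, i + 1] = path[i + 1, i] = 1.0

two_pairs = np.zeros((4, 4))
two_pairs[0, 1] = two_pairs[1, 0] = 1.0
two_pairs[2, 3] = two_pairs[3, 2] = 1.0

print(dense_is_connected(path))
print(dense_is_connected(two_pairs))
```

The Python-level loop is slower per node than the vectorized version, which is the trade-off a Cython implementation would be meant to remove.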

AlexandreAbraham · 2015-10-19T09:48:05Z

I'm working on it.

AlexandreAbraham mentioned this issue Oct 19, 2015

[MRG+1] Optimize sklearn.manifold._graph_is_connected #5443

Merged

GaelVaroquaux closed this as completed in #5443 Oct 19, 2015
