Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MemoryError in scipy.sparse.csgraph.shortest_path #2526

Closed
glciampaglia opened this issue May 31, 2013 · 10 comments
Closed

MemoryError in scipy.sparse.csgraph.shortest_path #2526

glciampaglia opened this issue May 31, 2013 · 10 comments
Labels
defect A clear bug or issue that prevents SciPy from being installed or used as expected scipy.sparse.csgraph

Comments

@glciampaglia
Copy link

Hi,

I want to compute the shortest path distances in a sparse directed graph with 2M nodes but scipy.csgraph.shortest_path immediately throws a MemoryError -- regardless of the chosen method. This is what I do:

In [2]: import numpy as np

In [3]: import scipy.sparse as sp

In [4]: import scipy.sparse.csgraph as csg

In [5]: a = np.load('adjacency_isa.npy')

In [6]: adj = sp.coo_matrix((a['weight'], (a['row'], a['col'])), (2351254,)*2)

In [7]: adj = adj.tocsr()

And this is what I get:

In [8]: D = csg.shortest_path(adj, directed=True)
---------------------------------------------------------------------------
MemoryError Traceback (most recent call last)
<ipython-input-8-a8a3285366c7> in <module>()
----> 1 D = csg.shortest_path(adj, directed=True)

/nfs/nfs4/home/gciampag/local/lib/python/scipy/sparse/csgraph/_shortest_path.so in 
scipy.sparse.csgraph._shortest_path.shortest_path 
(scipy/sparse/csgraph/_shortest_path.c:2117)()

/nfs/nfs4/home/gciampag/local/lib/python/scipy/sparse/csgraph/_shortest_path.so in 
scipy.sparse.csgraph._shortest_path.dijkstra 
(scipy/sparse/csgraph/_shortest_path.c:3948)()

MemoryError:

In [9]: D = csg.dijkstra(adj, directed=True)
---------------------------------------------------------------------------
MemoryError Traceback (most recent call last)
<ipython-input-9-dd917b3d30d9> in <module>()
----> 1 D = csg.dijkstra(adj, directed=True)

/nfs/nfs4/home/gciampag/local/lib/python/scipy/sparse/csgraph/_shortest_path.so in 
scipy.sparse.csgraph._shortest_path.dijkstra 
(scipy/sparse/csgraph/_shortest_path.c:3948)()

MemoryError:

In [10]: D = csg.floyd_warshall(adj, directed=True)
---------------------------------------------------------------------------
MemoryError Traceback (most recent call last)
<ipython-input-10-709122a21d70> in <module>()
----> 1 D = csg.floyd_warshall(adj, directed=True)

/nfs/nfs4/home/gciampag/local/lib/python/scipy/sparse/csgraph/_shortest_path.so in 
scipy.sparse.csgraph._shortest_path.floyd_warshall 
(scipy/sparse/csgraph/_shortest_path.c:2457)()

/nfs/nfs4/home/gciampag/local/lib/python/scipy/sparse/csgraph/_validation.pyc 
in validate_graph(csgraph, directed, dtype, csr_output, dense_output, 
copy_if_dense, copy_if_sparse, null_value_in, null_value_out, infinity_null, 
nan_null)
26 csgraph = csr_matrix(csgraph, dtype=DTYPE, copy=copy_if_sparse)
27 else:
---> 28 csgraph = csgraph_to_dense(csgraph, null_value=null_value_out)
29 elif np.ma.is_masked(csgraph):
30 if dense_output:

/nfs/nfs4/home/gciampag/local/lib/python/scipy/sparse/csgraph/_tools.so in 
scipy.sparse.csgraph._tools.csgraph_to_dense 
(scipy/sparse/csgraph/_tools.c:2984)()

MemoryError:

It seems that there are two distinct issues:

  1. floyd_warshall() calls validate_graph with csr_output = False (_shortest_path.pyx:218), causing the graph to be converted to dense. I believe this a bug.
  2. dijkstra creates a dense distance matrix (_shortest_path.pyx:409). I understand that one cannot make any assumption about the connectivity of the graph, and thus of the sparsity of the distance matrix itself; and of course I can get around this by calling dijkstra multiple times with a manageable chunk of indices, and discarding the values that are equal to inf, but it would be nonetheless nice if the code tried to do something similar, at least for the cases when one knows that most of the distances will be inf.
@rgommers
Copy link
Member

@jakevdp I guess you're interested in this one

@jakevdp
Copy link
Member

jakevdp commented May 31, 2013

The value returned by the shortest_path functions is an N x N matrix of the pairwise shortest distances. If you don't have enough memory to hold that N x N matrix, then the algorithm will fail with a memory error, regardless of what method you use.

From your question, though, it sounds like you don't care about the full N x N matrix of shortest distances. In that case, what is it that you want to compute?

@glciampaglia
Copy link
Author

Hi Jake,

I want to compute the N x N matrix of pairwise distances. My input is a weakly connected DAG so I expect the distance matrix to be fairly sparse as well, and thus to fit into memory.

My assumption was that passing a sparse matrix to the function I would have got back a sparse matrix, hence my puzzlement at seeing the creation of a dense matrix. I can agree that not being the general case this probably cannot be considered a defect, but rather a feature request. Or maybe the docstring could be updated to reflect this detail of the semantic of the function?

Cheers

@argriffing
Copy link
Contributor

I'm pretty sure that in the world of scipy sparse matrices, entries that are missing from the matrix are interpreted as zero, so a matrix consisting of mostly inf would not be treated as sparse. On the other hand, there seems to be work afoot with sparse bool matrices that is considering allowing both 'mostly true' and 'mostly false' matrices to be handled sparsely, so maybe there could be a 'mostly inf' matrix following this idea. Or maybe an alternate function could be defined that returns the N x N matrix of reciprocals of pairwise distances which could be sparse in the usual sense, if 1/inf is treated as zero.

@glciampaglia
Copy link
Author

Fair point. I didn't think of the problem with the Infs: in my application weights are > 0 so I was interpreting any off-diagonal 0 as an infinite distance. The idea of returning the reciprocal of the distances seems really neat. Btw, could you please point to the sparse bool matrices code? Thanks!

@argriffing
Copy link
Contributor

Btw, could you please point to the sparse bool matrices code?

It is under construction by @cowlicks for google summer of code.

@glciampaglia
Copy link
Author

Thanks!

@jakevdp
Copy link
Member

jakevdp commented Jun 2, 2013

Or maybe the docstring could be updated to reflect this detail of the semantic of the function?

The doc string of shortest_path is already pretty explicit that the return value is an ndarray of size N x N.

I agree that allowing the function to construct a sparse matrix might be useful in some special cases, but you'd probably want a separate function for that. If you'd like to implement it, I wouldn't mind 😄

@argriffing
Copy link
Contributor

For what it's worth, networkx will do all_pairs_dijkstra_path_length in a way that gives a sparse output. I often end up using networkx and dictionaries instead of scipy.sparse...

@glciampaglia
Copy link
Author

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
defect A clear bug or issue that prevents SciPy from being installed or used as expected scipy.sparse.csgraph
Projects
None yet
Development

No branches or pull requests

4 participants