Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with
or
.
Download ZIP
Browse files

Merge pull request #119 from jakevdp/sparse-graph

Adds ``csgraph`` as a submodule under scipy.sparse.

A few of these functions were written for scikit-learn, then this grew
to a complete submodule of common graph algorithms, all using sparse
matrices as data structure.
  • Loading branch information...
commit 6a2d89505a07ebd9461e12e6be7f2bf55fd3e772 2 parents 92f62aa + 0c111e9
@rgommers rgommers authored
Showing with 45,862 additions and 41 deletions.
  1. +4 −0 .gitattributes
  2. +1 −0  doc/API.rst.txt
  3. +25 −0 doc/release/0.11.0-notes.rst
  4. +1 −0  doc/source/index.rst
  5. +1 −0  doc/source/sparse.csgraph.rst
  6. +212 −0 doc/source/tutorial/csgraph.rst
  7. +1 −0  doc/source/tutorial/index.rst
  8. +9 −5 scipy/sparse/__init__.py
  9. +4 −2 scipy/sparse/bento.info
  10. +11 −0 scipy/sparse/csgraph/SConscript
  11. +2 −0  scipy/sparse/csgraph/SConstruct
  12. +165 −0 scipy/sparse/csgraph/__init__.py
  13. +6 −12 scipy/sparse/{csgraph.py → csgraph/_components.py}
  14. +135 −0 scipy/sparse/csgraph/_laplacian.py
  15. +6,131 −0 scipy/sparse/csgraph/_min_spanning_tree.c
  16. +142 −0 scipy/sparse/csgraph/_min_spanning_tree.pyx
  17. +16,151 −0 scipy/sparse/csgraph/_shortest_path.c
  18. +1,239 −0 scipy/sparse/csgraph/_shortest_path.pyx
  19. +9,102 −0 scipy/sparse/csgraph/_tools.c
  20. +414 −0 scipy/sparse/csgraph/_tools.pyx
  21. +10,942 −0 scipy/sparse/csgraph/_traversal.c
  22. +681 −0 scipy/sparse/csgraph/_traversal.pyx
  23. +55 −0 scipy/sparse/csgraph/_validation.py
  24. +9 −0 scipy/sparse/csgraph/bento.info
  25. +11 −0 scipy/sparse/csgraph/parameters.pxi
  26. +28 −0 scipy/sparse/csgraph/setup.py
  27. +16 −0 scipy/sparse/csgraph/setupscons.py
  28. +47 −0 scipy/sparse/csgraph/tests/test_connected_components.py
  29. +54 −0 scipy/sparse/csgraph/tests/test_conversions.py
  30. +33 −0 scipy/sparse/csgraph/tests/test_graph_components.py
  31. +27 −0 scipy/sparse/csgraph/tests/test_graph_laplacian.py
  32. +140 −0 scipy/sparse/csgraph/tests/test_shortest_path.py
  33. +17 −0 scipy/sparse/csgraph/tests/test_spanning_tree.py
  34. +44 −0 scipy/sparse/csgraph/tests/test_traversal.py
  35. +1 −0  scipy/sparse/setup.py
  36. +1 −0  scipy/sparse/setupscons.py
  37. +0 −22 scipy/sparse/tests/test_spfuncs.py
View
4 .gitattributes
@@ -11,6 +11,10 @@ scipy/io/matlab/mio_utils.c binary
scipy/io/matlab/mio5_utils.c binary
scipy/io/matlab/streams.c binary
scipy/signal/spectral.c binary
+scipy/sparse/csgraph/_min_spanning_tree.c binary
+scipy/sparse/csgraph/_shortest_path.c binary
+scipy/sparse/csgraph/_tools.c binary
+scipy/sparse/csgraph/_traversal.c binary
scipy/spatial/ckdtree.c binary
scipy/spatial/qhull.c binary
scipy/special/lambertw.c binary
View
1  doc/API.rst.txt
@@ -117,6 +117,7 @@ change is made.
* scipy.sparse
- linalg
+ - csgraph
* scipy.spatial
View
25 doc/release/0.11.0-notes.rst
@@ -86,9 +86,33 @@ A function for creating Pascal matrices, ``scipy.linalg.pascal``, was added.
``misc.logsumexp`` now takes an optional ``axis`` keyword argument.
+Sparse Graph Submodule
+----------------------
+The new submodule :mod:`scipy.sparse.csgraph` implements a number of efficient
+graph algorithms for graphs stored as sparse adjacency matrices. Available
+routines are:
+
+ - :func:`connected_components` - determine connected components of a graph
+ - :func:`laplacian` - compute the laplacian of a graph
+ - :func:`shortest_path` - compute the shortest path between points on a
+ positive graph
+ - :func:`dijkstra` - use Dijkstra's algorithm for shortest path
+ - :func:`floyd_warshall` - use the Floyd-Warshall algorithm for
+ shortest path
+ - :func:`breadth_first_order` - compute a breadth-first order of nodes
+ - :func:`depth_first_order` - compute a depth-first order of nodes
+ - :func:`breadth_first_tree` - construct the breadth-first tree from
+ a given node
+ - :func:`depth_first_tree` - construct a depth-first tree from a given node
+ - :func:`minimum_spanning_tree` - construct the minimum spanning
+ tree of a graph
+
Deprecated features
===================
+``scipy.sparse.cs_graph_components`` has been made a part of the sparse graph
+submodule, and renamed to ``scipy.sparse.csgraph.connected_components``.
+Calling the former routine will result in a deprecation warning.
``scipy.misc.radon`` has been deprecated. A more full-featured radon transform
can be found in scikits-image.
@@ -122,3 +146,4 @@ added.
Authors
=======
+Jake Vanderplas <vanderplas@hail.astro.washington.edu>, sparse graph submodule
View
1  doc/source/index.rst
@@ -38,6 +38,7 @@ Reference
signal
sparse
sparse.linalg
+ sparse.csgraph
spatial
special
stats
View
1  doc/source/sparse.csgraph.rst
@@ -0,0 +1 @@
+.. automodule:: scipy.sparse.csgraph
View
212 doc/source/tutorial/csgraph.rst
@@ -0,0 +1,212 @@
+Compressed Sparse Graph Routines `scipy.sparse.csgraph`
+=======================================================
+
+.. sectionauthor:: Jake Vanderplas <vanderplas@astro.washington.edu>
+
+.. currentmodule: scipy.sparse.csgraph
+
+
+Example: Word Ladders
+---------------------
+
+A `Word Ladder <http://en.wikipedia.org/wiki/Word_ladder>`_ is a word game
+invented by Lewis Carroll in which players find paths between words by
+switching one letter at a time. For example, one can link "ape" and "man"
+in the following way::
+
+.. math::
+ {\rm ape \to apt \to ait \to bit \to big \to bag \to mag \to man}
+
+Note that each step involves changing just one letter of the word. This is
+just one possible path from "ape" to "man", but is it the shortest possible
+path? If we desire to find the shortest word ladder path between two given
+words, the sparse graph submodule can help.
+
+First we need a list of valid words. Many operating systems have such a list
+built-in. For example, on linux, a word list can often be found at one of the
+following locations::
+
+ /usr/share/dict
+ /var/lib/dict
+
+Another easy source for words are the scrabble word lists available at various
+sites around the internet (search with your favorite search engine). We'll
+first create this list. The system word lists consist of a file with one
+word per line. The following should be modified to use the particular word
+list you have available::
+
+ >>> word_list = open('/usr/share/dict/words').readlines()
+ >>> word_list = map(str.strip, word_list)
+
+We want to look at words of length 3, so let's select just those words of the
+correct length. We'll also eliminate words which start with upper-case
+(proper nouns) or contain non alpha-numeric characters like apostrophes and
+hyphens. Finally, we'll make sure everything is lower-case for comparison
+later::
+
+ >>> word_list = [word for word in word_list if len(word) == 3]
+ >>> word_list = [word for word in word_list if word[0].islower()]
+ >>> word_list = [word for word in word_list if word.isalpha()]
+ >>> word_list = map(str.lower, word_list)
+ >>> len(word_list)
+ 586
+
+Now we have a list of 586 valid three-letter words (the exact number may
+change depending on the particular list used). Each of these words will
+become a node in our graph, and we will create edges connecting the nodes
+associated with each pair of words which differs by only one letter.
+
+There are efficient ways to do this, and inefficient ways to do this. To
+do this as efficiently as possible, we're going to use some sophisticated
+numpy array manipulation:
+
+ >>> import numpy as np
+ >>> word_list = np.asarray(word_list)
+ >>> word_list.dtype
+ dtype('|S3')
+ >>> word_list.sort() # sort for quick searching later
+
+We have an array where each entry is three bytes. We'd like to find all pairs
+where exactly one byte is different. We'll start by converting each word to
+a three-dimensional vector:
+
+ >>> word_bytes = np.ndarray((word_list.size, word_list.itemsize),
+ ... dtype='int8',
+ ... buffer=word_list.data)
+ >>> word_bytes.shape
+ (586, 3)
+
+Now we'll use the
+`Hamming distance <http://en.wikipedia.org/wiki/Hamming_distance>`_
+between each point to determine which pairs of words are connected.
+The Hamming distance measures the fraction of entries between two vectors
+which differ: any two words with a hamming distance equal to :math:`1/N`,
+where :math:`N` is the number of letters, are connected in the word ladder::
+
+ >>> from scipy.spatial.distance import pdist, squareform
+ >>> from scipy.sparse import csr_matrix
+ >>> hamming_dist = pdist(word_bytes, metric='hamming')
+ >>> graph = csr_matrix(squareform(hamming_dist < 1.5 / word_list.itemsize))
+
+When comparing the distances, we don't use an equality because this can be
+unstable for floating point values. The inequality produces the desired
+result as long as no two entries of the word list are identical. Now that our
+graph is set up, we'll use a shortest path search to find the path between
+any two words in the graph::
+
+ >>> i1 = word_list.searchsorted('ape')
+ >>> i2 = word_list.searchsorted('man')
+ >>> word_list[i1]
+ 'ape'
+ >>> word_list[i2]
+ 'man'
+
+We need to check that these match, because if the words are not in the list
+that will not be the case. Now all we need is to find the shortest path
+between these two indices in the graph. We'll use dijkstra's algorithm,
+because it allows us to find the path for just one node::
+
+ >>> from scipy.sparse.csgraph import dijkstra
+ >>> distances, predecessors = dijkstra(graph, indices=i1,
+ ... return_predecessors=True)
+ >>> print distances[i2]
+ 5.0
+
+So we see that the shortest path between 'ape' and 'man' contains only
+five steps. We can use the predecessors returned by the algorithm to
+reconstruct this path::
+
+ >>> path = []
+ >>> i = i2
+ >>> while i != i1:
+ >>> path.append(word_list[i])
+ >>> i = predecessors[i]
+ >>> path.append(word_list[i1])
+ >>> print path[::-1]
+ ['ape', 'apt', 'opt', 'oat', 'mat', 'man']
+
+This is three fewer links than our initial example: the path from ape to man
+is only five steps.
+
+Using other tools in the module, we can answer other questions. For example,
+are there three-letter words which are not linked in a word ladder? This
+is a question of connected components in the graph::
+
+ >>> from scipy.sparse.csgraph import connected_components
+ >>> N_components, component_list = connected_components(graph)
+ >>> print N_components
+ 15
+
+In this particular sample of three-letter words, there are 15 connected
+components: that is, 15 distinct sets of words with no paths between the
+sets. How many words are in each of these sets? We can learn this from
+the list of components::
+
+ >>> [np.sum(component_list == i) for i in range(15)]
+ [571, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
+
+There is one large connected set, and 14 smaller ones. Let's look at the
+words in the smaller ones::
+
+ >>> [list(word_list[np.where(component_list == i)]) for i in range(1, 15)]
+ [['aha'],
+ ['chi'],
+ ['ebb'],
+ ['ems', 'emu'],
+ ['gnu'],
+ ['ism'],
+ ['khz'],
+ ['nth'],
+ ['ova'],
+ ['qua'],
+ ['ugh'],
+ ['ups'],
+ ['urn'],
+ ['use']]
+
+These are all the three-letter words which do not connect to others via a word
+ladder.
+
+We might also be curious about which words are maximally separated. Which
+two words take the most links to connect? We can determine this by computing
+the matrix of all shortest paths. Note that by convention, the
+distance between two non-connected points is reported to be infinity, so
+we'll need to remove these before finding the maximum::
+
+ >>> distances, predecessors = dijkstra(graph, return_predecessors=True)
+ >>> np.max(distances[~np.isinf(distances)])
+ 13.0
+
+So there is at least one pair of words which takes 13 steps to get from one
+to the other! Let's determine which these are::
+
+ >>> i1, i2 = np.where(distances == 13)
+ >>> zip(word_list[i1], word_list[i2])
+ [('imp', 'ohm'),
+ ('imp', 'ohs'),
+ ('ohm', 'imp'),
+ ('ohm', 'ump'),
+ ('ohs', 'imp'),
+ ('ohs', 'ump'),
+ ('ump', 'ohm'),
+ ('ump', 'ohs')]
+
+We see that there are two pairs of words which are maximally separated from
+each other: 'imp' and 'ump' on one hand, and 'ohm' and 'ohs' on the other
+hand. We can find the connecting list in the same way as above::
+
+ >>> path = []
+ >>> i = i2[0]
+ >>> while i != i1[0]:
+ >>> path.append(word_list[i])
+ >>> i = predecessors[i1[0], i]
+ >>> path.append(word_list[i1[0]])
+ >>> print path[::-1]
+ ['imp', 'amp', 'asp', 'ask', 'ark', 'are', 'aye', 'rye', 'roe', 'woe', 'woo', 'who', 'oho', 'ohm']
+
+This gives us the path we desired to see.
+
+Word ladders are just one potential application of scipy's fast graph
+algorithms for sparse matrices. Graph theory makes appearances in many
+areas of mathematics, data analysis, and machine learning. The sparse graph
+tools are flexible enough to handle many of these situations.
View
1  doc/source/tutorial/index.rst
@@ -17,6 +17,7 @@ SciPy Tutorial
signal
linalg
arpack
+ csgraph
stats
ndimage
io
View
14 scipy/sparse/__init__.py
@@ -61,12 +61,14 @@
isspmatrix_coo
isspmatrix_dia
-Graph algorithms:
+Submodules
+----------
.. autosummary::
:toctree: generated/
- cs_graph_components -- Determine connected components of a graph
+ csgraph - Compressed sparse graph routines
+ linalg - sparse linear algebra routines
Exceptions
----------
@@ -171,7 +173,8 @@
"""
# Original code by Travis Oliphant.
-# Modified and extended by Ed Schofield, Robert Cimrman, and Nathan Bell.
+# Modified and extended by Ed Schofield, Robert Cimrman,
+# Nathan Bell, and Jake Vanderplas.
from base import *
from csr import *
@@ -181,11 +184,12 @@
from coo import *
from dia import *
from bsr import *
-from csgraph import *
-
from construct import *
from extract import *
+# for backward compatibility with v0.10. This function is marked as deprecated
+from csgraph import cs_graph_components
+
#from spfuncs import *
__all__ = filter(lambda s:not s.startswith('_'),dir())
View
6 scipy/sparse/bento.info
@@ -1,8 +1,10 @@
Recurse:
linalg,
- sparsetools
+ sparsetools,
+ csgraph
Library:
Packages:
linalg,
- sparsetools
+ sparsetools,
+ csgraph
View
11 scipy/sparse/csgraph/SConscript
@@ -0,0 +1,11 @@
+# vim:syntax=python
+from os.path import join as pjoin
+
+from numscons import GetNumpyEnvironment, CheckF77Clib
+
+env = GetNumpyEnvironment(ARGUMENTS)
+
+env.NumpyPythonExtension('_shortest_path', source = ['_shortest_path.c'])
+env.NumpyPythonExtension('_traversal', source = ['_traversal.c'])
+env.NumpyPythonExtension('_min_spanning_tree', source = ['_min_spanning_tree.c'])
+env.NumpyPythonExtension('_tools', source = ['_tools.c'])
View
2  scipy/sparse/csgraph/SConstruct
@@ -0,0 +1,2 @@
+from numscons import GetInitEnvironment
+GetInitEnvironment(ARGUMENTS).DistutilsSConscript('SConscript')
View
165 scipy/sparse/csgraph/__init__.py
@@ -0,0 +1,165 @@
+r"""
+==============================================================
+Compressed Sparse Graph Routines (:mod:`scipy.sparse.csgraph`)
+==============================================================
+
+.. currentmodule:: scipy.sparse.csgraph
+
+Fast graph algorithms based on sparse matrix representations.
+
+Contents
+========
+
+.. autosummary::
+ :toctree: generated/
+
+ connected_components -- determine connected components of a graph
+ laplacian -- compute the laplacian of a graph
+ shortest_path -- compute the shortest path between points on a positive graph
+ dijkstra -- use Dijkstra's algorithm for shortest path
+ floyd_warshall -- use the Floyd-Warshall algorithm for shortest path
+ bellman_ford -- use the Bellman-Ford algorithm for shortest path
+ johnson -- use Johnson's algorithm for shortest path
+ breadth_first_order -- compute a breadth-first order of nodes
+ depth_first_order -- compute a depth-first order of nodes
+ breadth_first_tree -- construct the breadth-first tree from a given node
+ depth_first_tree -- construct a depth-first tree from a given node
+ minimum_spanning_tree -- construct the minimum spanning tree of a graph
+
+Graph Representations
+=====================
+This module uses graphs which are stored in a matrix format. A
+graph with N nodes can be represented by an (N x N) adjacency matrix G.
+If there is a connection from node i to node j, then G[i, j] = w, where
+w is the weight of the connection. For nodes i and j which are
+not connected, the value depends on the representation:
+
+- for dense array representations, non-edges are represented by
+ G[i, j] = 0, infinity, or NaN.
+
+- for dense masked representations (of type np.ma.MaskedArray), non-edges
+ are represented by masked values. This can be useful when graphs with
+ zero-weight edges are desired.
+
+- for sparse array representations, non-edges are represented by
+ non-entries in the matrix. This sort of sparse representation also
+ allows for edges with zero weights.
+
+As a concrete example, imagine that you would like to represent the following
+undirected graph::
+
+ G
+
+ (0)
+ / \
+ 1 2
+ / \
+ (2) (1)
+
+This graph has three nodes, where node 0 and 1 are connected by an edge of
+weight 2, and nodes 0 and 2 are connected by an edge of weight 1.
+We can construct the dense, masked, and sparse representations as follows,
+keeping in mind that an undirected graph is represented by a symmetric matrix::
+
+ >>> G_dense = np.array([[0, 2, 1],
+ ... [2, 0, 0],
+ ... [1, 0, 0]])
+ >>> G_masked = np.ma.masked_values(G_dense, 0)
+ >>> from scipy.sparse import csr_matrix
+ >>> G_sparse = csr_matrix(G_dense)
+
+This becomes more difficult when zero edges are significant. For example,
+consider the situation when we slightly modify the above graph::
+
+ G2
+
+ (0)
+ / \
+ 0 2
+ / \
+ (2) (1)
+
+This is identical to the previous graph, except nodes 0 and 2 are connected
+by an edge of zero weight. In this case, the dense representation above
+leads to ambiguities: how can non-edges be represented if zero is a meaningful
+value? In this case, either a masked or sparse representation must be used
+to eliminate the ambiguity::
+
+ >>> G2_data = np.array([[np.inf, 2, 0 ],
+ ... [2, np.inf, np.inf],
+ ... [0, np.inf, np.inf]])
+ >>> G2_masked = np.ma.masked_invalid(G2_data)
+ >>> from scipy.sparse.csgraph import csgraph_from_dense
+ >>> # G2_sparse = csr_matrix(G2_data) would give the wrong result
+ >>> G2_sparse = csgraph_from_dense(G2_data, null_value=np.inf)
+ >>> G2_sparse.data
+ array([ 2., 0., 2., 0.])
+
+Here we have used a utility routine from the csgraph submodule in order to
+convert the dense representation to a sparse representation which can be
+understood by the algorithms in submodule. By viewing the data array, we
+can see that the zero values are explicitly encoded in the graph.
+
+Directed vs. Undirected
+-----------------------
+Matrices may represent either directed or undirected graphs. This is
+specified throughout the csgraph module by a boolean keyword. Graphs are
+assumed to be directed by default. In a directed graph, traversal from node
+i to node j can be accomplished over the edge G[i, j], but not the edge
+G[j, i]. In a non-directed graph, traversal from node i to node j can be
+accomplished over either G[i, j] or G[j, i]. If both edges are not null,
+and the two have unequal weights, then the smaller of the two is used.
+Note that a symmetric matrix will represent an undirected graph, regardless
+of whether the 'directed' keyword is set to True or False. In this case,
+using ``directed=True`` generally leads to more efficient computation.
+
+The routines in this module accept as input either scipy.sparse representations
+(csr, csc, or lil format), masked representations, or dense representations
+with non-edges indicated by zeros, infinities, and NaN entries.
+"""
+
+__docformat__ = "restructuredtext en"
+
+__all__ = ['cs_graph_components',
+ 'connected_components',
+ 'laplacian',
+ 'shortest_path',
+ 'floyd_warshall',
+ 'dijkstra',
+ 'bellman_ford',
+ 'johnson',
+ 'breadth_first_order',
+ 'depth_first_order',
+ 'breadth_first_tree',
+ 'depth_first_tree',
+ 'minimum_spanning_tree',
+ 'construct_dist_matrix',
+ 'reconstruct_path',
+ 'csgraph_from_dense',
+ 'csgraph_masked_from_dense',
+ 'csgraph_to_dense',
+ 'csgraph_to_masked',
+ 'NegativeCycleError']
+
+from _components import cs_graph_components
+from _laplacian import laplacian
+from _shortest_path import shortest_path, floyd_warshall, dijkstra,\
+ bellman_ford, johnson, NegativeCycleError
+from _traversal import breadth_first_order, depth_first_order, \
+ breadth_first_tree, depth_first_tree, connected_components
+from _min_spanning_tree import minimum_spanning_tree
+from _tools import construct_dist_matrix, reconstruct_path,\
+ csgraph_from_dense, csgraph_to_dense, csgraph_masked_from_dense,\
+ csgraph_from_masked
+
+from numpy import deprecate as _deprecate
+cs_graph_components = _deprecate(cs_graph_components,
+ message=("In the future, use "
+ "csgraph.connected_components. Note "
+ "that this new function has a "
+ "slightly different interface: see "
+ "the docstring for more "
+ "information."))
+
+from numpy.testing import Tester
+test = Tester().test
View
18 scipy/sparse/csgraph.py → scipy/sparse/csgraph/_components.py
@@ -1,19 +1,14 @@
-"""Compressed Sparse graph algorithms"""
-
-__docformat__ = "restructuredtext en"
-
-__all__ = ['cs_graph_components']
-
import numpy as np
-from sparsetools import cs_graph_components as _cs_graph_components
+from scipy.sparse.sparsetools import cs_graph_components as _cs_graph_components
-from csr import csr_matrix
-from base import isspmatrix
+from scipy.sparse.csr import csr_matrix
+from scipy.sparse.base import isspmatrix
_msg0 = 'x must be a symmetric square matrix!'
_msg1 = _msg0 + '(has shape %s)'
+
def cs_graph_components(x):
"""
Determine connected components of a graph stored as a compressed
@@ -26,7 +21,7 @@ def cs_graph_components(x):
Parameters
-----------
- x: ndarray-like, 2 dimensions, or sparse matrix
+ x: array_like or sparse matrix, 2 dimensions
The adjacency matrix of the graph. Only the upper triangular part
is used.
@@ -49,8 +44,7 @@ def cs_graph_components(x):
Examples
--------
- >>> from scipy.sparse import cs_graph_components
- >>> import numpy as np
+ >>> from scipy.sparse.csgraph import connected_components
>>> D = np.eye(4)
>>> D[0,1] = D[1,0] = 1
>>> cs_graph_components(D)
View
135 scipy/sparse/csgraph/_laplacian.py
@@ -0,0 +1,135 @@
+"""
+Laplacian of a compressed-sparse graph
+"""
+
+# Authors: Aric Hagberg <hagberg@lanl.gov>
+# Gael Varoquaux <gael.varoquaux@normalesup.org>
+# Jake Vanderplas <vanderplas@astro.washington.edu>
+# License: BSD
+
+import numpy as np
+from scipy.sparse import isspmatrix
+
+###############################################################################
+# Graph laplacian
+def laplacian(csgraph, normed=False, return_diag=False):
+ """ Return the Laplacian matrix of a directed graph. For non-symmetric
+ graphs the out-degree is used in the computation.
+
+ Parameters
+ ----------
+ csgraph: array_like or sparse matrix, 2 dimensions
+ compressed-sparse graph, with shape (N, N).
+ directed: bool, optional
+ If True (default), then csgraph represents a directed graph.
+ normed: bool, optional
+ If True, then compute normalized Laplacian.
+ return_diag: bool, optional
+ If True, then return diagonal as well as laplacian.
+
+ Returns
+ -------
+ lap: ndarray
+ The N x N laplacian matrix of graph.
+
+ diag: ndarray
+ The length-N diagonal of the laplacian matrix.
+ diag is returned only if return_diag is True.
+
+ Notes
+ -----
+ The Laplacian matrix of a graph is sometimes referred to as the
+ "Kirchoff matrix" or the "admittance matrix", and is useful in many
+ parts of spectral graph theory. In particular, the eigen-decomposition
+ of the laplacian matrix can give insight into many properties of the graph.
+
+ For non-symmetric directed graphs, the laplacian is computed using the
+ out-degree of each node.
+
+ Examples
+ --------
+ >>> from scipy.sparse import csgraph
+ >>> G = np.arange(5) * np.arange(5)[:, np.newaxis]
+ >>> G
+ array([[ 0, 0, 0, 0, 0],
+ [ 0, 1, 2, 3, 4],
+ [ 0, 2, 4, 6, 8],
+ [ 0, 3, 6, 9, 12],
+ [ 0, 4, 8, 12, 16]])
+ >>> csgraph.laplacian(G, normed=False)
+ array([[ 0, 0, 0, 0, 0],
+ [ 0, 9, -2, -3, -4],
+ [ 0, -2, 16, -6, -8],
+ [ 0, -3, -6, 21, -12],
+ [ 0, -4, -8, -12, 24]])
+ """
+ if csgraph.ndim != 2 or csgraph.shape[0] != csgraph.shape[1]:
+ raise ValueError('csgraph must be a square matrix or array')
+
+ if normed and (np.issubdtype(csgraph.dtype, np.int)
+ or np.issubdtype(csgraph.dtype, np.uint)):
+ csgraph = csgraph.astype(np.float)
+
+ if isspmatrix(csgraph):
+ return _laplacian_sparse(csgraph, normed=normed,
+ return_diag=return_diag)
+ else:
+ return _laplacian_dense(csgraph, normed=normed,
+ return_diag=return_diag)
+
+
+def _laplacian_sparse(graph, normed=False, return_diag=False):
+ n_nodes = graph.shape[0]
+ if not graph.format == 'coo':
+ lap = (-graph).tocoo()
+ else:
+ lap = -graph.copy()
+ diag_mask = (lap.row == lap.col)
+ if not diag_mask.sum() == n_nodes:
+ # The sparsity pattern of the matrix has holes on the diagonal,
+ # we need to fix that
+ diag_idx = lap.row[diag_mask]
+
+ lap = lap.tolil()
+
+ diagonal_holes = list(set(range(n_nodes)).difference(
+ diag_idx))
+ lap[diagonal_holes, diagonal_holes] = 1
+ lap = lap.tocoo()
+ diag_mask = (lap.row == lap.col)
+ lap.data[diag_mask] = 0
+ w = -np.asarray(lap.sum(axis=1)).squeeze()
+ if normed:
+ w = np.sqrt(w)
+ w_zeros = (w == 0)
+ w[w_zeros] = 1
+ lap.data /= w[lap.row]
+ lap.data /= w[lap.col]
+ lap.data[diag_mask] = (1 - w_zeros).astype(lap.data.dtype)
+ else:
+ lap.data[diag_mask] = w[lap.row[diag_mask]]
+ if return_diag:
+ return lap, w
+ return lap
+
+
+def _laplacian_dense(graph, normed=False, return_diag=False):
+ n_nodes = graph.shape[0]
+ lap = -graph.copy()
+
+ # set diagonal to zero
+ lap.flat[::n_nodes + 1] = 0
+ w = -lap.sum(axis=0)
+ if normed:
+ w = np.sqrt(w)
+ w_zeros = (w == 0)
+ w[w_zeros] = 1
+ lap /= w
+ lap /= w[:, np.newaxis]
+ lap.flat[::n_nodes + 1] = 1 - w_zeros
+ else:
+ lap.flat[::n_nodes + 1] = w
+ if return_diag:
+ return lap, w
+ return lap
+
View
6,131 scipy/sparse/csgraph/_min_spanning_tree.c
6,131 additions, 0 deletions not shown
View
142 scipy/sparse/csgraph/_min_spanning_tree.pyx
@@ -0,0 +1,142 @@
+# Author: Jake Vanderplas -- <vanderplas@astro.washington.edu>
+# License: BSD, (C) 2011
+
+import numpy as np
+cimport numpy as np
+
+from scipy.sparse import csr_matrix, isspmatrix_csc, isspmatrix
+from _validation import validate_graph
+
+include 'parameters.pxi'
+
+def minimum_spanning_tree(csgraph, overwrite=False):
+ r"""Return a minimum spanning tree of an undirected graph
+
+ A minimum spanning tree is a graph consisting of the subset of edges
+ which together connect all connected nodes, while minimizing the total
+ sum of weights on the edges. This is computed using the Kruskal algorithm.
+
+ Parameters
+ ----------
+ csgraph: array_like or sparse matrix, 2 dimensions
+ The N x N matrix representing an undirected graph over N nodes
+ (see notes below).
+ overwrite: bool, optional
+ if true, then parts of the input graph will be overwritten for
+ efficiency.
+
+ Returns
+ -------
+ span_tree: csr matrix
+ The N x N compressed-sparse representation of the undirected minimum
+ spanning tree over the input (see notes below).
+
+ Notes
+ -----
+ This routine uses undirected graphs as input and output. That is, if
+ graph[i, j] and graph[j, i] are both zero, then nodes i and j do not
+ have an edge connecting them. If either is nonzero, then the two are
+ connected by the minimum nonzero value of the two.
+
+ Examples
+ --------
+ The following example shows the computation of a minimum spanning tree
+ over a simple four-component graph::
+
+ input graph minimum spanning tree
+
+ (0) (0)
+ / \ /
+ 3 8 3
+ / \ /
+ (3)---5---(1) (3)---5---(1)
+ \ / /
+ 6 2 2
+ \ / /
+ (2) (2)
+
+ It is easy to see from inspection that the minimum spanning tree involves
+ removing the edges with weights 8 and 6. In compressed sparse
+ representation, the solution looks like this:
+
+ >>> from scipy.sparse import csr_matrix
+ >>> from scipy.sparse.csgraph import minimum_spanning_tree
+ >>> X = csr_matrix([[0, 8, 0, 3],
+ ... [0, 0, 2, 5],
+ ... [0, 0, 0, 6],
+ ... [0, 0, 0, 0]])
+ >>> Tcsr = minimum_spanning_tree(X)
+ >>> Tcsr.toarray().astype(int)
+ array([[0, 0, 0, 3],
+ [0, 0, 2, 5],
+ [0, 0, 0, 0],
+ [0, 0, 0, 0]])
+ """
+ global NULL_IDX
+
+ csgraph = validate_graph(csgraph, True, DTYPE, dense_output=False,
+ copy_if_sparse=not overwrite)
+ cdef int N = csgraph.shape[0]
+
+ data = csgraph.data
+ indices = csgraph.indices
+ indptr = csgraph.indptr
+
+ components = np.arange(N, dtype=ITYPE)
+ predecessors = np.empty(N, dtype=ITYPE)
+ predecessors.fill(NULL_IDX)
+
+ i_sort = np.argsort(data)
+ row_indices = np.zeros(len(data), dtype=ITYPE)
+
+ _min_spanning_tree(data, indices, indptr, i_sort,
+ row_indices, predecessors, components)
+
+ sp_tree = csr_matrix((data, indices, indptr), (N, N))
+ sp_tree.eliminate_zeros()
+
+ return sp_tree
+
+
+cdef _min_spanning_tree(np.ndarray[DTYPE_t, ndim=1, mode='c'] data,
+ np.ndarray[ITYPE_t, ndim=1, mode='c'] col_indices,
+ np.ndarray[ITYPE_t, ndim=1, mode='c'] indptr,
+ np.ndarray[ITYPE_t, ndim=1, mode='c'] i_sort,
+ np.ndarray[ITYPE_t, ndim=1, mode='c'] row_indices,
+ np.ndarray[ITYPE_t, ndim=1, mode='c'] predecessors,
+ np.ndarray[ITYPE_t, ndim=1, mode='c'] components):
+ # Work-horse routine for computing minimum spanning tree using
+ # Kruskal's algorithm. By separating this code here, we get more
+ # efficient indexing.
+ global NULL_IDX
+ cdef unsigned int i, j, V1, V2
+ cdef DTYPE_t E
+
+ # Arrange `row_indices` to contain the row index of each value in `data`.
+ # Note that the array `col_indices` already contains the column index.
+ for i from 0 <= i < predecessors.shape[0]:
+ for j from indptr[i] <= j < indptr[i + 1]:
+ row_indices[j] = i
+
+ # step through the edges from smallest to largest.
+ # V1 and V2 are the vertices, and E is the edge weight connecting them.
+ for i from 0 <= i < i_sort.shape[0]:
+ j = i_sort[i]
+ V1 = row_indices[j]
+ V2 = col_indices[j]
+ E = data[j]
+
+ # progress upward to the head node of each subtree
+ while predecessors[V1] != NULL_IDX:
+ V1 = predecessors[V1]
+ while predecessors[V2] != NULL_IDX:
+ V2 = predecessors[V2]
+
+ # if the subtrees are different, then we connect them and keep the
+ # edge. Otherwise, we remove the edge: it duplicates one already
+ # in the spanning tree.
+ if components[V1] != components[V2]:
+ predecessors[V2] = V1
+ else:
+ data[j] = 0
+
View
16,151 scipy/sparse/csgraph/_shortest_path.c
16,151 additions, 0 deletions not shown
View
1,239 scipy/sparse/csgraph/_shortest_path.pyx
@@ -0,0 +1,1239 @@
+"""
+Routines for performing shortest-path graph searches
+
+The main interface is in the function :func:`shortest_path`. This
+calls cython routines that compute the shortest path using
+the Floyd-Warshall algorithm, Dijkstra's algorithm with Fibonacci Heaps,
+the Bellman-Ford algorithm, or Johnson's Algorithm.
+"""
+
+# Author: Jake Vanderplas -- <vanderplas@astro.washington.edu>
+# License: BSD, (C) 2011
+import warnings
+
+import numpy as np
+cimport numpy as np
+
+from scipy.sparse import csr_matrix, isspmatrix, isspmatrix_csr, isspmatrix_csc
+from _validation import validate_graph
+
+cimport cython
+
+from libc.stdlib cimport malloc, free
+
+include 'parameters.pxi'
+
+
+class NegativeCycleError(Exception):
+ def __init__(self, message=''):
+ Exception.__init__(self, message)
+
+
+def shortest_path(csgraph, method='auto',
+ directed=True,
+ return_predecessors=False,
+ unweighted=False,
+ overwrite=False):
+ """
+ Perform a shortest-path graph search on a positive directed or
+ undirected graph.
+
+ Parameters
+ ----------
+ csgraph : array, matrix, or sparse matrix, 2 dimensions
+ The N x N array of distances representing the input graph.
+ method : string ['auto'|'FW'|'D'], optional
+ Algorithm to use for shortest paths. Options are:
+
+ 'auto' -- (default) select the best among 'FW', 'D', 'BF', or 'J'
+ based on the input data.
+ 'FW' -- Floyd-Warshall algorithm. Computational cost is
+ approximately ``O[N^3]``. The input csgraph will be
+ converted to a dense representation.
+ 'D' -- Dijkstra's algorithm with Fibonacci heaps. Computational
+ cost is approximately ``O[N(N*k + N*log(N))]``, where
+ ``k`` is the average number of connected edges per node.
+ The input csgraph will be converted to a csr
+ representation.
+ 'BF' -- Bellman-Ford algorithm. This algorithm can be used when
+ weights are negative. If a negative cycle is encountered,
+ an error will be raised. Computational cost is
+ approximately ``O[N(N^2 k)]``, where ``k`` is the average
+ number of connected edges per node. The input csgraph will
+ be converted to a csr representation.
+ 'J' -- Johnson's algorithm. Like the Bellman-Ford algorithm,
+ Johnson's algorithm is designed for use when the weights
+ are negative. It combines the Bellman-Ford algorithm
+ with Dijkstra's algorithm for faster computation.
+
+ directed : bool, optional
+ If True (default), then find the shortest path on a directed graph:
+ only move from point i to point j along paths csgraph[i, j].
+ If False, then find the shortest path on an undirected graph: the
+ algorithm can progress from point i to j along csgraph[i, j] or
+ csgraph[j, i]
+ return_predecessors : bool, optional
+ If True, return the size (N, N) predecesor matrix
+ unweighted : bool, optional
+ If True, then find unweighted distances. That is, rather than finding
+ the path between each point such that the sum of weights is minimized,
+ find the path such that the number of edges is minimized.
+ overwrite : bool, optional
+ If True, overwrite csgraph with the result. This applies only if
+ method == 'FW' and csgraph is a dense, c-ordered array with
+ dtype=float64.
+
+ Returns
+ -------
+ dist_matrix : ndarray
+ The N x N matrix of distances between graph nodes. dist_matrix[i,j]
+ gives the shortest distance from point i to point j along the graph.
+
+ predecessors : ndarray
+ Returned only if return_predecessors == True.
+ The N x N matrix of predecessors, which can be used to reconstruct
+ the shortest paths. Row i of the predecessor matrix contains
+ information on the shortest paths from point i: each entry
+ predecessors[i, j] gives the index of the previous node in the
+ path from point i to point j. If no path exists between point
+ i and j, then predecessors[i, j] = -9999
+
+ Raises
+ ------
+ NegativeCycleError:
+ if there are negative cycles in the graph
+
+ Notes
+ -----
+ As currently implemented, Dijkstra's algorithm and Johnson's algorithm
+ do not work for graphs with direction-dependent distances when
+ directed == False. i.e., if csgraph[i,j] and csgraph[j,i] are non-equal
+ edges, method='D' may yield an incorrect result.
+ """
+ validate_graph(csgraph, directed, DTYPE)
+
+ if method == 'auto':
+ # guess fastest method based on number of nodes and edges
+ N = csgraph.shape[0]
+ if isspmatrix(csgraph):
+ Nk = csgraph.nnz
+ else:
+ Nk = np.sum(csgraph > 0)
+
+ if Nk < N * N / 4:
+ if ((isspmatrix(csgraph)
+ and np.any(csgraph.data < 0)) or np.any(csgraph < 0)):
+ method = 'J'
+ else:
+ method = 'D'
+ else:
+ method = 'FW'
+
+ if method == 'FW':
+ return floyd_warshall(csgraph, directed,
+ return_predecessors=return_predecessors,
+ unweighted=unweighted,
+ overwrite=overwrite)
+
+ elif method == 'D':
+ return dijkstra(csgraph, directed,
+ return_predecessors=return_predecessors,
+ unweighted=unweighted)
+
+ elif method == 'BF':
+ return bellman_ford(csgraph, directed,
+ return_predecessors=return_predecessors,
+ unweighted=unweighted)
+
+ elif method == 'J':
+ return johnson(csgraph, directed,
+ return_predecessors=return_predecessors,
+ unweighted=unweighted)
+
+ else:
+ raise ValueError("unrecognized method '%s'" % method)
+
+
+def floyd_warshall(csgraph, directed=True,
+ return_predecessors=False,
+ unweighted=False,
+ overwrite=False):
+ """
+ Compute the shortest path lengths using the Floyd-Warshall algorithm
+
+ Parameters
+ ----------
+ csgraph : array, matrix, or sparse matrix, 2 dimensions
+ The N x N array of distances representing the input graph.
+ directed : bool, optional
+ If True (default), then find the shortest path on a directed graph:
+ only move from point i to point j along paths csgraph[i, j].
+ If False, then find the shortest path on an undirected graph: the
+ algorithm can progress from point i to j along csgraph[i, j] or
+ csgraph[j, i]
+ return_predecessors : bool, optional
+ If True, return the size (N, N) predecesor matrix
+ unweighted : bool, optional
+ If True, then find unweighted distances. That is, rather than finding
+ the path between each point such that the sum of weights is minimized,
+ find the path such that the number of edges is minimized.
+ overwrite : bool, optional
+ If True, overwrite csgraph with the result. This applies only if
+ csgraph is a dense, c-ordered array with dtype=float64.
+
+ Returns
+ -------
+ dist_matrix : ndarray
+ The N x N matrix of distances between graph nodes. dist_matrix[i,j]
+ gives the shortest distance from point i to point j along the graph.
+
+ predecessors : ndarray
+ Returned only if return_predecessors == True.
+ The N x N matrix of predecessors, which can be used to reconstruct
+ the shortest paths. Row i of the predecessor matrix contains
+ information on the shortest paths from point i: each entry
+ predecessors[i, j] gives the index of the previous node in the
+ path from point i to point j. If no path exists between point
+ i and j, then predecessors[i, j] = -9999
+
+ Raises
+ ------
+ NegativeCycleError:
+ if there are negative cycles in the graph
+ """
+ dist_matrix = validate_graph(csgraph, directed, DTYPE,
+ csr_output=False,
+ copy_if_dense=not overwrite)
+
+ if unweighted:
+ dist_matrix[~np.isinf(dist_matrix)] = 1
+
+ if return_predecessors:
+ predecessor_matrix = np.empty(dist_matrix.shape,
+ dtype=ITYPE, order='C')
+ else:
+ predecessor_matrix = np.empty((0, 0), dtype=ITYPE)
+
+ _floyd_warshall(dist_matrix,
+ predecessor_matrix,
+ int(directed))
+
+ if np.any(dist_matrix.diagonal() < 0):
+ raise NegativeCycleError("Negative cycle in nodes %s"
+ % np.where(dist_matrix.diagonal() < 0)[0])
+
+ if return_predecessors:
+ return dist_matrix, predecessor_matrix
+ else:
+ return dist_matrix
+
+
+@cython.boundscheck(False)
+cdef void _floyd_warshall(
+ np.ndarray[DTYPE_t, ndim=2, mode='c'] dist_matrix,
+ np.ndarray[ITYPE_t, ndim=2, mode='c'] predecessor_matrix,
+ int directed=0):
+ # dist_matrix : in/out
+ # on input, the graph
+ # on output, the matrix of shortest paths
+ # dist_matrix should be a [N,N] matrix, such that dist_matrix[i, j]
+ # is the distance from point i to point j. Zero-distances imply that
+ # the points are not connected.
+ global NULL_IDX
+ cdef int N = dist_matrix.shape[0]
+ assert dist_matrix.shape[1] == N
+
+ cdef unsigned int i, j, k
+
+ cdef DTYPE_t infinity = np.inf
+ cdef DTYPE_t d_ijk
+
+ #----------------------------------------------------------------------
+ # Initialize distance matrix
+ # - set non-edges to infinity
+ # - set diagonal to zero
+ # - symmetrize matrix if non-directed graph is desired
+ dist_matrix[dist_matrix == 0] = infinity
+ dist_matrix.flat[::N + 1] = 0
+ if not directed:
+ for i from 0 <= i < N:
+ for j from i + 1 <= j < N:
+ if dist_matrix[j, i] <= dist_matrix[i, j]:
+ dist_matrix[i, j] = dist_matrix[j, i]
+ else:
+ dist_matrix[j, i] = dist_matrix[i, j]
+
+ #----------------------------------------------------------------------
+ # Initialize predecessor matrix
+ # - check matrix size
+ # - initialize diagonal and all non-edges to NULL
+ # - initialize all edges to the row index
+ cdef int store_predecessors = False
+
+ if predecessor_matrix.size > 0:
+ store_predecessors = True
+ assert predecessor_matrix.shape[0] == N
+ assert predecessor_matrix.shape[1] == N
+ predecessor_matrix.fill(NULL_IDX)
+ i_edge = np.where(~np.isinf(dist_matrix))
+ predecessor_matrix[i_edge] = i_edge[0]
+ predecessor_matrix.flat[::N + 1] = NULL_IDX
+
+ # Now perform the Floyd-Warshall algorithm.
+ # In each loop, this finds the shortest path from point i
+ # to point j using intermediate nodes 0 ... k
+ if store_predecessors:
+ for k from 0 <= k < N:
+ for i from 0 <= i < N:
+ if dist_matrix[i, k] == infinity:
+ continue
+ for j from 0 <= j < N:
+ d_ijk = dist_matrix[i, k] + dist_matrix[k, j]
+ if d_ijk < dist_matrix[i, j]:
+ dist_matrix[i, j] = d_ijk
+ predecessor_matrix[i, j] = predecessor_matrix[k, j]
+ else:
+ for k from 0 <= k < N:
+ for i from 0 <= i < N:
+ if dist_matrix[i, k] == infinity:
+ continue
+ for j from 0 <= j < N:
+ d_ijk = dist_matrix[i, k] + dist_matrix[k, j]
+ if d_ijk < dist_matrix[i, j]:
+ dist_matrix[i, j] = d_ijk
+
+
+
+def dijkstra(csgraph, directed=True, indices=None,
+ return_predecessors=False,
+ unweighted=False):
+ """
+ Dijkstra algorithm using Fibonacci Heaps
+
+ Parameters
+ ----------
+ csgraph : array, matrix, or sparse matrix, 2 dimensions
+ The N x N array of non-negative distances representing the input graph.
+ directed : bool, optional
+ If True (default), then find the shortest path on a directed graph:
+ only move from point i to point j along paths csgraph[i, j].
+ If False, then find the shortest path on an undirected graph: the
+ algorithm can progress from point i to j along csgraph[i, j] or
+ csgraph[j, i]
+ indices : array-like or integer, optional
+ if specified, only compute the paths for the points at the given
+ indices.
+ return_predecessors : bool, optional
+ If True, return the size (N, N) predecesor matrix
+ unweighted : bool, optional
+ If True, then find unweighted distances. That is, rather than finding
+ the path between each point such that the sum of weights is minimized,
+ find the path such that the number of edges is minimized.
+
+ Returns
+ -------
+ dist_matrix : ndarray
+ The matrix of distances between graph nodes. dist_matrix[i,j]
+ gives the shortest distance from point i to point j along the graph.
+
+ predecessors : ndarray
+ Returned only if return_predecessors == True.
+ The matrix of predecessors, which can be used to reconstruct
+ the shortest paths. Row i of the predecessor matrix contains
+ information on the shortest paths from point i: each entry
+ predecessors[i, j] gives the index of the previous node in the
+ path from point i to point j. If no path exists between point
+ i and j, then predecessors[i, j] = -9999
+
+ Notes
+ -----
+ As currently implemented, Dijkstra's algorithm does not work for
+ graphs with direction-dependent distances when directed == False.
+ i.e., if csgraph[i,j] and csgraph[j,i] are not equal and
+ both are nonzero, setting directed=False will not yield the correct
+ result.
+
+ Also, this routine does not work for graphs with negative
+ distances. Negative distances can lead to infinite cycles that must
+ be handled by specialized algorithms such as Bellman-Ford's algorithm
+ or Johnson's algorithm.
+ """
+ global NULL_IDX
+
+ #------------------------------
+ # validate csgraph and convert to csr matrix
+ csgraph = validate_graph(csgraph, directed, DTYPE,
+ dense_output=False)
+
+ if np.any(csgraph.data < 0):
+ warnings.warn("Graph has negative weights: dijkstra will give "
+ "inaccurate results if the graph contains negative "
+ "cycles. Consider johnson or bellman_ford.")
+
+ N = csgraph.shape[0]
+
+ #------------------------------
+ # intitialize/validate indices
+ if indices is None:
+ indices = np.arange(N, dtype=ITYPE)
+ return_shape = indices.shape + (N,)
+ else:
+ indices = np.array(indices, order='C', dtype=ITYPE, copy=True)
+ return_shape = indices.shape + (N,)
+ indices = np.atleast_1d(indices).reshape(-1)
+ indices[indices < 0] += N
+ if np.any(indices < 0) or np.any(indices >= N):
+ raise ValueError("indices out of range 0...N")
+
+ #------------------------------
+ # initialize dist_matrix for output
+ dist_matrix = np.zeros((len(indices), N), dtype=DTYPE)
+ dist_matrix.fill(np.inf)
+ dist_matrix[np.arange(len(indices)), indices] = 0
+
+ #------------------------------
+ # initialize predecessors for output
+ if return_predecessors:
+ predecessor_matrix = np.empty((len(indices), N), dtype=ITYPE)
+ predecessor_matrix.fill(NULL_IDX)
+ else:
+ predecessor_matrix = np.empty((0, N), dtype=ITYPE)
+
+ if unweighted:
+ csr_data = np.ones(csgraph.data.shape)
+ else:
+ csr_data = csgraph.data
+
+ if directed:
+ _dijkstra_directed(indices,
+ csr_data, csgraph.indices, csgraph.indptr,
+ dist_matrix, predecessor_matrix)
+ else:
+ csgraphT = csgraph.T.tocsr()
+ if unweighted:
+ csrT_data = csr_data
+ else:
+ csrT_data = csgraphT.data
+ _dijkstra_undirected(indices,
+ csr_data, csgraph.indices, csgraph.indptr,
+ csrT_data, csgraphT.indices, csgraphT.indptr,
+ dist_matrix, predecessor_matrix)
+
+ if return_predecessors:
+ return (dist_matrix.reshape(return_shape),
+ predecessor_matrix.reshape(return_shape))
+ else:
+ return dist_matrix.reshape(return_shape)
+
+
+cdef _dijkstra_directed(
+ np.ndarray[ITYPE_t, ndim=1, mode='c'] source_indices,
+ np.ndarray[DTYPE_t, ndim=1, mode='c'] csr_weights,
+ np.ndarray[ITYPE_t, ndim=1, mode='c'] csr_indices,
+ np.ndarray[ITYPE_t, ndim=1, mode='c'] csr_indptr,
+ np.ndarray[DTYPE_t, ndim=2, mode='c'] dist_matrix,
+ np.ndarray[ITYPE_t, ndim=2, mode='c'] pred):
+ cdef unsigned int Nind = dist_matrix.shape[0]
+ cdef unsigned int N = dist_matrix.shape[1]
+ cdef unsigned int i, k, j_source, j_current
+
+ cdef DTYPE_t weight
+
+ cdef int return_pred = (pred.size > 0)
+
+ cdef FibonacciHeap heap
+ cdef FibonacciNode *v, *current_node
+ cdef FibonacciNode* nodes = <FibonacciNode*> malloc(N *
+ sizeof(FibonacciNode))
+
+ for i from 0 <= i < Nind:
+ j_source = source_indices[i]
+
+ for k from 0 <= k < N:
+ initialize_node(&nodes[k], k)
+
+ dist_matrix[i, j_source] = 0
+ heap.min_node = NULL
+ insert_node(&heap, &nodes[j_source])
+
+ while heap.min_node:
+ v = remove_min(&heap)
+ v.state = SCANNED
+
+ for j from csr_indptr[v.index] <= j < csr_indptr[v.index + 1]:
+ j_current = csr_indices[j]
+ current_node = &nodes[j_current]
+ if current_node.state != SCANNED:
+ weight = csr_weights[j]
+ if current_node.state == NOT_IN_HEAP:
+ current_node.state = IN_HEAP
+ current_node.val = v.val + weight
+ insert_node(&heap, current_node)
+ if return_pred:
+ pred[i, j_current] = v.index
+ elif current_node.val > v.val + weight:
+ decrease_val(&heap, current_node,
+ v.val + weight)
+ if return_pred:
+ pred[i, j_current] = v.index
+
+ #v has now been scanned: add the distance to the results
+ dist_matrix[i, v.index] = v.val
+
+ free(nodes)
+
+
+cdef _dijkstra_undirected(
+ np.ndarray[ITYPE_t, ndim=1, mode='c'] source_indices,
+ np.ndarray[DTYPE_t, ndim=1, mode='c'] csr_weights,
+ np.ndarray[ITYPE_t, ndim=1, mode='c'] csr_indices,
+ np.ndarray[ITYPE_t, ndim=1, mode='c'] csr_indptr,
+ np.ndarray[DTYPE_t, ndim=1, mode='c'] csrT_weights,
+ np.ndarray[ITYPE_t, ndim=1, mode='c'] csrT_indices,
+ np.ndarray[ITYPE_t, ndim=1, mode='c'] csrT_indptr,
+ np.ndarray[DTYPE_t, ndim=2, mode='c'] dist_matrix,
+ np.ndarray[ITYPE_t, ndim=2, mode='c'] pred):
+ cdef unsigned int Nind = dist_matrix.shape[0]
+ cdef unsigned int N = dist_matrix.shape[1]
+ cdef unsigned int i, k, j_source, j_current
+
+ cdef DTYPE_t weight
+
+ cdef int return_pred = (pred.size > 0)
+
+ cdef FibonacciHeap heap
+ cdef FibonacciNode *v, *current_node
+ cdef FibonacciNode* nodes = <FibonacciNode*> malloc(N *
+ sizeof(FibonacciNode))
+
+ for i from 0 <= i < Nind:
+ j_source = source_indices[i]
+
+ for k from 0 <= k < N:
+ initialize_node(&nodes[k], k)
+
+ dist_matrix[i, j_source] = 0
+ heap.min_node = NULL
+ insert_node(&heap, &nodes[j_source])
+
+ while heap.min_node:
+ v = remove_min(&heap)
+ v.state = SCANNED
+
+ for j from csr_indptr[v.index] <= j < csr_indptr[v.index + 1]:
+ j_current = csr_indices[j]
+ current_node = &nodes[j_current]
+ if current_node.state != SCANNED:
+ weight = csr_weights[j]
+ if current_node.state == NOT_IN_HEAP:
+ current_node.state = IN_HEAP
+ current_node.val = v.val + weight
+ insert_node(&heap, current_node)
+ if return_pred:
+ pred[i, j_current] = v.index
+ elif current_node.val > v.val + weight:
+ decrease_val(&heap, current_node,
+ v.val + weight)
+ if return_pred:
+ pred[i, j_current] = v.index
+
+ for j from csrT_indptr[v.index] <= j < csrT_indptr[v.index + 1]:
+ j_current = csrT_indices[j]
+ current_node = &nodes[j_current]
+ if current_node.state != SCANNED:
+ weight = csrT_weights[j]
+ if current_node.state == NOT_IN_HEAP:
+ current_node.state = IN_HEAP
+ current_node.val = v.val + weight
+ insert_node(&heap, current_node)
+ if return_pred:
+ pred[i, j_current] = v.index
+ elif current_node.val > v.val + weight:
+ decrease_val(&heap, current_node,
+ v.val + weight)
+ if return_pred:
+ pred[i, j_current] = v.index
+
+ #v has now been scanned: add the distance to the results
+ dist_matrix[i, v.index] = v.val
+
+ free(nodes)
+
+
+def bellman_ford(csgraph, directed=True, indices=None,
+ return_predecessors=False,
+ unweighted=False):
+ """Compute the shortest path lengths using the Bellman-Ford algorithm.
+
+ The Bellman-ford algorithm can robustly deal with graphs with negative
+ weights. If a negative cycle is detected, an error is raised. For
+ graphs without negative edge weights, dijkstra's algorithm may be faster.
+
+ Parameters
+ ----------
+ csgraph : array, matrix, or sparse matrix, 2 dimensions
+ The N x N array of distances representing the input graph.
+ directed : bool, optional
+ If True (default), then find the shortest path on a directed graph:
+ only move from point i to point j along paths csgraph[i, j].
+ If False, then find the shortest path on an undirected graph: the
+ algorithm can progress from point i to j along csgraph[i, j] or
+ csgraph[j, i]
+ indices : array-like or integer, optional
+ if specified, only compute the paths for the points at the given
+ indices.
+ return_predecessors : bool, optional
+ If True, return the size (N, N) predecesor matrix
+ unweighted : bool, optional
+ If True, then find unweighted distances. That is, rather than finding
+ the path between each point such that the sum of weights is minimized,
+ find the path such that the number of edges is minimized.
+
+ Returns
+ -------
+ dist_matrix : ndarray
+ The N x N matrix of distances between graph nodes. dist_matrix[i,j]
+ gives the shortest distance from point i to point j along the graph.
+
+ predecessors : ndarray
+ Returned only if return_predecessors == True.
+ The N x N matrix of predecessors, which can be used to reconstruct
+ the shortest paths. Row i of the predecessor matrix contains
+ information on the shortest paths from point i: each entry
+ predecessors[i, j] gives the index of the previous node in the
+ path from point i to point j. If no path exists between point
+ i and j, then predecessors[i, j] = -9999
+
+ Raises
+ ------
+ NegativeCycleError:
+ if there are negative cycles in the graph
+
+ Notes
+ -----
+ This routine is specially designed for graphs with negative edge weights.
+ If all edge weights are positive, then Dijkstra's algorithm is a better
+ choice.
+ """
+ global NULL_IDX
+
+ #------------------------------
+ # validate csgraph and convert to csr matrix
+ csgraph = validate_graph(csgraph, directed, DTYPE,
+ dense_output=False)
+ N = csgraph.shape[0]
+
+ #------------------------------
+ # intitialize/validate indices
+ if indices is None:
+ indices = np.arange(N, dtype=ITYPE)
+ else:
+ indices = np.array(indices, order='C', dtype=ITYPE)
+ indices[indices < 0] += N
+ if np.any(indices < 0) or np.any(indices >= N):
+ raise ValueError("indices out of range 0...N")
+ return_shape = indices.shape + (N,)
+ indices = np.atleast_1d(indices).reshape(-1)
+
+ #------------------------------
+ # initialize dist_matrix for output
+ dist_matrix = np.empty((len(indices), N), dtype=DTYPE)
+ dist_matrix.fill(np.inf)
+ dist_matrix[np.arange(len(indices)), indices] = 0
+
+ #------------------------------
+ # initialize predecessors for output
+ if return_predecessors:
+ predecessor_matrix = np.empty((len(indices), N), dtype=ITYPE)
+ predecessor_matrix.fill(NULL_IDX)
+ else:
+ predecessor_matrix = np.empty((0, N), dtype=ITYPE)
+
+ if unweighted:
+ csr_data = np.ones(csgraph.data.shape)
+ else:
+ csr_data = csgraph.data
+
+ if directed:
+ ret = _bellman_ford_directed(indices,
+ csr_data, csgraph.indices,
+ csgraph.indptr,
+ dist_matrix, predecessor_matrix)
+ else:
+ ret = _bellman_ford_undirected(indices,
+ csr_data, csgraph.indices,
+ csgraph.indptr,
+ dist_matrix, predecessor_matrix)
+
+ if ret >= 0:
+ raise NegativeCycleError("Negative cycle detected on node %i" % ret)
+
+ if return_predecessors:
+ return (dist_matrix.reshape(return_shape),
+ predecessor_matrix.reshape(return_shape))
+ else:
+ return dist_matrix.reshape(return_shape)
+
+
+cdef int _bellman_ford_directed(
+ np.ndarray[ITYPE_t, ndim=1, mode='c'] source_indices,
+ np.ndarray[DTYPE_t, ndim=1, mode='c'] csr_weights,
+ np.ndarray[ITYPE_t, ndim=1, mode='c'] csr_indices,
+ np.ndarray[ITYPE_t, ndim=1, mode='c'] csr_indptr,
+ np.ndarray[DTYPE_t, ndim=2, mode='c'] dist_matrix,
+ np.ndarray[ITYPE_t, ndim=2, mode='c'] pred):
+ global DTYPE_EPS
+ cdef unsigned int Nind = dist_matrix.shape[0]
+ cdef unsigned int N = dist_matrix.shape[1]
+ cdef unsigned int i, j, k, j_source, count
+
+ cdef DTYPE_t d1, d2, w12
+
+ cdef int return_pred = (pred.size > 0)
+
+ for i from 0 <= i < Nind:
+ j_source = source_indices[i]
+
+ # relax all edges N-1 times
+ for count from 0 <= count < N - 1:
+ for j from 0 <= j < N:
+ d1 = dist_matrix[i, j]
+ for k from csr_indptr[j] <= k < csr_indptr[j + 1]:
+ w12 = csr_weights[k]
+ d2 = dist_matrix[i, csr_indices[k]]
+ if d1 + w12 < d2:
+ dist_matrix[i, csr_indices[k]] = d1 + w12
+ if return_pred:
+ pred[i, csr_indices[k]] = j
+
+ # check for negative-weight cycles
+ for j from 0 <= j < N:
+ d1 = dist_matrix[i, j]
+ for k from csr_indptr[j] <= k < csr_indptr[j + 1]:
+ w12 = csr_weights[k]
+ d2 = dist_matrix[i, csr_indices[k]]
+ if d1 + w12 + DTYPE_EPS < d2:
+ return j_source
+
+ return -1
+
+
+cdef int _bellman_ford_undirected(
+ np.ndarray[ITYPE_t, ndim=1, mode='c'] source_indices,
+ np.ndarray[DTYPE_t, ndim=1, mode='c'] csr_weights,
+ np.ndarray[ITYPE_t, ndim=1, mode='c'] csr_indices,
+ np.ndarray[ITYPE_t, ndim=1, mode='c'] csr_indptr,
+ np.ndarray[DTYPE_t, ndim=2, mode='c'] dist_matrix,
+ np.ndarray[ITYPE_t, ndim=2, mode='c'] pred):
+ global DTYPE_EPS
+ cdef unsigned int Nind = dist_matrix.shape[0]
+ cdef unsigned int N = dist_matrix.shape[1]
+ cdef unsigned int i, j, k, j_source, ind_k, count
+
+ cdef DTYPE_t d1, d2, w12
+
+ cdef int return_pred = (pred.size > 0)
+
+ for i from 0 <= i < Nind:
+ j_source = source_indices[i]
+
+ # relax all edges N-1 times
+ for count from 0 <= count < N - 1:
+ for j from 0 <= j < N:
+ d1 = dist_matrix[i, j]
+ for k from csr_indptr[j] <= k < csr_indptr[j + 1]:
+ w12 = csr_weights[k]
+ ind_k = csr_indices[k]
+ d2 = dist_matrix[i, ind_k]
+ if d1 + w12 < d2:
+ dist_matrix[i, ind_k] = d2 = d1 + w12
+ if return_pred:
+ pred[i, ind_k] = j
+ if d2 + w12 < d1:
+ dist_matrix[i, j] = d1 = d2 + w12
+ if return_pred:
+ pred[i, j] = ind_k
+
+ # check for negative-weight cycles
+ for j from 0 <= j < N:
+ d1 = dist_matrix[i, j]
+ for k from csr_indptr[j] <= k < csr_indptr[j + 1]:
+ w12 = csr_weights[k]
+ d2 = dist_matrix[i, csr_indices[k]]
+ if abs(d2 - d1) > w12 + DTYPE_EPS:
+ return j_source
+
+ return -1
+
+
+def johnson(csgraph, directed=True, indices=None,
+ return_predecessors=False,
+ unweighted=False):
+ """Compute the shortest path lengths using Johnson's algorithm.
+
+ Johnson's algorithm combines the Bellman-Ford algorithm and Dijkstra's
+ algorithm to quickly find shortest paths in a way that is robust to
+ the presence of negative cycles. If a negative cycle is detected,
+ an error is raised. For graphs without negative edge weights,
+ dijkstra() may be faster.
+
+ Parameters
+ ----------
+ csgraph : array, matrix, or sparse matrix, 2 dimensions
+ The N x N array of distances representing the input graph.
+ directed : bool, optional
+ If True (default), then find the shortest path on a directed graph:
+ only move from point i to point j along paths csgraph[i, j].
+ If False, then find the shortest path on an undirected graph: the
+ algorithm can progress from point i to j along csgraph[i, j] or
+ csgraph[j, i]
+ indices : array-like or integer, optional
+ if specified, only compute the paths for the points at the given
+ indices.
+ return_predecessors : bool, optional
+ If True, return the size (N, N) predecesor matrix
+ unweighted : bool, optional
+ If True, then find unweighted distances. That is, rather than finding
+ the path between each point such that the sum of weights is minimized,
+ find the path such that the number of edges is minimized.
+
+ Returns
+ -------
+ dist_matrix : ndarray
+ The N x N matrix of distances between graph nodes. dist_matrix[i,j]
+ gives the shortest distance from point i to point j along the graph.
+
+ predecessors : ndarray
+ Returned only if return_predecessors == True.
+ The N x N matrix of predecessors, which can be used to reconstruct
+ the shortest paths. Row i of the predecessor matrix contains
+ information on the shortest paths from point i: each entry
+ predecessors[i, j] gives the index of the previous node in the
+ path from point i to point j. If no path exists between point
+ i and j, then predecessors[i, j] = -9999
+
+ Raises
+ ------
+ NegativeCycleError:
+ if there are negative cycles in the graph
+
+ Notes
+ -----
+ This routine is specially designed for graphs with negative edge weights.
+ If all edge weights are positive, then Dijkstra's algorithm is a better
+ choice.
+ """
+ #------------------------------
+ # if unweighted, there are no negative weights: we just use dijkstra
+ if unweighted:
+ return dijkstra(csgraph, directed, indices,
+ return_predecessors, unweighted)
+
+ #------------------------------
+ # validate csgraph and convert to csr matrix
+ csgraph = validate_graph(csgraph, directed, DTYPE,
+ dense_output=False)
+ N = csgraph.shape[0]
+
+ #------------------------------
+ # initialize/validate indices
+ if indices is None:
+ indices = np.arange(N, dtype=ITYPE)
+ return_shape = indices.shape + (N,)
+ else:
+ indices = np.array(indices, order='C', dtype=ITYPE)
+ return_shape = indices.shape + (N,)
+ indices = np.atleast_1d(indices).reshape(-1)
+ indices[indices < 0] += N
+ if np.any(indices < 0) or np.any(indices >= N):
+ raise ValueError("indices out of range 0...N")
+
+ #------------------------------
+ # initialize dist_matrix for output
+ dist_matrix = np.empty((len(indices), N), dtype=DTYPE)
+ dist_matrix.fill(np.inf)
+ dist_matrix[np.arange(len(indices)), indices] = 0
+
+ #------------------------------
+ # initialize predecessors for output
+ if return_predecessors:
+ predecessor_matrix = np.empty((len(indices), N), dtype=ITYPE)
+ predecessor_matrix.fill(NULL_IDX)
+ else:
+ predecessor_matrix = np.empty((0, N), dtype=ITYPE)
+
+ #------------------------------
+ # initialize distance array
+ dist_array = np.empty(N, dtype=DTYPE)
+
+ csr_data = csgraph.data.copy()
+
+ #------------------------------
+ # here we first add a single node to the graph, connected by a
+ # directed edge of weight zero to each node, and perform bellman-ford
+ if directed:
+ ret = _johnson_directed(csr_data, csgraph.indices,
+ csgraph.indptr, dist_array)
+ else:
+ ret = _johnson_undirected(csr_data, csgraph.indices,
+ csgraph.indptr, dist_array)
+
+ if ret >= 0:
+ raise NegativeCycleError("Negative cycle detected on node %i" % ret)
+
+ #------------------------------
+ # add the bellman-ford weights to the data
+ _johnson_add_weights(csr_data, csgraph.indices,
+ csgraph.indptr, dist_array)
+
+ if directed:
+ _dijkstra_directed(indices,
+ csr_data, csgraph.indices, csgraph.indptr,
+ dist_matrix, predecessor_matrix)
+ else:
+ csgraphT = csr_matrix((csr_data, csgraph.indices, csgraph.indptr),
+ csgraph.shape).T.tocsr()
+ _johnson_add_weights(csgraphT.data, csgraphT.indices,
+ csgraphT.indptr, dist_array)
+ _dijkstra_undirected(indices,
+ csr_data, csgraph.indices, csgraph.indptr,
+ csgraphT.data, csgraphT.indices, csgraphT.indptr,
+ dist_matrix, predecessor_matrix)
+
+ #------------------------------
+ # correct the distance matrix for the bellman-ford weights
+ dist_matrix += dist_array
+ dist_matrix -= dist_array[:, None][indices]
+
+ if return_predecessors:
+ return (dist_matrix.reshape(return_shape),
+ predecessor_matrix.reshape(return_shape))
+ else:
+ return dist_matrix.reshape(return_shape)
+
+
+cdef void _johnson_add_weights(
+ np.ndarray[DTYPE_t, ndim=1, mode='c'] csr_weights,
+ np.ndarray[ITYPE_t, ndim=1, mode='c'] csr_indices,
+ np.ndarray[ITYPE_t, ndim=1, mode='c'] csr_indptr,
+ np.ndarray[DTYPE_t, ndim=1, mode='c'] dist_array):
+ # let w(u, v) = w(u, v) + h(u) - h(v)
+ cdef unsigned int j, k, N = dist_array.shape[0]
+
+ for j from 0 <= j < N:
+ for k from csr_indptr[j] <= k < csr_indptr[j + 1]:
+ csr_weights[k] += dist_array[j]
+ csr_weights[k] -= dist_array[csr_indices[k]]
+
+
+cdef int _johnson_directed(
+ np.ndarray[DTYPE_t, ndim=1, mode='c'] csr_weights,
+ np.ndarray[ITYPE_t, ndim=1, mode='c'] csr_indices,
+ np.ndarray[ITYPE_t, ndim=1, mode='c'] csr_indptr,
+ np.ndarray[DTYPE_t, ndim=1, mode='c'] dist_array):
+ global DTYPE_EPS
+ cdef unsigned int N = dist_array.shape[0]
+ cdef unsigned int j, k, j_source, count
+
+ cdef DTYPE_t d1, d2, w12
+
+ # relax all edges (N+1) - 1 times
+ for count from 0 <= count < N:
+ for k from 0 <= k < N:
+ if dist_array[k] < 0:
+ dist_array[k] = 0
+
+ for j from 0 <= j < N:
+ d1 = dist_array[j]
+ for k from csr_indptr[j] <= k < csr_indptr[j + 1]:
+ w12 = csr_weights[k]
+ d2 = dist_array[csr_indices[k]]
+ if d1 + w12 < d2:
+ dist_array[csr_indices[k]] = d1 + w12
+
+ # check for negative-weight cycles
+ for j from 0 <= j < N:
+ d1 = dist_array[j]
+ for k from csr_indptr[j] <= k < csr_indptr[j + 1]:
+ w12 = csr_weights[k]
+ d2 = dist_array[csr_indices[k]]
+ if d1 + w12 + DTYPE_EPS < d2:
+ return j
+
+ return -1
+
+
+cdef int _johnson_undirected(
+ np.ndarray[DTYPE_t, ndim=1, mode='c'] csr_weights,
+ np.ndarray[ITYPE_t, ndim=1, mode='c'] csr_indices,
+ np.ndarray[ITYPE_t, ndim=1, mode='c'] csr_indptr,
+ np.ndarray[DTYPE_t, ndim=1, mode='c'] dist_array):
+ global DTYPE_EPS
+ cdef unsigned int N = dist_array.shape[0]
+ cdef unsigned int j, k, j_source, count
+
+ cdef DTYPE_t d1, d2, w12
+
+ # relax all edges (N+1) - 1 times
+ for count from 0 <= count < N:
+ for k from 0 <= k < N:
+ if dist_array[k] < 0:
+ dist_array[k] = 0
+
+ for j from 0 <= j < N:
+ d1 = dist_array[j]
+ for k from csr_indptr[j] <= k < csr_indptr[j + 1]:
+ w12 = csr_weights[k]
+ ind_k = csr_indices[k]
+ d2 = dist_array[ind_k]
+ if d1 + w12 < d2:
+ dist_array[ind_k] = d1 + w12
+ if d2 + w12 < d1:
+ dist_array[j] = d1 = d2 + w12
+
+ # check for negative-weight cycles
+ for j from 0 <= j < N:
+ d1 = dist_array[j]
+ for k from csr_indptr[j] <= k < csr_indptr[j + 1]:
+ w12 = csr_weights[k]
+ d2 = dist_array[csr_indices[k]]
+ if abs(d2 - d1) > w12 + DTYPE_EPS:
+ return j
+
+ return -1
+
+
+######################################################################
+# FibonacciNode structure
+# This structure and the operations on it are the nodes of the
+# Fibonacci heap.
+#
+cdef enum FibonacciState:
+ SCANNED
+ NOT_IN_HEAP
+ IN_HEAP
+
+
+cdef struct FibonacciNode:
+ unsigned int index
+ unsigned int rank
+ FibonacciState state
+ DTYPE_t val
+ FibonacciNode* parent
+ FibonacciNode* left_sibling
+ FibonacciNode* right_sibling
+ FibonacciNode* children
+
+
+cdef void initialize_node(FibonacciNode* node,
+ unsigned int index,
+ DTYPE_t val=0):
+ # Assumptions: - node is a valid pointer
+ # - node is not currently part of a heap
+ node.index = index
+ node.val = val
+ node.rank = 0
+ node.state = NOT_IN_HEAP
+
+ node.parent = NULL
+ node.left_sibling = NULL
+ node.right_sibling = NULL
+ node.children = NULL
+
+
+cdef FibonacciNode* rightmost_sibling(FibonacciNode* node):
+ # Assumptions: - node is a valid pointer
+ cdef FibonacciNode* temp = node
+ while(temp.right_sibling):
+ temp = temp.right_sibling
+ return temp
+
+
+cdef FibonacciNode* leftmost_sibling(FibonacciNode* node):
+ # Assumptions: - node is a valid pointer
+ cdef FibonacciNode* temp = node
+ while(temp.left_sibling):
+ temp = temp.left_sibling
+ return temp
+
+
+cdef void add_child(FibonacciNode* node, FibonacciNode* new_child):
+ # Assumptions: - node is a valid pointer
+ # - new_child is a valid pointer
+ # - new_child is not the sibling or child of another node
+ new_child.parent = node
+
+ if node.children:
+ add_sibling(node.children, new_child)
+ else:
+ node.children = new_child
+ new_child.right_sibling = NULL
+ new_child.left_sibling = NULL
+ node.rank = 1
+
+
+cdef void add_sibling(FibonacciNode* node, FibonacciNode* new_sibling):
+ # Assumptions: - node is a valid pointer
+ # - new_sibling is a valid pointer
+ # - new_sibling is not the child or sibling of another node
+ cdef FibonacciNode* temp = rightmost_sibling(node)
+ temp.right_sibling = new_sibling
+ new_sibling.left_sibling = temp
+ new_sibling.right_sibling = NULL
+ new_sibling.parent = node.parent
+ if new_sibling.parent:
+ new_sibling.parent.rank += 1
+
+
+cdef void remove(FibonacciNode* node):
+ # Assumptions: - node is a valid pointer
+ if node.parent:
+ node.parent.rank -= 1
+ if node.left_sibling:
+ node.parent.children = node.left_sibling
+ elif node.right_sibling:
+ node.parent.children = node.right_sibling
+ else:
+ node.parent.children = NULL
+
+ if node.left_sibling:
+ node.left_sibling.right_sibling = node.right_sibling
+ if node.right_sibling:
+ node.right_sibling.left_sibling = node.left_sibling
+
+ node.left_sibling = NULL
+ node.right_sibling = NULL
+ node.parent = NULL
+
+
+######################################################################
+# FibonacciHeap structure
+# This structure and operations on it use the FibonacciNode
+# routines to implement a Fibonacci heap
+
+ctypedef FibonacciNode* pFibonacciNode
+
+
+cdef struct FibonacciHeap:
+ FibonacciNode* min_node
+ pFibonacciNode[100] roots_by_rank # maximum number of nodes is ~2^100.
+
+
+cdef void insert_node(FibonacciHeap* heap,
+ FibonacciNode* node):
+ # Assumptions: - heap is a valid pointer
+ # - node is a valid pointer
+ # - node is not the child or sibling of another node
+ if heap.min_node:
+ add_sibling(heap.min_node, node)
+ if node.val < heap.min_node.val:
+ heap.min_node = node
+ else:
+ heap.min_node = node
+
+
+cdef void decrease_val(FibonacciHeap* heap,
+ FibonacciNode* node,
+ DTYPE_t newval):
+ # Assumptions: - heap is a valid pointer
+ # - newval <= node.val
+ # - node is a valid pointer
+ # - node is not the child or sibling of another node
+ # - node is in the heap
+ node.val = newval
+ if node.parent and (node.parent.val >= newval):
+ remove(node)
+ insert_node(heap, node)
+ elif heap.min_node.val > node.val:
+ heap.min_node = node
+
+
+cdef void link(FibonacciHeap* heap, FibonacciNode* node):
+ # Assumptions: - heap is a valid pointer
+ # - node is a valid pointer
+ # - node is already within heap
+
+ cdef FibonacciNode *linknode, *parent, *child
+
+ if heap.roots_by_rank[node.rank] == NULL:
+ heap.roots_by_rank[node.rank] = node
+ else:
+ linknode = heap.roots_by_rank[node.rank]
+ heap.roots_by_rank[node.rank] = NULL
+
+ if node.val < linknode.val or node == heap.min_node:
+ remove(linknode)
+ add_child(node, linknode)
+ link(heap, node)
+ else:
+ remove(node)
+ add_child(linknode, node)
+ link(heap, linknode)
+
+
+cdef FibonacciNode* remove_min(FibonacciHeap* heap):
+ # Assumptions: - heap is a valid pointer
+ # - heap.min_node is a valid pointer
+ cdef FibonacciNode *temp, *temp_right, *out
+ cdef unsigned int i
+
+ # make all min_node children into root nodes
+ if heap.min_node.children:
+ temp = leftmost_sibling(heap.min_node.children)
+ temp_right = NULL
+
+ while temp:
+ temp_right = temp.right_sibling
+ remove(temp)
+ add_sibling(heap.min_node, temp)
+ temp = temp_right
+
+ heap.min_node.children = NULL
+
+ # choose a root node other than min_node
+ temp = leftmost_sibling(heap.min_node)
+ if temp == heap.min_node:
+ if heap.min_node.right_sibling:
+ temp = heap.min_node.right_sibling
+ else:
+ out = heap.min_node
+ heap.min_node = NULL
+ return out
+
+ # remove min_node, and point heap to the new min
+ out = heap.min_node
+ remove(heap.min_node)
+ heap.min_node = temp
+
+ # re-link the heap
+ for i from 0 <= i < 100:
+ heap.roots_by_rank[i] = NULL
+
+ while temp:
+ if temp.val < heap.min_node.val:
+ heap.min_node = temp
+ temp_right = temp.right_sibling
+ link(heap, temp)
+ temp = temp_right
+
+ return out
+
+
+######################################################################
+# Debugging: Functions for printing the Fibonacci heap
+#
+#cdef void print_node(FibonacciNode* node, int level=0):
+# print '%s(%i,%i) %i' % (level*' ', node.index, node.val, node.rank)
+# if node.children:
+# print_node(leftmost_sibling(node.children), level+1)
+# if node.right_sibling:
+# print_node(node.right_sibling, level)
+#
+#
+#cdef void print_heap(FibonacciHeap* heap):
+# print "---------------------------------"
+# print "min node: (%i, %i)" % (heap.min_node.index, heap.min_node.val)
+# if heap.min_node:
+# print_node(leftmost_sibling(heap.min_node))
+# else:
+# print "[empty heap]"
View
9,102 scipy/sparse/csgraph/_tools.c
9,102 additions, 0 deletions not shown
View
414 scipy/sparse/csgraph/_tools.pyx
@@ -0,0 +1,414 @@
+"""
+Tools and utilities for working with compressed sparse graphs
+"""
+
+# Author: Jake Vanderplas -- <vanderplas@astro.washington.edu>
+# License: BSD, (C) 2012
+
+import numpy as np
+cimport numpy as np
+
+from scipy.sparse import csr_matrix, isspmatrix,\
+ isspmatrix_csr, isspmatrix_csc, isspmatrix_lil
+
+include 'parameters.pxi'
+
+def csgraph_from_masked(graph):
+ """Construct a CSR-format graph from a masked array.
+
+ Parameters
+ ----------
+ graph: MaskedArray
+ Input graph. Shape should be (n_nodes, n_nodes).
+
+ Returns
+ -------
+ csgraph: csr_matrix
+ Compressed sparse representation of graph,
+ """
+ # check that graph is a square matrix
+ graph = np.ma.asarray(graph)
+
+ if graph.ndim != 2:
+ raise ValueError("graph should have two dimensions")
+ N = graph.shape[0]
+ if graph.shape[1] != N:
+ raise ValueError("graph should be a square array")
+
+ # construct the csr matrix using graph and mask
+ if np.ma.is_masked(graph):
+ data = graph.compressed()
+ mask = ~graph.mask
+ else:
+ data = graph.data
+ mask = np.ones(graph.shape, dtype='bool')
+
+ data = np.asarray(data, dtype=DTYPE, order='c')
+
+ idx_grid = np.empty((N, N), dtype=ITYPE)
+ idx_grid[:] = np.arange(N, dtype=ITYPE)
+ indices = np.asarray(idx_grid[mask], dtype=ITYPE, order='c')
+
+ indptr = np.zeros(N + 1, dtype=ITYPE)
+ indptr[1:] = mask.sum(1).cumsum()
+
+ return csr_matrix((data, indices, indptr), (N, N))
+
+
+def csgraph_masked_from_dense(graph,
+ null_value=0,
+ nan_null=True,
+ infinity_null=True,
+ copy=True):
+ """Construct a masked array graph representation from a dense matrix.
+
+ Parameters
+ ----------
+ graph: array_like
+ Input graph. Shape should be (n_nodes, n_nodes).
+ null_value: float or None (optional)
+ Value that denotes non-edges in the graph. Default is zero.
+ infinity_null: bool
+ If True (default), then infinite entries (both positive and negative)
+ are treated as null edges.
+ nan_null: bool
+ If True (default), then NaN entries are treated as non-edges
+
+ Returns
+ -------
+ csgraph: MaskedArray
+ masked array representation of graph
+ """
+ graph = np.array(graph, copy=copy)
+
+ # check that graph is a square matrix
+ if graph.ndim != 2:
+ raise ValueError("graph should have two dimensions")
+ N = graph.shape[0]
+ if graph.shape[1] != N:
+ raise ValueError("graph should be a square array")
+
+ # check whether null_value is infinity or NaN
+ if null_value is not None:
+ null_value = DTYPE(null_value)
+ if np.isnan(null_value):
+ nan_null = True
+ null_value = None
+ elif np.isinf(null_value):
+ infinity_null = True
+ null_value = None
+
+ # flag all the null edges
+ if null_value is None:
+ mask = np.zeros(graph.shape, dtype='bool')
+ graph = np.ma.masked_array(graph, mask, copy=False)
+ else:
+ graph = np.ma.masked_values(graph, null_value, copy=False)
+
+ if infinity_null:
+ graph.mask |= np.isinf(graph)
+
+ if nan_null:
+ graph.mask |= np.isnan(graph)
+
+ return graph
+
+
+def csgraph_from_dense(graph,
+ null_value=0,
+ nan_null=True,
+ infinity_null=True):
+ """Construct a CSR-format sparse graph from a dense matrix.
+
+ Parameters
+ ----------
+ graph: array_like
+ Input graph. Shape should be (n_nodes, n_nodes).
+ null_value: float or None (optional)
+ Value that denotes non-edges in the graph. Default is zero.
+ infinity_null: bool
+ If True (default), then infinite entries (both positive and negative)
+ are treated as null edges.
+ nan_null: bool
+ If True (default), then NaN entries are treated as non-edges
+
+ Returns
+ -------
+ csgraph: csr_matrix
+ Compressed sparse representation of graph,
+ """
+ return csgraph_from_masked(csgraph_masked_from_dense(graph,
+ null_value,
+ nan_null,
+ infinity_null))
+
+
+def csgraph_to_dense(csgraph, null_value=0):
+ """Convert a sparse graph representation to a dense representation
+
+ Parameters
+ ----------
+ csgraph: csr_matrix, csc_matrix, or lil_matrix
+ Sparse representation of a graph.
+ null_value: float, optional
+ The value used to indicate null edges in the dense representation.
+ Default is 0.
+
+ Returns
+ -------
+ graph: ndarray
+ The dense representation of the sparse graph.
+
+ Notes
+ -----
+ For normal sparse graph representations, calling csgraph_to_dense with
+ null_value=0 produces an equivalent result to using dense format
+ conversions in the main sparse package. When the sparse representations
+ have repeated values, however, the results will differ. The tools in
+ scipy.sparse will add repeating values to obtain a final value. This
+ function will select the minimum among repeating values to obtain a
+ final value. For example, here we'll create a two-node directed sparse
+ graph with multiple edges from node 0 to node 1, of weights 2 and 3.
+ This illustrates the difference in behavior:
+
+ >>> from scipy.sparse import csr_matrix
+ >>> data = np.array([2, 3])
+ >>> indices = np.array([1, 1])
+ >>> indptr = np.array([0, 2, 2])
+ >>> M = csr_matrix((data, indices, indptr), shape=(2, 2))
+ >>> M.toarray()
+ array([[0, 5],
+ [0, 0]])
+ >>> csgraph_to_dense(M)
+ array([[0, 2],
+ [0, 0]])
+
+ The reason for this difference is to allow a compressed sparse graph to
+ represent multiple edges between any two nodes. As most sparse graph
+ algorithms are concerned with the single lowest-cost edge between any
+ two nodes, the default scipy.sparse behavior of summming multiple weights
+ does not make sense in this context.
+
+ The other reason for using this routine is to allow for graphs with
+ zero-weight edges. Let's look at the example of a two-node directed
+ graph, connected by an edge of weight zero:
+
+ >>> from scipy.sparse import csr_matrix
+ >>> data = np.array([0.0])
+ >>> indices = np.array([1])
+ >>> indptr = np.array([0, 2, 2])
+ >>> M = csr_matrix((data, indices, indptr), shape=(2, 2))
+ >>> M.toarray()
+ array([[0, 0],
+ [0, 0]])
+ >>> csgraph_to_dense(M, np.inf)
+ array([[ Inf, 0.],
+ [ Inf, Inf]])
+
+ In the first case, the zero-weight edge gets lost in the dense
+ representation. In the second case, we can choose a different null value
+ and see the true form of the graph.
+ """
+ # Allow only csr, lil and csc matrices: other formats when converted to csr
+ # combine duplicated edges: we don't want this to happen in the background.
+ if isspmatrix_csc(csgraph) or isspmatrix_lil(csgraph):
+ csgraph = csgraph.tocsr()
+ elif not isspmatrix_csr(csgraph):
+ raise ValueError("csgraph must be lil, csr, or csc format")
+
+ N = csgraph.shape[0]
+ if csgraph.shape[1] != N:
+ raise ValueError('csgraph should be a square matrix')
+
+ # get attribute arrays
+ data = np.asarray(csgraph.data, dtype=DTYPE, order='C')
+ indices = np.asarray(csgraph.indices, dtype=ITYPE, order='C')
+ indptr = np.asarray(csgraph.indptr, dtype=ITYPE, order='C')
+
+ # create the output array
+ graph = np.empty(csgraph.shape, dtype=DTYPE)
+ graph.fill(np.inf)
+ _populate_graph(graph, data, indices, indptr, null_value)
+ return graph
+
+
+def csgraph_to_masked(csgraph):
+ """Convert a sparse graph representation to a masked array representation
+
+ Parameters
+ ----------
+ csgraph: csr_matrix, csc_matrix, or lil_matrix
+ Sparse representation of a graph.
+
+ Returns
+ -------
+ graph: MakedArray
+ The masked dense representation of the sparse graph.
+ """
+ return np.ma.masked_invalid(csgraph_to_dense(csgraph, np.nan))
+
+
+cdef void _populate_graph(np.ndarray[DTYPE_t, ndim=1, mode='c'] data,
+ np.ndarray[ITYPE_t, ndim=1, mode='c'] indices,
+ np.ndarray[ITYPE_t, ndim=1, mode='c'] indptr,
+ np.ndarray[DTYPE_t, ndim=2, mode='c'] graph,
+ DTYPE_t null_value):
+ # data, indices, indptr are the csr attributes of the sparse input.
+ # on input, graph should be filled with infinities, and should be
+ # of size [N, N], which is also the size of the sparse matrix
+ cdef unsigned int</