Use scipy.sparse array datastructure #5139

rossbar · 2021-10-20T20:46:57Z

This is a currently in-progress experiment to see what it would take to adopt the sparse array interface proposed in scipy/scipy#14822. Supporting array semantics for sparse data structures will, in the long run, be a huge improvement for NetworkX IMO. Both dense arrays (i.e. numpy arrays) and sparse arrays are used extensively throughout, and a decent amount of code is dedicated to implementing something for one or the other data structure. The ability to have these data structures behave the same and to use them interchangeably would be excellent! Of course, scipy/scipy#14822 doesn't get all the way there yet, but it is a step in the right direction. This PR is to help flesh out the work there and to start framing the type of improvements we'd see in NetworkX if such a feature were added upstream.

Note also that this is very much a work in progress: the scipy.sparse array API itself is going to be changing a lot, so many of the changes here (for example, wrapping objects created with spdiags in sparse.csr_array calls) are subject to change.

Step 1: use sparse arrays in nx.to_scipy_sparse_matrix. This seems like a reasonable place to start, since nx.to_scipy_sparse_matrix is one of the primary interfaces to scipy.sparse from within NetworkX.

Use np.outer instead of multiplying column/row vectors: fix two instances in modularitymatrix where a new 2D array was being created via an outer product of two "vectors". In the matrix case, this was the product of a column vector and a row vector. In the array case this can be disambiguated by being explicit with np.outer.
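To make the distinction concrete, here is a minimal sketch using a made-up degree vector `k` (the values and variable names are illustrative, not taken from the NetworkX source). Two explicitly 2D operands reproduce what the matrix-style product computed, whereas `*` on 1D arrays is elementwise, so `np.outer` states the intent directly.

```python
import numpy as np

# Hypothetical degree vector for a 3-node graph (illustrative values only).
k = np.array([1.0, 2.0, 1.0])

# Matrix-style outer product: an (n, 1) column times a (1, n) row.
col = k[:, np.newaxis]
row = k[np.newaxis, :]
outer_via_matmul = col @ row      # shape (3, 3)

# With plain 1D arrays, `*` is elementwise and does NOT give a 2D result.
elementwise = k * k               # shape (3,)

# np.outer is unambiguous regardless of how the inputs are shaped.
outer_explicit = np.outer(k, k)   # shape (3, 3)
```

Both 2D results agree; only the elementwise product differs in shape, which is the ambiguity the explicit call avoids.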

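Update _transition_matrix in laplacianmatrix module:
- A few instances of the matrix multiplication operator
- Add np.newaxis + transpose to get the shape right for broadcasting
- Explicitly convert e.g. sp.sparse.spdiags to a csr_array.

Update directed_combinatorial_laplacian w/ sparse array:
- Wrap spdiags in csr_array and update matmul operators.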
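The pattern described in these notes can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the PR: the star-graph adjacency and the names `D_inv`, `P`, and `P_dense` are invented for the example, and only the `np.newaxis` half of the broadcasting change is shown.

```python
import numpy as np
import scipy as sp
import scipy.sparse

# Hypothetical adjacency for a 3-node star graph (no isolated nodes,
# so every row sum is nonzero).
A = sp.sparse.csr_array(np.array([[0.0, 1.0, 1.0],
                                  [1.0, 0.0, 0.0],
                                  [1.0, 0.0, 0.0]]))
n, m = A.shape

# Flatten defensively so the example does not depend on whether
# `sum` returns a 1D or 2D result.
row_sums = np.asarray(A.sum(axis=1)).ravel()

# spdiags returns a sparse *matrix*; wrap it so `@` and `*` keep
# array semantics in everything that follows.
D_inv = sp.sparse.csr_array(sp.sparse.spdiags(1.0 / row_sums, 0, m, n))
P = D_inv @ A                      # row-normalized transition matrix

# Dense equivalent: np.newaxis turns the 1D row sums into a column
# so that broadcasting divides each row by its own sum.
P_dense = A.toarray() / row_sums[:, np.newaxis]
```

The sparse and dense routes give the same row-stochastic result; the sparse one avoids densifying `A`.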

Rm matrix-specific code from lgc and hmn modules:
- Replace .A call with appropriate array semantics
- Wrap sparse.diags in csr_array.
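As a rough illustration of these two changes (the matrices below are invented for the example and are not the ones used in lgc or hmn): `toarray()` is the explicit replacement for the `.A` shorthand, and wrapping the result of `sparse.diags` keeps later operators on array semantics.

```python
import numpy as np
import scipy as sp
import scipy.sparse

M = sp.sparse.csr_array(np.array([[0.0, 2.0], [3.0, 0.0]]))

# `toarray()` works for sparse arrays and sparse matrices alike and
# yields a plain np.ndarray.
dense = M.toarray()

# sparse.diags returned a sparse matrix when this PR was written;
# wrapping it in csr_array makes the array semantics explicit.
D = sp.sparse.csr_array(sp.sparse.diags([1.0, 2.0]))
scaled = D @ M   # scales row i of M by the i-th diagonal entry
```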

Change hits to use sparse array semantics:
- Replace * with @
- Remove superfluous calls to flatten.
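The reason `*` has to become `@` is that the operator means different things under the two interfaces. A small sketch with an arbitrary 2x2 example (not taken from hits_alg.py):

```python
import numpy as np
from scipy.sparse import csr_array, csr_matrix

dense = np.array([[1.0, 2.0], [3.0, 4.0]])
M = csr_matrix(dense)   # matrix semantics
A = csr_array(dense)    # array semantics

matrix_star = (M * M).toarray()   # matrix product
array_star = (A * A).toarray()    # elementwise product
array_at = (A @ A).toarray()      # matrix product under both semantics

# A sparse array times a 1D vector gives a 1D result, so no flatten
# call is needed afterwards.
x = np.ones(2)
y = A @ x
```

Writing `@` everywhere means the code computes the same thing whichever sparse type it is handed.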

Update sparse matrix usage in layout module:
- Simplify lil.getrowview call
- Wrap spdiags in csr_array.

networkx/algorithms/link_analysis/hits_alg.py

Update undirected laplacian functions.

Conflicts:
- [x] networkx/algorithms/node_classification/hmn.py
- [x] networkx/algorithms/node_classification/lgc.py
- [x] networkx/linalg/algebraicconnectivity.py

dschult

Looks good so far. I like the notation of places where we should sparsify or not densify.
Just one comment about whether setting x should occur in an if statement or not.

Nice!

networkx/algorithms/link_analysis/hits_alg.py

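Check from_scipy_sparse_matrix works with arrays: modifies test suite.

Add FutureWarnings to functions that return sparse matrices:
- biadjacency_matrix.
- bethe_hessian_matrix.
- incidence_matrix.
- laplacian functions.
- modularity_matrix functions.
- adjacency_matrix.

Add to_scipy_sparse_array and use it everywhere: add a new conversion function to preserve array semantics internally while not altering behavior for users. Also adds a FutureWarning to to_scipy_sparse_matrix.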
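The general shape of this approach can be sketched as below. The function names `to_sparse_array` and `to_sparse_matrix` and the warning text are hypothetical stand-ins chosen for the example; this is not the NetworkX implementation, only an illustration of keeping the old return type for users while routing the work through an array-returning helper.

```python
import warnings

import numpy as np
from scipy.sparse import csr_array, csr_matrix


def to_sparse_array(dense):
    """Hypothetical internal helper: always returns a sparse array."""
    return csr_array(dense)


def to_sparse_matrix(dense):
    """Hypothetical legacy entry point: old return type, plus a warning."""
    warnings.warn(
        "to_sparse_matrix will be removed; use to_sparse_array instead.",
        FutureWarning,
        stacklevel=2,
    )
    return csr_matrix(to_sparse_array(dense))


data = np.eye(2)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy = to_sparse_matrix(data)
```

Internal callers would use the array-returning helper directly, so they never trigger the warning themselves.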

dschult

The main questions I have are about tests/test_convert_scipy.py where there are tests of to_scipy_sparse_array, but none for from_scipy_sparse_array. I'm guessing that the existing tests for from_scipy_sparse_matrix can just be switched over to the array api. Do we want to do that now? Do we want to duplicate it now (convert one copy to array) and then remove the version for matrix later?

Otherwise this looks good. I'm assuming that the sparse array itself is being tested elsewhere. So we're just testing the NetworkX usage of this data structure.

rossbar · 2022-01-14T23:05:00Z

> The main questions I have are about tests/test_convert_scipy.py where there are tests of to_scipy_sparse_array, but none for from_scipy_sparse_array

This was because I hadn't added from_scipy_sparse_array yet - done as of 4a92d07.

> I'm guessing that the existing tests for from_scipy_sparse_matrix can just be switched over to the array api.

Correct, this was also done in 4a92d07. In fact, from_scipy_sparse_array should work fine with either sparse arrays or sparse matrices without any changes at all.
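A quick way to check that claim, assuming a NetworkX release that already ships `to_scipy_sparse_array` and `from_scipy_sparse_array` (2.7 or later) and a SciPy with sparse arrays (1.8 or later); the path graph is just a convenient test input:

```python
import networkx as nx
from scipy.sparse import csr_matrix

G = nx.path_graph(3)

# Sparse array out...
A = nx.to_scipy_sparse_array(G)

# ...and back in, once from the array and once from a matrix built from it.
G_from_array = nx.from_scipy_sparse_array(A)
G_from_matrix = nx.from_scipy_sparse_array(csr_matrix(A))
```

Both graphs should have the same edges as `G`, which is what lets the existing matrix-based tests be reused for the array API.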

jarrodmillman · 2022-01-15T01:39:10Z

@MridulS @dschult I think we should merge this before scipy 1.8 is released. Then we can merge #5262. Before releasing networkx 2.7, we will need to update the requirements file. Since this is a big change, it would be good to have it live in the main branch for a bit (potentially more people will see it and possibly test it for us).

networkx/convert_matrix.py

MridulS

Looks good to me! Thanks @rossbar!

networkx/convert_matrix.py

dschult · 2022-01-17T19:51:33Z

Wow -- there is a lot in here.... :} LGTM...
Thanks @rossbar!!

Co-authored-by: Mridul Seth <mail@mriduls.com>

MridulS · 2022-01-18T16:34:03Z

I think we can merge this in now; we can further iterate on this in #5262.

Are there any other blockers @rossbar ?

rossbar · 2022-01-18T18:03:58Z

> Are there any other blockers

Nope, not at this point. We're still waiting on the official release of scipy 1.8, but the dependency updates can be handled in a separate PR once it's out.

MridulS · 2022-01-18T18:17:18Z

I'll go ahead and merge this then 😄

stefanv

A thorough piece of work, thank you @rossbar. Some tiny pieces of feedback, to use or lose as you choose.

examples/algorithms/plot_subgraphs.py

examples/drawing/plot_tsp.py

stefanv · 2022-01-18T18:18:58Z

networkx/convert_matrix.py

+    edge_attribute: string
+       Name of edge attribute to store matrix numeric value. The data will
+       have the same type as the matrix entry (int, float, (real,imag)).
+


Missing returns section
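For reference, a numpydoc-style docstring with a one-line summary and a Returns section might look like the stub below. The function name `from_sparse_adjacency` and the parameter descriptions are placeholders invented for this sketch; they are not the wording that ended up in networkx/convert_matrix.py.

```python
def from_sparse_adjacency(A):
    """Create a graph from a sparse adjacency matrix.

    Parameters
    ----------
    A : scipy.sparse array
        Adjacency representation of the graph.

    Returns
    -------
    G : NetworkX graph
        The graph described by `A`.
    """
    # Documentation sketch only; the body is intentionally not implemented.
    raise NotImplementedError("documentation sketch only")
```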

stefanv · 2022-01-18T18:20:14Z

networkx/convert_matrix.py

+def from_scipy_sparse_matrix(
+    A, parallel_edges=False, create_using=None, edge_attribute="weight"
+):
+    """Creates a new graph from an adjacency matrix given as a SciPy sparse


Nitpick, but ideally these fit on one line. We also typically use "Create" instead of "Creates", i.e., something like:

"Create a graph from a sparse adjacency matrix."

stefanv · 2022-01-18T18:21:28Z

networkx/linalg/bethehessianmatrix.py

@@ -35,7 +35,7 @@ def bethe_hessian_matrix(G, r=None, nodelist=None):

    Returns
    -------
-    H : Numpy matrix
+    H : scipy.sparse.csr_matrix


I like this level of specificity. Should the same be used above, instead of simply "SciPy sparse matrix"? Or were you keeping your options open there, in case it might change in the future?

stefanv · 2022-01-18T18:22:09Z

networkx/linalg/bethehessianmatrix.py

-    return (r ** 2 - 1) * I - r * A + D
+    # TODO: Rm csr_array wrapper when spdiags array creation becomes available
+    D = sp.sparse.csr_array(sp.sparse.spdiags(A.sum(axis=1), 0, m, n, format="csr"))
+    # TODO: Rm csr_array wrapper when eye array creation becomes available


This is a nicely descriptive TODO. Above there is one that simply says "TODO: csr_array"
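Put together, the wrapper pattern in this hunk might be exercised as in the sketch below. The formula `(r ** 2 - 1) * I - r * A + D` and the `spdiags` call come from the diff above; the path-graph input, the value of `r`, and the way `I` is built with `sp.sparse.eye` are assumptions made for the example.

```python
import numpy as np
import scipy as sp
import scipy.sparse

# Adjacency of a 3-node path graph (illustrative input).
A = sp.sparse.csr_array(np.array([[0.0, 1.0, 0.0],
                                  [1.0, 0.0, 1.0],
                                  [0.0, 1.0, 0.0]]))
n, m = A.shape
r = 2.0  # arbitrary regularizer chosen for the example

# Flatten defensively so spdiags always receives a 1D diagonal.
degrees = np.asarray(A.sum(axis=1)).ravel()

# Both constructors are wrapped so every operand has array semantics.
D = sp.sparse.csr_array(sp.sparse.spdiags(degrees, 0, m, n, format="csr"))
I = sp.sparse.csr_array(sp.sparse.eye(m, n, format="csr"))

H = (r**2 - 1) * I - r * A + D
```

Because every operand is a sparse array, the scalar multiplications and additions here are all elementwise and `H` stays sparse.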

stefanv · 2022-01-18T18:23:25Z

Ah, perfect timing as usual 😂

MridulS · 2022-01-18T18:24:53Z

Woopsie, sorry about that 😅

* Step 1: use sparse arrays in nx.to_scipy_sparse_matrix.
  Seems like a reasonable place to start. nx.to_scipy_sparse_matrix is one of the primary interfaces to scipy.sparse from within NetworkX.
* 1: Use np.outer instead of mult col/row vectors
  Fix two instances in modularitymatrix where a new 2D array was being created via an outer product of two "vectors". In the matrix case, this was a row vector * a column vector. In the array case this can be disambiguated by being explicit with np.outer.
* Update _transition_matrix in laplacianmatrix module
  - A few instances of matrix multiplication operator
  - Add np.newaxis + transpose to get shape right for broadcasting
  - Explicitly convert e.g. sp.sparse.spdiags to a csr_array.
* Update directed_combinitorial_laplacian w/ sparse array.
  - Wrap spdiags in csr_array and update matmul operators.
* Rm matrix-specific code from lgc and hmn modules
  - Replace .A call with appropriate array semantics
  - wrap sparse.diags in csr_array.
* Change hits to use sparse array semantics.
  - Replace * with @
  - Remove superfluous calls to flatten.
* Update sparse matrix usage in layout module.
  - Simplify lil.getrowview call
  - Wrap spdiags in csr_array.
* lil_matrix -> lil_array in graphmatrix.py.
* WIP: Start working on algebraic connectivity module.
* Incorporate auth mat varname feedback.
* Revert 1D slice and comment for 1D sparse future.
* Add TODOs: rm csr_array wrapper around spdiags etc.
* WIP: cleanup algebraicconn: tracemin_fiedler.
* Typo.
* Finish reviewing algebraicconnectivity.
* Convert bethe_hessian matrix to use sparse arrays.
* WIP: update laplacian.
  Update undirected laplacian functions.
* WIP: laplacian - add comment about _transition_matrix return types.
* Finish laplacianmatrix review.
* Update attrmatrix.
* Switch to official laplacian function.
* Update pagerank to use sparse array.
* Switch bipartite matrix to sparse arrays.
* Check from_scipy_sparse_matrix works with arrays.
  Modifies test suite.
* Apply changes from review.
* Fix failing docstring tests.
* Fix missing axis for in-place multiplication.
* Use scipy==1.8rc2
* Use matrix multiplication
* Fix PyPy CI
* [MRG] Create plot_subgraphs.py example (networkx#5165)
* Create plot_subgraphs.py networkx#4220
* Update plot_subgraphs.py black
* Update plot_subgraphs.py lint plus font_size
* Update plot_subgraphs.py added more plots
* Update plot_subgraphs.py removed plots from the unit test and added comments
* Update plot_subgraphs.py lint
* Update plot_subgraphs.py typos fixed
* Update plot_subgraphs.py added nodes to the plot of the edges removed that was commented out for whatever reason
* Update plot_subgraphs.py revert the latest commit - the line was commented out for a reason - it's broken
* Update plot_subgraphs.py fixed node color issue
* Update plot_subgraphs.py format fix
* Update plot_subgraphs.py forgot to draw the nodes... now fixed
* Fix sphinx warnings about heading length.
* Update examples/algorithms/plot_subgraphs.py
* Update examples/algorithms/plot_subgraphs.py
  Co-authored-by: Ross Barnowski <rossbar@berkeley.edu>
  Co-authored-by: Dan Schult <dschult@colgate.edu>
* Add traveling salesman problem to example gallery (networkx#4874)
  Adds an example of the using Christofides to solve the TSP problem to the example galery.
  Co-authored-by: Ross Barnowski <rossbar@berkeley.edu>
* Fixed inconsistent documentation for nbunch parameter in DiGraph.edges() (networkx#5037)
* Fixed inconsistent documentation for nbunch parameter in DiGraph.edges()
* Resolved Requested Changes
* Revert changes to degree docstrings.
* Update comments in example.
* Apply wording to edges method in all graph classes.
  Co-authored-by: Ross Barnowski <rossbar@berkeley.edu>
* Compatibility updates from testing with numpy/scipy/pytest rc's (networkx#5226)
* Rm deprecated scipy subpkg access.
* Use recwarn fixture in place of deprecated pytest pattern.
* Rm unnecessary try/except from tests.
* Replace internal `close` fn with `math.isclose`. (networkx#5224)
* Replace internal close fn with math.isclose.
* Fix lines in docstring examples.
* Fix Python 3.10 deprecation warning w/ int div. (networkx#5231)
* Touchups and suggestions for subgraph gallery example (networkx#5225)
* Simplify construction of G with edges rm'd
* Rm unused graph attribute.
* Shorten categorization by node type.
* Simplify node coloring.
* Simplify isomorphism check.
* Rm unit test.
* Rm redundant plotting of each subgraph.
* Use new package name (networkx#5234)
* Allowing None edges in weight function of bidirectional Dijkstra (networkx#5232)
* added following feature also to bidirectional dijkstra: The weight function can be used to hide edges by returning None.
* changed syntax for better readability and code duplicate avoidance
  Co-authored-by: Hohmann, Nikolas <nikolas.hohmann@tu-darmstadt.de>
* Add an FAQ about assigning issues. (networkx#5182)
* Add FAQ about assigning issues.
* Add note about linking issues from new PRs.
* Update dev deps (networkx#5243)
* Update minor doc issues with tex notation (networkx#5244)
* Add FutureWarnings to fns that return sparse matrices
  - biadjacency_matrix.
  - bethe_hessian_matrix.
  - incidence_matrix.
  - laplacian functions.
  - modularity_matrix functions.
  - adjacency_matrix.
* Add to_scipy_sparse_array and use it everywhere.
  Add a new conversion function to preserve array semantics internally while not altering behavior for users. Also adds FutureWarning to to_scipy_sparse_matrix.
* Add from_scipy_sparse_array.
  Supercedes from_scipy_sparse_matrix.
* Handle deprecations in separate PR.
* Fix docstring examples.

Co-authored-by: Mridul Seth <mail@mriduls.com>
Co-authored-by: Jarrod Millman <jarrod.millman@gmail.com>
Co-authored-by: Andrew Knyazev <andrew.knyazev@ucdenver.edu>
Co-authored-by: Dan Schult <dschult@colgate.edu>
Co-authored-by: eskountis <56514439+eskountis@users.noreply.github.com>
Co-authored-by: Anutosh Bhat <87052487+anutosh491@users.noreply.github.com>
Co-authored-by: NikHoh <nikhoh@web.de>
Co-authored-by: Hohmann, Nikolas <nikolas.hohmann@tu-darmstadt.de>
Co-authored-by: Sultan Orazbayev <contact@econpoint.com>
Co-authored-by: Mridul Seth <mail@mriduls.com>

rossbar added 8 commits October 18, 2021 14:39

Step 1: use sparse arrays in nx.to_scipy_sparse_matrix.

ec52f75

Seems like a reasonable place to start. nx.to_scipy_sparse_matrix is one of the primary interfaces to scipy.sparse from within NetworkX.

Update _transition_matrix in laplacianmatrix module

a886efa

- A few instances of matrix multiplication operator
- Add np.newaxis + transpose to get shape right for broadcasting
- Explicitly convert e.g. sp.sparse.spdiags to a csr_array.

Update directed_combinitorial_laplacian w/ sparse array.

1546a8b

- Wrap spdiags in csr_array and update matmul operators.

Rm matrix-specific code from lgc and hmn modules

f87c5bc

- Replace .A call with appropriate array semantics
- wrap sparse.diags in csr_array.

Change hits to use sparse array semantics.

2070af4

- Replace * with @
- Remove superfluous calls to flatten.

Update sparse matrix usage in layout module.

b70e19f

- Simplify lil.getrowview call
- Wrap spdiags in csr_array.

lil_matrix -> lil_array in graphmatrix.py.

f0e1314

rossbar added the Work In Progress label Oct 20, 2021

rossbar mentioned this pull request Oct 20, 2021

Add an array API to scipy.sparse scipy/scipy#14822

Merged

WIP: Start working on algebraic connectivity module.

42a1bb5

rossbar mentioned this pull request Oct 23, 2021

Refactor node_classification to improve conciseness and readability #5144

Merged

jarrodmillman reviewed Nov 16, 2021

networkx/algorithms/link_analysis/hits_alg.py (outdated, resolved)

rossbar added 13 commits November 17, 2021 23:22

Incorporate auth mat varname feedback.

6849372

Revert 1D slice and comment for 1D sparse future.

09a7d0b

Add TODOs: rm csr_array wrapper around spdiags etc.

db96af6

WIP: cleanup algebraicconn: tracemin_fiedler.

49eb0c9

Typo.

42663f0

Finish reviewing algebraicconnectivity.

27c2289

Convert bethe_hessian matrix to use sparse arrays.

7d0b89f

WIP: update laplacian.

d997598

Update undirected laplacian functions.

WIP: laplacian - add comment about _transition_matrix return types.

0864e6c

Finish laplacianmatrix review.

025280e

Update attrmatrix.

8f3953f

Switch to official laplacian function.

ad7a847

Merge branch 'main' into try-sparse-array

a7b6d18

Conflicts:
- [x] networkx/algorithms/node_classification/hmn.py
- [x] networkx/algorithms/node_classification/lgc.py
- [x] networkx/linalg/algebraicconnectivity.py

dschult reviewed Dec 4, 2021

networkx/algorithms/link_analysis/hits_alg.py (outdated, resolved)

rossbar added 3 commits December 4, 2021 13:16

Update pagerank to use sparse array.

24a84dc

Switch bipartite matrix to sparse arrays.

2fb90a2

Check from_scipy_sparse_matrix works with arrays.

9cce85b

Modifies test suite.

SultanOrazbayev and others added 3 commits January 6, 2022 11:36

Update minor doc issues with tex notation (networkx#5244)

af351b3

Add FutureWarnings to fns that return sparse matrices

aaeb1ff

- biadjacency_matrix.
- bethe_hessian_matrix.
- incidence_matrix.
- laplacian functions.
- modularity_matrix functions.
- adjacency_matrix.

Add to_scipy_sparse_array and use it everywhere.

b6ce853

Add a new conversion function to preserve array semantics internally while not altering behavior for users. Also adds FutureWarning to to_scipy_sparse_matrix.

rossbar force-pushed the try-sparse-array branch from 2a9a4c0 to b6ce853 January 6, 2022 19:38

jarrodmillman requested a review from dschult January 6, 2022 21:44

dschult reviewed Jan 7, 2022

Add from_scipy_sparse_array. Supercedes from_scipy_sparse_matrix.

4a92d07

Handle deprecations in separate PR.

852842b

rossbar mentioned this pull request Jan 15, 2022

Deprecate scipy sparse matrix conversion functions #5262

Merged

jarrodmillman approved these changes Jan 15, 2022

MridulS reviewed Jan 17, 2022

networkx/convert_matrix.py (resolved)

MridulS approved these changes Jan 17, 2022

networkx/convert_matrix.py (outdated, resolved)

Fix docstring examples.

252108f

Co-authored-by: Mridul Seth <mail@mriduls.com>

MridulS merged commit 5dfd57a into networkx:main Jan 18, 2022

stefanv approved these changes Jan 18, 2022

This was referenced Jan 27, 2022

added future warning to directed_combinatorial_laplacian for new return type #4142

Closed

NX 3.0: changed return type of directed_combinatorial_laplacian to SciPy Sparse Matrix #4141

Closed

This was referenced May 15, 2022

Remove numpy matrix from directed_combinatorial_laplacian_matrix #4140

Closed

Review return types of matrix objects in linalg #4089

Closed
