Sparse implementation for attr_matrix #5195

rossbar · 2021-11-20T07:35:43Z

This is thematically similar to the discussion in #5193.

There are two functions in the attrmatrix module: attr_matrix and attr_sparse_matrix. These perform essentially the same function, except one returns a numpy matrix and the other returns a scipy.sparse matrix. IMO it would be an improvement to only have one function for this. What makes the most sense to me would be to use the sparse implementation from attr_sparse_matrix for attr_matrix, so that attr_matrix will return a sparse array instead of a numpy.matrix (we can put a FutureWarning on this for now to notify that it will change). Then the attr_sparse_matrix name can be deprecated. Thoughts?

The text was updated successfully, but these errors were encountered:

dschult · 2021-11-21T02:43:13Z

I think this is related to #4217 .
We should provide a nice interface to move data between the Graph classes and common Scientific Python data structures.

It makes sense to me that sparse arrays be the default data structure (i.e. attr_matrix) created as you suggest. I'd love to have a good API for changing to the many different data structures available. I'm not sure how to manage that, but I think we have numpy.array, scipy.sparse.array (assuming we are using that instead of the matrix object), pandas.DataFrame, xarray.XArray. And then there is the question of whether these should be stored as an adjacency matrix array structure or an edgelist structure.

I guess this issue becomes much bigger when expanded -- perhaps it is better to just make the change you suggest. Afterall, it is an improvement and it doesn't tackle so much that it gets messy. But it'd be nice to figure out how to move data to and from the Graph classes more effectively.

rossbar · 2021-11-22T04:16:54Z

I think this is related to #4217 .
We should provide a nice interface to move data between the Graph classes and common Scientific Python data structures.

Agreed - there's definitely a lot of discussion to be had on this topic. In the attr_matrix case specifically, I think it makes the most sense to use the sparse array by default because then users can use the built-in methods (i.e. toarray()) on the sparse objects to convert to dense arrays (and from there, any other data structure that knows how to build from/interoperate with numpy arrays).

I'll go ahead and submit a PR for the attr_matrix refactor as I've envisioned. As a separate task I think we should also distill the conversation from #4217, this thread (and probably a few other places I'm forgetting at the moment) into a new discussion topic (which may evolve into an NXEP?)

rossbar added type: Enhancements Question A question about NetworkX or network science in general labels Nov 20, 2021

rossbar added this to the networkx-2.7 milestone Nov 20, 2021

rossbar mentioned this issue Feb 3, 2022

Add FutureWarning to attr_matrix to notify users of return type change #5300

Merged

jarrodmillman modified the milestones: networkx-2.7, networkx-3.0 Feb 12, 2022

rossbar modified the milestones: networkx-3.0, networkx-3.1 Nov 29, 2022

jarrodmillman modified the milestones: networkx-3.1, networkx-3.3, networkx-3.2 Apr 4, 2023

jarrodmillman removed this from the 3.2 milestone Jul 20, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Sparse implementation for attr_matrix #5195

Sparse implementation for attr_matrix #5195

rossbar commented Nov 20, 2021

dschult commented Nov 21, 2021

rossbar commented Nov 22, 2021

Sparse implementation for attr_matrix #5195

Sparse implementation for attr_matrix #5195

Comments

rossbar commented Nov 20, 2021

dschult commented Nov 21, 2021

rossbar commented Nov 22, 2021