Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sparse implementation for attr_matrix #5195

Open
rossbar opened this issue Nov 20, 2021 · 2 comments
Open

Sparse implementation for attr_matrix #5195

rossbar opened this issue Nov 20, 2021 · 2 comments
Labels
Question A question about NetworkX or network science in general type: Enhancements

Comments

@rossbar
Copy link
Contributor

rossbar commented Nov 20, 2021

This is thematically similar to the discussion in #5193.

There are two functions in the attrmatrix module: attr_matrix and attr_sparse_matrix. These perform essentially the same function, except one returns a numpy matrix and the other returns a scipy.sparse matrix. IMO it would be an improvement to only have one function for this. What makes the most sense to me would be to use the sparse implementation from attr_sparse_matrix for attr_matrix, so that attr_matrix will return a sparse array instead of a numpy.matrix (we can put a FutureWarning on this for now to notify that it will change). Then the attr_sparse_matrix name can be deprecated. Thoughts?

@rossbar rossbar added type: Enhancements Question A question about NetworkX or network science in general labels Nov 20, 2021
@rossbar rossbar added this to the networkx-2.7 milestone Nov 20, 2021
@dschult
Copy link
Member

dschult commented Nov 21, 2021

I think this is related to #4217 .
We should provide a nice interface to move data between the Graph classes and common Scientific Python data structures.

It makes sense to me that sparse arrays be the default data structure (i.e. attr_matrix) created as you suggest. I'd love to have a good API for changing to the many different data structures available. I'm not sure how to manage that, but I think we have numpy.array, scipy.sparse.array (assuming we are using that instead of the matrix object), pandas.DataFrame, xarray.XArray. And then there is the question of whether these should be stored as an adjacency matrix array structure or an edgelist structure.

I guess this issue becomes much bigger when expanded -- perhaps it is better to just make the change you suggest. Afterall, it is an improvement and it doesn't tackle so much that it gets messy. But it'd be nice to figure out how to move data to and from the Graph classes more effectively.

@rossbar
Copy link
Contributor Author

rossbar commented Nov 22, 2021

I think this is related to #4217 .
We should provide a nice interface to move data between the Graph classes and common Scientific Python data structures.

Agreed - there's definitely a lot of discussion to be had on this topic. In the attr_matrix case specifically, I think it makes the most sense to use the sparse array by default because then users can use the built-in methods (i.e. toarray()) on the sparse objects to convert to dense arrays (and from there, any other data structure that knows how to build from/interoperate with numpy arrays).

I'll go ahead and submit a PR for the attr_matrix refactor as I've envisioned. As a separate task I think we should also distill the conversation from #4217, this thread (and probably a few other places I'm forgetting at the moment) into a new discussion topic (which may evolve into an NXEP?)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Question A question about NetworkX or network science in general type: Enhancements
Development

No branches or pull requests

3 participants