Description
Please describe the purpose of the new feature or describe the problem to solve.
This is to inquire about adding a sparse version of scipy.spatial.distance.squareform directly into sparse, or perhaps as a COO method.
This is related to the scipy.sparse.csgraph work, since graph problems are where I have most often needed this functionality. Rather than unrolling/flattening an entire sparse matrix, it is often useful to unroll only the lower/upper triangle of a (square/symmetric) matrix, e.g. when building an edge indicator vector from an adjacency matrix, or when scoring covariance shrinkage algorithms.
The difficulty is that in the sparse case we would not want to materialize the full triu_indices (on the order of n² index pairs for an n x n matrix), since that defeats the purpose of keeping things out of memory.
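To make the memory argument concrete, here is a minimal sketch of the forward (square to condensed) direction. It assumes scipy's condensed ordering (strict upper triangle, read row by row) and works on plain COO-style (rows, cols, data) triples rather than on sparse.COO itself, so no sparse API is assumed; `condensed_index` and `coo_to_condensed` are hypothetical names, not the approach from the affinis notebook. Only the stored nonzeros are visited, so nothing of size n² is ever built.

```python
def condensed_index(n: int, i: int, j: int) -> int:
    """Position of pair (i, j), with i < j, in scipy's condensed ordering
    (strict upper triangle, row by row)."""
    # Rows 0..i-1 hold (n-1) + (n-2) + ... + (n-i) = n*i - i*(i+1)//2 entries;
    # inside row i, column j is entry number j - i - 1.
    return n * i - i * (i + 1) // 2 + (j - i - 1)


def coo_to_condensed(n, rows, cols, data):
    """Unroll a symmetric n x n matrix given as COO triples into the
    (indices, values) of a sparse vector of length n*(n-1)//2.

    Only stored entries are visited, so no triu_indices array is built.
    """
    out = {}
    for i, j, v in zip(rows, cols, data):
        if i == j:
            continue  # the diagonal has no slot in the condensed vector
        if i > j:
            i, j = j, i  # fold lower-triangle entries onto the upper triangle (assumes symmetry)
        out[condensed_index(n, i, j)] = v
    indices = sorted(out)
    return indices, [out[k] for k in indices]


# Undirected 4-node graph with edges 0-1, 1-3, 2-3, stored symmetrically.
rows = [0, 1, 1, 3, 2, 3]
cols = [1, 0, 3, 1, 3, 2]
print(coo_to_condensed(4, rows, cols, [1] * 6))  # ([0, 4, 5], [1, 1, 1])
```

With an actual sparse.COO, the triples would come from its `coords` and `data` arrays, and the loop would presumably be replaced by the same arithmetic applied vectorized over all nonzeros at once.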
Additionally, a sparse squareform could operate in a batched/tensor setting, where the lower triangle is unrolled for each 2-D slice of a tensor.
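The batched case needs no extra machinery: the leading (batch) coordinates pass through unchanged and only the last two axes collapse into one condensed axis. The sketch below assumes a COO tensor of shape (*batch, n, n) with symmetric trailing slices, given as index tuples plus values; `batched_coo_to_condensed` is a hypothetical helper, and it folds onto the upper-triangle ordering (for symmetric slices the lower triangle holds the same values).

```python
def condensed_index(n: int, i: int, j: int) -> int:
    """Position of pair (i, j), with i < j, in the condensed upper-triangle ordering."""
    return n * i - i * (i + 1) // 2 + (j - i - 1)


def batched_coo_to_condensed(shape, coords, data):
    """Unroll every trailing n x n slice of a COO tensor of shape (*batch, n, n).

    coords: iterable of index tuples (*batch, i, j); data: matching values.
    Returns (out_shape, out_coords, out_data) for a tensor of shape
    (*batch, n*(n-1)//2).
    """
    *batch_shape, n, n2 = shape
    if n != n2:
        raise ValueError("the trailing two axes must be square")
    out = {}
    for idx, v in zip(coords, data):
        *b, i, j = idx
        if i == j:
            continue  # diagonal entries are dropped, as in squareform
        if i > j:
            i, j = j, i  # assumes each slice is symmetric
        out[(*b, condensed_index(n, i, j))] = v
    out_coords = sorted(out)
    return (*batch_shape, n * (n - 1) // 2), out_coords, [out[c] for c in out_coords]


# Two 3 x 3 slices: slice 0 has edge 0-2, slice 1 has edges 0-1 and 1-2.
coords = [(0, 0, 2), (0, 2, 0), (1, 0, 1), (1, 1, 0), (1, 1, 2), (1, 2, 1)]
print(batched_coo_to_condensed((2, 3, 3), coords, [5, 5, 7, 7, 9, 9]))
# ((2, 3), [(0, 1), (1, 0), (1, 2)], [5, 7, 9])
```

Because each nonzero is mapped independently, the cost is proportional to the number of stored entries across all slices, with no per-slice Python loop over dense indices.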
Suggest a solution if possible.
I think I have a way to implement this, and would be happy to open a pull request if desired. In the prototyping notebook here for my affinis package, I'm using closed-form equations to derive the indices of the unrolled (condensed) and rolled (square) forms directly from the original indices. This also lets me perform the unrolling over batched dimensions, which is dramatically more space- and time-efficient than my previous loop-based techniques.
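For the rolled direction, one standard closed form (not necessarily the one used in the notebook) inverts the condensed index by counting from the end of the vector: the rows below row i hold 1 + 2 + ... + t = t(t+1)/2 entries, where t = n - 2 - i, so t is the largest integer with t(t+1)/2 ≤ r for the reversed index r. The sketch below uses `math.isqrt` so the result stays exact for large n, where a floating-point square root can be off by one; `condensed_to_pair` and `condensed_to_coo` are hypothetical names.

```python
import math


def condensed_to_pair(n: int, k: int) -> tuple[int, int]:
    """Inverse of the condensed ordering: map index k to the pair (i, j), i < j."""
    m = n * (n - 1) // 2
    if not 0 <= k < m:
        raise IndexError(f"condensed index {k} out of range for n={n}")
    r = m - 1 - k  # position counted from the end of the condensed vector
    t = (math.isqrt(8 * r + 1) - 1) // 2  # largest t with t*(t+1)//2 <= r
    i = n - 2 - t
    # Subtract the entries in rows 0..i-1, then offset past the diagonal.
    j = k - (n * i - i * (i + 1) // 2) + i + 1
    return i, j


def condensed_to_coo(n, indices, values):
    """Roll a sparse condensed vector back into symmetric n x n COO triples."""
    rows, cols, data = [], [], []
    for k, v in zip(indices, values):
        i, j = condensed_to_pair(n, k)
        rows += [i, j]  # emit both (i, j) and (j, i) to keep the matrix symmetric
        cols += [j, i]
        data += [v, v]
    return rows, cols, data


print(condensed_to_coo(4, [0, 5], [1, 2]))  # ([0, 1, 2, 3], [1, 0, 3, 2], [1, 1, 2, 2])
```

Since each index is converted on its own with integer arithmetic, the same formula would apply unchanged to a batch of condensed vectors: only the last coordinate is rewritten.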
If you're not interested in adding this functionality to mirror scipy's, I'll probably still make it available in affinis, but I think my API would be a lot cleaner if this logic were contained entirely within sparse.
If you have tried alternatives, please describe them below.
Previously I've used squareform directly from scipy and simply let the square matrix become dense.
Additional information that may help us understand your needs.
I'm working toward an update of the affinis core that retains sparsity through the entire graph reconstruction pipeline. As part of that, I will finally migrate away from the scipy.sparse.csr_array object and rely more exclusively on sparse.COO, especially now that sparse.einsum is available and has a number of applications in my case.