anndata should get a split-apply-combine framework. This is something we’ve wanted for a while; this issue just establishes a concrete place to track it.
Questions
API
There are a number of split-apply-combine frameworks and APIs out there. Which do we want to emulate, and what are the most important features?
Access to intermediate elements (e.g. for k, sub_adata in adata.groupby(obs="celltype"))
Efficient reductions and combination
Nested groupby, group over multiple dimensions
Lagged/windowed grouping (e.g. groupby neighbors of a cell, moving average over time)
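To make the "access to intermediate elements" feature concrete, here is a minimal sketch of what iterating over groups could look like. It is not an existing anndata API: `iter_groups` is a hypothetical helper, and a plain NumPy array plus a pandas DataFrame stand in for `adata.X` and `adata.obs`.

```python
import numpy as np
import pandas as pd


def iter_groups(X, obs, key):
    """Yield (label, sub-matrix, sub-obs) for each group in obs[key].

    Hypothetical helper: X and obs are stand-ins for adata.X and adata.obs.
    """
    # .indices maps each group label to the integer row positions in that group
    for label, idx in obs.groupby(key, observed=True).indices.items():
        yield label, X[idx], obs.iloc[idx]


# Toy data: 6 cells, 2 genes
X = np.arange(12, dtype=float).reshape(6, 2)
obs = pd.DataFrame(
    {"celltype": ["B", "T", "B", "T", "T", "NK"]},
    index=[f"cell{i}" for i in range(6)],
)

# The "apply" step is whatever the caller does with each subset
group_means = {
    label: sub_X.mean(axis=0) for label, sub_X, _ in iter_groups(X, obs, "celltype")
}
```

Exposing the subsets like this is flexible, but every reduction then runs as a Python-level loop, which is part of why efficient built-in reductions are listed as a separate feature.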
What is the correct default return type of a reduction? Can we be flexible about this?
AnnData would be nice for consistency, but do we want to keep all that metadata?
Also, how would we handle reductions over non-X data? E.g. mean point in PCA space for a cluster
DataFrames would let us keep labels, but should they be used as "labelled arrays"?
xarray.DataArrays make sense for just labelled arrays
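As one illustration of the DataFrame option for non-X data, the sketch below reduces an embedding (a stand-in for something like PCA coordinates stored per cell) to one mean point per cluster and returns it as a labelled DataFrame. The function name `group_mean_frame` and the toy values are assumptions for illustration only, not a proposed API.

```python
import numpy as np
import pandas as pd


def group_mean_frame(values, labels, columns):
    """Mean of each column of `values` per group, returned as a labelled DataFrame.

    Hypothetical helper: `values` stands in for a per-cell embedding such as PCA
    coordinates, and `labels` for a per-cell cluster annotation.
    """
    frame = pd.DataFrame(values, columns=columns)
    # Grouping by an array of labels keeps the group names as the result's index
    return frame.groupby(np.asarray(labels)).mean()


pca = np.array([[0.0, 0.0], [2.0, 2.0], [4.0, 0.0], [6.0, 2.0]])
clusters = ["a", "b", "a", "b"]

centroids = group_mean_frame(pca, clusters, columns=["PC1", "PC2"])
```

The result keeps both the group labels (index) and the component names (columns), which is the "labelled array" usage in question; what it drops is every other piece of metadata an AnnData result would carry.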
Reductions and sparse data
How can we make reductions efficient for sparse matrices? One of the most annoying problems when writing code for scanpy and anndata is getting operations to work on both dense and sparse arrays, since sparse arrays have a different API. If reductions and combinations are included here, then we'll have to be sure sparse arrays work, which could be a lot of extra code. My personal preference for addressing this is to get better support for sparse data with an array API in upstream libraries like pydata/sparse.
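One common way to sidestep some of the dense/sparse API differences for sum and mean reductions is to express the reduction as a matrix product with a sparse group-indicator matrix, since `@` works for both array types. The following is a sketch under that assumption, not a proposed implementation; `sparse_group_mean` is a hypothetical name, and it assumes there are no missing labels.

```python
import numpy as np
import pandas as pd
from scipy import sparse


def sparse_group_mean(X, labels):
    """Per-group column means of X (dense or scipy.sparse) via an indicator matrix.

    Hypothetical helper; assumes `labels` has one non-missing entry per row of X.
    """
    codes, uniques = pd.factorize(np.asarray(labels), sort=True)
    n_groups, n_obs = len(uniques), X.shape[0]

    # indicator[g, i] == 1 if observation i belongs to group g
    indicator = sparse.csr_matrix(
        (np.ones(n_obs), (codes, np.arange(n_obs))), shape=(n_groups, n_obs)
    )

    sums = indicator @ X  # stays sparse if X is sparse, dense otherwise
    counts = np.bincount(codes, minlength=n_groups)

    if sparse.issparse(sums):
        # Row-scale with a diagonal matrix so the result stays sparse
        means = sparse.diags(1.0 / counts) @ sums
    else:
        means = sums / counts[:, None]
    return uniques, means


X_dense = np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 0.0], [0.0, 4.0]])
labels = ["a", "b", "a", "b"]

groups, dense_means = sparse_group_mean(X_dense, labels)
_, sparse_means = sparse_group_mean(sparse.csr_matrix(X_dense), labels)
```

Note that the function still needs an explicit `issparse` branch for the normalization step, which is a small example of the extra code the dense/sparse split tends to require. Reductions that are not linear (median, variance in a single pass, and so on) do not reduce to a single matrix product like this.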
We’ve found this approach to be a useful starting point at Cellarity, hope it might also benefit the wider community. Lots of ways to generalize, happy to discuss! #564
Current and previous attempts/discussion
Ping @jbloom22, who has also done some work on this.
Other implementations