Hi all! I wanted to make you aware of scachepy, a caching extension for scanpy and scvelo that @michalk8 and I have developed, and to kick off a discussion about caching in scanpy. As far as I can tell, there are currently two main ways to cache your results in scanpy (please correct me if I'm wrong):
- write the AnnData object
- manually write individual attributes, e.g. adata.X, to file, e.g. with pickle
The idea of scachepy is to cache all fields of an AnnData object that are associated with a certain function call, e.g. sc.pp.pca. It lets you globally define a caching directory that the cached objects are written to and a backend (default is pickle) used to write them. In the case of PCA, this would amount to calling
import scachepy
c = scachepy.Cache(<directory>)
c.pp.pca(adata)
where c.pp.pca wraps around sc.pp.pca but takes additional caching arguments like force. So in short, our aim with scachepy is to....
- ... have a flexible and easy-to-use way to cache variables associated with scanpy/scvelo function calls.
- ... speed up individual steps in a scanpy/scvelo analysis by caching them, without having to save the entire AnnData object.
- ... be able to share Jupyter notebooks with someone else who can run them on a different machine, possibly on a different OS, and still get exactly the same results because the critical computations are cached.
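To make the wrapping idea concrete for the discussion, here is a minimal, standard-library-only sketch of the general pattern. It is not scachepy's implementation: `SimpleCache`, `cached`, `toy_pca`, the file name and the plain dict standing in for an AnnData object are all hypothetical, and treating `force=True` as "recompute and overwrite the cache" is an assumption about what such an argument would do.

```python
import pickle
import tempfile
from pathlib import Path


class SimpleCache:
    """Hypothetical illustration of a caching wrapper (not scachepy's code)."""

    def __init__(self, directory):
        # Global caching directory, created if it does not exist yet.
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)

    def cached(self, func, keys, fname):
        """Wrap func(data, ...) so that only data[keys] is cached in fname."""
        path = self.directory / fname

        def wrapper(data, *args, force=False, **kwargs):
            if path.exists() and not force:
                # Cache hit: restore only the fields this call is responsible for.
                with path.open("rb") as fh:
                    data.update(pickle.load(fh))
                return data
            # Cache miss (or force=True, assumed to mean "recompute"):
            # run the function, then store the fields it produced.
            func(data, *args, **kwargs)
            with path.open("wb") as fh:
                pickle.dump({k: data[k] for k in keys}, fh)
            return data

        return wrapper


calls = []  # records how often the expensive step actually runs


def toy_pca(data):
    """Stand-in for an expensive step; writes its result into the container."""
    calls.append(1)
    data["X_pca"] = [x * 2 for x in data["X"]]


cache = SimpleCache(tempfile.mkdtemp())
pca = cache.cached(toy_pca, keys=["X_pca"], fname="pca.pickle")

first = pca({"X": [1, 2, 3]})               # computed and written to disk
second = pca({"X": [1, 2, 3]})              # loaded from disk, toy_pca not run
third = pca({"X": [1, 2, 3]}, force=True)   # recomputed, cache overwritten
```

Only the fields named in `keys` are written, which is the point of the second aim above: the rest of the object never has to be saved.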
@michalk8 is the main developer and will be able to tell you much more about it. I would appreciate any input, and would love to discuss caching in scanpy/scvelo.