Caching in Scanpy

Hi all! I wanted to make you aware of a caching extension for scanpy and scvelo that @michalk8 and I have developed, called [scachepy](https://github.com/theislab/scachepy), and to kick off a discussion about caching in scanpy. From my point of view, there are currently two main ways to cache your results in scanpy (please correct me if I'm wrong):
- write the whole AnnData object to disk
- manually write individual attributes, e.g. `adata.X`, to a file, e.g. with pickle

The idea of scachepy is to offer the possibility to cache all fields of an AnnData object associated with a certain function call, e.g. `sc.pp.pca`. It allows you to globally define a caching directory and a backend (default is pickle) that the cached objects will be written to. In the case of PCA, this would amount to calling

```python
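import scachepy

# Replace "<directory>" with the path where cached results should be stored
c = scachepy.Cache("<directory>")
c.pp.pca(adata)  # `adata` is an existing AnnData object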
```
where `c.pp.pca` wraps around `sc.pp.pca` but takes additional caching arguments like `force`. In short, our aims with scachepy are to:
- have a flexible and easy-to-use way to cache variables associated with scanpy/scvelo function calls;
- speed up individual steps in a scanpy/scvelo analysis by caching them, without having to save the entire AnnData object;
- be able to share Jupyter notebooks with someone else who can run them on a different machine, possibly on a different OS, and still get exactly the same results because the critical computations are cached.
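To make the general idea concrete, here is a minimal sketch of a function-level cache with a `force` switch. It is not scachepy's implementation: `SimpleCache`, `cached`, and `expensive_sum` are hypothetical names made up for illustration, the cache key is just the function name (arguments are ignored), and the wrapper stores a return value rather than fields of an AnnData object.

```python
import functools
import pickle
import tempfile
from pathlib import Path


class SimpleCache:
    """Hypothetical stand-in for a cache object; not scachepy's implementation."""

    def __init__(self, directory):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)

    def cached(self, func):
        """Wrap `func` so its return value is pickled under the function's name."""

        @functools.wraps(func)
        def wrapper(*args, force=False, **kwargs):
            # Simplification: one file per function, regardless of arguments.
            path = self.directory / f"{func.__name__}.pickle"
            if path.exists() and not force:
                # Cache hit: load the stored result instead of recomputing.
                with path.open("rb") as fh:
                    return pickle.load(fh)
            # Cache miss or force=True: compute and (over)write the cache file.
            result = func(*args, **kwargs)
            with path.open("wb") as fh:
                pickle.dump(result, fh)
            return result

        return wrapper


calls = []  # records each real computation, to show when the cache is used

cache = SimpleCache(tempfile.mkdtemp())


@cache.cached
def expensive_sum(values):
    calls.append(1)
    return sum(values)


first = expensive_sum([1, 2, 3])              # computed and written to disk
second = expensive_sum([1, 2, 3])             # loaded from the cache
third = expensive_sum([1, 2, 3], force=True)  # recomputed, cache overwritten
```

In this sketch the function body runs only twice for three calls; a real tool would also have to decide how arguments and the state of the input data affect cache validity.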

@michalk8 is the main developer and will be able to tell you much more about it. I would appreciate any input, and would love to discuss caching in scanpy/scvelo. 


Caching in Scanpy #947
