Caching in Scanpy #947

@Marius1311

Hi all! I wanted to make you aware of scachepy, a caching extension for scanpy and scvelo that @michalk8 and I have developed, and to kick off a discussion about caching in scanpy. As far as I can tell, there are currently two main ways to cache your results in scanpy (please correct me if I'm wrong):

  • write the AnnData object
  • manually write individual attributes, e.g. adata.X, to a file, e.g. with pickle
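
For reference, the second option might look like the following minimal sketch. It uses only the standard library; the nested list is a stand-in for an attribute such as adata.X, and the helper names save_attribute and load_attribute are hypothetical, not part of scanpy.

```python
import pickle
import tempfile
from pathlib import Path


def save_attribute(value, path):
    """Pickle a single attribute (e.g. adata.X) to `path`."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as fh:
        pickle.dump(value, fh)


def load_attribute(path):
    """Load a previously pickled attribute from `path`."""
    with Path(path).open("rb") as fh:
        return pickle.load(fh)


# Stand-in for adata.X; a real analysis would pass the actual matrix.
X = [[1.0, 0.0], [0.5, 2.0]]

# Assumption: a temporary directory serves as the cache location here.
cache_dir = Path(tempfile.mkdtemp())
save_attribute(X, cache_dir / "X.pickle")
restored = load_attribute(cache_dir / "X.pickle")
```

Doing this by hand means keeping track of one file per attribute and per analysis step, which is the bookkeeping a caching layer is meant to take over.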

The idea of scachepy is to offer the possibility to cache all fields of an AnnData object associated with a certain function call, e.g. sc.pp.pca. It allows you to globally define a caching directory and a backend (default is pickle) that the cached objects will be written to. In the case of PCA, this would amount to calling

import scachepy
c = scachepy.Cache(<directory>) 
c.pp.pca(adata)

where c.pp.pca wraps sc.pp.pca but takes additional caching arguments such as force. In short, our aim with scachepy is to:

  • have a flexible and easy-to-use way to cache variables associated with scanpy/scvelo function calls;
  • speed up individual steps in a scanpy/scvelo analysis by caching them, without having to save the entire AnnData object;
  • be able to share Jupyter notebooks with someone else who can run them on a different machine, possibly on a different OS, and still get exactly the same results because the critical computations are cached.
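
To make the idea of a caching wrapper with a force argument concrete, here is a rough sketch using only the standard library. It is not scachepy's implementation: the class SimpleCache, its cached method, and expensive_step are hypothetical names, and the sketch caches a function's return value rather than fields of an AnnData object.

```python
import pickle
import tempfile
from pathlib import Path


class SimpleCache:
    """Hypothetical stand-in for a cache object; not scachepy's API."""

    def __init__(self, directory):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)

    def cached(self, func, fname):
        """Wrap `func` so its return value is pickled to `fname`."""
        path = self.directory / fname

        def wrapper(*args, force=False, **kwargs):
            # Reuse the stored result unless a recomputation is forced.
            if path.exists() and not force:
                with path.open("rb") as fh:
                    return pickle.load(fh)
            result = func(*args, **kwargs)
            with path.open("wb") as fh:
                pickle.dump(result, fh)
            return result

        return wrapper


calls = []  # records how often the expensive step actually runs


def expensive_step(values):
    """Placeholder for a costly computation such as a PCA."""
    calls.append(1)
    return [v * 2 for v in values]


cache = SimpleCache(tempfile.mkdtemp())
step = cache.cached(expensive_step, "step.pickle")

first = step([1, 2, 3])              # computed and written to disk
second = step([1, 2, 3])             # loaded from disk, no recomputation
third = step([1, 2, 3], force=True)  # recomputed and overwritten
```

Note that this sketch keys the cache on the file name only, so calling the wrapper with different arguments would still return the stored result unless force=True is passed. A real tool would have to decide how to handle that case.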

@michalk8 is the main developer and will be able to tell you much more about it. I would appreciate any input, and would love to discuss caching in scanpy/scvelo.
