Reading parts/fields of adata (h5ad) only #436

Hrovatin · 2020-10-02T07:31:43Z

It would be nice if one could read only individual fields (obs, var, etc.) from adata stored in h5ad format. This would enable faster reading when only metadata is required.

Koncopd · 2020-10-02T10:29:47Z

@Hrovatin hi, you can do
read('file.h5ad', backed='r')
This will have metadata in memory and .X as a backed dataset on the disk.

Hrovatin · 2020-10-02T11:49:38Z

Thanks. This was not clear to me when I read the documentation.

ivirshup · 2020-11-23T07:32:53Z

Re-opening, since I think we can do more with this. Additional cases include:

Reading a single array from obsm
Reading a single column from obs
Reading all entries, but only for a subset of observations

ivirshup · 2022-01-11T14:24:20Z

Currently this can be done with read_elem and write_elem from anndata._io.specs, if the user passes the underlying store. E.g.:

import h5py
from anndata._io.specs import read_elem

with h5py.File("adata.h5ad") as f:
    cell_types = read_elem(f["obs/celltype"])
    umap = read_elem(f["obsm/X_umap"])

I'm considering adding this to the experimental module for the next release.

ivirshup · 2022-01-18T16:28:19Z

In the next release we will export read_elem and write_elem from anndata.experimental.

lamasJose · 2024-04-11T10:29:08Z

> Re-opening, since I think we can do more with this. Additional cases include:
>
> * Reading a single array from `obsm`
> * Reading a single column from `obs`
> * Reading all entries, but only for a subset of observations

Hi, is this implemented yet? I am trying to read only a few columns of the X layer with read_elem, but I can't find a way to do it. Maybe I am doing it wrong, but this could be very useful for very large datasets.

ivirshup · 2024-04-11T11:15:32Z

If you have the file

f = h5py.File("adata.h5ad")

If it's CSC, you can do:

ad.experimental.sparse_dataset(f["X"])[:, col_idx]

If it's dense you can do:

f["X"][:, col_idx]

If it's CSR, you're basically going to have to read through the whole thing, but dask will handle that for you if you take the read_sparse_as_dask function from this tutorial (https://scanpy.readthedocs.io/en/stable/tutorials/experimental/dask.html) and then do:

read_sparse_as_dask("adata.h5ad", "X", 10_000)[:, col_idx].compute()

lamasJose · 2024-04-11T13:03:30Z

That worked, thanks!

Hrovatin added the enhancement label Oct 2, 2020

Hrovatin changed the title ~~Request: Reading parts/fields of adata (h5ad) only~~ Reading parts/fields of adata (h5ad) only Oct 2, 2020

Hrovatin closed this as completed Oct 2, 2020

ivirshup reopened this Nov 23, 2020

ivirshup added this to the 0.8 milestone Nov 23, 2020

ivirshup self-assigned this Jan 13, 2022

ivirshup closed this as completed Jan 18, 2022

mtvector mentioned this issue Jun 7, 2024

Reading Anndata from only parts of h5ad file: Hack solution #1517

Open

gtca mentioned this issue Jul 2, 2024

Selective read/write mudata modalities scverse/mudata#63

Open
