Reading parts/fields of adata (h5ad) only #436
Comments
@Hrovatin hi, you can do |
Thanks. This was not clear to me when I read the documentation. |
Re-opening, since I think we can do more with this. Additional cases include:
|
Currently this can be done with:

import h5py
from anndata._io.specs import read_elem

with h5py.File("adata.h5ad") as f:
    cell_types = read_elem(f["obs/celltype"])
    umap = read_elem(f["obsm/X_umap"])

I'm considering adding this to the |
In the next release we will export |
Hi, is this implemented yet? I am trying to read only a few columns of the X layer with read_elem, but I can't find a way to do it. Maybe I am doing it wrong, but this could be very useful for very large datasets. |
If you have the file:

import h5py
import anndata as ad

f = h5py.File("adata.h5ad")

If it's CSC, you can do:

ad.experimental.sparse_dataset(f["X"])[:, col_idx]

If it's dense, you can do:

f["X"][:, col_idx]

If it's CSR, you're basically going to have to read through the whole thing, but dask will handle that for you if you take the

read_sparse_as_dask("adata.h5ad", "X", 10_000)[:, col_idx].compute() |
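To illustrate why the storage format matters in the answer above, here is a rough pure-Python sketch of column reads from the three compressed-sparse arrays (`data`, `indices`, `indptr`). It is not how anndata or h5py implement this; the function names are hypothetical and the column indices are assumed to be distinct. In CSC, `indptr[j]:indptr[j + 1]` marks exactly the entries of column `j`, so only those slices are touched. In CSR the same arrays are organised by row, so every row has to be scanned.

```python
def csc_read_columns(data, indices, indptr, n_rows, col_idx):
    """Return the requested columns as dense lists, reading only their slices."""
    columns = []
    for j in col_idx:
        start, stop = indptr[j], indptr[j + 1]  # entries of column j only
        col = [0] * n_rows
        for row, value in zip(indices[start:stop], data[start:stop]):
            col[row] = value
        columns.append(col)
    return columns


def csr_read_columns(data, indices, indptr, col_idx):
    """Same result from CSR storage, but every row's slice must be scanned."""
    wanted = {j: k for k, j in enumerate(col_idx)}  # column -> output position
    n_rows = len(indptr) - 1
    columns = [[0] * n_rows for _ in col_idx]
    for i in range(n_rows):
        start, stop = indptr[i], indptr[i + 1]
        for col, value in zip(indices[start:stop], data[start:stop]):
            if col in wanted:
                columns[wanted[col]][i] = value
    return columns


# Toy 3x3 matrix:
# [[1, 0, 2],
#  [0, 0, 3],
#  [4, 5, 0]]
csc = dict(data=[1, 4, 5, 2, 3], indices=[0, 2, 2, 0, 1], indptr=[0, 2, 3, 5])
csr = dict(data=[1, 2, 3, 4, 5], indices=[0, 2, 2, 0, 1], indptr=[0, 2, 3, 5])

from_csc = csc_read_columns(n_rows=3, col_idx=[0, 2], **csc)
from_csr = csr_read_columns(col_idx=[0, 2], **csr)
print(from_csc)
print(from_csr)
```

Both calls return the same columns; the difference is how much of the stored arrays has to be read, which is what makes the CSR case expensive on disk.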
That worked, thanks! |
It would be nice if one could read only individual fields (obs, var, etc.) from adata stored in h5ad format. This would enable faster reading when only metadata is required.