Dask and Zarr not loading obsp and obsm from remote s3 #951
Comments
@will-moore Is the store consolidated? I get a |
If not, I should probably make that more obvious as an essential step in the tutorial. @ivirshup That is actually something I've been meaning to mention - we should be able to resolve as @will-moore does the keys of |
Ah, right - that's coming back to the discussion at kevinyamauchi/ome-ngff-tables-prototype#12 where we preferred consolidated metadata not to be part of the NGFF spec. cc @joshmoore |
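For readers unfamiliar with the term, here is a rough sketch of what consolidated metadata buys you in the Zarr v2 layout: every `.zgroup`, `.zarray` and `.zattrs` document is copied into a single `.zmetadata` object, so a reader can work out a group's children from one fetch instead of needing the store to list its keys. This is a stdlib-only illustration, not zarr's own implementation. The store contents and the helper names `consolidate` and `list_children` are made up for the example.

```python
import json
from typing import Dict, List

META_NAMES = (".zgroup", ".zarray", ".zattrs")


def consolidate(store: Dict[str, bytes]) -> bytes:
    """Copy every Zarr v2 metadata document into one .zmetadata payload."""
    metadata = {
        key: json.loads(value)
        for key, value in store.items()
        if key.rsplit("/", 1)[-1] in META_NAMES
    }
    doc = {"zarr_consolidated_format": 1, "metadata": metadata}
    return json.dumps(doc).encode()


def list_children(zmetadata: bytes, group: str) -> List[str]:
    """List the direct children of `group` using only the consolidated document."""
    metadata = json.loads(zmetadata)["metadata"]
    prefix = group.strip("/") + "/"
    children = set()
    for key in metadata:
        if key.startswith(prefix):
            rest = key[len(prefix):]
            if "/" in rest:  # "X_pca/.zarray" -> child "X_pca"
                children.add(rest.split("/", 1)[0])
    return sorted(children)


# Hypothetical store contents, keyed the way a Zarr v2 store would be.
store = {
    ".zgroup": b'{"zarr_format": 2}',
    "obsm/.zgroup": b'{"zarr_format": 2}',
    "obsm/.zattrs": b'{"encoding-type": "dict"}',
    "obsm/X_pca/.zarray": b'{"zarr_format": 2, "shape": [10, 2]}',
    "obsm/X_pca/0.0": b"\x00",
    "obsm/X_umap/.zarray": b'{"zarr_format": 2, "shape": [10, 2]}',
}
store[".zmetadata"] = consolidate(store)
print(list_children(store[".zmetadata"], "obsm"))
```

With the real library, the equivalent step would be `zarr.consolidate_metadata(store)` on the writer side and `zarr.open_consolidated(store)` on the reader side.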
I mean, y'all are big users and the feature probably makes sense anyway so we probably should implement it. But the penalty is of course a bit higher since you need to open each |
By "implement it" you mean support for "keys"? This is code I'm adding to ome/ome-zarr-py#256 so all NGFF python users will be able to use it. |
I am not sure @will-moore but "implement it" means using the trick you use in the browser to discover columns, whether that's in this example or in general. At the moment, we iterate over the keys of the incoming |
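To make the "keys" trick concrete without touching the network, here is a minimal sketch. It assumes a store that can fetch a known key but cannot list its contents, and a group whose `.zattrs` carries an extra `"keys"` list, as in the issue description. That attribute is the issue author's addition rather than part of the anndata on-disk format. `GetOnlyStore` and `children_from_keys` are hypothetical names used only for this illustration.

```python
import json
from typing import List, Mapping, Optional


class GetOnlyStore:
    """Hypothetical stand-in for a plain HTTP store: get works, listing does not."""

    def __init__(self, objects: Mapping[str, bytes]) -> None:
        self._objects = dict(objects)

    def get(self, key: str) -> Optional[bytes]:
        return self._objects.get(key)


def children_from_keys(store: GetOnlyStore, group: str) -> List[str]:
    """Return the children named in `<group>/.zattrs` under "keys" that really exist."""
    raw = store.get(group + "/.zattrs")
    if raw is None:
        return []
    found = []
    for key in json.loads(raw).get("keys", []):
        path = group + "/" + key
        # One extra request per key: accept the child only if it has
        # array or group metadata at the expected location.
        if (
            store.get(path + "/.zarray") is not None
            or store.get(path + "/.zgroup") is not None
        ):
            found.append(key)
    return found


# Hypothetical contents; "X_missing" is advertised in "keys" but not present.
demo_store = GetOnlyStore(
    {
        "obsm/.zattrs": b'{"encoding-type": "dict", "keys": ["X_pca", "X_umap", "X_missing"]}',
        "obsm/X_pca/.zarray": b'{"zarr_format": 2}',
        "obsm/X_umap/.zarray": b'{"zarr_format": 2}',
    }
)
print(children_from_keys(demo_store, "obsm"))
```

The per-key existence check is where the extra cost mentioned above comes from: each advertised child needs at least one additional request, whereas consolidated metadata answers the same question from a single document.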
I seem to have got the loading of I don't know if I'm getting close, or if this is really a lot more complex and we should resign ourselves to using consolidate_metadata if we want this remote access to work?

from typing import Callable, Union
import json

import dask.array as da
import zarr
from anndata import AnnData
from anndata._io.specs import IOSpec
from anndata.compat import H5Array, H5Group, ZarrArray, ZarrGroup
# ** requires anndata==0.9.0.rc1
from anndata.experimental import read_dispatched, read_elem
from zarr.storage import FSStore
from ome_zarr.io import parse_url

StorageType = Union[H5Array, H5Group, ZarrArray, ZarrGroup]


# From https://anndata.readthedocs.io/en/latest/tutorials/notebooks/%7Bread,write%7D_dispatched.html
def read_remote_anndata(store: FSStore, name: str) -> AnnData:
    table_group = zarr.group(store=store, path=name)

    def callback(
        func: Callable, elem_name: str, elem: StorageType, iospec: IOSpec
    ) -> AnnData:
        print("el", elem_name, iospec.encoding_type, elem)
        if iospec.encoding_type in (
            "dataframe",
            "csr_matrix",
            "csc_matrix",
            "awkward-array",
        ):
            # Preventing recursing inside of these types
            return read_elem(elem)
        elif iospec.encoding_type == "array":
            return da.from_zarr(elem)
        elif iospec.encoding_type == "dict" and "obsm" in elem_name:  # or "obsp" in elem_name:
            # load .zattrs
            attrs = store.get(elem_name + "/.zattrs")
            attrs = json.loads(attrs)
            print("attrs", attrs)
            if "keys" in attrs:
                to_return = {}
                for key in attrs["keys"]:
                    print("URL", elem_name + "/" + key)
                    try:
                        arr = da.from_zarr(store, elem_name + "/" + key)
                        to_return[key] = arr
                    except Exception:
                        # Skip keys that cannot be opened, but report them
                        print("Failed " + elem_name + "/" + key)
                return to_return
        # not handled above, call func()
        rsp = func(elem)
        return rsp

    adata = read_dispatched(table_group, callback=callback)
    return adata


url = "https://minio-dev.openmicroscopy.org/idr/temp_table/test_segment.zarr/tables/"
name = "regions_table"
store = parse_url(url, mode="r").store
anndata_obj = read_remote_anndata(store, name)
print('anndata_obj', anndata_obj)
|
This issue has been automatically marked as stale because it has not had recent activity. |
@ivirshup what do you think about this?
|
@flying-sheep I am doing this in the upcoming PR already #947 |
Thanks all for following up on this 👍 |
Should we mark that PR as |
Ah, good call. I'll add it. |
Hi @will-moore as an alternative to #947 you can also try out https://github.com/scverse/anndata/tree/ig/xarray_compat (or #1247) which is a bit lighter weight, but similar with a |
Thanks for the update @ilan-gold - Unfortunately I'm not looking to include this work in |
Hi,
I'm using @ilan-gold's nice sample code at
https://anndata.readthedocs.io/en/latest/tutorials/notebooks/%7Bread,write%7D_dispatched.html
to read remote anndata, which is working great when I'm serving data locally via http.
But when I'm using minio to serve the data, I'm not getting the obsp, obsm or uns parts of the AnnData object. See code sample below.
A UI view of the data is at https://deploy-preview-20--ome-ngff-validator.netlify.app/?source=https://minio-dev.openmicroscopy.org/idr/temp_table/test_segment.zarr/tables/regions_table/
(from ome/ome-ngff-validator#20) where I'm using the extra "keys" in e.g. https://minio-dev.openmicroscopy.org/idr/temp_table/test_segment.zarr/tables/regions_table/obsm/.zattrs to load those groups:
Is there any way I can use those 'keys' to load obsp and obsm data?
Thanks!