Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Zarr links should be relative to root #44

Closed
alejoe91 opened this issue Nov 30, 2022 · 11 comments · Fixed by #46
Closed

Zarr links should be relative to root #44

alejoe91 opened this issue Nov 30, 2022 · 11 comments · Fixed by #46
Assignees
Labels
category: bug errors in the code or code behavior priority: high impacts proper operation or use of feature important to most users

Comments

@alejoe91
Copy link
Collaborator

Hi guys,

I was able to successfully export to NWB-zarr sorting info + waveforms and electrode table (using neuroconv).

I performed the conversion remotely and then downloaded the resulting files. When I try to read the file locally, I get a bad link error:

ValueError                                Traceback (most recent call last)
Cell In [5], line 4
      1 nwbfile_path = "/home/alessio/Documents/data/debug/ecephys_632269_2022-10-13_15-41-42_zarr.nwb"
      3 io = NWBZarrIO(nwbfile_path, "r")
----> 4 nwbfile = io.read()

File ~/anaconda3/envs/nwb/lib/python3.9/site-packages/hdmf/utils.py:645, in docval.<locals>.dec.<locals>.func_call(*args, **kwargs)
    643 def func_call(*args, **kwargs):
    644     pargs = _check_args(args, kwargs)
--> 645     return func(args[0], **pargs)

File ~/anaconda3/envs/nwb/lib/python3.9/site-packages/hdmf/backends/io.py:38, in HDMFIO.read(self, **kwargs)
     35 @docval(returns='the Container object that was read in', rtype=Container)
     36 def read(self, **kwargs):
     37     """Read a container from the IO source."""
---> 38     f_builder = self.read_builder()
     39     if all(len(v) == 0 for v in f_builder.values()):
     40         # TODO also check that the keys are appropriate. print a better error message
     41         raise UnsupportedOperation('Cannot build data. There are no values.')

File ~/anaconda3/envs/nwb/lib/python3.9/site-packages/hdmf/utils.py:645, in docval.<locals>.dec.<locals>.func_call(*args, **kwargs)
    643 def func_call(*args, **kwargs):
    644     pargs = _check_args(args, kwargs)
--> 645     return func(args[0], **pargs)

File ~/anaconda3/envs/nwb/lib/python3.9/site-packages/hdmf_zarr/backend.py:937, in ZarrIO.read_builder(self)
    935 @docval(returns='a GroupBuilder representing the NWB Dataset', rtype='GroupBuilder')
    936 def read_builder(self):
--> 937     f_builder = self.__read_group(self.__file, ROOT_NAME)
    938     return f_builder

File ~/anaconda3/envs/nwb/lib/python3.9/site-packages/hdmf_zarr/backend.py:973, in ZarrIO.__read_group(self, zarr_obj, name)
    971 # read sub groups
    972 for sub_name, sub_group in zarr_obj.groups():
--> 973     sub_builder = self.__read_group(sub_group, sub_name)
    974     ret.set_group(sub_builder)
    976 # read sub datasets

File ~/anaconda3/envs/nwb/lib/python3.9/site-packages/hdmf_zarr/backend.py:973, in ZarrIO.__read_group(self, zarr_obj, name)
    971 # read sub groups
    972 for sub_name, sub_group in zarr_obj.groups():
--> 973     sub_builder = self.__read_group(sub_group, sub_name)
    974     ret.set_group(sub_builder)
    976 # read sub datasets

File ~/anaconda3/envs/nwb/lib/python3.9/site-packages/hdmf_zarr/backend.py:973, in ZarrIO.__read_group(self, zarr_obj, name)
    971 # read sub groups
    972 for sub_name, sub_group in zarr_obj.groups():
--> 973     sub_builder = self.__read_group(sub_group, sub_name)
    974     ret.set_group(sub_builder)
    976 # read sub datasets

File ~/anaconda3/envs/nwb/lib/python3.9/site-packages/hdmf_zarr/backend.py:982, in ZarrIO.__read_group(self, zarr_obj, name)
    979     ret.set_dataset(sub_builder)
    981 # read the links
--> 982 self.__read_links(zarr_obj=zarr_obj, parent=ret)
    984 self._written_builders.set_written(ret)  # record that the builder has been written
    985 self.__set_built(zarr_obj, ret)

File ~/anaconda3/envs/nwb/lib/python3.9/site-packages/hdmf_zarr/backend.py:1008, in ZarrIO.__read_links(self, zarr_obj, parent)
   1006     l_path = os.path.join(link['source'], link['path'].lstrip("/"))
   1007 if not os.path.exists(l_path):
-> 1008     raise ValueError("Found bad link %s in %s to %s" % (link_name, self.__path, l_path))
   1010 target_name = str(os.path.basename(l_path))
   1011 target_zarr_obj = zarr.open(l_path, mode='r')

ValueError: Found bad link device in /home/alessio/Documents/data/debug/ecephys_632269_2022-10-13_15-41-42_zarr.nwb to results/ecephys_632269_2022-10-13_15-41-42_zarr.nwb/general/devices/Device

After debugging, the l_path is indeed the path on my remote machine. I think saving links relative to the zarr root should fix it.

@oruebel oruebel added category: bug errors in the code or code behavior priority: high impacts proper operation or use of feature important to most users labels Dec 1, 2022
@oruebel
Copy link
Contributor

oruebel commented Dec 1, 2022

I believe ultimately the issue is most likely in how paths are being determined for links in:

def write_link(self, **kwargs):

Links are currently stored as part of the reserved attribute zarr_link on the Group object that contains the link. zarr_link is a JSON document with a list of dict, where each dict describes a link via the following keys:

  • path: Path to the Zarr file containing the linked object
  • source : Source path within the Zarr file to the linked object
  • name : Name of the link

Could you post the contents from the JSON zarr_link attribute, or better yet, possibly send me the link to the file or a small example file so that I can reproduce the error and see what is exactly causing the issue.

I think saving links relative to the zarr root should fix it.

I agree, links should be relative to the Zarr root.

@alejoe91
Copy link
Collaborator Author

alejoe91 commented Dec 1, 2022

Thanks @oruebel

Attached is a miniml toy example with a broken link (Device):
test_we_zarr.nwb.zip

@oruebel
Copy link
Contributor

oruebel commented Dec 2, 2022

@mavaylon1 could you take a look at this?

@dyf
Copy link

dyf commented Dec 7, 2022

Excited to see that you are working on this!

Allen Institute for Neural Dynamics is running a small workshop in January and we want to use NWB-Zarr files.

@mavaylon1
Copy link
Contributor

I'll take a look

@oruebel
Copy link
Contributor

oruebel commented Dec 8, 2022

Thanks @mavaylon1 for taking a look. While the issue refers to links, the same problem likely also exists for object references (which are links that are stored in datasets or attributes). All links are constructed via:

def __get_ref(self, ref_object):

I.e., to fix the write of links, this is probably where we need to fix things. The __get_ref method is then used in:

def write_link(self, **kwargs):

def write_dataset(self, **kwargs): # noqa: C901

def write_attributes(self, **kwargs):

To construct the definition of references for storing them as links or in datasets or attributes. I.e., fixing this for write I think may only require some minor changes in __get_ref.

However, changing the logic in __get_ref will likely also require updates on read since we here need to resolve the relative paths to join them with the path of the zarr file, as otherwise, Python would interpret the paths as relative to whatever the current directory is (i.e., from where python was started). This will likely require changes to the following read functions:

def __read_links(self, zarr_obj, parent):

def __parse_ref(self, shape, obj_refs, reg_refs, data):

def __read_attrs(self, zarr_obj):

__parse_refs I believe contains all the logic for parsing references from datasets, i.e.,

def __read_dataset(self, zarr_obj, name):

will likely not need to change, but I just wanted to mention just in case there are changes needed there as well. I think the changes needed for read should be fairly simple, i.e., once the read for links works, the changes should translate to the other functions fairly directly. Let me know in case you run into any issues.

@oruebel
Copy link
Contributor

oruebel commented Dec 8, 2022

Allen Institute for Neural Dynamics is running a small workshop in January and we want to use NWB-Zarr files.

@dyf that's great. Let us know in case there is anything we can do to help.

@dyf
Copy link

dyf commented Dec 8, 2022

Allen Institute for Neural Dynamics is running a small workshop in January and we want to use NWB-Zarr files.

@dyf that's great. Let us know in case there is anything we can do to help.

Will do thanks @oruebel. If we can get useable files by end of the year that would be awesome. We want to show them off in the Jupyter NWB Widgets, so if you want to do some sanity checks there that would also be appreciated.

@oruebel
Copy link
Contributor

oruebel commented Dec 8, 2022

We want to show them off in the Jupyter NWB Widgets, so if you want to do some sanity checks there that would also be appreciated.

Thanks, that is good to know. I'll try to put together an example with NWBWidgets + Zarr next week. I have not tried this before but I think it should work since NWBWidgets uses NWBFile from PyNWB as an input.

@oruebel
Copy link
Contributor

oruebel commented Dec 12, 2022

@mavaylon1 any update on this

@oruebel
Copy link
Contributor

oruebel commented Dec 14, 2022

This PR illustrates the problem #46

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category: bug errors in the code or code behavior priority: high impacts proper operation or use of feature important to most users
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants