
[Bug]: Writing NWB with experimenter (or any ArrayLike[str]) fails #197

Open · bjhardcastle opened this issue May 23, 2024 · 4 comments

Labels: category: bug (errors in the code or code behavior); priority: low (alternative solution already working and/or relevant to only specific user(s))

bjhardcastle commented May 23, 2024

What happened?

Writing an NWB file with the Zarr backend fails, depending on the format of experimenter. The same data writes to an HDF5 NWB file without issue.

Steps to Reproduce

import pynwb
import datetime
import hdmf_zarr

nwb = pynwb.NWBFile(
    session_id='test',
    session_description='test',
    identifier='12345',
    session_start_time=(
        datetime.datetime.now()
    ),
    experimenter='First Last',    # putting the string in a list or tuple makes no difference
    epoch_tags={'tag1', 'tag2'},
)
print(f'in-memory: {nwb.experimenter = }')
# output: in-memory: nwb.experimenter = ('First Last',)    # str converted to tuple[str, ...]

path = 'test.nwb.zarr'
with hdmf_zarr.NWBZarrIO(path, 'w') as io:
    io.write(nwb)

Traceback

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[5], line 20
     17 path = 'test.nwb.zarr'
     19 with hdmf_zarr.NWBZarrIO(path, 'w') as io:
---> 20     io.write(nwb)
     22 with hdmf_zarr.NWBZarrIO(path, 'r') as io:
     23     print(f'on disk: {io.read().experimenter}')

File c:\Users\ben.hardcastle\github\npc_sessions\.venv\Lib\site-packages\hdmf\utils.py:668, in docval.<locals>.dec.<locals>.func_call(*args, **kwargs)
    666 def func_call(*args, **kwargs):
    667     pargs = _check_args(args, kwargs)
--> 668     return func(args[0], **pargs)

File c:\Users\ben.hardcastle\github\npc_sessions\.venv\Lib\site-packages\hdmf_zarr\backend.py:274, in ZarrIO.write(self, **kwargs)
    264 cache_spec, number_of_jobs, max_threads_per_process, multiprocessing_context = popargs(
    265     "cache_spec", "number_of_jobs", "max_threads_per_process", "multiprocessing_context", kwargs
    266 )
    268 self.__dci_queue = ZarrIODataChunkIteratorQueue(
    269     number_of_jobs=number_of_jobs,
    270     max_threads_per_process=max_threads_per_process,
    271     multiprocessing_context=multiprocessing_context,
    272 )
--> 274 super(ZarrIO, self).write(**kwargs)
    275 if cache_spec:
    276     self.__cache_spec()

File c:\Users\ben.hardcastle\github\npc_sessions\.venv\Lib\site-packages\hdmf\utils.py:668, in docval.<locals>.dec.<locals>.func_call(*args, **kwargs)
    666 def func_call(*args, **kwargs):
    667     pargs = _check_args(args, kwargs)
--> 668     return func(args[0], **pargs)

File c:\Users\ben.hardcastle\github\npc_sessions\.venv\Lib\site-packages\hdmf\backends\io.py:99, in HDMFIO.write(self, **kwargs)
     97 """Write a container to the IO source."""
     98 f_builder = self.__manager.build(container, source=self.__source, root=True)
---> 99 self.write_builder(f_builder, **kwargs)

File c:\Users\ben.hardcastle\github\npc_sessions\.venv\Lib\site-packages\hdmf\utils.py:668, in docval.<locals>.dec.<locals>.func_call(*args, **kwargs)
    666 def func_call(*args, **kwargs):
    667     pargs = _check_args(args, kwargs)
--> 668     return func(args[0], **pargs)

File c:\Users\ben.hardcastle\github\npc_sessions\.venv\Lib\site-packages\hdmf_zarr\backend.py:437, in ZarrIO.write_builder(self, **kwargs)
    433 f_builder, link_data, exhaust_dci, export_source, consolidate_metadata = getargs(
    434     'builder', 'link_data', 'exhaust_dci', 'export_source', 'consolidate_metadata', kwargs
    435 )
    436 for name, gbldr in f_builder.groups.items():
--> 437     self.write_group(
    438         parent=self.__file,
    439         builder=gbldr,
    440         link_data=link_data,
    441         exhaust_dci=exhaust_dci,
    442         export_source=export_source,
    443     )
    444 for name, dbldr in f_builder.datasets.items():
    445     self.write_dataset(
    446         parent=self.__file,
    447         builder=dbldr,
   (...)
    450         export_source=export_source,
    451     )

File c:\Users\ben.hardcastle\github\npc_sessions\.venv\Lib\site-packages\hdmf\utils.py:668, in docval.<locals>.dec.<locals>.func_call(*args, **kwargs)
    666 def func_call(*args, **kwargs):
    667     pargs = _check_args(args, kwargs)
--> 668     return func(args[0], **pargs)

File c:\Users\ben.hardcastle\github\npc_sessions\.venv\Lib\site-packages\hdmf_zarr\backend.py:534, in ZarrIO.write_group(self, **kwargs)
    532 if datasets:
    533     for dset_name, sub_builder in datasets.items():
--> 534         self.write_dataset(
    535             parent=group,
    536             builder=sub_builder,
    537             link_data=link_data,
    538             exhaust_dci=exhaust_dci,
    539             export_source=export_source,
    540         )
    542 # write all links (haven implemented)
    543 links = builder.links

File c:\Users\ben.hardcastle\github\npc_sessions\.venv\Lib\site-packages\hdmf\utils.py:668, in docval.<locals>.dec.<locals>.func_call(*args, **kwargs)
    666 def func_call(*args, **kwargs):
    667     pargs = _check_args(args, kwargs)
--> 668     return func(args[0], **pargs)

File c:\Users\ben.hardcastle\github\npc_sessions\.venv\Lib\site-packages\hdmf_zarr\backend.py:1111, in ZarrIO.write_dataset(self, **kwargs)
   1109     self.__dci_queue.append(dataset=dset, data=data)
   1110 elif hasattr(data, '__len__'):
-> 1111     dset = self.__list_fill__(parent, name, data, options)
   1112 else:
   1113     dset = self.__scalar_fill__(parent, name, data, options)

File c:\Users\ben.hardcastle\github\npc_sessions\.venv\Lib\site-packages\hdmf_zarr\backend.py:1290, in ZarrIO.__list_fill__(self, parent, name, data, options)
   1287 # standard write
   1288 else:
   1289     try:
-> 1290         dset[:] = data  # If data is an h5py.Dataset then this will copy the data
   1291     # For compound data types containing strings Zarr sometimes does not like writing multiple values
   1292     # try to write them one-at-a-time instead then
   1293     except ValueError:

File c:\Users\ben.hardcastle\github\npc_sessions\.venv\Lib\site-packages\zarr\core.py:1452, in Array.__setitem__(self, selection, value)
   1450     self.set_orthogonal_selection(pure_selection, value, fields=fields)
   1451 else:
-> 1452     self.set_basic_selection(pure_selection, value, fields=fields)

File c:\Users\ben.hardcastle\github\npc_sessions\.venv\Lib\site-packages\zarr\core.py:1548, in Array.set_basic_selection(self, selection, value, fields)
   1546     return self._set_basic_selection_zd(selection, value, fields=fields)
   1547 else:
-> 1548     return self._set_basic_selection_nd(selection, value, fields=fields)

File c:\Users\ben.hardcastle\github\npc_sessions\.venv\Lib\site-packages\zarr\core.py:1938, in Array._set_basic_selection_nd(self, selection, value, fields)
   1932 def _set_basic_selection_nd(self, selection, value, fields=None):
   1933     # implementation of __setitem__ for array with at least one dimension
   1934 
   1935     # setup indexer
   1936     indexer = BasicIndexer(selection, self)
-> 1938     self._set_selection(indexer, value, fields=fields)

File c:\Users\ben.hardcastle\github\npc_sessions\.venv\Lib\site-packages\zarr\core.py:1991, in Array._set_selection(self, indexer, value, fields)
   1988                 chunk_value = chunk_value[item]
   1990         # put data
-> 1991         self._chunk_setitem(chunk_coords, chunk_selection, chunk_value, fields=fields)
   1992 else:
   1993     lchunk_coords, lchunk_selection, lout_selection = zip(*indexer)

File c:\Users\ben.hardcastle\github\npc_sessions\.venv\Lib\site-packages\zarr\core.py:2259, in Array._chunk_setitem(self, chunk_coords, chunk_selection, value, fields)
   2256     lock = self._synchronizer[ckey]
   2258 with lock:
-> 2259     self._chunk_setitem_nosync(chunk_coords, chunk_selection, value, fields=fields)

File c:\Users\ben.hardcastle\github\npc_sessions\.venv\Lib\site-packages\zarr\core.py:2269, in Array._chunk_setitem_nosync(self, chunk_coords, chunk_selection, value, fields)
   2267     self._chunk_delitem(ckey)
   2268 else:
-> 2269     self.chunk_store[ckey] = self._encode_chunk(cdata)

File c:\Users\ben.hardcastle\github\npc_sessions\.venv\Lib\site-packages\zarr\core.py:2385, in Array._encode_chunk(self, chunk)
   2383 if self._filters:
   2384     for f in self._filters:
-> 2385         chunk = f.encode(chunk)
   2387 # check object encoding
   2388 if ensure_ndarray_like(chunk).dtype == object:

File numcodecs\\vlen.pyx:103, in numcodecs.vlen.VLenUTF8.encode()

TypeError: expected unicode string, found ['First Last']

Operating System: Windows

Python Executable: Python

Python Version: 3.11

Package Versions: environment_for_issue.txt (attached)


bjhardcastle commented May 23, 2024

The problem seems to stem from updating to zarr==2.18.1.

Rolling back to zarr==2.17.2 fixes this.
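An environment pin reflecting that rollback might look like the following (a sketch, assuming pip; adjust once a patched hdmf-zarr release is out):

```shell
# Pin zarr to the last release observed to write ArrayLike[str]
# fields successfully with this hdmf-zarr setup.
pip install "zarr==2.17.2"
```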

Edit: @mavaylon1 looks like you also encountered this yesterday!
Your issue is now closed - what was the resolution?

bjhardcastle added a commit to AllenInstitute/npc_sessions that referenced this issue May 24, 2024
- 2.18.1 introduces bug when writing list of str in `NWBFile.experimenter`
- detailed here: hdmf-dev/hdmf-zarr#197
mavaylon1 (Contributor) commented:
@bjhardcastle I'm not sure whether your issue is with hdmf-zarr or with something external. As for the issue ticket that I closed: it seems zarr 2.18.1 is stricter about how you set data. We used to be able to assign a list directly to a zarr array, but now that list needs to be wrapped in an np.array.
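A minimal sketch of the wrapping described above, assuming only NumPy (variable names are illustrative, not from hdmf-zarr). The idea is that zarr's variable-length string codec can encode an object-dtype ndarray whose elements are plain Python str, whereas a nested or bare list can reach the codec as a non-string element:

```python
import numpy as np

# A bare Python list of strings, as hdmf might hand to zarr.
experimenters = ['First Last']

# Wrapping it in an object-dtype ndarray is the kind of conversion
# described in the comment above: each element stays a plain Python
# str, which a variable-length UTF-8 codec can encode.
wrapped = np.asarray(experimenters, dtype=object)

print(wrapped.shape)     # (1,)
print(type(wrapped[0]))  # <class 'str'>
```

Whether hdmf-zarr should do this conversion internally, or zarr should keep accepting bare lists, is exactly what this issue is about.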

mavaylon1 (Contributor) commented:
In your example, I don't think that is the case. I will take a look next week. Could you give me the output of pip list in the environment you are using?

bjhardcastle (Author) commented:
Hi @mavaylon1, there's a txt file attached above with the output from pip freeze.

The input to experimenter in my example can also be a list of str, and the result is the same. I'm pretty sure this is the same behavior you reported in your issue, and hdmf-zarr just isn't compatible with zarr 2.18.1.

@bjhardcastle bjhardcastle changed the title [Bug]: Writing NWB with experimenter fails [Bug]: Writing NWB with experimenter (or any ArrayLike[str]) fails May 29, 2024
@mavaylon1 mavaylon1 added category: bug errors in the code or code behavior priority: low alternative solution already working and/or relevant to only specific user(s) labels Jun 6, 2024