[Feature]: Add support for overriding backend configuration in HDF5 datasets #1170
Comments
To copy datasets on export, HDMF uses the dataset-copying logic in src/hdmf/backends/hdf5/h5tools.py (lines 1179 to 1191 at 49a60df), which does not support changing chunking, compression, etc. A possible option may be to modify
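The limitation described above can be seen directly with h5py: when a dataset is copied object-to-object, its creation properties travel with it, and the copy call offers no way to supply new ones. This is a small standalone sketch (the file path and dataset names are made up for illustration):

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file just for this demo
path = os.path.join(tempfile.mkdtemp(), "demo.h5")

with h5py.File(path, "w") as f:
    # Create a dataset with an explicit chunk shape and gzip compression
    f.create_dataset("data", data=np.arange(100), chunks=(10,), compression="gzip")

    # h5py's copy has no arguments for changing chunking or compression;
    # the copied dataset inherits the source's storage layout as-is.
    f.copy("data", f, name="data_copy")

    print(f["data_copy"].chunks)       # same chunk shape as the source
    print(f["data_copy"].compression)  # same compression as the source
```

To write the data with different storage settings, it has to be read into memory (or streamed) and written to a freshly created dataset, which is essentially what the workaround below does.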
@pauladkisson would you want to take a stab at making a PR for this?
@oruebel, thanks for the detailed explanation! I figured the
Yeah, I can give it a go.
What would you like to see added to HDMF?
I am working on a new helper feature for neuroconv, in which users can repack an NWB file with new backend configurations (catalystneuro/neuroconv#1003). However, when I try to export the NWB file with the new backend configurations, I get a user warning and the new backend configuration is ignored:
```
/opt/anaconda3/envs/neuroconv_tdtfp_env/lib/python3.12/site-packages/hdmf/utils.py:668: UserWarning: chunks in H5DataIO will be ignored with H5DataIO.data being an HDF5 dataset
```
What solution would you like?
I was able to solve this problem by simply converting the HDF5 dataset to a numpy array like so:
```python
# hdmf.container.Container
def set_data_io(self, dataset_name: str, data_io_class: Type[DataIO],
                data_io_kwargs: dict = None, **kwargs):
    """
    Apply DataIO object to a dataset field of the Container.

    Parameters
    ----------
    dataset_name: str
        Name of dataset to wrap in DataIO
    data_io_class: Type[DataIO]
        Class to use for DataIO, e.g. H5DataIO or ZarrDataIO
    data_io_kwargs: dict
        keyword arguments passed to the constructor of the DataIO class.
    **kwargs:
        DEPRECATED. Use data_io_kwargs instead.
        kwargs are passed to the constructor of the DataIO class.
    """
    if kwargs or (data_io_kwargs is None):
        warn(
            "Use of **kwargs in Container.set_data_io() is deprecated. Please pass the DataIO kwargs as a "
            "dictionary to the `data_io_kwargs` parameter instead.",
            DeprecationWarning, stacklevel=2
        )
        data_io_kwargs = kwargs
    data = self.fields.get(dataset_name)
    if data is None:
        raise ValueError(f"{dataset_name} is None and cannot be wrapped in a DataIO class")
+   data = np.array(data)  # added: load the HDF5 dataset into memory so new DataIO settings are honored
    self.fields[dataset_name] = data_io_class(data=data, **data_io_kwargs)
```
I would appreciate some kind of alternative set_data_io() function that supports overwriting HDF5 data sets in this manner (or something similar).
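To make the shape of such a variant concrete, here is a runnable sketch. It uses minimal stand-ins for hdmf's `Container` and `DataIO` (the real classes are more involved), so the names and signatures below are illustrative assumptions, not hdmf's actual API:

```python
from typing import Type

import numpy as np


class DataIO:
    """Minimal stand-in for hdmf's DataIO wrapper (assumption: the real class differs)."""
    def __init__(self, data=None, **kwargs):
        self.data = data
        self.io_settings = kwargs  # e.g. chunks, compression


class Container:
    """Minimal stand-in for hdmf.container.Container."""
    def __init__(self):
        self.fields = {}

    def set_data_io(self, dataset_name: str, data_io_class: Type[DataIO],
                    data_io_kwargs: dict = None):
        """Wrap a dataset field in DataIO, first materializing it in memory
        so that new chunking/compression settings are not ignored."""
        data = self.fields.get(dataset_name)
        if data is None:
            raise ValueError(f"{dataset_name} is None and cannot be wrapped in a DataIO class")
        # Detach the data from any on-disk HDF5 backing before re-wrapping
        data = np.asarray(data)
        self.fields[dataset_name] = data_io_class(data=data, **(data_io_kwargs or {}))


container = Container()
container.fields["timestamps"] = [0.0, 0.1, 0.2]  # stands in for a lazily loaded dataset
container.set_data_io("timestamps", DataIO, data_io_kwargs={"chunks": (2,)})
print(type(container.fields["timestamps"].data))  # now an in-memory numpy array
```

The key design choice is simply to force the lazy dataset into memory before wrapping; for very large datasets a production version would likely need to stream chunks instead of calling `np.asarray` on the whole thing.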
Do you have any interest in helping implement the feature?
Yes.