open_datatree performance improvement on NetCDF and Zarr files #9014

Open
wants to merge 16 commits into base: main


@aladinor aladinor commented May 7, 2024

open_datatree performance improvement on NetCDF files

welcome bot commented May 7, 2024

Thank you for opening this pull request! It may take us a few days to respond here, so thank you for being patient.
If you have questions, some answers may be found in our contributing guidelines.

@TomNicholas TomNicholas added the topic-DataTree label (Related to the implementation of a DataTree class) May 8, 2024
@TomNicholas TomNicholas added this to In progress in DataTree integration via automation May 8, 2024
@Illviljan Illviljan added the run-benchmark label (Run the ASV benchmark workflow) May 10, 2024
@aladinor aladinor changed the title from "open_datatree performance improvement on NetCDF files" to "open_datatree performance improvement on NetCDF and Zarr files" May 10, 2024
@@ -416,6 +415,104 @@ class ZarrStore(AbstractWritableDataStore):
"_close_store_on_close",
)

@classmethod
def open_store(
Member

Could you rewrite open_group to call open_store() internally? That would reduce the amount of duplicated code and make this easier to maintain going forward.
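One way to read this suggestion, as a minimal sketch only: `ToyStore` and the dict-based hierarchy below are hypothetical stand-ins and do not reflect the actual `ZarrStore` signatures in xarray. The sketch just illustrates the delegation pattern, where the single-group constructor becomes a thin wrapper around the multi-group one so that the setup logic lives in one place.

```python
from __future__ import annotations


class ToyStore:
    """Hypothetical stand-in for a backend store; not xarray's ZarrStore."""

    def __init__(self, group_path: str, members: dict) -> None:
        self.group_path = group_path
        self.members = members

    @classmethod
    def open_store(cls, hierarchy: dict[str, dict]) -> dict[str, ToyStore]:
        # All shared setup (here only path normalization) happens once,
        # and one store is built per group in the hierarchy.
        stores = {}
        for path, members in hierarchy.items():
            key = "/" + path.strip("/")
            stores[key] = cls(key, members)
        return stores

    @classmethod
    def open_group(cls, hierarchy: dict[str, dict], group: str = "/") -> ToyStore:
        # Thin wrapper: reuse open_store() and pick a single group,
        # rather than repeating the setup logic.
        stores = cls.open_store(hierarchy)
        key = "/" + group.strip("/")
        if key not in stores:
            raise KeyError(f"group {group!r} not found")
        return stores[key]


# Toy hierarchy standing in for the groups of a file on disk.
hierarchy = {"/": {"title": "root"}, "/sweep_0": {"DBZH": [1, 2, 3]}}
store = ToyStore.open_group(hierarchy, group="sweep_0")
print(store.group_path)
```

In this toy version `open_group` builds every store before selecting one. A real backend would presumably want to avoid that cost, for example by letting `open_store` accept an optional group filter; that detail is an assumption and is not part of the review comment.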

Contributor

@flamingbear flamingbear left a comment

I had some thoughts about the legacyhdf5 API and how it might be incorporated.

@@ -16,7 +16,6 @@
BackendEntrypoint,
WritableCFDataStore,
_normalize_path,
_open_datatree_netcdf,
Contributor

Before this PR, _open_datatree_netcdf was used by both the netCDF4_.py and h5netcdf_.py backends. Would it be possible to move these changes back into backends/common and completely rewrite the _open_datatree_netcdf function?

Contributor

I think if you move these changes back to backends/common and leave them in _open_datatree_netcdf, you might be able to import the store for both the legacyhdf5 and the netcdf4 libraries, and then call _open_datatree_netcdf for both legacyhdf5 and netcdf4.

Currently _open_datatree_netcdf takes ncDataset: ncDataset | ncDatasetLegacyH5.

You might be able to add a new parameter, cdfDataStore: NetCDF4DataStore | H5NetCDFStore, and pass the appropriate one from both the h5netcdf_.py and netCDF4_.py backends.
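The idea of passing the store class into one shared helper could look roughly like the sketch below. Everything here is a hypothetical stand-in: the toy store classes only mimic the roles of NetCDF4DataStore and H5NetCDFStore, `open_datatree_shared` plays the role of _open_datatree_netcdf, and none of the signatures are taken from xarray.

```python
from __future__ import annotations


class _ToyStoreBase:
    """Hypothetical common shape of a backend store; not an xarray class."""

    backend = "base"

    def __init__(self, filename: str, group: str) -> None:
        self.filename = filename
        self.group = group

    @classmethod
    def open(cls, filename: str, group: str = "/") -> _ToyStoreBase:
        return cls(filename, group)


class ToyNetCDF4Store(_ToyStoreBase):
    """Stand-in for NetCDF4DataStore (assumption, for illustration)."""

    backend = "netcdf4"


class ToyH5NetCDFStore(_ToyStoreBase):
    """Stand-in for H5NetCDFStore (assumption, for illustration)."""

    backend = "h5netcdf"


def open_datatree_shared(
    filename: str,
    group_paths: list[str],
    store_cls: type[_ToyStoreBase],
) -> dict[str, _ToyStoreBase]:
    # Shared helper living in one place: the only backend-specific
    # input is the store class handed in by the caller.
    return {path: store_cls.open(filename, group=path) for path in group_paths}


def open_datatree_netcdf4(filename: str, group_paths: list[str]):
    # What the netCDF4_.py backend might do: pass its own store class.
    return open_datatree_shared(filename, group_paths, ToyNetCDF4Store)


def open_datatree_h5netcdf(filename: str, group_paths: list[str]):
    # What the h5netcdf_.py backend might do: same helper, other store.
    return open_datatree_shared(filename, group_paths, ToyH5NetCDFStore)


tree = open_datatree_h5netcdf("example.nc", ["/", "/sweep_0"])
print({path: store.backend for path, store in tree.items()})
```

No file is opened in this sketch; "example.nc" is only a placeholder string. The design point is that each backend module keeps a short entry point while the group-walking logic stays in the shared helper.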

Labels
run-benchmark (Run the ASV benchmark workflow), topic-DataTree (Related to the implementation of a DataTree class)
Projects
Development

Successfully merging this pull request may close these issues.

Improving performance of open_datatree
5 participants