Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial Zarr support #627

Merged
merged 74 commits into from
Mar 16, 2022
Merged

Initial Zarr support #627

merged 74 commits into from
Mar 16, 2022

Conversation

normanrz
Copy link
Member

@normanrz normanrz commented Mar 3, 2022

Description:

  • Adds the StorageArray abstraction for WKW and Zarr
  • Changes block_len to chunk_size and file_len to chunks_per_shard, but with backwards-compatible support
  • Uses pathlib in more locations instead of string paths. This can be seen as a preparation for using upath later on.
  • Adds tests for Zarr datasets
  • Renames wkwResolutions to mags in datasource-properties.json for Zarr datasets. cc @fm3

Issues:

Todos:

Make sure to delete unnecessary points or to check all before merging:

  • Updated Changelog

@normanrz
Copy link
Member Author

I reviewed everything except for the (huge) test file. Wouldn't it make sense to generalize the tests so that most of them simply run twice (once for wkw and once for zarr). I think, most of the tests don't deal with the implementation specifics. If some do, they could stay separate. What do you think?

I reworked test_dataset.py and dropped test_zarr_dataset.py, again.

@normanrz
Copy link
Member Author

I think the remaining points from your review are:

  • del mag1._array in test_dataset.py
  • data_format or array_format consistency
  • resolutions, magnifications/mags, multiscales

Copy link
Member

@philippotto philippotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice 👍 I think, more of the tests could be generalized to work on wkw and zarr, but I understand that it can be quite a hassle to convert this and the returns are probably diminishing. So, I don't have a hard opinion on that.

webknossos/tests/test_dataset.py Outdated Show resolved Hide resolved
webknossos/tests/test_dataset.py Outdated Show resolved Hide resolved
webknossos/tests/test_dataset.py Outdated Show resolved Hide resolved
webknossos/tests/test_dataset.py Outdated Show resolved Hide resolved
webknossos/tests/test_dataset.py Outdated Show resolved Hide resolved
webknossos/tests/test_dataset.py Outdated Show resolved Hide resolved
@philippotto
Copy link
Member

  • del mag1._array in test_dataset.py

I replied to the corresponding comment.

  • data_format or array_format consistency

Is this something you need an opinion on? I don't really have a tendency here 🤷

  • resolutions, magnifications/mags, multiscales

I'd vote for mags, as it's mainly used in newer code, concise and a bit clearer than the others (e.g., dataset resolution could also mean the physical size of a voxel (which we usually but not always refer to as scale)). I see, that it's not necessarily the most common term in the industry, but there will always be naming differences between different organizations/projects (see output ~ artifacts ~ debris). Being consistent within our own ecosystem is what matters most in my opinion.

Copy link
Contributor

@jstriebel jstriebel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work 🎉
I added a couple of comments, but overall this looks great! On the open questions:

  • data_format or array_format consistency

Is this something you need an opinion on? I don't really have a tendency here 🤷

Also no strong opinion, but tending a bit towards array_format, since array is more specific than data. (Also fit's the …Array class renaming suggestion in the comments below)

  • resolutions, magnifications/mags, multiscales

I'd vote for mags, as it's mainly used in newer code, concise and a bit clearer than the others (e.g., dataset resolution could also mean the physical size of a voxel (which we usually but not always refer to as scale)). I see, that it's not necessarily the most common term in the industry, but there will always be naming differences between different organizations/projects (see output ~ artifacts ~ debris). Being consistent within our own ecosystem is what matters most in my opinion.

Agreeing with @philippotto, but also no strong opinion here.

webknossos/script_collection/globalize_floodfill.py Outdated Show resolved Hide resolved
webknossos/webknossos/annotation/annotation.py Outdated Show resolved Hide resolved
webknossos/webknossos/dataset/storage.py Outdated Show resolved Hide resolved
webknossos/webknossos/dataset/storage.py Outdated Show resolved Hide resolved
webknossos/webknossos/geometry/vec3_int.py Show resolved Hide resolved
Copy link
Contributor

@jstriebel jstriebel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO this is good to go 🎉, just the respective Changelog entries are still needed. Leaving the final go to @philippotto 🏁

Copy link
Member

@philippotto philippotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent 💯 Good to go from my side, too.

@bulldozer-boy bulldozer-boy bot merged commit 04938de into master Mar 16, 2022
@bulldozer-boy bulldozer-boy bot deleted the more-pathlib branch March 16, 2022 20:42
@normanrz normanrz changed the title ZarrStorageArray and refactorings Initial Zarr support Mar 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add Zarr backend for Dataset
4 participants