RFC-2 revision 1 #250

Merged (7 commits) on Jul 11, 2024
166 changes to `rfc/2/index.md` (73 additions, 93 deletions)
Adopting Zarr v3 in OME-Zarr is a precondition for using sharding.

Library support for Zarr v3 is already available for several languages:

- [zarr-python (Python)](https://github.com/zarr-developers/zarr-python)
- [tensorstore (C++/Python)](https://github.com/google/tensorstore)
- [zarrita (Python)](https://github.com/scalableminds/zarrita)
- [zarr-java (Java)](https://github.com/zarr-developers/zarr-java)
- [zarrita.js (JS)](https://github.com/manzt/zarrita.js)
- [zarr3-rs (Rust)](https://github.com/clbarnes/zarr3-rs)

Visualization tools with integrated Zarr v3 implementations are also available:

- [neuroglancer](https://github.com/google/neuroglancer)
- [WEBKNOSSOS](https://github.com/scalableminds/webknossos)

Support for other languages is under active development.

Libraries will likely prioritize support for v3 over previous versions in the near future.
OME-Zarr should therefore adopt the new version for future-proofing.
Implementations can read inner chunks individually.
Depending on the choice of codecs and the underlying storage backends, it may be possible to write inner chunks individually.
However, in the general case, writing is limited to entire shards.
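For illustration, sharding in Zarr v3 is expressed as a codec on the array: a `sharding_indexed` entry in the `codecs` list of the array's `zarr.json` wraps the inner chunks. The shapes and inner codecs below are example values, not recommendations:

```json
{
  "name": "sharding_indexed",
  "configuration": {
    "chunk_shape": [32, 32, 32],
    "codecs": [
      { "name": "bytes", "configuration": { "endian": "little" } },
      { "name": "gzip", "configuration": { "level": 5 } }
    ],
    "index_codecs": [
      { "name": "bytes", "configuration": { "endian": "little" } },
      { "name": "crc32c" }
    ],
    "index_location": "end"
  }
}
```

Here the shard (the outer chunk declared by the array's `chunk_grid`) is subdivided into 32×32×32 inner chunks, and the shard index that locates them is stored at the end of each shard file.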

### Other notable changes in Zarr v3

There are a few notable changes that Zarr v3 brings for OME-Zarr:

The Zarr specification does not prescribe the supported stores for Zarr hierarchies.
HTTP(S), file system, S3, GCS, and ZIP files are commonly used stores.

## Proposal

This RFC proposes to adopt version 3 of the Zarr format for OME-Zarr.
Images that use the new version of OME-Zarr metadata MUST NOT use Zarr version 2 any more.

With this proposal all features of the Zarr specification are allowed in OME-Zarr.
In the future, the OME-Zarr community MAY decide to restrict the allowed feature set.

The motivation for making this hard cut is to reduce the burden of complexity for implementations.
Currently, many Zarr library implementations support both versions.
However, in the future they might deprecate support for version 2 or deprioritize it in terms of features and performance.
Additionally, there are OME-Zarr implementations that have their own integrated Zarr stack.
With this hard cut, implementations that only support OME-Zarr versions ≥ 0.5 will not need to implement Zarr version 2 as well.

From an OME-Zarr user perspective, the hard cut also makes things simpler: < 0.5 => Zarr version 2 and ≥ 0.5 => Zarr version 3.
If users wish to upgrade their data from one OME-Zarr version to another, migration tools will be available ([prototype here](https://github.com/scalableminds/zarrita/blob/8155761/zarrita/array_v2.py#L452-L559)).
Migration is a computationally cheap operation, because only JSON files are touched.
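A minimal sketch of such a metadata-only migration for a group is shown below. This is an illustration, not the linked prototype: the helper name is hypothetical, it only handles group-level OME attributes, and chunk data is deliberately left untouched:

```python
import json
from pathlib import Path


def migrate_group_metadata(group_dir, ome_version="0.5"):
    """Rewrite a Zarr v2 group's OME metadata as Zarr v3 (JSON files only).

    Sketch: reads `.zattrs`, drops per-section `version` keys (the version
    moves to the new `ome` namespace), and writes a v3 `zarr.json`.
    Chunk data is not touched.
    """
    group = Path(group_dir)
    attrs = json.loads((group / ".zattrs").read_text())
    # Per-section `version` keys are superseded by the `ome.version` attribute.
    for section in attrs.values():
        if isinstance(section, list):
            for entry in section:
                if isinstance(entry, dict):
                    entry.pop("version", None)
    zarr_json = {
        "zarr_format": 3,
        "node_type": "group",
        "attributes": {"ome": {"version": ome_version, **attrs}},
    }
    (group / "zarr.json").write_text(json.dumps(zarr_json, indent=2))
    return zarr_json
```

Because only small JSON documents are read and rewritten, running this over a large hierarchy costs far less than rewriting chunk data.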

Due to the existence of large quantities of images in OME-Zarr 0.4, it is RECOMMENDED that implementations continue to support OME-Zarr 0.4 with the underlying Zarr v2.

OME-Zarr images MUST be consistent in their OME-Zarr and Zarr version.
With this constraint, implementations only need to detect the version of a provided URL or file path once and can assume that all multiscale levels, wells, series images etc. use the same version.

While OME-Zarr 0.5 (with Zarr v3) and OME-Zarr 0.4 (with Zarr v2) metadata could technically exist side-by-side in a Zarr hierarchy, this is NOT RECOMMENDED.
This may be useful for short periods of time (i.e. during migrations from 0.4 to 0.5), but should not be used longer term.
Multiple metadata versions can lead to conflicts, which may be hard to resolve by implementations.
If implementations encounter 0.4 and 0.5 metadata side-by-side, 0.5 SHOULD be treated preferentially.
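The detection and preference rules above can be sketched with a hypothetical helper that inspects a local directory; real implementations would additionally handle remote stores and OME-Zarr versions other than 0.4 and 0.5:

```python
import json
from pathlib import Path


def detect_versions(node_dir):
    """Return (zarr_format, ome_version) for a group or array directory.

    Sketch: if Zarr v3 (`zarr.json`) and Zarr v2 (`.zgroup`/`.zarray`)
    metadata are found side-by-side, the v3/0.5 metadata is preferred.
    """
    node = Path(node_dir)
    v3_meta = node / "zarr.json"
    if v3_meta.exists():  # 0.5 metadata wins when both versions are present
        meta = json.loads(v3_meta.read_text())
        ome = meta.get("attributes", {}).get("ome", {})
        return meta["zarr_format"], ome.get("version")
    if (node / ".zgroup").exists() or (node / ".zarray").exists():
        attrs_path = node / ".zattrs"
        attrs = json.loads(attrs_path.read_text()) if attrs_path.exists() else {}
        # In 0.4, the version lives inside sections such as `multiscales`.
        multiscales = attrs.get("multiscales") or [{}]
        return 2, multiscales[0].get("version")
    raise ValueError(f"no Zarr metadata found in {node}")
```

With the consistency constraint above, this check needs to run only once per image rather than once per multiscale level or well.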

### Changes to the OME-Zarr metadata

While the adoption of Zarr v3 does not strictly require changes to the OME-Zarr metadata, this proposal contains changes to align with [community conventions](https://zarr.dev/zeps/draft/ZEP0004.html#namespacing) and ease implementation:

- OME-Zarr metadata will be stored under a dedicated `ome` key in the Zarr array or group attributes.
- The version information will be moved from the `multiscale`, `plate`, `well` etc. sections into the new `ome` section.
- The `dimension_names` attribute in the Zarr metadata must match the axes names in the OME-Zarr metadata.
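The last point can be enforced with a small consistency check; the function below is an illustrative sketch that takes the parsed array `zarr.json` as a dict:

```python
def check_dimension_names(zarr_json):
    """Verify that Zarr `dimension_names` matches the OME-Zarr axes names.

    Sketch only: assumes the 0.5-style `ome` namespace in the attributes.
    """
    ome = zarr_json["attributes"]["ome"]
    axis_names = [axis["name"] for axis in ome["multiscales"][0]["axes"]]
    dimension_names = list(zarr_json.get("dimension_names") or [])
    if dimension_names != axis_names:
        raise ValueError(
            f"dimension_names {dimension_names} does not match axes {axis_names}"
        )
```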

Finally, this proposal changes the title of the OME-Zarr specification document to "OME-Zarr specification".
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
"SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be
interpreted as described in [IETF RFC 2119][IETF RFC 2119].

## Stakeholders

Preliminary work of this RFC has been discussed in:

- several Zarr community calls
- several recent OME-NGFF community calls.


## Implementation

OME-Zarr implementations can rely on existing Zarr libraries to implement the adoption of Zarr v3.
See [Background](#background) for a list of v3-capable Zarr libraries.

Support for the OME-Zarr 0.5 metadata is under development in [ome-zarr-py](https://github.com/ome/ome-zarr-py/pull/383/files) and other implementations.

## Drawbacks, risks, alternatives, and unknowns

While it is clear that Zarr v3 will become the predominant version of the specification moving forward, current library support for v3 is still under active development.

An alternative to this proposal would be to [add Zarr v3 support to OME-Zarr 0.4](https://github.com/ome/ngff/pull/249) without changes to the OME-Zarr Metadata.
The contents of the `.zattrs` would simply move to the `attributes` within the `zarr.json`.
There would need to be some transparency for users to know what Zarr versions are supported by an implementation.
Additionally, there would be no opportunity to introduce an `ome` namespace in the attributes, which is useful for composability.

In particular, the chunk sizes can be made small to facilitate interactive visualization.

## Backwards Compatibility

The metadata of Zarr v3 arrays is not backwards compatible with that of Zarr v2 arrays.

Implementations of OME-Zarr MUST specify the version(s) of the OME-Zarr specification that they support.

It is RECOMMENDED that implementations of OME-Zarr that support both v2 and v3-based OME-Zarr versions auto-detect the underlying Zarr version.

While the metadata of Zarr v3 is not backwards compatible, the chunk data is largely backwards compatible, depending only on the compressor configuration.
[There are scripts available](https://github.com/scalableminds/zarrita/blob/8155761/zarrita/array_v2.py#L452-L559) to migrate Zarr v2 metadata to Zarr v3.
This is generally a light-weight operation.
Zarr v3 and v2 metadata can exist side-by-side within a Zarr hierarchy.

## Abandoned Ideas

Previous versions of this proposal contained changes to referencing `labels` in the OME-Zarr metadata.
This has been delayed to future RFCs.

Previous versions have used a versioned namespace, e.g. `https://ngff.openmicroscopy.org/0.5`, in the Zarr attributes instead of a simple `ome` namespace with dedicated `version` attribute.
This has been abandoned because it makes discovery of versions more difficult.
Additionally, handling of multiple versions may be ill-defined.

## Examples

Group metadata (`zarr.json`) of one multi-scale OME-Zarr image `456.zarr`:
"zarr_format": 3,
"node_type": "group",
"attributes": {
"https://ngff.openmicroscopy.org/0.6": {
"ome": {
"version": "0.5",
"multiscales": [
{
"coordinateSystems": [
"axes": [
{
"name": "root",
"axes": [
{
"name": "c",
"type": "channel",
"discrete": true
},
{
"name": "x",
"type": "space",
"unit": "nanometer"
},
{
"name": "y",
"type": "space",
"unit": "nanometer"
},
{
"name": "z",
"type": "space",
"unit": "nanometer"
}
]
"name": "c",
"type": "channel"
},
{
"name": "x",
"type": "space",
"unit": "nanometer"
},
{
"name": "y",
"type": "space",
"unit": "nanometer"
},
{
"name": "z",
"type": "space",
"unit": "nanometer"
}
],

"datasets": [
{
"path": "1",
"coordinateTransformations": [
{
"type": "scale",
"scale": [1.0, 11.24, 11.24, 28.0],
"input": "/1",
"output": "root"
"scale": [1.0, 11.24, 11.24, 28.0]
}
]
},
Expand All @@ -308,9 +296,7 @@ File hierarchy of one multi-scale OME-Zarr image `456.zarr`:
"coordinateTransformations": [
{
"type": "scale",
"scale": [1.0, 22.48, 22.48, 28.0],
"input": "/2-2-1",
"output": "root"
"scale": [1.0, 22.48, 22.48, 28.0]
}
]
},
Expand All @@ -319,9 +305,7 @@ File hierarchy of one multi-scale OME-Zarr image `456.zarr`:
"coordinateTransformations": [
{
"type": "scale",
"scale": [1.0, 44.96, 44.96, 28.0],
"input": "/4-4-1",
"output": "root"
"scale": [1.0, 44.96, 44.96, 28.0]
}
]
},
Expand All @@ -330,9 +314,7 @@ File hierarchy of one multi-scale OME-Zarr image `456.zarr`:
"coordinateTransformations": [
{
"type": "scale",
"scale": [1.0, 89.92, 89.92, 56.0],
"input": "/8-8-2",
"output": "root"
"scale": [1.0, 89.92, 89.92, 56.0]
}
]
},
Expand All @@ -341,9 +323,7 @@ File hierarchy of one multi-scale OME-Zarr image `456.zarr`:
"coordinateTransformations": [
{
"type": "scale",
"scale": [1.0, 179.84, 179.84, 112.0],
"input": "/16-16-4",
"output": "root"
"scale": [1.0, 179.84, 179.84, 112.0]
}
]
}