Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unit -> units #168

Closed
wants to merge 11 commits into from
Closed

unit -> units #168

wants to merge 11 commits into from

Conversation

d-v-b
Copy link
Contributor

@d-v-b d-v-b commented Mar 1, 2023

Change unit to units. units is the name used by the CF Conventions, and thereby the broader climate science community -- the xarray ecosystem uses the units keyword (see https://docs.xarray.dev/en/stable/examples/multidimensional-coords.html, and https://pint-xarray.readthedocs.io/en/stable/creation.html).

Unless there's a very compelling reason for the singular unit, we should probably aim for compatibility with the climate science community (and their nice tools) and use units.

I also added clarification that the name and type attributes should be strings.

I thought I would need to change the schema, but it looks like units is already in there?

(Update, June 20, 2023): this issue has generated a lively discussion and @joshmoore requested that I summarize the pros and cons of the proposed change:

Arguments in favor of keeping unit:

  • The status quo is fine, and switching costs are too high
  • Some English speakers in this issue prefer unit over units (e.g.). But others prefer units over unit, e.g..
  • udunits uses the noun unit in its reference xml files
  • if unit becomes pluralized to units, what about the other fields of axis? e.g., should name become names, and type become types?

Arguments in favor of switching to units:

  • More consistent with CF / xarray. (And the switching cost is worth it).
  • The plural units is more future-proof. If this spec ever allows axis objects to be multi-dimensional, or more generally allows describing the units of multidimensional quantities, then the units field could evolve to a list of udunits-compatible strings without needing to change the name of the field itself. The singular unit fails this test (this might be why CF uses units, but i'm not sure).

@github-actions
Copy link
Contributor

github-actions bot commented Mar 1, 2023

Automated Review URLs

@will-moore
Copy link
Member

See #157 ;)

@will-moore
Copy link
Member

I think I have a slight preference for "units", which is probably why I used units incorrectly in the schemas - Fixed by #157.
That PR should still be valid, since for v0.4 we did have "unit".
So if we switch back to "units" for v0.5 then we'd need another PR to switch the schemas back again!?

@will-moore will-moore mentioned this pull request Mar 9, 2023
will-moore and others added 4 commits March 9, 2023 19:25
Apparently a new release of bikeshed is now unhappy with the use of
`<img/>` and `<img></img>` is required:

```
  $ bikeshed spec "latest/index.bs" "latest/index.out.html"
  LINE 651:1: Tag <img> wasn't closed at end of file.
   ✘  Did not generate, due to fatal errors

  Failed
```
@d-v-b
Copy link
Contributor Author

d-v-b commented Mar 15, 2023

Is there a blocker here or can we go ahead and merge this?

@will-moore
Copy link
Member

Since #157 is merged now, could you rebase/merge so that those changes don't show up in this PR as its harder to understand, thx.

@will-moore
Copy link
Member

cc @joshmoore @sbesson - you guys OK with this proposal?

@sbesson
Copy link
Member

sbesson commented Mar 16, 2023

At least from my side, I don't have a compelling reason to use the singular unit form. From a specification perspective, I have no objection to this proposal if the community feels it brings value.
Two comments:

  • attribute renaming inherently carries the risk of confusion or simply being missed. We should at least ensure the change is explicitly listed in the version history changes. To some extent, it raises the question of whether there should be instructions for upgrading between breaking versions of the specification
  • Coordinate systems and new coordinate transformations proposal #138 also includes changes to the axes specification and will need to be updated with this change included /cc @bogovicj

Copy link
Contributor

@constantinpape constantinpape left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just want to note that this is a bit inconsistent. A single axis will just have one unit. Conversely the other names should change to names and types, which doesn't make much sense.

@d-v-b
Copy link
Contributor Author

d-v-b commented Mar 17, 2023

@constantinpape The main goal here is harmonizing our metadata with the CF conventions, and they use units for this field. I don't think the term "units" is at variance with usage of the term in english scientific writing -- I do commonly see "units" used to describe a single base unit.

But it's reasonable to ask "why is 'units' conventional, rather than just 'unit'". I suspect this because base units are often combined to form compound units, for which the plural form becomes natural. Thus, although the value of units is a string, we can think that string denoting "the combination of units for this axis". That combination may be simple, e.g. meters (or should it be meter 🤔 ), in which case there is just 1 unit, but an axis may also be associated with compound units, e.g. meter * seconds, which is the case for most confocal images if we want to be pedantic about it (because samples along each row and column are separated by space and time).

Alternatively, maybe "units" is used because an axis denotes an interval, and the length of that interval is measured as a number of ($unit)s? E.g., we say "10 meters" and not "10 meter"?

But these theories are just speculation on my part. I searched for some justification for the plural form in the climate science specifications and couldn't find anything. Ultimately, I'm willing to tolerate the pluralized word in exchange for consistency with climate metadata conventions.

joshmoore and others added 4 commits March 17, 2023 15:45
Apparently a new release of bikeshed is now unhappy with the use of
`<img/>` and `<img></img>` is required:

```
  $ bikeshed spec "latest/index.bs" "latest/index.out.html"
  LINE 651:1: Tag <img> wasn't closed at end of file.
   ✘  Did not generate, due to fatal errors

  Failed
```
@d-v-b
Copy link
Contributor Author

d-v-b commented Mar 18, 2023

@sbesson to your point about handling breaking changes, I modified this PR to a) support the use of the field unit, guaranteeing backwards compatibility, and b) include a suggestion to the effect that implementers of the spec should give users a deprecation warning when an element of axes has the field unit.

@will-moore
Copy link
Member

It looks like the change to the schema has been lost?

"Units" makes sense to me because you tend to use the plural for describing units. E.g. "10 millimetres" the units are millimetres.

@d-v-b
Copy link
Contributor Author

d-v-b commented Mar 21, 2023

@will-moore thanks for spotting that issue with the schema, it should be fixed now

Copy link
Member

@will-moore will-moore left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. LGTM 👍

Copy link
Member

@joshmoore joshmoore left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This evening at the Zarr community meeting, @d-v-b brought up the fact this one letter change has not been merged. I guess mea culpa, but I'm still unsure of the value of this churn (needing multiple code paths in implementations). If the code paths are going to be sufficiently rewritten by coming 0.5 work, then like @sbesson I'm not strongly tied to the singular. But unification with cfconventions doesn't feel sufficiently valuable from my point-of-view, at least not yet.

cc: @bogovicj @constantinpape

@jbms
Copy link

jbms commented Jun 14, 2023

I don't really have an opinion about what the ideal name is, though I did just check and see that in the udunits library, the specific unit of "millimeters", etc. would be called a "unit", and "units" refers to the plural of that, i.e. multiple different units:

https://docs.unidata.ucar.edu/udunits/current/udunits2-base.xml
https://docs.unidata.ucar.edu/udunits/current/udunits2-derived.xml

It isn't particularly difficult to support this minor change in any given implementation, but it adds implementation complexity and likely will lead to incompatibility in the future, as some implementations may only support one of the names.

As far as I understand, this is attempting to use the same name as in the CF conventions spec, but does not actually aid compatibility with xarray or other software that follows CF conventions because xarray is actually using the name in a different place in the JSON metadata.

@d-v-b
Copy link
Contributor Author

d-v-b commented Jun 14, 2023

but does not actually aid compatibility with xarray or other software that follows CF conventions because xarray is actually using the name in a different place in the JSON metadata.

Empirically, using the exact same name as xarray / the cf conventions is quite convenient compared to using a name that is the same sans one letter. Xarray coordinate variables, as an in-memory representation, do not have JSON metadata so I'm not sure how to interpret this part. But regardless of where xarray, (or any other library that approximates the CF conventions) puts the word "units", as long as we agree that ome-ngff "unit" and xarray "units" describe the same concept, we should consider harmonizing the word we use for this concept, unless there is a good reason to diverge from CF conventions / xarray.

I will note this is not just a CF conventions / xarray thing. NIFTI and NRRD also use units.

It isn't particularly difficult to support this minor change in any given implementation, but it adds implementation complexity and likely will lead to incompatibility in the future, as some implementations may only support one of the names.

Regarding implementation complexity, I am aware of that, which is why this PR is backwards compatible (i.e., it allows both unit and units). I think this is the simplest way to amend the spec, but if there's an even simpler proposal that achieves the same aim, I am open to suggestions. But we shouldn't trade implementation complexity for user complexity -- diverging from other metadata standards adds needless complexity to users and programs that work in this ecosystem.

@jbms
Copy link

jbms commented Jun 14, 2023

As far as xarray, I'm not too familiar with it, but I assumed that it has a way to load/save this attribute from zarr metadata. Perhaps that is not the case, and it is merely an identifier used in the xarray API.

From a quick look, I think "unit" is pretty much universally used as the singular term, as in SI base unit like "meter" or derived unit like "5 meters", and "units" is the plural form:

https://simple.wiktionary.org/wiki/unit

Since this attribute is specifying a single unit, not multiple units, the singular form would seem to make more sense. Note that in both NIFTI and NRRD, the "units" attribute is specifying more than one unit (in case of NRRD, a unit for each dimension, and for NIFTI, a unit for space and a unit for time), so the plural makes sense there.

@d-v-b
Copy link
Contributor Author

d-v-b commented Jun 14, 2023

Since this attribute is specifying a single unit, not multiple units, the singular form would seem to make more sense.

We discussed conventional english usage of the word "unit" and "units" earlier in the thread. English itself is inconsistent in when units of measure are pluralized. Since English already pluralizes units contextually, Is there a legitimate concern that people will be confused here?

@d-v-b
Copy link
Contributor Author

d-v-b commented Jun 14, 2023

An additional data point: DICOM uses MeasurementUnits to name the metadata describing the units associated with a measured quantity.

@mkitti
Copy link
Member

mkitti commented Jun 14, 2023

If I was a saving a velocity field from my particle image velocimetry experiment, in meters per second, then the. I think units may apply in that case.

Since we may be mostly talking about pixel pitch for images, perhaps what we commonly mean is a distance measure per pixel, e.g. nanometers per pixel. Units would also be applicable here.

Regarding implementations and backwards compatibility, this is a pre-1.0 specification. Under most understandings of versioning schemes, including semantic versioning, there are few if any stability guarantees. In other words, now is the time to break things.

https://semver.org/#spec-item-4

@jbms
Copy link

jbms commented Jun 14, 2023

Since this attribute is specifying a single unit, not multiple units, the singular form would seem to make more sense.

We discussed conventional english usage of the word "unit" and "units" earlier in the thread. English itself is inconsistent in when units of measure are pluralized. Since English already pluralizes units contextually, Is there a legitimate concern that people will be confused here?

I happened to see that stackexchange page as well, but it seems to be related to the question of whether a specific unit name should be singular or plural, e.g. "1 meter" vs "2 meters" vs "2-meter pole". I don't see anything on that page about whether the word "unit" itself should be singular or plural.

It is true that there are a lot of inconsistencies in English. But overall it seems to me that "unit" is probably at least somewhat preferred over "units" in terms of common usage for this particular case.

The spec does also reference UDUNITS so there may be some value in being consistent with the terminology there. According to udunits, a unit like "meters per second" or "0.7 meters per second" would be considered a "derived unit" (singular).

In any case, I'm not worried about confusion when a person is reading the metadata ---- the meaning will be obvious even without reading the specification.

If this change introduced actual technical compatibility with CF conventions software, I think there would be a reasonable argument to do it. As it is, though, it is merely making the names more similar but does not improve compatibility.

If some existing implementations of ome-zarr are already using units, that would also be a reasonable argument for supporting both.

Although supporting both would be simpler implementation-wise than having to condition on the version, if this change is to be made, I would actually strongly urge that both "unit" and "units" NOT be supported. As I see it, this type of metadata is the sort of thing that is likely to be generated and parsed directly by all sorts of adhoc scripts; each of these scripts can be thought of as OME-zarr implementations. Therefore as this format becomes more popular we can expect there may be a very large number of independent (partial) implementations, and I expect many of them will support only one of "unit" or "units", probably whichever was used in the dataset used as a reference when writing the code.

@joshmoore
Copy link
Member

jbms commented [7 hours ago]
(#168 (comment)) if this change is to be made, I would actually strongly urge that both "unit" and "units" NOT be supported.

I'd second that, or if the support for the two was meant to be transitional, that we discuss that timeline.

mkitti commented [7 hours ago]
(#168 (comment)) Regarding implementations and backwards compatibility, this is a pre-1.0 specification. Under most understandings of versioning schemes, including semantic versioning, there are few if any stability guarantees. In other words, now is the time to break things.

In general, I'd agree that pre-1.0 is the time to break things if necessary, but as I mentioned to @d-v-b during the Zarr call, I would really urge us to also consider the costs (i.e. just because it's sub-1.0 doesn't mean we should break things).

For example, there was a mention that this was just a one letter change. I'll reference my well-worn story of making it possible to support "/" in addition to "." at the Zarr-level: that took over 6 months of work. I don't think this is nearly as complicated, but I would strongly urge anyone who is opening a PR to change the spec to engage with that overhead before asking the community to invest their time.

@d-v-b: you've brought up a few different reasons in the discussion above, can I suggest you update the description of this PR and summarize the various arguments pro and con? Also, if there's not a clear consensus on the value of changing right now, identifying a point in time when there's going to be a breaking change anyway, might help to draw support.

@d-v-b
Copy link
Contributor Author

d-v-b commented Jun 20, 2023

I'd second that, or if the support for the two was meant to be transitional, that we discuss that timeline.

The goal was to make the support for both unit and units transitional, but I'm happy making a hard break from unit to units, if people prefer that. Are we explicit anywhere about what major vs minor versions mean with respect to backwards compatibility?

In general, I'd agree that pre-1.0 is the time to break things if necessary, but as I mentioned to @d-v-b during the Zarr call, I would really urge us to also consider the costs (i.e. just because it's sub-1.0 doesn't mean we should break things).

I'm not proposing this change for fun; I think it materially improves the spec. The cost of changing the spec and the cost of enduring warts in the spec both increase with time, so it's imperative that we change things earlier rather than later... assuming we agree on the change 😅

@d-v-b: you've brought up a few different reasons in the discussion above, can I suggest you update the description of this PR and summarize the various arguments pro and con? Also, if there's not a clear consensus on the value of changing right now, identifying a point in time when there's going to be a breaking change anyway, might help to draw support.

That's a good idea. I will post a summary of pros and cons here, and also add it to the description:

Arguments in favor of keeping unit:

  • The status quo is fine, and switching costs are too high
  • Some English speakers in this issue prefer unit over units (e.g.). But others prefer units over unit, e.g..
  • udunits uses the noun unit in its reference xml files
  • if unit becomes pluralized to units, what about the other fields of axis? e.g., should name become names, and type become types?

Arguments in favor of switching to units:

  • More consistent with CF / xarray. (And the switching cost is worth it).
  • The plural units is more future-proof. If this spec ever allows axis objects to be multi-dimensional, or more generally allows describing the units of multidimensional quantities, then the units field could evolve to a list of udunits-compatible strings without needing to change the name of the field itself. The singular unit fails this test (this might be why CF uses units, but i'm not sure).

@d-v-b
Copy link
Contributor Author

d-v-b commented Sep 7, 2023

What do we need to get this closed? Should I rebase on top of the coordinate transformations PR?

@d-v-b
Copy link
Contributor Author

d-v-b commented Sep 19, 2023

After discussion over in zulip, it's clear that this proposal does not add enough value to warrant inclusion in the spec.

@d-v-b d-v-b closed this Sep 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants