Fairmat 2024: use NXidentifier in NXuser #1416
lukaspie wants to merge 5 commits into nexusformat:main
…put_ranging and NXapm_input_reconstruction contributed classes # Conflicts: # base_classes/NXuser.nxdl.xml # base_classes/nyaml/NXuser.yaml # contributed_definitions/NXapm_composition_space_results.nxdl.xml # contributed_definitions/NXapm_input_ranging.nxdl.xml # contributed_definitions/NXapm_input_reconstruction.nxdl.xml # contributed_definitions/NXapm_paraprobe_tool_common.nxdl.xml # contributed_definitions/NXclustering.nxdl.xml # contributed_definitions/nyaml/NXapm_composition_space_results.yaml # contributed_definitions/nyaml/NXapm_paraprobe_tool_common.yaml # manual/source/classes/contributed_definitions/cgms-structure.rst
# Conflicts: # base_classes/NXsample.nxdl.xml # base_classes/nyaml/NXsample.yaml # base_classes/nyaml/NXuser.yaml # contributed_definitions/NXapm.nxdl.xml # contributed_definitions/NXfabrication.nxdl.xml # contributed_definitions/NXidentifier.nxdl.xml # contributed_definitions/NXoptical_spectroscopy.nxdl.xml # contributed_definitions/nyaml/NXapm.yaml # contributed_definitions/nyaml/NXfabrication.yaml # contributed_definitions/nyaml/NXidentifier.yaml # contributed_definitions/nyaml/NXoptical_spectroscopy.yaml
…ersion # Conflicts: # applications/NXarpes.nxdl.xml # applications/nyaml/NXarpes.yaml # base_classes/NXaperture.nxdl.xml # base_classes/NXbeam.nxdl.xml # base_classes/NXdata.nxdl.xml # base_classes/NXdetector.nxdl.xml # base_classes/NXentry.nxdl.xml # base_classes/NXenvironment.nxdl.xml # base_classes/NXinstrument.nxdl.xml # base_classes/NXmonochromator.nxdl.xml # base_classes/NXroot.nxdl.xml # base_classes/NXsample.nxdl.xml # base_classes/NXsample_component.nxdl.xml # base_classes/NXsensor.nxdl.xml # base_classes/NXsource.nxdl.xml # base_classes/NXsubentry.nxdl.xml # base_classes/NXtransformations.nxdl.xml # base_classes/nyaml/NXaperture.yaml # base_classes/nyaml/NXbeam.yaml # base_classes/nyaml/NXdata.yaml # base_classes/nyaml/NXdetector.yaml # base_classes/nyaml/NXentry.yaml # base_classes/nyaml/NXenvironment.yaml # base_classes/nyaml/NXinstrument.yaml # base_classes/nyaml/NXmonochromator.yaml # base_classes/nyaml/NXprocess.yaml # base_classes/nyaml/NXroot.yaml # base_classes/nyaml/NXsample.yaml # base_classes/nyaml/NXsample_component.yaml # base_classes/nyaml/NXsensor.yaml # base_classes/nyaml/NXsource.yaml # base_classes/nyaml/NXsubentry.yaml # base_classes/nyaml/NXtransformations.yaml # base_classes/nyaml/NXuser.yaml
paulmillar left a comment
Just to be clear, I strongly support NeXus including NXidentifier :-)
| </field> | ||
| <group type="NXidentifier"> | ||
| <doc> | ||
| Details about an author code, open researcher, or contributor |
There are a number of problems with this doc.
- It talks about the different possible roles, even though this information is recorded elsewhere (in the role field).
- It explicitly mentions ORCID, which is only one possible identifier.
- It provides (or hints at) a preferred format. Formatting instructions belong in NXidentifier, not here.
Suggest:
An identifier for the user responsible for this entry.
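Applied to the diff context above, the suggestion might look roughly like the following. This is only a sketch: the doc wording is the one suggested above, while the absence of a `name` attribute on the group is an assumption based on the visible diff lines.

```xml
<!-- Sketch only: group attributes are assumed from the visible diff context. -->
<group type="NXidentifier">
    <doc>
        An identifier for the user responsible for this entry.
    </doc>
</group>
```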
| --> | ||
| <definition xmlns="http://definition.nexusformat.org/nxdl/3.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" category="base" type="group" name="NXidentifier" extends="NXobject" xsi:schemaLocation="http://definition.nexusformat.org/nxdl/3.1 ../nxdl.xsd"> | ||
| <doc> | ||
| An identifier for a (persistent) resource, e.g., a DOI or orcid. |
Some thoughts:
- Putting information in parentheses can lead to confusion. I would try to avoid doing this.
- The examples are not that helpful. First, they are confusing: is "DOI" itself an identifier? No, obviously not. However, this is only obvious if you already know what a DOI is, and in that case you wouldn't need the example.
If you want to give an example of a persistent identifier then use an actual identifier.
My suggestion:
An identifier, provided by some authority, that has been assigned to the real-world object described by this NXobject. To be useful, the identifier must not be reassigned to a different real-world object. It is typical for there to be some mechanism to resolve an identifier, obtaining metadata about the object. Identifiers for which some guarantees exist regarding this resolution process are called persistent identifiers. Persistent identifiers are also known as PIDs. The DOI 10.5281/zenodo.13373909 and the ORCID https://orcid.org/0000-0001-6932-604X are examples of PIDs.
| <doc> | ||
| An identifier for a (persistent) resource, e.g., a DOI or orcid. | ||
| </doc> | ||
| <field name="service" type="NX_CHAR"> |
I wouldn't call this information "service". To me, "service" (a singular noun) suggests some (single) computing resource that is responding to queries. For DOIs there are at least two services: the underlying Handle infrastructure and the DOI proxy service (https://doi.org/10....). There's also the Handle proxy service (https://hdl.handle.net) that works for both DOI and non-DOI Handles. Defining the services for URN, URL and PURL is equally problematic. (Not sure about ISO, but it likely suffers from the same problem.)
One might naively call this "authority" or "namespace", but these, too, become problematic. What is the authority for "URN" or "URL"? The namespace for PURL is contained within the namespace of URL. The namespace of DOI is contained within the namespace of Handles, etc.
I think DataCite's approach of calling this information "type" (as in relatedIdentifierType) makes the most sense.
If type is not possible (already used by NeXus) then perhaps identifierType or idType.
| <doc> | ||
| The service by which the resource can be resolved. | ||
| Examples: doi, urn, hdl, purl, orcid, iso, url |
These shouldn't be "examples", but rather definitions as part of an enumeration. The enumeration could be open, to support additional types.
For each defined value, the documentation should say where the identifier comes from (who issues the IDs) and how it is formatted, ideally with an example. For inspiration, see how DataCite handles this for relatedIdentifierType values, such as DOIs.
Also, what's an "ISO" type?
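To illustrate the kind of per-value documentation meant here, a minimal sketch follows. The field name `service` is taken from the diff; the item descriptions are illustrative wording, not agreed text, and the two example identifiers are the ones quoted earlier in this review. How an enumeration would be marked as open is left out, since that depends on the NXDL schema version in use.

```xml
<!-- Sketch only: item wording is illustrative, not agreed text. -->
<field name="service" type="NX_CHAR">
    <doc>
        The type of the identifier.
    </doc>
    <enumeration>
        <item value="doi">
            <doc>
                A Digital Object Identifier, registered through a DOI
                registration agency such as DataCite or Crossref.
                Format: prefix/suffix, where the prefix starts with "10.".
                Example: 10.5281/zenodo.13373909
            </doc>
        </item>
        <item value="orcid">
            <doc>
                An identifier for a researcher or contributor, issued by ORCID.
                Example: https://orcid.org/0000-0001-6932-604X
            </doc>
        </item>
    </enumeration>
</field>
```

Only two values are shown; the remaining values (urn, hdl, purl, url, and whatever "iso" is meant to be) would each need the same treatment.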
| The unique code, IRI or hash to resolve this reference. | ||
| Typically, this is stated by the service which is considered a complete | ||
| identifier, e.g., for a DOI it's something of the form `10.1107/S1600576714027575` | ||
| or `https://doi.org/10.1107/S1600576714027575`, which are both resolvable. |
For each identifier type ("service"), the format should be clearly defined (purl, Hdl, DOI, ORCID, ...). That could be done here or in the docs of the (earlier defined) "service" field.
| The unique code, IRI or hash to resolve this reference. | ||
| Typically, this is stated by the service which is considered a complete | ||
| identifier, e.g., for a DOI it's something of the form `10.1107/S1600576714027575` | ||
| or `https://doi.org/10.1107/S1600576714027575`, which are both resolvable. |
I would suggest avoiding having ambiguity in how identifiers are described. Something like a DOI should have a single, canonical representation. See DataCite for inspiration.
The word "or" shouldn't be used!
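A doc string with a single canonical form might read as in the sketch below. Both the field name `identifier` and the choice of the bare DOI name (rather than the https://doi.org/ form) as canonical are assumptions made for illustration; the field name is not visible in the diff context, and the actual choice of canonical form would need to be agreed.

```xml
<!-- Sketch only: the field name and the chosen canonical form are assumptions. -->
<field name="identifier" type="NX_CHAR">
    <doc>
        The identifier, written in the single canonical form defined for its type.
        For a DOI, this is the bare DOI name without a resolver prefix,
        for example 10.1107/S1600576714027575.
    </doc>
</field>
```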
| </field> | ||
| <field name="is_persistent" type="NX_BOOLEAN"> | ||
| <doc> | ||
| True if the identifier is persistent (i.e., unique and available indefinitely), |
For me, this isn't a very good description.
It uses the word "unique" without describing what is unique. Hint: it's not the identifier (as assigned to an object), since any object may have any number of PIDs. Rather, it's the object (as assigned to an identifier) that is unique.
A (perhaps) more intuitive way of saying this is that, once assigned, a PID must not be reassigned, that is, assigned to a different object.
It also fails to specify what is made persistent. Seemingly, that the identifier is persistent means "[the identifier] is available indefinitely". How is an identifier made available?
For me, persistency is the ability to query some service to obtain meaningful metadata about the object.
| or `https://doi.org/10.1107/S1600576714027575`, which are both resolvable. | ||
| </doc> | ||
| </field> | ||
| <field name="is_persistent" type="NX_BOOLEAN"> |
I'm not sure what the use case for this field is.
For many types of PIDs, the persistency is tacit: an ORCID is a persistent identifier, so why specify this? Put another way, what does it mean to specify an NXidentifier with service "ORCID" and is_persistent of "false"?
I suppose the intention here is to allow ad-hoc values in the "service" field and to let software discover that these NXidentifier entries are actually PIDs; perhaps to allow software to discover that DOIs are persistent identifiers without that software knowing what DOIs are.
However, in most cases, persistency is a feature of the underlying identifier "service" (or "type"). Including this field seems like it is inviting inconsistent entries.
Dear @paulmillar, thanks for the super valuable feedback. During NIAC2024, it was decided (in a different discussion) that identifier is such a valuable concept that it should be possible to attach it to any base class. Therefore, it will be added to NXobject here: #1486. I will close this PR, but will consider your suggestions in the other PR (which is currently still an early draft).