Fairmat 2024: use NXidentifier in NXuser #1416

Closed
lukaspie wants to merge 5 commits into nexusformat:main from FAIRmat-NFDI:fairmat-2024-nxuser

Conversation

@lukaspie
Contributor

No description provided.

atomprobe-tc and others added 5 commits September 23, 2024 18:19
…put_ranging and NXapm_input_reconstruction contributed classes

# Conflicts:
#	base_classes/NXuser.nxdl.xml
#	base_classes/nyaml/NXuser.yaml
#	contributed_definitions/NXapm_composition_space_results.nxdl.xml
#	contributed_definitions/NXapm_input_ranging.nxdl.xml
#	contributed_definitions/NXapm_input_reconstruction.nxdl.xml
#	contributed_definitions/NXapm_paraprobe_tool_common.nxdl.xml
#	contributed_definitions/NXclustering.nxdl.xml
#	contributed_definitions/nyaml/NXapm_composition_space_results.yaml
#	contributed_definitions/nyaml/NXapm_paraprobe_tool_common.yaml
#	manual/source/classes/contributed_definitions/cgms-structure.rst
# Conflicts:
#	base_classes/NXsample.nxdl.xml
#	base_classes/nyaml/NXsample.yaml
#	base_classes/nyaml/NXuser.yaml
#	contributed_definitions/NXapm.nxdl.xml
#	contributed_definitions/NXfabrication.nxdl.xml
#	contributed_definitions/NXidentifier.nxdl.xml
#	contributed_definitions/NXoptical_spectroscopy.nxdl.xml
#	contributed_definitions/nyaml/NXapm.yaml
#	contributed_definitions/nyaml/NXfabrication.yaml
#	contributed_definitions/nyaml/NXidentifier.yaml
#	contributed_definitions/nyaml/NXoptical_spectroscopy.yaml
…ersion

# Conflicts:
#	applications/NXarpes.nxdl.xml
#	applications/nyaml/NXarpes.yaml
#	base_classes/NXaperture.nxdl.xml
#	base_classes/NXbeam.nxdl.xml
#	base_classes/NXdata.nxdl.xml
#	base_classes/NXdetector.nxdl.xml
#	base_classes/NXentry.nxdl.xml
#	base_classes/NXenvironment.nxdl.xml
#	base_classes/NXinstrument.nxdl.xml
#	base_classes/NXmonochromator.nxdl.xml
#	base_classes/NXroot.nxdl.xml
#	base_classes/NXsample.nxdl.xml
#	base_classes/NXsample_component.nxdl.xml
#	base_classes/NXsensor.nxdl.xml
#	base_classes/NXsource.nxdl.xml
#	base_classes/NXsubentry.nxdl.xml
#	base_classes/NXtransformations.nxdl.xml
#	base_classes/nyaml/NXaperture.yaml
#	base_classes/nyaml/NXbeam.yaml
#	base_classes/nyaml/NXdata.yaml
#	base_classes/nyaml/NXdetector.yaml
#	base_classes/nyaml/NXentry.yaml
#	base_classes/nyaml/NXenvironment.yaml
#	base_classes/nyaml/NXinstrument.yaml
#	base_classes/nyaml/NXmonochromator.yaml
#	base_classes/nyaml/NXprocess.yaml
#	base_classes/nyaml/NXroot.yaml
#	base_classes/nyaml/NXsample.yaml
#	base_classes/nyaml/NXsample_component.yaml
#	base_classes/nyaml/NXsensor.yaml
#	base_classes/nyaml/NXsource.yaml
#	base_classes/nyaml/NXsubentry.yaml
#	base_classes/nyaml/NXtransformations.yaml
#	base_classes/nyaml/NXuser.yaml

@paulmillar paulmillar left a comment

Just to be clear, I strongly support NeXus including NXidentifier :-)

</field>
<group type="NXidentifier">
<doc>
Details about an author code, open researcher, or contributor

There are a number of problems with this doc.

It talks about the different possible roles, despite this information being recorded elsewhere (in the role field).

It also explicitly mentions ORCID, which is only one possible identifier.

It provides (or hints at) a preferred format. Formatting instructions belong in NXidentifier, not here.

Suggest:

An identifier for the user responsible for this entry.

-->
<definition xmlns="http://definition.nexusformat.org/nxdl/3.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" category="base" type="group" name="NXidentifier" extends="NXobject" xsi:schemaLocation="http://definition.nexusformat.org/nxdl/3.1 ../nxdl.xsd">
<doc>
An identifier for a (persistent) resource, e.g., a DOI or orcid.

Some thoughts:

  • Putting information in parentheses can lead to confusion. I would try to avoid doing this.
  • The examples are not that helpful. First, they are confusing: is "DOI" itself an identifier? No, obviously not. However, this is only obvious if you already know what a DOI is, and in that case you wouldn't need the example.

If you want to give an example of a persistent identifier then use an actual identifier.

My suggestion:

An identifier, provided by some authority, that has been assigned to the real-world object described by this NXobject. To be useful, the identifier must not be reassigned to a different real-world object. It is typical for there to be some mechanism to resolve an identifier, obtaining metadata about the object. Identifiers for which some guarantees exist regarding this resolution process are called persistent identifiers. Persistent identifiers are also known as PIDs. The DOI 10.5281/zenodo.13373909 and the ORCID https://orcid.org/0000-0001-6932-604X are examples of PIDs.

<doc>
An identifier for a (persistent) resource, e.g., a DOI or orcid.
</doc>
<field name="service" type="NX_CHAR">

I wouldn't call this information "service". To me, "service" (a singular noun) suggests some (single) computing resource that is responding to queries. For DOIs there are at least two services: the underlying handle infrastructure and the DOI proxy service (https://doi.org/10....). There's also the Handle proxy service (https://hdl.handle.net) that works for both DOI and non-DOI Handles. Defining the services for URN, URL and PURL is equally problematic. (Not sure about ISO, but it likely suffers from the same problem.)

One might naively call this "authority" or "namespace", but these, too, become problematic. What is the authority for "URN" or "URL"? The namespace for PURL is contained within the namespace of URL. The namespace of DOI is contained within the namespace of Handles, etc.

I think DataCite's approach of calling this information "type" (as in relatedIdentifierType) makes the most sense.

If type is not possible (already used by NeXus) then perhaps identifierType or idType.

<doc>
The service by which the resource can be resolved.

Examples: doi, urn, hdl, purl, orcid, iso, url

These shouldn't be "examples", but rather definitions as part of an enumeration. The enumeration could be open, to support additional types.

For each defined value, the documentation should say where the identifier comes from (who issues the IDs) and how it is formatted, ideally with an example. For inspiration, see how DataCite handles this for relatedIdentifierType values, such as DOIs.

Also, what's an "ISO" type?
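
The idea of an open enumeration with a documented format per value could be sketched roughly as follows. This is a Python illustration only, not NXDL: the names `KNOWN_TYPES` and `check_identifier` are hypothetical, the regular expressions are simplified approximations of the DOI and ORCID syntaxes, and accepting unknown types without a format check is an assumption that models the "open" part.

```python
import re

# Hypothetical registry: each known identifier type documents who issues it,
# a simplified format pattern, and an example value.
KNOWN_TYPES = {
    "DOI": {
        "issuer": "DOI registration agencies (e.g. DataCite, Crossref)",
        "pattern": re.compile(r"10\.\d{4,9}/\S+"),
        "example": "10.5281/zenodo.13373909",
    },
    "ORCID": {
        "issuer": "ORCID",
        "pattern": re.compile(r"https://orcid\.org/\d{4}-\d{4}-\d{4}-\d{3}[\dX]"),
        "example": "https://orcid.org/0000-0001-6932-604X",
    },
}


def check_identifier(id_type: str, value: str) -> bool:
    """Return True if value matches the documented format for id_type.

    Types missing from KNOWN_TYPES are accepted unchecked (open enumeration).
    """
    spec = KNOWN_TYPES.get(id_type)
    if spec is None:
        return True
    return spec["pattern"].fullmatch(value) is not None
```

Keeping the example next to the pattern makes it easy to verify that every documented example actually satisfies its own format rule.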

The unique code, IRI or hash to resolve this reference.
Typically, this is stated by the service which is considered a complete
identifier, e.g., for a DOI it's something of the form `10.1107/S1600576714027575`
or `https://doi.org/10.1107/S1600576714027575`, which are both resolvable.

For each identifier type ("service"), the format should be clearly defined (purl, Hdl, DOI, ORCID, ...). That could be done here or in the docs of the (earlier defined) "service" field.

The unique code, IRI or hash to resolve this reference.
Typically, this is stated by the service which is considered a complete
identifier, e.g., for a DOI it's something of the form `10.1107/S1600576714027575`
or `https://doi.org/10.1107/S1600576714027575`, which are both resolvable.

I would suggest avoiding having ambiguity in how identifiers are described. Something like a DOI should have a single, canonical representation. See DataCite for inspiration.

The word "or" shouldn't be used!
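
To illustrate what a single canonical representation could mean in practice, here is a minimal Python sketch that maps both spellings from the doc string onto one form. The function name `canonical_doi` is hypothetical, and treating the bare DOI name (without resolver prefix) as canonical is an assumption made for this example; the standard could just as well pick the URL form.

```python
import re

# Assumption: the bare DOI name (no resolver prefix) is the canonical form.
_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)
# Simplified approximation of DOI name syntax.
_DOI_NAME = re.compile(r"10\.\d{4,9}/\S+")


def canonical_doi(raw: str) -> str:
    """Strip a known resolver prefix and return the bare DOI name."""
    value = raw.strip()
    lowered = value.lower()
    for prefix in _PREFIXES:
        if lowered.startswith(prefix):
            value = value[len(prefix):]
            break
    if not _DOI_NAME.fullmatch(value):
        raise ValueError(f"not a DOI: {raw!r}")
    return value
```

With such a rule, a writer normalizes once and readers never have to guess which of several spellings they will encounter.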

</field>
<field name="is_persistent" type="NX_BOOLEAN">
<doc>
True if the identifier is persistent (i.e., unique and available indefinitely),

For me, this isn't a very good description.

It uses the word "unique" without describing what is unique. Hint, it's not the identifier (as assigned to an object): any object may have any number of PIDs. Rather it's the object (as assigned to an identifier) that is unique.

A (perhaps) more intuitive way of saying this is that, once assigned, a PID must not be reassigned, i.e., assigned to a different object.

It also fails to specify what is made persistent. Seemingly, that the identifier is persistent means "[the identifier] is available indefinitely". How is an identifier made available?

For me, persistency is the ability to query some service to obtain meaningful metadata about the object.

or `https://doi.org/10.1107/S1600576714027575`, which are both resolvable.
</doc>
</field>
<field name="is_persistent" type="NX_BOOLEAN">

I'm not sure what is the use-case for this field.

For many types of PIDs, the persistency is tacit: an ORCID is a persistent identifier, so why specify this? Put another way, what does it mean to specify an NXidentifier with service "ORCID" and is_persistent of "false"?

I suppose the intention here is to allow ad-hoc values in the "service" field and allow software to discover that these NXidentifier entries are actually PIDs. Perhaps, to allow software to discover that DOIs are persistent identifiers without that software knowing what DOIs are.

However, in most cases, persistency is a feature of the underlying identifier "service" (or "type"). Including this field seems like it is inviting inconsistent entries.
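
The risk of inconsistent entries can be made concrete with a small Python sketch. Everything here is hypothetical: the table `PERSISTENT_TYPES`, the function `find_inconsistency`, and the classification of each type are illustrative assumptions, not part of NeXus.

```python
from typing import Optional

# Hypothetical table: persistence treated as a property of the identifier type.
# The classification is illustrative, not normative.
PERSISTENT_TYPES = {"DOI": True, "Handle": True, "ORCID": True, "URL": False}


def find_inconsistency(id_type: str, is_persistent: bool) -> Optional[str]:
    """Return a message if the stored flag contradicts the type, else None."""
    expected = PERSISTENT_TYPES.get(id_type)
    if expected is None:
        # Unknown (ad-hoc) type: the flag is the only information available.
        return None
    if expected != is_persistent:
        return (
            f"type {id_type!r} implies is_persistent={expected}, "
            f"but the entry says {is_persistent}"
        )
    return None
```

For known types the flag carries no new information and can only contradict the type; it is informative only for ad-hoc types, which is the use case guessed at above.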

@sanbrock sanbrock mentioned this pull request Sep 29, 2024
@lukaspie lukaspie linked an issue Sep 29, 2024 that may be closed by this pull request
@lukaspie
Contributor Author

lukaspie commented Oct 4, 2024

Dear @paulmillar, thanks for the super valuable feedback. During NIAC2024, it was decided (in a different discussion) that identifier is such a valuable concept that it should be possible to attach it to any base class. Therefore, it will be added to NXobject here: #1486. I will close this PR, but consider your changes in the other PR (which is currently still an early draft).

@lukaspie lukaspie closed this Oct 4, 2024
@lukaspie lukaspie deleted the fairmat-2024-nxuser branch March 26, 2025 10:36
