9 Document the identifiers you issue and use

Stian Soiland-Reyes edited this page Mar 12, 2016 · 5 revisions

Rule 9: Document the identifiers you issue and use

Permalink URI: https://w3id.org/id-rules/1

The global-scale identification cycle is a shared responsibility, and provider and consumer roles often overlap in the context of data integration. Whether you issue your own identifiers or only reference those of others, you must document your identifier policies.

Supplemental Table S6 provides a set of questions that data providers and re-distributors can use to develop such documentation. Documentation should be published alongside, or included in, a dataset description, as outlined in the recommendations for Dataset Descriptions developed by the W3C Semantic Web in the Health Care and Life Sciences Interest Group. For examples of such documentation, see ChEMBL and Monarch; the format may vary.

Questions that good identifier documentation should answer

| Scope | Question to answer | Recommendation |
| --- | --- | --- |
| Provider | What types of entities are identified, and what is the scope of these entities? (See Note 1) | Must include |
| Provider | What is your primary resolving namespace, if only one exists? If multiple, equally valid resolving namespaces co-exist, what are they? (E.g. INSDC.org has four such schemes, as the entire dataset is fully represented by each of four authorities: NCBI, GenBank, ENA, and DDBJ.) | Must include |
| Provider | Are you aware of any alternate URIs (e.g. different resolvers) that other groups use for your identifiers? (Even though alternates are not recommended for use, knowing which URIs are equivalent facilitates data integration.) | Should include |
| Provider | What is the prefix you wish others to use if they reference your entities in an abbreviated way? If this prefix is registered, where? What is the compact URI you wish others to use? (See Note 2) | Must include |
| Provider | What is your persistence policy regarding maintenance of the URIs? What is your persistence policy regarding the corresponding entities and metadata? | Must include |
| Provider | Can machine-readable representations of your entities be accessed? If so, where and in what formats? | Must include |
| Provider | What is the regular expression of your Local Resource Identifiers and URIs? | Strongly recommended |
| Provider | Are there relationships between your identifiers? Where are these described? (See Note 1) | Should include |
| Provider | Under what license are identifiers made available? | Should include |
| Provider | Does the lifecycle of the entities potentially include versioning, splitting, merging, or deprecation? How are these changes managed, communicated, and synchronized between those using that entity? (See Note 1) | Must include |
| Provider-Redistributor | Do you identify entities that are also identified by others? Who are these others? Where are these mappings found and who, if anyone, maintains them? | Strongly recommended |
| Provider-Redistributor | Do you reference identifiers that are issued by other authorities? If so, in what cases? How often are the identifiers synchronized? | Must include |
| Provider-Redistributor | If you reference identifiers that are issued by other authorities, what are the prefix-to-resolving-namespace mappings used? What is the source of these mappings (e.g. manual or identifier service)? Where can your mappings be found? | Must include |
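
One of the questions above asks for the regular expression of your Local Resource Identifiers (LRIs) and URIs. As a minimal sketch of why publishing these patterns is useful, the Python snippet below validates identifiers against two patterns. Both patterns are assumptions made for illustration, modelled on the GO:0007049 example used in Note 2; they are not taken from any provider's official documentation, and the function names are hypothetical.

```python
import re

# Hypothetical patterns for GO-style identifiers (assumed for illustration;
# a real provider would publish its own patterns in its documentation).
LRI_PATTERN = re.compile(r"GO:\d{7}")
URI_PATTERN = re.compile(r"http://purl\.obolibrary\.org/obo/GO_\d{7}")


def is_valid_lri(value: str) -> bool:
    """Return True if the whole string matches the documented LRI pattern."""
    return LRI_PATTERN.fullmatch(value) is not None


def is_valid_uri(value: str) -> bool:
    """Return True if the whole string matches the documented URI pattern."""
    return URI_PATTERN.fullmatch(value) is not None
```

With patterns like these documented, a consumer can reject malformed identifiers (for example `GO:7049`, which lacks the zero padding) before attempting to resolve them.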

Note 1: Adapted from the Metadata Recommendations For Linked Open Data Vocabularies

Note 2: If your LRIs already contain a colon, make it clear to users what your preferred corresponding compact URI syntax is. We recommend referencing the LRI as if it were already a compact URI. For instance, in the case of GO:0007049, the prefix GO can be expanded to http://purl.obolibrary.org/obo/GO_ and prepended to the numeric fragment (after the colon) to yield http://purl.obolibrary.org/obo/GO_0007049, in accordance with the GO documentation.
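
The expansion described in Note 2 can be sketched in a few lines of Python. This is an illustrative sketch only: the `PREFIX_MAP` dictionary and the `expand_curie` function are hypothetical names, and the single mapping shown is the GO example from the note. A real deployment would load its prefix-to-namespace mappings from wherever its documentation says they are published.

```python
# Prefix-to-resolving-namespace mapping (hypothetical; contains only the
# GO example from Note 2).
PREFIX_MAP = {
    "GO": "http://purl.obolibrary.org/obo/GO_",
}


def expand_curie(curie: str, prefix_map: dict = PREFIX_MAP) -> str:
    """Expand a compact URI such as 'GO:0007049' into a full URI."""
    # Split on the first colon only: prefix before it, local part after it.
    prefix, sep, local_id = curie.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a compact URI: {curie!r}")
    if prefix not in prefix_map:
        raise ValueError(f"unknown prefix: {prefix!r}")
    # Prepend the resolving namespace to the local part.
    return prefix_map[prefix] + local_id


print(expand_curie("GO:0007049"))
# prints http://purl.obolibrary.org/obo/GO_0007049
```

Raising an error for unknown prefixes, rather than guessing a namespace, keeps undocumented mappings from silently producing unresolvable URIs.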