PROV-DM defines a PROV Identifier as a Qualified Name with the following definition: A qualified name is a name subject to namespace interpretation. It consists of a namespace, denoted by an optional prefix, and a local name. PROV-DM stipulates that a qualified name can be mapped into an IRI by concatenating the IRI associated with the prefix and the local part.
PROV-N provides a concrete syntax for prov:QUALIFIED_NAME, further noting that a PROV-N qualified name QUALIFIED_NAME can be mapped to a valid IRI [RFC3987] by concatenating the namespace denoted by its prefix to the local name, whose \-escaped characters have been unescaped by dropping the character '\' (backslash).
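As a rough illustration of this mapping, the following Python sketch drops the backslashes from the local name and concatenates the result to the namespace IRI. The function name, the namespace dictionary, and the use of the empty string as the key for a default namespace are assumptions made for this example; they are not part of PROV-N or ProvToolbox.

```python
import re


def prov_n_qualified_name_to_iri(qualified_name, namespaces):
    """Map a PROV-N qualified name to an IRI string (illustrative sketch).

    `namespaces` maps each prefix to its namespace IRI. The key "" stands
    for the default namespace, which is an assumption of this sketch.
    """
    prefix, colon, local = qualified_name.partition(":")
    if not colon or prefix.endswith("\\"):
        # No colon, or the first colon is escaped: there is no prefix,
        # so the whole string is a local name in the default namespace.
        prefix, local = "", qualified_name
    # Unescape by dropping each backslash and keeping the character after it.
    local = re.sub(r"\\(.)", r"\1", local)
    return namespaces[prefix] + local


namespaces = {"ex": "http://example.org/"}
print(prov_n_qualified_name_to_iri("ex:abc", namespaces))
print(prov_n_qualified_name_to_iri(r"ex:a01bc\.", namespaces))
```

The second call shows an escaped final dot: the backslash is removed before concatenation, so the resulting IRI ends in a plain dot.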
PROV-XML defines the type of both the prov:id and prov:ref xml-attributes to be xsd:QName, as that is the XSD datatype that most closely matches the qualified name definition by PROV-DM. Care should be taken when generating PROV identifier values in PROV-XML such that there is a known mapping to a URI.
A further note adds:
The xsd:QName datatype is more restrictive than the QualifiedName defined in [PROV-N], e.g. PROV-N allows local names to start with numbers, therefore valid identifier values in [PROV-N] serializations have the potential to not be valid identifier values in PROV-XML. It is recommended to enhance interoperability that provenance users strive to always use identifier schemes that map to valid xsd:QNames and URIs.
While this suggestion may work well for applications that are in full control of the design of their identifiers, it is not workable for applications, such as ProvToolbox, that are expected to consume arbitrary provenance in arbitrary representations. Any form of URI needs to be mapped to a Qualified Name for PROV-N and to an xsd:QName for PROV-XML.
This limitation was recognized by the Provenance Working Group. The beginnings of a solution were outlined in email discussions, but it never made it into the PROV-XML specification.
The purpose of this document is to outline the mapping of Qualified Names to xsd:QName adopted by ProvToolbox.
The suggestion outlined in the email discussions was to escape Qualified Names in an unspecified way and to rely on a separate, explicit URI representation for converting PROV-XML representations back into other PROV formats. Based on our experience with ProvToolbox, we felt this would negatively affect the readability of PROV-XML.
Instead, we have implemented a reversible encoding from Qualified Name to xsd:QName, which allows any such xsd:QName to be converted back to a Qualified Name.
A reversible encoding scheme already exists: percent encoding, as used in URIs. However, the character % is not valid in xsd:QName. Instead, we had to choose an escape character that is valid in local names and is not used too frequently, because the escape character itself has to be escaped wherever it occurs.
After consideration, it was decided to use _ (Underscore).
The first character of an xsd:QName local name is expected to belong to a restricted subset of characters. For instance, a local name cannot start with a digit. Therefore, after underscore-encoding a local name, we further escape the first character with a _ (Underscore) if it is not a valid start character.
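The two steps above could be sketched as follows. This is an illustrative Python sketch, not the ProvToolbox implementation, and all function names are hypothetical. It assumes that the permitted characters are those of the XML 1.0 Name productions without the colon, that any other character is written as _XX for each of its UTF-8 bytes (an assumption for non-ASCII characters, by analogy with percent encoding), and that _ never counts as a valid start character because it is the escape character.

```python
# XML 1.0 NameStartChar ranges, without ':' (not allowed in a local name)
# and without '_' (treated here as the escape character).
_START_RANGES = [
    (0x41, 0x5A), (0x61, 0x7A), (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF),
    (0x370, 0x37D), (0x37F, 0x1FFF), (0x200C, 0x200D), (0x2070, 0x218F),
    (0x2C00, 0x2FEF), (0x3001, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFFD),
    (0x10000, 0xEFFFF),
]
# Additional XML 1.0 NameChar ranges: '-' and '.', digits, and a few others.
_EXTRA_RANGES = [(0x2D, 0x2E), (0x30, 0x39), (0xB7, 0xB7),
                 (0x300, 0x36F), (0x203F, 0x2040)]


def _in_ranges(ch, ranges):
    return any(lo <= ord(ch) <= hi for lo, hi in ranges)


def is_start_char(ch):
    return _in_ranges(ch, _START_RANGES)


def is_name_char(ch):
    return is_start_char(ch) or _in_ranges(ch, _EXTRA_RANGES)


def encode_local_name(local):
    """Underscore-encode a local name so that it is a valid xsd:QName local part."""
    if local == "":
        return "_"  # an empty local name is mapped to a single underscore
    parts = []
    for ch in local:
        if ch == "_":
            parts.append("__")  # the escape character escapes itself
        elif is_name_char(ch):
            parts.append(ch)  # permitted character, kept as is
        else:
            # Assumption: one _XX group per UTF-8 byte of the character.
            parts.extend("_%02X" % b for b in ch.encode("utf-8"))
    encoded = "".join(parts)
    if not is_start_char(encoded[0]):
        encoded = "_" + encoded  # escape an invalid start character
    return encoded


print(encode_local_name("a@b"))
print(encode_local_name("01"))
```

Because every encoded name that would otherwise begin with _ receives the extra leading _, a decoder can always strip one leading underscore without ambiguity.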
The following table illustrates a few conversions.
|prov:QUALIFIED_NAME (*)||xsd:QName||Comment|
|ex:abc||ex:abc||Provly identifier, no escaping required|
|ex:abc01||ex:abc01||Provly identifier, no escaping required|
|ex:01||ex:_01||QName whose local name does not start with a PN_CHAR_START character, escaped with a leading _|
|ex:||ex:_||empty local name mapped to _|
|ex:_||ex:___||_ escaped, and escaped again since at the start|
|ex:a@b||ex:a_40b||Mapping of @ to _40|
|ex:a~b||ex:a_7Eb||Mapping of ~ to _7E|
|ex:a&b||ex:a_26b||Mapping of & to _26|
|ex:a+b||ex:a_2Bb||Mapping of + to _2B|
|ex:a*b||ex:a_2Ab||Mapping of * to _2A|
|ex:a#b||ex:a_23b||Mapping of # to _23|
|ex:a$b||ex:a_24b||Mapping of $ to _24|
|ex:a!b||ex:a_21b||Mapping of ! to _21|
|ex:a01/bc||ex:a01_2Fbc||Mapping of / to _2F|
|ex:a01b\c||ex:a01b_5Cc||Mapping of \ to _5C|
|ex:a01b=c||ex:a01b_3Dc||Mapping of = to _3D|
|ex:a01b'c||ex:a01b_27c||Mapping of ' to _27|
|ex:a01b(c||ex:a01b_28c||Mapping of ( to _28|
|ex:a01b)c||ex:a01b_29c||Mapping of ) to _29|
|ex:a01b,c||ex:a01b_2Cc||Mapping of , to _2C|
|ex:a01b:c||ex:a01b_3Ac||Mapping of : to _3A|
|ex:a01b;c||ex:a01b_3Bc||Mapping of ; to _3B|
|ex:a01b[c||ex:a01b_5Bc||Mapping of [ to _5B|
|ex:a01b]c||ex:a01b_5Dc||Mapping of ] to _5D|
|ex:a01b.c||ex:a01b.c||. permitted in QName|
|ex:a01bc.||ex:a01bc.||. permitted at end of QName|
|ex:='(),_:;[].@~||ex:__3D_27_28_29_2C___3A_3B_5B_5D._40_7E||Escape them all except .|
|ex:55348dff-4fcc-4ac2-ab56-641798c64400||ex:_55348dff-4fcc-4ac2-ab56-641798c64400||Escaping of a UUID-like QualifiedName|
|ex:À-ÖØ-öø-˿Ͱͽ||ex:À-ÖØ-öø-˿Ͱͽ||Support for Unicode|
(*) Note that the prov:QUALIFIED_NAME column displays unescaped Qualified Names. So, the correct PROV-N syntax for ex:a01bc. is ex:a01bc\. since . is not allowed in final position.
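To illustrate that the encoding is reversible, the following Python sketch reads the xsd:QName column back into the corresponding local name. It is a hypothetical helper rather than the ProvToolbox code, and it assumes that each _XX group stands for one UTF-8 byte, which coincides with the table for ASCII characters.

```python
import string


def decode_local_name(encoded):
    """Reverse the underscore encoding of an xsd:QName local part (sketch)."""
    if encoded == "_":
        return ""  # a lone underscore stands for the empty local name
    if encoded.startswith("_"):
        encoded = encoded[1:]  # drop the leading start-character escape
    out = bytearray()
    i = 0
    while i < len(encoded):
        ch = encoded[i]
        if ch != "_":
            out += ch.encode("utf-8")  # unescaped character, copied as is
            i += 1
        elif encoded[i + 1:i + 2] == "_":
            out += b"_"  # "__" stands for a literal underscore
            i += 2
        else:
            digits = encoded[i + 1:i + 3]
            if len(digits) != 2 or any(d not in string.hexdigits for d in digits):
                raise ValueError("malformed escape at position %d" % i)
            out.append(int(digits, 16))  # "_XX" stands for one byte
            i += 3
    return out.decode("utf-8")


print(decode_local_name("a_40b"))
print(decode_local_name("___"))
```

The two calls decode the rows for ex:a@b and ex:_ in the table, recovering a@b and _ respectively.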
PROV-XML makes the following suggestion.
It is recommended to enhance interoperability that provenance users strive to always use identifier schemes that map to valid xsd:QNames and URIs.
We call these "provly identifiers". For instance, ex:ab01 is a "provly identifier", since it is both a PROV-N Qualified Name and an xsd:QName.
The class org.openprovenance.prov.model.QualifiedNameUtils offers conversion methods implementing the encoding described in this section.
The method toQName() implements the encoding described in Section 2.
We recognize that this solution is our own and, in a sense, is not interoperable. Other solutions are possible, but at least one such solution is required to support interoperable conversions between PROV-XML and the other representations.
A consequence of releasing ProvToolbox 0.7.0 with support for this encoding is that previously generated PROV-XML documents may not be readable if they do not follow this encoding.
A future version of PROV will have to specify the mapping between PROV representations and, specifically, will have to address the mapping of PROV-N identifiers to xsd:QName, as mandated by PROV-XML. The solution presented here will be an input to this standardization effort.