414: Attempt to implement expanding the allowed character repertoire #546

Merged
merged 2 commits into from
Jul 25, 2023
35 changes: 27 additions & 8 deletions specifications/xpath-datamodel-40/src/xpath-datamodel.xml
@@ -949,24 +949,43 @@ referenced specifications is generally <emph>recommended</emph> but not
It is implementation-dependent how a processor handles any such
conflicts.</p>

<p><termdef term="string" id="dt-string">A
<p diff="chg" at="2023-06-12"><termdef term="string" id="dt-string">A
<term>string</term>
is a sequence of zero or more
<termref def="dt-character">characters</termref>, or equivalently, a
value in the value space of the <code>xs:string</code> data
type.</termdef></p>
<termref def="dt-character">characters</termref>.</termdef></p>

<p><termdef term="character" id="dt-character">
A <term>character</term> is an instance of the <code>Char</code> production in
<bibref ref="xml"/>.
</termdef></p>
<p diff="chg" at="2023-06-12"><termdef term="character" id="dt-character">A
<term>character</term> is any Unicode character.</termdef>
Implementations <rfc2119>may</rfc2119> restrict characters
to those Unicode characters allowed by the <code>Char</code>
production in <bibref ref="xml"/>. Unpaired surrogates are always forbidden.</p>

<p><termdef id="dt-codepoint" term="codepoint">A
<term>codepoint</term> is a non-negative integer assigned to a
<termref def="dt-character">character</termref> by the Unicode
consortium, or reserved for future assignment to a
character.</termdef></p>

<p diff="add" at="2023-06-12">The definitions of <termref def="dt-string">string</termref> and
<termref def="dt-character">character</termref> in the data model
allow an implementation to accept input that cannot occur in a
well-formed XML document. For example, an implementation might allow
the <code>unparsed-text()</code> function to return the content of a
text file that includes a literal BEL character (&amp;#x7;) or might
not restrict what <code>codepoints-to-string()</code> can return.</p>

<p diff="add" at="2023-06-12">An implementation that allows a broader repertoire of characters to
be consumed by the processor <rfc2119>must</rfc2119> ensure that:</p>

<olist diff="add" at="2023-06-12">
<item>
<p>Any characters serialized with the XML or XHTML output methods satisfy the
well-formedness criteria of the selected version of XML.</p>
</item>
<item><p>Any schema validation carried out using an XML Schema 1.0 or 1.1 schema rejects
any nodes or atomic values containing characters that do not satisfy the
constraints of the selected version of XML.</p></item>
</olist>
</div3>

<!-- Jim: New text to correspond to new graphics and to new tables copied from F&O, 2009-10-21 -->
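The distinction this hunk draws between the broad data model repertoire and the XML `Char` production can be sketched in a few lines. This is only an illustrative sketch, not part of the change: the function names are hypothetical, and it assumes that "any Unicode character" means every code point from 0 to 0x10FFFF other than a surrogate.

```python
def is_xml10_char(cp: int) -> bool:
    """True if cp matches the Char production of XML 1.0."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_surrogate(cp: int) -> bool:
    """True if cp lies in the UTF-16 surrogate range."""
    return 0xD800 <= cp <= 0xDFFF


def is_data_model_character(cp: int) -> bool:
    """Assumed broadest repertoire: any code point except a surrogate."""
    return 0 <= cp <= 0x10FFFF and not is_surrogate(cp)


def can_serialize_as_xml10(text: str) -> bool:
    """Check the first obligation in the list above: every character
    written by the XML output method must be a well-formed XML character."""
    return all(is_xml10_char(ord(ch)) for ch in text)
```

Under these assumptions BEL (`0x7`) is a data model character but not an XML 1.0 character, so a processor with the broader repertoire could hold it in a string and would still have to reject or escape it when serializing as XML.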
28 changes: 14 additions & 14 deletions specifications/xpath-functions-40/src/function-catalog.xml
@@ -3375,9 +3375,10 @@ return if (starts-with($preprocessed-value, "-")) then -$abs else +$abs]]></eg>

</fos:rules>
<fos:errors>
<p>A dynamic error is raised <errorref class="CH" code="0001"
<p diff="chg" at="2023-06-12">A dynamic error is raised <errorref class="CH" code="0001"
/> if any of the codepoints in
<code>$values</code> is not a permitted XML character.</p>
<code>$values</code> is not a
<termref def="dt-permitted-character">permitted character</termref>.</p>
</fos:errors>
<fos:examples>
<fos:example>
@@ -14841,15 +14842,14 @@ else
to an absolute URI (for example, because the base-URI property in the static context is absent),
</phrase>or if it cannot be used to retrieve the string
representation of a resource. </p>
<p>A dynamic error is raised <errorref class="UT" code="1190"
<p diff="chg" at="2023-06-12">A dynamic error is raised <errorref class="UT" code="1190"
/> if the value of the
<code>$encoding</code> argument is not a valid encoding name, if the
processor does not support the specified encoding, if
the string representation of the retrieved resource contains octets that cannot be
decoded into Unicode <termref
def="character"
>characters</termref> using the specified
encoding, or if the resulting characters are not permitted XML characters.</p>
decoded into Unicode <termref def="character">characters</termref> using the specified
encoding, or if the resulting characters are not
<termref def="dt-permitted-character">permitted characters</termref>.</p>
<p>A dynamic error is raised <errorref class="UT" code="1200"
/> if <code>$encoding</code>
is absent and the processor cannot infer the
@@ -19492,8 +19492,7 @@ else map:put($MAP, $KEY, $ACTION(())</eg>
<fos:type>xs:boolean</fos:type>
<fos:default>false</fos:default>
<fos:values>
<fos:value value="false"
>
<fos:value value="false">
All characters in the input that are valid
in the version of XML supported by the implementation, whether or not they are represented
in the input by means of an escape sequence, are represented as unescaped characters in the result. Any
@@ -20142,8 +20141,7 @@ else map:put($MAP, $KEY, $ACTION(())</eg>
<fos:type>xs:boolean</fos:type>
<fos:default>true</fos:default>
<fos:values>
<fos:value value="false"
>
<fos:value value="false">
All characters in the input that are valid
in the version of XML supported by the implementation, whether or not they are represented
in the input by means of an escape sequence, are represented as unescaped characters in the result. Any
@@ -20386,7 +20384,8 @@ else map:put($MAP, $KEY, $ACTION(())</eg>
of the initial octets of the resource.</p>
</item>
<item>
<p>If the resource contains characters that are not valid in the version of XML used by the processor,
<p diff="chg" at="2023-06-12">If the resource contains characters that are not
<termref def="dt-permitted-character">permitted characters</termref>,
then rather than raising an error as <code>fn:unparsed-text#1</code> does, the function replaces such characters by the equivalent
JSON escape sequence prior to parsing.</p>
<note>
@@ -23549,8 +23548,9 @@ declare function fn:all(
letters A-F may be in either upper or lower case.</p></item>

</olist>
<p>The result must consist of valid XML characters (XML 1.0 or XML 1.1, whichever is supported by
the processor). For example <code>fn:char("#xDEAD")</code> is invalid because it is in the surrogate range.</p>
<p diff="chg" at="2023-06-12">The result must consist of
<termref def="dt-permitted-character">permitted characters</termref>.
For example <code>fn:char("#xDEAD")</code> is invalid because it is in the surrogate range.</p>
</fos:rules>
<fos:errors>
<p>The function fails with a dynamic error <errorref
27 changes: 20 additions & 7 deletions specifications/xpath-functions-40/src/xpath-functions.xml
@@ -364,6 +364,16 @@ for transition to Proposed Recommendation. </p>'>
includes the option of supporting revised definitions of types such as <code>xs:NCName</code>
based on the rules in XML 1.1 rather than 1.0.</p>
</note>

<p diff="add" at="2023-06-12">The <bibref ref="xpath-datamodel-40"/> allows flexibility in the
repertoire of characters permitted during processing that goes beyond even what
version of XML is supported. A processor
<rfc2119>may</rfc2119> allow the user to construct nodes
and atomic values that contain characters not allowed by any version of
XML.
<termdef id="dt-permitted-character" term="permitted character">A <term>permitted character</term>
is one within the repertoire accepted by the implementation.</termdef></p>

<p>In this document, text labeled as an example or as a Note is
provided for explanatory purposes and is not normative.</p>
</div2>
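Because a permitted character is defined relative to the implementation's repertoire, the `FOCH0001` check in `fn:codepoints-to-string` becomes a check against a configurable predicate. The following is a rough sketch under that reading. The names `DynamicError`, `xml10_only`, `any_non_surrogate` and the `permitted` parameter are hypothetical, introduced only for illustration.

```python
from typing import Callable, Iterable


class DynamicError(Exception):
    """Hypothetical stand-in for an XPath dynamic error with an error code."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(f"{code}: {message}")
        self.code = code


def xml10_only(cp: int) -> bool:
    """Restricted repertoire: the Char production of XML 1.0."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def any_non_surrogate(cp: int) -> bool:
    """Assumed broad repertoire: any code point except a surrogate."""
    return 0 <= cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)


def codepoints_to_string(
    values: Iterable[int],
    permitted: Callable[[int], bool] = xml10_only,
) -> str:
    """Build a string, raising FOCH0001 for any non-permitted codepoint."""
    chars = []
    for cp in values:
        if not permitted(cp):
            raise DynamicError(
                "FOCH0001", f"codepoint {cp:#x} is not a permitted character"
            )
        chars.append(chr(cp))
    return "".join(chars)
```

With the default predicate, `codepoints_to_string([0x7])` raises `FOCH0001`; with `permitted=any_non_surrogate` it returns a one-character string containing BEL. A surrogate such as `0xDEAD` is rejected under both predicates, matching the rule that unpaired surrogates are always forbidden.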
@@ -10287,6 +10297,8 @@ currently, Version 9.0.0.
key="XQuery and XPath Data Model (XDM) 3.0"/>
<bibl id="xpath-datamodel-31"
key="XQuery and XPath Data Model (XDM) 3.1"/>
<bibl id="xpath-datamodel-40"
key="XQuery and XPath Data Model (XDM) 4.0"/>

<bibl id="xslt-xquery-serialization-31"
key="XSLT and XQuery Serialization 3.1"/>
@@ -10433,8 +10445,8 @@ ISBN 0 521 77752 6.</bibl>
than the implementation can represent (the implementation also has the option of rounding).</p>
</error>
<error class="CH" code="0001" label="Codepoint not valid." type="dynamic">
<p>Raised by <code>fn:codepoints-to-string</code> if the input contains an integer that is not the codepoint
of a valid XML character.</p>
<p diff="chg" at="2023-06-12">Raised by <code>fn:codepoints-to-string</code> if the input contains an integer that is not the codepoint
of a <termref def="dt-permitted-character">permitted character</termref>.</p>
</error>
<error class="CH" code="0002" label="Unsupported collation." type="dynamic">
<p>Raised by any function that uses a collation if the requested collation is not recognized.</p>
@@ -10450,9 +10462,9 @@ ISBN 0 521 77752 6.</bibl>
</error>
<error class="CH" code="0005" label="Unrecognized or invalid character name."
type="dynamic">
<p>Raised by <code>fn:char</code> if the supplied character name is not recognized, or
if it represents a codepoint that is not valid in the version of XML supported by the
processor.</p>
<p diff="chg" at="2023-06-12">Raised by <code>fn:char</code> if the supplied character name is not recognized, or
if it represents a codepoint that is not
a <termref def="dt-permitted-character">permitted character</termref>.</p>
</error>
<error class="DC" code="0001" label="No context document." type="dynamic">
<p>Raised by <code>fn:id</code>, <code>fn:idref</code>, and <code>fn:element-with-id</code>
@@ -10714,12 +10726,13 @@ ISBN 0 521 77752 6.</bibl>
</error>
<error class="UT" code="1190" label="Cannot decode external resource."
type="dynamic">
<p>Raised by <code>fn:unparsed-text</code> or <code>fn:unparsed-text-lines</code>
<p diff="chg" at="2023-06-12">Raised by <code>fn:unparsed-text</code> or <code>fn:unparsed-text-lines</code>
if the <code>$encoding</code> argument is not a valid encoding name,
if the processor does not support the specified encoding, if the string
representation of the retrieved resource contains octets that cannot be decoded
into Unicode <termref def="character">characters</termref> using the specified
encoding, or if the resulting characters are not permitted XML characters.</p>
encoding, or if the resulting characters are not
<termref def="dt-permitted-character">permitted characters</termref>.</p>
</error>
<error class="UT" code="1200" label="Cannot infer encoding of external resource."
type="dynamic">