Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

String comparison in deep-equal #934

Closed
michaelhkay opened this issue Jan 8, 2024 · 6 comments · Fixed by #1191
Closed

String comparison in deep-equal #934

michaelhkay opened this issue Jan 8, 2024 · 6 comments · Fixed by #1191
Assignees
Labels
Editorial Minor typos, wording clarifications, example fixes, etc. PR Pending A PR has been raised to resolve this issue Propose for V4.0 The WG should consider this item critical to 4.0 XQFO An issue related to Functions and Operators

Comments

@michaelhkay
Copy link
Contributor

The code showing how strings should be compared in deep-equal has gone awry, it doesn't match the prose. In equal-strings(), the lines

let $n1 := if ($options?whitespace = "normalize"))
             then normalize-unicode(?, $options?normalization-form) 
             else identity#1
  let $n2 := if ($options?normalize-space)
             then normalize-space#1 
             else identity#1      

should read:

let $n1 := if ($options?whitespace = 'normalize')
             then normalize-space#1 
             else identity#1    
let $n2 := if ($options?normalization-form))
             then normalize-unicode(?, $options?normalization-form) 
             else identity#1
  

Actually, the whole thing can now be expressed more concisely using fn:chain:

declare function equal-strings(
  $string1   as xs:string,
  $string2   as xs:string, 
  $collation as xs:string,
  $options   as map(*)
) as xs:boolean {
  let $norm := fn:chain(?,
                        (normalize-unicode(?, $options?normalization-form)[$options?whitespace = "normalize"],
                         normalize-space#1[$options?normalize-space]))          
  return compare($norm($string1), $norm($string2), $collation) eq 0    
}
@michaelhkay
Copy link
Contributor Author

Related: we say

Whitespace-only text nodes are discarded if both the following conditions are true:

  1. Either:
    1.1 The option whitespace is set to strip or normalize; or
    1.2 $parent is a schema-validated element node whose type annotation is a complex type with an element-only or empty content model.
  2. The text node is not within the scope of an element that has the attribute xml:space="preserve".

I need to check this, but I think that whitespace text nodes will be removed from elements with element-only content model during the schema validation process even if xml:space="preserve" has been set.

@michaelhkay
Copy link
Contributor Author

michaelhkay commented Jan 9, 2024

Also closely related: the current rules for string comparison under whitespace normalization are non-transitive.

This is because whitespace normalization is not applied to anyURI values.

Currently, if the whitespace='normalize' option is set, then

  • string(' a ') equals string('a')
  • string('a') equals anyURI('a')
  • string(' a ') does not equal anyURI('a')

This makes it very hard to implement unordered comparison (option ordered="false", or the implicit unordered comparison for attribute nodes) because it's not enough to find a matching pair of values from the two sequences and then compare the items that remain; you have to consider the possibility that there might be a different pairing of the two sequences that works.

This is the only case of non-transitivity that I have found, but there may be others. This one is easily fixed by saying that whitespace normalization should be applied when comparing strings and URIs.

(Of course, we depend on the collation being transitive. Perhaps we should state this as an explicit assumption/dependency. I believe that the UCA is designed to be transitive).

@ChristianGruen ChristianGruen added XQFO An issue related to Functions and Operators Editorial Minor typos, wording clarifications, example fixes, etc. labels Jan 9, 2024
@michaelhkay
Copy link
Contributor Author

michaelhkay commented Jan 9, 2024

The whitespace=normalize option is not working as expected for attribute nodes. This is because by default, the typed-values option is set to true, and the whitespace option does not apply when comparing typed values (as distinct from string values). I think there would be fewer surprises if the rule for attribute nodes were changed from:

F. Either the option typed-value is false, or the typed value of $i1 is deep-equal to the typed value of $i2.
G. Either the option typed-value is true, or the equal-strings function returns true when applied to the string value of $i1 and the string value of $i2.

to:

F. Let T be true if the typed-value option is true and both attributes have a type annotation other than xs:untypedAtomic. Then either T is true and the typed value of $i1 is deep-equal to the typed value of $i2, or T is false and the equal-strings function returns true when applied to the string value of $i1 and the string value of $i2.

This parallels the rule for element nodes, where typed-value comparison takes place only if the actual nodes to be compared are typed.

@ChristianGruen
Copy link
Contributor

@michaelhkay The test suite contains fn:deep-equal tests with the obsolete text-boundaries option. Do you plan to rewrite these tests to use the whitespace option, or should they be dropped?

@ChristianGruen
Copy link
Contributor

ChristianGruen commented Jan 23, 2024

Two more questions. The current rules in the spec are as follows:

  1. Comment nodes are discarded if the option comments is false.
  2. Processing instruction nodes are discarded if the option processing-instructions is false.
  3. Adjacent text nodes are merged.
  4. Whitespace-only text nodes are discarded if both the following conditions are true […]

I assume that with new rules, K2-SeqDeepEqualFunc-20 and K2-SeqDeepEqualFunc-22 should be rewritten to return true?

Next, is it correct that the following query is expected to return true, because Step 4 (the check for whitespace-only nodes) is defined after Step 3 (the merge of adjacent text nodes)?

deep-equal(
  <x>A<?_?>{ ' ' }<?_?>C</x>,
  <x>A C</x>,
  options := map { 'whitespace': 'strip' }
)

@ChristianGruen
Copy link
Contributor

…and one more (sorry): Isn’t deep-equal-40-whitespace-010 supposed to return true?

@ChristianGruen ChristianGruen added the Propose for V4.0 The WG should consider this item critical to 4.0 label Mar 7, 2024
ChristianGruen added a commit to qt4cg/qt4tests that referenced this issue Mar 7, 2024
@michaelhkay michaelhkay added the PR Pending A PR has been raised to resolve this issue label May 6, 2024
@ndw ndw closed this as completed in #1191 May 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Editorial Minor typos, wording clarifications, example fixes, etc. PR Pending A PR has been raised to resolve this issue Propose for V4.0 The WG should consider this item critical to 4.0 XQFO An issue related to Functions and Operators
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants