Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Effect of atomization on array:index-of() #1096

Open
michaelhkay opened this issue Mar 18, 2024 · 12 comments
Open

Effect of atomization on array:index-of() #1096

michaelhkay opened this issue Mar 18, 2024 · 12 comments
Labels
Editorial Minor typos, wording clarifications, example fixes, etc. XQFO An issue related to Functions and Operators

Comments

@michaelhkay
Copy link
Contributor

What is the expected result of the expression:

array:index-of( [[1,2], [3,4]], [3,4] )

It seems that the second argument is atomised (because its declared type is atomic), but the first argument is not.

So both members of the array have count=1, whereas $search has count=2, so nothing matches, so the result is ().

Now, what if we write:

array:index-of( [[1,2], (3,4)], [3,4] )

This time it seems that the second member of the array matches, so the result is 2.

This doesn't feel right. One solution would be to say that each member of the array is itself atomised. But that seems to lead to other surprises with other examples of nested arrays.

An alternative would be to atomize neither argument (which would mean changing the function signature). But then we would need to use a different comparison operation.

We seem to be back where we started -- I was unhappy about introducing this function because of the difficulty of defining a good comparison operation for it to use.

@ChristianGruen ChristianGruen added XQFO An issue related to Functions and Operators Editorial Minor typos, wording clarifications, example fixes, etc. labels Mar 18, 2024
@ChristianGruen
Copy link
Contributor

I agree, this may be confusing.

On the other hand, considering that index-of([4, 3], [3]) returns 2, it would be safe to argue that fn:index-of is similarly confusing… As soon as atomization comes into play, people need to understand the concept.

My (and I think Dimitre’s) initial suggestion was to use fn:deep-equal and avoid atomization; it would still be my favorite solution.

@michaelhkay
Copy link
Contributor Author

Yes, I think using deep-equal would give fewer surprises. But then you might as well use array:index-where() and make it explicit.

@ChristianGruen
Copy link
Contributor

Yes, I think using deep-equal would give fewer surprises. But then you might as well use array:index-where() and make it explicit.

That’s certainly true. I would probably do it, but the idea was to offer a function for people who don’t use higher-order functions (which will still be the majority).

@michaelhkay
Copy link
Contributor Author

index-of([4, 3], [3]) returns 2

In what way is that confusing? I think the key thing with fn:index-of is that both arguments are atomized. With array:index-of the problem is that (a) only one of the arguments is atomized, while (b) we are using a comparator that only works on atomic values - or at any rate, sequences of atomic values.

Another solution might be to atomize the members of the supplied array, but not the array itself. Change the definition to

(1 to array:size($array))[
  let $member := data($array(.))
  count($member) = count($search) and
  every(for-each-pair(
    $member, $search, fn($a, $b) { index-of($a, $b, $collation) = 1) }
  ))
]

With this definition the following expressions will all return 2:

array:index-of( [[1,2], [3,4]],  [3,4] )
array:index-of( [[1,2], [3,4]],  (3,4) )
array:index-of( [[1,2], (3,4)],  [3,4] )
array:index-of( [[1,2], (3,4)],  (3,4) )

@michaelhkay
Copy link
Contributor Author

michaelhkay commented Mar 19, 2024

Note also, the rather cumbersome

count($member) = count($search) and
  every(for-each-pair(
    $member, $search, fn($a, $b) { index-of($a, $b, $collation) = 1) }
  ))

could be replaced by

equals-sequence($member, $search, fn($a, $b) { index-of($a, $b, $collation) = 1) }

if we had such a thing. We have starts-with-subsequence(), contains-subsequence(), and ends-with-subsequence(), but we don't have equals-sequence().

I proposed these three functions in PR #222 in response to issue #94 and issue #96, and the only reason I didn't propose the equals-sequence() variant was difficulty in choosing a name. Clearly matching a sequence in its entirety is at least as common as the other 3 operations.

The requirement for sequence-equals() and array-equals() is described in open issue #99.

Or perhaps we should instead have a more powerful compare-sequences() returning 0, -1, +1 taking as argument a comparator function that returns 0, -1, or +1.

@ChristianGruen
Copy link
Contributor

index-of([4, 3], [3]) returns 2
In what way is that confusing?

In a way that atomization of arrays (although powerful) is still largely unknown to users (at least the ones I have come across). But from a technical point of view it’s consistent.

Another solution might be to atomize the members of the supplied array, but not the array itself […]

Looks good (exists(index-of($a, $b, $collation)) and index-of($a, $b, $collation) = 1 are equivalent in our case).

@ChristianGruen
Copy link
Contributor

the only reason I didn't propose the equals-sequence() variant was difficulty in choosing a name.

Maybe just fn:equal (as it could also be used to compare single items), or fn:equal-where to indicate that a predicate function is expected (unless we define a default comparison function).

@michaelhkay
Copy link
Contributor Author

michaelhkay commented Mar 19, 2024

In PR #1100 I have proposed a new fn:equal() function, and I propose we redefine array:index-of to use it. Specifically

declare function array:index-of($array as array(*), $search as item()*) as xs:integer* {
  $array => array:for-each(fn($member, $pos){if (equal($member, $search)) {$pos}})) => array:values()
}

@ChristianGruen
Copy link
Contributor

declare function array:index-of($array as array(*), $search as item()*) as xs:integer* {
  $array => array:for-each(fn($member, $pos){if (equal($member, $search)) {$pos}})) => array:values()
}

Sounds good. Following your suggestion to atomize the members of the supplied array, it should probably be if (equal(data($member), $search)) { $pos }. Next, we shouldn’t forget the $collation argument.

@michaelhkay
Copy link
Contributor Author

Well, I was proposing here to NOT atomize, because I think that's clearer when dealing with arrays. I was also proposing to use atomic-equal() by default, which is context free and therefore doesn't use a collation; if they want a collation, they can supply a callback that uses one. But we could change it to if (equal($member, $search, $some-other-function)) {$pos}} if preferred.

@ChristianGruen
Copy link
Contributor

Makes sense. Regarding collations, I’m not really fond of them, I just noticed that the signature would differ from fn:index-of. Anyway, the simpler, the better.

@dnovatchev
Copy link
Contributor

Or perhaps we should instead have a more powerful compare-sequences() returning 0, -1, +1 taking as argument a comparator function that returns 0, -1, or +1.

Yes, definitely.

and also:

array:compare()

Recently I needed a function to compare two arrays for "less than" and couldn't find one. I ended up using array:sort() passing to it an array with its two members the arrays I wanted to compare. Works, but has too-many levels of indirection, that are absolutely unnecessary.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Editorial Minor typos, wording clarifications, example fixes, etc. XQFO An issue related to Functions and Operators
Projects
None yet
Development

No branches or pull requests

3 participants