Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Add the missing functions for arrays: array:exists() and array:empty() #229

Closed
dnovatchev opened this issue Oct 31, 2022 · 21 comments
Labels
Feature A change that introduces a new feature XQFO An issue related to Functions and Operators

Comments

@dnovatchev
Copy link
Contributor

The functions over sequences: fn:exists and fn:empty are amongst the most useful and well-understood sequence functions.

In case we need to know whether or not an array has at least one member, or none, these function cannot help us:

image


To fill this gap, these two functions are proposed:

array:exists:

Signature

array:exists($arg as array(*)) as xs:boolean

Properties

This function is ·deterministic·, ·context-independent·, and ·focus-independent·.

Rules

If the value of $arg is a non-empty array, the function returns true(); otherwise, the function returns false().

Examples

The expression array:exists(array:remove(["hello"], 1) returns false().

The expression aray:exists(array:remove(["hello", "world"], 1) returns true().

The expression aray:exists([]) returns false().

The expression aray:exists([ () ]) returns true().

array:empty:

Signature

array:empty($arg as array(*)) as xs:boolean

Properties

This function is ·deterministic·, ·context-independent·, and ·focus-independent·.

Rules

If the value of $arg is the empty array, the function returns true(); otherwise, the function returns false().

Examples

The expression array:empty(array:remove(["hello"], 1) returns true().

The expression aray:empty(array:remove(["hello", "world"], 1) returns false().

The expression aray:empty([]) returns true().

The expression aray:empty([ () ]) returns false().

@dnovatchev dnovatchev added the XQFO An issue related to Functions and Operators label Nov 1, 2022
@michaelhkay
Copy link
Contributor

I worry a little about the name array:exists(), it feels too much like a test as to whether the array exists, not whether it has any members.

@ChristianGruen
Copy link
Contributor

I always felt fn:exists to be similarly confusing. On the other hand, if people know what it means, they should also understand how array:exists works (and, possibly, map:exists).

@ChristianGruen ChristianGruen added the Feature A change that introduces a new feature label Nov 1, 2022
@ndw
Copy link
Contributor

ndw commented Nov 1, 2022

It's unfortunate that we have an inconsistency between sequences, arrays, and maps: we use fn:count to find out how many elements are in a sequence but array:size and map:size for arrays and maps.

For sequences, there's no confusion about what "not existing" means, but I see Mike's point about "exists" functions on maps and arrays. I suppose as long as we don't allow the argument to array:size and map:size to be optional, they work analagously to fn:exists and people will either find it does exactly what they expect or they'll get used to it.

It does strike me as slightly disconcerting that if you wanted to know if an optional array (or map) was non-empty, you'd have to write:

exists($arr) and array:exists($arr)

But the alternative of allowing the argument to array:exists to be optional would be differently confusing. Presumably, if we allowed the argument to array:exists to be empty, we'd have to say that array:exists(()) returned () where array:exists(array{}) returns false().

On balance, I'm in favor. They work like fn:exists and trying to come up with new names would simply be another inconsistency between sequences and maps and arrays.

OTOH, I don't find if (array:size($arr) gt 0) ... that troubling to write.

@michaelhkay
Copy link
Contributor

Compatibility means deliberately repeating other people's mistakes.

Yes, I think fn:exists() was a mistake. It reads OK in a context like fn:exists(child::h1). But "array:exists" reads as "does the array exist?" not as "does the array have any members?"

@ndw
Copy link
Contributor

ndw commented Nov 1, 2022

I suppose fn:has-items could be proposed which takes an optional argument:

  1. If the argument is empty, return false()
  2. If the argument is a sequence of length greater than one, return true()
  3. If the argument is a single map, return map:size($arg) gt 0
  4. If the argument is a single array, return array:size($arg gt 0
  5. Return true()

Whether that's a good idea or not...

@Arithmeticus
Copy link
Contributor

+1 to latest suggestion by @ndw with some suggested alternative names for discussion (because I think array:exists is misleading):

  • fn:is-populated
  • fn:is-empty (obviously the inverse of the function)
  • fn:has-members

@dnovatchev
Copy link
Contributor Author

In LINQ (C#) any IEnumerable has the method

Any()

There is no equivalent for fn:empty. The code just checks for : if(!collection.Any()) , and this is perfectly understandable and readable.

Thus I would prefer something similar (in decreasing order of preference):

array:any, or array:has-any, or array:has-members

For implementation I would strongly recommend not to mention array:size at all, as knowing the size is not required for this functionality, and in fact computing the size could take O(N), while this function can and should be implemented in O(1).

@benibela
Copy link

benibela commented Nov 1, 2022

array:exists
The expression array:exists([ () ]) returns true().

Perhaps array:exists([ () ]) false would be better, if people want to know if there is actual data. Even array:exists([ [] ]) could be reasonable

For implementation I would strongly recommend not to mention array:size at all, as knowing the size is not required for this functionality, and in fact computing the size could take O(N), while this function can and should be implemented in O(1).

The implementation does not need to calculate the size to test array:size($arg gt 0, some optimizer could change it

@dnovatchev
Copy link
Contributor Author

array:exists
The expression array:exists([ () ]) returns true().

Perhaps array:exists([ () ]) false would be better, if people want to know if there is actual data. Even array:exists([ [] ]) could be reasonable

Definitely not. The array has one member. And this empty sequence is the actual data contained in that member. An empty sequence is in fact actual data. For example, the empty sequence can be the value passed as argument in a given function call.

For implementation I would strongly recommend not to mention array:size at all, as knowing the size is not required for this functionality, and in fact computing the size could take O(N), while this function can and should be implemented in O(1).

The implementation does not need to calculate the size to test array:size($arg gt 0, some optimizer could change it

Yes, and given that some of the comments above referred to array:size, we need to not have any mentions of array:size in the specification of these two functions.

@joewiz
Copy link

joewiz commented Nov 1, 2022

These would be useful functions!

I agree that array:exists() and map:exists() are potentially confusing. A single catch-all fn:has-items() also risks confusion, since the data types are so different - across sequences, arrays, and maps.

I prefer data-type specific terminology like array:has-members() since according to https://www.w3.org/TR/xquery-31/#dt-member:

Definition: The values of an array are called its members.

I'd also welcome a corresponding map:has-entries() (a convenience function for map:size($map) gt 0), since according to https://www.w3.org/TR/xquery-31/#dt-entry:

Definition: Each key / value pair in a map is called an entry.

And for the empty version, I think map:empty() and array:empty() wouldn't risk confusion (shorthands for map:size($map) eq 0 and array:size($array) eq 0).

@dnovatchev
Copy link
Contributor Author

dnovatchev commented Nov 1, 2022

@joewiz ,

These would be useful functions!

I agree that array:exists() and map:exists() are potentially confusing. A single catch-all fn:has-items() also risks confusion, since the data types are so different - across sequences, arrays, and maps.

I prefer data-type specific terminology like array:has-members() since according to https://www.w3.org/TR/xquery-31/#dt-member:

Definition: The values of an array are called its members.

👍

I'd also welcome a corresponding map:has-entries() (a convenience function for map:size($map) gt 0), since according to https://www.w3.org/TR/xquery-31/#dt-entry:

Definition: Each key / value pair in a map is called an entry.

👍

And for the empty version, I think map:empty() and array:empty() wouldn't risk confusion (shorthands for map:size($map) eq 0 and array:size($array) eq 0).

Let us not mention array:size and map:size because a for naive Xpath processor implementing this it will take O(N) time, while the proposed functions are O(1).

I think that this problem is eliminated using the following equivalent expressions, instead:

not(map:has-entries($m))

not(array:has-members($a))

@ChristianGruen
Copy link
Contributor

ChristianGruen commented Nov 2, 2022

I like array:has-members() and map:has-entries().

Most existing built-in functions have names without auxiliary verbs, indicating possession (has), or a copula (is), but we already have fn:has-children and we’ll soon have fn:is-NaN, and the general tendency for version 4.0 seems to be to choose a more self-explanatory wording.

I think we don’t additionally need array:empty and map:empty. If we believe we do, we could think about array:is-empty and map:is-empty, as map:empty might indicate that it returns an empty map (it reminds me of map:entry, which returns a map with a single entry).

@michaelhkay
Copy link
Contributor

Ignoring other questions, there's nothing wrong with the specification being defined in terms of a very inefficient implementation. We do that all the time. It's a specification of what result the function delivers, it's not guidance for the implementor on how to implement it efficiently.

@dnovatchev
Copy link
Contributor Author

dnovatchev commented Nov 2, 2022

Ignoring other questions, there's nothing wrong with the specification being defined in terms of a very inefficient implementation. We do that all the time. It's a specification of what result the function delivers, it's not guidance for the implementor on how to implement it efficiently.

@michaelhkay , Even so, maybe it is always a good idea putting a warning that literally having such implementation could be ridiculously inefficient, right?

@dnovatchev In general, I don't think implementors expect the specification writers to give them advice on how to implement the specification efficiently. We do so occasionally, e.g. in the Note at the end of XSLT 3.0 section 5.5.3 (see https://www.w3.org/TR/xslt-30/#pattern-semantics) but if we did this too often then (a) it would get very tedious, and (b) we would often get the advice wrong, since implementors are sometimes much smarter than we imagine.

@ChristianGruen
Copy link
Contributor

I agree with Michael: I see no need to add hints for implementors to the spec. There are numerous other language features that would be very inefficient if implemented without further thoughts (think of e.g. (1 to 100000000)[position() = 1]. Personally, I would never look up the specification to get aware of the obvious pitfalls. It's usually the complex stuff that's much harder to optimize.

@dnovatchev
Copy link
Contributor Author

dnovatchev commented Nov 2, 2022

I agree with Michael: I see no need to add hints for implementors to the spec. There are numerous other language features that would be very inefficient if implemented without further thoughts (think of e.g. (1 to 100000000)[position() = 1]. Personally, I would never look up the specification to get aware of the obvious pitfalls. It's usually the complex stuff that's much harder to optimize.

Yes, this is true, but we can at least put a short reminder in parenthesis, something like:

(note, this can be implemented with O(1) time complexity)

@dnovatchev I can envisage implementations of arrays in which determining whether the array is empty involves looking to see how many members are marked as deleted/removed. That will not be O(1). We might think that's not a good implementation, but it's not our call. The implementor might want to reduce the cost of remove() operations at the expense of increasing the cost of is-empty() - that's entirely up to them, it's not for us to judge. It's perfectly adequate to say that is-empty() (or whatever we call it) returns the same result as size()=0, and if the implementor wants to make one faster than the other, that's entirely their choice.

@dnovatchev
Copy link
Contributor Author

Yes, this is true, but we can at least put a short reminder in parenthesis, something like:

(note, this can be implemented with O(1) time complexity)

@dnovatchev I can envisage implementations of arrays in which determining whether the array is empty involves looking to see how many members are marked as deleted/removed. That will not be O(1). We might think that's not a good implementation, but it's not our call. The implementor might want to reduce the cost of remove() operations at the expense of increasing the cost of is-empty() - that's entirely up to them, it's not for us to judge. It's perfectly adequate to say that is-empty() (or whatever we call it) returns the same result as size()=0, and if the implementor wants to make one faster than the other, that's entirely their choice.

@michaelhkay,

If the specification says (as suggested in your comment:

is-empty() (or whatever we call it) returns the same result as size()=0,

It would be good also to add: "but this is not a suggestion to use size() in the actual implementation"

@ChristianGruen
Copy link
Contributor

It would be good also to add: "but this is not a suggestion to use size() in the actual implementation"

@dnovatchev All these comments could certainly be added, but to be honest, I don't understand who would benefit from such notes. We should find at least one implementor out there who believes this would be of help.

@dnovatchev
Copy link
Contributor Author

It would be good also to add: "but this is not a suggestion to use size() in the actual implementation"

@dnovatchev All these comments could certainly be added, but to be honest, I don't understand who would benefit from such notes. We should find at least one implementor out there who believes this would be of help.

@ChristianGruen We could have such statement just in a single, central section of the specification, just to be known that the provided "possible implementations code" is only for instructive, definitional purposes and should not be perceived as any recommendation to implement exactly this code/algorithm.

When distinguishing between "clear" and "useful", "clear" comes before "useful" (because if something is unclear, we cannot judge whether it is useful or not)

@ChristianGruen
Copy link
Contributor

When distinguishing between "clear" and "useful", "clear" comes before "useful"

So we should possibly find at least one implementor who believes that the specification gets more clear if hints of that kind are added.

@ChristianGruen
Copy link
Contributor

The initially proposed functions have been added to the spec (#250).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature A change that introduces a new feature XQFO An issue related to Functions and Operators
Projects
None yet
Development

No branches or pull requests

7 participants