
[XPath] Functions symmetric to head() and tail() for sequences and arrays #97

Closed
dnovatchev opened this issue Nov 22, 2021 · 57 comments
Labels
Feature A change that introduces a new feature XQFO An issue related to Functions and Operators


@dnovatchev
Contributor

dnovatchev commented Nov 22, 2021

In XPath 3.1 we already have head(), tail(), and last().

But there is no function that produces the subsequence of all items of a sequence except the last one. Other programming languages have such a function; in Haskell, for example, it is the init function.

And the last() function isn't the symmetric counterpart of head(): it doesn't give us the last item in a sequence, just its position. So we need another function, fn:heel(), for this.

fn:init($sequence as item()*) as item()*

fn:heel($sequence as item()*) as item()?

init($seq) is a convenient shorthand for subsequence($seq, 1, count($seq) - 1)

heel($seq) is a convenient shorthand for slice($seq, -1)

Examples:

fn:init(('a', 'b', 'c')) returns 'a', 'b'

fn:init(('a', 'b')) returns 'a'

fn:init('a') returns ()

fn:init(()) returns ()

fn:heel(('a', 'b', 'c')) returns 'c'

('a', 'b', 'c') => init() => heel() returns 'b'
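To make the sequence semantics above concrete, here is a minimal Python sketch (not XPath). It assumes an XPath sequence can be modelled as a flat tuple; fn_init and fn_heel are hypothetical names standing in for the proposed fn:init and fn:heel, and the empty-sequence behaviour follows the examples above rather than any published spec.

```python
# Model: an XPath sequence is a flat Python tuple; names are hypothetical.


def fn_init(seq: tuple) -> tuple:
    """All items except the last, like subsequence($seq, 1, count($seq) - 1)."""
    return seq[:-1]  # () and one-item sequences both give ()


def fn_heel(seq: tuple) -> tuple:
    """The last item as a zero-or-one-item sequence (item()?)."""
    return seq[-1:]  # like slice($seq, -1); () gives ()


print(fn_init(("a", "b", "c")))           # ('a', 'b')
print(fn_init(("a",)))                    # ()
print(fn_init(()))                        # ()
print(fn_heel(("a", "b", "c")))           # ('c',)
print(fn_heel(fn_init(("a", "b", "c"))))  # ('b',)
```

Because Python slicing, like subsequence(), never fails on an out-of-range bound, both functions return () for the empty sequence, in the same way fn:head does.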

It makes sense to have fn:init() and fn:heel() defined on arrays, too.

array:init($array as array(*)) as array(*)

array:heel($array as array(*)) as item()*

Examples:

array:init([1, 2, 3, 4, 5]) returns [1, 2, 3, 4]

array:init([1]) returns []

array:heel([1, 2, 3, (4, 5)]) returns (4, 5)

array:heel([()]) returns () (the empty sequence)

array:init([]) produces error

array:heel([]) produces error

[1, 2, 3, (4, 5)] => array:heel() => heel() returns 5

I would challenge anyone to rewrite the last example in an understandable way using fn:slice() 💯
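The array variants differ from the sequence ones mainly in error handling. A minimal Python sketch, assuming an XPath array is modelled as a list of members where each member is a tuple (so one member can hold zero or several items); ArrayIndexError is a hypothetical stand-in for err:FOAY0001, the error array:head raises on an empty array in XPath 3.1.

```python
# Model: an XPath array is a list of members, each member a tuple (sequence).


class ArrayIndexError(Exception):
    """Hypothetical stand-in for err:FOAY0001 (array index out of bounds)."""


def array_init(array: list) -> list:
    """All members except the last; an empty array is an error."""
    if not array:
        raise ArrayIndexError("array:init called on an empty array")
    return array[:-1]


def array_heel(array: list) -> tuple:
    """The last member, which may be any sequence (item()*)."""
    if not array:
        raise ArrayIndexError("array:heel called on an empty array")
    return array[-1]


def fn_heel(seq: tuple) -> tuple:
    """Last item of a sequence, or () if it is empty."""
    return seq[-1:]


arr = [(1,), (2,), (3,), (4, 5)]                   # [1, 2, 3, (4, 5)]
print(array_init([(1,), (2,), (3,), (4,), (5,)]))  # [(1,), (2,), (3,), (4,)]
print(array_heel(arr))                             # (4, 5)
print(fn_heel(array_heel(arr)))                    # (5,)
print(array_heel([()]))                            # ()
```

Chaining array_heel and fn_heel mirrors `[1, 2, 3, (4, 5)] => array:heel() => heel()`: the first call yields the whole last member, the second its last item.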

@ChristianGruen
Contributor

+1. I proposed this function a while ago (I cannot find out where, though, apart from some remarks on Slack), and I remember that @michaelhkay preferred to name it fn:truncate.

@ChristianGruen
Contributor

PS: Analogous to fn:head and other sequence functions, fn:init(()) should return ().

@ChristianGruen
Contributor

ChristianGruen commented Nov 22, 2021

And PPS (sorry, I’ll be more disciplined next time again) fn:truncate still seems to be mentioned in an example in the current draft of the spec: https://qt4cg.org/branch/master/xpath-functions-40/Overview-diff.html#func-range-to.

I agree with Dimitre, and I’d love to see fn:init (or fn:truncate) re-added, as well as e.g. fn:foot to return the last item.

@dnovatchev
Contributor Author

dnovatchev commented Nov 22, 2021

I agree completely with Christian, and indeed, we need a fn:foot, because the existing fn:last is not the counterpart of fn:head, unlike last in other languages such as Haskell.

@dnovatchev
Contributor Author

> I agree completely with Christian, and indeed, we need a fn:foot, because the existing fn:last is not the counterpart of fn:head, unlike last in other languages such as Haskell.

But the name foot is... E w e . . .

My preferred names are: final, rear or ending.

@ChristianGruen
Contributor

Related (quoting Michael Kay, https://app.slack.com/client/T011VK9115Z/C011NLXE4DU/thread/C011NLXE4DU-1605590306.224400):

> I proposed foot($s) to get the last item in a sequence, but I'm now leaning towards slice($s, -1) as that packs a lot more power into one function. The second argument can be any sequence of integers, with negative integers counting from the end, so for example slice($s, -$n to -1) gives you the last $n.
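A rough Python model of the slice($s, $positions) idea as quoted, where positions are 1-based and negative ones count back from the end. The name slice_positions, the skipping of out-of-range positions (including 0), and returning items in the order the positions are given are all assumptions made for this sketch; an fn:slice that ends up in a spec may well be defined differently.

```python
# Hypothetical model of slice($s, $positions) as described above.
# Assumptions: out-of-range positions (and 0) are skipped, and items come
# back in the order in which the positions are supplied.


def slice_positions(seq: tuple, positions) -> tuple:
    n = len(seq)
    out = []
    for p in positions:
        index = p if p > 0 else n + 1 + p  # maps -1 to n, -2 to n - 1, ...
        if 1 <= index <= n:
            out.append(seq[index - 1])
    return tuple(out)


def last_n(seq: tuple, n: int) -> tuple:
    # slice($s, -$n to -1); the XPath range -$n to -1 is inclusive,
    # which is what range(-n, 0) produces in Python.
    return slice_positions(seq, range(-n, 0))


s = ("a", "b", "c", "d")
print(slice_positions(s, [-1]))   # ('d',)
print(last_n(s, 2))               # ('c', 'd')
print(slice_positions((), [-1]))  # ()
```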

@dnovatchev
Contributor Author

dnovatchev commented Nov 23, 2021

> Related (quoting Michael Kay, https://app.slack.com/client/T011VK9115Z/C011NLXE4DU/thread/C011NLXE4DU-1605590306.224400):
>
> I proposed foot($s) to get the last item in a sequence, but I'm now leaning towards slice($s, -1) as that packs a lot more power into one function. The second argument can be any sequence of integers, with negative integers counting from the end, so for example slice($s, -$n to -1) gives you the last $n.

. . .
Still, what matters is convenience, brevity and being easy to understand. Thus it would be good to have a function with intuitive name for that.

fn:rear($sequence as item()*) as item()

Is a shorthand for: slice($sequence, -1)

Examples:

('a', 'b', 'c') => rear() produces 'c'

It makes sense to have fn:init() and fn:rear() defined on arrays, too.

array:init($array as array(*)) as array(*)

array:rear($array as array(*)) as item()*

Examples:

array:init([1, 2, 3, 4, 5]) returns [1, 2, 3, 4]

array:rear([1, 2, 3, (4, 5)]) returns (4, 5)

[1, 2, 3, (4, 5)] => array:rear() => rear() returns 5

I would challenge anyone to rewrite the last example in an understandable way using fn:slice() 💯

@ChristianGruen
Contributor

My impression is that fn:rear and fn:butt are too similar ;) Maybe it should be up to native speakers to make the decision. I’d also prefer a 4-letter-term.

I agree that the syntax of fn:slice is not that intuitive (but it's definitely powerful).

@dnovatchev
Contributor Author

> My impression is that fn:rear and fn:butt are too similar ;)

:) A little humor isn't bad

> Maybe it should be up to native speakers to make the decision. I’d also prefer a 4-letter-term.
>
> I agree that the syntax of fn:slice is not that intuitive (but it's definitely powerful).

Yes, and I updated the previous comment with examples of rear() on arrays that are easy to understand but would be difficult to assimilate if slice() were used.

@dnovatchev
Contributor Author

I updated the issue to include definitions for fn:rear() and the array equivalents, array:init() and array:rear().

@dnovatchev dnovatchev changed the title [XPath] Function for the subsequence of all items except the last one [XPath] Functions symmetric to head() and tail() for sequences and arrays Nov 23, 2021
@adamretter

Is fn:rear($seq) just a single function for fn:head(fn:reverse($seq))?

@martin-honnen

I understand the proposals, but the statement that fn:init(()) produces an error astonishes me; that (an error) wouldn't happen with subsequence, or would it?

@ChristianGruen
Contributor

True. It's only the array functions that should trigger errors (and actually I never understood why array operations and functions were designed to raise range errors).

@michaelhkay
Contributor

I'm afraid there's always been tension between the error and no-error philosophies. Personally I tend to the "raise an error" camp - errors are easier to debug than wrong answers, and with XPath the hardest thing of all to debug is the expression that returns an empty sequence and you can't work out why. But the worst problem when you design by committee is that it's very hard to achieve a consistent policy on such things.

@ChristianGruen
Contributor

My experience is mostly that it's confusing that sequence and array lookups behave differently. Maps are (fortunately?) similar to sequences: You won’t get errors if you request the value of a non-existing key. As a consequence, it's often easier to refactor existing sequence-based code into map-based code, although arrays would be the more natural choice.

But, yes, it's difficult to evolve a language that has been very lax in the beginning, and I’m glad that the semantics of XQuery are stricter than the ones of XPath 1.0.
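The inconsistency described above can be shown side by side in a small Python model: sequence positions and map keys that don't exist quietly give the empty sequence, while an out-of-range array position is an error. The function names are hypothetical, and ArrayIndexError stands in for err:FOAY0001, which array:get raises in XPath 3.1 for a position outside 1 to array:size($array).

```python
# Model: sequences are tuples, arrays are lists of member tuples,
# maps are dicts whose values are tuples. Names are hypothetical.


class ArrayIndexError(Exception):
    """Stand-in for err:FOAY0001."""


def seq_item(seq: tuple, pos: int) -> tuple:
    """$seq[$pos]: an out-of-range position quietly gives ()."""
    return seq[pos - 1:pos] if pos >= 1 else ()


def array_get(array: list, pos: int) -> tuple:
    """array:get($array, $pos): an out-of-range position is an error."""
    if not 1 <= pos <= len(array):
        raise ArrayIndexError(f"position {pos} is out of bounds")
    return array[pos - 1]


def map_get(m: dict, key) -> tuple:
    """map:get($map, $key): a missing key quietly gives ()."""
    return m.get(key, ())


print(seq_item(("a", "b"), 5))    # ()
print(map_get({"x": (1,)}, "y"))  # ()
try:
    array_get([("a",), ("b",)], 5)
except ArrayIndexError as e:
    print("error:", e)            # error: position 5 is out of bounds
```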

@ChristianGruen
Contributor

> Is fn:rear($seq) just a single function for fn:head(fn:reverse($seq))?

@adamretter Exactly

@joewiz

joewiz commented Nov 23, 2021

As an alternative to the name fn:foot or fn:rear, how about fn:toe? This matches the common phrase, "head to toe," https://en.wiktionary.org/wiki/head_to_toe, and the 1987 Lisa Lisa and Cult Jam classic.

@ChristianGruen
Contributor

> This matches the common phrase, "head to toe,"

Interesting! In German, it’s »von Kopf bis Fuß« (from head to foot).

@dnovatchev
Contributor Author

> This matches the common phrase, "head to toe,"

I thought about "toe". But one has many toes, and just one head. Even for "foot", one has two feet, but just one head.

Certainly, if there is such a saying, people may naturally accept "toe" as a function name. This is for the native English speakers to decide.

@michaelhkay
Contributor

But then, we have to rename the parent axis, because as well as having two feet, we also have two parents.

@gimsieke
Contributor

head and tail obviously derive from horizontal animals, not from vertical humans, so the last item in a sequence must be something like the tail tip or a cow’s switch. Switch according to Merriam-Webster: “a tuft of long hairs at the end of the tail of an animal (such as a cow).” But the term “switch” is already taken, and it doesn’t meet Christian’s four-letter requirement. “Tuft” meets this requirement, but no-one associates “tuft” with “last item in a sequence.” Naming things…

@dnovatchev
Contributor Author

[Image with a list of candidate terms; not preserved]

@gimsieke
Contributor

Are you advocating in favor of any of the terms presented in the image, @dnovatchev?

It just came to me that even though it doesn’t meet Christian’s 4-letter criterion, end() is a function name that is not taken yet by anything else in our “namespace.”

And a string-length() of 3 is only off by one, an excusable error that may happen to any of us.

So end() is what I’m cautiously proposing if we really need to have a function that gives the last item of a sequence. (I don’t think we need such a function urgently though.)

@dnovatchev
Contributor Author

> Are you advocating in favor of any of the terms presented in the image, @dnovatchev?

@gimsieke, Gerrit,
I am not a native speaker of English, but final and furthest seem best to me.

@gimsieke
Contributor

Ok, final() is also off by one, so it’s a valid contender (say I in German word order as another non-native English speaker)

@dnovatchev
Copy link
Contributor Author

@gimsieke
If "foreign" words were allowed, the French a_priori and dernier also seem fine 😊

@gimsieke
Contributor

In German, we can make the distinction by
letzte() (≘ tail()),
letztes() (≘ last item of a sequence),
allerletztes() (≘ really last item of a sequence),
allerallerletztes() (≘ really really last item of a sequence),
vorletztes() (≘ penultimate item of a sequence),
übernächstes() (≘ next after next item of a sequence, or rather, as an übernächst:: axis),
etc.

@dnovatchev
Contributor Author

> > This matches the common phrase, "head to toe,"
>
> I thought about "toe". But one has many toes, and just one head. Even for "foot", one has two feet, but just one head.
>
> Certainly, if there is such a saying, people may naturally accept "toe" as a function name. This is for the native English speakers to decide.

I think I found the definitive correct name for this function.

The heel is the rearmost part of the foot, which is the farthest from the head.

Thus:

fn:heel($sequence as item()*) as item()

So, head() and heel(), that's it! Both start with an H, which makes them easier to remember.

@martin-honnen

But it should be fn:heel($sequence as item()*) as item()?, to indicate that a single item, or the empty sequence (if the argument is the empty sequence), is returned.

@ChristianGruen
Contributor

Note that array:heel must raise an error if we want to be consistent (see my comment further above).

@martin-honnen

I am not sure that having fn:head or fn:heel raise an error when the argument is empty is the right thing. First of all, XPath itself has no try/catch mechanism; we only have that in host languages such as XQuery or XSLT.

And it would break the way people write recursion using head/tail. For instance, if head raised an error for the empty sequence, the examples at https://www.w3.org/TR/xpath-functions/#highest-lowest, which use e.g. fn:fold-left(fn:tail($seq), fn:head($seq), function($highestSoFar as item()*, $this as item()*) as item()* { ..., would break, it seems.

That is just one example but it even appears in the (admittedly explanatory/optional) part of the functions and operators spec.
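A simplified Python model of that head/tail fold pattern, assuming functools.reduce plays the role of fn:fold-left. Unlike the spec's example, highest here returns only a single largest item and takes no key function. Because head(()) and tail(()) both return (), an empty input flows through without any special case.

```python
from functools import reduce

# Model: sequences are tuples; reduce() plays the role of fn:fold-left.


def head(seq: tuple) -> tuple:
    return seq[:1]  # fn:head: () for an empty sequence


def tail(seq: tuple) -> tuple:
    return seq[1:]  # fn:tail: () for an empty sequence


def highest(seq: tuple) -> tuple:
    """Simplified: the single largest item, or () for an empty input."""
    # fold-left(tail($seq), head($seq), function($highestSoFar, $this) {...})
    return reduce(
        lambda so_far, this: (this,) if this > so_far[0] else so_far,
        tail(seq),
        head(seq),
    )


print(highest((3, 7, 5)))  # (7,)
print(highest(()))         # ()
```

If head raised an error for (), highest(()) would fail, and every caller would need a separate emptiness check first.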

@michaelhkay
Contributor

Consistency should be the guiding rule here. The function should be symmetric with fn:head(). No errors. It also follows the principle in many standard functions that if the argument is an empty sequence, the result is an empty sequence.

There's another principle here, which is that a function should be designed to work over the largest domain for which it is meaningful; errors should be reserved for cases where there's no conceivable use case for supplying a particular argument value. It would have made sense for subsequence($x, 3.14159) to be an error because there's no way anyone would deliberately do that to achieve the specified effect. But having head($x) (or tail($x)) return empty when $x is empty is useful and reasonable: it makes head-tail recursive functions easier to express.

@dnovatchev
Contributor Author

> Note that array:heel must raise an error if we want to be consistent (see my comment further above).

Thanks @ChristianGruen.
Updated the signature of array:heel() and provided examples in which both array:heel() and array:init() raise an error.

@dnovatchev
Contributor Author

dnovatchev commented Nov 26, 2021

> But having head($x) (or tail($x)) return empty when $x is empty is useful and reasonable: it makes head-tail recursive functions easier to express.

@michaelhkay, obviously the people (and editor 😊) who put array:head() and array:tail() in the XPath 3.1 F&O Specification thought differently?

@ChristianGruen
Contributor

One last remark: array:heel() may indeed return zero or several items. Both the return type and the error handling should be identical to array:head() (https://www.w3.org/TR/xpath-functions-31/#func-array-head).

@dnovatchev
Contributor Author

> One last remark: array:heel() may indeed return zero or several items. Both the return type and the error handling should be identical to array:head() (https://www.w3.org/TR/xpath-functions-31/#func-array-head).

Done!

Also provided examples for this case.

@dnovatchev dnovatchev added XQFO An issue related to Functions and Operators Feature A change that introduces a new feature labels Sep 14, 2022
@ChristianGruen
Contributor

In #80 (comment), some more use cases are given for fn:init and fn:foot/fn:heel.

@PieterLamers

I don't know if there is still discussion about whether to use foot or heel. In my personal experience, foot more generically refers to something at the bottom (cf. table foot, page footer, etc.), whereas heel refers to a human body part. So if I were to vote, I'd vote in favor of fn:foot and against fn:heel.
BTW some Dutch inspiration: we say "aan de voet van de berg" meaning the bottom end of the mountain.
The Dutch counterpart of from Head to Toe is Van top tot teen: From top to toe.

@michaelhkay
Contributor

foot(), I think, will be readily accepted as referring to the last item in a sequence. heel() is a joke.

For selecting all items in a sequence except the last, I would vote for truncate().

As regards the difference in error handling between arrays and sequences when a subscript is out of range, I think we have to live with it, and new functions should remain consistent. There are advantages and disadvantages to both designs, and it's a classic example of design-by-committee that different solutions have been adopted for the two cases. It's very difficult to change now for compatibility reasons.

@dnovatchev
Contributor Author

> For selecting all items in a sequence except the last, I would vote for truncate()

truncate() is a verb, but all other names are nouns.

So I would prefer either the function name already in use in Haskell, init(), or, if foot() is accepted, why not follow up with another joke: footless()?

@michaelhkay
Contributor

Yes, truncate() is a different part of speech, but we already have a heady mix of nouns, verbs, and adjectives for very similar operations: filter, subsequence, insert, remove, empty, exists. We even have "for-each" - I have no idea what part of speech that is. Until we had tail(), using remove($x, 1) was a common way of extracting all items except the first. So I don't think using a verb is a problem.

Looking at init() I would assume it referred to the initial item in a sequence; the word "initial" usually means the first, not everything except the last. Only a tiny minority of our users will be familiar with a Haskell function of the same name.

@ndw
Contributor

ndw commented Oct 6, 2022

I take your point about truncate() being a verb, but init() is just too generally applied to mean "initialize" and we've largely stayed away from abbreviations. I suppose you could persuade me to accept initial-items() or all-but-the-last(). Maybe.

@dnovatchev
Contributor Author

> I take your point about truncate() being a verb, but init() is just too generally applied to mean "initialize" and we've largely stayed away from abbreviations. I suppose you could persuade me to accept initial-items() or all-but-the-last(). Maybe.

That's an excellent idea Norm:

sans-last()

@dnovatchev
Contributor Author

> Yes, truncate() is a different part of speech, but we already have a heady mix of nouns, verbs, and adjectives for very similar operations: filter, subsequence, insert, remove, empty, exists. We even have "for-each" - I have no idea what part of speech that is. Until we had tail(), using remove($x, 1) was a common way of extracting all items except the first. So I don't think using a verb is a problem.
>
> Looking at init() I would assume it referred to the initial item in a sequence; the word "initial" usually means the first, not everything except the last. Only a tiny minority of our users will be familiar with a Haskell function of the same name.

Still, the other three names used here (head, tail, foot) are all non-verbs. Let us try to keep them all non-verbs.

Maybe truncated() ?

@ChristianGruen
Contributor

foot and truncate sound good and intuitive.

As far as I know, Haskell is the only language that uses init, but I never found out why they chose that name.

@dnovatchev
Contributor Author

> foot and truncate sound good and intuitive.
>
> As far as I know, Haskell is the only language that uses init, but I never found out why they chose that name.

I am against truncate(). It is not intuitive: to me, truncate means "perform the action of truncating", not "get the thing that is truncated".

So, if this should have anything closer to "truncate", I definitely would prefer "truncated()" or "truncation()", or even "starting-segment()" or "left-segment()"

@ChristianGruen
Contributor

> I am against truncate(). It is not intuitive: to me, truncate means "perform the action of truncating", not "get the thing that is truncated".

Isn't that similar to fn:remove, and all other functions whose names are verbs?

If we wanted, we could check whether other programming languages provide an even better term.

@benibela

benibela commented Oct 6, 2022

> If we wanted, we could check whether other programming languages provide an even better term.

In another project, I settled on left/right with/of first/last to get subsequences.

left returns elements from the start, right returns elements up to the end. with is inclusive, of is exclusive. Since this yields a subsequence, its other end lies somewhere in the middle of the original sequence: first counts that middle end from the start, and last counts it from the end.

It would match the functions like this:


head($x)       ~~  left-with-first($x, 1)

tail($x)       ~~  right-of-first($x, 1)

init($x)       ~~  left-of-last($x, 1)

heel($x)       ~~  right-with-last($x, 1)
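One reading of that naming scheme, sketched in Python. The function names mirror the ones described, but this is an interpretation for illustration, not code from that project; sequences are modelled as tuples and n is assumed to be non-negative.

```python
# One interpretation of the left/right, with/of, first/last scheme.
# Sequences are tuples; n is assumed to be non-negative.


def left_with_first(x: tuple, n: int) -> tuple:
    return x[:n]                   # items 1..n, inclusive


def right_of_first(x: tuple, n: int) -> tuple:
    return x[n:]                   # everything after item n


def left_of_last(x: tuple, n: int) -> tuple:
    return x[:max(len(x) - n, 0)]  # everything before the last n items


def right_with_last(x: tuple, n: int) -> tuple:
    return x[max(len(x) - n, 0):]  # the last n items, inclusive


x = ("a", "b", "c")
print(left_with_first(x, 1))  # ('a',)      ~ head($x)
print(right_of_first(x, 1))   # ('b', 'c')  ~ tail($x)
print(left_of_last(x, 1))     # ('a', 'b')  ~ init($x)
print(right_with_last(x, 1))  # ('c',)      ~ heel($x)
```

The max(..., 0) guards keep the "last" variants from wrapping around when n exceeds the sequence length, so an empty input yields () everywhere, just as head and tail do.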

@michaelhkay
Contributor

If we didn't already have insert-before, remove, parse-xml, normalize-space, reverse, resolve-uri, sort, translate, replace, tokenize, then I could understand an objection to using an imperative verb. But it's too late for that.

And if imperative verbs are allowed, then there can be no objection to truncate because it expresses precisely what this function does.

@dnovatchev
Contributor Author

> If we didn't already have insert-before, remove, parse-xml, normalize-space, reverse, resolve-uri, sort, translate, replace, tokenize, then I could understand an objection to using an imperative verb. But it's too late for that.
>
> And if imperative verbs are allowed, then there can be no objection to truncate because it expresses precisely what this function does.

Disagree. In this specific group of functions, 3 of the 4 have noun names. So what is so difficult about keeping the naming of this group "clean" and finding a suitable 4th noun? Not to mention that we have quite a lot of native English speakers here ...

@ChristianGruen
Contributor

> So what is so difficult about keeping the naming of this group "clean" and finding a suitable 4th noun?

Remember that we started with init. I'm not sure if it’s justified to call it a word at all, let alone a noun.

If we didn't have tail, remove-first and remove-last would be other choices.

@michaelhkay
Contributor

I'm not a native English speaker, but I have acquired a reasonable grasp of the language over the last 65 years, and I'm not aware of any noun that intuitively describes the thing that's left over when you remove its far end.

You're thinking of a group of four functions as being in some way homogeneous. I don't share that perspective. I see truncate() more as a special case of remove(). I'd be quite happy to call it remove-last().
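The view of truncate as a special case of remove can be checked with a small Python model of fn:remove, which returns the sequence unchanged when the position is out of range. The Python names are hypothetical; truncate here corresponds to fn:remove($seq, count($seq)), just as tail corresponds to fn:remove($seq, 1).

```python
# Model: sequences are tuples; names are hypothetical.


def fn_remove(seq: tuple, position: int) -> tuple:
    """fn:remove: drop the item at a 1-based position, if there is one."""
    if 1 <= position <= len(seq):
        return seq[:position - 1] + seq[position:]
    return seq  # out-of-range position: sequence returned unchanged


def truncate(seq: tuple) -> tuple:
    """remove-last, i.e. fn:remove($seq, count($seq))."""
    return fn_remove(seq, len(seq))


print(truncate(("a", "b", "c")))      # ('a', 'b')
print(truncate(()))                   # ()
print(fn_remove(("a", "b", "c"), 1))  # ('b', 'c'), the same as tail()
```

For the empty sequence, count($seq) is 0, which is out of range, so truncate(()) returns () without needing a special case.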

@michaelhkay
Contributor

PR #250 has been raised.

@ndw
Contributor

ndw commented Dec 7, 2022

Closed by #250.

@ndw ndw closed this as completed Dec 7, 2022

10 participants