Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fn:chain or fn:compose #882

Open
michaelhkay opened this issue Dec 6, 2023 · 36 comments
Open

fn:chain or fn:compose #882

michaelhkay opened this issue Dec 6, 2023 · 36 comments
Labels
Discussion A discussion on a general topic. PRG-easy Categorized as "easy" at the Prague f2f, 2024 PRG-optional Categorized as "optional for 4.0" at the Prague f2f, 2024 XQFO An issue related to Functions and Operators

Comments

@michaelhkay
Copy link
Contributor

I thought I had a great opportunity to use fn:chain the other day, and then found it didn't do what I wanted.

I wanted to negate a predicate: in pseudo-code

items-where($seq, not contains(?, "e"))

and I thought I could do this by chaining contains and not. But it doesn't work that way: fn:chain applies a sequence of functions to an argument, it doesn't compose a sequence of functions to yield a new function.

I wonder if a function that composes functions would be more useful, so I could write

items-where($seq, compose((contains(?, "e"), not#1)))

@MarkNicholls
Copy link

I'm not familiar with chain...for functional composition its more normal to use an operator.

items-where($seq, not#1 >> contains(?,"e"))

it also mirrors the use of '=>'

@michaelhkay
Copy link
Contributor Author

Indeed, there are a few varieties of arrow that we could assign (>> is not available), but the first thing is to establish the principle.

One of the problems in usability terms is the directionality. For functional programming aficionados, writing f(g(x)) as (f ~> g)(x) seems natural. But object-oriented languages write the pipeline from left to right x.g().f(), and the => arrow syntax reflects this, x => contains() => not().

@MarkNicholls
Copy link

tbh, OO languages rarely have function composition, your example x.g().f() isnt composition its application, in fact the notion of function composition is a bit odd in an OO world because composition is about composing behaviour without having the data, and obviously in OO you can't have behaviours without data to bind to.

If I had to do it in an OO language I would probably write some sort of extension method on Func<A,B> called Then and go

x.f.Then(y.g)

and you'd need x and y in advance, else something like

(x => x.f()).Then(y => y.g())

if you didnt.

as xslt and xpath arent really setup in an OO, this would seem dissonant to me.

We already have 'pipe' i.e. => it would seem odd to not have a compose operator => that doesn't mirror its usage.

So I like the suggestion, I'd just be careful of the syntax

@michaelhkay
Copy link
Contributor Author

I think the arrow operator has proved so popular for two reasons: firstly, it cuts out the nested parentheses, and secondly, you read it as a pipeline of operations to be performed from left to right, x => h => g => f, in contrast to f(g(h(x))) where you evaluate x first , then apply h, then g, then f.

The challenge with function composition is that one ordering seems natural and intuitive to one group of people, and the opposite ordering seems natural and intuitive to a different group.

@MarkNicholls
Copy link

MarkNicholls commented Dec 9, 2023

I agree, so the arrow operator is popular and a success (even with OOers?) but you think composition as an operator is problematic

even though
x => h => g => f
is equivalent to
x => (h >> g >> f)
?
This ordering is also quite prevalent in the OO community in the form of 'fluent' APIS, where it would probably be written.

x.Apply(h.Then(g).Then(f))
.

now we're just discussing whether OOers balk at an operator rather than a method name and a '.'.

I think you underestimate OOers (I say so as a former OO zealot)

@dnovatchev
Copy link
Contributor

dnovatchev commented Dec 9, 2023

I thought I had a great opportunity to use fn:chain the other day, and then found it didn't do what I wanted.

I wanted to negate a predicate: in pseudo-code

items-where($seq, not contains(?, "e"))

and I thought I could do this by chaining contains and not. But it doesn't work that way: fn:chain applies a sequence of functions to an argument, it doesn't compose a sequence of functions to yield a new function.

Actually, fn:chain can be used as function composition:

items-where($seq, chain(?, (contains(?, "e"), not#1 )) )

@michaelhkay
Copy link
Contributor Author

Actually, fn:chain can be used as function composition:

Thanks, good to know.

@ChristianGruen ChristianGruen added XQFO An issue related to Functions and Operators Discussion A discussion on a general topic. labels Dec 10, 2023
@MarkNicholls
Copy link

MarkNicholls commented Dec 13, 2023

I thought I had a great opportunity to use fn:chain the other day, and then found it didn't do what I wanted.
I wanted to negate a predicate: in pseudo-code
items-where($seq, not contains(?, "e"))
and I thought I could do this by chaining contains and not. But it doesn't work that way: fn:chain applies a sequence of functions to an argument, it doesn't compose a sequence of functions to yield a new function.

Actually, fn:chain can be used as function composition:

items-where($seq, chain(?, (contains(?, "e"), not#1 )) )

its pretty ugly though

much better would be (using ".>" as composition as ">>" is taken and the mathematicians "." is quite problematic)

items-where($seq, contains(?, "e") .> not#1 )

@dnovatchev
Copy link
Contributor

I thought I had a great opportunity to use fn:chain the other day, and then found it didn't do what I wanted.
I wanted to negate a predicate: in pseudo-code
items-where($seq, not contains(?, "e"))
and I thought I could do this by chaining contains and not. But it doesn't work that way: fn:chain applies a sequence of functions to an argument, it doesn't compose a sequence of functions to yield a new function.

Actually, fn:chain can be used as function composition:
items-where($seq, chain(?, (contains(?, "e"), not#1 )) )

its pretty ugly though

much better would be (using ".>" as composition as ">>" is taken and the mathematicians "." is quite problematic)

items-where($seq, contains(?, "e") .> not#1 )

I disagree. We have used all possible keyboard symbols for operators and now need more ...

Who is to remember all "magic strings" ? Almost no one, even their creators.

As for "beauty", beauty is in the eyes of the beholder. Enough with all ugly, long and beyond understanding lambda expressions!

What is not so very subjective as "beauty" is understandability.

And it has been more than 10 years since we started using "?" in the place of an argument to express partial application. No one has claimed this is "ugly" or not useful.

Trying to invent "magic strings" for well-understood concepts that we have had for years is contributing to the pollution of the language. Not to mention that such amount of effort could be directed towards something that could be really useful.

@MarkNicholls
Copy link

MarkNicholls commented Dec 13, 2023

I don't understand,

I LIKE "?" as shorthand for a lambda expression, i wasnt objecting to it.

It seems like a very useful and memorable "magic string".
I like "=>" it seems also useful and memorable, similarly, "+", "-", "="....all very sensible "magic strings".

what seems less understandable is expression composition in terms of a function that does composition AND application by effectively turning it into a lambda.

contains(?, "e") .> not#1
seems more understandable (if ".>" were part of language and as familiar as "=>"), than

chain(?,(contains(?, "e"), not#1))

`

@dnovatchev
Copy link
Contributor

contains(?, "e") .> not#1 seems more understandable (if ".>" were part of language and as familiar as "=>"), than

chain(?,(contains(?, "e"), not#1))

For me it is the opposite:

chain(?,(contains(?, "e"), not#1))

is a function that still needs its first argument, and when given this, it will perform on it the chain of all other specified functions.

Absolutely clear.

where with:

contains(?, "e") .> not#1

I need to think hard what .> was, whether it applies from right-to-left or from left=to-right, etc.

And why in one case (to the left) ? is used, but then inconsistently, .> is used?

And why should I need to learn and memorize yet even new operators, when we clearly can express this in the current language with consistent use of partial application:

chain(?,(contains(?, "e"), not#1))

@MarkNicholls
Copy link

MarkNicholls commented Dec 13, 2023

Its not obvious to me, and it wasn't obvious to the OP, thus the question, the whole topic is based on it not being obvious.

the right to left questions seems odd, the direction is indicated by ">" in exactly the same way as it does for "=>", they are very closely linked, and thus very similarly named.

I don't understand the question about "?" being inconsistent to ".>", the analogous expression using "=>" would be

x => contains(?, "e") => not#1
if there is no inconsistency in this expression, then there would seem to be no inconsistency in the one I presented.

The question is "fn:chain or fn:compose"

the main question to me isn't the syntax (operator vs function) rather than the definition.

if we choose chain, then we don't need to memorise the function compose.
if we choose compose then we don't need to memorise the function chain AND memorise the trick to convert it into the more primitive compose.

If you set up both alternatives and a compose expression and a chain expression and expressed it in both options, you find that compose is syntactically simpler (in terms of syntax tree size), but not the whole story. As a non XSLT/XPath developer (or at least a relatively ignorant one) I would expect compose to exist, because compose exists in lots of languages, because its a primitive operation on functions, and people are taught in in college on SE/CS courses.
I would not expect (or look for) chain to exist as a base function, I'm not aware of existing elsewhere (it may do).

I prefer the operator form because

a) its syntactically even simpler than "fn:compose()".
b) it directly mirrors the current "=>" operator.

x => f => g => h is equivalent to
x => (f .> g .> h)
I do note though that scala and kotlin do use an operator but both languages allow "names" for operators
so in scala it would read

f andThen g andThen h
kotlin would read

f then g then h

I'm sympathetic to the "no more magic strings" message, but that boat seems to have already sailed with "/", "!", "=>", so it would seem operators have to be "magic strings"

"then" to me is a perfectly acceptable operator, if that disolves part of the objection.

@dnovatchev
Copy link
Contributor

@MarkNicholls The history of fn:chain is that I initially proposed a function multiCompose as a generalization of function composition for more than 2 functions, that would, exactly as the mathematically-defined functional composition, apply the provided sequence of functions starting from the right-most - that is from right to left.

Many people wished the direction to be left to right, and this is how function chain was born.

As for whether or not it is more convenient to return a function or produce a result (item()*) there are plusses and minuses for both of these alternatives and neither one is clearly superior compared to the other.

People, who are not comfortable with functional programming, will want the result to be directly produced. Otherwise, they must append (some-initial-argument) to the right of the function call, thus they have to add one more pair of parentheses, and even worse, the initial argument will be farthest away from the first function (the leftmost one) that is to be applied on it. Clearly inconvenient, less-readable, and maybe even confusing.

We don't want to have too-many functions and specifically two functions that are almost the same. This is why I started with multiCompose (from right to left) but taking into account the preferences of the majority, this evolved into fn:chain that is left-to-right and in a most convenient and readable way has the initial (non-function) argument as its left-most one, where it belongs.

I think this is a complete explanation why the things are as they are.

I will not provide more answers, because all is said in this reply. If you try to convince the majority of people here that right to left evaluation is better and if you succeed, it is simply a mechanical task to turn fn:chain to whatever you are suggesting.

As I said, both alternatives are OK with me, but I think people are right to request what they consider the most convenient syntax and evaluation order, and at present fn:chain gives them what they wanted -- without needing a single new operator.

@dnovatchev
Copy link
Contributor

dnovatchev commented Dec 14, 2023

b) it directly mirrors the current "=>" operator.

x => f => g => h is equivalent to `x => (f .> g .> h)

x => f => g => h is invalid syntax in XPath. Try for example:

(1, 2, 3) => sum#1

We get an error: "Expecting '(' "

image

The syntactically valid expression requires using not just a function name, but a function call:

(1, 2, 3) => sum()

image

Thus you will need:

x => f(all_args_here_except_the_first) => g(all_args_here_except_the_first) => h(all_args_here_except_the_first)

Unfortunately, this is not too-elegant, is less readable, and could even be considered rather ugly.

@dnovatchev
Copy link
Contributor

dnovatchev commented Dec 14, 2023

f then g then h

I'm sympathetic to the "no more magic strings" message, but that boat seems to have already sailed with "/", "!", "=>", so it would seem operators have to be "magic strings"

"then" to me is a perfectly acceptable operator, if that disolves part of the objection.

then is already used in XPath as one of the alternatives of an if expression.

Also, in JavaScript/Typescript then is used to resolve a promise , that is in concurrent processing (similar to await a Task in C#). We may also decide to use then in a similar role if in the future we introduce concurrent processing in XPath.

Thus it could be confusing to use then in yet another, 3rd role.

@MarkNicholls
Copy link

so the real alternative for multicompose was

x => multicompose((f,g))()
or

multicompose((f,g))(x)
vs

chain(x,(f,g))

To me all both options are not ideal. I think I prefer multicompose, because it captures the primitive notion directly. The order I'm on the fence about, given a function syntax, where right to left application takes precedence, actually right to left seems more consistent, despite the fact I personally find left to right more intuitive. (i.e. (a->b) -> (b ->c) -> (a -> c))

I throw broad generalisations, psuedo code and suggestions around, ignorant of the nuances of the actual syntax.

You shouldnt take my psuedo code as too literally, its there to express a feeling rather than a concrete proposal. I accept at some stage the nuances/reality of the syntax can be the difference between something simple and obvious rather than irritating and awkward.

But these nuances seem to hamper multiCompose, but not an operator?

if 'then' is taken, 'andThen' seems a reasonable alternative.

f andThen g andThen h

(I'd personally be happy with a magic string with a '>' embedded in it somewhere too to mirror "=>").

@ChristianGruen
Copy link
Contributor

ChristianGruen commented Dec 14, 2023

In #517 (Dimitre’s original proposal for fn:multi-compose), @line-o, who wrote the xBow library, referenced the fn:compose function from John Snelson’s functional.xq library, which doesn’t process input, but merely combines functions.

I think that both fn:chain and fn:compose can be beneficial: fn:chain works fine when you have some input at hand. fn:compose looks more intuitive to me when you want to create a self-contained function item.

As the number of (even experienced) users who’ll be capable of deciphering filter($seq, compose((contains(?, "e"), not#1))), or even filter($seq, chain(?, (contains(?, "e"), not#1))), will be very limited, I would advise against introducing new operators at this stage. It could still be an option for the future.

Here’s a version that may be slightly better readable (just in case, for those who’ve never heard of function composition):

(: returns ('ab', 'cd') :)
let $input := ('ab', 'cd', 'ef')
return filter($input, compose( (contains(?, 'e'), not(?)) )

(: …or even… :)
let $contains-function := contains(?, 'e')
let $not-function := not(?)
return filter($seq, compose(($contains-function, $not-function)))

Similar to fn:concat, to reduce parentheses we could make it a variadic function:

compose($contains-function, $not-function)

@ChristianGruen
Copy link
Contributor

For what it’s worth, an equivalent implementation in XQuery would be:

declare function compose(
  $functions  as function(*)*
) as function(*) {
  fold-left($functions, identity#1, fn($composed, $function) {
    fn($arg) { $function($composed($arg)) }
  })
};

@MarkNicholls
Copy link

if compose/chain is a function, then I'm personally open minded about left to right or right to left, I can see the arguments for both,

I think this is pretty self explanatory though, and makes the right/left thing pretty obvious.

filter($seq, contains(?, "e") andThen not#1)
and I'll leave it at that

@ChristianGruen
Copy link
Contributor

ChristianGruen commented Dec 14, 2023

I think this is pretty self explanatory though, and makes the right/left thing pretty obvious.
filter($seq, contains(?, "e") andThen not#1) and I'll leave it at that

I agree, a keyword would make the question of order irrelevant. In XPath lingo it could be A and then B (analogous to cast as, contains text, etc.).

@dnovatchev
Copy link
Contributor

dnovatchev commented Dec 14, 2023

if compose/chain is a function, then I'm personally open minded about left to right or right to left, I can see the arguments for both,

I think this is pretty self explanatory though, and makes the right/left thing pretty obvious.

filter($seq, contains(?, "e") andThen not#1) and I'll leave it at that

This is simply not good.

nobody says contains(?, "e") andThen not#1

People form their expressions as not(someCondition)

Notice that the not is leftmost. This is how we, human beings think. Putting the not at the end is so confusing, almost German (apologies to @ChristianGruen 😄 )

So, dear friends, I don't believe we could get some breakthrough in this issue...

We could perhaps devise something that is the 180 degrees reflection of fn:chain, but this will also have its own shortcomings, as it is merely a reflection.

Let us use our energy for something more useful, like producing an abstract base for any Kollection 😄

@michaelhkay
Copy link
Contributor Author

In XPath lingo it could be A and then B

No, I think that gives an ambiguous parse. A and then (X) could parse as A and (then (X)) where then(X) is a function call. The difference from other two-word operators is that the first word is already in use as a binary operator.

@ChristianGruen
Copy link
Contributor

No, I think that gives an ambiguous parse. A and then (X) could parse as A and (then (X)) where then(X) is a function call. The difference from other two-word operators is that the first word is already in use as a binary operator.

True, thanks.

@michaelhkay
Copy link
Contributor Author

My comment generated a lot of discussion about whether it might be better to use an operator rather than a function.

I would like to return to the original issue that I raised, which is whether we should have chain($input, (F, G)) that returns F(G($input)), or chain((F, G)) that returns a function that can be applied to $input.

Dimitre has shown that you can derive the second from the first by using ? in the first argument position, but that feels artificial to me. In the use cases I have personally encountered, the second form is the one I have wanted: for example

index-where($input, chain((not#1, contains(?, 'e'))))

@dnovatchev
Copy link
Contributor

My comment generated a lot of discussion about whether it might be better to use an operator rather than a function.

I would like to return to the original issue that I raised, which is whether we should have chain($input, (F, G)) that returns F(G($input)), or chain((F, G)) that returns a function that can be applied to $input.

Dimitre has shown that you can derive the second from the first by using ? in the first argument position, but that feels artificial to me. In the use cases I have personally encountered, the second form is the one I have wanted: for example

index-where($input, chain((not#1, contains(?, 'e'))))

Nothing prevents us from having a second overload for fn:chain that accepts only a single argument - a sequence of functions:

fn:chain( $functions as function(*)* ) as function(*)

@michaelhkay
Copy link
Contributor Author

Nothing prevents us...

Nothing apart from precedent, convention, and consistency. We do not currently define any function families containing multiple functions with the same name and different arity having different result types.

@ChristianGruen
Copy link
Contributor

I would like to return to the original issue that I raised, which is whether we should have chain($input, (F, G)) that returns F(G($input)), or chain((F, G)) that returns a function that can be applied to $input.

I can see that there are use cases for fn:chain as it is currently defined.
My favorite would be to additionally have fn:compose (returning a function).

@dnovatchev
Copy link
Contributor

Nothing prevents us...

Nothing apart from precedent, convention, and consistency. We do not currently define any function families containing multiple functions with the same name and different arity having different result types.

Different result types? Really?

isn't function(*) an item()* ?

as for the different arity, we actually do have many functions that have the same name and different arity (not to mention fn:concat ... 😄 ) - this is the main reason why in a named function reference we must specify the arity - like fn:substring#2 and fn:substring#3 ... and virtually any function that has default value(s) for some of its argument(s)

@dnovatchev
Copy link
Contributor

dnovatchev commented Jan 15, 2024

I would like to return to the original issue that I raised, which is whether we should have chain($input, (F, G)) that returns F(G($input)), or chain((F, G)) that returns a function that can be applied to $input.

I can see that there are use cases for fn:chain as it is currently defined. My favorite would be to additionally have fn:compose (returning a function).

@ChristianGruen Yes, and if we remember well, this started as a proposal for a multi-compose function. But then people asked for left-to-right order of evaluation of the chain of functions and this is how we got fn:chain.

If we have functions with different order of (directions of) evaluation, my preference for their naming would be:

fn:chain and fn:chain-from-right

and not compose, because most people know that compose applies right-to left order of evaluation and having it evaluate from left-to-right would bring significant confusion.

Personally I am happy even with just having fn:chain as currently defined, and using fn:chain(?, funcList) whenever needed.

multicompose(funcList)(x) is perfectly readable and understandable, because x is at the right-most position and all the functions are applied on it starting from the right-most (closest to x) in a right-to-left order.

Contrary to this, fn:chain(funcList)(x) is deeply disturbing and unreadable, because in order for me to understand which function is applied on x I have to traverse from x to the leftmost function - and imagine if there are a dozen or more such functions in funcList ... . This is a heavy load for the brain and thus is easily confusing and error-prone.

So, let us not unnecessarily bloat the set of functions - at least we will not end up with a 5000 pages Spec.

@MarkNicholls
Copy link

MarkNicholls commented Jan 15, 2024

I agree with @michaelhkay, chain feels at least non primitive to me compared to compose.
As for the order of the composition, I think that depends on whether its an operator or not and how its named.
Yes, I agree with @dnovatchev, calling it 'compose' (or using the mathematical notation ".") but then not following the mathematical definition of

(b->c) -> (a -> b) -> (a -> c)

(apologies for the functional signature but I think it makes it clearest).

is confusing to people familiar with the textbook definition.

BUT, most implementations I see don't call it compose or "." and use the (and I think) more intuitive

(a -> b) -> (b -> c) -> (a -> c)

(I actually had to write the 1st signature twice because I got it wrong)

So...

  • in Scala and Kotlin its called "andThen" and has the 2nd signature.
  • In F# there is '>>' that follows the 2nd signature and '<<' that follows the first BUT, I can't ever remember seeing the 2nd one used in real code, .
  • F# devs use forward pipelining a lot '|>' which is very similar to the xpath '=>' and the symetry between the order is very natural, the same is true in xpath w.r.t. '=>'
  • In Haskell, yes...'.' is mostly used with 1st signature (though there are a few implementations in the libraries that use the 2nd signature because...i quote...

"(.>) Read as "compose forward" or "and then". Use this to create long chains of computation that suggest
which direction things move in."

i.e. its seems intuitive that

(f .> g .> h .> i) x

is f and then g and then h and then i applied to x.

  • so for me, composition, feels more primitive than chain (in a good way)
  • "forward" composition is more natural than "backwards" (i.e. as per kotlin/scala/F# rather than haskell)
  • an operator with an indication of direction is preferable to a named function
  • but a named function that makes the order explicitly is quite nice too e.g. "andThen" or "composeForward".

@ChristianGruen ChristianGruen added the Propose for V4.0 The WG should consider this item critical to 4.0 label Feb 7, 2024
@ChristianGruen ChristianGruen removed the Propose for V4.0 The WG should consider this item critical to 4.0 label May 22, 2024
@ndw ndw added PRG-easy Categorized as "easy" at the Prague f2f, 2024 PRG-optional Categorized as "optional for 4.0" at the Prague f2f, 2024 labels Jun 4, 2024
@ChristianGruen
Copy link
Contributor

One argument for a pure composition function is that it would be more general than what fn:chain currently does – and in principle, we should always try to provide general functionality in a spec.

@dnovatchev
Copy link
Contributor

One argument for a pure composition function is that it would be more general than what fn:chain currently does – and in principle, we should always try to provide general functionality in a spec.

As I said earlier, we can have both fn:chain and fn:multi-compose - and I initially proposed the latter, but the majority of people here wanted a chain of left-to-write evaluation, and this is what fn:chain does.

Please, feel free to propose fn:multi-compose as generalization of mathematical composition - I would be glad that people have seen what I initially wanted.

As for "more general" vs. "more convenient", there are both pros and cons for each - some people prefer not to have to write an additional pair of parentheses every time, other people want not to need to express a partial application.

As there is no obvious "best" alternative, any argumentation for one or the other would be inconclusive.

@line-o
Copy link
Contributor

line-o commented Jun 11, 2024

I am strongly in favour of adding a compose - or combine - function.

The real-world use-case I encountered in the past is: building up a sequence of filters, transforms and sorting based on user-input which is then applied as one function call to a data set (in a search function for example).

As a result of the above a function to combine functions should allow a sequence rather than being variadic. In a variadic call one needs to know all functions at compile time. Please correct me if I am wrong here.
If it would be variadic then the first parameter should accept a sequence to allow the combination of a sequence of functions only known at runtime.

@michaelhkay
Copy link
Contributor Author

If the function is variadic then you can either do compose(f1, f2, f3) or equivalently compose((f1, f2, f3)) - in the first case you need to know statically how many functions you are composing, but of course you don't need to know statically what functions they are. In the second case it's completely dynamic.

@line-o
Copy link
Contributor

line-o commented Jun 11, 2024

Thank you @michaelhkay for making that clear. Is a mixture also possible as in the example below?

compose($function-sequence, serialize(?, { "method": $method }))

@michaelhkay
Copy link
Contributor Author

Yes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Discussion A discussion on a general topic. PRG-easy Categorized as "easy" at the Prague f2f, 2024 PRG-optional Categorized as "optional for 4.0" at the Prague f2f, 2024 XQFO An issue related to Functions and Operators
Projects
None yet
Development

No branches or pull requests

6 participants