Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

517: fn:chain #734

Merged
merged 19 commits into from
Oct 24, 2023
Merged

517: fn:chain #734

merged 19 commits into from
Oct 24, 2023

Conversation

dnovatchev
Copy link
Contributor

Added fn:chain.

Took much effort to ensure Unix-style line-endings are used.

@dnovatchev
Copy link
Contributor Author

dnovatchev commented Sep 30, 2023

As I closed PR 733, here is the useful comment from @michaelhkay that I am transferring here (from there):

First of all, thank you, Michael for the useful comments!

I suggest using $functions rather than $funSequence as the argument keyword. We don't normally use camel-case except where XSD uses it.

Done

I'm reading the rendered version of the new spec linked from the dashboard page. On this the "more formally" XPath code appears to be incomplete - it ends in a comma.

Fixed in the latest to be pushed.

I'm not comfortable with logic reading if(not($x instance of array(*))) then array{$x} else $x. This seems to break substitutability. We've generally taken the view that if something works on items in general then it shouldn't treat arrays specially. Also there doesn't seem to be any equivalent in the "informal" exposition. If we had a general approach of treating arrays differently then this might make sense, but the function library should try and follow common conventions throughout.

I agree with this thinking. However, this is just one possible implementation and it is a "black box" to the user.

This means that nowhere are we encouraging anyone to think that the existing rules are not enforced.

Any implementation could be done in any language (not XPath at all) thus we are not prescribing to the implementors exactly what coding techniques to use - just that their results must be identical to the results produced by this oracle.

Generally I think it would be simpler if we confined this to functions of arity one.

Yes, but this significantly reduces the power and usefulness of the function. Please, note that nobody is prohibiting the user to use functions that always need a single argument, and indeed many users will do so. However, when this is necessary, fn:chain provides the power to chain together functions that can get their arguments from an array (or a sequence). We already have this feature in the standard XPath function fn:apply, which gets all needed function arguments from a user-provided array, thus this is not something new, and definitely not a violation of the established Xpath practices.

This function provides capabilities similar to those of transducers, first proposed by Rich Hickey in Closure. A transducer in a chain of transducers doesn't depend on (or even know) the type of its input or the type of its output.

Or perhaps we could devise a general mechanism, a function fn:singularize (for want of a better name) that takes a function of arity N as input and converts it to a function of arity 1 that takes an array as its single argument?

This would be a nice new function to have, and it will be useful in other cases. Here it is suggested that the "singularization" is performed inside the black-box implementation, so that the result is most concise, short, readable, convenient and meaningful to the user.

In Note 4, I suspect this should be a type error rather than a dynamic error. Rather than saying "It is a dynamic|type error", which is a phrase we often use for a normative statement, it might be worth using different phrasing to indicate that here we are only telling the reader something they could have worked out from the normative rules, perhaps "A consequence of the rules given earler is that a type error occurs if...".

Done in the latest to be pushed.

There is markup available for declaring variables that are used in examples. It's necessary to use this markup to ensure that the examples are testable; at the moment the examples will generate test cases that fail with static errors due to undeclared variables. I can help with this if necessary.

Absolutely! Your help will be appreciated!

Also, the result in examples should always be an XPath expression, so string values should be in quotes.

Done in the latest to be pushed.

Once again, thank you!

@ChristianGruen
Copy link
Contributor

Took much effort to ensure Unix-style line-endings are used.

Thanks for your efforts.

As you may have seen in the GitHub diff view, the files still have lots of differences. The new files seems to use a different indentation with tabs instead of spaces.

Which text editor do you use?

@michaelhkay
Copy link
Contributor

michaelhkay commented Oct 1, 2023

Thanks. So the outstanding question is how to handle arity>1 functions: I think we need to discuss this.

The current logic (which I have inverted for clarity):

if(function-arity($f) gt 1)
                then if($x instance of array(*)) then $x else array{$x}
                else [$x]) 

So the three cases are:

  1. The next function has arity 1; the result of the previous function is wrapped as a single-member array (as required by the fn:apply function). This seems to work well in all cases.
  2. The next function has arity >1 and the previous function returns a single array. If we didn't do anything special this would be a type error. So instead, we interpret the array as an array of arguments and invoke the next function directly. Perhaps the user would get better diagnostics if we checked that the array size matches the arity? But that doesn't actually need a change to the spec; it's an error either way and the system can do additional checks to improve the error message.
  3. The next function has arity>1 and the previous function returns something that isn't an array. We interpret the value as a sequence of single-item arguments. This strikes me as somewhat fragile. I think that rather than guessing that this is the intended mapping, I would prefer to make this case an error.

If we made this change I think the rules would be much easier to state and to understand:

  • To call an arity 1 function, the previous function should return the value to be passed to the single parameter.
  • To call an arity >1 function, the previous function should return an array containing the values to be passed to the individual parameters.

@ndw
Copy link
Contributor

ndw commented Oct 1, 2023 via email

@ChristianGruen
Copy link
Contributor

We should configure Git to fix that for you. “On my list.” Be seeing you, norm

Thanks, Norm (related: #698)

@dnovatchev
Copy link
Contributor Author

dnovatchev commented Oct 1, 2023

dnovatchev @.> writes:
Added fn:chain. Took much effort to ensure Unix-style line-endings are used.
We should configure Git to fix that for you. “On my list.” Be seeing you, norm

-- Norm Tovey-Walsh @.
> https://norm.tovey-walsh.com/
It is seldom that any liberty is lost all at once.--David Hume

Wow Norm, thank you so much!

💯

@dnovatchev
Copy link
Contributor Author

Took much effort to ensure Unix-style line-endings are used.

Thanks for your efforts.

As you may have seen in the GitHub diff view, the files still have lots of differences. The new files seems to use a different indentation with tabs instead of spaces.

Which text editor do you use?

@ChristianGruen,

Initially (unaware of this problem) I used the VS XML Editor.

Then, in order to fix the problem, I used NotePad++. It has built-in command for converting CR+LF to LF .

@dnovatchev
Copy link
Contributor Author

dnovatchev commented Oct 1, 2023

3. The next function has arity>1 and the previous function returns something that isn't an array. We interpret the value as a sequence of single-item arguments. This strikes me as somewhat fragile.

This is the "sauce" of the function. This is what makes the function so powerful. So far we have been filling the FO with a multitude of convenience functions, that are just synonyms for expressions that the user could write easily. Let's have at least a few functions that provide greater power.

I think that rather than guessing that this is the intended mapping, I would prefer to make this case an error.

We are not guessing - the specification of the function (Note 3) tells directly and precisely how the current result is processed in this case.

If we made this change I think the rules would be much easier to state and to understand:

  • To call an arity 1 function, the previous function should return the value to be passed to the single parameter.
  • To call an arity >1 function, the previous function should return an array containing the values to be passed to the individual parameters.

I would rather keep the function as it is, no one forbids the user to use fn:chain in a restrained mode as per the above 2 bullets, but we do not restrict any user from the full power of this function.

I think we could add a final note, telling the reader, that it is recommended to start initially using the function just as left-compose, where every function accepts and returns a single argument/result, . . ., and then after getting accustomed, to try its full power.

So, the two bullets will be almost the same as the above:

  • To call an arity 1 function, the previous function should return the value to be passed, as the single parameter.
  • To call an arity >1 function, the previous function should return an array or a sequence containing the values to be passed as the individual parameters

@ChristianGruen
Copy link
Contributor

Then, in order to fix the problem, I used NotePad++. It has built-in command for converting CR+LF to LF .

Good to know, thanks. Could you try to commit a version with the original indentation (space characters instead of tabs)?

@dnovatchev
Copy link
Contributor Author

Then, in order to fix the problem, I used NotePad++. It has built-in command for converting CR+LF to LF .

Good to know, thanks. Could you try to commit a version with the original indentation (space characters instead of tabs)?

I did, but there are still 14K + differences ☹️

@dnovatchev
Copy link
Contributor Author

@ndw Can DeltaXML be configured to ignore any blanks<-->tab or CRLF <--> LF differences? Or even whetespace-only differences?

@ChristianGruen
Copy link
Contributor

I did, but there are still 14K + differences ☹️

I advise you to take the original files, add your edits without modifying the indentation of the overall file, and push these files into this pull request.

It's not only about DeltaXML, it's also about not losing track of the actual changes in the git history, and to reduce conflicts with other pull requests.

@dnovatchev
Copy link
Contributor Author

I did, but there are still 14K + differences ☹️

I advise you to take the original files, add your edits without modifying the indentation of the overall file, and push these files into this pull request.

It's not only about DeltaXML, it's also about not losing track of the actual changes in the git history, and to reduce conflicts with other pull requests.

Finally done successfully! 😄

Thanks for your support!

@ChristianGruen ChristianGruen changed the title fn:chain 517: fn:chain Oct 2, 2023
@ChristianGruen
Copy link
Contributor

+1 for the solution with arity-1 functions. For functions with a higher arity, people could be confused about the missing symmetry. This query would work…

let $input := array { 2 to 3 }
let $fn := fn { . > 2 }
return fn:chain(($input, $fn), array:filter#2)

…but this one wouldn’t:

let $input := (2 to 3)
let $fn := fn { . > 2 }
return fn:chain(($input, $fn), filter#2)

And we should need to define what is going to happen if arity-0 functions are supplied.

@dnovatchev
Copy link
Contributor Author

Thanks for this comment.

+1 for the solution with arity-1 functions. For functions with a higher arity, people could be confused about the missing symmetry. This query would work…

let $input := array { 2 to 3 }
let $fn := fn { . > 2 }
return fn:chain(($input, $fn), array:filter#2)

…but this one wouldn’t:

let $input := (2 to 3)
let $fn := fn { . > 2 }
return fn:chain(($input, $fn), filter#2)

There is nothing confusing here.

Note 3 specifically says:

  1. For a function with arity N, greater than 1, the result produced by the application of the previous function must be either a sequence (with N items), or, if some of the parameters of the function could be sequence themselves, then an array with N members, and each of these N members is passed by the implementation in the function call as the corresponding argument of this function.

I absolutely agree that the flat sequence is a bad and misleading data-structure, but this is not a fault of fn:chain ...

And we should need to define what is going to happen if arity-0 functions are supplied.

This follows from the rules and notes. A function with 0-arity can be the first in the chain if the input is nothing() , or it may follow any function that produces nothing() (at present we only have the empty-sequence() type, but there has been a serious discussion to introduce a nothing() type that unifies the empty sequence, the empty array and the empty map).

Similarly, a function in the chain that produces nothing() must be followed by a zero-arity function, or must be the last function in the chain.

Do you want me to include a new note explaining this?

@ChristianGruen
Copy link
Contributor

There is nothing confusing here.
Note 3 specifically says:

A feature can be confusing even if its behavior is clearly defined in the spec. Symmetry, orthogonality and simplicity are well-proven design principles, and I believe the function would be more intuitive and easier to understand if we restricted it to arity-1 functions.

I absolutely agree that the flat sequence is a bad and misleading data-structure, but this is not a fault of fn:chain ...

I agree, too. It’s precisely the reason why I think we should be strict, and not include built-in functions that motivate and support the usage of flawed data structures.

I’ve checked all your examples, and I noticed that most of them only use arity-1 functions anyway. Some others use op('+'), for which sum#1 can be used, and the remaining examples can easily be rewritten, e.g. as follows:

$product3 := function($x, $y, $z) { $x * $y * $z }
→ $product := fold-left($n, 1, op('*'))

op('*')
→ $product

contains#2fn($a) { contains($a[1], $a[2]) }
→ fn { contains(head(.), tail(.)) }

The focus function in the last example (fn {}) will work once we’ve resolved #131.

This follows from the rules and notes. A function with 0-arity can be the first in the chain if the input is nothing() , or it may follow any function that produces nothing() (at present we only have the empty-sequence() type, but there has been a serious discussion to introduce a nothing() type that unifies the empty sequence, the empty array and the empty map).

Do you have a working example? The following one results in an exception, at least with the included XPath code: $chain((), true#0).

@michaelhkay
Copy link
Contributor

There is nothing confusing here.

I have learnt over time that confusingness is in the eye of the beholder. If someone says it is confusing, then by definition, it is.

@dnovatchev
Copy link
Contributor Author

Do you have a working example? The following one results in an exception, at least with the included XPath code: $chain((), true#0).

Thanks for catching this, @ChristianGruen .

I fixed this and added this as an example. Please, do have a look.

As for:

A feature can be confusing even if its behavior is clearly defined in the spec. Symmetry, orthogonality and simplicity are well-proven design principles, and I believe the function would be more intuitive and easier to understand if we restricted it to arity-1 functions.

Users have been accustomed for years to the fact that the standard XPath 3.1 function fn:apply requires that the arguments to the function call must be passed in an array. And this is what fn:chain is doing. Nothing new has been done, and speaking of confusing features, we have such outstanding examples, as the XML namespaces, which, compared to fn:apply are like mountains compared to a grain of sand.

@ChristianGruen
Copy link
Contributor

speaking of confusing features, we have such outstanding examples, as the XML namespaces, which, compared to fn:apply are like mountains compared to a grain of sand.

The existence of confusing features shouldn’t be a motivation to add more of them. But I understand that it seems important to you to keep the current proposal unchanged.

@dnovatchev
Copy link
Contributor Author

speaking of confusing features, we have such outstanding examples, as the XML namespaces, which, compared to fn:apply are like mountains compared to a grain of sand.

The existence of confusing features shouldn’t be a motivation to add more of them. But I understand that it seems important to you to keep the current proposal unchanged.

It is not that.

It is simply the fact that the standard fn:apply has been there for everyone for 7 years now, so people know that this standard function (fn:apply) requires that the intended arguments for a function call must be packed into an array - and this is exactly what fn:chain does.

@dnovatchev
Copy link
Contributor Author

I’ve checked all your examples, and I noticed that most of them only use arity-1 functions anyway. Some others use op('+'), for which sum#1 can be used, and the remaining examples can easily be rewritten, e.g. as follows:

I can add examples such as Volume, Area, TriangleArea, etc. of well-known formulas that would definitely be degraded into unreadable mess if we pass all parameters not separately, but as items in a single sequence.

The beautiful Heron's formula for the area of triangle would have now to use $sides[1], $sides[2], $sides[3] instead of simply $a, $b, $c ...

Even more ugly would be the formula for the roots of a quadratic equation.

Reminds me of that old joke about the man who agreed to have a suit with many unnatural limitations:

“Good heavens. Look at that poor crippled fellow,” the first doctor said to the second.

“Yeah but doesn’t that suit fit great?”

@Arithmeticus
Copy link
Contributor

I recommend more be said about error conditions, at the minimum by supplying examples that illustrate cases of error, especially in falling below or above arity, e.g.,
chain((1, 2, 3), ($product3, $product3)) returns
image

chain((1, 2, 3, 4), $product3) returns
image

@dnovatchev
Copy link
Contributor Author

I recommend more be said about error conditions, at the minimum by supplying examples that illustrate cases of error, especially in falling below or above arity, e.g., chain((1, 2, 3), ($product3, $product3)) returns image

chain((1, 2, 3, 4), $product3) returns image

Absolutely!

I thought about this and need just to find some free time to do it.

Thank you Joel!

@dnovatchev
Copy link
Contributor Author

I have added the 2 error examples as recommended by @Arithmeticus . Please, have a look.

@Arithmeticus
Copy link
Contributor

Looks ok, except that the custom in the specs, when providing an example that raises an error, is to say "Raises error CODE." or "Raises a X error [CODE]."

In browsing for examples in the XQFO specs I was surprised that examples of errors were relatively sparse.

@dnovatchev
Copy link
Contributor Author

Thanks for the valuable feedback, @Arithmeticus .

I reflected your comments. Please, do have a look

@ndw
Copy link
Contributor

ndw commented Oct 24, 2023

The CG agreed to accept this PR at meeting 051

@ndw ndw merged commit 037f823 into qt4cg:master Oct 24, 2023
2 checks passed
@ChristianGruen ChristianGruen added the Tests Needed Tests need to be written or merged label Nov 15, 2023
@michaelhkay michaelhkay mentioned this pull request Mar 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Tests Needed Tests need to be written or merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants