150 fn:ranks #1027

dnovatchev · 2024-02-17T22:21:06Z

As proposed and discussed here: #150

michaelhkay · 2024-02-19T17:34:16Z

Looking at the first example, why do we want to return [(2, 4)], [3] rather than [2, 4], [3]? Generally I would have thought an array with two singleton members was more useful than an array with one member being a sequence of two items.

To make this change, return [$input[$key(.) eq $v]] should become return array{$input[$key(.) eq $v]}.

In all of the examples given, the supplied key function returns a single item. But it is allowed (according to the signature) to return any sequence of atomic values. I don't think I understand the intended behaviour when it returns multiple items. The predicate $key(.) eq $v requires $key(.) to return zero or one items.

What is the effect of NaN values?

The supplied $collation is used when sorting the values, but not when deciding whether they are distinct. Is that right?

Specification style: this is the subject of a separate issue. We should either provide an expression that can act as an implementation of the function being specified, or we should provide a user-written function that has the same effect. In this case I think providing an expression will do the job. Alternatively, wouldn't an XQuery expression using group by and order by be clearer? (Perhaps not, since FLWOR expressions cannot have a dynamic collation.)

Summary: I would suggest: Sorts a supplied sequence based on the value of a sort key function, grouping the results so that items with the same key appear together as members of the same array.
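
The suggested summary can be modelled outside XQuery. The following is a minimal Python sketch, not the proposed specification: it assumes a single key function, no collation, and keys that are mutually comparable. The name `ranks` and the sample words are illustrative only.

```python
from itertools import groupby


def ranks(items, key):
    """Sort items by key, then group items with equal keys together.

    Each inner list plays the role of one array in the result; its
    members are the individual items, not one sequence-valued member.
    """
    ordered = sorted(items, key=key)  # sorted() is stable
    return [list(group) for _, group in groupby(ordered, key=key)]


# Rank words by length: words of equal length share a group.
result = ranks(["pear", "fig", "plum", "kiwi", "apple"], key=len)
print(result)  # [['fig'], ['pear', 'plum', 'kiwi'], ['apple']]
```

Within a group the items keep their input order, because the underlying sort is stable.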

dnovatchev · 2024-02-19T18:02:40Z

> Looking at the first example, why do we want to return [(2, 4)], [3] rather than [2, 4], [3]. Generally I would have thought an array with two singleton members was more useful than an array with one member being a sequence of two items.
>
> To make this change return [$input[$key(.) eq $v]] should be return array{$input[$key(.) eq $v]}.
>
> In all of the examples given, the supplied key function returns a single item. But it is allowed (according to the signature) to return any sequence of atomic values. I don't think I understand the intended behaviour when it returns multiple items. The predicate $key(.) eq $v requires $key(.) to return zero or one items.
>
> What is the effect of NaN values?
>
> The supplied $collation is used when sorting the values, but not when deciding whether they are distinct. Is that right?
>
> Specification style: this is the subject of a separate issue. We should either provide an expression that can act as an implementation of the function being specified, or we should provide a user-written function that has the same effect. In this case I think providing an expression will do the job. Alternatively, wouldn't an XQuery expression using group by and order by be clearer (perhaps not, since FLWOR expressions cannot have a dynamic collation).
>
> Summary: I would suggest: Sorts a supplied sequence based on the value of a sort key function, grouping the results so that items with the same key appear together as members of the same array.

@michaelhkay Thank you for these observations.

I am studying them and will respond.

dnovatchev · 2024-02-19T18:08:23Z

> Looking at the first example, why do we want to return [(2, 4)], [3] rather than [2, 4], [3]. Generally I would have thought an array with two singleton members was more useful than an array with one member being a sequence of two items.
>
> To make this change return [$input[$key(.) eq $v]] should be return array{$input[$key(.) eq $v]}.

A good observation. Probably I wanted the key() function to be most general, but it feels difficult to find an immediate and compelling example.

> In all of the examples given, the supplied key function returns a single item. But it is allowed (according to the signature) to return any sequence of atomic values. I don't think I understand the intended behaviour when it returns multiple items. The predicate $key(.) eq $v requires $key(.) to return zero or one items.

Yes, then we would need a function such as deep-equal.

ChristianGruen · 2024-02-19T18:12:32Z

> A good observation. Probably I wanted the key() function to be most general, but it feels difficult to find an immediate and compelling example.

Off-topic, but maybe we can ask the same question for the scan functions: Wouldn’t singleton members be more intuitive?

dnovatchev · 2024-02-19T18:19:41Z

> What is the effect of NaN values?

Aren't NaN values supposed to be smaller than anything else? The answer should be: "The effect is the same as when sorting."

> The supplied $collation is used when sorting the values, but not when deciding whether they are distinct. Is that right?

The function distinct-values, used in the sample implementation, can be passed a collation, too. I am not sure whether the collation in the signature of fn:ranks should be used both for sorting and for getting the distinct values from the input sequence, or (if this is at all so important) whether we could have two different collations as parameters.

I prefer this to be as simple as possible. The $collation parameter was intended only because fn:sort needs one, not for producing the distinct values.

dnovatchev · 2024-02-19T18:25:15Z

> Specification style: this is the subject of a separate issue. We should either provide an expression that can act as an implementation of the function being specified, or we should provide a user-written function that has the same effect. In this case I think providing an expression will do the job. Alternatively, wouldn't an XQuery expression using group by and order by be clearer (perhaps not, since FLWOR expressions cannot have a dynamic collation).

A function definition is an expression, isn't it?

> Summary: I would suggest: Sorts a supplied sequence based on the value of a sort key function, grouping the results so that items with the same key appear together as members of the same array.

A good one, thanks.

I will incorporate these suggestions now.

dnovatchev · 2024-02-19T20:17:22Z

> The supplied $collation is used when sorting the values, but not when deciding whether they are distinct. Is that right?
>
> The function distinct-values, used in the sample implementation, can be passed a collation, too. Not sure if the collation in the signature of fn:ranks should be used both for sorting and getting the distinct values from the input-sequence, or (if this at all is so important), we could have two different collations as parameters.

@michaelhkay,

Thinking further on this, if the key function is, say, translation from English to Swedish, then we must have two different collations: one for the English input words, and one for the Swedish translation results.

It is a pity we don't have the set type yet, otherwise the type of $input would more precisely be specified as set and the question about making the input values distinct would be eliminated.

This will also make a fine example - maybe close synonyms will have the same translation and would thus be in the same ranking set.

What do you think?

michaelhkay · 2024-02-20T00:26:13Z

Thanks for responding to my comments.

I find it hard to believe that multiple collations are needed here; on the contrary, the way that sort keys are compared using distinct-values needs to be consistent with the way they are compared using sort. I'm also worried that there's a third comparison being done using eq, which uses the default collation rather than the supplied collation. I think this is also why I was uneasy about NaN - there are three different comparisons here which all potentially treat NaN differently.
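
The concern about inconsistent comparisons can be illustrated with a small Python model. This is a sketch under stated assumptions: Python's `==` on floats stands in for `eq` (NaN is not equal to NaN), and the hypothetical helper `same_key` stands in for the distinct-values style of equality in which NaN matches NaN. Collations are not modelled.

```python
import math


def same_key(a, b):
    """Equality in the style of distinct-values: NaN matches NaN."""
    if isinstance(a, float) and isinstance(b, float):
        if math.isnan(a) and math.isnan(b):
            return True
    return a == b


def distinct(values):
    """Keep the first occurrence of each value, using same_key."""
    kept = []
    for value in values:
        if not any(same_key(value, seen) for seen in kept):
            kept.append(value)
    return kept


keys = [1.0, float("nan"), 1.0, float("nan")]
distinct_keys = distinct(keys)  # two entries: 1.0 and NaN

# Selecting the items for each distinct key with an eq-style test
# silently drops every item whose key is NaN.
selected_with_eq = [[k for k in keys if k == d] for d in distinct_keys]

# Using the same equality for both steps keeps them.
selected_consistent = [[k for k in keys if same_key(k, d)] for d in distinct_keys]

print([len(group) for group in selected_with_eq])     # [2, 0]
print([len(group) for group in selected_consistent])  # [2, 2]
```

The point of the sketch is only that mixing two notions of equality in one definition can lose items; it says nothing about which notion the specification should adopt.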

I would like to suggest an alternative approach.

1. Make the signature compatible with fn:sort except that it returns array(item())*.
2. Take the rules of fn:sort as currently written, and modify them as described below to define fn:ranks.
3. Change the definition of fn:sort so that fn:sort($input, $collations, $keys, $orders) returns fn:ranks($input, $collations, $keys, $orders)?*. So we define fn:sort in terms of fn:ranks, not the other way round.

The changes needed to the fn:sort rules might be primarily, under "The result of the function is obtained as follows:" change rules 1, 3, and 4 as follows:

1. The result is a sequence of arrays S such that S?* contains the same items as the input sequence $input, but generally in a different order.
2. (unchanged)
3. When a pair of corresponding sort key values of $A and $B are found to be not equal, then $A and $B appear in different arrays in the result sequence, and the array containing $A precedes the array containing $B in the result sequence if both the following conditions are true, or if both conditions are false:

   a. The sort key value for $A is less than the sort key value for $B, as defined below.
   b. The order direction in the corresponding sort key definition is "ascending".

4. If all the sort key values for $A and $B are pairwise equal, then $A and $B appear in the same array in the result sequence, and $A precedes $B in this array if and only if $A precedes $B in the input sequence.

   Note: That is, the sort is stable.
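
The relationship between the two functions in this alternative can be sketched in Python. This is only a model under assumptions: collations are omitted, keys are assumed to be mutually comparable, and the names `compare_items`, `ranks`, and `sort` are illustrative rather than taken from any specification text.

```python
from functools import cmp_to_key
from itertools import chain


def compare_items(a, b, keys, orders):
    """Compare two items key by key; the first unequal key pair decides."""
    for key, order in zip(keys, orders):
        ka, kb = key(a), key(b)
        if ka != kb:
            result = -1 if ka < kb else 1
            return result if order == "ascending" else -result
    return 0  # all sort keys pairwise equal


def ranks(items, keys, orders):
    """Return groups of items; items with pairwise equal keys share a group."""
    def cmp(a, b):
        return compare_items(a, b, keys, orders)

    ordered = sorted(items, key=cmp_to_key(cmp))  # stable sort
    groups = []
    for item in ordered:
        if groups and cmp(groups[-1][-1], item) == 0:
            groups[-1].append(item)
        else:
            groups.append([item])
    return groups


def sort(items, keys, orders):
    """Sorting defined as the flattened result of ranks."""
    return list(chain.from_iterable(ranks(items, keys, orders)))


employees = [
    ("John", "Sales", 50000),
    ("Erin", "Computing", 120000),
    ("Ryan", "Sales", 100000),
    ("Ann", "Computing", 150000),
    ("Pete", "Sales", 50000),
]
keys = [lambda e: e[1], lambda e: e[2]]  # department, then salary
orders = ["ascending", "descending"]

grouped = ranks(employees, keys, orders)
flat = sort(employees, keys, orders)
```

In this model John and Pete end up in the same group, in input order, because both of their sort keys are equal.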

dnovatchev · 2024-02-20T00:58:49Z

> I would like to suggest an alternative approach.
>
> Make the signature compatible with fn:sort except that it returns array(item())*.
>
> Take the rules of fn:sort as currently written, and modify them as described below to define fn:ranks
>
> Change the definition of fn:sort so that fn:sort($input, $collations, $keys, $orders) returns fn:ranks($input, $collations, $keys, $orders)?*. So we define fn:sort in terms of fn:ranks, not the other way round.
>
> The changes needed to the fn:sort rules might be primarily, under "The result of the function is obtained as follows:" change rules 1, 3, and 4 as follows:

Thank you, @michaelhkay,

I understand exactly what you are proposing, and yes, this is possible, however it becomes overly (and is that necessary?) complicated.

In particular, I never wanted to have a sequence of key-functions, and it seems that just one function can internally perform multiple comparisons, if that is necessary at all.

Also, by definition, fn:ranks is defined (to be meaningful) over a set of (distinct) items -- while fn:sort returns all the input items, even in the case when they are not distinct.

As for comparing NaN values, can't we just say that NaN is less than any other item, and for the purposes of this function NaN is equal to NaN? Thus no additional collation for treating NaN would be necessary.


michaelhkay · 2024-02-20T07:48:50Z

> however it becomes overly (and is that necessary?) complicated.

I think there's a lot of complexity in the current proposed spec, which compares values in three different ways: For example, sort and distinct-values treat two NaNs as equal, while eq treats them as not-equal. The proposal to define fn:sort in terms of fn:ranks is certainly a significant refactoring that may be difficult to get right, but if successful it will reduce complexity overall. (It might also be possible to define other functions such as distinct-values, duplicates, min, max, highest and lowest by reference to fn:ranks, and that would certainly be a great reduction in complexity if it can be achieved). But I agree it might be over-ambitious.

Another point, I just spotted the error condition "If the set of computed keys contains xs:untypedAtomic values that are not castable to xs:double then [the] operation will fail with a dynamic error." Why is that? All three comparisons that are used in the specification (sort(), distinct-values() and 'eq') treat untypedAtomic values as strings; I can't see where untypedAtomic-to-double conversion occurs.

dnovatchev · 2024-03-05T20:01:26Z

I would prefer to spend a little bit more time reading and understanding (being assigned with this) or hearing the person assigned to do so, than realizing when it is too-late that everybody's time was wasted at one or more meetings due to unrealized complexity and lack of understanding.

»Everybody« implies I’m part of it, but I don’t see myself involved. Are you sure others, or even all of us, share your perspective?

When I have the impression that a feature is too complex to be accepted, I tend to ask for more time before we accept it.

What I’ve indeed suggested just recently is that we should spend time on the features that have already been added to the draft, but have not been officially accepted (https://lists.w3.org/Archives/Public/public-xslt-40/2024Feb/0016.html). I didn’t get any reply, so it could be that people don’t feel it’s necessary (or again it’s a matter of time).

If it is regularly the case that I don't understand well at least 50% of what someone is writing, should I constantly raise this (might well be mistaken for having a personal grudge or embarrassment) or should we deal in a more organized, systematic way? And what if I am not the only one who feels that way and who is shy to raise their voice? Doesn't this make for a significant part of the people (maybe even the majority)?

I think it is the Chair's responsibility not to ask for a vote if there is even the slightest sense of not understanding and discomfort. Maybe we are often rushed to make decisions when we are still not fully prepared to do so? Here is where having an officially assigned independent reviewer could help everyone of us get a better understanding.

ChristianGruen · 2024-03-05T20:31:50Z

> If it is regularly the case that I don't understand well at least 50% of what someone is writing, should I constantly raise this (might well be mistaken for having a personal grudge or embarrassment)

I welcome this personally (neutral language might help to avoid irritations). In addition, I have repeatedly observed that my lack of native language skills leads to technical misunderstandings that I like to have clarified myself.

> Doesn't this make for a significant part of the people (maybe even the majority)?

…could very well be the case.

> I think it is the Chair's responsibility not to ask for a vote if there is even the slightest sense of not understanding and discomfort.

We should keep in mind that too strict a procedure might lead to stagnation. Several years have already passed, and we are far from finalizing version 4.

But I think we would not lose anything by spending 10 or 20 minutes of our joint time to discuss the current procedure in an upcoming meeting.

My personal hope is slightly different: I think we all should be as open-minded as possible to accept others’ thoughts and opinions. It hurts to see a PR questioned for which one has spent hours and hours to make it seemingly water-proof. However, that doesn’t prevent any of us from being confronted with a result that differs a lot from the initial proposal.

When saying this, I hope not to be suggestive. I don’t refer to this specific proposal; I rather have my own proposals in mind that underwent various changes before becoming accepted or eventually rejected.

ChristianGruen · 2024-03-12T20:24:19Z

@dnovatchev Thanks for the example code. I took the liberty of pasting your reply to the mailing list:

Yes, any sequence of functions can be replaced by a single function.

Here is one such example:

We are given a company's employees and each employee has a name, department and salary.

We will rank the employees first just by department, then by both department and salary - done with a single function as specified in the 2nd call to fn:ranks below:

let $employees := map{
  "John Smith": map{ "dept": "Sales", "salary": 50000},
  "Erin Carter": map{ "dept": "Computing", "salary": 120000},
  "Ryan Gosling": map{ "dept": "Sales", "salary": 100000},
  "Ann Gould": map{ "dept": "Computing", "salary": 150000},
  "Pete Lagard": map{ "dept": "Sales", "salary": 50000},
  "Jim Carter": map{ "dept": "Sales", "salary": 80000},
  "Greg Wilson": map{ "dept": "Computing", "salary": 120000}
}
return
(
  ranks(map:keys($employees), fn($emp){$employees($emp)("dept")}),

  "===============================================================================",
  ranks(map:keys($employees),
        fn($emp){$employees($emp)("dept")
                 || (let $sal := $employees($emp)("salary"),
                         $salDigits := string-length(string($sal))
                     return substring('0000000', $salDigits + 1) ||
                            string($sal) )})
)

I see that the concatenated string seems to be based on the actual value distribution of the input data (e.g., knowledge on the maximum value)…

How would you handle arbitrary numbers (e.g. doubles)?
How would you sort a secondary double sort key in a descending order?
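
The dependence on the value range can be made concrete with a small Python model of the padded key. This is a sketch, assuming plain code-point string comparison and non-negative integer salaries; `padded_key` is a hypothetical helper that mirrors the seven-character zero padding of the example above.

```python
def padded_key(dept, salary, width=7):
    """Concatenate the department with the salary, left-padded with zeros.

    Salaries with more than `width` digits are left unpadded, as in the
    substring-based padding of the example.
    """
    return dept + str(salary).zfill(width)


# Within the assumed range the string order matches the numeric order.
in_range_ok = padded_key("Sales", 50000) < padded_key("Sales", 100000)

# Beyond seven digits the string order no longer matches the numeric order:
# the key for 10,000,000 compares lower than the key for 9,999,999.
out_of_range_misordered = (
    padded_key("Sales", 10_000_000) < padded_key("Sales", 9_999_999)
)

print(in_range_ok, out_of_range_misordered)  # True True
```

Negative or fractional values would need a different encoding altogether, which this sketch does not attempt.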

dnovatchev · 2024-03-12T20:34:52Z

> @dnovatchev Thanks for the example code. I took the liberty of pasting your reply to the mailing list:

@Christian Grün Thanks, but I actually sent this to the mailing list and to Norm Walsh and not to any other recipient.

ChristianGruen · 2024-03-12T20:39:12Z

> @Christian Grün Thanks, but I actually sent this to the mailing list and to Norm Walsh and not to any other recipient.

That’s what I wanted to say (might have been a misunderstanding?).

dnovatchev · 2024-03-12T20:43:01Z

> I see that the concatenated string seems to be based on the actual value distribution of the input data (e.g., knowledge on the maximum value)…
>
> How would you handle arbitrary numbers (e.g. doubles)?

A general answer: A double is less precise than a decimal, and the example shows how to handle decimals, thus handling doubles can be done in a similar way.

> How would you sort a secondary double sort key in a descending order?

By substituting it with the difference between a suitable constant and this value. For any x we use N - x, where N is the largest possible value.

ChristianGruen · 2024-03-12T20:49:17Z

> A general answer: A double is less precise than a decimal, and the example shows how to handle decimals - thus handling doubles can be done in a similar way

This is how I would sort ascending strings and descending doubles with two keys:

let $items := (
  map { 'name': 'A', 'size': 1e33  },
  map { 'name': 'A', 'size': .1    },
  map { 'name': 'B', 'size': 0.01  },
  map { 'name': 'B', 'size': -1e99 }
)
return sort($items, (), (fn { ?name }, fn { -?size }))

How would you do it with a single key?
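
For comparison, the intended ordering (name ascending, then size descending) can be modelled in Python. This is a sketch only: the tuple returned by the key function plays the role of the two key functions above, and the input order has been shuffled here so that the sort has something to do.

```python
items = [
    {"name": "B", "size": -1e99},
    {"name": "A", "size": 0.1},
    {"name": "B", "size": 0.01},
    {"name": "A", "size": 1e33},
]

# Name ascending; negating the size makes the second component descending.
ordered = sorted(items, key=lambda m: (m["name"], -m["size"]))
names_in_order = [m["name"] for m in ordered]
sizes_in_order = [m["size"] for m in ordered]

print(names_in_order)  # ['A', 'A', 'B', 'B']
print(sizes_in_order)  # [1e+33, 0.1, 0.01, -1e+99]
```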

dnovatchev · 2024-03-12T20:53:14Z

> A general answer: A double is less precise than a decimal, and the example shows how to handle decimals - thus handling doubles can be done in a similar way
>
> This is how I would sort ascending strings and descending doubles with two keys:
>
> let $items := (
>   map { 'name': 'A', 'size': 1e33  },
>   map { 'name': 'A', 'size': .1    },
>   map { 'name': 'B', 'size': 0.01  },
>   map { 'name': 'B', 'size': -1e99 }
> )
> return sort($items, (), (fn { ?name }, fn { -?size }))
>
> How would you do it with a single key?

There are many ways to do this.

We can even return the hash of the concatenation of the 'name' and the normalized (meaning having the same agreed upon representation) of the 'size'.

dnovatchev · 2024-03-12T21:01:12Z

A general answer: A double is less precise than a decimal, and the example shows how to handle decimals - thus handling doubles can be done in a similar way

This is how I would sort ascending strings and descending doubles with two keys:
let $items := (
  map { 'name': 'A', 'size': 1e33  },
  map { 'name': 'A', 'size': .1    },
  map { 'name': 'B', 'size': 0.01  },
  map { 'name': 'B', 'size': -1e99 }
)
return sort($items, (), (fn { ?name }, fn { -?size }))
How would you do it with a single key?
There are many ways to do this.

We can even return the hash of the concatenation of the 'name' and the normalized (meaning having the same agreed upon representation) of the 'size'.

This is the maximum double value:

dMax = 1.7976931348623157E+308

Use: dMax - ?size, then convert this to a fixed-length string with the decimal representation, then concat the result to ?name.

There are many possible ways to compute the final single value, and I am not saying that I can immediately provide the best algorithm to do that.

The statement is that all this can be done with a single function.
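
A quick check of the dMax - ?size step may be useful. The following Python sketch assumes the subtraction is carried out directly in IEEE 754 double arithmetic (Python's `float`), using the four sizes from the example above; a decimal-based or otherwise normalized encoding might behave differently and is not modelled here.

```python
import sys

d_max = sys.float_info.max  # 1.7976931348623157e308, the largest finite double
sizes = [1e33, 0.1, 0.01, -1e99]

# Flip the order by subtracting each size from the maximum double.
flipped = [d_max - size for size in sizes]

# Each size is far smaller than the spacing between adjacent doubles near
# d_max, so every difference rounds back to d_max itself.
distinct_flipped = set(flipped)
print(len(distinct_flipped))  # 1
```

Under that assumption the four sizes collapse to a single key value, so the flipped value alone would not be enough to distinguish them.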

dnovatchev · 2024-05-19T17:02:30Z

I am closing this PR because it was raised from my master branch, which is not good when one has more than one open PR.

Will re-submit it from a dedicated feature branch.

dnovatchev added 6 commits February 17, 2024 14:18

fn:ranks

ebbd656

fixed the result of Example 2

6d2f47a

Added Example 3 - World Cup Group C final standings

4a9ffc7

fixed the definition of Example 3 (3 points for a win)

5515804

Example - rplaced a period with a comma

2d991ca

fix in Example 3 removed the $ sign at the start of ranks.

1db1358

ChristianGruen changed the title ~~fn:ranks~~ 150 fn:ranks Feb 19, 2024

dnovatchev added 2 commits February 19, 2024 10:35

Reflected Michael Kay's comments

55ed089

Expanded the Summary

87f9d3f

dnovatchev added 3 commits February 19, 2024 13:22

Reflected Michael Kay's comments and added a second collation and a note

7d61d7c

Removed the word singleton

8756a29

Remuved a comma from the commented rreturn

a574c57

dnovatchev added 7 commits February 19, 2024 17:05

Added Example 4 - Synonyms translated to the same Swedish word form a…

668767b

… separate ranking

Fixed Example 4

1450e5d

Expanded the rules with a non-formal definition

393f9a6

Refined the rules

69ceb1c

Now using fn:compare to form each array-result

23b7f06

Refined the Notes

6fa30c3

compare() is now calles with $collation-key

c651d7e

ChristianGruen added the Tests Needed label Feb 20, 2024

Merge branch 'qt4cg:master' into master

b9df595

ChristianGruen added and removed the PR Pending label Mar 6, 2024

dnovatchev added 4 commits March 8, 2024 17:34

Added a missing comma in Example 2

81db0f9

Merge branch 'qt4cg:master' into master

fd546b2

Merge branch 'master' of https://github.com/dnovatchev/qtspecs

d215677

Merge branch 'qt4cg:master' into master

47817ba

dnovatchev added 10 commits March 19, 2024 19:20

Merge branch 'qt4cg:master' into master

c6490fc

Merge branch 'qt4cg:master' into master

430f8cd

Merge branch 'qt4cg:master' into master

6e574d4

Merge branch 'qt4cg:master' into master

ffb3703

Merge branch 'qt4cg:master' into master

8617b20

Merge branch 'qt4cg:master' into master

6cdcfd2

Merge branch 'qt4cg:master' into master

b0627a8

just minor test

3811f42

just minor test - removed comment

8df749b

Merge branch 'qt4cg:master' into master

bf606b7

dnovatchev mentioned this pull request May 19, 2024

Add the BLAKE3 hashing algorithm to fn:hash #1226

Closed

dnovatchev closed this May 19, 2024

dnovatchev mentioned this pull request May 19, 2024

150 PR resubmission for fn ranks #1227

Open
