150 fn:ranks #1027
Looking at the first example, why do we want to return
To make this change
In all of the examples given, the supplied key function returns a single item. But it is allowed (according to the signature) to return any sequence of atomic values. I don't think I understand the intended behaviour when it returns multiple items.
The predicate
What is the effect of NaN values?
The supplied $collation is used when sorting the values, but not when deciding whether they are distinct. Is that right?
Specification style: this is the subject of a separate issue. We should either provide an expression that can act as an implementation of the function being specified, or we should provide a user-written function that has the same effect. In this case I think providing an expression will do the job. Alternatively, wouldn't an XQuery expression using group by and order by be clearer (perhaps not, since FLWOR expressions cannot have a dynamic collation).
Summary: I would suggest: Sorts a supplied sequence based on the value of a sort key function, grouping the results so that items with the same key appear together as members of the same array. |
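For readers following the thread, the one-sentence summary suggested above (sort by a key function, then group items with equal keys) can be modelled in a few lines. This is only a rough Python sketch, not the proposed specification: the name `ranks`, the use of nested lists in place of XDM arrays, and the absence of any collation argument are all assumptions made for illustration.

```python
from itertools import groupby


def ranks(items, key):
    """Sort items by key, then collect items with equal keys into one group.

    Hypothetical model of the summary sentence; not the spec's definition.
    """
    ordered = sorted(items, key=key)  # stable sort on the computed key
    return [list(group) for _, group in groupby(ordered, key=key)]


words = ["pear", "fig", "apple", "kiwi", "plum"]
print(ranks(words, len))
# [['fig'], ['pear', 'kiwi', 'plum'], ['apple']]
```

Items that share a key keep their input order inside the group because Python's `sorted` is stable; whether the proposed function should guarantee the same is not settled in this thread.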
@michaelhkay Thank you for these observations. I am studying them and will respond. |
A good observation. Probably I wanted the key() function to be most general, but it feels difficult to find an immediate and compelling example.
Yes, then we would need a function such as |
Off-topic, but maybe we can ask the same question for the scan functions: Wouldn’t singleton members be more intuitive? |
Aren't
The function I prefer this to be as simple as possible. The |
A function definition is an expression, isn't it?
A good one, thanks. I will incorporate these suggestions now. |
Thinking further on this, if the It is a pity we don't have the set type yet, otherwise the type of $input would more precisely be specified as set and the question about making the input values distinct would be eliminated. This will also make a fine example - maybe close synonyms will have the same translation and would thus be in the same ranking set. What do you think? |
Thanks for responding to my comments. I find it hard to believe that multiple collations are needed here; on the contrary, the way that sort keys are compared using distinct-values needs to be consistent with the way they are compared using sort. I'm also worried that there's a third comparison being done using I would like to suggest an alternative approach.
The changes needed to the fn:sort rules might be primarily, under "The result of the function is obtained as follows:" change rules 1, 3, and 4 as follows:
|
Thank you, @michaelhkay , I understand exactly what you are proposing, and yes, this is possible; however, it becomes overly complicated (and is that necessary?). In particular, I never wanted to have a sequence of key-functions, and it seems that just one function can internally perform multiple comparisons, if that is necessary at all. Also, by definition, As for comparing NaN values, can't we just say that NaN is less than any other item, and for the purposes of this function NaN is equal to NaN? Thus no additional collation for treating NaN would be necessary. |
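The NaN rule floated in this comment (NaN sorts below every other value, and NaN equals NaN for ranking purposes) can be illustrated with a small Python sketch. The helper names `nan_low_key` and `rank_numbers` are hypothetical, and the sketch only covers numeric input; it is an illustration of the suggested rule, not of any agreed behaviour.

```python
import math
from itertools import groupby


def nan_low_key(x):
    """Map NaN to a key that sorts first and compares equal to itself."""
    # (0, 0.0) for NaN is smaller than (1, value) for any other number,
    # including negative infinity.
    return (0, 0.0) if math.isnan(x) else (1, x)


def rank_numbers(values):
    """Group equal numbers after sorting, treating all NaNs as one rank."""
    ordered = sorted(values, key=nan_low_key)
    return [list(g) for _, g in groupby(ordered, key=nan_low_key)]


nan = float("nan")
groups = rank_numbers([2.0, nan, -1.0, nan, 2.0])
print(groups)
```

Comparing the mapped keys rather than the raw floats is what makes the two NaNs fall into one group, since `nan == nan` is false in ordinary floating-point comparison.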
… separate ranking
I think there's a lot of complexity in the current proposed spec, which compares values in three different ways: For example, sort and distinct-values treat two NaNs as equal, while Another point, I just spotted the error condition "If the set of computed keys contains xs:untypedAtomic values that are not castable to xs:double then [the] operation will fail with a dynamic error." Why is that? All three comparisons that are used in the specification ( |
If it is regularly the case that I don't understand well at least 50% of what someone is writing, should I constantly raise this (might well be mistaken for having a personal grudge or embarrassment) or should we deal in a more organized, systematic way? And what if I am not the only one who feels that way and who is shy to raise their voice? Doesn't this make for a significant part of the people (maybe even the majority)? I think it is the Chair's responsibility not to ask for a vote if there is even the slightest sense of not understanding and discomfort. Maybe we are often rushed to make decisions when we are still not fully prepared to do so? Here is where having an officially assigned independent reviewer could help every one of us get a better understanding. |
I welcome this personally (neutral language might help to avoid irritation). In addition, I have repeatedly observed that my lack of native language skills leads to technical misunderstandings that I would like to have clarified myself.
…could very well be the case.
We should keep in mind that too strict a procedure might lead to stagnation. Several years have already passed, and we are far from finalizing version 4. But I think we would not lose anything by spending 10 or 20 minutes of our joint time to discuss the current procedure in an upcoming meeting. My personal hope is slightly different: I think we all should be as open-minded as possible to accept others’ thoughts and opinions. It hurts to see a PR questioned on which one has spent hours and hours to make it seemingly watertight. However, that doesn’t prevent any of us from being confronted with a result that differs a lot from the initial proposal. When saying this, I hope not to be suggestive. I don’t refer to this specific proposal; I rather have my own proposals in mind that underwent various changes before becoming accepted or eventually rejected. |
@dnovatchev Thanks for the example code. I took the liberty of pasting your reply to the mailing list: Yes, any sequence of functions can be replaced by a single function. Here is one such example: We are given a company's employees and each employee has a name, department and salary. We will rank the employees first just by department, then by both department and salary - done with a single function as specified in the 2nd call to fn:ranks below: let $employees := map{ "===============================================================================", I see that the concatenated string seems to be based on the actual value distribution of the input data (e.g., knowledge on the maximum value)…
|
@Christian Grün Thanks, but I actually sent this to the mailing list and to Norm Walsh and not to any other recipient. |
That’s what I wanted to say (might have been a misunderstanding?). |
A general answer: A double is less precise than a decimal, and the example shows how to handle decimals - thus handling doubles can be done in a similar way
By substituting it with the difference between a suitable constant and this value. For any |
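The substitution described here (replace a value with a constant minus that value, so that an ascending sort yields descending order) might look like the following Python sketch. The constant `SALARY_CAP`, the function name, and the sample salaries are all hypothetical, and the sketch assumes integer values with a known upper bound, which is the kind of prior knowledge about the data questioned earlier in the thread.

```python
# Hypothetical upper bound; the technique assumes no value exceeds it.
SALARY_CAP = 1_000_000


def descending_key(salary):
    """Return a key whose ascending order is the salary's descending order."""
    return SALARY_CAP - salary


salaries = [52_000, 87_500, 61_000]
ordered = sorted(salaries, key=descending_key)
print(ordered)
# [87500, 61000, 52000]
```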
This is how I would sort ascending strings and descending doubles with two keys:
let $items := (
  map { 'name': 'A', 'size': 1e33 },
  map { 'name': 'A', 'size': .1 },
  map { 'name': 'B', 'size': 0.01 },
  map { 'name': 'B', 'size': -1e99 }
)
return sort($items, (), (fn { ?name }, fn { -?size }))
How would you do it with a single key? |
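As a point of comparison for the two-key ordering in this comment, here is a rough Python analogue using the same four records. It is a sketch only: the records are deliberately shuffled relative to the XQuery input so that the effect of the sort is visible, and the tuple key is Python's idiom, not a claim about how `fn:sort` or the proposed `fn:ranks` should work.

```python
# Same records as in the comment, shuffled so the sort has work to do.
items = [
    {"name": "B", "size": 0.01},
    {"name": "A", "size": 0.1},
    {"name": "B", "size": -1e99},
    {"name": "A", "size": 1e33},
]

# Ascending by name, then descending by size (negating the number).
ordered = sorted(items, key=lambda m: (m["name"], -m["size"]))

for m in ordered:
    print(m["name"], m["size"])
```

The tuple is compared component by component, so no assumption about the range of `size` is needed, unlike approaches that fold both components into one string or number.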
There are many ways to do this. We can even return the hash of the concatenation of the 'name' and the normalized form (meaning an agreed-upon representation) of the 'size'. |
This is the maximum double value: dMax = Use: There are many possible ways to compute the final single value, and I am not saying that I can immediately provide the best algorithm to do that. The statement is that all this can be done with a single function. |
I am closing this PR because it is from my master branch, which is a problem when one has more than one open PR. I will resubmit it from a dedicated feature branch. |
As proposed and discussed here: #150