Unify and fix rank module result types #183

bvenn · 2022-01-31T15:53:56Z

There are 4 ranking methods in FSharp.Stats.Rank:

example = [5,3,3,4,2]

rankFirst - when ties are present, rank them corresponding to the order of their appearance
- result :int [] = [5,2,3,4,1]
rankMin - when ties are present, their rank correspond to the rank of the first occurrence
- result :float [] = [5,2,2,4,1]
rankMax - when ties are present, their rank correspond to the rank of the last occurrence
- result :float [] = [5,3,3,4,1]
rankAvg - when ties are present, their rank correspond to the average of all tie ranks
- result :float [] = [5,2.5,2.5,4,1]

By now, all functions except rankFirst result in float arrays. The only function where floats can occur is rankAvg. For harmonization I would suggest, that rankFirst as well should report a float array, although it would be a breaking change.
There seems to be an issue, that ties are not ranked correct here:

FSharp.Stats/src/FSharp.Stats/Rank.fs

Line 65 in 46ab30a

let minTies a _ = (float a + 1.)

It seems the +1 increment belongs to rankMax rather than rankMin.
EDIT 01/02/22: THIS WAS A TEMPORAL LOCAL ERROR!

A fix is on the way!

The text was updated successfully, but these errors were encountered:

bvenn · 2022-01-31T15:56:38Z

in line 23 there is abs (a-b) <= 0. Since an absolute cannot be smaller than 0 it can be changed to abs (a-b) = 0

FSharp.Stats/src/FSharp.Stats/Rank.fs

Line 23 in 46ab30a

if (abs (data'.[i] - data'.[pi]) <= zero) then

bvenn · 2022-02-01T09:18:28Z

Apparently, restarting the computer fixed issue 2. Unit tests are added and changes are on their way.

bvenn · 2022-02-01T10:23:25Z

How to handle nan and infinity values within the sequence?

At this moment the ranking order is as follows:

nan
-infinity
all real numbers
infinity

nans and infinities are treated as individual elements:

let example = [|-infinity;1;nan;infinity;infinity|]
rankFirst =   [|    2;    3; 1;    4;       5    |]
rankMin   =   [|    2;    3; 1;    4;       5    |]
rankMax   =   [|    2;    3; 1;    4;       5    |]
rankAvg   =   [|    2;    3; 1;    4;       5    |]

How do others handle nans?

pandas has the following options for na_option:
- keep (default): assign NaN rank to NaN values
- top: assign lowest rank to NaN values
- bottom: assign highest rank to NaN values
R! has the following options for na.last:
- TRUE (default), missing values in the data are put last
- FALSE, they are put first
- NA, they are removed
- "keep" they are kept with rank NA

Suggestion

I would recommend to assign nan ranks to nan values as default case.

let example = [|-infinity;1;nan;infinity;infinity|]
rankFirst =   [|    1;    2;nan;   3;       4    |]

What do you think?
@muehlhaus
@kMutagene
@ZimmerD

muehlhaus · 2022-02-01T10:45:00Z

Yes, I think your suggestion is very good.

bvenn · 2022-02-02T08:55:40Z

I've just created a update-rank branch to solve all issues. By default nan is sorted to the start of a sequence. This corrupts the loop of the implemented version. There are two possibilities to solve it:

Change the comparer to sort nan to the end. Afterwards the loop can be modified, that
- a) nan ranks are assigned to nans. This version is implemented in f9e8a98 but because in every comparison the new comparer compNaNLast has to perform two nan checks the performance is reduced 20fold. Therefore I added specialized functions rankFirst -> rankFirstNaNLast
- b) end the loop since all following nans must be nan and no comparison must be performed. All zeros can then be replaced by nans.
Leave the sorting as it is and add a counter to the loop that counts whenever a nan occurs. Set its rank to nan and subtract the counter value from the rank of all following real values. Thereby the nan checks are reduced by half. Because these checks are not necessary for all other types than nan, a type query in the beginning could help by reducing the number of nan checks.

bvenn closed this as completed in fad65e0 Feb 1, 2022

bvenn reopened this Feb 1, 2022

bvenn added a commit that referenced this issue May 26, 2022

fix rank module result type #183

c911277

bvenn added a commit that referenced this issue May 26, 2022

add rank documentation #183

5b17a5f

bvenn added a commit that referenced this issue May 26, 2022

add obsolete tags #183

174115d

bvenn mentioned this issue May 26, 2022

Update rank module #204

Merged

2 tasks

bvenn closed this as completed in #204 May 26, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unify and fix rank module result types #183

Unify and fix rank module result types #183

bvenn commented Jan 31, 2022 •

edited

Loading

bvenn commented Jan 31, 2022

bvenn commented Feb 1, 2022

bvenn commented Feb 1, 2022

muehlhaus commented Feb 1, 2022

bvenn commented Feb 2, 2022

Unify and fix rank module result types #183

Unify and fix rank module result types #183

Comments

bvenn commented Jan 31, 2022 • edited Loading

bvenn commented Jan 31, 2022

bvenn commented Feb 1, 2022

bvenn commented Feb 1, 2022

How to handle nan and infinity values within the sequence?

How do others handle nans?

Suggestion

muehlhaus commented Feb 1, 2022

bvenn commented Feb 2, 2022

bvenn commented Jan 31, 2022 •

edited

Loading