Rank Data, Correlation, Covariance, R Squared #3484

jdunkerley · 2022-05-25T15:34:24Z

Pull Request Description

Added new Statistics: Covariance, Pearson, Spearman, R Squared
Added covariance_matrix function
Added pearson_correlation function to compute correlation matrix
Added rank_data and Rank_Method type to create rankings of a Vector
Added spearman_correlation function to compute Spearman Rank correlation matrix
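The formulas behind the new pairwise statistics are short enough to sketch. The Java below is a hypothetical illustration, not the PR's `CorrelationStatistics` class. `CorrelationSketch` and its methods are invented names. Covariance is assumed to be the sample estimate (dividing by n - 1). R squared is assumed to be the coefficient of determination of a simple linear fit of y on x, which equals the square of Pearson's r.

```java
/**
 * Hypothetical sketch (not the PR's CorrelationStatistics): covariance,
 * Pearson correlation and R squared for two equal-length series, plus a
 * covariance matrix over several series.
 */
final class CorrelationSketch {
    private CorrelationSketch() {}

    private static double mean(double[] v) {
        double sum = 0.0;
        for (double d : v) sum += d;
        return sum / v.length;
    }

    /** Sample covariance, dividing by n - 1 (an assumption; the PR may differ). */
    static double covariance(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("Need two series of equal length >= 2.");
        }
        double mx = mean(x);
        double my = mean(y);
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            sum += (x[i] - mx) * (y[i] - my);
        }
        return sum / (x.length - 1);
    }

    /** Pearson r = cov(x, y) / (sd(x) * sd(y)); the n - 1 factors cancel. */
    static double pearson(double[] x, double[] y) {
        return covariance(x, y) / Math.sqrt(covariance(x, x) * covariance(y, y));
    }

    /** R squared of a simple linear fit of y on x, which equals r * r. */
    static double rSquared(double[] x, double[] y) {
        double r = pearson(x, y);
        return r * r;
    }

    /** Symmetric matrix whose (i, j) entry is covariance(series[i], series[j]). */
    static double[][] covarianceMatrix(double[][] series) {
        int n = series.length;
        double[][] result = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i; j < n; j++) {
                double c = covariance(series[i], series[j]);
                result[i][j] = c;
                result[j][i] = c;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4, 5};
        double[] y = {2, 4, 5, 4, 5};
        System.out.println("covariance = " + covariance(x, y));
        System.out.println("pearson    = " + pearson(x, y));
        System.out.println("r squared  = " + rSquared(x, y));
    }
}
```

For x = {1, 2, 3, 4, 5} and y = {2, 4, 5, 4, 5} this gives a covariance of 1.5 and an R squared of 0.6. A constant series has zero variance, so `pearson` returns NaN there; how the PR treats that case is not shown in this sketch.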

Important Notes

Added Panic.throw_wrapped_if_error and Panic.handle_wrapped_dataflow_error to help propagate dataflow errors raised within a loop.
Removed Array.set_at use from Table.Vector_Builder
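Enso dataflow errors are ordinary values, so an error produced inside a loop callback does not stop the loop by itself. The helpers named above appear to promote such an error to a panic so that it escapes the loop, and then turn it back into a dataflow error outside. The Java below is only an analogy of that pattern: a `Result` record stands in for a dataflow error and an unchecked exception stands in for a panic. Every name in it is hypothetical, and none of it is the Enso API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical Java analogy of wrapping dataflow errors in panics inside a loop. */
final class WrappedErrorSketch {
    private WrappedErrorSketch() {}

    /** Value-or-error result, standing in for an Enso dataflow error. */
    record Result<T>(T value, String error) {
        static <T> Result<T> ok(T value) { return new Result<>(value, null); }
        static <T> Result<T> err(String error) { return new Result<>(null, error); }
        boolean isError() { return error != null; }
    }

    /** Unchecked "panic" carrying the original error so it can be unwrapped later. */
    static final class WrappedError extends RuntimeException {
        private final String error;

        WrappedError(String error) {
            // Stack trace disabled: the exception is used purely for control flow.
            super(error, null, false, false);
            this.error = error;
        }
    }

    /** Analogue of throw_wrapped_if_error: escape the enclosing loop on an error. */
    static <T> T throwWrappedIfError(Result<T> result) {
        if (result.isError()) {
            throw new WrappedError(result.error());
        }
        return result.value();
    }

    /** Analogue of handle_wrapped_dataflow_error: turn the panic back into a value. */
    static <T> Result<T> handleWrapped(Supplier<T> action) {
        try {
            return Result.ok(action.get());
        } catch (WrappedError e) {
            return Result.err(e.error);
        }
    }

    /** Maps every element, stopping at the first callback result that is an error. */
    static <A, B> Result<List<B>> mapAll(List<A> items, Function<A, Result<B>> f) {
        return handleWrapped(() -> {
            List<B> out = new ArrayList<>();
            for (A item : items) {
                out.add(throwWrappedIfError(f.apply(item)));
            }
            return out;
        });
    }

    public static void main(String[] args) {
        Result<List<Integer>> doubled = mapAll(List.of(1, 2, 3), i -> Result.ok(i * 2));
        Result<List<Integer>> failed = mapAll(List.of(1, -1, 3),
                i -> i < 0 ? Result.<Integer>err("negative: " + i) : Result.ok(i));
        System.out.println(doubled);
        System.out.println(failed);
    }
}
```

With a callback that rejects negative numbers, `mapAll(List.of(1, -1, 3), ...)` returns an error result and never calls the callback on 3, because the wrapped exception leaves the loop at -1.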

Checklist

Please include the following checklist in your PR:

The documentation has been updated if necessary.
All code conforms to the Scala, Java, and Rust style guides.
All code has been tested:
- Unit tests have been written where possible.
- If GUI codebase was changed: Enso GUI was tested when built using BOTH
  ./run.sh ide dist and ./run.sh ide watch.

distribution/lib/Standard/Base/0.0.0-dev/src/Data/Statistics.enso

distribution/lib/Standard/Base/0.0.0-dev/src/Data/Statistics/Rank_Method.enso

radeusgd · 2022-05-30T08:45:37Z

distribution/lib/Standard/Base/0.0.0-dev/src/Data/Statistics/Rank_Method.enso

+## Specifies how to handle ranking of equal values.
+type Rank_Method
+    ## Use the mean of all ranks for equal values.
+    type Average
+
+    ## Use the lowest of all ranks for equal values.
+    type Minimum
+
+    ## Use the highest of all ranks for equal values.
+    type Maximum
+
+    ## Use same rank value for equal values and next group is the immediate
+       following ranking number.
+    type Dense
+
+    ## Equal values are assigned the next rank in order that they occur.
+    type Ordinal


This explanation is much better, but I'm still wondering whether it would be worth offering some examples, either here or next to a method that uses this type. I'm still not sure I correctly understand how Dense and Average work.
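To make Dense and Average concrete, here is a hypothetical Java sketch of all five methods on a small input with a tie. The MINIMUM, MAXIMUM, DENSE and AVERAGE cases reuse the formulas from the `Rank.java` switch quoted later in this review, assuming `start` is the 0-based sorted position where a group of equal values begins and `index` is the position just past its end. The ORDINAL case, the use of a stable sort, and the names `RankSketch` and `rank` are assumptions, not the PR's code.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical sketch of the five ranking methods (not the PR's Rank.java). */
final class RankSketch {
    enum RankMethod { AVERAGE, MINIMUM, MAXIMUM, DENSE, ORDINAL }

    /** Returns 1-based ranks, in the original order of the input values. */
    static double[] rank(double[] values, RankMethod method) {
        int n = values.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        // Sorting an object array is stable, so ties keep their original order.
        Arrays.sort(order, Comparator.comparingDouble(i -> values[i]));

        double[] ranks = new double[n];
        int dense = 0;
        int start = 0;
        while (start < n) {
            // end is the sorted position just past the group of values equal to this one.
            int end = start + 1;
            while (end < n && values[order[end]] == values[order[start]]) end++;
            dense++;
            for (int k = start; k < end; k++) {
                ranks[order[k]] = switch (method) {
                    case MINIMUM -> start + 1;               // lowest rank in the group
                    case MAXIMUM -> end;                     // highest rank in the group
                    case DENSE -> dense;                     // no gaps after a tie
                    case AVERAGE -> (start + 1 + end) / 2.0; // mean of the group's ranks
                    case ORDINAL -> k + 1;                   // order of occurrence
                };
            }
            start = end;
        }
        return ranks;
    }

    public static void main(String[] args) {
        double[] values = {30, 20, 10, 20};
        for (RankMethod m : RankMethod.values()) {
            System.out.println(m + " " + Arrays.toString(rank(values, m)));
        }
    }
}
```

For `{30, 20, 10, 20}` the two 20s occupy sorted ranks 2 and 3, so AVERAGE gives `[4, 2.5, 1, 2.5]`, MINIMUM `[4, 2, 1, 2]`, MAXIMUM `[4, 3, 1, 3]`, DENSE `[3, 2, 1, 2]` (the next group follows immediately, with no gap) and ORDINAL `[4, 2, 1, 3]`. NaN values are not handled, since `==` never groups them.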

radeusgd

Looks good to me.

Just two questions:

I did check the code, but did not read deeply into the formulas for each statistic; I assume they are correct. If you want me to double-check them, I can do that.
Where do the reference values for the results come from? Excel, I guess? It might make sense to indicate, very briefly, how the references were computed when that is not completely trivial, in case someone wants to double-check them later if issues appear.

distribution/lib/Standard/Base/0.0.0-dev/src/Error/Common.enso

radeusgd · 2022-05-30T09:09:35Z

std-bits/base/src/main/java/org/enso/base/statistics/Rank.java

+          case MINIMUM -> start + 1;
+          case MAXIMUM -> index;
+          case DENSE -> dense;
+          case AVERAGE -> (start + 1 + index) / 2.0;


Out of curiosity, in which situations would one want to use average ranking?

(And why is it the default? I'm not saying it shouldn't be; it's an honest question, because I haven't seen it before.)
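One common reason to use average ranks, offered here as background rather than as the PR's stated rationale, is Spearman's rank correlation: with ties, Spearman's coefficient is usually defined as Pearson's correlation of the average ("fractional") ranks. Averaging keeps the rank sum at n(n+1)/2 however many ties there are, whereas minimum ranks for `{30, 20, 10, 20}` would be `[4, 2, 1, 2]`, summing to 9 instead of 10. The Java below is a hypothetical, self-contained sketch; `SpearmanSketch` and its methods are invented names.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical sketch: Spearman correlation as Pearson on average ranks. */
final class SpearmanSketch {
    private SpearmanSketch() {}

    /** Average (fractional) ranks: tied values share the mean of their 1-based ranks. */
    static double[] averageRanks(double[] values) {
        int n = values.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        Arrays.sort(order, Comparator.comparingDouble(i -> values[i]));
        double[] ranks = new double[n];
        int start = 0;
        while (start < n) {
            int end = start + 1;
            while (end < n && values[order[end]] == values[order[start]]) end++;
            double average = (start + 1 + end) / 2.0;
            for (int k = start; k < end; k++) ranks[order[k]] = average;
            start = end;
        }
        return ranks;
    }

    /** Pearson correlation of two equal-length series. */
    static double pearson(double[] x, double[] y) {
        double mx = Arrays.stream(x).average().orElse(Double.NaN);
        double my = Arrays.stream(y).average().orElse(Double.NaN);
        double sxy = 0.0, sxx = 0.0, syy = 0.0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - mx;
            double dy = y[i] - my;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    /** Spearman rank correlation, with ties handled by average ranks. */
    static double spearman(double[] x, double[] y) {
        return pearson(averageRanks(x), averageRanks(y));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(averageRanks(new double[] {30, 20, 10, 20})));
        System.out.println(spearman(new double[] {1, 2, 2, 3}, new double[] {1, 2, 3, 4}));
    }
}
```

Because only the ordering matters, any strictly increasing relationship (for example y = x squared on positive x) gives a Spearman coefficient of exactly 1, even though Pearson's r on the raw values would be below 1.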


distribution/lib/Standard/Base/0.0.0-dev/src/Data/Vector.enso

std-bits/base/src/main/java/org/enso/base/statistics/CorrelationStatistics.java


jdunkerley marked this pull request as ready for review May 26, 2022 16:12

jdunkerley requested review from 4e6 and radeusgd as code owners May 26, 2022 16:12

jdunkerley force-pushed the wip/jd/covariance-182059993 branch from e5c14ac to 3b80fb9 May 26, 2022 17:00

jdunkerley requested a review from hubertp May 26, 2022 17:14

radeusgd reviewed May 30, 2022


jdunkerley force-pushed the wip/jd/covariance-182059993 branch from 2fe1ae5 to 4a15578 May 30, 2022 08:13

jdunkerley requested a review from radeusgd May 30, 2022 08:14

radeusgd reviewed May 30, 2022


radeusgd approved these changes May 30, 2022


jdunkerley and others added 12 commits May 30, 2022 14:45

Covariance and Correlation matrices

c9dc5b4

WIP

e9d8638

Adding Rank functionality to Java and Enso

8516c35

Add rank_data to Statistics module

b0bf554

Single statistic version of Correlation, Covariance and RSquared.

18dbeb5

Added test cases

Error handling in spearman matrix still to do

80f07d7

Add Stats tests to Main.enso.

b127fb8

Create helpers for promoting dataflow errors as panics. Remove set_at where possible.

Changelog

5be226f

Formatting

1fc867d

Extra doc-strings

44e05fa

Update distribution/lib/Standard/Base/0.0.0-dev/src/Error/Common.enso

5c1488e

Co-authored-by: Radosław Waśko <radoslaw.wasko@enso.org>

Update distribution/lib/Standard/Base/0.0.0-dev/src/Error/Common.enso

e3e75d4

Co-authored-by: Radosław Waśko <radoslaw.wasko@enso.org>

jdunkerley force-pushed the wip/jd/covariance-182059993 branch from df9297b to e3e75d4 May 30, 2022 13:45

jdunkerley added the CI: Ready to merge label May 30, 2022

hubertp approved these changes May 30, 2022


distribution/lib/Standard/Base/0.0.0-dev/src/Data/Vector.enso

std-bits/base/src/main/java/org/enso/base/statistics/CorrelationStatistics.java

jdunkerley and others added 2 commits May 30, 2022 15:53

Hubert's PR comments.

a4fa1d4

Merge branch 'develop' into wip/jd/covariance-182059993

4e7ea37

mergify bot merged commit 1aa0bb3 into develop May 30, 2022

mergify bot deleted the wip/jd/covariance-182059993 branch May 30, 2022 17:13
