Classification evaluation #188

kMutagene · 2022-03-30T15:29:25Z

This PR adds several modules, functions and types/classes for classification evaluation. For now I have focused only on binary classification evaluation, but all of this can (and most likely will) be generalized for multi-label classification.

This PR will consist of 4 parts:

Binary confusion matrix

This can be used for evaluating any kind of binary test or binary classification. See https://en.wikipedia.org/wiki/Confusion_matrix
- Add the BinaryConfusionMatrix type
- Add tests
- Add docs
Multi-label confusion matrix

A generalization of the binary confusion matrix for any number of labels.
- Add the MultiLabelConfusionMatrix type
- Add tests
- Add docs
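
A minimal sketch of the idea, not the `MultiLabelConfusionMatrix` type from this PR: rows hold the actual labels, columns the predicted labels, and a one-vs-rest binary count can be read off for any single label. The function names `multiLabelCounts` and `oneVsRest` and the example labels are assumptions made for illustration.

```fsharp
// Hypothetical sketch: rows are actual labels, columns are predicted labels.
let multiLabelCounts (labels: string []) (actual: string []) (predictions: string []) =
    let index = labels |> Array.mapi (fun i l -> l, i) |> Map.ofArray
    let m = Array2D.create labels.Length labels.Length 0
    Array.zip actual predictions
    |> Array.iter (fun (truth, pred) ->
        let r = index.[truth]
        let c = index.[pred]
        m.[r, c] <- m.[r, c] + 1)
    m

// One-vs-rest counts (TP, TN, FP, FN) for the label at position i.
let oneVsRest (m: int [,]) (i: int) =
    let n = Array2D.length1 m
    let total = m |> Seq.cast<int> |> Seq.sum
    let tp = m.[i, i]
    let fn = (Seq.sum [ for c in 0 .. n - 1 -> m.[i, c] ]) - tp // rest of row i
    let fp = (Seq.sum [ for r in 0 .. n - 1 -> m.[r, i] ]) - tp // rest of column i
    tp, total - tp - fn - fp, fp, fn

let cm =
    multiLabelCounts
        [| "A"; "B"; "C" |]
        [| "A"; "A"; "B"; "C"; "C" |]
        [| "A"; "B"; "B"; "C"; "A" |]

let countsForA = oneVsRest cm 0 // (1, 2, 1, 1) as (TP, TN, FP, FN)
```
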
Comparison metrics

Exhaustive collection of metrics that can be derived from confusion matrices
- Add the ComparisonMetrics type
- Macro + micro averaging for multi-label comparisons
- Static methods for calculating single metrics
- Create confusion matrices for thresholds of classification scores (general implementation; ROC and precision-recall can be created from this)
- Receiver operating characteristic (ROC) (explicit implementation)
- Add tests
- Add docs
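
As a rough illustration of how metrics and ROC points follow from confusion matrices at score thresholds, here is a sketch under stated assumptions. The `Counts` record and the functions `countsAt` and `rocPoints` are hypothetical names, not the `ComparisonMetrics` API, and a score greater than or equal to the threshold is assumed to count as a positive prediction.

```fsharp
// Hypothetical names throughout; not the ComparisonMetrics API.
type Counts = { TP: float; TN: float; FP: float; FN: float }

let sensitivity (c: Counts) = c.TP / (c.TP + c.FN) // true positive rate (recall)
let specificity (c: Counts) = c.TN / (c.TN + c.FP) // true negative rate
let fallOut (c: Counts) = c.FP / (c.FP + c.TN)     // false positive rate
let precision (c: Counts) = c.TP / (c.TP + c.FP)

// Count outcomes when every score >= threshold is predicted positive.
let countsAt (threshold: float) (labelled: (bool * float) []) =
    let count f = labelled |> Array.filter f |> Array.length |> float
    { TP = count (fun (truth, score) -> truth && score >= threshold)
      TN = count (fun (truth, score) -> not truth && score < threshold)
      FP = count (fun (truth, score) -> not truth && score >= threshold)
      FN = count (fun (truth, score) -> truth && score < threshold) }

// One (fall-out, sensitivity) pair per distinct score: the ROC points.
let rocPoints (labelled: (bool * float) []) =
    labelled
    |> Array.map snd
    |> Array.distinct
    |> Array.sortDescending
    |> Array.map (fun t ->
        let c = countsAt t labelled
        fallOut c, sensitivity c)

let roc =
    rocPoints
        [| true, 0.9; true, 0.8; false, 0.7; true, 0.6; false, 0.4; false, 0.2 |]
```

With this toy data the points run from fall-out 0 at the strictest threshold to (1, 1) at the most lenient one.
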
Equivalents of the integration methods already implemented in https://github.com/fslaborg/FSharp.Stats/blob/developer/src/FSharp.Stats/Integration/Integration.fs for estimating AUC from (x,y) data (instead of estimating AUC of a function)

Until now, the approximation methods could only integrate functions; this PR aims to enable integration of (x,y) observations as well.
- Minor module refactoring: Left/Right/MidPoint, Trapezoid, Simpson for integrating functions (float -> float)
- Add Equivalents for estimating AUC from observations (Left/Right/MidPoint, Trapezoid, Simpson)
- Add tests
- Add docs
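
To make the difference concrete, here is a minimal sketch of estimating an area from (x,y) observations with the trapezoidal rule. `trapezoidAuc` is a hypothetical helper, not the function added to `Integration.fs`, and the observations are assumed to be sorted by x.

```fsharp
// Hypothetical helper, not the FSharp.Stats API: trapezoidal estimate of the
// area under a curve given as (x, y) observations sorted by x.
let trapezoidAuc (observations: (float * float) []) =
    observations
    |> Array.pairwise
    |> Array.sumBy (fun ((x0, y0), (x1, y1)) -> (x1 - x0) * (y0 + y1) / 2.0)

// Observations sampled from f(x) = x^3 on [0,1] using 5 partitions.
let obs =
    Array.init 6 (fun i ->
        let x = float i / 5.0
        x, x ** 3.0)

let auc = trapezoidAuc obs // roughly 0.26; the exact integral is 0.25
```
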

ZimmerD · 2022-03-31T09:24:03Z

src/FSharp.Stats/Testing/ConfusionMatrix.fs

+    // get tp tn fp fn (and p/n as combinations) in one iteration
+    let _ = 
+        Seq.zip actual predictions
+        |> Seq.iter (fun (truth,pred) ->


Would it be possible to omit the mutable variables by replacing the Seq.iter with a fold that uses an anonymous record with tp, tn, fp and fn as fields?

BinaryConfusionMatrix is now only that: a record of TP/TN/FP/FN that is used directly as the accumulator. I'm not sure how {x with ...} compares to incrementing mutable variables, though.
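
The fold-based approach discussed here could look roughly like the sketch below. `BinaryCounts` and `countOutcomes` are hypothetical names that only mirror the TP/TN/FP/FN shape described above; the actual `BinaryConfusionMatrix` definition in this PR may differ.

```fsharp
// Hypothetical record mirroring the TP/TN/FP/FN shape; not the PR's type.
type BinaryCounts = { TP: int; TN: int; FP: int; FN: int }

// Fold over (truth, prediction) pairs, using the record itself as accumulator.
let countOutcomes (actual: seq<bool>) (predictions: seq<bool>) =
    Seq.zip actual predictions
    |> Seq.fold
        (fun acc (truth, pred) ->
            match truth, pred with
            | true, true -> { acc with TP = acc.TP + 1 }
            | false, false -> { acc with TN = acc.TN + 1 }
            | false, true -> { acc with FP = acc.FP + 1 }
            | true, false -> { acc with FN = acc.FN + 1 })
        { TP = 0; TN = 0; FP = 0; FN = 0 }

let counts =
    countOutcomes
        [ true; true; false; false; true ]
        [ true; false; false; true; true ]
// counts = { TP = 2; TN = 1; FP = 1; FN = 1 }
```

Each `{ acc with ... }` allocates a new record per element, which is the trade-off against mutable counters raised in the reply.
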


codecov-commenter · 2022-04-06T07:23:53Z

Codecov Report

Merging #188 (0d03f62) into developer (db03345) will increase coverage by 2.94%.
The diff coverage is 73.08%.

❗ Current head 0d03f62 differs from pull request most recent head 6d66b9a. Consider uploading reports for the commit 6d66b9a to get more accurate results

@@              Coverage Diff              @@
##           developer     #188      +/-   ##
=============================================
+ Coverage      24.57%   27.52%   +2.94%     
=============================================
  Files            118      121       +3     
  Lines          10969    11531     +562     
  Branches        1972     2029      +57     
=============================================
+ Hits            2696     3174     +478     
- Misses          7782     7846      +64     
- Partials         491      511      +20

Impacted Files	Coverage Δ
tests/FSharp.Stats.Tests/Main.fs	`0.00% <0.00%> (ø)`
tests/FSharp.Stats.Tests/Testing.fs	`97.68% <ø> (+1.89%)`	⬆️
src/FSharp.Stats/Integration/Integration.fs	`55.75% <19.67%> (+5.75%)`	⬆️
src/FSharp.Stats/Testing/ComparisonMetrics.fs	`69.07% <69.07%> (ø)`
tests/FSharp.Stats.Tests/TestExtensions.fs	`90.90% <75.00%> (-9.10%)`	⬇️
src/FSharp.Stats/Testing/ConfusionMatrix.fs	`88.67% <88.67%> (ø)`
tests/FSharp.Stats.Tests/Integration.fs	`96.18% <96.18%> (ø)`
src/FSharp.Stats/Distributions/Continuous.fs	`16.04% <100.00%> (+0.07%)`	⬆️
src/FSharp.Stats/Ops.fs	`3.84% <0.00%> (+3.84%)`	⬆️
... and 1 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update db03345...6d66b9a. Read the comment docs.

bvenn · 2022-04-07T08:42:32Z

docs/ComparisonMetrics.fsx

+
+Predictors can be compared by comparing the relative frequency distributions of metrics of interest for each possible (or obtained) confidence value.
+
+Two prominent examples are the **Reciever Operating Characteristic (ROC)** or the **Precision-Recall metric**


typo 'Receiver'

bvenn · 2022-04-07T08:45:09Z

docs/ComparisonMetrics.fsx

+
+(**
+#### ROC curve example
+


Could you please add one or two sentences on what the ROC plot shows and how to interpret the result? It is stated above that it is used to evaluate the validity of the predictor, but this chapter lacks a short summary (at least for me).

Edit: Add a sentence stating that AUC stands for 'area under the curve', and describe whether it is a quality parameter that should be maximized.

bvenn · 2022-04-07T08:50:47Z

docs/Integration.fsx

+
+Instead of integrating a function by sampling the function values in a set interval, we can also calculate the definite integral of (x,y) pairs with these methods.
+
+This may be of use for example for calculating the area under the curve for prediction metrics such as the ROC(Reciever operator characteristic), which yields a distinct set of (Specificity/Fallout) pairs.


typo: Receiver

bvenn · 2022-04-07T08:52:17Z

docs/Integration.fsx

-We compare the resulting values with the values of the known differential f'(x) = 3x^2, here called g(x)
+## Explanation of the methods
+
+In the following chapter, each estimation method is introduced brefly and visualized for the example of $f(x) = x^3$ in the interval $[0,1]$ using 5 partitions.


typo: briefly

bvenn · 2022-04-07T08:57:12Z

docs/Integration.fsx

+\int_a^b f(x)\,dx \approx \frac{b - a}6 [f(a) + 4f(\frac{a+b}2) + f(b)]
+$$
+
+The integral of the whole integration interval is obtained by summing the integral of n partitions.


Is it somehow possible to visualize Simpson's rule, or to describe the strategy?
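
One way to describe the strategy: Simpson's rule fits a parabola through the two endpoints and the midpoint of each partition and integrates that parabola, which gives the formula quoted in the diff above. The sketch below applies that formula per partition and sums the results; `simpson` is a hypothetical helper, not the implementation in `Integration.fs`.

```fsharp
// Hypothetical helper: applies (b - a)/6 * [f(a) + 4 f((a+b)/2) + f(b)]
// to each of n equal partitions of [a, b] and sums the partition integrals.
let simpson (f: float -> float) (a: float) (b: float) (n: int) =
    let h = (b - a) / float n
    Seq.init n (fun i ->
        let lo = a + float i * h
        let hi = lo + h
        h / 6.0 * (f lo + 4.0 * f ((lo + hi) / 2.0) + f hi))
    |> Seq.sum

// f(x) = x^3 on [0,1] with 5 partitions, as in the docs example.
let estimate = simpson (fun x -> x ** 3.0) 0.0 1.0 5
```

Because the rule is exact for polynomials up to degree three, the estimate for x^3 matches the true value 0.25 up to floating-point rounding.
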

bvenn · 2022-04-07T09:03:03Z

docs/Integration.fsx

+$$
+
+The integral of the whole integration interval is obtained by summing the integral of n partitions.
+


This is a high quality integration documentation 🚀

bvenn · 2022-04-07T09:24:23Z

src/FSharp.Stats/Integration/Integration.fs

+                    let rectWidth = x - xVals[i-1]
+                    (rectWidth*yVals[i])
+            )
+        | Midpoint -> fun (observations: (float*float) []) ->


I think there is a problem with the midpoint strategy:

As you can see the performed calculations are equal to the Trapezoidal method. The result would always be the same.

Since the midpoint rule requires a function to calculate the true value at (a+b)/2, it can only be applied if a function is given. For float*float inputs this is impossible, so it should either be removed, or midpoint should ask for a (f: float -> float) input with an interval rather than a (float*float) [].

This is only the case for midpoint rule definite integrals for observations. I added a hint in the XML docs for the midpoint rule.

bvenn · 2022-04-07T09:42:56Z

tests/FSharp.Stats.Tests/Integration.fs

+                let expected = 0.25
+                Expect.floatClose Accuracy.low actual expected "LeftEndpoint did not return the correct result"
+            )
+            testCase "Midpoint x^3" (fun _ ->


See the comment in Integration.fs.
While your estimation is correct for x^3, it won't be correct for -x^3+2x^2 in the interval [1,2].
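
The difference can be sketched as follows for -x^3+2x^2 on [1,2] with 5 partitions. Both helpers are hypothetical and only illustrate the point made in the review: with observations, only the y values at the partition borders are known, so a "midpoint" built from their mean is the trapezoidal rule.

```fsharp
// Midpoint rule for a function: evaluates f at the true centre of each partition.
let midpointOfFunction (f: float -> float) (a: float) (b: float) (n: int) =
    let h = (b - a) / float n
    Seq.init n (fun i -> h * f (a + (float i + 0.5) * h))
    |> Seq.sum

// "Midpoint" for observations: the mean of the two border y values stands in
// for f((a+b)/2), which is exactly the trapezoidal rule.
let midpointOfObservations (observations: (float * float) []) =
    observations
    |> Array.pairwise
    |> Array.sumBy (fun ((x0, y0), (x1, y1)) -> (x1 - x0) * (y0 + y1) / 2.0)

let f (x: float) = 2.0 * x * x - x * x * x // -x^3 + 2x^2

let observations =
    Array.init 6 (fun i ->
        let x = 1.0 + float i * 0.2
        x, f x)

let fromFunction = midpointOfFunction f 1.0 2.0 5           // roughly 0.925
let fromObservations = midpointOfObservations observations // roughly 0.9
```

The exact integral is 11/12 (about 0.9167), so the two estimates differ from each other and from the true value.
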

bvenn · 2022-04-07T14:22:48Z

src/FSharp.Stats/Integration/Integration.fs

+    ) = 
+        fun (data: (float*float) []) -> NumericalIntegrationMethod.integrateObservations method data |> Seq.sum
+
+
 module DefiniteIntegral =


This can be removed. Make sure to replace any other references.

kMutagene mentioned this pull request Mar 30, 2022

update to project-based build pipeline, use .net 6.0 #189

Merged

2 tasks

kMutagene changed the title ~~Add binary confusion matrix with tests~~ Add some classification evaluation things Mar 30, 2022

kMutagene force-pushed the mlstuff branch from 2cb46e3 to a8fd155 Compare March 31, 2022 08:21

ZimmerD reviewed Mar 31, 2022

kMutagene changed the title ~~Add some classification evaluation things~~ Classification evaluation Apr 1, 2022

kMutagene added 13 commits April 6, 2022 09:05

Add binary confusion matrix with tests

b7e8d7f

Rework integration module, add observation integration and tests

dc3ee45

Split binary confusion matrix and comparison metrics

00262ac

add multi label confusion matrix

bbe7151

Refactor comparison metrics as record type, add macro/micro averaging for multi label comparisons

05036d2

Add single calculation methods for all comparisn metrics

e9972cf

fix one-vs-rest binary confusion matrix creation from multi label cm

1ee2ae3

add more tests

41cd736

Add possibility to return partition integrals, fix naming of Midpoint method

38ade2c

Add integration docs, reorder sidebar

458f42f

Add macro/micro-average tests for comparison metrics of multi label confusion matrices

a97d882

Add threshold map and roc creation for both binary and multilabel predictions

e46b86b

Add Comparison metrics docs

568266b

kMutagene force-pushed the mlstuff branch from c3f23bc to 568266b Compare April 6, 2022 07:11

Add tests for metric threshold map functions

0ea832f

kMutagene requested a review from ZimmerD April 6, 2022 12:33

kMutagene marked this pull request as ready for review April 6, 2022 12:33

kMutagene requested a review from bvenn April 6, 2022 12:33

bvenn requested changes Apr 7, 2022

bvenn reviewed Apr 7, 2022

kMutagene added 4 commits April 8, 2022 12:40

Integration: better naming, more optimizations

b000799

doc typo fixes, add simpsons rule explanation

0d03f62

ADD ROC description

b3e1e49

Finish up comparison metrics docs

6d66b9a

bvenn approved these changes Apr 8, 2022

bvenn merged commit dd76c80 into developer Apr 8, 2022

bvenn deleted the mlstuff branch April 8, 2022 13:53
