Classification evaluation #188

Merged
merged 18 commits into from
Apr 8, 2022

Member

@kMutagene kMutagene commented Mar 30, 2022

This PR adds several modules, functions and types/classes for classification evaluation. For now I have only focused on binary classification evaluation, but all of this can (and most likely will) be generalized for multi-label classification.

This PR will consist of 4 parts:

  1. Binary confusion matrix

    This can be used for evaluating any kind of binary test or binary classification. See https://en.wikipedia.org/wiki/Confusion_matrix

    • Add the BinaryConfusionMatrix type
    • Add tests
    • Add docs
  2. Multi-label confusion matrix

    A generalization of the binary confusion matrix for any number of labels.

    • Add the MultiLabelConfusionMatrix type
    • Add tests
    • Add docs
  3. Comparison metrics

    An exhaustive collection of metrics that can be derived from confusion matrices.

    • Add the ComparisonMetrics type
    • Macro + micro averaging for multi-label comparisons
    • Static methods for calculating single metrics
    • Create confusion matrices for thresholds of classification scores (general implementation; ROC and precision-recall curves can be created from this)
    • Receiver operating characteristic (ROC) (explicit implementation)
    • Add tests
    • Add docs
  4. Equivalents of the integration methods already implemented in https://github.com/fslaborg/FSharp.Stats/blob/developer/src/FSharp.Stats/Integration/Integration.fs for estimating AUC from (x,y) data (instead of estimating the AUC of a function)

    Until now, the existing approximation methods could only integrate functions; this PR aims to enable integration of (x,y) observations.

    • Minor module refactoring: Left/Right/MidPoint, Trapezoid, Simpson for integrating functions (float -> float)
    • Add equivalents for estimating AUC from observations (Left/Right/MidPoint, Trapezoid, Simpson)
    • Add tests
    • Add docs
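
To illustrate part 3 of the list above, here is a minimal sketch of how a few common metrics can be derived from the four counts of a binary confusion matrix. The record `ConfusionCounts` and the function names are hypothetical and chosen for illustration only; they are not the `BinaryConfusionMatrix` or `ComparisonMetrics` API added by this PR.

```fsharp
// Hypothetical record for illustration; the types added by the PR may look different.
type ConfusionCounts = { TP: int; TN: int; FP: int; FN: int }

// Sensitivity (recall, true positive rate): TP / (TP + FN)
let sensitivity (cm: ConfusionCounts) = float cm.TP / float (cm.TP + cm.FN)

// Specificity (true negative rate): TN / (TN + FP)
let specificity (cm: ConfusionCounts) = float cm.TN / float (cm.TN + cm.FP)

// Precision (positive predictive value): TP / (TP + FP)
let precision (cm: ConfusionCounts) = float cm.TP / float (cm.TP + cm.FP)

// Accuracy: (TP + TN) / total number of observations
let accuracy (cm: ConfusionCounts) =
    float (cm.TP + cm.TN) / float (cm.TP + cm.TN + cm.FP + cm.FN)

// F1 score: 2TP / (2TP + FP + FN)
let f1 (cm: ConfusionCounts) =
    2.0 * float cm.TP / float (2 * cm.TP + cm.FP + cm.FN)

let example = { TP = 3; TN = 4; FP = 1; FN = 2 }

printfn "sensitivity = %.3f" (sensitivity example)
printfn "specificity = %.3f" (specificity example)
printfn "precision   = %.3f" (precision example)
printfn "accuracy    = %.3f" (accuracy example)
printfn "f1          = %.3f" (f1 example)
```

The sketch does not guard against empty denominators (for example, no positive observations at all), which a library implementation would have to handle.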

@kMutagene kMutagene changed the title Add binary confusion matrix with tests Add some classification evaluation things Mar 30, 2022
// get tp tn fp fn (and p/n as combinations) in one iteration
let _ =
Seq.zip actual predictions
|> Seq.iter (fun (truth,pred) ->
Member

Would it be possible to avoid the mutable variables by replacing the Seq.iter with a fold that uses an anonymous record with tp, tn, fp and fn as fields?

Member Author

@kMutagene kMutagene Apr 1, 2022

BinaryConfusionMatrix is now only that: a record of TP/TN/FP/FN, which is used directly as the accumulator. I am not sure how {x with ...} compares to incrementing mutable variables, though.
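
The fold-based approach discussed in this thread could look roughly like the following sketch. The record `Counts` and the function `countOutcomes` are hypothetical stand-ins, and labels are simplified to `bool`; the code merged in the PR may differ.

```fsharp
// Hypothetical accumulator record; stands in for the PR's BinaryConfusionMatrix.
type Counts = { TP: int; TN: int; FP: int; FN: int }

// Count all four outcomes in a single pass, without mutable variables.
let countOutcomes (actual: seq<bool>) (predictions: seq<bool>) : Counts =
    Seq.zip actual predictions
    |> Seq.fold
        (fun (acc: Counts) (truth, pred) ->
            match truth, pred with
            | true, true -> { acc with TP = acc.TP + 1 }   // true positive
            | false, false -> { acc with TN = acc.TN + 1 } // true negative
            | false, true -> { acc with FP = acc.FP + 1 }  // false positive
            | true, false -> { acc with FN = acc.FN + 1 }) // false negative
        { TP = 0; TN = 0; FP = 0; FN = 0 }

let counts =
    countOutcomes
        [ true; true; false; false; true ]
        [ true; false; false; true; true ]

printfn "%A" counts
```

Each `{ acc with ... }` expression allocates a new record, so this trades some allocation for immutability. Whether that matters in practice would need to be measured on realistic input sizes.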

@kMutagene kMutagene changed the title Add some classification evaluation things Classification evaluation Apr 1, 2022
codecov-commenter commented Apr 6, 2022

Codecov Report

Merging #188 (0d03f62) into developer (db03345) will increase coverage by 2.94%.
The diff coverage is 73.08%.

❗ Current head 0d03f62 differs from pull request most recent head 6d66b9a. Consider uploading reports for the commit 6d66b9a to get more accurate results

@@              Coverage Diff              @@
##           developer     #188      +/-   ##
=============================================
+ Coverage      24.57%   27.52%   +2.94%     
=============================================
  Files            118      121       +3     
  Lines          10969    11531     +562     
  Branches        1972     2029      +57     
=============================================
+ Hits            2696     3174     +478     
- Misses          7782     7846      +64     
- Partials         491      511      +20     
| Impacted Files | Coverage Δ |
|---|---|
| tests/FSharp.Stats.Tests/Main.fs | 0.00% <0.00%> (ø) |
| tests/FSharp.Stats.Tests/Testing.fs | 97.68% <ø> (+1.89%) ⬆️ |
| src/FSharp.Stats/Integration/Integration.fs | 55.75% <19.67%> (+5.75%) ⬆️ |
| src/FSharp.Stats/Testing/ComparisonMetrics.fs | 69.07% <69.07%> (ø) |
| tests/FSharp.Stats.Tests/TestExtensions.fs | 90.90% <75.00%> (-9.10%) ⬇️ |
| src/FSharp.Stats/Testing/ConfusionMatrix.fs | 88.67% <88.67%> (ø) |
| tests/FSharp.Stats.Tests/Integration.fs | 96.18% <96.18%> (ø) |
| src/FSharp.Stats/Distributions/Continuous.fs | 16.04% <100.00%> (+0.07%) ⬆️ |
| src/FSharp.Stats/Ops.fs | 3.84% <0.00%> (+3.84%) ⬆️ |
... and 1 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update db03345...6d66b9a.

@kMutagene kMutagene requested a review from ZimmerD April 6, 2022 12:33
@kMutagene kMutagene marked this pull request as ready for review April 6, 2022 12:33
@kMutagene kMutagene requested a review from bvenn April 6, 2022 12:33

Predictors can be compared by comparing the relative frequency distributions of metrics of interest for each possible (or obtained) confidence value.

Two prominent examples are the **Reciever Operating Characteristic (ROC)** or the **Precision-Recall metric**
Member

typo 'Receiver'

Member Author

done


(**
#### ROC curve example

Member

Could you please add one or two sentences on what you can see in the ROC plot and how to interpret the result? It is stated above that it is used to evaluate the validity of the predictor, but a short summary is missing from this chapter (at least for me).

Edit: Add a sentence stating that AUC stands for 'area under the curve', and describe whether it is a quality parameter that should be maximized.

Member Author

done
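
As a rough illustration of the ROC discussion in this thread, the following sketch derives ROC points by using every distinct score as a classification threshold and then estimates the AUC with the trapezoidal rule. The helpers `rocPoints` and `auc` are hypothetical and not part of the PR; the sketch assumes that a higher score means "more likely positive" and that the data contain at least one positive and one negative label.

```fsharp
// Hypothetical helper: one (false positive rate, true positive rate) pair per threshold.
let rocPoints (labels: bool []) (scores: float []) : (float * float) [] =
    let positives = labels |> Array.filter id |> Array.length |> float
    let negatives = float labels.Length - positives
    // Count true and false positives when every score >= threshold is called positive.
    let countAt (threshold: float) =
        Array.zip labels scores
        |> Array.fold
            (fun (tp, fp) (isPositive, score) ->
                if score < threshold then (tp, fp)
                elif isPositive then (tp + 1.0, fp)
                else (tp, fp + 1.0))
            (0.0, 0.0)
    let points =
        scores
        |> Array.distinct
        |> Array.sortDescending
        |> Array.map (fun threshold ->
            let tp, fp = countAt threshold
            (fp / negatives, tp / positives))
    // Start the curve at the origin (threshold above every score).
    Array.append [| (0.0, 0.0) |] points

// Trapezoidal estimate of the area under a curve given as ordered (x, y) points.
let auc (points: (float * float) []) =
    points
    |> Array.pairwise
    |> Array.sumBy (fun ((x0, y0), (x1, y1)) -> (x1 - x0) * (y0 + y1) / 2.0)

let labels = [| true; true; false; true; false; false |]
let scores = [| 0.9; 0.8; 0.7; 0.6; 0.4; 0.2 |]
let points = rocPoints labels scores

printfn "ROC points: %A" points
printfn "AUC: %f" (auc points)
```

An AUC of 1.0 corresponds to a predictor that ranks every positive above every negative, while 0.5 corresponds to random ranking, so larger values are better.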


Instead of integrating a function by sampling the function values in a set interval, we can also calculate the definite integral of (x,y) pairs with these methods.

This may be of use for example for calculating the area under the curve for prediction metrics such as the ROC(Reciever operator characteristic), which yields a distinct set of (Specificity/Fallout) pairs.
Member

typo: Receiver

Member Author

done
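
The idea of integrating (x,y) observations instead of a function, as described in the quoted documentation, can be sketched as follows. The three functions below are hypothetical illustrations and not the `NumericalIntegrationMethod` API of the PR; they assume the observations are sorted by x.

```fsharp
// Left endpoint rule: each interval contributes width * y at its left edge.
let leftEndpoint (obs: (float * float) []) =
    obs |> Array.pairwise |> Array.sumBy (fun ((x0, y0), (x1, _)) -> (x1 - x0) * y0)

// Right endpoint rule: each interval contributes width * y at its right edge.
let rightEndpoint (obs: (float * float) []) =
    obs |> Array.pairwise |> Array.sumBy (fun ((x0, _), (x1, y1)) -> (x1 - x0) * y1)

// Trapezoidal rule: each interval contributes width * mean of both y values.
let trapezoid (obs: (float * float) []) =
    obs |> Array.pairwise |> Array.sumBy (fun ((x0, y0), (x1, y1)) -> (x1 - x0) * (y0 + y1) / 2.0)

// Six observations of f(x) = x^3 on [0,1], i.e. 5 partitions as in the docs.
let observations =
    Array.init 6 (fun i ->
        let x = float i / 5.0
        (x, x * x * x))

printfn "left      = %f" (leftEndpoint observations)  // about 0.16
printfn "right     = %f" (rightEndpoint observations) // about 0.36
printfn "trapezoid = %f" (trapezoid observations)     // about 0.26
```

The exact integral of x^3 on [0,1] is 0.25, so for this increasing function the left rule underestimates, the right rule overestimates, and the trapezoidal rule lies between them.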

We compare the resulting values with the values of the known differential f'(x) = 3x^2, here called g(x)
## Explanation of the methods

In the following chapter, each estimation method is introduced brefly and visualized for the example of $f(x) = x^3$ in the interval $[0,1]$ using 5 partitions.
Member

typo: briefly

Member Author

done

\int_a^b f(x)\,dx \approx \frac{b - a}6 [f(a) + 4f(\frac{a+b}2) + f(b)]
$$

The integral of the whole integration interval is obtained by summing the integral of n partitions.
Member

Is it somehow possible to visualize Simpson's rule, or to describe the strategy?

Member Author

done
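
The strategy behind Simpson's rule, as given by the formula quoted above, is to apply the three-point estimate to each partition and sum the results. A minimal sketch, with the hypothetical function name `simpson` (not the PR's API), might look like this:

```fsharp
// Composite Simpson's rule for a function f on [a,b] with n equally wide partitions.
// Each partition [lo,hi] contributes (hi - lo) / 6 * (f lo + 4 f(mid) + f hi).
let simpson (f: float -> float) (a: float) (b: float) (n: int) =
    let h = (b - a) / float n
    Seq.init n (fun i ->
        let lo = a + float i * h
        let hi = lo + h
        (hi - lo) / 6.0 * (f lo + 4.0 * f ((lo + hi) / 2.0) + f hi))
    |> Seq.sum

let cube (x: float) = x * x * x

// f(x) = x^3 on [0,1] with 5 partitions, as in the documentation example.
let estimate = simpson cube 0.0 1.0 5

printfn "Simpson estimate: %f" estimate
```

Because Simpson's rule integrates polynomials up to degree three exactly, the estimate for x^3 on [0,1] matches the true value 0.25 up to floating point rounding.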

$$

The integral of the whole integration interval is obtained by summing the integral of n partitions.

Member

This is a high quality integration documentation 🚀

let rectWidth = x - xVals[i-1]
(rectWidth*yVals[i])
)
| Midpoint -> fun (observations: (float*float) []) ->
Member

I think there is a problem with the midpoint strategy:

  • As you can see the performed calculations are equal to the Trapezoidal method. The result would always be the same.
  • Since the midpoint rule requires a function to calculate the true value of (a+b)/2 it can only be applied if a function is given. For float*float inputs this is impossible and therefore should be either removed or midpoint should ask for a (f: float -> float) input with interval rather than a (float*float) []

[image attachment]

Member Author

This is only the case for midpoint-rule definite integrals of observations. I added a hint in the XML docs for the midpoint rule.

let expected = 0.25
Expect.floatClose Accuracy.low actual expected "LeftEndpoint did not return the correct result"
)
testCase "Midpoint x^3" (fun _ ->
Member

see comment in Integration.fs
While your estimation is correct for x^3 it won't be correct for -x^3+2x^2 in the interval of [1,2]

[image attachment]

Member Author

see above
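
The point raised in this thread can be made concrete with a small sketch. It assumes, as the reviewer describes, that a midpoint rule for observations averages the two neighbouring y values (which is the same arithmetic as the trapezoidal rule), whereas a midpoint rule for a function evaluates f at the true midpoint. Both helper functions are hypothetical and not taken from the PR.

```fsharp
// Reviewer's example function: f(x) = -x^3 + 2x^2
let f (x: float) = 2.0 * x * x - x * x * x

// Midpoint rule for a function: evaluate f at the centre of each partition.
let midpointOfFunction (f: float -> float) (a: float) (b: float) (n: int) =
    let h = (b - a) / float n
    Seq.init n (fun i -> h * f (a + (float i + 0.5) * h)) |> Seq.sum

// "Midpoint" for observations (assumed behaviour): the y value at the midpoint is
// unknown, so the mean of the neighbouring y values is used instead.
let midpointOfObservations (obs: (float * float) []) =
    obs
    |> Array.pairwise
    |> Array.sumBy (fun ((x0, y0), (x1, y1)) -> (x1 - x0) * (y0 + y1) / 2.0)

let n = 4

// n + 1 equally spaced observations of f on [1,2].
let observations =
    Array.init (n + 1) (fun i ->
        let x = 1.0 + float i / float n
        (x, f x))

// Exact integral of f on [1,2], from the antiderivative -x^4/4 + 2x^3/3.
let exact = 11.0 / 12.0

printfn "exact                  = %f" exact
printfn "midpoint (function)    = %f" (midpointOfFunction f 1.0 2.0 n)
printfn "midpoint (observations) = %f" (midpointOfObservations observations)
```

On [1,2] this function is concave, so the observation-based estimate falls below the exact value while the function-based midpoint estimate lies above it; the two variants are therefore not interchangeable.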

) =
fun (data: (float*float) []) -> NumericalIntegrationMethod.integrateObservations method data |> Seq.sum


module DefiniteIntegral =
Member

This can be removed. Make sure to replace any other references.

Member Author

done

@bvenn bvenn merged commit dd76c80 into developer Apr 8, 2022
@bvenn bvenn deleted the mlstuff branch April 8, 2022 13:53
4 participants