
#330 add checks for metrics #358

Conversation

@JumpingDino (Contributor) commented Sep 12, 2023

Description

Checks are useful for metrics calculation. It's a good idea to ensure the vectors (y_pred, y_probs) are the same size and don't contain NaNs or infs. For this, two functions were created (sketched below):

  • check_arrays_length
  • check_array_nan

Fixes #330
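For illustration, a minimal sketch of what these two checks might look like at this stage (the exact signatures and error messages in the merged code may differ, and the NaN check evolves later in this thread):

```python
import numpy as np
from numpy.typing import NDArray


def check_arrays_length(*arrays: NDArray) -> None:
    """Raise if the arrays do not all share the same length."""
    lengths = {len(array) for array in arrays}
    if len(lengths) > 1:
        raise ValueError("There are arrays with different lengths.")


def check_array_nan(array: NDArray) -> None:
    """Raise if the array contains any NaN value."""
    if np.any(np.isnan(array)):
        raise ValueError("The array contains NaN values.")
```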

Type of change

Adds some checks in utils.py; these functions are then used by the metrics in metrics.py.

How Has This Been Tested?

Creation of tests in test_utils.py

  • test_length
  • test_inf_values
  • test_nan_values

Checklist

  • I have read the contributing guidelines
  • I have updated the HISTORY.rst and AUTHORS.rst files
  • Linting passes successfully: make lint
  • Typing passes successfully: make type-check
  • Unit tests pass successfully: make tests
  • Coverage is 100%: make coverage
  • Documentation builds successfully: make doc

@JumpingDino (Contributor Author)

When I run the coverage, I get the following results:
[screenshot: coverage report]

However, when I look at the data in test_calibration.py, there are some NaNs in the data:
[screenshot: test data containing NaNs]

  1. In what cases should I expect to have data with NaNs?
  2. Should I refactor these np.nan values?
  3. What is your recommendation?

@LacombeLouis (Collaborator)

Maybe we need to check that there are no NaNs per row? I think that would be a good start! 😄
We should always expect to have NaNs in the calibration (Top-Label), since we only re-calibrate the top label.

@JumpingDino (Contributor Author)

Awesome @LacombeLouis!! Thanks for the idea.
I implemented a function that checks whether the array contains only NaNs.
I split the function check_array_nan into check_array_nan and check_array_inf.
Thanks for the support, and feel free to give feedback on the code :)
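A rough sketch of that split, assuming float NumPy arrays (the exact signatures and messages in the merged code may differ):

```python
import numpy as np
from numpy.typing import NDArray


def check_array_nan(array: NDArray) -> None:
    """Raise only when the array is made up entirely of NaNs,
    since partial NaNs are expected in Top-Label calibration."""
    if np.all(np.isnan(array)):
        raise ValueError("The array contains only NaN values.")


def check_array_inf(array: NDArray) -> None:
    """Raise if the array contains any infinite value."""
    if np.any(np.isinf(array)):
        raise ValueError("The array contains infinite values.")
```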

@codecov-commenter commented Sep 24, 2023

Codecov Report

All modified lines are covered by tests ✅

Files Coverage Δ
mapie/metrics.py 100.00% <100.00%> (ø)
mapie/tests/test_metrics.py 100.00% <100.00%> (ø)
mapie/tests/test_utils.py 100.00% <100.00%> (ø)
mapie/utils.py 100.00% <100.00%> (ø)


@vincentblot28 (Collaborator) left a comment

Thank you for the PR! Great job :)

@LacombeLouis (Collaborator) left a comment

Thank you for the PR! It's great!

Thinking out loud here: do you think it would be smart to add a check for the metrics where we expect a score between 0 and 1? @JumpingDino @vincentblot28 @thibaultcordier
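For instance, a hypothetical range check might look like this (the name and message are illustrative only, not part of the PR):

```python
import numpy as np
from numpy.typing import NDArray


def check_array_between_zero_and_one(array: NDArray) -> None:
    """Hypothetical check: raise if any value falls outside [0, 1].
    Note that NaNs would pass silently, since comparisons with NaN are False."""
    if np.any((array < 0) | (array > 1)):
        raise ValueError("Values must be between 0 and 1.")
```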

@thibaultcordier (Collaborator) left a comment

Excellent PR, thank you for your contribution! A few suggestions to refine your code:

  • Replace all NDArray with ArrayLike.
  • Delete some line breaks.

You can use GitHub's suggestion feature to commit/push directly if you wish.

@JumpingDino (Contributor Author)

Hi everyone, with the current code we cover all the steps; however, when I change the check_arrays_length function to use the ArrayLike type, the type check throws the errors mentioned above.

  1. Could someone explain the major differences between ArrayLike and NDArray as argument types, please? Maybe you can give some references and/or ideas on the benefits of using the ArrayLike type.
  2. @LacombeLouis I think it makes sense to check the boundaries of the elements of the array for classification tasks!
  3. Do you think it makes sense to implement these sanity checks (check_nan, inf, length) in a single function? That way I'd expect less verbosity in our metrics.py functions; what do you think?

@thibaultcordier (Collaborator) left a comment

Hi @JumpingDino,

  1. Could someone explain the major differences between ArrayLike and NDArray as argument types, please? Maybe you can give some references and/or ideas on the benefits of using the ArrayLike type.

ArrayLike: https://numpy.org/devdocs/reference/typing.html#numpy.typing.ArrayLike

  • A Union representing objects that can be coerced into an ndarray.

NDArray: https://numpy.org/devdocs/reference/typing.html#numpy.typing.NDArray

  • Can be used during runtime for typing arrays with a given dtype and unspecified shape.

ArrayLike is more generic than NDArray, except that an ArrayLike must be cast to NDArray before use. In the end, we use the parameters as NumPy arrays. Sometimes you may have to manipulate pandas-type arrays, which is why we want to account for all types of array.

Looking at the error you encountered, I think we can assume the parameters are already of type NDArray (if this is not the case, we can simply cast them with NumPy and use the check_array function to verify).
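A small illustration of that casting pattern (a hypothetical function, just to show the idea):

```python
import numpy as np
from numpy.typing import ArrayLike, NDArray


def as_checked_array(values: ArrayLike) -> NDArray:
    """Accept any array-like input (list, tuple, pandas Series, ndarray)
    and cast it to a concrete NumPy array before running NumPy checks."""
    return np.asarray(values)


as_checked_array([0.1, 0.9, 0.5])            # plain Python list
as_checked_array(np.array([0.1, 0.9, 0.5]))  # already an ndarray
```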

I've suggested a modification. You can add it directly on GitHub (there is a feature that lets you commit suggestions directly in the browser).

  1. Do you think it makes sense to implement these sanity checks (check_nan, inf, length) in a single function? That way I'd expect less verbosity in our metrics.py functions; what do you think?

Indeed, it could make sense to implement them in a single function. But as it stands, your contribution fulfils the desired objective. You are free to make this change if you wish.
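If the checks were combined, a hypothetical wrapper could look like this (check_arrays is an illustrative name; the individual check functions are the ones introduced in this PR):

```python
from numpy.typing import NDArray

from mapie.utils import check_array_inf, check_array_nan, check_arrays_length


def check_arrays(*arrays: NDArray) -> None:
    """Hypothetical single entry point running all sanity checks."""
    check_arrays_length(*arrays)
    for array in arrays:
        check_array_nan(array)
        check_array_inf(array)
```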

JumpingDino and others added 4 commits on October 3, 2023:

  • fixing typos
  • fixing docstring from check_array_nan
  • fixing docstring from check_array_inf
  • fixing docstring from check_arrays_length

(All four co-authored by Thibault Cordier <124613154+thibaultcordier@users.noreply.github.com>.)
@JumpingDino (Contributor Author) commented Oct 6, 2023

  • I thought about adding checks that values lie between 0 and 1, but this would require knowing which metrics are classification metrics, and I'm a bit worried about scenarios where this doesn't hold. Maybe we could just finish this issue and revisit with more advanced checks later.

I think we're good to go, right? What do you think?


I'm available for any refinements! And thanks for the knowledge :)

@thibaultcordier (Collaborator)

I thought about adding checks that values lie between 0 and 1, but this would require knowing which metrics are classification metrics, and I'm a bit worried about scenarios where this doesn't hold. Maybe we could just finish this issue and revisit with more advanced checks later.

I think we're good to go, right? What do you think?

I agree with you, we could finish this PR and open a new issue for more advanced checks.

I'm available for any refinements! And thanks for the knowledge :)

Yes, just one more thing: can you add a new line in HISTORY.rst specifying that you are adding new checks, and a new line in CONTRIBUTING.rst adding your name (if you wish)? I'll approve your PR after that :)

@thibaultcordier merged commit 614293e into scikit-learn-contrib:master on Oct 9, 2023
6 checks passed