Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH Use Array API in mean_tweedie_deviance #28106

Merged
merged 17 commits into from
May 8, 2024

Conversation

lithomas1
Copy link
Contributor

@lithomas1 lithomas1 commented Jan 12, 2024

Reference Issues/PRs

xref #26024

Inspired by #27904

What does this implement/fix? Explain your changes.

Array API support in mean_tweedie_deviance

Any other comments?

Copy link

github-actions bot commented Jan 12, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: d951d51. Link to the linter CI: here

@lithomas1 lithomas1 marked this pull request as ready for review January 12, 2024 04:33
@lithomas1 lithomas1 changed the title ENH: Use Array API in mean_tweedie_deviance ENH Use Array API in mean_tweedie_deviance Jan 12, 2024
@ogrisel
Copy link
Member

ogrisel commented Jan 18, 2024

Thanks very much for the PR. Let's a first wait for a second review on the r2_score PR (#27904) before reviewing yours to avoid cross/redundant discussions between PRs.

@fcharras
Copy link
Contributor

Thanks for the work @lithomas1 , with #27904 now merged we're resuming reviews on the array api side. If you are still available to iterate on the pr could you pull main on your branch please ?

sklearn/metrics/tests/test_common.py Outdated Show resolved Hide resolved
doc/modules/array_api.rst Outdated Show resolved Hide resolved
sklearn/metrics/_regression.py Outdated Show resolved Hide resolved
@lithomas1
Copy link
Contributor Author

@fcharras

This should be ready for another review.

Sorry for the slow turnaround on this!

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update. Once the above and below comments are addressed, +1 on my end.

doc/whats_new/v1.5.rst Outdated Show resolved Hide resolved
@ogrisel
Copy link
Member

ogrisel commented Apr 10, 2024

BTW, I have run the tests of this PR with torch and cupy on cuda host and torch on a MPS host and everything is green.

@lithomas1
Copy link
Contributor Author

lithomas1 commented Apr 14, 2024

OK, so this PR now fails with errors like

_________ [doctest] sklearn.metrics._regression.mean_poisson_deviance __________
[gw0] linux -- Python 3.11.8 /usr/share/miniconda/envs/testvenv/bin/python
1424     -------
1425     loss : float
1426         A non-negative floating point value (the best value is 0.0).
1427 
1428     Examples
1429     --------
1430     >>> from sklearn.metrics import mean_poisson_deviance
1431     >>> y_true = [2, 0, 1, 4]
1432     >>> y_pred = [0.5, 0.5, 2., 2.]
1433     >>> mean_poisson_deviance(y_true, y_pred)
Expected:
    1.4260...
Got:
    array(1.42601513)

A solution would be to call float on the result from _array_api._average,
(but this would not be ideal for e.g. dask, which would prefer to stay lazy if possible).

An alternative solution would be either to update the doctests (and accept this API change),
or we can patch the numpy version of _array_api._average only here to not return an array (i.e. remove the asarray call).
https://github.com/scikit-learn/scikit-learn/blame/3ee60a720aab3598668af3a3d7eb01d6958859be/sklearn/utils/_array_api.py#L598

EDIT:

I've just added the call to float on the result (to match other usages of _average), given that the dask stuff is on hold for now.

@lithomas1
Copy link
Contributor Author

@ogrisel @betatim

This should be ready for another look now.

@ogrisel
Copy link
Member

ogrisel commented Apr 23, 2024

I've just added the call to float on the result (to match other usages of _average), given that the dask stuff is on hold for now.

Yes and this is consistent with what we do for other 1d regression scores (e.g. r2_score).

@ogrisel
Copy link
Member

ogrisel commented Apr 23, 2024

I am still +1 for merge BTW.

doc/whats_new/v1.5.rst Outdated Show resolved Hide resolved
@ogrisel
Copy link
Member

ogrisel commented May 7, 2024

Maybe @OmarManzoor would be interested in reviewing this one as well :)

Copy link
Contributor

@OmarManzoor OmarManzoor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @lithomas1. Otherwise looks good.

sklearn/metrics/_regression.py Outdated Show resolved Hide resolved
sklearn/metrics/_regression.py Outdated Show resolved Hide resolved
@lithomas1
Copy link
Contributor Author

Updated.

Thanks for the reviews.

Copy link
Contributor

@OmarManzoor OmarManzoor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @lithomas1

@OmarManzoor OmarManzoor merged commit e12f192 into scikit-learn:main May 8, 2024
30 checks passed
@lithomas1 lithomas1 deleted the tweedie-array-api branch May 8, 2024 21:40
@jeremiedbb jeremiedbb mentioned this pull request May 20, 2024
14 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants