Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH Add Array API compatibility to mean_absolute_error #27736

Merged
merged 37 commits into from
May 15, 2024

Conversation

EdAbati
Copy link
Contributor

@EdAbati EdAbati commented Nov 6, 2023

Reference Issues/PRs

Towards #26024

What does this implement/fix? Explain your changes.

It makes the mean_absolute_error implementation compatible and tested with the Array API.

Not sure if it's the best approach, but I have converted the np.average implementation so that it is compatible with the Array API. Is there a better way? (I will fix add the tests to make codecov happy, if you agree to have the _average function)

cc @betatim @ogrisel

Copy link

github-actions bot commented Nov 6, 2023

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 0fa23d3. Link to the linter CI: here

@EdAbati EdAbati marked this pull request as ready for review November 12, 2023 15:30
Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. Could you please expand the tests to make sure that we cover the raised exceptions?

More details below.

sklearn/metrics/_regression.py Outdated Show resolved Hide resolved
sklearn/metrics/tests/test_common.py Outdated Show resolved Hide resolved
sklearn/metrics/tests/test_common.py Outdated Show resolved Hide resolved
sklearn/metrics/tests/test_common.py Outdated Show resolved Hide resolved
sklearn/utils/_array_api.py Outdated Show resolved Hide resolved
sklearn/utils/_array_api.py Outdated Show resolved Hide resolved
sklearn/utils/tests/test_array_api.py Outdated Show resolved Hide resolved
Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! One more suggestion for a more informative message. The test needs to be updated accordingly.

BTW, I think that numpy error message would benefit from a similar treatment upstream.

sklearn/utils/_array_api.py Outdated Show resolved Hide resolved
@EdAbati
Copy link
Contributor Author

EdAbati commented Nov 15, 2023

BTW, I think that numpy error message would benefit from a similar treatment upstream.

I will make an issue on numpy to see if they are intrested in updating the messages :)

Also BTW I added a set of tests for multioutput='raw_values' to check when the output is an array and not a float: 48bd567

@ogrisel
Copy link
Member

ogrisel commented Nov 22, 2023

I launched the Array API tests with pytorch with an MPS device on my local laptop and with pytorch on a machine with a cuda device and all the mean absolute error Array API compliance tests pass.

However I observed some failures with cupy:

FAILED sklearn/metrics/tests/test_common.py::test_array_api_compliance[mean_absolute_error-check_array_api_regression_metric-cupy.array_api-None-None] - AttributeError: module 'cupy.array_api' has no attribute 'swapaxes'
FAILED sklearn/metrics/tests/test_common.py::test_array_api_compliance[mean_absolute_error-check_array_api_multioutput_regression_metric-cupy-None-None] - TypeError: Implicit conversion to a NumPy array is not allowed. Please use `.get()` to construct a NumPy array explicitly.
FAILED sklearn/metrics/tests/test_common.py::test_array_api_compliance[mean_absolute_error-check_array_api_multioutput_regression_metric-cupy.array_api-None-None] - ValueError: Unsupported device 'cpu'

Here are the details of the tracebacks:

=============================================================== FAILURES ===============================================================
______________ test_array_api_compliance[mean_absolute_error-check_array_api_regression_metric-cupy.array_api-None-None] _______________

metric = <function mean_absolute_error at 0x7ff1b90f0b80>, array_namespace = 'cupy.array_api', device = None, dtype = None
check_func = <function check_array_api_regression_metric at 0x7ff1714e4280>

    @pytest.mark.parametrize(
        "array_namespace, device, dtype", yield_namespace_device_dtype_combinations()
    )
    @pytest.mark.parametrize("metric, check_func", yield_metric_checker_combinations())
    def test_array_api_compliance(metric, array_namespace, device, dtype, check_func):
>       check_func(metric, array_namespace, device, dtype)

sklearn/metrics/tests/test_common.py:1873: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
sklearn/metrics/tests/test_common.py:1810: in check_array_api_regression_metric
    check_array_api_metric(
sklearn/metrics/tests/test_common.py:1747: in check_array_api_metric
    metric_xp = metric(y_true_xp, y_pred_xp, sample_weight=sample_weight)
sklearn/utils/_param_validation.py:214: in wrapper
    return func(*args, **kwargs)
sklearn/metrics/_regression.py:214: in mean_absolute_error
    output_errors = _average(xp.abs(y_pred - y_true), weights=sample_weight, axis=0)
sklearn/utils/_array_api.py:651: in _average
    weights = xp.swapaxes(weights, -1, axis)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <sklearn.utils._array_api._ArrayAPIWrapper object at 0x7ff1b5d40730>, name = 'swapaxes'

    def __getattr__(self, name):
>       return getattr(self._namespace, name)
E       AttributeError: module 'cupy.array_api' has no attribute 'swapaxes'

sklearn/utils/_array_api.py:203: AttributeError
_____________ test_array_api_compliance[mean_absolute_error-check_array_api_multioutput_regression_metric-cupy-None-None] ______________

metric = <function mean_absolute_error at 0x7ff1b90f0b80>, array_namespace = 'cupy', device = None, dtype = None
check_func = <function check_array_api_multioutput_regression_metric at 0x7ff1714e4310>

    @pytest.mark.parametrize(
        "array_namespace, device, dtype", yield_namespace_device_dtype_combinations()
    )
    @pytest.mark.parametrize("metric, check_func", yield_metric_checker_combinations())
    def test_array_api_compliance(metric, array_namespace, device, dtype, check_func):
>       check_func(metric, array_namespace, device, dtype)

sklearn/metrics/tests/test_common.py:1873: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
sklearn/metrics/tests/test_common.py:1831: in check_array_api_multioutput_regression_metric
    check_array_api_metric(
sklearn/metrics/tests/test_common.py:1754: in check_array_api_metric
    assert_allclose(
sklearn/utils/_testing.py:284: in assert_allclose
    actual, desired = np.asanyarray(actual), np.asanyarray(desired)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   TypeError: Implicit conversion to a NumPy array is not allowed. Please use `.get()` to construct a NumPy array explicitly.

cupy/_core/core.pyx:1475: TypeError
________ test_array_api_compliance[mean_absolute_error-check_array_api_multioutput_regression_metric-cupy.array_api-None-None] _________

metric = <function mean_absolute_error at 0x7ff1b90f0b80>, array_namespace = 'cupy.array_api', device = None, dtype = None
check_func = <function check_array_api_multioutput_regression_metric at 0x7ff1714e4310>

    @pytest.mark.parametrize(
        "array_namespace, device, dtype", yield_namespace_device_dtype_combinations()
    )
    @pytest.mark.parametrize("metric, check_func", yield_metric_checker_combinations())
    def test_array_api_compliance(metric, array_namespace, device, dtype, check_func):
>       check_func(metric, array_namespace, device, dtype)

sklearn/metrics/tests/test_common.py:1873: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
sklearn/metrics/tests/test_common.py:1831: in check_array_api_multioutput_regression_metric
    check_array_api_metric(
sklearn/metrics/tests/test_common.py:1752: in check_array_api_metric
    metric_xp = xp.asarray(metric_xp, device="cpu")
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

obj = Array([0.5, 1. ], dtype=float32)

    def asarray(
        obj: Union[
            Array,
            bool,
            int,
            float,
            NestedSequence[bool | int | float],
            SupportsBufferProtocol,
        ],
        /,
        *,
        dtype: Optional[Dtype] = None,
        device: Optional[Device] = None,
        copy: Optional[bool] = None,
    ) -> Array:
        """
        Array API compatible wrapper for :py:func:`np.asarray <numpy.asarray>`.
    
        See its docstring for more information.
        """
        # _array_object imports in this file are inside the functions to avoid
        # circular imports
        from ._array_object import Array
    
        _check_valid_dtype(dtype)
        if device is not None and not isinstance(device, _Device):
>           raise ValueError(f"Unsupported device {device!r}")
E           ValueError: Unsupported device 'cpu'

/data/parietal/store3/work/ogrisel/mambaforge/envs/dev/lib/python3.10/site-packages/cupy/array_api/_creation_functions.py:59: ValueError
======================================================= short test summary info ========================================================

@ogrisel
Copy link
Member

ogrisel commented Nov 22, 2023

For the swapaxes problem, we should report the problem upstream (if not already existing) and mark it the corresponding test case as xfail in the scikit-learn test suite with a link to the upstream issue. For information, I am running the 12.2.0 version of cupy which is the latest available on conda-forge apparently.

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following might fix the other 2 failures with cupy.

You can probably use google colab or kaggle code if you need a machine with a cuda device to launch the tests with cupy.

sklearn/metrics/tests/test_common.py Outdated Show resolved Hide resolved
@EdAbati
Copy link
Contributor Author

EdAbati commented Nov 27, 2023

Thank you very much for checking cupy :)

I've added the suggestions, but I still have to test if xfail works as expected. (probably I'll do it tomorrow evening)

@EdAbati
Copy link
Contributor Author

EdAbati commented Mar 13, 2024

Hi @ogrisel, @glemaitre and team, thanks for merging the r2 score PR, it makes my life easier here 😄

I updated this PR with the latest changes and taking inspiration from r2_score. Please let me know if I missed something

(I tested with torch with cpu and mps, but haven't yet tested with cupy)

@fcharras
Copy link
Contributor

Looks like the tests in the red pipeline actually passed, but the publishing step failed, not related to the PR.

@EdAbati
Copy link
Contributor Author

EdAbati commented Mar 18, 2024

I tested with cupy and it is all green with array_api_compat<1.5

FYI it seems there is an issue with array_api_compat==1.5 (but now fixed in main)

xp.asarraydoesn't convert the array when xp == 'array_api_compat.cupy' and the array is a numpy.ndarray:

>>> import numpy as np
>>> import array_api_compat.cupy as xp
>>> a = np.asarray([1,2,3])
>>> l = [1,2,3]
>>> type(xp.asarray(a))
<class 'numpy.ndarray'>
>>> type(xp.asarray(l))
<class 'cupy.ndarray'>

Testing with array_api_compat==1.5:

FAILED sklearn/metrics/tests/test_common.py::test_array_api_compliance[mean_absolute_error-check_array_api_multioutput_regression_metric-cupy-None-None] - AttributeError: 'numpy.ndarray' object has no attribute 'get'

I see the same error when running the tests on the new _average function.

@ogrisel
Copy link
Member

ogrisel commented Mar 25, 2024

FYI it seems there is an issue with array_api_compat==1.5 (but now fixed in main)

Since we don't have a cupy CI yet, we can just assume an array-api-compat version with the fix will be out before it bothers our CI :)

But if that's the case, maybe we can spend some effort to skip cupy tests when array_api_compat's version is one of the known versions with the bug.

doc/whats_new/v1.5.rst Outdated Show resolved Hide resolved
Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM otherwise. Thanks for the PR.

@EdAbati
Copy link
Contributor Author

EdAbati commented Apr 29, 2024

Hey @glemaitre sorry for the ping, I noticed you are the other reviewer :) I was wondering if there is anything I should change in this PR

doc/whats_new/v1.5.rst Outdated Show resolved Hide resolved
@ogrisel
Copy link
Member

ogrisel commented May 7, 2024

Maybe @OmarManzoor would be interested in reviewing this one as well :)

Copy link
Contributor

@OmarManzoor OmarManzoor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @EdAbati. Could you kindly resolve the conflict?

sklearn/metrics/tests/test_common.py Outdated Show resolved Hide resolved
Copy link
Contributor

@OmarManzoor OmarManzoor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few comments regarding the change log, otherwise looks good.

doc/whats_new/v1.6.rst Show resolved Hide resolved
doc/whats_new/v1.5.rst Outdated Show resolved Hide resolved
doc/whats_new/v1.5.rst Outdated Show resolved Hide resolved
doc/whats_new/v1.5.rst Outdated Show resolved Hide resolved
Copy link
Contributor

@OmarManzoor OmarManzoor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @EdAbati!

@EdAbati
Copy link
Contributor Author

EdAbati commented May 15, 2024

Thank you both 🙂

@OmarManzoor OmarManzoor merged commit 9f44f1f into scikit-learn:main May 15, 2024
30 checks passed
@EdAbati EdAbati deleted the mae-array-api branch May 15, 2024 10:46
@jeremiedbb jeremiedbb mentioned this pull request May 20, 2024
14 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants