array API support for mean_absolute_percentage_error#29300
Conversation
|
CUDA: https://colab.research.google.com/drive/1SKzB8XaT2j_j4j7S-w4W6Du3DCo1I2B2?usp=sharing |
ogrisel
left a comment
There was a problem hiding this comment.
LGTM once the merge conflicts are resolved.
a65b388 to
b5bc817
Compare
There was a problem hiding this comment.
I left a comment regarding mps.
I also noticed that other tests seems to fail also in main:
FAILED sklearn/metrics/cluster/tests/test_supervised.py::test_entropy_array_api[torch-mps-float32] - TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
FAILED sklearn/metrics/tests/test_common.py::test_array_api_compliance[accuracy_score-check_array_api_multilabel_classification_metric-torch-mps-float32] - ValueError: unrecognized csr_matrix constructor usage
FAILED sklearn/metrics/tests/test_common.py::test_array_api_compliance[zero_one_loss-check_array_api_multilabel_classification_metric-torch-mps-float32] - ValueError: unrecognized csr_matrix constructor usage
is anyone having a look at these? (or I can try to help)
| epsilon = xp.asarray(xp.finfo(xp.float64).eps, dtype=xp.float64) | ||
| y_true_abs = xp.asarray(xp.abs(y_true), dtype=xp.float64) | ||
| mape = xp.asarray(xp.abs(y_pred - y_true), dtype=xp.float64) / xp.where( | ||
| epsilon < y_true_abs, y_true_abs, epsilon | ||
| ) |
There was a problem hiding this comment.
float64 does not work with Torch on mps unfortunately. One way to get the max supported float precision could be xp.asarray(0.0).dtype as this function does . Can you think of a better way?
I might have missed something, why do we need cast to float64 each time we run abs?
There was a problem hiding this comment.
Good catch. The original code did not upcast to float64 so we should not either.
There was a problem hiding this comment.
I might have missed something, why do we need cast to float64 each time we run
abs?
btw the casting is for xp.asarray, not abs
@EdAbati feel free to have a look and open a PR to fix those if you have time. |
|
@EmilyXinyi I merged |
| epsilon = xp.asarray(xp.finfo(xp.float64).eps, dtype=xp.asarray(0.0).dtype) | ||
| y_true_abs = xp.asarray(xp.abs(y_true), dtype=xp.asarray(0.0).dtype) | ||
| mape = xp.asarray(xp.abs(y_pred - y_true), dtype=xp.asarray(0.0).dtype) / xp.where( |
There was a problem hiding this comment.
I would rather use the floating point dtype used in the regressor's predictions than a device dependent dtype.
| epsilon = xp.asarray(xp.finfo(xp.float64).eps, dtype=xp.asarray(0.0).dtype) | |
| y_true_abs = xp.asarray(xp.abs(y_true), dtype=xp.asarray(0.0).dtype) | |
| mape = xp.asarray(xp.abs(y_pred - y_true), dtype=xp.asarray(0.0).dtype) / xp.where( | |
| epsilon = xp.asarray(xp.finfo(xp.float64).eps, dtype=y_pred.dtype) | |
| y_true_abs = xp.asarray(xp.abs(y_true), dtype=y_pred.dtype) | |
| mape = xp.asarray(xp.abs(y_pred - y_true), dtype=y_pred.dtype) / xp.where( |
There was a problem hiding this comment.
Using y_pred.dtype can't get past the test cases and I think this is because y_pred.dtype could be array_api_strict.int64, but only floating-point types are allowed in __truediv__ (which means division, I believe). As well, MAPE behaves likes a symmetric function instead of asymmetric if we use y_pred.dtype, which I suspect is due to the reduced accuracy in cases where dtype is int64.
I am not sure if there is a better way to approach this... Any suggestions would be greatly appreciated, thank you!! :)
There was a problem hiding this comment.
We have a function defined _find_matching_floating_dtype in the array api utils. Maybe you could use that to get the float dtype once and then use it in all three places?
There was a problem hiding this comment.
Thank you @OmarManzoor I have made the changes accordingly
b9c8c7c to
ddebb21
Compare
OmarManzoor
left a comment
There was a problem hiding this comment.
Otherwise LGTM. Thanks @EmilyXinyi
|
FYI I tested the CUDA CI label after #29456 was merged and it looks like this is working fine:
|
|
@lesteve The CUDA CI just failed. Isn't array-api-strict updated to the latest version? |
|
I think you can ignore the GPU failures for now, I was mostly testing to make sure setting the label did trigger the GPU CI. The versions are all in the lock-file and indeed currently array-api-strict still 1.1.1 FYI trying to update the lock-file to the latest version, there seems to be some issue but I guess someone needs to look into it at one point: #29373 (or wait until Monday morning European time so that the lock-file PR is updated and cross our fingers). |
So should we update and use maximum or should we keep the current code? In my opinion I think we can work considering the latest version and fix the CI whenever possible. What do you think? |
Yes you should ignore the GPU CI failures in this PR definitely, and someone should look at the CI issues in #29373 at one point. |
|
So I am not going to pretend I understand what is going on and there is no huge rush on fixing this at all, but it seems like this PR broke the doc-min-dependencies build (i.e. we run the doc build with our minimal dependencies to check that all the example run) ... See Build log Here is the error, it does look related to |
Yes I saw this too. How come this did not error out in the PR? Seems to be a case in this example: |
The doc build in PR only runs examples that have changes in the PR, if you want a full doc build you can push a commit with To reproduce, I guess the best bet (if you are on Linux) is to use the doc-min-dependencies lock-file, see the quick doc I recently added. |
I am on a Mac with M2 chip. |
OK, you can still to use the associated environment file (see doc mentioned above) and see whether you can reproduce. A quick guess would be that we use old numpy/pandas/something else version and that the logic needs to be adapted ... |
|
I am also trying to reproduce the problem. I am on Mac with Intel i5 cores |
I was not able to configure the environment on my mac using the instructions that @lesteve mentioned but maybe you can create a conda environment using the lock file or the environment file. |
Reference Issues/PRs
Towards #26024
What does this implement/fix? Explain your changes.
Adds array API support for mean_absolute_percentage_error
Any other comments?
Keep this as draft until I add PR number and CUDA is green
Failing CI: I ran the command that triggers the failing test cases locally (
pytest --durations=20 --junitxml=test-data.xml --pyargs sklearn) but they all pass. I am not sure what contributes to the difference in behaviour between our pipeline and my local tests...