array API support for mean_poisson_deviance#29227
array API support for mean_poisson_deviance#29227OmarManzoor merged 14 commits intoscikit-learn:mainfrom
Conversation
|
The upcomping scipy 1.14.0 release will ship a array API support for this function via a generic fallback. Since it is not released yet, we have two option:
|
Thank you @ogrisel. I have skipped the tests for now. I chose to skip the tests because we are not changing anything in the function itself but only skipping tests that we know are not failing, so I think it's the safer option. |
ogrisel
left a comment
There was a problem hiding this comment.
We would also need to set the SCIPY_ARRAY_API environment variable to "1" in our CI to enable scipy support for array API (see https://docs.scipy.org/doc/scipy/dev/api-dev/array_api.html#using-array-api-standard-support).
To do that, you can edit the pylatest_conda_forge_mkl config in azure-pipelines.yml to set this variable, similarly to what we do for other environment variables such as SKLEARN_TESTS_GLOBAL_RANDOM_SEED for instance.
We would also need to do something similar in .github/workflows/cuda-gpu-ci.yml. In this case you can just edit the last step to replace:
pytest -k 'array_api'
by:
SCIPY_ARRAY_API=1 pytest -k 'array_api'
|
Could you also please launch the tests yourself to check that they pass with the nightly build of scipy? You can install the nightly dev builds of scipy (and numpy, just in case) with the following command: I would recommend not doing that in your usual dev env but rather create a dedicated conda environment for those tests. Alternatively you can use a through away google colab env setup with the help of this notebook (to customize to install the scipy dev version): https://gist.github.com/EdAbati/ff3bdc06bafeb92452b3740686cc8d7c Doing this on colab would also make it possible to test with cuda devices. |
|
I have made the suggested changes to |
If we only do the "wait for scipy 1.14.0" then we will introduce a minimum requirement on the scipy version for array API support to work right? I'm not sure if that is a bad thing or not. However if we do we need some way to express that (version check when array API is enabled via In general I think minimising the number of functions implemented in |
That sounds like a good idea.
If scipy has already been imported, it's too late (I tried in a shell). The environment variable should be set before scipy is initialized apparently. |
|
@EmilyXinyi the |
|
BTW, to set the environment variable in the colab notebook, you would need to do: import os
os.environ["SCIPY_ARRAY_API"] = "1"in one of the first cells, before importing EDIT: @EdAbati you might want to update your colab notebook accordingly, in case this PR or other similar PRs that need array API support for scipy functions are merged into |
I think I have resolved the conflicts, but strangely enough the changes that happened on main ended up in my "resolve merge conflict" commit... If there is a way to avoid that please let me know, thank you! |
Should this change be made this PR? (If yes, could anyone help me with it? Thanks!) |
Actually the EDIT: looking at the notebook, it's not installing the scipy dev version: We might need to add a Then check the output of the cell with |
|
@EmilyXinyi I merged the |
I think so. I think the best place to do this is in We would need to raise a warning when |
4e1ed08 to
4618be3
Compare
|
Just a note: please ignore all the recent commits and force pushes. I think I did a merge instead of rebase and a bunch of files I didn't touch changed. |
Using this as thread to see how the CUDA situation goes. (Here is the link again: https://colab.research.google.com/drive/1Dw6ABHw-2wyjlzUQKb4VDJw_51mGZMLn?usp=sharing) NumPy's nightly build (basically version 2.0.0 and above) has dependency issue, but Scipy's dev version can be installed using |
3750613 to
21f658c
Compare
azure-pipelines.yml
Outdated
| DISTRIB: 'conda' | ||
| LOCK_FILE: './build_tools/azure/pylatest_conda_forge_mkl_osx-64_conda.lock' | ||
| SKLEARN_TESTS_GLOBAL_RANDOM_SEED: '5' # non-default seed | ||
| SCIPY_ARRAY_API=1 |
There was a problem hiding this comment.
If #29436 is merged first, a similar change will be needed for pylatest_pip_openblas_pandas.
|
#29436 was merged. Please feel free to merge |
5d16602 to
c32d6e1
Compare
8579f6f to
211a7f1
Compare
|
I closed this by accident while fixing merge conflicts ;-; if a maintainer could reopen this PR it would be great, thank you! |
|
I added one more change and git is allowing me to reopen it... |
Co-authored-by: Loïc Estève <loic.esteve@ymail.com>
OmarManzoor
left a comment
There was a problem hiding this comment.
LGTM. Thanks @EmilyXinyi
|
I tried running the tests on my machine and I get the following weird failure with array-api-strict, any suggestion (see details below)? The command I used: I have scipy 1.14, array-api-strict 2.0.1. Not sure what else could explain the discrepancy with the CI ... Details |
|
I got this error too. Maybe this is related to an older version of numpy? I am updating numpy to check. Edit: This doesn't seem to be the case. |
|
All tests pass on my end with |
|
@lesteve Add this environment variable |
|
OK I confirm that Honestly I think this is fair to say that as a newcomer to the array API work, my feeling is that it can break way too easily in very mysterious ways. There are probably a few unrelated reasons for this but I guess since at the last bi-weekly meetings we agreed that we should try to pay attention to this kind of breakages and improve the situation, here are a few comments on this PR:
|
Should we open an issue to discuss these things? |
|
I agree that it is a bit inconvenient if this breaks like this and a more informative error might be better. I am not sure how feasible it is to handle this because scipy is involved and if the env variable is not set I think scipy returns numpy arrays instead of the actual api's arrays which can cause conflicts. |
Yep I opened #29549 with my original comment |
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> Co-authored-by: Omar Salman <omar.salman@arbisoft.com> Co-authored-by: Loïc Estève <loic.esteve@ymail.com>
Reference Issues/PRs
towards #26024
What does this implement/fix? Explain your changes.
add array API support for mean_poisson_deviance
Any other comments?
there is a dependency on
scipy.special.xlogy(insklearn.metrics._regression._mean_tweedie_deviance, used by this function) which does not have an equivalent array API compatible version. I am unsure what is the best way to approach this: create a wrapper, implement our own function that accomplishes the same task, open an issue in the array-api repo about this function (maybe we can help contributing to this function too), or something else?