Make more of the "tools" of scikit-learn Array API compatible #26024
Mark an estimator or function as done only if it not only "doesn't raise an exception" but also outputs a sensible value. Checking the latter will require a human at the start, but maybe later we can write a test for it. Below is a list of preprocessors and metrics. The lists are already pretty long, so I won't add more until we make progress (or decide that a different area is a better starting point). The next step is to work out whether there is some generic advice for "fixing" these.

NOTE: it's possible to test the changes in your pull request on a CUDA GPU host for free with the help of this notebook on Google Colab: https://gist.github.com/EdAbati/ff3bdc06bafeb92452b3740686cc8d7c

Transformers from
Code used to create the list:

# Assumes X_torch and y_torch are PyTorch tensors created beforehand.
from sklearn import config_context
from sklearn.utils import discovery

for name, Trn in discovery.all_estimators(type_filter="transformer"):
    if Trn.__module__.startswith("sklearn.preprocessing."):
        with config_context(array_api_dispatch=True):
            tr = Trn()
            try:
                tr.fit_transform(X_torch, y_torch)
                print(f"* [ ] {name} - no exception with pytorch X")
            except Exception:
                print(f"* [ ] {name}")

Metrics from
for name, func in discovery.all_functions():
    if func.__module__.startswith("sklearn.metrics."):
        with config_context(array_api_dispatch=True):
            try:
                func(y_torch, y_torch)
                print(f"* [ ] {name} - no exception with pytorch y")
            except Exception:
                print(f"* [ ] {name}")
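The "outputs a sensible value" criterion could eventually be automated by comparing each result against a NumPy reference. A minimal sketch, assuming a hypothetical helper `check_matches_numpy` (not an existing scikit-learn test utility) and user-supplied conversion functions; for PyTorch these could be `torch.asarray` and `lambda t: t.cpu().numpy()`:

```python
import numpy as np


def check_matches_numpy(func, X_np, to_namespace, to_numpy, rtol=1e-6):
    """Hypothetical check: ``func`` must give the same values on a converted
    copy of ``X_np`` as it does on the NumPy array itself."""
    # Reference result computed on plain NumPy input.
    expected = np.asarray(func(X_np))
    # Same computation on the converted input (e.g. a torch tensor).
    result = func(to_namespace(X_np))
    # Bring the result back to NumPy and compare element-wise.
    np.testing.assert_allclose(to_numpy(result), expected, rtol=rtol)


# Usage with NumPy on both sides, only to exercise the helper itself:
X = np.asarray([[1.0, 2.0], [3.0, 4.0]])
check_matches_numpy(lambda A: A * 2.0, X, np.asarray, np.asarray)
```

A mismatch surfaces as an `AssertionError` from `np.testing.assert_allclose`, so the same helper could be dropped into a pytest test once the conversion functions for each library are fixed.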
It turns out to be trickier than you might think. For example, in
A more comprehensible version (with fewer sentence fragments) of the text below is in https://github.com/scikit-learn/scikit-learn/pull/25956/files#r1172450244

Some thoughts: should we add a "compat layer" in scikit-learn to add things like
cc @thomasjpfan, maybe you have thoughts/opinions and/or time to chat about this.
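One way such a "compat layer" could look is a thin private wrapper that forwards every lookup to the underlying Array API namespace and adds the few functions the standard lacks. A minimal sketch, assuming a hypothetical class `_NamespaceWithExtras` and using `nanmean` purely as an example of a NumPy function with no counterpart in the standard; the extra function is built only from standard functions (`isnan`, `where`, `zeros_like`, `ones_like`, `sum`):

```python
import numpy as np


class _NamespaceWithExtras:
    """Hypothetical private wrapper: forwards attribute lookups to the
    wrapped namespace and adds helpers that the standard does not define."""

    def __init__(self, namespace):
        self._namespace = namespace

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. for names not defined
        # on this class, so standard functions come from the real namespace.
        return getattr(self._namespace, name)

    def nanmean(self, x, axis=None):
        # Replace NaNs by 0 for the sum, and count the non-NaN entries
        # separately so they can be used as the denominator.
        xp = self._namespace
        nan_mask = xp.isnan(x)
        values = xp.where(nan_mask, xp.zeros_like(x), x)
        counts = xp.where(nan_mask, xp.zeros_like(x), xp.ones_like(x))
        return xp.sum(values, axis=axis) / xp.sum(counts, axis=axis)


# Usage with NumPy as the wrapped namespace:
xp = _NamespaceWithExtras(np)
print(xp.nanmean(np.asarray([1.0, np.nan, 3.0])))
```

With NumPy wrapped, `xp.sum` is simply `np.sum` and the call above gives 2.0. An all-NaN slice divides zero by zero and therefore yields NaN (NumPy also emits a `RuntimeWarning`), which is one of the edge cases such a private helper would need to settle before proposing it upstream.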
I am fine with starting with a private helper in scikit-learn and asking the maintainers of the array-api-compat project whether they think such extensions to the spec API could make it upstream into
Moving my comment from https://github.com/scikit-learn/scikit-learn/pull/25956/files#r1172771551 here regarding adding more methods to scikit-learn's compat layer:
As discussed in data-apis/array-api#627 (related to a potential
Hi all, I am working on
I've just added a PR for the
Yes, that would make sense.
I'll try converting
Hi! Inspired by the lightning talk at the Swiss Python Summit, I'll work on the
The following is what I used to test it (it's basically the same as the first example in the docs for the same estimator):

>>> from sklearn.preprocessing import OneHotEncoder
>>> import numpy.array_api as xp
>>> enc = OneHotEncoder(handle_unknown='ignore')
>>> X = xp.asarray([[1, 1], [2, 3], [2, 2]])
>>> enc.fit(X)
OneHotEncoder(handle_unknown='ignore')
>>> enc.categories_
[array([1, 2]), array([1, 2, 3])]
>>> enc.transform(xp.asarray([[2, 1], [1, 4]])).toarray()
array([[0., 1., 1., 0., 0.],
       [1., 0., 0., 0., 0.]])
>>> enc.inverse_transform([[1., 0., 1., 0., 0.], [0., 1., 0., 0., 0.]])
array([[1, 1],
       [2, None]], dtype=object)
>>> enc.get_feature_names_out(['g1', 'g2'])
array(['g1_1', 'g1_2', 'g2_1', 'g2_2', 'g2_3'], dtype=object)

I'll check
I don't really have access to a GPU to check this with CuPy; I hope that's not a problem.
Hey @rotuna, I think that we should also test that
I'd start by adding the estimator to the list there and seeing whether something fails. If you are lucky, everything works and there are no NumPy-specific operations in the implementation; in that case I think you can just make a PR by adding
I am working on
I was working on Also when I used
I would like to try out
For information, I edited the comment above (the one listing the estimators/functions to focus on) to add a link to this notebook, which can be very helpful for debugging failing tests on a CUDA GPU for free using Google Colab or similar:
Hi! I would like to work on
Working on
Working on
Looking at
@elindgren @lithomas1 @EdAbati @Tialo @EmilyXinyi since you have all had array API PRs merged in
Working on
Working on
@ogrisel Are we supposed to handle the latest version of array-api-strict, which is 2.0? Some tests are now failing:
FAILED sklearn/model_selection/tests/test_search.py::test_array_api_search_cv_classifier[GridSearchCV-array_api_strict-None-None] - ValueError:
FAILED sklearn/model_selection/tests/test_search.py::test_array_api_search_cv_classifier[RandomizedSearchCV-array_api_strict-None-None] - ValueError:
FAILED sklearn/preprocessing/tests/test_label.py::test_label_encoder_array_api_compliance[y0-array_api_strict-None-None] - TypeError: array iteration is not allowed in array-api-strict
FAILED sklearn/preprocessing/tests/test_label.py::test_label_encoder_array_api_compliance[y1-array_api_strict-None-None] - TypeError: array iteration is not allowed in array-api-strict
FAILED sklearn/preprocessing/tests/test_label.py::test_label_encoder_array_api_compliance[y2-array_api_strict-None-None] - TypeError: array iteration is not allowed in array-api-strict
FAILED sklearn/tests/test_common.py::test_estimators[LinearDiscriminantAnalysis()-check_array_api_input(array_namespace=array_api_strict,dtype_name=None,device=None)] - TypeError: array iteration is not allowed in array-api-strict
FAILED sklearn/utils/tests/test_array_api.py::test_isin[int16-14-True-True-array_api_strict-None-None] - TypeError: array iteration is not allowed in array-api-strict
FAILED sklearn/utils/tests/test_array_api.py::test_isin[int16-14-True-False-array_api_strict-None-None] - TypeError: array iteration is not allowed in array-api-strict
FAILED sklearn/utils/tests/test_array_api.py::test_isin[int16-14-False-True-array_api_strict-None-None] - TypeError: array iteration is not allowed in array-api-strict
FAILED sklearn/utils/tests/test_array_api.py::test_isin[int16-14-False-False-array_api_strict-None-None] - TypeError: array iteration is not allowed in array-api-strict
FAILED sklearn/utils/tests/test_array_api.py::test_isin[int32-14-True-True-array_api_strict-None-None] - TypeError: array iteration is not allowed in array-api-strict
FAILED sklearn/utils/tests/test_array_api.py::test_isin[int32-14-True-False-array_api_strict-None-None] - TypeError: array iteration is not allowed in array-api-strict
FAILED sklearn/utils/tests/test_array_api.py::test_isin[int32-14-False-True-array_api_strict-None-None] - TypeError: array iteration is not allowed in array-api-strict
FAILED sklearn/utils/tests/test_array_api.py::test_isin[int32-14-False-False-array_api_strict-None-None] - TypeError: array iteration is not allowed in array-api-strict
FAILED sklearn/utils/tests/test_array_api.py::test_isin[int64-14-True-True-array_api_strict-None-None] - TypeError: array iteration is not allowed in array-api-strict
FAILED sklearn/utils/tests/test_array_api.py::test_isin[int64-14-True-False-array_api_strict-None-None] - TypeError: array iteration is not allowed in array-api-strict
FAILED sklearn/utils/tests/test_array_api.py::test_isin[int64-14-False-True-array_api_strict-None-None] - TypeError: array iteration is not allowed in array-api-strict
FAILED sklearn/utils/tests/test_array_api.py::test_isin[int64-14-False-False-array_api_strict-None-None] - TypeError: array iteration is not allowed in array-api-strict
FAILED sklearn/utils/tests/test_array_api.py::test_isin[uint8-14-True-True-array_api_strict-None-None] - TypeError: array iteration is not allowed in array-api-strict
FAILED sklearn/utils/tests/test_array_api.py::test_isin[uint8-14-True-False-array_api_strict-None-None] - TypeError: array iteration is not allowed in array-api-strict
FAILED sklearn/utils/tests/test_array_api.py::test_isin[uint8-14-False-True-array_api_strict-None-None] - TypeError: array iteration is not allowed in array-api-strict
FAILED sklearn/utils/tests/test_array_api.py::test_isin[uint8-14-False-False-array_api_strict-None-None] - TypeError: array iteration is not allowed in array-api-strict
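The repeated `TypeError` says that array-api-strict 2.0 rejects Python-level iteration over an array (`for x in arr`, `set(arr)`, `list(arr)` and so on). One possible iteration-free way to compute membership is to broadcast every element against every test value using only `reshape`, `expand_dims`, `==` and `any` from the standard. A minimal sketch with a hypothetical function name; it is not scikit-learn's actual `isin` implementation, and it is tested here with NumPy only:

```python
import numpy as np


def isin_without_iteration(element, test_elements, xp):
    """Hypothetical iteration-free membership test for any Array API namespace."""
    # Flatten the test values so the comparison has a single trailing axis.
    flat_test = xp.reshape(test_elements, (-1,))
    # element.shape + (1,) broadcast against (n_test,) gives
    # element.shape + (n_test,): one boolean per (element, test value) pair.
    matches = xp.expand_dims(element, axis=-1) == flat_test
    # An element is "in" test_elements if any of its comparisons is True.
    return xp.any(matches, axis=-1)


print(isin_without_iteration(np.asarray([1, 5, 3]), np.asarray([[3, 4], [1, 2]]), np))
```

The trade-off is memory: the intermediate boolean array has `element.size * test_elements.size` entries, which is fine for the small sets of labels typical in these tests, whereas a sort-based approach scales better for large inputs.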
@OmarManzoor Interesting. So far, the currently open PRs run with the version (1.1.1) from the CI lock files:
But indeed, our lock file bot will attempt to open a PR on Monday to bump the dependency versions, and it will fail with the errors you reported. So feel free to open a dedicated PR to start fixing them. You can already trigger the update of the lock file for
Looking at
🚨 🚧 This issue requires a bit of patience and experience to contribute to 🚧 🚨
Please mention this issue when you create a PR, but please don't write "closes #26024" or "fixes #26024".
scikit-learn contains lots of useful tools in addition to its many estimators, for example metrics, pipelines, pre-processing and model selection. These are useful to, and used by, people who do not necessarily use an estimator from scikit-learn. This is great.
The fact that many users install scikit-learn "just" to use train_test_split is a testament to how useful it is to provide easy-to-use tools that do the right(!) thing, rather than everyone implementing them from scratch because it is "easy" and making mistakes along the way.

In this issue I'd like to collect and track work related to making it easier to use all these "tools" from scikit-learn even if you are not using NumPy arrays for your data. In particular, thanks to the Array API standard it should be "not too much work" to make things usable with data stored in arrays that conform to the Array API standard.
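In practice, "usable with arrays that conform to the Array API standard" means a tool looks up the namespace of its inputs and then calls only functions from that namespace. A minimal sketch of that pattern, assuming hypothetical helpers `get_namespace` and `mean_absolute_error_sketch` (this is not scikit-learn's own helper) and a NumPy fallback for inputs that do not implement the standard's `__array_namespace__` method:

```python
import numpy as np


def get_namespace(*arrays):
    """Hypothetical helper: return the single array namespace of ``arrays``."""
    # Array API objects advertise their namespace via __array_namespace__();
    # anything without it (lists, NumPy < 2.0 arrays) falls back to NumPy.
    namespaces = {
        a.__array_namespace__() if hasattr(a, "__array_namespace__") else np
        for a in arrays
    }
    if len(namespaces) != 1:
        raise TypeError("inputs come from different array namespaces")
    return namespaces.pop()


def mean_absolute_error_sketch(y_true, y_pred):
    xp = get_namespace(y_true, y_pred)
    y_true = xp.asarray(y_true)
    y_pred = xp.asarray(y_pred)
    # abs and mean are defined by the standard, so the computation stays in
    # the input's library; only the final scalar is converted to a float.
    return float(xp.mean(xp.abs(y_true - y_pred)))


print(mean_absolute_error_sketch([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

Here the result is 2/3 (about 0.667). Mixing inputs from two different libraries raises a `TypeError` instead of silently converting one of them.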
There is work in #25956 and #22554 which adds the basic infrastructure needed to use "array API arrays". Right now you need to check out #25956 (this is part of the reason why this is a draft issue).
The goal of this issue is to make code like the following work:
The first step is to compile a list of tools that are in scope for this. The next step (or maybe part of the first) is to check which of them already "just work". Once that is done, we can start making changes (one PR per class/function). Hopefully by then #25956 will be ready or already merged.
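A typical change in such a PR replaces NumPy-specific code with calls to the input's namespace `xp`. For example, `ndarray` methods such as `X.min(axis=0)` are not part of the standard's array object, whereas the functions `xp.min` and `xp.max` are. A minimal sketch with a hypothetical `min_max_scale` function; a real transformer would also have to handle constant features, where the denominator below is zero:

```python
import numpy as np


def min_max_scale(X, xp):
    """Hypothetical namespace-agnostic min-max scaling of each column."""
    # A NumPy-only version would typically call X.min(axis=0) / X.max(axis=0);
    # the standard defines min/max as namespace functions instead.
    X_min = xp.min(X, axis=0)
    X_max = xp.max(X, axis=0)
    # Broadcasting rescales every column to the [0, 1] range.
    return (X - X_min) / (X_max - X_min)


X = np.asarray([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])
print(min_max_scale(X, np))
```

Passing `xp` explicitly keeps the sketch short; in a real tool it would come from the namespace lookup on the input rather than from the caller.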