Add tests for train_test_split with Array API input #26855

betatim · 2023-07-18T16:12:10Z

Reference Issues/PRs

(need to find one)

What does this implement/fix? Explain your changes.

This mostly adds tests that call train_test_split with Array API input and compare the result to using a plain NumPy array as input.

Any other comments?

A first attempt at seeing what happens when you feed CuPy/PyTorch/Array API arrays to train_test_split. I still need to explore more of the different parameters to see if they all "just work".

add a changelog entry

github-actions · 2023-07-18T16:13:59Z

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 998eb93.

ogrisel

Great that it works out of the box.

sklearn/model_selection/tests/test_split.py

betatim · 2023-07-19T16:14:18Z

sklearn/utils/_array_api.py

+    def __eq__(self, other):
+        return self._namespace == other._namespace


Do we want this? It is convenient in the test to be able to compare (wrapped) namespaces for equivalence.

I think I prefer explicit namespace assertions in tests. We could have a helper to assert same namespace in tests.

What do you mean with explicit? Getting a string representation that we can compare to the array_namespace passed in to the test?

In the test itself I use get_namespace(input)[0] == get_namespace(output)[0] to check that input and output are in the same namespace. This works when the namespace is one from the array compat library, but not for the few namespaces that we wrap in this wrapper.

I am okay with overriding __eq__ like this.
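For readers following the `__eq__` discussion, here is a minimal, self-contained sketch of the idea. `NamespaceWrapper` is a hypothetical stand-in, not scikit-learn's actual wrapper class, and the standard-library `math` and `cmath` modules stand in for array namespaces; only the `__eq__` body and the `_namespace` attribute name come from the diff.

```python
import cmath
import math


class NamespaceWrapper:
    """Hypothetical stand-in for a class that wraps an array namespace."""

    def __init__(self, namespace):
        self._namespace = namespace

    def __getattr__(self, name):
        # Delegate lookups that are not found on the wrapper itself
        # to the wrapped namespace.
        return getattr(self._namespace, name)

    def __eq__(self, other):
        # Same comparison as in the diff: two wrappers are equal when
        # they wrap the same underlying namespace.
        return self._namespace == other._namespace


xp_in = NamespaceWrapper(math)
xp_out = NamespaceWrapper(math)

print(xp_in is xp_out)  # two distinct wrapper objects
print(xp_in == xp_out)  # compared through the wrapped namespace
print(xp_in == NamespaceWrapper(cmath))
```

Two caveats apply to this sketch as written: a Python class that defines `__eq__` without `__hash__` becomes unhashable, and `other._namespace` raises `AttributeError` when `other` is not a wrapper. Whether either matters depends on how the wrapper is used.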

sklearn/preprocessing/tests/test_function_transformer.py

betatim · 2023-07-20T06:11:04Z

@thomasjpfan and @ogrisel - if you want to look at a PR that mostly adds new tests, this is one :D

sklearn/model_selection/tests/test_split.py

ogrisel · 2023-07-20T06:26:01Z

sklearn/utils/_array_api.py

+    def __eq__(self, other):
+        return self._namespace == other._namespace


I think I prefer explicit namespace assertions in tests. We could have a helper to assert same namespace in tests.
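One possible shape for such a helper is sketched below. Everything here is an assumption for illustration: `assert_same_namespace`, `_unwrap`, and `NamespaceWrapper` are hypothetical names, not existing scikit-learn functions, and the standard-library `math` and `cmath` modules stand in for array namespaces.

```python
import cmath
import math


class NamespaceWrapper:
    """Hypothetical wrapper, standing in for a wrapped array namespace."""

    def __init__(self, namespace):
        self._namespace = namespace


def _unwrap(xp):
    # Return the wrapped namespace if `xp` is a wrapper, else `xp` itself.
    return getattr(xp, "_namespace", xp)


def assert_same_namespace(xp_a, xp_b):
    """Hypothetical test helper: fail unless both namespaces match."""
    a, b = _unwrap(xp_a), _unwrap(xp_b)
    assert a is b, f"namespaces differ: {a!r} vs {b!r}"


# Passes: both sides resolve to the same underlying namespace.
assert_same_namespace(NamespaceWrapper(math), NamespaceWrapper(math))
assert_same_namespace(NamespaceWrapper(math), math)

# Fails with an explicit message when the namespaces differ.
try:
    assert_same_namespace(NamespaceWrapper(math), cmath)
except AssertionError as exc:
    print(exc)
```

Compared with overriding `__eq__`, a helper like this keeps the comparison logic in the test code and leaves the wrapper class unchanged, at the cost of tests needing to call it explicitly.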

thomasjpfan

Thanks for the PR!

sklearn/model_selection/tests/test_split.py

thomasjpfan · 2023-07-31T19:09:40Z

sklearn/utils/_array_api.py

+    def __eq__(self, other):
+        return self._namespace == other._namespace


I am okay with overriding __eq__ like this.

sklearn/preprocessing/tests/test_function_transformer.py

betatim · 2023-08-04T08:51:11Z

Should we list this kind of thing (functions, not estimators) in the "estimators with support" section of doc/modules/array_api.rst? New section?

thomasjpfan · 2023-08-07T18:48:00Z

> Should we list this kind of thing (functions, not estimators) in the "estimators with support" section of doc/modules/array_api.rst? New section?

I like a new section. I think it's good to keep track of all the Array API supported estimators & functions in array_api.rst.

betatim · 2023-08-08T07:23:36Z

What do you think of the current patch? I added subsections, one called Estimators and one called Tools.

thomasjpfan · 2023-08-09T17:46:19Z

> What do you think of the current patch? I added subsections, one called Estimators and one called Tools.

That is okay with me.

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>

github-actions bot added module:model_selection module:utils labels Jul 18, 2023

ogrisel reviewed Jul 18, 2023

sklearn/model_selection/tests/test_split.py (resolved)

ogrisel added the Array API label Jul 18, 2023

betatim commented Jul 19, 2023

sklearn/model_selection/tests/test_split.py (outdated, resolved)

betatim marked this pull request as ready for review July 19, 2023 16:13

betatim commented Jul 19, 2023

sklearn/preprocessing/tests/test_function_transformer.py (resolved)

ogrisel approved these changes Jul 20, 2023

thomasjpfan reviewed Jul 31, 2023

betatim added 5 commits August 3, 2023 13:52

Add tests for train_test_split with Array API input

4fc3d53

Check dtype, device and array namespace of returned values

1818e6d

Remove use of _safe_indexing in test

b4a9fba

Remove debug, use suffixes for variables

06e29dc

Add what's new entry

8a7814d

betatim force-pushed the array_api_train_test_split branch from fcb0edf to 8a7814d Compare August 3, 2023 11:53

List train_test_split as supporting Array API input

1127854

Merge remote-tracking branch 'upstream/main' into pr/26855

998eb93

thomasjpfan enabled auto-merge (squash) August 9, 2023 17:36

thomasjpfan approved these changes Aug 9, 2023

thomasjpfan merged commit 1b0a51b into scikit-learn:main Aug 9, 2023
25 checks passed

betatim deleted the array_api_train_test_split branch August 10, 2023 13:04

TamaraAtanasoska pushed a commit to TamaraAtanasoska/scikit-learn that referenced this pull request Aug 21, 2023

Add tests for train_test_split with Array API input (scikit-learn#26855)

c66339b

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>

REDVM pushed a commit to REDVM/scikit-learn that referenced this pull request Nov 16, 2023

Add tests for train_test_split with Array API input (scikit-learn#26855)

a6d824b

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>

betatim mentioned this pull request Feb 12, 2024

FIX Fix array API train_test_split #28407

Merged
