MAINT activate common test sklearn #647

glemaitre · 2023-07-18T12:38:42Z

Run the scikit-learn common tests.

GaelVaroquaux · 2023-07-20T10:07:59Z

There is a failing test. It seems that the TableVectorizer no longer accepts 1D inputs.

This is a good thing, but the changelog must be updated to reflect this, and the test updated.
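
For illustration, a minimal sketch of what rejecting 1D inputs looks like. It assumes the TableVectorizer now relies on validation similar to scikit-learn's `check_array` with its default settings; the thread does not show the actual code path, and `accepts_input` is a hypothetical helper written only for this example.

```python
import numpy as np
from sklearn.utils import check_array


def accepts_input(X):
    """Hypothetical helper: True if scikit-learn's default validation accepts X."""
    try:
        check_array(X)  # default ensure_2d=True rejects 1D arrays
    except ValueError:
        return False
    return True


X_1d = np.array([1.0, 2.0, 3.0])
X_2d = X_1d.reshape(-1, 1)  # same values as a single column with three rows

print(accepts_input(X_1d))
print(accepts_input(X_2d))
```

Callers that used to pass a 1D array would need to reshape it into a single column (or a single row) first, which is the kind of change worth noting in the changelog.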

GaelVaroquaux

One minor comment and then merge

GaelVaroquaux · 2023-07-20T10:17:40Z

skrub/tests/test_sklearn.py

+                for j in range(X.shape[1]):
+                    X[i, j] = str(X[i, j])
+        elif _safe_tags(estimator, key="allow_nan"):
+            X = X.astype(np.float64)


Why float64 and not 32?

scikit-learn will convert to np.float64 anyway, so float64 is the better choice for all the common tests.

GaelVaroquaux · 2023-07-20T10:18:07Z

skrub/tests/test_sklearn.py


-def test_sklearn_compatible_GapEncoder():
-    check_estimator(GapEncoder())
+    if estimator.__class__.__name__ == "SkewedChi2Sampler":


Aaaaaaaah 🤣 🤣 🤣 🤣

GaelVaroquaux · 2023-07-20T10:20:50Z

skrub/tests/test_sklearn.py


-
-def test_sklearn_compatible_SimilarityEncoder():
-    check_estimator(SimilarityEncoder())


Please add a todo here

GaelVaroquaux · 2023-07-20T12:40:47Z

You have another test failing. The feature names have changed (I'm a bit surprised, I don't see where the change comes from, but it does not matter terribly).

I don't mind the change of name. We should however update the test and maybe mention it in the changelog.

glemaitre · 2023-07-20T13:08:03Z

The reason is that we now call check_input(X) before running _check_feature_names so that the common tests pass. Previously, I assume the feature names were created from the data frame (before X was validated), whereas now they are created from a NumPy array.
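
A rough sketch of the effect described above, not the actual skrub code: `column_names` is a hypothetical helper, the `x0`, `x1` fallback names are an assumption made for the example, and `np.asarray` stands in for whatever the validation step really does.

```python
import numpy as np
import pandas as pd


def column_names(X):
    """Hypothetical helper: use column labels if X has them, else positional names."""
    if hasattr(X, "columns"):
        return [str(c) for c in X.columns]
    return [f"x{i}" for i in range(X.shape[1])]


df = pd.DataFrame({"city": ["Paris", "Rome"], "country": ["FR", "IT"]})

# Old order (assumed): names are read from the data frame, then X is validated.
names_before = column_names(df)

# New order (assumed): X is validated first, which yields a plain NumPy array,
# so the column labels are gone by the time the names are computed.
X_validated = np.asarray(df, dtype=object)
names_after = column_names(X_validated)

print(names_before)
print(names_after)
```

Under these assumptions, the order of the two steps alone is enough to change the generated feature names, which would explain the failing test.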

glemaitre · 2023-07-20T13:13:01Z

However, this is a regression because it now differs from what the scikit-learn encoders do.

GaelVaroquaux · 2023-07-20T13:24:33Z

> However, this is a regression because it now differs from what the scikit-learn encoders do.

Right, that might be a problem

GaelVaroquaux · 2023-07-20T15:40:25Z

LGTM. Updating the branch with main and then merging.

glemaitre added 10 commits July 18, 2023 14:38

MAINT activate common test sklearn

40004e0

iter

2299f09

Merge remote-tracking branch 'origin/main' into common_test

b524edd

TST make GapEncoder compatible with scikit-learn

57904da

iter

4d6602c

SimilarityEncoder compat

2f0cb58

DatetimeEncoder support

2efa4ad

iter

7c379e5

iter

b45f48c

iter

eae158a

GaelVaroquaux added no changelog needed and removed no changelog needed labels Jul 20, 2023

GaelVaroquaux reviewed Jul 20, 2023

glemaitre added 2 commits July 20, 2023 12:23

fix ci

0f778b0

iter

a3c2255

glemaitre added 2 commits July 20, 2023 15:50

iter

37c75e8

iter

92087c9

GaelVaroquaux reviewed Jul 20, 2023

Update skrub/tests/test_sklearn.py

4e3ce66

LilianBoulard mentioned this pull request Jul 20, 2023

Use composition in the TableVectorizer #660

Closed

Merge branch 'main' into common_test

2665754

GaelVaroquaux approved these changes Jul 20, 2023

GaelVaroquaux enabled auto-merge (squash) July 20, 2023 15:41

GaelVaroquaux merged commit 363b34b into skrub-data:main Jul 20, 2023
22 of 23 checks passed

Vincent-Maladiere mentioned this pull request Aug 29, 2023

MAINT Fix feature_name warning during transform for MinHashEncoder #725

Merged
