Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MAINT activate common test sklearn #647

Merged
merged 16 commits into from
Jul 20, 2023

Conversation

glemaitre
Copy link
Member

Run scikit-learn common test.

@GaelVaroquaux
Copy link
Member

There is a failing test. It seems that the TableVectorizer no longer accepts 1D inputs.

This is a good thing, but the changelog must be updated to reflect this, and the test updated.

Copy link
Member

@GaelVaroquaux GaelVaroquaux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor comment and then merge

for j in range(X.shape[1]):
X[i, j] = str(X[i, j])
elif _safe_tags(estimator, key="allow_nan"):
X = X.astype(np.float64)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why float64 and not 32?

Copy link
Member Author

@glemaitre glemaitre Jul 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The scikit-learn conversion will be np.float64 so it will be better for all common tests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍


def test_sklearn_compatible_GapEncoder():
check_estimator(GapEncoder())
if estimator.__class__.__name__ == "SkewedChi2Sampler":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aaaaaaaah 🤣 🤣 🤣 🤣



def test_sklearn_compatible_SimilarityEncoder():
check_estimator(SimilarityEncoder())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a todo here

@GaelVaroquaux
Copy link
Member

You have another test failing. The feature names have changed (I'm a bit surprised, I don't see where the change comes from, but it does not matter terribly).

I don't mind the change of name. We should however update the test and maybe mention it in the changelog.

@glemaitre
Copy link
Member Author

The reason is that we now first check_input(X) before running the _check_feature_names to pass the common test. Before, I assume that we create the feature_names potentially on a data frame (before the validation of X) while now this is on a numpy array.

@glemaitre
Copy link
Member Author

However, this is a regression because this is now different from what the scikit-learn encoder are doing.

@GaelVaroquaux
Copy link
Member

However, this is a regression because this is now different from what the scikit-learn encoder are doing.

Right, that might be a problem

@GaelVaroquaux
Copy link
Member

LGTM. Updating the branch with main and then merging.

@GaelVaroquaux GaelVaroquaux enabled auto-merge (squash) July 20, 2023 15:41
@GaelVaroquaux GaelVaroquaux merged commit 363b34b into skrub-data:main Jul 20, 2023
22 of 23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants