
SimilarityEncoder cleanup #596

Merged
merged 13 commits into from
Jun 14, 2023


LeoGrin (Contributor) commented Jun 13, 2023

fix #573

LeoGrin changed the base branch from main to 0.2.X June 13, 2023 08:52
LeoGrin changed the base branch from 0.2.X to main June 13, 2023 08:52
LeoGrin force-pushed the remove_old_features_sim_enc branch from 9882a1d to 54bcf12 June 13, 2023 09:02
LilianBoulard (Member)

Thanks! Do you mind using this PR for some more cleanup of the SimilarityEncoder?
What I'm thinking of is the similarity argument, which should be removed (along with the TODO).

GaelVaroquaux (Member)

I would modify the "notes" section of the SimilarityEncoder as follows:

"The functionality of :class:`SimilarityEncoder` is easy to explain and understand, but it is not scalable. It is only useful for capturing links across a few categories (e.g. “west”, “north”, “north-west”), not when there are many categories, as with open-ended entries.
Instead, the :class:`~skrub.GapEncoder` is usually recommended."
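To illustrate both points of the proposed note (easy to explain, but not scalable), here is a minimal pure-Python sketch of character 3-gram similarity in the spirit of SimilarityEncoder. It is not skrub's implementation: the helper names (`char_ngrams`, `ngram_similarity`, `similarity_encode`), the space padding, the fixed n=3, and the Jaccard-style normalization are all assumptions made for illustration.

```python
from collections import Counter


def char_ngrams(s: str, n: int = 3) -> Counter:
    # Hypothetical helper: pad with spaces so word boundaries also yield n-grams.
    padded = f" {s} "
    return Counter(padded[i : i + n] for i in range(len(padded) - n + 1))


def ngram_similarity(a: str, b: str, n: int = 3) -> float:
    # Assumed Jaccard-style score on n-gram multisets: shared / (total - shared).
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    shared = sum((ga & gb).values())  # "&" keeps the minimum count per n-gram
    total = sum(ga.values()) + sum(gb.values()) - shared
    return shared / total if total else 0.0


def similarity_encode(values, prototypes, n=3):
    # One output column per prototype category: the encoding width grows
    # with the number of categories, which is why this does not scale.
    return [[ngram_similarity(v, p, n) for p in prototypes] for v in values]


prototypes = ["west", "north", "north-west"]
values = ["north-west", "north west", "south"]
rows = similarity_encode(values, prototypes)
for value, row in zip(values, rows):
    print(value, [round(x, 2) for x in row])
```

With only three prototypes, "north-west" gets a non-zero similarity to both "north" and "west", which is the kind of link the note describes; with thousands of open-ended categories, every row would need thousands of columns.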

GaelVaroquaux (Member) left a comment


Looks great to me. I've left a few comments, and in addition I agree that it would be great if you could tackle @LilianBoulard's point.

Thanks!!

skrub/_similarity_encoder.py (outdated, resolved)
skrub/_similarity_encoder.py (outdated, resolved)
skrub/_similarity_encoder.py (outdated, resolved)
skrub/tests/test_similarity_encoder.py (outdated, resolved)
LeoGrin and others added 3 commits June 13, 2023 12:21
LeoGrin (Contributor, Author) commented Jun 13, 2023

> What I'm thinking of is the similarity argument, which should be removed (along with the TODO).

I don't see any TODO @LilianBoulard.

GaelVaroquaux (Member) left a comment


Two tiny suggestions. +1 to merge on my side once they have been addressed.

@LilianBoulard : I'll let you do a final review and press the merge button if everything is fine on your side.

skrub/_similarity_encoder.py (outdated, resolved)
skrub/_similarity_encoder.py (outdated, resolved)
LeoGrin changed the title from "Remove "most_frequent" and "k-means" strategies from SimilarityEncoder" to "SimilarityEncoder cleanup" Jun 13, 2023
Co-authored-by: Gael Varoquaux <gael.varoquaux@normalesup.org>
GaelVaroquaux (Member) commented Jun 13, 2023 via email

CHANGES.rst (outdated, resolved)
LilianBoulard merged commit 746db33 into skrub-data:main Jun 14, 2023
4 of 5 checks passed
Issues that merging this pull request may close:
Remove "most_frequent" and "k-means" strategies from SimilarityEncoder
3 participants