Fix handle_missing parameters and standardize input data shape for MinHashEncoder #210

Merged on Oct 8, 2021 (9 commits)

@alexis-cvetkov (Contributor) commented Sep 28, 2021

I have made small modifications to the MinHashEncoder and GapEncoder:

  • In MinHashEncoder, the default parameter handle_missing="" becomes "zero_impute", since we do not actually impute missing values with an empty string. Instead, we assign them an encoding vector filled with zeros.
  • Since other encoders take as input data with shape (N_samples, 1) (to be consistent with scikit-learn), I updated the MinHashEncoder to behave in the same way.
  • In GapEncoder, the default parameter handle_missing="zero_impute" becomes "empty_impute", since we impute NaN with an empty string "".
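
To make the difference between the two strategies concrete, here is a minimal, self-contained sketch. It does not use dirty_cat: `toy_encode` and `toy_vector` are hypothetical helpers, and the CRC32-based vector is only a stand-in for a real encoding. The sketch assumes that "zero_impute" means a missing entry gets an all-zero vector, that "empty_impute" means a missing entry is replaced by "" before encoding, and that input is expected with shape (n_samples, 1).

```python
import zlib


def is_missing(value):
    # None or NaN (NaN is the only float that is not equal to itself).
    return value is None or (isinstance(value, float) and value != value)


def toy_vector(text, n_components):
    # Hypothetical stand-in for a real encoding: one CRC32-derived value per
    # component, shifted by 1 so that a real string never maps to all zeros.
    return [
        float(zlib.crc32(f"{seed}:{text}".encode("utf-8")) % 1000 + 1)
        for seed in range(n_components)
    ]


def toy_encode(X, n_components=4, handle_missing="zero_impute"):
    # Assumption: X is a list of single-element rows, i.e. shape (n_samples, 1).
    if handle_missing not in ("zero_impute", "empty_impute"):
        raise ValueError(f"unknown handle_missing: {handle_missing!r}")
    encoded = []
    for row in X:
        if not isinstance(row, (list, tuple)) or len(row) != 1:
            raise ValueError("expected input of shape (n_samples, 1)")
        value = row[0]
        if is_missing(value):
            if handle_missing == "zero_impute":
                # Missing entry: the output vector itself is all zeros.
                encoded.append([0.0] * n_components)
                continue
            # "empty_impute": encode the empty string in place of the NaN.
            value = ""
        encoded.append(toy_vector(value, n_components))
    return encoded


X = [["London"], [float("nan")], ["Paris"]]
zero = toy_encode(X, handle_missing="zero_impute")
empty = toy_encode(X, handle_missing="empty_impute")
```

In this sketch, `zero[1]` is a vector of zeros while `empty[1]` equals the encoding of "", and passing a flat list such as `["London", "Paris"]` raises a `ValueError` because the rows are not wrapped in a second dimension.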

@GaelVaroquaux (Member)

Can you fix the conflicts and update the changelog, please?

@GaelVaroquaux added this to the 0.2.0 release milestone on Oct 8, 2021
@alexis-cvetkov changed the title from "[MinHashEncoder] Fix handle_missing parameter and standardize input data shape" to "[MinHashEncoder & GapEncoder] Fix handle_missing parameter and standardize input data shape" on Oct 8, 2021
@alexis-cvetkov changed the title from "[MinHashEncoder & GapEncoder] Fix handle_missing parameter and standardize input data shape" to "Fix handle_missing parameters and standardize input data shape for MinHashEncoder" on Oct 8, 2021

@LilianBoulard (Member) left a comment

LGTM, thanks for the contributions!

dirty_cat/gap_encoder.py (outdated, resolved)
dirty_cat/test/test_minhash_encoder.py (outdated, resolved)

@alexis-cvetkov (Contributor, Author)

Merging!

@alexis-cvetkov merged commit c206a85 into skrub-data:master on Oct 8, 2021