Skip to content

Improve SuperVectorizer#201

Merged
alexis-cvetkov merged 27 commits into
skrub-data:masterfrom
LilianBoulard:fix_sv_na_values
Oct 8, 2021
Merged

Improve SuperVectorizer#201
alexis-cvetkov merged 27 commits into
skrub-data:masterfrom
LilianBoulard:fix_sv_na_values

Conversation

@LilianBoulard

@LilianBoulard LilianBoulard commented Jul 23, 2021

Copy link
Copy Markdown
Member

Implements various improvements in the SuperVectorizer, such as :

@LilianBoulard LilianBoulard marked this pull request as draft July 23, 2021 14:06
@LilianBoulard LilianBoulard marked this pull request as ready for review July 29, 2021 11:36
Comment thread dirty_cat/super_vectorizer.py Outdated
Comment thread dirty_cat/super_vectorizer.py Outdated
@LilianBoulard LilianBoulard changed the title Don't handle missing values in the SuperVectorizer Improve SuperVectorizer Jul 30, 2021
@LilianBoulard

Copy link
Copy Markdown
Member Author

Updated original comment to describe the changes of d5f6193
I still have a bit of work to do, especially improving the tests and documenting the new properties.

@LilianBoulard LilianBoulard added the enhancement New feature or request label Aug 2, 2021

@GaelVaroquaux GaelVaroquaux left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partial review. I'll finish this ASAP

Comment thread dirty_cat/super_vectorizer.py Outdated
Comment thread dirty_cat/super_vectorizer.py
Comment thread dirty_cat/super_vectorizer.py Outdated
Comment thread dirty_cat/super_vectorizer.py
Comment thread dirty_cat/super_vectorizer.py Outdated
Comment thread dirty_cat/super_vectorizer.py Outdated
Comment thread dirty_cat/super_vectorizer.py Outdated
Comment thread dirty_cat/super_vectorizer.py Outdated
Comment thread dirty_cat/super_vectorizer.py
Comment thread dirty_cat/super_vectorizer.py Outdated
Comment thread dirty_cat/super_vectorizer.py Outdated
Comment thread dirty_cat/super_vectorizer.py Outdated
Comment thread dirty_cat/test/test_super_vectorizer.py
Comment thread dirty_cat/test/test_super_vectorizer.py
Comment thread dirty_cat/test/test_super_vectorizer.py

@GaelVaroquaux GaelVaroquaux left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I finished going through all the code, and I put in all my comments. This PR will be good to go.

@GaelVaroquaux

GaelVaroquaux commented Sep 2, 2021 via email

Copy link
Copy Markdown
Member

@alexis-cvetkov

Copy link
Copy Markdown
Contributor

I finished reviewing the PR:

  • we need to update CHANGES.rst
  • maybe add a test to ensure that fit_transform(X) is equivalent to fit(X).transform(X)
  • is there a reason to define a separate encoder for string column and categorical columns in the parameters of the SuperVectorizer ?

@GaelVaroquaux GaelVaroquaux left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@GaelVaroquaux

Copy link
Copy Markdown
Member

The questions of Alexis are very relevant. Once they are addressed, I am +1 for merge.

@LilianBoulard

LilianBoulard commented Oct 8, 2021

Copy link
Copy Markdown
Member Author

Alexis' questions are addressed by the last commits, thanks for the feedback !

@GaelVaroquaux GaelVaroquaux added this to the 0.2.0 release milestone Oct 8, 2021
@alexis-cvetkov alexis-cvetkov merged commit 932fd23 into skrub-data:master Oct 8, 2021
@alexis-cvetkov

Copy link
Copy Markdown
Contributor

Merging !

@LilianBoulard LilianBoulard deleted the fix_sv_na_values branch October 12, 2021 14:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Missing values replaced by the SuperVectorizer

3 participants