-
Notifications
You must be signed in to change notification settings - Fork 182
Make GapEncoder a single-column transformer #920
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make GapEncoder a single-column transformer #920
Conversation
skrub/_dataframe/_common.py
Outdated
| "to_list", | ||
| "to_numpy", | ||
| "to_pandas", | ||
| "reset_index", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changes in this file are take from #919 and should be reviewed there
| from datetime import datetime | ||
|
|
||
| import numpy as np | ||
| import pandas as pd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changes in this file are take from #919 and should be reviewed there
| class GapEncoderColumn(BaseEstimator, TransformerMixin): | ||
| """GapEncoder for encoding a single column. | ||
| class GapEncoder(SingleColumnTransformer, TransformerMixin): | ||
| """Constructs latent topics with continuous encoding. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the docstring has been moved essentially unchanged from the previous GapEncoder with 2 minor changes:
- the docstring still mentioned 'empty_impute', but apparently the parameter has been renamed 'zero_impute'
- the docstring mentioned 'inverse_transform' but it doesn't seem to be defined
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume that this was resolved after what we discuss in the meeting.
|
@jeromedockes could you solve the conflicts |
As discussed in skrub meetings.
Currently the actual functionality of the GapEncoder is implemented by the GapEncoderColumn. The GapEncoder is a wrapper that takes care of fitting a separate GapEncoderColumn to each column and collecting the results in a 2D array.
That part is now handled by the TableVectorizer, so we only need the GapEncoderColumn.
This PR essentially removes the GapEncoder, then renames GapEncoderColumn to GapEncoder