-
Notifications
You must be signed in to change notification settings - Fork 194
FEA Add tabular_learner factory function
#926
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Name suggestion: "make_skrub_learner" ? We could have "make_skrub_pipeline", but I like to put the emphasis on the fact that this gives a supervised learner |
|
I don't have a strong opinion on the name. I think the advantage of having "pipeline" or something similar in it is it makes a clearer distinction between the learner provided by the user and the learner returned to them |
3f862c1 to
5643fae
Compare
|
let's vote to pick a name in #938 , don't hesitate to add options! |
TheooJ
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM ! We just need a name :)
Co-authored-by: Théo Jolivet <57430673+TheooJ@users.noreply.github.com>
|
@GaelVaroquaux do we merge table_learner and make another PR if the name changes ?
I'd say: go for it. Use "tabular_learner" for now, as it's the one that gets the most positive vibes.
|
GaelVaroquaux
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great. A few suggestions.
Can you also add it in see also section in the docstring of the TableVectorizer, as well as on the encoding narrative doc (first section).
Thanks!!
skrub/_tabular_learner.py
Outdated
| do not accept heterogeneous dataframes containing complex data such as | ||
| datetimes or strings. Moreover, they do not always accept the input to | ||
| contain missing values. Therefore, some preprocessing must be applied to | ||
| dataframes before they are passed to an estimator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The description on top of the docstring is too long. It should be moved to a "Notes" section, leaving only a quick read before the parameters section.
Co-authored-by: Gael Varoquaux <gael.varoquaux@normalesup.org>
|
@GaelVaroquaux IIRC some experimentation you're working on reveals that adding a missing value indicator when imputing usually helps. Should we set |
Oh yes ;) |
tabular_learner factory function
|
Should we set add_indicator=True on the SimpleImputer here?
Yes, IMHO
|
Co-authored-by: Théo Jolivet <57430673+TheooJ@users.noreply.github.com>
|
Thanks @jeromedockes ! |
|
Oups, I was not fast enough for giving my review. I'll open a PR with little changes in the doc :) |
|
great feature! thx :) |
closes #924
also maybe enough to close #871 and #866?
TODO: