FIX accept input_features parameter in TableVectorizer.get_feature_names_out #1258
jeromedockes merged 2 commits into skrub-data:main from
GaelVaroquaux
left a comment
LGTM, but could we have a tiny changelog entry? It's useful for users.
Indeed :)
Thanks a lot @glemaitre !! Note we have a bunch of estimators defining
I would say no; in any case, a preprocessing check in the TableVectorizer raises an exception if the columns of the input are different in transform than in fit. But you know more about
+1 Can we use scikit-learn's common tests to detect this?
We should check whether the common tests cover this particular feature, but I would hope so. I think that we are still bypassing the common tests. I should give it another look in
closes #1256
This PR makes sure to expose an `input_features` parameter in `get_feature_names_out` to be compatible with scikit-learn.

Right now, the parameter is ignored. One question would be: do we want to allow overwriting the column names? In that case, we would need to change more code, because we would need to overwrite the names of the columns of the output dataframe in `transform` to keep `get_feature_names_out()` consistent with the `feature_names_in_` of the subsequent steps in a scikit-learn pipeline.
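
To illustrate the behavior described above, here is a minimal sketch of a transformer whose `get_feature_names_out` accepts `input_features` but ignores it. It is not skrub's actual implementation: the `ToyVectorizer` class, its `_output_names` attribute, and the `_encoded` suffix are hypothetical names chosen for the example, and the input table is a plain dict so that the snippet runs without any dependency.

```python
class ToyVectorizer:
    """Hypothetical stand-in for a table transformer (not skrub's TableVectorizer)."""

    def fit(self, X):
        # X is assumed to be a dict mapping column names to lists of values.
        self.feature_names_in_ = list(X)
        # Output names are derived once, from the columns seen during fit.
        self._output_names = [f"{name}_encoded" for name in self.feature_names_in_]
        return self

    def get_feature_names_out(self, input_features=None):
        # input_features is accepted only so the signature matches the
        # scikit-learn convention; it is ignored, as described in this PR.
        return list(self._output_names)


vectorizer = ToyVectorizer().fit({"city": ["Paris", "Rome"], "age": [30, 41]})
names_default = vectorizer.get_feature_names_out()
names_with_arg = vectorizer.get_feature_names_out(input_features=["a", "b"])
```

Both calls return the same names here, which is what ignoring the parameter means in practice. Letting `input_features` overwrite the names would also require renaming the columns produced by `transform`, as noted in the description.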