Describe the workflow you want to enable
It would be nice if ColumnTransformer.get_feature_names could produce names even for transformers that don't implement get_feature_names, and used the full API of get_feature_names in transformers where it has been implemented.
Describe your proposed solution
Current:
def get_feature_names(self):
    """Get feature names from all transformers.

    Returns
    -------
    feature_names : list of strings
        Names of the features produced by transform.
    """
    check_is_fitted(self)
    feature_names = []
    for name, trans, column, _ in self._iter(fitted=True):
        if trans == 'drop' or (
                hasattr(column, '__len__') and not len(column)):
            continue
        if trans == 'passthrough':
            if hasattr(self, '_df_columns'):
                if ((not isinstance(column, slice))
                        and all(isinstance(col, str) for col in column)):
                    feature_names.extend(column)
                else:
                    feature_names.extend(self._df_columns[column])
            else:
                indices = np.arange(self._n_features)
                feature_names.extend(['x%d' % i for i in indices[column]])
            continue
        if not hasattr(trans, 'get_feature_names'):
            raise AttributeError("Transformer %s (type %s) does not "
                                 "provide get_feature_names."
                                 % (str(name), type(trans).__name__))
        feature_names.extend([name + "__" + f for f in
                              trans.get_feature_names()])
    return feature_names
If a transformer does not implement get_feature_names, it simply raises an error.
If a transformer DOES implement get_feature_names, the ColumnTransformer ignores part of that API (ignoring fitted column names, using instead an integer column index).
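To make the second point concrete, here is a minimal sketch using a toy encoder. `ToyOneHot` is a hypothetical stand-in written for this illustration, not a scikit-learn class; it only mimics the convention of an optional `input_features` argument that defaults to `x0`, `x1`, ... style names. The `cat` prefix is likewise an assumed transformer name.

```python
class ToyOneHot:
    """Hypothetical stand-in for an encoder that exposes get_feature_names."""

    def fit(self, rows):
        # rows: list of equal-length tuples; record sorted categories per column
        n_cols = len(rows[0])
        self.categories_ = [
            sorted({row[i] for row in rows}) for i in range(n_cols)
        ]
        return self

    def get_feature_names(self, input_features=None):
        # Without input_features, fall back to positional names x0, x1, ...
        if input_features is None:
            input_features = ["x%d" % i for i in range(len(self.categories_))]
        return [
            "%s_%s" % (feat, cat)
            for feat, cats in zip(input_features, self.categories_)
            for cat in cats
        ]


enc = ToyOneHot().fit([("red", "S"), ("blue", "M")])

# What a caller gets when it never passes the column names along:
default_names = ["cat__" + f for f in enc.get_feature_names()]

# What the same encoder can produce when given the original column names:
informed_names = ["cat__" + f for f in enc.get_feature_names(["color", "size"])]

print(default_names)
print(informed_names)
```

The first list carries only integer-indexed names, while the second keeps the original column names, which is the part of the API the request wants ColumnTransformer to use.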
Proposed Solution:
def get_feature_names(self):
    """Get feature names from all transformers.

    Returns
    -------
    feature_names : list of strings
        Names of the features produced by transform.
    """
    import inspect  # required by Section B; missing from the original snippet
    from sklearn.utils.validation import check_is_fitted
    from numpy import arange

    check_is_fitted(self)
    feature_names = []
    for name, trans, column, _ in self._iter(fitted=True):
        if trans == 'drop' or (
                hasattr(column, '__len__') and not len(column)):
            continue
        if trans == 'passthrough':
            if hasattr(self, '_df_columns'):
                if ((not isinstance(column, slice))
                        and all(isinstance(col, str) for col in column)):
                    feature_names.extend(column)
                else:
                    feature_names.extend(self._df_columns[column])
            else:
                indices = arange(self._n_features)
                feature_names.extend(['x%d' % i for i in indices[column]])
            continue
        if not hasattr(trans, 'get_feature_names'):
            # ADDED SECTION A
            if hasattr(self, '_df_columns'):
                if ((not isinstance(column, slice))
                        and all(isinstance(col, str) for col in column)):
                    feature_names.extend(f'{name}_{col}' for col in column)
                else:
                    feature_names.extend(
                        f'{name}_{col}' for col in self._df_columns[column]
                    )
            else:
                indices = arange(self._n_features)
                feature_names.extend(['x%d' % i for i in indices[column]])
            continue
            # END SECTION A
        # ADDED SECTION B
        gfn_args = inspect.getfullargspec(trans.get_feature_names).args
        args_to_send = []
        if ('input_features' in gfn_args) and \
                not isinstance(column, slice):
            args_to_send = [column]
        feature_names.extend([name + "__" + f for f in
                              trans.get_feature_names(*args_to_send)])
        # END SECTION B
    return feature_names
Section A adds:
<<transformer name>>_<<column>> for each transformer that doesn't implement get_feature_names
Section A removes:
Raising an error
Section B adds:
<<transformer name>>__<<output of get_feature_names>> for the transformer by sending in the column names it received at fit.
So - if the transformer doesn't implement get_feature_names, we either return the column names (in the case of a 1:1 transformation), or an integer index.
If the transformer DOES implement get_feature_names, we try to get the original feature names that were fed in, and use them to get more descriptive feature names from each transformer.
If that isn't possible, we fall back to the original behavior.
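The decision logic described above can be sketched in isolation, outside ColumnTransformer. This is only an illustration under stated assumptions: `names_for` and the three toy classes are hypothetical, the columns are assumed to be a plain list of string names, and `inspect.signature` is used here in place of the `inspect.getfullargspec` call from the proposal.

```python
import inspect


def names_for(name, trans, columns):
    """Hypothetical helper: build output names for one fitted transformer.

    Assumes `columns` is a list of string column names.
    """
    if not hasattr(trans, "get_feature_names"):
        # Section A: fall back to <name>_<column> instead of raising.
        return ["%s_%s" % (name, col) for col in columns]

    # Section B: pass the column names only if the method accepts them.
    params = inspect.signature(trans.get_feature_names).parameters
    args = [columns] if "input_features" in params else []
    return [name + "__" + f for f in trans.get_feature_names(*args)]


class NoNames:
    """Toy transformer without get_feature_names."""


class PlainNames:
    """Toy transformer whose get_feature_names takes no arguments."""

    def get_feature_names(self):
        return ["a", "b"]


class InformedNames:
    """Toy transformer that accepts input_features."""

    def get_feature_names(self, input_features=None):
        return [f + "_sq" for f in input_features]


no_names = names_for("scale", NoNames(), ["age", "fare"])
plain = names_for("plain", PlainNames(), ["age"])
informed = names_for("poly", InformedNames(), ["age", "fare"])

print(no_names, plain, informed)
```

Note that the two branches use different separators (a single underscore in Section A, a double underscore in Section B), following the naming given in the proposal.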
Describe alternatives you've considered, if relevant
The alternative is to keep the current behavior, but I think this is a valuable addition.
Additional context
I know I haven't considered every eventuality, which is why there is not a pull request associated with this feature request. But I do think I'm close, and I would welcome any input.
This feature is going to be discussed and implemented within a SLEP: scikit-learn/enhancement_proposals#48
We want to have a consistent API for that matter. I am closing this issue since this was already discussed and this is a duplicate.