ENH store per-transformer index into feature space in FeatureUnion #1952
`FeatureUnion` provided no simple way to tell which parts of the stack belonged to which transformer. This provides a `feature_ptr_` attribute (in the form of `csr_matrix.indptr`) to solve that.

Caveats:

- [...] `fit_transform`, not `fit`, is called
- [...] `transform` time without a refit.

Both these caveats would be solved if each transformer in sklearn provided a way to get the number of output features. I suggest a `transformed_width_` attribute or `get_transformed_width()` method (better name?), the latter making it clearer that the output can be affected by `set_params`. I also suggest this be posed as an Easy Issue for someone to tackle.

Finally, `feature_ptr_` is compact and versatile, but it might be more usable if I add a method to get this data as a dict from transformer-name to `slice`s.
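
For illustration, here is a minimal sketch of how an `indptr`-style `feature_ptr_` relates to a dict of slices. It is not code from this PR: the helper names `feature_ptr_from_widths` and `slices_by_name`, the transformer names, and the widths are all hypothetical, and it uses plain Python lists rather than a real `FeatureUnion`.

```python
from itertools import accumulate


def feature_ptr_from_widths(widths):
    """Build indptr-style offsets (hypothetical helper).

    Entry i is the first stacked column of transformer i;
    entry i + 1 is one past its last column.
    """
    return [0] + list(accumulate(widths))


def slices_by_name(names, feature_ptr):
    """Map each transformer name to its slice of the stacked columns."""
    if len(feature_ptr) != len(names) + 1:
        raise ValueError("feature_ptr needs one more entry than names")
    return {
        name: slice(start, stop)
        for name, start, stop in zip(names, feature_ptr[:-1], feature_ptr[1:])
    }


# Assumed example: three transformers producing 3, 0 and 2 output features.
names = ["tfidf", "empty", "pca"]
feature_ptr = feature_ptr_from_widths([3, 0, 2])
slices = slices_by_name(names, feature_ptr)

# One row of the stacked output, used to pick out one transformer's columns.
row = ["a", "b", "c", "d", "e"]
pca_part = row[slices["pca"]]
```

The offsets form handles a transformer with zero output features naturally (two equal consecutive entries), which is part of why it is compact; the dict form trades that compactness for lookup by name.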