ENH store per-transformer index into feature space in FeatureUnion #1952

Open

Owner

jnothman commented May 8, 2013

FeatureUnion provides no simple way to tell which columns of the stacked output belong to which transformer. This PR adds a feature_ptr_ attribute (in the same form as csr_matrix.indptr) to solve that.
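To illustrate the indptr-style layout, here is a minimal sketch in plain Python. The helper name `feature_ptr_from_widths` and the example widths are assumptions made for illustration, not code from this PR; in practice the widths would come from the number of columns each sub-transformer returns.

```python
from itertools import accumulate


def feature_ptr_from_widths(widths):
    """Return indptr-style offsets: block i spans columns ptr[i]:ptr[i + 1]."""
    return [0] + list(accumulate(widths))


# Hypothetical output widths of three sub-transformers.
widths = [3, 5, 2]
feature_ptr = feature_ptr_from_widths(widths)

# Columns of the stacked output that belong to the second transformer.
second_block = slice(feature_ptr[1], feature_ptr[2])
```

As with csr_matrix.indptr, the array has one more entry than there are blocks, and its last entry is the total number of output features.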

Caveats:

  • it can only be determined when fit_transform is called, not when fit is called on its own
  • it will be incorrect if a sub-transformer's parameters change in a way that affects the output size at transform time without a refit.

Both of these caveats would be resolved if each transformer in sklearn provided a way to get its number of output features. I suggest a transformed_width_ attribute or a get_transformed_width() method (better name?), the latter making it clearer that the output can be affected by set_params. I also suggest this be posed as an Easy Issue for someone to tackle.
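A rough sketch of how such a method could address both caveats, using a toy transformer in plain Python. Everything here is hypothetical: `FirstKColumns`, `current_feature_ptr`, and `get_transformed_width()` itself are illustrations of the proposal, not existing sklearn API.

```python
from itertools import accumulate


class FirstKColumns:
    """Toy transformer (hypothetical, not part of sklearn): keeps the first k columns."""

    def __init__(self, k):
        self.k = k

    def set_params(self, **params):
        for key, value in params.items():
            setattr(self, key, value)
        return self

    def get_transformed_width(self):
        # Proposed method: derived from the current parameters,
        # so it stays correct after set_params without a refit.
        return self.k

    def transform(self, rows):
        return [row[: self.k] for row in rows]


def current_feature_ptr(transformers):
    """Build the offsets on demand instead of caching them in fit_transform."""
    widths = [t.get_transformed_width() for t in transformers]
    return [0] + list(accumulate(widths))


a, b = FirstKColumns(2), FirstKColumns(3)
ptr_before = current_feature_ptr([a, b])
a.set_params(k=4)
ptr_after = current_feature_ptr([a, b])
```

Because the offsets are computed from the transformers when requested, they do not depend on fit_transform having been called and they follow any later parameter change.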

Finally, feature_ptr_ is compact and versatile, but it might be more usable if I add a method to get this data as a dict from transformer-name to slices.
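For example, the dict form could be derived from the pointer array along these lines. The function name `feature_slices` and the transformer names are assumptions for illustration only.

```python
def feature_slices(names, feature_ptr):
    """Map each transformer name to the slice of columns it produced."""
    return {
        name: slice(start, stop)
        for name, start, stop in zip(names, feature_ptr[:-1], feature_ptr[1:])
    }


# Hypothetical transformer names and offsets.
names = ["tfidf", "length", "pos"]
feature_ptr = [0, 3, 8, 10]
slices = feature_slices(names, feature_ptr)

# Select one transformer's features from a single stacked row.
row = list(range(10))
length_features = row[slices["length"]]
```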

Owner

jnothman commented Aug 10, 2014

Closing due to lack of interest and underspecification.

@jnothman jnothman closed this Aug 10, 2014

@jnothman jnothman reopened this Jun 28, 2017

Owner

jnothman commented Jun 28, 2017

I reckon this kind of thing is still worth having. I intend to resurrect it. I still don't have a certain solution for the fit() case.
