RFC generalised Pipeline.get_feature_names #6424

Open
jnothman opened this issue Feb 23, 2016 · 3 comments

@jnothman (Member) commented Feb 23, 2016

There has been some demand for Pipeline.get_feature_names (#2007, #5172, #6421) for the case where the last element in the pipeline is a feature extractor. Following on from #6372, we shall instead make get_feature_names able to transform a list of input features in the general case. I propose the following behaviour:

  1. Pipeline.get_feature_names may be called with a list input_features as an argument only if all its estimators support get_feature_names with an argument. The initial input_features is transformed iteratively through the estimators.
  2. Pipeline.get_feature_names may be called without an argument only if a suffix of its estimators supports get_feature_names. The first of that suffix may or may not accept input_features, and the remainder must accept input_features; the output of the first get_feature_names call is iteratively modified by the downstream transformers' get_feature_names.
    • To be cautious until we find a use-case otherwise, get_feature_names will not be supported in the case that get_feature_names is available before (but not adjacent to) that suffix.
  3. Otherwise, a ValueError is raised. Or: should the attribute become invisible, as for predict et al.?
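A minimal sketch of rules 1 to 3, assuming each step is a plain object whose optional get_feature_names either takes no argument or takes input_features (defaulting to None). The function name pipeline_feature_names and the toy step classes are hypothetical illustrations, not scikit-learn's API; rule 3 is rendered as a ValueError, as proposed.

```python
import inspect


def _accepts_input_features(method):
    # True if the bound get_feature_names method takes at least one positional argument.
    return any(
        p.kind in (p.POSITIONAL_ONLY, p.POSITIONAL_OR_KEYWORD, p.VAR_POSITIONAL)
        for p in inspect.signature(method).parameters.values()
    )


def pipeline_feature_names(steps, input_features=None):
    """Hypothetical sketch of the proposed Pipeline.get_feature_names rules."""
    methods = [getattr(step, "get_feature_names", None) for step in steps]

    if input_features is not None:
        # Rule 1: every step must support get_feature_names with an argument.
        if not all(m is not None and _accepts_input_features(m) for m in methods):
            raise ValueError("input_features given, but not every step accepts them")
        names = list(input_features)
        for m in methods:
            names = m(names)
        return names

    # Rule 2: walk back over trailing steps that accept input_features.
    i = len(methods) - 1
    while i >= 0 and methods[i] is not None and _accepts_input_features(methods[i]):
        i -= 1
    # The suffix may begin one step earlier with an argument-free get_feature_names.
    first = i if i >= 0 and methods[i] is not None else i + 1
    if first == len(methods):
        # Rule 3: no suffix supports get_feature_names.
        raise ValueError("no suffix of steps supports get_feature_names")
    # Rule 2 caveat: refuse if get_feature_names exists earlier but not adjacent.
    if any(m is not None for m in methods[: max(first - 1, 0)]):
        raise ValueError("get_feature_names is available before, but not adjacent to, the suffix")
    names = methods[first]()  # assumes any input_features parameter is optional
    for m in methods[first + 1:]:
        names = m(names)
    return names


class Squarer:
    """Toy transformer: get_feature_names accepts input_features."""

    def get_feature_names(self, input_features=None):
        return [name + "^2" for name in input_features]


class Vectorizer:
    """Toy feature extractor: get_feature_names takes no argument."""

    def get_feature_names(self):
        return ["word_a", "word_b"]


class Opaque:
    """Toy step without get_feature_names."""
```

For example, pipeline_feature_names([Opaque(), Vectorizer(), Squarer()]) applies rule 2: the suffix starts at the Vectorizer, and its names are passed through Squarer. With input_features=["x"], the same steps fail rule 1 because Opaque and Vectorizer cannot take input_features.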
@amueller (Member) commented Feb 24, 2016

Agreed on 1) and 2).
For 3): maybe an AttributeError: "the last step has no get_feature_names".

@jnothman (Member, Author) commented Feb 24, 2016

Do you mean an AttributeError if the last step has no get_feature_names? The problem with the AttributeError is that the definition currently allows for get_feature_names that does not take an argument. Testing for this when doing the attribute lookup is fairly heavy. (Though I suspect that we will require get_feature_names to take an argument, even if unused, in any estimator where the pipeline functionality is sought.)

@amueller (Member) commented Feb 25, 2016

Ah, I didn't think about that. But these are two different errors, right? One is that there is no suffix with get_feature_names; the other is that input feature names were passed and there is no suffix that accepts them.
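One way to keep those two errors apart, sketched under assumptions: a cheap AttributeError at attribute lookup when the last step lacks get_feature_names, and a ValueError at call time when input_features is passed but some step cannot take it. This defers the heavier signature inspection to the call. ToyPipeline and the toy steps are hypothetical; for brevity the no-argument case (rule 2) is left unimplemented.

```python
import inspect


class ToyPipeline:
    """Hypothetical pipeline separating the two get_feature_names errors."""

    def __init__(self, steps):
        self.steps = steps

    @property
    def get_feature_names(self):
        # Error 1, at attribute lookup: cheap, no signature inspection needed.
        # Raising AttributeError in a property makes hasattr() return False,
        # so the method looks absent, as predict et al. do.
        if not hasattr(self.steps[-1], "get_feature_names"):
            raise AttributeError("the last step has no get_feature_names")
        return self._get_feature_names

    def _get_feature_names(self, input_features=None):
        if input_features is None:
            # Rule 2 (calling without an argument) is omitted from this sketch.
            raise NotImplementedError("only the input_features case is sketched")
        methods = [getattr(s, "get_feature_names", None) for s in self.steps]
        # Error 2, at call time: the heavier signature check runs only here.
        for m in methods:
            if m is None or not inspect.signature(m).parameters:
                raise ValueError("input_features was passed, but not every step accepts it")
        names = list(input_features)
        for m in methods:
            names = m(names)
        return names


class Suffixer:
    """Toy step whose get_feature_names accepts input_features."""

    def get_feature_names(self, input_features=None):
        return [name + "_s" for name in input_features]


class NoNames:
    """Toy step without get_feature_names."""


class Extractor:
    """Toy feature extractor whose get_feature_names takes no argument."""

    def get_feature_names(self):
        return ["tok"]
```

With this design, hasattr(ToyPipeline([Suffixer(), NoNames()]), "get_feature_names") is False, while ToyPipeline([Extractor(), Suffixer()]) exposes the method but raises ValueError once input_features is passed.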

Projects: Andy's pets (Design phase) · 2 participants