Adding get_estimated method to pipelines #2562

Closed
schwarty opened this Issue Oct 31, 2013 · 8 comments

5 participants
@schwarty
Contributor

commented Oct 31, 2013

The motivation is the same as PR #2561. In a nutshell, we want to be able to apply the inverse_transform steps of a pipeline to a parameter estimated by the last step of that same pipeline. PR #2561 aims to achieve this by applying the inverse_transforms of all steps, except the last one if it lacks an inverse_transform method. @GaelVaroquaux pointed out that this fails when the last estimator implements both transform/inverse_transform methods and predict/score methods.

The alternative could be to implement a get_estimated method for Pipelines that explicitly retrieves a given attribute from the pipeline's last step and then applies the inverse_transforms of the preceding steps. The code would look like the following:

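from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

pca = PCA()
clf = LinearSVC()
pipeline = Pipeline([('pca', pca), ('clf', clf)])

pipeline.fit(X, y)
# get_estimated is the method proposed here; it does not exist yet
coef_ = pipeline.get_estimated('coef_')

# the coefficients come back in the original feature space
assert X.shape[1] == coef_.shape[1]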
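
For readers who want to try the idea with the existing API, here is a minimal sketch of what the proposal describes. The function name `get_estimated` is hypothetical and is written as a free function, not a Pipeline method; the toy data and `n_components=5` are assumptions for illustration only. Note that PCA's inverse_transform also adds the fitted mean back, so the result is the literal inverse transform of the attribute, which may or may not be what you want for coefficients.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC


def get_estimated(pipeline, name):
    """Hypothetical helper (not scikit-learn API).

    Reads attribute `name` from the last step, then applies the
    inverse_transform of every preceding step, last to first.
    """
    value = getattr(pipeline.steps[-1][1], name)
    for _, step in reversed(pipeline.steps[:-1]):
        value = step.inverse_transform(value)
    return value


# Toy data, assumed for illustration only
X, y = make_classification(n_samples=100, n_features=20, random_state=0)

pipeline = Pipeline([("pca", PCA(n_components=5)), ("clf", LinearSVC())])
pipeline.fit(X, y)

coef_ = get_estimated(pipeline, "coef_")

# coef_ now has one column per original input feature
assert X.shape[1] == coef_.shape[1]
```

The helper only looks at the last step, so it sidesteps the ambiguity raised for PR #2561: the caller names the attribute explicitly, and every earlier step is assumed to provide inverse_transform.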

Note that the final learner may itself be a Pipeline, a GridSearchCV, a MetaClassifier, or any valid combination in sklearn.

Any comments, suggestions?

@ogrisel

Member

commented Oct 31, 2013

I find the name get_estimated confusing for a need that only applies to inverting the transformation of the estimated parameters of linear models.

@GaelVaroquaux

Member

commented Oct 31, 2013

> I find the name get_estimated confusing for a need that only applies to
> inverting the transformation of the estimated parameters of linear models.

I think that it shouldn't be only for linear models. It's kind of like
get_params, but for estimated params rather than hyperparameters. That
said, I am happy to hear suggestions about the name.

@ogrisel

Member

commented Oct 31, 2013

But 99% of the estimated params won't have an n_derived_features compatible shape suitable for backward transformation.

@GaelVaroquaux

Member

commented Oct 31, 2013

> But 99% of the estimated params won't have an n_derived_features compatible
> shape suitable for backward transformation.

Well, maybe not 99%, but a significant fraction. This is why it is not
possible to have this method as a black box that automatically does what
Yannick wants.
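
To make the shape concern concrete, here is a small sketch that checks which fitted attributes of a LinearSVC have a last axis matching the number of derived features produced by PCA. The function `has_derived_feature_axis` is a hypothetical name, and the toy data is an assumption. A matching shape is only a necessary condition: it does not by itself show that inverse-transforming the attribute is meaningful.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.svm import LinearSVC

# Toy data, assumed for illustration only
X, y = make_classification(n_samples=100, n_features=20, random_state=0)

pca = PCA(n_components=5).fit(X)
clf = LinearSVC().fit(pca.transform(X), y)


def has_derived_feature_axis(value, n_derived_features):
    """Hypothetical check: does the last axis of `value` index derived features?"""
    value = np.asarray(value)
    return value.ndim >= 1 and value.shape[-1] == n_derived_features


n_derived = pca.n_components_

# coef_ has one column per derived feature; intercept_ and classes_ do not
coef_ok = has_derived_feature_axis(clf.coef_, n_derived)
intercept_ok = has_derived_feature_axis(clf.intercept_, n_derived)
classes_ok = has_derived_feature_axis(clf.classes_, n_derived)
```

Here only `coef_` passes the check, which is why the attribute has to be named by the caller and cannot be picked automatically.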

@jnothman

Member

commented Nov 1, 2013

I think you want to consider how such an API might more generally apply to
meta-estimators. Should there be an easy way for me to get the coef of the
last step (by name) of a pipeline of the best estimator chosen by
cross-validated grid search?

Also, I don't think you have a strong case here for specially treating the
model of the last estimator and no other.
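
For reference, the nested case can already be reached by chaining attributes, as in the sketch below. It uses `sklearn.model_selection.GridSearchCV` from current scikit-learn releases; the toy data, the step names, and the `C` grid are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Toy data, assumed for illustration only
X, y = make_classification(n_samples=100, n_features=20, random_state=0)

pipeline = Pipeline([("pca", PCA(n_components=5)), ("clf", LinearSVC())])
grid = GridSearchCV(pipeline, {"clf__C": [0.1, 1.0]}, cv=3)
grid.fit(X, y)

# best_estimator_ is the refitted Pipeline; named_steps looks a step up by name
best_clf = grid.best_estimator_.named_steps["clf"]
coef = best_clf.coef_
```

This returns the coefficients in the derived (PCA) space; mapping them back to the input space is the part that has no single generic entry point.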

@GaelVaroquaux

Member

commented Nov 1, 2013

> I think you want to consider how such an API might more generally apply to
> meta-estimators. Should there be an easy way for me to get the coef of the
> last step (by name) of a pipeline of the best estimator chosen by
> cross-validated grid search?

Yes, I think that this remark is relevant. I am a bit worried about where
it could lead us in terms of code complexity, but this is the correct
line of thought for @schwarty's problem.

> Also, I don't think you have a strong case here for specially treating the
> model of the last estimator and no other.

I don't see a strong use case for anything other than the last estimator. Could
you be more explicit, please?

@jnothman

Member

commented Nov 2, 2013

Actually, I misread the original post here. I didn't see that you wanted get_estimated to include the inverse transforms of the parameter. I agree with @ogrisel that it's a poor name for that purpose. Further, I could imagine it applying to an intermediate step. For a trivial example, suppose we use an L1 model as a feature selector after some other processing, so that it is step -2: we might still want its coefficients in the input space.

While I don't think this is a marginal use case, I do think its semantics are too particular to warrant a single dedicated method.
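
The intermediate-step case could be sketched roughly as follows. The helper `coef_in_input_space` is a hypothetical name, and the pipeline layout (PCA, then an L1-penalised LinearSVC wrapped in SelectFromModel at step -2, then LogisticRegression) is an assumption chosen for illustration. Only the steps before the selector are inverted. As with any PCA inverse_transform, the fitted mean is added back.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC


def coef_in_input_space(pipeline, step_name):
    """Hypothetical helper (not scikit-learn API).

    Takes coef_ from the model wrapped by the named SelectFromModel step and
    applies the inverse_transform of the steps that precede it, last to first.
    """
    names = [name for name, _ in pipeline.steps]
    idx = names.index(step_name)
    value = pipeline.steps[idx][1].estimator_.coef_
    for _, step in reversed(pipeline.steps[:idx]):
        value = step.inverse_transform(value)
    return value


# Toy data, assumed for illustration only
X, y = make_classification(n_samples=100, n_features=20, random_state=0)

l1_model = LinearSVC(penalty="l1", dual=False)
pipeline = Pipeline([
    ("pca", PCA(n_components=5)),
    # threshold=-np.inf with max_features keeps exactly the top 3 features
    ("select", SelectFromModel(l1_model, threshold=-np.inf, max_features=3)),
    ("clf", LogisticRegression()),
])
pipeline.fit(X, y)

selector_coef = coef_in_input_space(pipeline, "select")
```

The need to reach into `estimator_` for this one step type is a small illustration of why a single generic method is hard to specify.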

@amueller

Member

commented Oct 25, 2016

Closing, as we decided against it.
