Pipeline et al. design issues

This page collates issues related to the API design of Pipeline and other meta-estimators. In general, a meta-estimator M with a (primary) sub-estimator S should be more or less usable in place of S. Deficiencies in the current models mean this is not always the case: which of these deficiencies should be fixed, and how? Other issues related to meta-estimator support (e.g. nested parameter setting) may also be relevant.

General meta-estimator issues

Duck-typing and methods (#1805, #2019)

hasattr may be used to check whether an estimator supports a particular functionality (e.g. fit_transform, predict_proba). In a meta-estimator, whether such a method is available should depend on whether the sub-estimator provides it. This behaviour can be ensured using magic methods (__getattr__ or __getattribute__) or using descriptors (e.g. property): when these raise AttributeError, hasattr returns False.
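As a minimal sketch of the descriptor approach described above, the following uses property to forward method lookups to a sub-estimator. ToyClassifier, ToyTransformer and ToyMeta are hypothetical stand-ins written for illustration; they are not scikit-learn classes and do not reproduce the code in PR #2019.

```python
class ToyClassifier:
    """Hypothetical sub-estimator that offers predict_proba but not transform."""

    def fit(self, X, y=None):
        return self

    def predict_proba(self, X):
        return [[0.5, 0.5] for _ in X]


class ToyTransformer:
    """Hypothetical sub-estimator that offers transform but not predict_proba."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return [[2 * v for v in row] for row in X]


class ToyMeta:
    """Hypothetical meta-estimator exposing only what its sub-estimator has."""

    def __init__(self, estimator):
        self.estimator = estimator

    @property
    def predict_proba(self):
        # If the sub-estimator lacks predict_proba, this lookup raises
        # AttributeError, so hasattr(meta, "predict_proba") is False.
        return self.estimator.predict_proba

    @property
    def transform(self):
        # Same pattern for transform.
        return self.estimator.transform


meta_clf = ToyMeta(ToyClassifier())
meta_trans = ToyMeta(ToyTransformer())
```

Each supported method needs its own property, which is the readability cost of this approach; a single __getattr__ would be shorter but would forward every missing attribute, not just the common methods.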

PR #2019 supports common methods using property, sacrificing some readability. The question of which common methods need to be supported is a further issue.

A further concern is that for traditional estimators, hasattr gives the same answer before and after fitting. If something like GridSearchCV delegates the lookup to its best_estimator_, the delegation will only take effect after fitting.
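The before/after-fitting difference can be seen in a small sketch. ToySearch is a hypothetical stand-in for a search meta-estimator that delegates to best_estimator_; it is an assumption for illustration and does not reflect how GridSearchCV is actually implemented.

```python
class ToyClassifier:
    """Hypothetical sub-estimator with a predict_proba method."""

    def fit(self, X, y=None):
        return self

    def predict_proba(self, X):
        return [[0.5, 0.5] for _ in X]


class ToySearch:
    """Hypothetical search meta-estimator delegating to best_estimator_."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y=None):
        # A real search would compare candidates; here the single
        # estimator is simply fitted and stored.
        self.best_estimator_ = self.estimator.fit(X, y)
        return self

    @property
    def predict_proba(self):
        # best_estimator_ does not exist before fit, so this lookup
        # raises AttributeError and hasattr reports False until then.
        return self.best_estimator_.predict_proba


search = ToySearch(ToyClassifier())
before_fit = hasattr(search, "predict_proba")
search.fit([[0], [1]], [0, 1])
after_fit = hasattr(search, "predict_proba")
```

Here before_fit and after_fit differ even though the underlying ToyClassifier supports predict_proba throughout; delegating to the unfitted estimator attribute instead would avoid the discrepancy.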

Accessing fitted attributes (cf. #2561, #2568, #2630 in the context of Pipeline)

It can be cumbersome to access a fitted attribute of an estimator (e.g. in a Pipeline within GridSearchCV, this may involve gs.best_estimator_.steps[-1][1].coef_). For the attribute to be interpreted with respect to the input space, it may require further transformation (e.g. Pipeline(gs.best_estimator_.steps[:-1]).inverse_transform(gs.best_estimator_.steps[-1][1].coef_)).
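To make the access chain concrete, here is a small sketch of a helper that walks it. final_step_attr is a hypothetical name, not an existing scikit-learn function, and the SimpleNamespace objects are stand-ins that only mimic the best_estimator_ and steps attributes used in the expression above.

```python
from types import SimpleNamespace


def final_step_attr(search, name):
    """Return attribute `name` from the last step of search.best_estimator_.

    Hypothetical helper: it relies only on the duck-typed attributes
    best_estimator_ and steps, as in the expression quoted above.
    """
    pipeline = search.best_estimator_
    final_estimator = pipeline.steps[-1][1]  # steps is a list of (name, estimator)
    return getattr(final_estimator, name)


# Stand-in objects mimicking a fitted search over a two-step pipeline.
clf = SimpleNamespace(coef_=[0.5, -1.0])
pipe = SimpleNamespace(steps=[("scale", SimpleNamespace()), ("clf", clf)])
gs = SimpleNamespace(best_estimator_=pipe)

coef = final_step_attr(gs, "coef_")
```

Such a helper shortens the call site but does not address mapping the attribute back to the input space, which still requires the preceding steps.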

Moreover, some fitted attributes are used by meta-estimators; AdaBoostClassifier assumes its sub-estimator has a classes_ attribute after fitting, which means that presently Pipeline cannot be used as the sub-estimator of AdaBoostClassifier. Either meta-estimators such as AdaBoostClassifier need to be configurable in how they access this attribute, or meta-estimators such as Pipeline need to make some fitted attributes of sub-estimators accessible.
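The second option, a Pipeline-like meta-estimator exposing a fitted attribute of its final step, could be sketched as follows. ToyPipeline, ToyScaler and ToyClassifier are hypothetical classes written for illustration under that assumption; they are not the scikit-learn implementations.

```python
class ToyScaler:
    """Hypothetical transformer that doubles every value."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return [[2 * v for v in row] for row in X]


class ToyClassifier:
    """Hypothetical classifier that records the labels seen during fit."""

    def fit(self, X, y):
        self.classes_ = sorted(set(y))
        return self


class ToyPipeline:
    """Hypothetical pipeline that exposes classes_ from its final step."""

    def __init__(self, steps):
        self.steps = steps  # list of (name, estimator) pairs

    def fit(self, X, y):
        # Fit and apply each transformer, then fit the final estimator.
        for _, step in self.steps[:-1]:
            X = step.fit(X, y).transform(X)
        self.steps[-1][1].fit(X, y)
        return self

    @property
    def classes_(self):
        # Raises AttributeError until the final step has been fitted,
        # matching the behaviour of a plain classifier.
        return self.steps[-1][1].classes_


pipe = ToyPipeline([("scale", ToyScaler()), ("clf", ToyClassifier())])
has_classes_before = hasattr(pipe, "classes_")
pipe.fit([[0], [1], [2]], [1, 0, 1])
```

A meta-estimator that reads classes_ after fitting would then work with pipe as it would with a bare classifier. The open question is which fitted attributes deserve this treatment.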

Pipeline / FeatureUnion issues

Passing parameters such as `sample_weight` to methods (cf. #2630)

`Pipeline.get_feature_names()` (#2007)

Efficiently reusing partial models/transformations during grid search (#2086)

Inconsistency between `get_params` and `set_params` treatment of sub-estimators (#1769, #1800)

Minor functionality and syntax issues:

constructor verbosity (#2589)
alternating or disabling components through set_params() (#1769)
retrieving a final model in input feature space (#2561, #2568)
heterogeneous input in FeatureUnion (#2034)
partitioning the FeatureUnion output space by transformer (#1952)
