Pipeline et al. design issues
This page collates issues related to the API design of Pipelines and other meta-estimators. In general, a meta-estimator M with (primary) sub-estimator S should be more or less usable in place of S. Deficiencies in the current models mean this is not always the case; which of these deficiencies should be fixed, and how? Other issues related to meta-estimator support (e.g. nested parameter setting) may also be relevant.
hasattr may be used to check whether an estimator supports a particular functionality (e.g. fit_transform, predict_proba). In meta-estimators, this support is conditioned on the presence of that method on a sub-estimator. This behaviour can be ensured using magic methods (__getattr__ or __getattribute__) or using descriptors (e.g. property): when these raise AttributeError, hasattr returns False.
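The descriptor approach can be sketched in plain Python as follows. This is a minimal illustration, not scikit-learn code: the classes WithProba, WithoutProba and DelegatingMeta are hypothetical stand-ins, and the only behaviour relied on is that hasattr returns False when attribute access raises AttributeError.

```python
class WithProba:
    """Hypothetical sub-estimator that supports predict_proba."""

    def fit(self, X, y):
        return self

    def predict(self, X):
        return [0 for _ in X]

    def predict_proba(self, X):
        return [[1.0, 0.0] for _ in X]


class WithoutProba:
    """Hypothetical sub-estimator that lacks predict_proba."""

    def fit(self, X, y):
        return self

    def predict(self, X):
        return [0 for _ in X]


class DelegatingMeta:
    """Hypothetical meta-estimator that mirrors its sub-estimator's methods."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y):
        self.estimator.fit(X, y)
        return self

    def predict(self, X):
        return self.estimator.predict(X)

    @property
    def predict_proba(self):
        # If the sub-estimator has no predict_proba, this attribute lookup
        # raises AttributeError, so hasattr(meta, "predict_proba") is False.
        return self.estimator.predict_proba


supports = hasattr(DelegatingMeta(WithProba()), "predict_proba")
lacks = hasattr(DelegatingMeta(WithoutProba()), "predict_proba")
```

Here supports is True and lacks is False, so the meta-estimator answers the hasattr check the same way its sub-estimator would. The cost is one property per delegated method.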
PR #2019 supports common methods using property, sacrificing some readability. Which common methods need to be supported is a further issue.
A further concern is that for traditional estimators, hasattr gives the same answer before and after fitting. If something like GridSearchCV delegates hasattr to its best_estimator_, which exists only once the search has been fitted, the check only works after fitting.
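The before/after-fitting asymmetry can be reproduced with a toy search wrapper. This is only a sketch under stated assumptions: ToySub and ToyGridSearch are hypothetical names, and ToyGridSearch does no searching at all; it simply stores the fitted sub-estimator as best_estimator_ and delegates predict_proba to it.

```python
class ToySub:
    """Hypothetical estimator whose predict_proba exists before fitting."""

    def fit(self, X, y):
        return self

    def predict_proba(self, X):
        return [[0.5, 0.5] for _ in X]


class ToyGridSearch:
    """Hypothetical wrapper that delegates to best_estimator_ only."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y):
        # No real search: the fitted estimator is taken as the "best" one.
        self.best_estimator_ = self.estimator.fit(X, y)
        return self

    @property
    def predict_proba(self):
        # best_estimator_ does not exist before fit, so this raises
        # AttributeError and hasattr reports False until fit is called.
        return self.best_estimator_.predict_proba


search = ToyGridSearch(ToySub())
sub_before = hasattr(ToySub(), "predict_proba")
meta_before = hasattr(search, "predict_proba")
search.fit([[0.0], [1.0]], [0, 1])
meta_after = hasattr(search, "predict_proba")
```

The plain estimator reports support before fitting, while the wrapper only does so afterwards. One possible way around this in the sketch would be to delegate to self.estimator when best_estimator_ is absent, though whether that is the right design is exactly the open question here.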
It can be cumbersome to access a fitted attribute of an estimator (e.g. for a Pipeline within GridSearchCV, this may involve gs.best_estimator_.steps[-1][1].coef_). To be interpreted with respect to the input space, the attribute may require further transformation (e.g. Pipeline(gs.best_estimator_.steps[:-1]).inverse_transform(gs.best_estimator_.steps[-1][1].coef_)).
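The access pattern above can be traced with toy stand-ins that mimic only the attributes it touches (steps, best_estimator_, coef_, inverse_transform). Everything in this sketch is hypothetical: ToyScaler, ToyLinear, ToyPipeline and ToySearch are not scikit-learn classes, and ToyLinear does no real learning; it stores column means as coef_ so that the numbers are easy to follow.

```python
class ToyScaler:
    """Hypothetical transformer that multiplies every feature by a factor."""

    def __init__(self, factor):
        self.factor = factor

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return [[v * self.factor for v in row] for row in X]

    def inverse_transform(self, X):
        return [[v / self.factor for v in row] for row in X]


class ToyLinear:
    """Hypothetical final step; coef_ is just the column means of X."""

    def fit(self, X, y):
        n = len(X)
        self.coef_ = [[sum(row[j] for row in X) / n for j in range(len(X[0]))]]
        return self


class ToyPipeline:
    """Hypothetical pipeline exposing steps as (name, estimator) pairs."""

    def __init__(self, steps):
        self.steps = steps

    def fit(self, X, y):
        for _, transformer in self.steps[:-1]:
            X = transformer.fit(X, y).transform(X)
        self.steps[-1][1].fit(X, y)
        return self

    def inverse_transform(self, X):
        for _, transformer in reversed(self.steps):
            X = transformer.inverse_transform(X)
        return X


class ToySearch:
    """Hypothetical search wrapper that keeps the fitted estimator."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y):
        self.best_estimator_ = self.estimator.fit(X, y)
        return self


pipe = ToyPipeline([("scale", ToyScaler(2.0)), ("clf", ToyLinear())])
gs = ToySearch(pipe).fit([[1.0, 2.0], [3.0, 4.0]], [0, 1])

# Fitted attribute of the final step, expressed in the transformed space.
coef = gs.best_estimator_.steps[-1][1].coef_
# The same attribute mapped back through all preceding steps.
coef_in_input_space = ToyPipeline(gs.best_estimator_.steps[:-1]).inverse_transform(coef)
```

Even in this toy setting, reaching coef_ takes four levels of attribute and index access, and mapping it back requires rebuilding a pipeline from the leading steps.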
Moreover, some fitted attributes are used by meta-estimators; AdaBoostClassifier assumes its sub-estimator has a classes_ attribute after fitting, which means that presently Pipeline cannot be used as the sub-estimator of AdaBoostClassifier. Either meta-estimators such as AdaBoostClassifier need to be configurable in how they access this attribute, or meta-estimators such as Pipeline need to make some fitted attributes of sub-estimators accessible.
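The second option, having the pipeline expose a fitted attribute of its final step, might look like the sketch below. All names are hypothetical stand-ins rather than scikit-learn code: ToyClassifier and PassThrough are placeholder steps, ClassesPipeline is a toy pipeline, and read_classes plays the role of a meta-estimator that reads classes_ after fitting.

```python
class PassThrough:
    """Hypothetical transformer that returns its input unchanged."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X


class ToyClassifier:
    """Hypothetical classifier that records the sorted labels it saw."""

    def fit(self, X, y):
        self.classes_ = sorted(set(y))
        return self


class ClassesPipeline:
    """Hypothetical pipeline that forwards classes_ from its final step."""

    def __init__(self, steps):
        self.steps = steps

    def fit(self, X, y):
        for _, transformer in self.steps[:-1]:
            X = transformer.fit(X, y).transform(X)
        self.steps[-1][1].fit(X, y)
        return self

    @property
    def classes_(self):
        # Raises AttributeError until the final step has been fitted,
        # matching how a plain classifier behaves.
        return self.steps[-1][1].classes_


def read_classes(estimator, X, y):
    """Stand-in for a meta-estimator that needs classes_ after fitting."""
    estimator.fit(X, y)
    return estimator.classes_


pipe = ClassesPipeline([("noop", PassThrough()), ("clf", ToyClassifier())])
unfitted_has_classes = hasattr(pipe, "classes_")
labels = read_classes(pipe, [[0.0], [1.0], [2.0]], ["b", "a", "b"])
```

With the property in place, the consumer does not need to know it was handed a pipeline. The trade-off is that each forwarded attribute has to be chosen and written out explicitly.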