Describe the workflow you want to enable
sklearn's Pipeline caches the output of transformers in the pipeline. The caching is based on a hash of the arguments of the function _fit_transform_one. Unfortunately, the hash changes when any of the transformer's parameters change, including parameters that don't affect the output, for example verbose (there could be others, perhaps copy?).
It would be a nice feature if there were a way to indicate to the pipeline (or if the pipeline could detect automatically) which parameters within the transformers to ignore for caching.
Use case
While developing, I tend to always set a high verbosity to understand what's happening under the hood. Once I am content with the results, I turn the verbosity off. At this point, the results are already calculated and cached, but they need to be recalculated because of the parameter change.
Describe your proposed solution
The pipeline's caching is performed in sklearn/pipeline.py (line 392 as of commit 38b39a4). It uses joblib's Memory.cache, which accepts an ignore parameter to exclude arguments from the hashing, but you can't ignore parameters within one of the arguments (the first argument to _fit_transform_one is the transformer itself, and we would like to ignore its verbose attribute).
I couldn't find any trivial solution that doesn't involve monkey-patching joblib's code or programmatically changing the verbose parameter on the transformer itself (which would lead to unexpected results for the user, for example the lack of output messages). Happy to open a PR if anyone can think of a solution.
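The limitation of ignore= can be demonstrated directly. In the sketch below (toy function names, not the actual pipeline code), ignore= successfully drops a whole argument from the cache key, but there is no analogous hook for an attribute of an argument:

```python
import tempfile
from joblib import Memory

memory = Memory(tempfile.mkdtemp(), verbose=0)
executions = []

# ignore= drops a *whole argument* from the cache key...
@memory.cache(ignore=["verbose"])
def fit_transform_like(X, verbose=0):
    executions.append(1)  # records an actual (non-cached) execution
    return [x * 2 for x in X]

fit_transform_like([1, 2, 3], verbose=0)
fit_transform_like([1, 2, 3], verbose=1)  # cache hit: verbose is ignored
print(len(executions))  # 1 -- the body ran only once

# ...but when the transformer object itself is the argument (as in
# _fit_transform_one), there is no way to ignore transformer.verbose
# inside that argument.
```

This is exactly the gap: the pipeline passes the transformer as an argument, so its verbose attribute is always hashed.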
Describe alternatives you've considered, if relevant
No response
Additional context
Similar to #23788, but presenting a new use case which, as far as I know, can't be resolved at the moment.
Indeed, we should address this issue. I don't know yet what the best option is here. Maybe creating a list of parameters to ignore? Maybe using the tag mechanism?
Probably the easiest solution, but how would you tell Memory.cache to ignore those parameters within the transformer? They are part of the transformer's hash.
Maybe using the tag mechanism
Better solution imo, especially since it would allow the creation of custom transformers with custom ignored parameters. But it still presents the issue of how to pass this information to the hashing mechanism.
Maybe a good way for this to work would be for each estimator to be able to declare which of its constructor arguments should be ignored. I think this goes beyond what the ignore= argument of joblib's Memory provides, because that only concerns itself with the arguments of the function it is caching, while we want to ignore "arguments to an argument of the function". Maybe cache_validation_callback= can help us out, because if not we might be out of luck :-/
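One way the "declare which constructor arguments to ignore" idea could be prototyped is to hash a clone of the estimator with the ignored parameters reset to a fixed value. This is a hypothetical sketch under our own names (IGNORED_FOR_CACHING, stable_hash are not sklearn or joblib API); in a real implementation the list could come from the tag mechanism:

```python
import joblib
from sklearn.base import clone
from sklearn.cluster import KMeans

# Hypothetical: parameters that never affect the fitted output. In a real
# implementation each estimator could declare this, e.g. via a tag.
IGNORED_FOR_CACHING = ("verbose",)

def stable_hash(estimator):
    """Hash an unfitted estimator, neutralizing ignored parameters."""
    canonical = clone(estimator)
    neutral = {p: None for p in IGNORED_FOR_CACHING
               if p in canonical.get_params()}
    canonical.set_params(**neutral)  # e.g. verbose=None on every clone
    return joblib.hash(canonical)

# Flipping verbose no longer changes the cache key:
print(stable_hash(KMeans(verbose=0)) == stable_hash(KMeans(verbose=1)))  # True
# ...while a parameter that does matter still does:
print(stable_hash(KMeans(n_clusters=3)) == stable_hash(KMeans(n_clusters=5)))  # False
```

The pipeline would then have to use something like stable_hash when building the Memory cache key instead of hashing the transformer directly, which is the part that doesn't fit the current ignore= machinery.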