
Make pipeline cache ignore parameter verbose of transformers #28549

Open

tvdboom opened this issue Feb 28, 2024 · 3 comments

Comments

tvdboom (Contributor) commented Feb 28, 2024

Describe the workflow you want to enable

Introduction

scikit-learn's Pipeline caches the output of the transformers in the pipeline. The caching is keyed on a hash of the arguments to the function _fit_transform_one. Unfortunately, that hash changes whenever any of a transformer's parameters changes, including parameters that don't affect the output, for example the verbose parameter (there could be others, perhaps copy?).

It would be a nice feature if there were a way to tell the pipeline (or if the pipeline could detect automatically) which parameters within the transformers to ignore for caching.

Use case

While developing, I tend to always set a high verbosity to understand what's happening under the hood. Once I am content with the results, I turn the verbosity off. At that point the results are already calculated and cached, but they are recalculated anyway because of the parameter change.
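A minimal sketch reproducing the cache miss (using RFE, one of the affected transformers listed below, with a temporary cache directory):

from tempfile import mkdtemp

from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

X, y = make_classification(random_state=0)

pipe = Pipeline(
    [("rfe", RFE(LogisticRegression(), verbose=1)), ("clf", LogisticRegression())],
    memory=mkdtemp(),
)
pipe.fit(X, y)  # RFE is fitted and its output cached

# Only verbose changed, so the output would be identical, but the
# transformer's hash differs and the cached result is not reused.
pipe.set_params(rfe__verbose=0)
pipe.fit(X, y)  # refits RFE from scratch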

Examples of affected transformers

  • ColumnTransformer
  • RFE
  • RFECV
  • SparsePCA
  • IterativeImputer

Describe your proposed solution

The pipeline's caching is performed here:

fit_transform_one_cached = memory.cache(_fit_transform_one)

It uses joblib's Memory.cache, which accepts an ignore parameter to exclude whole arguments from the hashing, but you can't ignore parameters within one of the arguments (the first argument to _fit_transform_one is the transformer, and we would like to ignore verbose inside that first argument).
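To illustrate the limitation (a sketch with a simplified signature, not scikit-learn's actual one):

from joblib import Memory

memory = Memory("./cache", verbose=0)

def _fit_transform_one(transformer, X, y, message):
    return transformer.fit_transform(X, y), transformer

# ignore= can drop a whole argument from the hash, e.g. the message...
cached = memory.cache(_fit_transform_one, ignore=["message"])

# ...but there is no way to say "hash transformer, except its verbose
# attribute": the transformer object is pickled and hashed as a whole.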

I couldn't find any trivial solution that doesn't involve monkey-patching joblib's code or programmatically changing the verbose parameter on the transformers themselves (which would lead to unexpected results for the user, for example missing output messages). Happy to open a PR if anyone can think of a solution.
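One direction I could imagine (a rough sketch with hypothetical names, not a worked-out proposal): have joblib ignore the transformer argument entirely, and hash an explicitly constructed key that drops the uninformative parameters.

import joblib

memory = joblib.Memory("./cache", verbose=0)

def _fit_transform_one(transformer, cache_key, X, y):
    # cache_key exists only to be hashed; transformer itself is ignored.
    return transformer.fit_transform(X, y), transformer

cached = memory.cache(_fit_transform_one, ignore=["transformer"])

def fit_transform_one_cached(transformer, X, y, ignored=("verbose",)):
    # Hypothetical wrapper: build the hash key from the transformer's
    # class and parameters, minus those that don't affect the output.
    params = {
        k: v
        for k, v in transformer.get_params(deep=True).items()
        if k.split("__")[-1] not in ignored
    }
    key = (type(transformer).__qualname__, sorted(params.items()))
    return cached(transformer, key, X, y)

This only shows that the ignoring has to happen when the key is built; a real fix would have to live inside Pipeline itself.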

Describe alternatives you've considered, if relevant

No response

Additional context

Similar to #23788, but presenting a new use case which, as far as I know, can't be resolved currently.

tvdboom added the Needs Triage and New Feature labels on Feb 28, 2024
glemaitre added the Enhancement label and removed the New Feature and Needs Triage labels on Mar 11, 2024
glemaitre (Member) commented

Indeed, we should address this issue. I don't yet know what the best option is here; maybe creating a list of parameters to ignore? Maybe using the tag mechanism?

tvdboom (Contributor, Author) commented Mar 21, 2024

> maybe creating a list of parameters to ignore

Probably the easiest solution, but how would you tell Memory.cache to ignore those parameters within the transformer? They are part of the transformer's hash.

> Maybe using the tag mechanism?

The better solution IMO, especially since it would allow the creation of custom transformers with their own ignored parameters. But it still leaves the issue of how to pass this information to the hashing mechanism.
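For illustration, a hypothetical tag-based declaration (the tag name non_informative_params is made up, not an existing scikit-learn tag):

from sklearn.base import BaseEstimator, TransformerMixin

class MyTransformer(TransformerMixin, BaseEstimator):
    def __init__(self, alpha=1.0, verbose=0):
        self.alpha = alpha
        self.verbose = verbose

    def _more_tags(self):
        # Hypothetical tag listing parameters that don't influence the
        # output, so Pipeline could exclude them from the cache hash.
        return {"non_informative_params": ["verbose"]}

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X * self.alpha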

betatim (Member) commented Mar 21, 2024

Maybe a good way for this to work would be for each estimator to be able to declare which of its constructor arguments should be ignored. I think this goes beyond what the ignore= argument of joblib's Memory provides, because that only concerns the arguments of the function being cached, while we want to ignore "arguments of an argument of the function". Maybe cache_validation_callback= can help us out, because if not we might be out of luck :-/
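For context, a sketch of what cache_validation_callback does (available in joblib >= 1.2). Note that, as far as I understand, the callback is only consulted when an entry already exists under the computed hash, so it may not get around the hashing problem:

from joblib import Memory

memory = Memory("./cache", verbose=0)

def validate_entry(metadata):
    # metadata describes an existing cache entry; return True to reuse
    # it, False to recompute. "duration" is the original compute time.
    return metadata["duration"] > 0.1  # only reuse slow-to-compute results

@memory.cache(cache_validation_callback=validate_entry)
def expensive(x):
    return x ** 2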
