Cache final transformer in pipeline with memory setting #23112
The original use case for caching is a pipeline whose final step is a classifier or regressor, where all previous steps are cached. In this issue, the final step is a transformer, so technically it should be cached according to the Pipeline docstring.
If we strictly follow this, then we would have caching behavior that depends on what the final step is, which can be semantically confusing. The alternative solution is to update the docs and say that we only cache the steps before the final one. What do you think @jnothman ?
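As a rough illustration of that point, here is a minimal sketch assuming the current behavior, where only the non-final steps are cached (step names and the cache directory are placeholders):

```python
from joblib import Memory
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

memory = Memory(location="cache_dir", verbose=0)

# Final step is a predictor: both transformers ("scale" and "pca") are cached.
predictor_pipe = Pipeline(
    [("scale", StandardScaler()), ("pca", PCA()), ("clf", LogisticRegression())],
    memory=memory,
)

# Final step is a transformer: only "scale" is cached; "pca" is refit every time,
# even though it is a transformer like the cached step before it.
transformer_pipe = Pipeline(
    [("scale", StandardScaler()), ("pca", PCA())],
    memory=memory,
)
```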
We can first update the documentation to make it clear that caching applies up to, but not including, the last step. We can later consider an enhancement that caches all transformers of a pipeline.
In compose.rst and pipeline.py there are three places where pipeline caching is explained. An extra sentence was added noting that, currently, the last step is never cached. In one place it is mentioned that this might change in the future.
Discussion with @glemaitre suggests that if the last step of a pipeline is a transformer, then the fit/transform/fit_transform methods of the Pipeline class should also cache the fit/transform/fit_transform call of the last step.
@glemaitre suggested a robust way of detecting transformers.
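The specific check is not quoted here; purely as an assumption for illustration (not necessarily what @glemaitre proposed), a common duck-typing check looks like this:

```python
def _is_transformer(estimator):
    # Hypothetical helper: treat anything exposing fit and transform as a transformer.
    return hasattr(estimator, "fit") and hasattr(estimator, "transform")
```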
I just realized that caching is currently also supported by Pipeline.fit_predict but not by Pipeline.transform. So I will prepare the first solution for fit and fit_transform only.
This draft PR illustrates the idea before test code is implemented: https://github.com/scikit-learn/scikit-learn/pull/26008/files
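In spirit (this is a sketch, not the actual PR code), the idea is to route the last step's fit_transform through the pipeline's joblib Memory, so that a cache hit restores both the fitted transformer and its output. Helper names below are hypothetical:

```python
from joblib import Memory

def _fit_transform_last_step(transformer, X, y, **fit_params):
    # Hypothetical helper: fit the final transformer and return it together with
    # its output, so that a cache hit also restores the fitted state.
    Xt = transformer.fit_transform(X, y, **fit_params)
    return transformer, Xt

memory = Memory(location="pipeline_cache", verbose=0)
cached_last_step = memory.cache(_fit_transform_last_step)

# Inside Pipeline.fit_transform one would then call, roughly:
#     fitted_last, Xt = cached_last_step(last_step, Xt_previous, y)
```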
The main problem with caching the last transformer step in …
In case you say that we are generally on the right track here, I would add test cases to the draft PR.
Describe the bug
When setting the memory parameter of a transformer Pipeline (i.e., one whose last step is a transformer), the final transformer is not cached. Discovered at https://stackoverflow.com/q/71812869/10495893.
Steps/Code to Reproduce
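The original reproducer is not preserved in this extract; below is a minimal sketch based on the description above (the transformer choice, data, and cache directory are placeholders):

```python
import joblib
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# Pipeline whose last step is a transformer, with step caching enabled.
pipe = Pipeline(
    [("scale", StandardScaler()), ("pca", PCA(n_components=5))],
    memory=joblib.Memory(location="cache_dir", verbose=1),
)

pipe.fit(X, y)
pipe.fit(X, y)  # second fit: "scale" is loaded from the cache, but "pca" is refit
```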
Expected Results
Actual Results
Versions