MAINT reduce scope of test_linear_models_cv_fit_for_all_backends to reduce CI usage #21918
Conversation
I left a comment about a possible future API for joblib.
Overall this PR LGTM.
# Unfortunately the scikit-learn and joblib APIs do not make it possible to
# change the max_nbytes of the inner Parallel call.
Do you think there should be an API for changing max_nbytes in the inner Parallel call? Something like:
with parallel_backend("loky", max_nbytes=1000):
    results = Parallel(n_jobs=4)(delayed(func)(x, y) for x, y in data)
Yes, that would be nice, but unfortunately this conflicts with the current joblib backend API design, and dealing with backward compatibility is...
If it is hard for joblib API-wise, what do you think about adding a parallel_kwargs parameter to estimators that create a Parallel object? (I know this is a little counter to how we have been removing pre_dispatch from the estimator's __init__.)
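For example, a minimal sketch of that idea, assuming a hypothetical parallel_kwargs constructor parameter and a toy DemoParallelEstimator (neither is part of the scikit-learn API):

```python
# Hypothetical illustration only: forwarding user-supplied keyword arguments
# (e.g. max_nbytes) from the estimator to its inner joblib.Parallel call.
from joblib import Parallel, delayed

from sklearn.base import BaseEstimator


class DemoParallelEstimator(BaseEstimator):
    def __init__(self, n_jobs=None, parallel_kwargs=None):
        self.n_jobs = n_jobs
        self.parallel_kwargs = parallel_kwargs

    def fit(self, X, y):
        parallel_kwargs = self.parallel_kwargs or {}
        # parallel_kwargs={"max_nbytes": 1000} would lower the auto-memmapping
        # threshold of this specific Parallel call.
        self.results_ = Parallel(n_jobs=self.n_jobs, **parallel_kwargs)(
            delayed(self._fit_one)(X, y, seed) for seed in range(4)
        )
        return self

    def _fit_one(self, X, y, seed):
        # Placeholder for the per-task work a real estimator would do.
        return seed
```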
We should probably find a way to change parallel_backend to detect Parallel kwargs and treat them specially instead of passing them as constructor arguments for the backend.
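For illustration, a rough sketch of that idea, assuming a hypothetical split_kwargs helper (nothing like it exists in joblib today):

```python
# Hypothetical illustration only: how parallel_backend-style code could
# separate Parallel-level kwargs from kwargs intended for the backend
# constructor.
PARALLEL_LEVEL_KWARGS = {"max_nbytes", "mmap_mode", "pre_dispatch", "batch_size"}


def split_kwargs(**kwargs):
    """Return (parallel_kwargs, backend_kwargs) from a mixed kwargs dict."""
    parallel_kwargs = {k: v for k, v in kwargs.items() if k in PARALLEL_LEVEL_KWARGS}
    backend_kwargs = {k: v for k, v in kwargs.items() if k not in PARALLEL_LEVEL_KWARGS}
    return parallel_kwargs, backend_kwargs


# max_nbytes would be forwarded to the inner Parallel call instead of being
# passed as a backend constructor argument.
parallel_kwargs, backend_kwargs = split_kwargs(max_nbytes=1000, inner_max_num_threads=1)
```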
@jeremiedbb you might want to give this PR a second review.
This test is a non-regression test for a fix related to the loky backend. By default these estimators use the threading backend, which has always worked. We need to keep testing with the "loky" backend.
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Indeed, good catch, I had not realized. Let's see if the CI stays green.
LGTM. Do you know how much time it saves?
When running
Alternative to #21907.
Towards: #21407.
Do not test with the threading backend, which was not involved in the original problem.
Do not test the multi-task variants, which are much more computationally intensive. LassoCV and ElasticNetCV should be enough to serve as a non-regression test for the original problem.
Use a minimal dataset with the fewest number of features needed to trigger memmapping in joblib.
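For illustration, a minimal sketch of such a reduced non-regression test, assuming hypothetical dataset sizes and CV parameters (the actual test_linear_models_cv_fit_for_all_backends differs):

```python
# A minimal sketch: fit the single-task CV estimators under the loky backend
# with a dataset just large enough to trigger joblib's auto-memmapping.
import joblib
import numpy as np
import pytest

from sklearn.linear_model import ElasticNetCV, LassoCV


@pytest.mark.parametrize("Estimator", [LassoCV, ElasticNetCV])
def test_cv_fit_with_loky_backend(Estimator):
    rng = np.random.RandomState(0)
    # ~1.6 MB of float64 data: above joblib's default 1 MB max_nbytes
    # threshold, so X is memmapped when dispatched to the loky workers.
    X = rng.rand(1000, 200)
    y = rng.rand(1000)
    with joblib.parallel_backend("loky", n_jobs=2):
        Estimator(n_alphas=5, cv=3).fit(X, y)
```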