[MRG] Fix LinearModelsCV for loky backend. #14264
Conversation
should we backport this to the 0.20.X branch?
Do you understand why this doesn't fail in the common tests, in check_classifiers_train(.., readonly_memmap=True)?
It would be preferable if it were covered by the common tests somehow, because the fact that this was possible means the same issue may be present in other models.
It does not occur at the same moment. I'm not sure we want to run these tests for all estimators and all backends, because we need a sufficiently large X to trigger memmapping, which would make the tests run significantly longer.
prefer='threading' is only used in random forest and iforest. Otherwise the default backend is loky, so this kind of bug should have been caught by many users by now. Pierre has run several benchmarks with many problem sizes for all sklearn estimators, and the LinearModelsCV estimators are the only ones he caught.
This is not the first time we have run into a bug due to such size-dependent behavior. In a test, could we force or monkeypatch joblib to always memmap, and run this test on a very small dataset?
yes, through the
Actually, this is an attribute of
The solution would be to make
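One way to get the read-only input condition on a tiny dataset, without relying on joblib's ~1MB auto-memmapping threshold, is to hand the estimator a read-only memmap directly. This is only a sketch of the idea, not the actual common-test machinery; the estimator choice and sizes are my assumptions:

```python
import os
import tempfile

import numpy as np

from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNetCV

# Small dataset: far below joblib's auto-memmapping threshold.
X, y = make_regression(n_samples=50, n_features=5, random_state=0)

# Persist X and reopen it as a read-only memmap, mimicking what the loky
# backend does automatically for large inputs.
path = os.path.join(tempfile.mkdtemp(), "X.npy")
np.save(path, X)
X_readonly = np.load(path, mmap_mode="r")

# Any inplace modification of X inside the CV loop would now fail with
# "ValueError: assignment destination is read-only", even on tiny data.
ElasticNetCV(cv=3, n_jobs=2).fit(X_readonly, y)
```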
I don't understand the coverage decrease...
```diff
-            **_joblib_parallel_args(prefer="threads"))(jobs)
+        mse_paths = Parallel(
+            n_jobs=self.n_jobs, verbose=self.verbose,
+            **_joblib_parallel_args(require="sharedmem"))(jobs)
```
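For context, a rough sketch of the difference between these two joblib hints, using only the public joblib API (this is an illustration, not scikit-learn code):

```python
from joblib import Parallel, delayed, parallel_backend

# prefer="threads" is only a soft hint: an outer parallel_backend() context
# (e.g. loky) can still override it, so workers may receive memmapped,
# read-only copies of large inputs.
with parallel_backend("loky"):
    Parallel(n_jobs=2, prefer="threads")(delayed(abs)(i) for i in range(4))

# require="sharedmem" is a hard constraint: joblib falls back to the
# threading backend so all workers share the same in-process arrays.
with parallel_backend("loky"):
    Parallel(n_jobs=2, require="sharedmem")(delayed(abs)(i) for i in range(4))
```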
Wait, but this will make sure threading is used and will allow different threads to change the shared X object.
Is this really what we want? I would have thought we wanted to avoid inplace operations in _path_residuals (e.g. by triggering a copy)?
I don't understand. prefer='threads' was already doing that, unless the user specifies another backend, right?
@jeremiedbb So were different threads making inplace changes to different chunks of the input?
_path_residuals indexes X and y with the train and test index lists, so copies are always made (fancy indexing with a list returns a copy).
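For reference, a tiny standalone illustration of that copy behaviour (the variable names are illustrative, not the actual _path_residuals code):

```python
import numpy as np

X = np.arange(12.0).reshape(4, 3)
train = [0, 2]  # an index list, as produced by the CV splitters

X_train = X[train]     # advanced ("fancy") indexing returns a copy, not a view
X_train[0, 0] = -1.0   # modifying the copy...
assert X[0, 0] == 0.0  # ...leaves the original array untouched
```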
If copies are made, why does it then fail when trying to modify read-only data? Or is the read-only property preserved by copies?
OK, so if X is a read-only memmap, fancy indexing creates a read-only copy. I think it's a bug and I opened numpy/numpy#14132. So I reverted the
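A small sketch of the behaviour reported in numpy/numpy#14132; the exact output depends on the numpy version, so treat it as an illustration of the discussion above rather than a guaranteed result:

```python
import os
import tempfile

import numpy as np

path = os.path.join(tempfile.mkdtemp(), "x.npy")
np.save(path, np.arange(6.0))
x = np.load(path, mmap_mode="r")  # a read-only memmap

sub = x[[0, 2, 4]]  # fancy indexing returns a copy of the data
# At the time of this discussion the copy was itself read-only, which is what
# numpy/numpy#14132 reports; later numpy releases may behave differently.
print(type(sub), sub.flags.writeable)
```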
It brings up another issue. The following fails:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNetCV

X, y = make_regression(20000, 10)
X.setflags(write=False)
ElasticNetCV(copy_X=False, cv=3).fit(X, y)
```

We should force a copy when X is read-only. I'll do that in a separate PR (this is not caught by the common tests because the default is copy_X=True).
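A hedged sketch of what "force a copy when X is read-only" could look like; the helper name is hypothetical and this is not the code from the follow-up PR:

```python
import numpy as np


def _ensure_writeable_X(X, copy_X):
    """Hypothetical helper: copy X if requested, or if X is read-only.

    Mutating a read-only array raises, so a copy is forced even when the
    caller asked for copy_X=False.
    """
    if copy_X or not X.flags.writeable:
        return np.array(X)  # np.array copies by default
    return X
```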
LGTM, thanks!
LGTM as well. Let me have a look at the conflicts.
The two failing tests are 60 timeouts when running the multiprocessing test under
I can reproduce this on my desktop and on the CI with pytest-xdist turned off. The other two backends work.
@jeremiedbb Could you please resolve conflicts?
Done. I left the debugging stuff in for now. Last time we tried, CI was timing out for the multiprocessing backend.
So the reason for the hang is a bad interaction between MKL (especially Intel OpenMP) and Python multiprocessing. Below is a reproducible example:

```python
import numpy as np
from multiprocessing import Pool

x = np.ones(10000)
x @ x


def f(i):
    a = np.ones(10000)
    a @ a


pool = Pool(2)
pool.map(f, [1, 2])
```

It's kind of related to numpy/numpy#10060, numpy/numpy#10993 and numpy/numpy#11734, but might be a bit different. First, it's about different versions of libiomp (I tested with 2020.0 and 2019.4) and the workaround

@oleksandr-pavlyk do you have any idea?
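As a side note (my assumption, not something stated in this thread): the hang pattern above is typical of fork()-ing a process after the OpenMP/MKL runtime has been initialised in the parent, so switching to a non-fork start method usually avoids it. A sketch:

```python
import multiprocessing as mp

import numpy as np


def f(i):
    a = np.ones(10000)
    return a @ a


if __name__ == "__main__":
    x = np.ones(10000)
    x @ x  # initialise the BLAS/OpenMP runtime in the parent process

    # "spawn" (or "forkserver") starts fresh interpreters instead of forking,
    # so the children do not inherit half-initialised OpenMP state.
    ctx = mp.get_context("spawn")
    with ctx.Pool(2) as pool:
        print(pool.map(f, [1, 2]))
```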
conftest.py (outdated):

```python
def pytest_runtest_teardown(item, nextitem):
    if isinstance(item, DoctestItem):
        set_config(print_changed_only=False)
    print(" {:.3f}s ".format(time() - tic), end="")  # timing debug output
```
Please remove debugging lines
build_tools/azure/test_script.sh (outdated):

```diff
@@ -21,7 +21,7 @@ except ImportError:
 python -c "import multiprocessing as mp; print('%d CPUs' % mp.cpu_count())"
 pip list

-TEST_CMD="python -m pytest --showlocals --durations=20 --junitxml=$JUNITXML"
+TEST_CMD="python -m pytest -v --showlocals --durations=20 --junitxml=$JUNITXML"
```
Maybe remove this.
This could also go in 0.23.1 as a fix if it doesn't get merged for the release.
yes, let's try to merge this quickly. @jeremiedbb can you have a look? thanks
I resolved the conflicts. But there's an unresolved issue with the multiprocessing backend. It seems to be a bad interaction between Python multiprocessing and MKL, see this comment. I suggest we forget about the multiprocessing backend for now, since it's not something we can fix on our side, and consider this PR a fix for the loky backend only, so we can merge it quickly.
The coverage issue is because the only job where we have joblib < 0.12 does not have coverage enabled. Should we care?
In this case it's the
Have you tried setting
Edit: In fact it seems random with
Should we be merging this? Are we able to protect it from that MKL-multiprocessing interaction (I've not looked into it)?
It's ready to merge.
No, but it's still an improvement over master. Currently it only works with the threading backend. After this it will also work with the loky backend (but not with the multiprocessing backend).
Fixes #14249
The LinearModelsCV estimators perform inplace operations on the input, which can cause an error with the loky or multiprocessing backends when the input is large enough (1MB) to trigger joblib's memmapping.
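For context, a hedged reproducer sketch based on the description above (the estimator, sizes, and backend choice are my assumptions, not copied from #14249):

```python
import numpy as np
from joblib import parallel_backend
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV

# Large enough that joblib auto-memmaps X (> 1MB) when shipping it to workers.
X, y = make_regression(n_samples=20000, n_features=50, random_state=0)

with parallel_backend("loky"):
    # Before this fix, inplace operations on the now read-only, memmapped X
    # inside the workers raised "ValueError: assignment destination is read-only".
    LassoCV(cv=3, n_jobs=2).fit(X, y)
```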