BaseEstimator.get_params and clone are not thread safe #2755
Comments
What's the status on this? |
It is still not addressed, although a solution based on a class decorator to manage deprecated constructor params declaratively should be possible. |
So sorry for the spam, but I just want to +1 this issue. My app calls sklearn (specifically, |
This just bit me as well when parallelizing scikit-learn with dask. I was able to get around it by not explicitly calling |
Yes. And I'm not altogether sure how a neat implementation is possible with the class decoration suggested by @ogrisel. |
I am sure I had a clear plan in my head at the time but I forgot... BTW could it be the case that this is actually the cause of #8410? |
I don't think so. That issue complains about incorrect results, whereas the issue here is that the
  File "C:\Program Files\ilastik-1.1.6\python\lib\site-packages\sklearn\ensemble\forest.py", line 302, in fit
    for i in range(n_jobs))
  File "C:\Program Files\ilastik-1.1.6\python\lib\site-packages\sklearn\externals\joblib\parallel.py", line 517, in __call__
    self.dispatch(function, args, kwargs)
  File "C:\Program Files\ilastik-1.1.6\python\lib\site-packages\sklearn\externals\joblib\parallel.py", line 312, in dispatch
    job = ImmediateApply(func, args, kwargs)
  File "C:\Program Files\ilastik-1.1.6\python\lib\site-packages\sklearn\externals\joblib\parallel.py", line 136, in __init__
    self.results = func(*args, **kwargs)
  File "C:\Program Files\ilastik-1.1.6\python\lib\site-packages\sklearn\ensemble\forest.py", line 82, in _parallel_build_trees
    tree = forest._make_estimator(append=False)
  File "C:\Program Files\ilastik-1.1.6\python\lib\site-packages\sklearn\ensemble\base.py", line 57, in _make_estimator
    estimator = clone(self.base_estimator)
  File "C:\Program Files\ilastik-1.1.6\python\lib\site-packages\sklearn\base.py", line 49, in clone
    params_set = new_object.get_params(deep=False)
  File "C:\Program Files\ilastik-1.1.6\python\lib\site-packages\sklearn\base.py", line 218, in get_params
    warnings.filters.pop(0)
IndexError: pop from empty list
|
@stuarteberg it would be great if you can post a stand-alone snippet reproducing the problem leading to your traceback. |
For reasons I don't understand, the only users of my software to have seen this are Windows users. I don't have a windows machine, so it's not easy for me to generate a reduced test case. IIUC, the problem should be reproducible by simply calling |
I run into this issue about half the time I run a particular script:
|
This reproduces it on (most) calls:

from multiprocessing.pool import ThreadPool
from sklearn.linear_model import LogisticRegression

def f(x):
    for i in range(10):
        x.get_params()

lr = LogisticRegression()
p = ThreadPool(4)
p.map(f, [lr] * 1000)
|
Thanks for the stand-alone snippet @jcrist. I can reproduce the problem indeed. |
Is it worth using locks as a fix for now?? |
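For illustration only, here is a minimal sketch of what a lock-based stopgap could look like. It uses only the standard library. `call_with_warnings_lock`, `_warnings_lock`, `noisy` and `run_demo` are hypothetical names, not scikit-learn API, and `noisy` merely stands in for a `get_params`-like function that enters `warnings.catch_warnings`.

```python
import threading
import warnings
from multiprocessing.pool import ThreadPool

# Hypothetical module-level lock; nothing like this is claimed to exist in scikit-learn.
_warnings_lock = threading.RLock()


def call_with_warnings_lock(func, *args, **kwargs):
    """Run func while holding a global lock, so that two threads going
    through this helper never mutate warnings.filters at the same time."""
    with _warnings_lock:
        return func(*args, **kwargs)


def noisy():
    # Stand-in for a get_params-like call that touches global warnings state.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always", DeprecationWarning)
        warnings.warn("blah", DeprecationWarning)
        return len(caught)


def run_demo(n_tasks=200, n_threads=4):
    # Every task goes through the lock, so each one records exactly its own warning.
    with ThreadPool(n_threads) as pool:
        return pool.map(lambda _: call_with_warnings_lock(noisy), range(n_tasks))
```

Note the limitation: the lock only serializes callers that go through the helper. Any other thread that uses the warnings machinery directly can still race with them, so this is a stopgap rather than a fix.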
Hi, |
FWIW, even without this

import time
import io
import warnings
import pprint
import difflib
import sys
from joblib import Parallel, delayed

def do_stuff():
    with warnings.catch_warnings(record=True):
        warnings.simplefilter('always', DeprecationWarning)
        warnings.warn('blah', DeprecationWarning)
        time.sleep(0.01)

def diag(file):
    pprint.pprint({k: v for k, v in warnings.__dict__.items()
                   if not k.startswith('__') or not k.endswith('__')},
                  stream=file)

before = io.StringIO()
diag(before)
Parallel(n_jobs=-1, backend='threading')(delayed(do_stuff)()
                                         for i in range(10000))
after = io.StringIO()
diag(after)
before.seek(0)
after.seek(0)
sys.stdout.writelines(difflib.unified_diff(before.readlines(),
                                           after.readlines()))
assert before.getvalue() == after.getvalue()

Output:
The first difference (in |
What are the risks of locking (@GaelVaroquaux)? |
See my comments here (#7346). This code is not used in scikit-learn. I think we should get rid of it. If we want to keep it, we can find ways to detect deprecations that don't rely on warnings. |
yes I think we've both made similar comments. We can just change it. But it won't fix up thread safety issues.
|
ohhh yeah, 3.5 year old bug! |
:) basically solved by declaring the old code didn't know its head from its tail...
|
The handling of deprecated constructor parameters leverages the Python warnings machinery, which is not thread-safe and could therefore cause hard-to-diagnose bugs in
There was a tentative lock-based workaround in #2729. A better solution, however, would avoid executing the warnings machinery at all in get_params by leveraging declarative deprecation introspection, possibly using a class decorator. I will try to issue a PR for this next week.
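As a rough sketch of the declarative idea (not the implementation that was eventually merged, and not scikit-learn code), a class decorator could record deprecated constructor parameters as plain class metadata, so that a `get_params`-like method can skip them without touching the warnings module at all. `deprecated_params`, `SketchEstimator`, `Toy` and the `_deprecated_params` attribute are all hypothetical names invented for this example.

```python
import inspect


def deprecated_params(**deprecated):
    """Hypothetical class decorator: store deprecated constructor parameter
    names (mapped to a short message) as class-level metadata."""
    def decorate(cls):
        cls._deprecated_params = dict(deprecated)
        return cls
    return decorate


class SketchEstimator:
    """Toy stand-in for BaseEstimator; an assumption for illustration only."""

    _deprecated_params = {}

    def get_params(self):
        # Introspect the constructor signature and drop anything declared
        # deprecated. No global state is read or mutated, so concurrent
        # calls from several threads do not interfere with each other.
        names = [
            name
            for name in inspect.signature(self.__init__).parameters
            if name not in self._deprecated_params
        ]
        return {name: getattr(self, name) for name in names}


@deprecated_params(old_alpha="use 'alpha' instead")
class Toy(SketchEstimator):
    def __init__(self, alpha=1.0, old_alpha=None):
        self.alpha = alpha
        self.old_alpha = old_alpha
```

With this layout, `Toy(alpha=2.0).get_params()` only reports `alpha`. The deprecation message stays available on the class for whoever wants to emit a warning at construction time, outside of `get_params`.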