[MRG] add a new test: test_reusable_executor_thread_safety #116
|
It seems to cause random crashes, for instance: https://travis-ci.org/tomMoral/loky/jobs/355411973#L785 A similar problem shows up in https://travis-ci.org/tomMoral/loky/jobs/355411974#L802 and under Windows: https://ci.appveyor.com/project/tomMoral/loky/build/1.0.851/job/7yx6x33y1fyaki8c#L567 |
|
I just pushed more parametrizations for the test: the initial state can be either a broken executor or not and the threads can ask for varying numbers of workers or not, to shake it more. With this, I could reproduce some of the random failures on my local workstations by running this test in isolation. |
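The shape of such a stress test can be sketched with the standard library alone. This is only an illustration, not the test added in this PR: `make_executor`, `hammer` and `run_matrix` are hypothetical names, a plain `ThreadPoolExecutor` stands in for the reusable executor, and the "initially broken executor" axis is left out because it depends on loky internals.

```python
import itertools
import threading
from concurrent.futures import ThreadPoolExecutor


def make_executor(max_workers):
    # Stand-in factory (assumption): a fresh thread pool per call,
    # NOT loky's reusable executor.
    return ThreadPoolExecutor(max_workers=max_workers)


def hammer(factory, n_threads, worker_counts, n_rounds=5):
    """Let n_threads threads request executors concurrently.

    Returns the list of exceptions raised inside the threads.
    """
    errors = []
    barrier = threading.Barrier(n_threads)

    def worker(idx):
        try:
            barrier.wait()  # make all threads start at the same time
            for round_idx in range(n_rounds):
                # Rotate through the requested worker counts.
                n_workers = worker_counts[(idx + round_idx) % len(worker_counts)]
                executor = factory(n_workers)
                try:
                    result = executor.submit(pow, 2, idx).result(timeout=10)
                    assert result == 2 ** idx
                finally:
                    executor.shutdown()
        except Exception as exc:  # collect failures instead of losing them
            errors.append(exc)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors


def run_matrix():
    # Plain-Python equivalent of a parametrized test: every combination of
    # "varying worker counts or not" and number of threads.
    results = {}
    for vary_workers, n_threads in itertools.product([False, True], [2, 4]):
        counts = [1, 2, 3] if vary_workers else [2]
        results[(vary_workers, n_threads)] = hammer(make_executor, n_threads, counts)
    return results
```

Collecting exceptions in a list matters here: an exception raised in a non-main thread would otherwise not fail the test.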
|
Some of the previous commit failures were caused by a Docker timeout. I hope that my last commit will avoid such false positives. |
tomMoral
left a comment
LGTM for the implementation.
My only concern is that we do not guarantee the user the number of processes they ask for, and this can happen silently. But I am not sure how we could detect that different threads are simultaneously asking for different numbers of processes.
| if self._queue_management_thread_wakeup: | ||
| self._queue_management_thread_wakeup.close() | ||
|
|
||
| if qmtw: |
To reduce the chances of an OSError as much as possible, we could assign qmtw = self._queue_management_thread_wakeup just before this if statement.
| # Wake up queue management thread | ||
| self._queue_management_thread_wakeup.wakeup() | ||
| if qmtw is not None: | ||
| qmtw.wakeup() |
This can fail with an OSError if it has been closed concurrently, so adding a try/except is probably safer.
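The two suggestions above (read the attribute into a local variable once, and tolerate a concurrent close) can be sketched together as follows. `Wakeup` and `Executor` are hypothetical stand-ins written for this illustration, not loky's classes, and the pipe-based wakeup is an assumption.

```python
import os


class Wakeup:
    """Hypothetical stand-in for the wakeup object, backed by a pipe."""

    def __init__(self):
        self._reader, self._writer = os.pipe()
        self._closed = False

    def wakeup(self):
        # Mimic a write on a closed handle, which raises OSError.
        if self._closed:
            raise OSError("wakeup pipe is closed")
        os.write(self._writer, b"\0")

    def close(self):
        if not self._closed:
            self._closed = True
            os.close(self._writer)
            os.close(self._reader)


class Executor:
    """Hypothetical owner of the wakeup object."""

    def __init__(self):
        self._wakeup = Wakeup()

    def notify(self):
        # Read the attribute once: another thread may set it to None
        # between the check and the call.
        qmtw = self._wakeup
        if qmtw is None:
            return False
        try:
            qmtw.wakeup()
        except OSError:
            # Closed concurrently by another thread: nothing left to wake up.
            return False
        return True

    def shutdown(self):
        # Detach the attribute first so later callers see None.
        qmtw, self._wakeup = self._wakeup, None
        if qmtw is not None:
            qmtw.close()
```

The local variable removes the `None` race, while the `try`/`except` covers the remaining window in which the object is closed after it has been read.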
| ### 2.1.0 (in developement) | ||
|
|
||
| - Fixed a thread-safety issue when iterating over the dict of processes in | ||
| `ReusablePoolExecutor._resize`. |
Maybe update this to:
Fixed thread-safety issues for ReusablePoolExecutor._resize and ReusablePoolExecutor.shutdown
| passenv = NUMBER_OF_PROCESSORS | ||
| deps = | ||
| pytest | ||
| pytest==3.4.2 |
pytest 3.4.2 does not support Python 3.3...
I will remove the Python 3.3 entry in travis :)
|
The Travis tests now pass but we still get a deadlock on Windows: https://ci.appveyor.com/project/tomMoral/loky/build/1.0.860/job/d2xqvkpc0s5wuogi It's weird: it looks like we have 2 reusable executor instances running, with 2 QM threads and 2 QF threads. The factory is probably not thread-safe. |
|
@tomMoral Ok I have put a more global reentrant lock that protects:
This way we can never have 2 instances of the |
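A global reentrant lock around an executor factory, in the spirit of this comment, could look roughly like the sketch below. All names (`get_shared_executor`, `FakeExecutor`, `_executor_lock`) are hypothetical and chosen for illustration; this is not loky's implementation.

```python
import threading

# One module-level reentrant lock guards every read and write of the
# shared executor reference (hypothetical names throughout).
_executor_lock = threading.RLock()
_executor = None


class FakeExecutor:
    """Stand-in executor that counts how many instances are alive."""

    live = 0
    max_live = 0

    def __init__(self, max_workers):
        self.max_workers = max_workers
        FakeExecutor.live += 1
        FakeExecutor.max_live = max(FakeExecutor.max_live, FakeExecutor.live)

    def shutdown(self):
        FakeExecutor.live -= 1


def get_shared_executor(max_workers=2):
    global _executor
    with _executor_lock:
        if _executor is None:
            _executor = FakeExecutor(max_workers)
        elif _executor.max_workers != max_workers:
            # Arguments changed: tear down the old instance and build a new
            # one. The recursive call re-acquires the lock held by this
            # thread, which is why an RLock is used instead of a Lock.
            _executor.shutdown()
            _executor = None
            return get_shared_executor(max_workers)
        return _executor
```

Because creation and shutdown both happen while the lock is held, at most one `FakeExecutor` is alive at any time, even when threads ask for different worker counts.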
|
The tests pass. The code coverage has decreased a bit because we need to clean up the Python 3.3-specific hacks, but I would rather do that in a different PR. |
|
I have run the joblib tests on my machine several times against this loky branch and everything is fine. @tomMoral feel free to merge and make a release if you agree with this PR. |
|
Let's merge so the cron job can shake this one a bit more. |
Added a new test with several threads concurrently using the same reusable executor. Found an issue, added a fix.
Let's see if this test can find new bugs on the CI.