[MRG] actually test for 64 bit Python on appveyor #87

Merged
merged 6 commits into from Aug 7, 2017


@ogrisel ogrisel commented Aug 4, 2017

This PR also fixes a copy-pasted comment in appveyor.yml.

Since we use tox, the Python version is not important; only the platform is. However, we don't want to give the impression that legacy Python 2.7 is a reasonable default nowadays :)

The issue was revealed by the scikit-learn CI tests: scikit-learn/scikit-learn#9486


ogrisel commented Aug 4, 2017

Actually, the last commit reveals that we never ran the tests under 64-bit Windows, although we thought we did.
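For reference, a quick way to confirm which interpreter a CI job actually runs is to check the pointer size from inside Python. This is only a minimal sketch using the standard library; the helper names `python_bitness` and `is_64bit_python` are made up for illustration and are not part of loky or of this PR.

```python
import struct
import sys


def python_bitness():
    """Return the pointer size of the running interpreter, in bits."""
    # struct.calcsize("P") is the size in bytes of a C void* pointer.
    return struct.calcsize("P") * 8


def is_64bit_python():
    """Return True if the running interpreter is a 64-bit build."""
    # sys.maxsize is 2**31 - 1 on 32-bit builds and 2**63 - 1 on 64-bit builds.
    return sys.maxsize > 2**32


if __name__ == "__main__":
    print("Python %s, %d bit" % (sys.version.split()[0], python_bitness()))
```

Printing this at the start of a CI job makes it visible in the build log when a job labelled 64 bit is in fact running a 32-bit interpreter.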

@ogrisel ogrisel changed the title [MRG] do not use legacy Python by default and fix cleanup [WIP] actually test for 64 bit Python on appveyor Aug 4, 2017
@ogrisel ogrisel force-pushed the appveyor-cleanup branch 2 times, most recently from 4ba9970 to 844960e Compare August 4, 2017 21:03

ogrisel commented Aug 4, 2017

@tomMoral I fixed some issues, but apparently not all of them. I have to stop for the weekend. Feel free to push directly to this PR if you wish.


ogrisel commented Aug 5, 2017

I cannot reproduce this on my own Windows VM :(.


ogrisel commented Aug 5, 2017

The relevant part of the error message is the following:

[DEBUG/LokyProcess-210] recreated blocker with handle 16
[DEBUG/LokyProcess-210] recreated blocker with handle 20
[DEBUG/LokyProcess-210] Queue._after_fork()
[DEBUG/LokyProcess-210] recreated blocker with handle 36
[DEBUG/LokyProcess-210] recreated blocker with handle 40
[INFO/LokyProcess-210] child process calling self.run()
[DEBUG/LokyProcess-210] worker started with timeout=0.01
[INFO/LokyProcess-210] shutting down worker after timeout 0.010s
[INFO/LokyProcess-210] process shutting down
[DEBUG/LokyProcess-210] running all "atexit" finalizers with priority >= 0
[DEBUG/LokyProcess-210] running the remaining "atexit" finalizers
[INFO/LokyProcess-210] process exiting with exitcode 0
[DEBUG:MainProcess:MainThread] Adjust process count : {1360: <LokyProcess(LokyProcess-213, started)>, 2580: <LokyProcess(LokyProcess-212, started)>, 2224: <LokyProcess(LokyProcess-210, stopped)>, 2172: <LokyProcess(LokyProcess-211, started)>}
[DEBUG:MainProcess:QueueManager] The executor is broken as at least one process terminated abruptly
-------------------------- Captured stderr teardown ---------------------------
[DEBUG/LokyProcess-211] Using default backend pickle for pickling.
[DEBUG/LokyProcess-211] recreated blocker with handle 16
[DEBUG/LokyProcess-211] recreated blocker with handle 20
[DEBUG/LokyProcess-211] Queue._after_fork()
[DEBUG/LokyProcess-211] recreated blocker with handle 44
[DEBUG/LokyProcess-211] recreated blocker with handle 52
[INFO/LokyProcess-211] child process calling self.run()
[DEBUG/LokyProcess-211] worker started with timeout=0.01
[DEBUG:MainProcess:QueueManager] terminate process LokyProcess-213
[DEBUG:MainProcess:QueueManager] terminate process LokyProcess-212
[DEBUG:MainProcess:QueueManager] terminate process LokyProcess-210
[DEBUG:MainProcess:QueueManager] terminate process LokyProcess-211
[DEBUG:MainProcess:QueueManager] queue management thread shutting down
[DEBUG:MainProcess:QueueManager] closing call_queue
[DEBUG:MainProcess:QueueManager] telling queue thread to quit
[DEBUG:MainProcess:QueueManager] joining processes
[DEBUG:MainProcess:QueueFeederThread] feeder thread got sentinel -- exiting
[DEBUG:MainProcess:QueueManager] queue management thread clean shutdown of worker processes: {}
================================== FAILURES ===================================
______________ TestsProcessPoolLokyExecutor.test_worker_timeout _______________
self = <tests.test_process_executor_loky.TestsProcessPoolLokyExecutor instance at 0x03569C38>
    @pytest.mark.timeout(50 if sys.platform == "win32" else 25)
    def test_worker_timeout(self):
        self.executor.shutdown(wait=True)
        self.check_no_running_workers(patience=5)
        timeout = getattr(self, 'min_worker_timeout', .01)
        try:
            self.executor = self.executor_type(
                max_workers=4, context=self.context, timeout=timeout)
        except NotImplementedError as e:
            self.skipTest(str(e))
    
        for i in range(5):
            # Trigger worker spawn for lazy executor implementations
>           for result in self.executor.map(id, range(8)):
i          = 2
result     = 44988960
self       = <tests.test_process_executor_loky.TestsProcessPoolLokyExecutor instance at 0x03569C38>
timeout    = 0.01
tests\_test_process_executor.py:638: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
loky\_base.py:584: in result_iterator
    yield future.result()
loky\_base.py:431: in result
    return self.__get_result()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <Future at 0x3573370 state=finished raised BrokenExecutor>
    def __get_result(self):
        if self._exception:
>           raise self._exception
E           BrokenExecutor: A process in the process pool was terminated abruptly while the future was running or pending.
self       = <Future at 0x3573370 state=finished raised BrokenExecutor>
loky\_base.py:382: BrokenExecutor

It seems we have a race condition in the QueueManager: it might detect process 211 as started but dead before it actually starts. The startup logs of 211 show up in the teardown phase of the test, after the queue manager has already decided that one process was dead.
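The log above also shows LokyProcess-210 exiting with exitcode 0 after its idle timeout, shortly before the executor is flagged as broken. As a purely hypothetical illustration (this is not loky's actual QueueManager logic, and `classify_exit` is a made-up helper), a health check could use the `exitcode` attribute of a `multiprocessing.Process` to tell a clean exit from an abrupt one:

```python
from typing import Optional


def classify_exit(exitcode: Optional[int]) -> str:
    """Classify a multiprocessing.Process.exitcode value.

    Hypothetical helper for illustration only, not part of loky.
    """
    if exitcode is None:
        # The process has not terminated (or was never started).
        return "not terminated"
    if exitcode == 0:
        # Normal return from run(), e.g. a worker leaving after an idle timeout.
        return "clean exit"
    # Non-zero: an error exit, or a negative value -N when killed by signal N on Unix.
    return "abrupt exit"
```

Whether such a check would avoid the failure reported here is an assumption; it only shows that a stopped worker is not necessarily a crashed one.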


ogrisel commented Aug 7, 2017

CI is green after a rebase on master. The previous failure is probably intermittent and independent of this PR; I reported it as issue #90.

@ogrisel ogrisel merged commit 3afc87f into joblib:master Aug 7, 2017
@ogrisel ogrisel deleted the appveyor-cleanup branch August 7, 2017 11:30
@ogrisel ogrisel changed the title [WIP] actually test for 64 bit Python on appveyor [MRG] actually test for 64 bit Python on appveyor Aug 7, 2017