
pytest-xdist hangs if max-slave-restart is set and the restart limit is reached #45

Closed
fizyk opened this issue Feb 9, 2016 · 15 comments

@fizyk

fizyk commented Feb 9, 2016

It can be observed in this build:
https://travis-ci.org/ClearcodeHQ/pytest-dbfixtures/jobs/108118580

I'm testing whether a package is configured properly (all non-.py files are included, hence the change to install), and installing the package with "pip install ." before the tests.
pytest_load_initial_conftests throws a ValidationError if the additional files cannot be accessed. However, I end up with a loop of slaves restarting indefinitely, or, if I set --max-slave-restart, with a hanging py.test instead.

@RonnyPfannschmidt
Member

I found time to take a look; it's indeed two bugs. One is in pytest and will require the upcoming 2.9 release (parameter passing).

The other is a deadlock condition in xdist that's scheduled to be fixed in xdist 2.0, since it needs an extensive internal refactoring to address the issue properly.

@fizyk
Author

fizyk commented Feb 10, 2016

Thanks. I'll look forward to the new xdist and py.test then :)

@alfredodeza

alfredodeza commented Oct 26, 2016

I am hitting this same issue. I can trigger the problem when using --rsyncdir and my conftest.py file gets executed remotely. In my case an exception is raised in conftest.py, which causes the slave to be restarted.

Since the slave restarts got into a loop (slave connects, conftest raises an exception, slave restarts), I added --max-slave-restart=1 initially and then --max-slave-restart=0, both with the same effect: the connection hangs forever. There is no way to exit unless a Ctrl-C is sent to the process.

[node2] node down: Traceback (most recent call last):
  File "<string>", line 1072, in executetask
  File "<string>", line 1, in do_exec
  File "<remote exec>", line 155, in <module>
  File "<remote exec>", line 129, in remote_initconfig
  File "/home/vagrant/pyexecnetcache/_pytest/config.py", line 912, in fromdictargs
    config.parse(args, addopts=False)
  File "/home/vagrant/pyexecnetcache/_pytest/config.py", line 1039, in parse
    self.hook.pytest_cmdline_preparse(config=self, args=args)
  File "/home/vagrant/pyexecnetcache/_pytest/vendored_packages/pluggy.py", line 745, in __call__
    return self._hookexec(self, self._nonwrappers + self._wrappers, kwargs)
  File "/home/vagrant/pyexecnetcache/_pytest/vendored_packages/pluggy.py", line 339, in _hookexec
    return self._inner_hookexec(hook, methods, kwargs)
  File "/home/vagrant/pyexecnetcache/_pytest/vendored_packages/pluggy.py", line 334, in <lambda>
    _MultiCall(methods, kwargs, hook.spec_opts).execute()
  File "/home/vagrant/pyexecnetcache/_pytest/vendored_packages/pluggy.py", line 614, in execute
    res = hook_impl.function(*args)
  File "/home/vagrant/pyexecnetcache/tests/conftest.py", line 100, in pytest_cmdline_preparse
    components = load_component_config(component_path, args=args)
  File "/home/vagrant/pyexecnetcache/tests/conftest.py", line 42, in load_component_config
    raise RuntimeError('`%s` is not a file.' % abspath)
RuntimeError: `/home/vagrant/pyexecnetcache/components/base.py` is not a file.

Slave restarting disabled

Versions used:

py.test 3.0.3
pytest-xdist 1.15.0
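For reference, the failure mode in the traceback above can be reduced to a minimal conftest.py sketch. The helper name load_component_config and the components/base.py path come from the traceback; the rest is a hypothetical reconstruction, not the actual project code:

```python
# conftest.py (sketch): fails during config parsing when a data file
# that should have been rsynced to the remote side is missing.
import os


def load_component_config(component_path, args=None):
    # Same check as in the traceback: reject anything that is not
    # a regular file on this machine.
    abspath = os.path.abspath(component_path)
    if not os.path.isfile(abspath):
        raise RuntimeError('`%s` is not a file.' % abspath)
    return abspath


def pytest_cmdline_preparse(config, args):
    # Raising here on a worker makes xdist restart it, which is what
    # produces the restart loop (or, with --max-slave-restart, the hang)
    # described in this issue.
    load_component_config('components/base.py', args=args)
```

With --max-slave-restart unset the workers restart in a loop; with it set, the master hangs once the restart limit is reached.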

@alfredodeza

Maybe allow a timeout value? execnet allows setting a timeout (in seconds) to disconnect gracefully from the remote connection.

This is pretty bad for us because we rely on this feature for (automated) CI test runs and we keep hitting it.

@nicoddemus
Member

Thanks @alfredodeza.

Just to comment that this issue has not been forgotten; it's just that we maintainers haven't yet had the time to look at it properly. If someone wants to try a PR, we would appreciate it.

@lemontree255

Hi,

Are there any updates on this issue?

@RonnyPfannschmidt
Member

nope

@timj
Contributor

timj commented Sep 28, 2017

In case it helps, the problem seems to be that the while loop in DSession.loop_once() doesn't realize that all the workers have disappeared. It keeps checking whether the queue is empty with a 2-second timeout, the answer is always "yes, empty", and then it tries again. Does DSession have any way to know that there are no workers left?

The other thing I see is that forked processes that die don't seem to be collected properly: they hang around until the parent dies (which is the problem I have when infinite worker restarts are allowed: eventually no more processes can be created and the system locks up).
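The polling behaviour described above can be reproduced in isolation. A minimal sketch (simplified signatures, not the real DSession API) showing why a guard on the set of live workers is needed before polling the queue:

```python
import queue


def loop_once(event_queue, active_nodes, timeout=0.1):
    """Simplified sketch of the DSession.loop_once polling loop.

    Without the ``active_nodes`` guard, an empty queue with no live
    workers makes this spin forever: queue.Empty, retry, repeat.
    """
    while True:
        if not active_nodes:
            # All workers are gone; nothing will ever be enqueued again.
            raise RuntimeError("No active nodes")
        try:
            return event_queue.get(timeout=timeout)
        except queue.Empty:
            continue
```

With at least one node registered and an event in the queue, the call returns the event; with an empty node set it bails out instead of blocking forever.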

@timj
Contributor

timj commented Sep 28, 2017

This works for me, checking whether there are any active nodes, although I'm sure there's a better way to force an exit than raising RuntimeError:

    def loop_once(self):
        """Process one callback from one of the slaves."""
        while 1:
            if not self._active_nodes:
                # If everything has died stop looping
                self.triggershutdown()
                raise RuntimeError("No active nodes")
            try:
                eventcall = self.queue.get(timeout=2.0)
                break
            except queue.Empty:
                continue

@nicoddemus
Member

Did you try just returning from it after triggershutdown?

    def loop_once(self):
        """Process one callback from one of the slaves."""
        while 1:
            if not self._active_nodes:
                # If everything has died stop looping
                self.triggershutdown()
                return
            try:
                eventcall = self.queue.get(timeout=2.0)
                break
            except queue.Empty:
                continue

@timj
Contributor

timj commented Oct 2, 2017

Yes. If you do that, the process exits with "no tests found". I think it should indicate more strongly that things went wrong.

@nicoddemus
Member

Oh right. Hmm, not sure then; perhaps @RonnyPfannschmidt has a suggestion on which exception should be thrown?

@RonnyPfannschmidt
Member

nope

@nicoddemus
Member

In that case I think raising an internal error is better than the alternative, which is to just loop forever. We can worry about providing a nicer message later.

@timj would you like to open a PR with this fix?

@nicoddemus
Member

Fixed by #238
