Test process hanging forever #43

Closed
pytestbot opened this issue Jul 21, 2017 · 8 comments

Comments

@pytestbot

We have a test job that uses pytest, xdist, and execnet, and it regularly hangs forever.

After attaching with gdb, it looks like the process is waiting for a child, but there is no child process. The relevant Python stack from gdb:

threading.py: wait: 309
threading.py: wait: 603
execnet/gateway_base.py: waitall: 292
execnet/multi.py: safe_terminate: 277

In safe_terminate:

#!python
def safe_terminate(execmodel, timeout, list_of_paired_functions):
    # Snip some code
    workerpool.waitall()

If the timeout value passed into safe_terminate is also passed into waitall, then waitall times out and the pytest test run exits instead of hanging.

This is working for us.
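
For illustration only, the difference between an unbounded and a bounded wait can be sketched with plain `threading`, independent of execnet. The names `wait_all`, `never_set`, and `stuck_worker` are hypothetical and are not execnet's API; the sketch assumes nothing about how `waitall` is implemented and only shows why a join with a timeout returns while a join without one would block forever.

```python
import threading
import time


def wait_all(threads, timeout=None):
    """Join every thread; return True if all finished before the deadline.

    Hypothetical helper for illustration, not execnet's waitall.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    for thread in threads:
        if deadline is None:
            remaining = None  # block until the thread exits
        else:
            remaining = max(0.0, deadline - time.monotonic())
        thread.join(remaining)
        if thread.is_alive():
            return False  # deadline passed while this thread was still running
    return True


never_set = threading.Event()
# Daemon thread that waits on an event nobody sets, standing in for a
# worker that is waiting on a child process that no longer exists.
stuck_worker = threading.Thread(target=never_set.wait, daemon=True)
stuck_worker.start()

finished = wait_all([stuck_worker], timeout=0.2)
print(finished)  # False: the worker is still blocked, but the caller got control back
```

Calling `wait_all([stuck_worker])` without a timeout would never return here, which matches the hang described above.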

pytest 2.6.4, xdist 1.11, execnet 1.2.0

We haven't used execnet 1.3.0 yet, but that version hasn't modified the above code.

@pytestbot
Author

Original comment by @mgoral

Hello,

Exactly the same issue here, but with pytest 2.9.0, xdist 1.14 and execnet 1.4.1. In my case the child process had already been killed or had died for some reason, which execnet additionally reported by printing to stderr: killing: [Errno 3] No such process

Adding any timeout to workerpool.waitall(), be it the one passed into safe_terminate or a hardcoded value, seems to fix the problem.

@Peque

Peque commented Sep 5, 2017

Not sure if we could be hitting this issue as well (opensistemas-hub/osbrain#182).

@RonnyPfannschmidt
Member

This was fixed in aca6845.

@JCourt1

JCourt1 commented Aug 23, 2023

@RonnyPfannschmidt I'm still experiencing an issue where my code appears to get stuck in the reply.get() just above the waitall(timeout=timeout) that was added in aca6845

Would it be possible to just pass through the same timeout there, as well?

@RonnyPfannschmidt
Member

I believe so, but someone has to implement it

@JCourt1

JCourt1 commented Sep 16, 2023

When you say that, do you just mean that someone has to create a PR with the change I suggested (reply.get(timeout=timeout)), or that an implementation has to be added somewhere else? I don't think the latter is the case, since Reply::get already receives the timeout parameter and implements it.

If you think there's nothing else to be done beyond that, I can create a tiny PR with that change.
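
As a rough sketch of what forwarding the timeout to a blocking `get` buys, here is a self-contained example. `SketchReply` and `terminate_with_deadline` are hypothetical names made up for this illustration; they are not execnet's `Reply` class or its `safe_terminate`, and the exception type raised on timeout is an assumption of the sketch.

```python
import threading


class SketchReply:
    """Hypothetical stand-in for a pending result; not execnet's Reply."""

    def __init__(self):
        self._ready = threading.Event()
        self._result = None

    def set(self, result):
        self._result = result
        self._ready.set()

    def get(self, timeout=None):
        # Event.wait returns False if the timeout elapses before set() is called.
        if not self._ready.wait(timeout):
            raise TimeoutError("result not ready after %s seconds" % timeout)
        return self._result


def terminate_with_deadline(reply, timeout):
    """Return the reply's result, or None if it does not arrive in time."""
    try:
        return reply.get(timeout=timeout)
    except TimeoutError:
        return None


pending = SketchReply()  # nothing ever calls pending.set()
done = SketchReply()
done.set("terminated")

print(terminate_with_deadline(pending, 0.1))  # None: gave up after the timeout
print(terminate_with_deadline(done, 0.1))     # terminated
```

With `timeout=None` the first call would block indefinitely, which is the kind of hang a forwarded timeout is meant to bound.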

@RonnyPfannschmidt
Member

I haven't had a chance to investigate, but it should be a step in the right direction.

@JCourt1

JCourt1 commented Sep 18, 2023

OK, great. I created that as a PR to move the discussion over there, in case there is anything else to say about this.
