Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Popen4 wait() fails sporadically with threads #41862

Closed
tskogan mannequin opened this issue Apr 15, 2005 · 12 comments
Closed

Popen4 wait() fails sporadically with threads #41862

tskogan mannequin opened this issue Apr 15, 2005 · 12 comments
Labels
stdlib Python modules in the Lib dir

Comments

@tskogan
Copy link
Mannequin

tskogan mannequin commented Apr 15, 2005

BPO 1183780
Nosy @loewis
Files
  • popen_bug.py: Script to reproduce error.
  • popen2.patch: v1
  • popen2.patch: v2 - using del
  • popen2.diff: v3
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2007-07-11.12:46:38.000>
    created_at = <Date 2005-04-15.14:27:54.000>
    labels = ['library']
    title = 'Popen4 wait() fails sporadically with threads'
    updated_at = <Date 2007-07-11.12:46:38.000>
    user = 'https://bugs.python.org/tskogan'

    bugs.python.org fields:

    activity = <Date 2007-07-11.12:46:38.000>
    actor = 'gjb1002'
    assignee = 'nnorwitz'
    closed = True
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2005-04-15.14:27:54.000>
    creator = 'tskogan'
    dependencies = []
    files = ['1677', '1678', '1679', '1680']
    hgrepos = []
    issue_num = 1183780
    keywords = []
    message_count = 12.0
    messages = ['25022', '25023', '25024', '25025', '25026', '25027', '25028', '25029', '25030', '25031', '25032', '25033']
    nosy_count = 5.0
    nosy_names = ['loewis', 'nnorwitz', 'gjb1002', 'tskogan', 'atila-cheops']
    pr_nums = []
    priority = 'normal'
    resolution = 'accepted'
    stage = None
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue1183780'
    versions = ['Python 2.4']

    @tskogan
    Copy link
    Mannequin Author

    tskogan mannequin commented Apr 15, 2005

    Calling wait() on a popen2.Popen4 object fails
    intermittently with the error

    Traceback (most recent call last):
      ...
      File "/usr/local/lib/python2.3/popen2.py", line 90, in wait
        pid, sts = os.waitpid(self.pid, 0)
    OSError: [Errno 10] No child processes

    when using threads.

    The problem seems to be a race condition when a thread
    calls wait() on a popen2.Popen4 object. This also apllies
    to Popen3 objects.

    The constructor of Popen4. calls _cleanup() which calls
    poll() which calls the system call waitpid() for all acitve
    child processes. If another thread calls poll() before the
    current thread calls wait() on it's child process and the
    child process has terminated, the child process is no
    longer waitable and the second call to wait() fails.

    Code to replicate this behavoir is attached in popen_bug.
    py.

    Solution: Popen4 and Popen3 should be threadsafe.

    Related modules: A seemingly related error occurs with
    Popen from the new subprocess module. Use the -s
    option in the popen_bug.py script to test this.

    Tested on Linux RedHat Enterprise 3 for Python 2.3.3,
    Python 2.3.5 and Python 2.4.1 and Solaris for Python 2.
    4.1. The error did not occur on a RedHat 7.3 machine
    with Python 2.3.5. See the attached file popen_bug.py for
    details on the platforms.

    @tskogan tskogan mannequin closed this as completed Apr 15, 2005
    @tskogan tskogan mannequin assigned nnorwitz Apr 15, 2005
    @tskogan tskogan mannequin added the stdlib Python modules in the Lib dir label Apr 15, 2005
    @tskogan tskogan mannequin closed this as completed Apr 15, 2005
    @tskogan tskogan mannequin assigned nnorwitz Apr 15, 2005
    @tskogan tskogan mannequin added the stdlib Python modules in the Lib dir label Apr 15, 2005
    @nnorwitz
    Copy link
    Mannequin

    nnorwitz mannequin commented Mar 23, 2006

    Logged In: YES
    user_id=33168

    The attached patch fixes the problem for me. It also
    addresses another issue where wait could be called from
    outside the popen2 module. I'm not sure this is the best
    solution. I'm not sure there really is a good solution.
    Perhaps it's best to allow an exception to be raised?

    @loewis
    Copy link
    Mannequin

    loewis mannequin commented Mar 24, 2006

    Logged In: YES
    user_id=21627

    I don't understand why you are setting self.sts to 0 if wait
    fails: most likely, there was a simultaneous call to .poll,
    which should have set self.sts to the real return value. So
    we should return that instead.

    I think the whole issue can be avoid if we use resurrection:
    If __del__ would put unwaited objects into _active, rather
    than __init__, it would not happen that _cleanup polls a pid
    which a thread still intends to wait for. In fact, it would
    be sufficient to only put the pid into _active (avoiding the
    need for resurrection).

    If then a thread calls poll explicitly, and another calls
    wait, they deserve to lose (with ECHILD). I would claim the
    same error exists if part of the application calls
    os.wait3,4, and another calls .wait later - they also
    deserve the exception.

    With that approach, I don't think further thread
    synchronization would be needed.

    @nnorwitz
    Copy link
    Mannequin

    nnorwitz mannequin commented Mar 24, 2006

    Logged In: YES
    user_id=33168

    I agree with your comment about setting self.sts to 0. That
    was the problem I alluded to on python-dev.

    Although I dislike __del__, this does seem like an
    appropriate place to do the modification of _active.

    Note that currently all os.error's are swallowed in poll().
    I'm not sure if that was the best idea, but that's the
    current interface. wait() does *not* catch any exceptions.

    I wasn't really sure what to do about threads. The threads
    could always manage their calls into a popen object like you
    propose (don't try to handle simultaneous calls to poll and
    wait). Another question I had was if popen should be
    deprecated in favor of subprocess?

    I've attached a patch which I think implements your
    suggestion. It also seems to fix all the problems.

    @loewis
    Copy link
    Mannequin

    loewis mannequin commented Mar 24, 2006

    Logged In: YES
    user_id=21627

    This looks all fine.

    As a further issue, I think _cleanup should also clean
    processes which already have been waited on. So if waitpid
    gives ECHILD (in _cleanup), I think the object should get
    removed from _active - otherwise, it would stay there
    forever. Of course, care is then need to avoid __del__
    adding it back to _active.

    Putting these all together, I propose v3 of the patch.

    Another aspect that puzzles me is the repeated test that
    waitpid() really returns the pid you asked for. How could it
    not? If it fails, you get an os.error.

    @nnorwitz
    Copy link
    Mannequin

    nnorwitz mannequin commented Mar 24, 2006

    Logged In: YES
    user_id=33168

    It makes sense to remove from _active on ECHILD.

    I wondered the same thing about waitpid(), but left it as it
    was. I don't believe it's possible for waitpid to return
    any pid other than what you ask for unless the O/S is very,
    very broken.

    This patch is fine with me, feel free to check it in. BTW,
    nice comments and precondition checks in the test.

    @loewis
    Copy link
    Mannequin

    loewis mannequin commented Mar 24, 2006

    Logged In: YES
    user_id=21627

    Committed as 43286. I also added .cmd to Popen4.

    @atila-cheops
    Copy link
    Mannequin

    atila-cheops mannequin commented Apr 10, 2006

    Logged In: YES
    user_id=1276121

    see patch bpo-1467770 for subprocess.py library module

    @gjb1002
    Copy link
    Mannequin

    gjb1002 mannequin commented Jul 10, 2007

    Did this get fixed in subprocess.py? The patches all seem to be for popen2.

    I have been observing similar problems in subprocess.py, so I downloaded the test script and ran it with the -s option. It didn't work out of the box, I had to pass "shell=True" to the subprocess.Popen before it did anything at all. Which also made me wonder if the subprocess variant of this problem got forgotten.

    Having done that, I have observed failures using Python 2.4.4 on Red Hat EL3 Linux, and also using Python 2.4.3 on Red Hat EL4 linux. Most of the time it works, sometimes it hangs forever, and sometimes we get something that look like this:

    Started 20 threads
    Exception in thread Thread-19:
    Traceback (most recent call last):
      File "/usr/lib/python2.4/threading.py", line 442, in __bootstrap
        self.run()
      File "popen_bug.py", line 53, in run
        pipe.wait()
      File "/usr/lib/python2.4/subprocess.py", line 1007, in wait
        pid, sts = os.waitpid(self.pid, 0)
    OSError: [Errno 10] No child processes

    P.S. Googling for "[Errno 10] No child processes" suggests others have this problem. There have been long discussions on the Zope list as to why some people on Linux get exceptions that look like this, for example.

    @gjb1002
    Copy link
    Mannequin

    gjb1002 mannequin commented Jul 10, 2007

    As an additional note, I have also reproduced the popen problem using Python 2.4.4, though only once.
    It seems to occur more frequently in subprocess.py.

    @loewis
    Copy link
    Mannequin

    loewis mannequin commented Jul 10, 2007

    If you are seeing a bug in subprocess, please report it separately. This one has been fixed.

    Please don't assume that it is the "same" problem as the one reported here, unless you have a working patch that proves that it is indeed a similar problem.

    @gjb1002
    Copy link
    Mannequin

    gjb1002 mannequin commented Jul 11, 2007

    From the original description:

    "Related modules: A seemingly related error occurs with
    Popen from the new subprocess module. Use the -s
    option in the popen_bug.py script to test this."

    I can be confident it is the same problem as I have used the exact
    same script to reproduce it, and got the exact same stack trace.

    As I added, I have also reproduced the "popen" problem using Python 2.4.4 once, though
    this is much harder to reproduce. So I claim this is not in fact fixed in all cases.

    I can report separately if you really want, but it seems odd when the symptoms and test case
    are the same.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 9, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir
    Projects
    None yet
    Development

    No branches or pull requests

    0 participants