test_asyncio: test_subprocess_send_signal hangs on Fedora builders #65446

Closed
opoplawski mannequin opened this issue Apr 15, 2014 · 12 comments
Labels
tests Tests in the Lib/test dir topic-asyncio type-bug An unexpected behavior, bug, or error

Comments

opoplawski mannequin commented Apr 15, 2014

BPO 21247
Nosy @gvanrossum, @vstinner, @socketpair, @1st1
Files
  • test_signal.out: test strace
  • test_send_signal.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2014-07-17.21:51:47.556>
    created_at = <Date 2014-04-15.21:33:52.570>
    labels = ['type-bug', 'tests', 'expert-asyncio']
    title = 'test_asyncio: test_subprocess_send_signal hangs on Fedora builders'
    updated_at = <Date 2015-11-06.19:18:56.768>
    user = 'https://bugs.python.org/opoplawski'

    bugs.python.org fields:

    activity = <Date 2015-11-06.19:18:56.768>
    actor = 'socketpair'
    assignee = 'none'
    closed = True
    closed_date = <Date 2014-07-17.21:51:47.556>
    closer = 'vstinner'
    components = ['Tests', 'asyncio']
    creation = <Date 2014-04-15.21:33:52.570>
    creator = 'opoplawski'
    dependencies = []
    files = ['34899', '35973']
    hgrepos = []
    issue_num = 21247
    keywords = ['patch', 'buildbot']
    message_count = 12.0
    messages = ['216392', '216397', '216407', '216409', '216434', '216500', '223235', '223349', '223377', '223378', '223450', '254207']
    nosy_count = 6.0
    nosy_names = ['gvanrossum', 'vstinner', 'socketpair', 'python-dev', 'yselivanov', 'opoplawski']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue21247'
    versions = ['Python 3.4']

    opoplawski mannequin commented Apr 15, 2014

    While trying to build Python 3.4.0 for Fedora, we are seeing test_asyncio's test_subprocess_send_signal hang every time, on all architectures. Unfortunately, I cannot reproduce this locally. These builds are done inside chroots, and the host runs kernel version 3.12.8-300.fc20, which is used for all build targets. We see hangs when building for Fedora Rawhide and RHEL 7. We do *not* see hangs on our COPR builders, which, among other possible differences, use RHEL 6 hosts with kernel 2.6.32-358.el6.

    I've attached an strace of the hanging test. The calling process seems to be stuck in epoll_wait().

    I tried using the watchdog patch from issue bpo-19652, but it doesn't seem to manage to kill things. In fact, the tests are never killed except by the 1-hour timeout in the test runner.

    @opoplawski opoplawski mannequin added the type-bug An unexpected behavior, bug, or error label Apr 15, 2014

    opoplawski mannequin commented Apr 15, 2014

    Hmm, looking at things a little closer, it looks like the SIGHUP is arriving very early, perhaps too early?

    opoplawski mannequin commented Apr 15, 2014

    It may also be possible that something has set the SIGHUP handler to SIG_IGN when the test is run.

    opoplawski mannequin commented Apr 15, 2014

    It looks like in the Fedora koji builds the SIGHUP sigaction is set to SIG_IGN, which causes the processes that the Python tests are trying to kill with SIGHUP not to die. Perhaps the koji builders should not be doing that, or perhaps the Python tests should reset the SIGHUP sigaction to SIG_DFL.
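A minimal sketch of this failure mode, assuming a POSIX system and using the plain subprocess module rather than asyncio: the parent sets SIGHUP to a given disposition (SIG_IGN stands in for the koji builder, nohup, or /usr/bin/daemon), starts a sleeping child, and sends it SIGHUP. Because an ignored disposition is inherited across fork() and exec(), the child should keep running in the SIG_IGN case. The helper `child_survives_sighup` and its 2-second grace period are hypothetical, not taken from the test suite.

```python
import signal
import subprocess
import sys


def child_survives_sighup(parent_disposition, grace=2.0):
    """Spawn a sleeping child while SIGHUP has the given disposition in the
    parent, send it SIGHUP, and report whether it is still alive afterwards."""
    old = signal.signal(signal.SIGHUP, parent_disposition)
    try:
        # SIG_IGN and SIG_DFL survive fork() and exec(), so the child starts
        # with whatever disposition the parent had at this point.
        proc = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    finally:
        signal.signal(signal.SIGHUP, old)
    try:
        proc.send_signal(signal.SIGHUP)
        try:
            proc.wait(timeout=grace)
            return False  # the signal terminated the child
        except subprocess.TimeoutExpired:
            return True  # SIGHUP was ignored; the child is still sleeping
    finally:
        # Clean up a child that ignored SIGHUP.
        if proc.poll() is None:
            proc.kill()
            proc.wait()


if __name__ == "__main__":
    print("SIG_DFL in parent, child survives:", child_survives_sighup(signal.SIG_DFL))
    print("SIG_IGN in parent, child survives:", child_survives_sighup(signal.SIG_IGN))
```

On Linux, the SIG_DFL case is expected to report that the child did not survive and the SIG_IGN case that it did; the latter is the situation in which a test that waits for the child to exit blocks indefinitely.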

    @vstinner
    Member

    This issue is a race condition or bug in the unit test, not in asyncio. The test doesn't check whether echo.py is running, that is, whether Python has started.

    Python doesn't set up a handler for SIGHUP; it uses the handler it inherits. On my Fedora 20, that appears to be SIG_DFL:

    Python 3.5.0a0 (default:795d90c7820d+, Apr 16 2014, 00:18:50) 
    [GCC 4.8.2 20131212 (Red Hat 4.8.2-7)] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import signal
    >>> signal.getsignal(signal.SIGHUP)
    <Handlers.SIG_DFL: 0>

    Extract of the attached strace:
    ---
    clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f9d1e8cba10) = 24719
    Process 24719 attached
    ...
    [pid 24719] rt_sigaction(SIGHUP, NULL, {SIG_IGN, [], 0}, 8) = 0
    ...
    [pid 24625] kill(24719, SIGHUP) = 0
    [pid 24719] --- SIGHUP {si_signo=SIGHUP, si_code=SI_USER, si_pid=24625, si_uid=1000} ---
    ---

    So the child process has SIGHUP configured to SIG_IGN on your platform.
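On Linux, the disposition that strace reveals can also be checked without tracing: the `SigIgn:` line of `/proc/<pid>/status` is a hexadecimal bitmask in which bit n-1 is set when signal n is ignored. A minimal Linux-only sketch; the helper `ignored_signals` is hypothetical and not part of the test suite.

```python
import os
import signal


def ignored_signals(pid="self"):
    """Return the set of signal numbers a process ignores (SIG_IGN), parsed
    from the SigIgn bitmask in /proc/<pid>/status. Linux only."""
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            if line.startswith("SigIgn:"):
                mask = int(line.split()[1], 16)
                # Bit (n - 1) of the mask corresponds to signal number n.
                return {n for n in range(1, mask.bit_length() + 1) if mask & (1 << (n - 1))}
    raise RuntimeError(f"no SigIgn line for pid {pid}")


if __name__ == "__main__":
    print("SIGHUP ignored by this process:", signal.SIGHUP in ignored_signals(os.getpid()))
```

Pointing it at the PID of a hung echo.py child would show directly whether SIGHUP (signal 1) is in its ignored set.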

    opoplawski mannequin commented Apr 16, 2014

    We have determined that the koji builder is indeed setting the SIGHUP sigaction to SIG_IGN, which the Python test inherits, and we are working on getting that fixed. However, it may be worth considering something like pexpect/pexpect@1fbfddf in the Python tests to ensure that the tests run properly in situations like this (I can imagine someone running them under "nohup").
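A minimal sketch of resetting SIGHUP in the child, assuming the standard subprocess module on POSIX. It restores SIG_DFL between fork() and exec() through `preexec_fn`; this is one possible approach and not necessarily what the referenced pexpect commit does. The names `_reset_sighup` and `spawn_with_default_sighup` are hypothetical. Note that the subprocess documentation warns that `preexec_fn` is not safe when the parent has threads.

```python
import signal
import subprocess
import sys


def _reset_sighup():
    # Runs in the child after fork() and before exec(). SIG_DFL, unlike a
    # Python-level handler, is preserved across exec().
    signal.signal(signal.SIGHUP, signal.SIG_DFL)


def spawn_with_default_sighup(args, **kwargs):
    """Start a child whose SIGHUP disposition is SIG_DFL, regardless of
    what the parent inherited (e.g. from nohup)."""
    return subprocess.Popen(args, preexec_fn=_reset_sighup, **kwargs)


if __name__ == "__main__":
    previous = signal.signal(signal.SIGHUP, signal.SIG_IGN)  # simulate nohup
    try:
        proc = spawn_with_default_sighup([sys.executable, "-c", "import time; time.sleep(60)"])
    finally:
        signal.signal(signal.SIGHUP, previous)
    proc.send_signal(signal.SIGHUP)
    print("return code:", proc.wait(timeout=10))
```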

    @vstinner vstinner added tests Tests in the Lib/test dir topic-asyncio labels Jun 6, 2014
    @vstinner
    Member

    Here is a patch implementing basic synchronization between the parent and the child process: the parent waits until the child is sleeping before sending the signal.

    Can you please try this patch?

    If it doesn't work, we might add a small sleep of 500 ms after the readline().
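The synchronization idea can be sketched with asyncio's subprocess API, assuming a POSIX system: the child prints a line once it is running, and the parent reads that line before calling `send_signal()`. The child program and `send_sighup_when_ready` are hypothetical and are not the committed patch; the child also restores SIGHUP to SIG_DFL itself, so the sketch does not depend on the inherited disposition.

```python
import asyncio
import signal
import sys

# Hypothetical child program: restore the default SIGHUP action, announce
# readiness, then sleep until killed.
CHILD_CODE = (
    "import signal, time\n"
    "signal.signal(signal.SIGHUP, signal.SIG_DFL)\n"
    "print('sleeping', flush=True)\n"
    "time.sleep(3600)\n"
)


async def send_sighup_when_ready(timeout=10.0):
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", CHILD_CODE, stdout=asyncio.subprocess.PIPE)
    try:
        # Basic synchronization: wait until the child reports that it is running.
        line = await asyncio.wait_for(proc.stdout.readline(), timeout)
        if line != b"sleeping\n":
            raise RuntimeError(f"unexpected output from child: {line!r}")
        proc.send_signal(signal.SIGHUP)
        return await asyncio.wait_for(proc.wait(), timeout)
    finally:
        # Never leave a sleeping child behind if something timed out.
        if proc.returncode is None:
            proc.kill()
            await proc.wait()


if __name__ == "__main__":
    print("return code:", asyncio.run(send_sighup_when_ready()))
```

Because the child prints its line only after restoring SIG_DFL, waiting for readline() also ensures the reset has taken effect before the signal is sent. Without that reset, an inherited SIG_IGN would still make the child ignore SIGHUP, however well the two processes are synchronized.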

    opoplawski mannequin commented Jul 17, 2014

    That appears to work. Thanks!

    python-dev mannequin commented Jul 17, 2014

    New changeset 651475d67225 by Victor Stinner in branch '3.4':
    Issue bpo-21247: Fix a race condition in test_send_signal() of asyncio
    http://hg.python.org/cpython/rev/651475d67225

    New changeset 45e8eb53edbc by Victor Stinner in branch 'default':
    (Merge 3.4) Issue bpo-21247: Fix a race condition in test_send_signal() of asyncio
    http://hg.python.org/cpython/rev/45e8eb53edbc

    @vstinner
    Member

    > That appears to work. Thanks!

    Cool, I committed my enhancement of the unit test.

    opoplawski mannequin commented Jul 18, 2014

    I'm really sorry, I thought I had done the test build properly, but a second attempt has resulted in the same hang:

    http://koji.fedoraproject.org/koji/taskinfo?taskID=7165208

    So I don't think it does the trick.

    socketpair mannequin commented Nov 6, 2015

    The bug still reproduces. Jenkins, when started from init.d, uses /usr/bin/daemon, which means SIGHUP is in the SIG_IGN state. Since echo.py does not set up a SIGHUP handler, the test expects SIGHUP to kill it, as SIGKILL would; with SIGHUP ignored, it never exits. So why not use, say, SIGTERM instead? After such a change, all tests passed.

    If not, the signal-handling tests should reset the signal disposition to SIG_DFL.

    Please reopen.
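The second suggestion, resetting the disposition to SIG_DFL in the test itself, could look like the following context manager, assuming the test runs in the main thread (signal.signal() requires it). The name `default_signal_disposition` is hypothetical. Children started inside the `with` block inherit SIG_DFL, so SIGHUP should terminate them even when the test runner itself was started with SIGHUP ignored.

```python
import contextlib
import signal
import subprocess
import sys


@contextlib.contextmanager
def default_signal_disposition(signum):
    """Temporarily set the disposition of ``signum`` to SIG_DFL.

    Children spawned inside the block inherit SIG_DFL instead of an ignored
    disposition (e.g. from nohup or /usr/bin/daemon). Must run in the main thread.
    """
    previous = signal.signal(signum, signal.SIG_DFL)
    try:
        yield
    finally:
        # None means the old handler was not installed from Python and
        # cannot be restored from here.
        if previous is not None:
            signal.signal(signum, previous)


if __name__ == "__main__":
    outer = signal.signal(signal.SIGHUP, signal.SIG_IGN)  # simulate nohup/daemon
    try:
        with default_signal_disposition(signal.SIGHUP):
            child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
        child.send_signal(signal.SIGHUP)
        print("return code:", child.wait(timeout=10))
    finally:
        signal.signal(signal.SIGHUP, outer)
```

Compared with resetting in each child, this keeps the change inside the test and needs no `preexec_fn`, at the cost of briefly changing the disposition of the test process itself.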

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022