PEP 475: fnctl functions are not retried if interrupted by a signal (EINTR) #79370

Closed
akeskimo mannequin opened this issue Nov 8, 2018 · 20 comments
Labels: 3.7 (EOL), 3.8 (EOL), stdlib (Python modules in the Lib dir), type-crash (a hard crash of the interpreter, possibly with a core dump)

Comments

@akeskimo
Mannequin

akeskimo mannequin commented Nov 8, 2018

BPO 35189
Nosy @vstinner, @aixtools, @miss-islington, @akeskimo, @nierob
PRs
  • bpo-35189: Retry fnctl calls on EINTR #10413
  • bpo-35189: Fix eintr_tester.py #10637
  • [3.7] bpo-35189: Retry fnctl calls on EINTR (GH-10413) #10678
  • [3.6] bpo-35189: Retry fnctl calls on EINTR (GH-10413) (GH-10678) #10685
  • bpo-35189, bpo-35316: Make test_eintr less strict #10782
  • [3.7] bpo-35189, bpo-35316: Make test_eintr less strict (GH-10782) #10784
  • [3.6] bpo-35189, bpo-35316: Make test_eintr less strict (GH-10782) #10785
  • bpo-35633: test_lockf() fails with "PermissionError: [Errno 13] Permission denied" on AIX #11424
  • [3.7] bpo-35633: test_lockf() fails with "PermissionError: [Errno 13] Permission denied" on AIX (GH-11424) #11858
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2018-11-23.18:01:48.601>
    created_at = <Date 2018-11-08.12:45:30.367>
    labels = ['3.8', '3.7', 'library', 'type-crash']
    title = 'PEP 475: fnctl functions are not retried if interrupted by a signal (EINTR)'
    updated_at = <Date 2019-02-14.18:41:41.577>
    user = 'https://github.com/akeskimo'

    bugs.python.org fields:

    activity = <Date 2019-02-14.18:41:41.577>
    actor = 'miss-islington'
    assignee = 'none'
    closed = True
    closed_date = <Date 2018-11-23.18:01:48.601>
    closer = 'vstinner'
    components = ['Library (Lib)']
    creation = <Date 2018-11-08.12:45:30.367>
    creator = 'akeskimo'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 35189
    keywords = ['patch']
    message_count = 20.0
    messages = ['329469', '329470', '329471', '329472', '329473', '330208', '330335', '330339', '330348', '330350', '330645', '330649', '330652', '332580', '332581', '332591', '332624', '333021', '335551', '335554']
    nosy_count = 5.0
    nosy_names = ['vstinner', 'Michael.Felt', 'miss-islington', 'akeskimo', 'nierob']
    pr_nums = ['10413', '10637', '10678', '10685', '10782', '10784', '10785', '11424', '11424', '11424', '11424', '11858']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'crash'
    url = 'https://bugs.python.org/issue35189'
    versions = ['Python 3.6', 'Python 3.7', 'Python 3.8']

    @akeskimo
    Mannequin Author

    akeskimo mannequin commented Nov 8, 2018

    According to PEP 475 (https://www.python.org/dev/peps/pep-0475/), a system call interrupted by a signal (EINTR) should be retried automatically, but fcntl.lockf() is not retried and the exception is raised:

    2018-11-05 05:21:35,257 ERROR:storage(23491): Remote storage operation failed (request: '{  'excludeSubModules': None,
       'storageLocation': 'qt/qtdatavis3d/68faa5b00f73096eb096c6acdfce76b052ca20b9/LinuxUbuntu_18_04x86_64LinuxQEMUarm64GCCqtci-linux-Ubuntu-18.04-x86_64-a6c9f7Release/ac4280d182ec320eaf0e68efaeeeb6be14b9689f/test_1542834179',
       'type': 3}')
    Traceback (most recent call last):
      File "src/storage.py", line 507, in handle
        self.handle_upload_artifact(message)
      File "src/storage.py", line 437, in handle_upload_artifact
        log.info("upload of %s to %s", uploadType, message.storageLocation)
      File "/usr/lib/python3.6/logging/__init__.py", line 1306, in info
        self._log(INFO, msg, args, **kwargs)
      File "/usr/lib/python3.6/logging/__init__.py", line 1442, in _log
        self.handle(record)
      File "/usr/lib/python3.6/logging/__init__.py", line 1452, in handle
        self.callHandlers(record)
      File "/usr/lib/python3.6/logging/__init__.py", line 1514, in callHandlers
        hdlr.handle(record)
      File "/usr/lib/python3.6/logging/__init__.py", line 861, in handle
        self.acquire()
      File "/home/vmbuilder/qt-ci/src/application.py", line 151, in acquire
        fcntl.lockf(self._lock_fd, fcntl.LOCK_EX)
    InterruptedError: [Errno 4] Interrupted system call
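
On an interpreter without the fix, application code such as the acquire() method in the traceback above could retry the call itself. The following is a minimal sketch of that workaround, not the patch that was later merged; `lockf_retrying` and its `lock_func` parameter are hypothetical names introduced here for illustration.

```python
import fcntl


def lockf_retrying(fd, operation, lock_func=fcntl.lockf):
    """Call lock_func(fd, operation), retrying while it fails with EINTR.

    Hypothetical helper: lock_func is injectable only so that the retry
    loop can be exercised without delivering real signals.
    """
    while True:
        try:
            return lock_func(fd, operation)
        except InterruptedError:
            # The call was interrupted by a signal before it completed.
            # Python-level signal handlers run between bytecodes; if one
            # of them raises, that exception normally propagates out of
            # this loop instead of being swallowed here.
            continue
```

With such a helper, the failing line would read `lockf_retrying(self._lock_fd, fcntl.LOCK_EX)`. Once the interpreter retries on EINTR itself, the loop simply never iterates more than once.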

    @akeskimo added the stdlib and type-crash labels Nov 8, 2018
    @vstinner
    Member

    vstinner commented Nov 8, 2018

    You're right, it should, but the fcntl module hasn't been patched. Are you interested in working on a patch, or do you want me to find someone to do it?

    @akeskimo
    Mannequin Author

    akeskimo mannequin commented Nov 8, 2018

    My colleague has made a prospective fix:

    nierob@3b76b88

    @vstinner
    Member

    vstinner commented Nov 8, 2018

    nierob@3b76b88

    Oh, nice! Please rebase this change on the master branch and reuse the "int async_err = 0;" pattern from Modules/posixmodule.c. You must not raise a new exception if PyErr_CheckSignals() has already raised one; something like:

        return (!async_err) ? posix_error() : NULL;
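
The control flow behind that rule can be modelled in Python. This is only a sketch of the logic, not the code in Modules/posixmodule.c or in the merged patch; `call_retrying_on_eintr`, `syscall` and `check_signals` are hypothetical names, with `check_signals` standing in for PyErr_CheckSignals().

```python
import errno
import os


def call_retrying_on_eintr(syscall, check_signals):
    """Model of the C retry loop, written in Python for illustration.

    syscall() returns (result, err), like a C call that sets errno.
    check_signals() stands in for PyErr_CheckSignals(): it raises if a
    Python signal handler raised. Both parameters are hypothetical.
    """
    while True:
        result, err = syscall()
        if result >= 0:
            return result
        if err != errno.EINTR:
            # Ordinary failure: report errno (the posix_error() branch).
            raise OSError(err, os.strerror(err))
        # EINTR: run pending handlers. If one raises, its exception
        # propagates unchanged and no OSError is raised on top of it
        # (the "return NULL" branch taken when async_err is set).
        check_signals()
```

The point of the pattern is that an exception coming from a signal handler is never replaced by an OSError built from errno; the loop either retries, reports a real error, or lets the handler's exception through.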
    

    @nierob
    Mannequin

    nierob mannequin commented Nov 8, 2018

    The PR is waiting for the CLA to be signed.

    @vstinner changed the title from "EINTR is not being retried" to "PEP 475: fnctl functions are not retried if interrupted by a signal (EINTR)" Nov 8, 2018
    @vstinner added the 3.7 (EOL) and 3.8 (EOL) labels Nov 8, 2018
    @vstinner
    Member

    New changeset aac1f81 by Victor Stinner in branch 'master':
    bpo-35189: Fix eintr_tester.py (GH-10637)
    aac1f81

    @vstinner
    Member

    New changeset b409ffa by Victor Stinner (nierob) in branch 'master':
    bpo-35189: Retry fnctl calls on EINTR (GH-10413)
    b409ffa

    @vstinner
    Member

    New changeset 56742f1 by Victor Stinner in branch '3.7':
    [3.7] bpo-35189: Retry fnctl calls on EINTR (GH-10413) (GH-10678)
    56742f1

    @vstinner
    Member

    New changeset eef813b by Victor Stinner in branch '3.6':
    [3.7] bpo-35189: Retry fnctl calls on EINTR (GH-10413) (GH-10678) (GH-10685)
    eef813b

    @vstinner
    Member

    Thanks Aapo Samuli Keskimolo for the bug report and thanks Nierob for the fix!

    @vstinner
    Member

    New changeset 2956bff by Victor Stinner in branch 'master':
    bpo-35189, bpo-35316: Make test_eintr less strict (GH-10782)
    2956bff

    @miss-islington
    Contributor

    New changeset 2fa5b2a by Miss Islington (bot) in branch '3.7':
    bpo-35189, bpo-35316: Make test_eintr less strict (GH-10782)
    2fa5b2a

    @miss-islington
    Contributor

    New changeset 833a706 by Miss Islington (bot) in branch '3.6':
    bpo-35189, bpo-35316: Make test_eintr less strict (GH-10782)
    833a706

    @aixtools
    Contributor

    I have not looked at 3.6, but I have bisected the 3.7 and 3.8 branches for AIX. I get:

    On 3.7 Branch:
    Bisecting: 0 revisions left to test after this (roughly 0 steps)
    [56742f1] [3.7] bpo-35189: Retry fnctl calls on EINTR (GH-10413) (GH-10678)

    On 3.8 Branch:
    Bisecting: 0 revisions left to test after this (roughly 0 steps)
    [b409ffa] bpo-35189: Retry fnctl calls on EINTR (GH-10413)

    So, my assumption is that the PR-10413 is not 100% correct for AIX.

    I will look further, but I also have a question: will this issue be reopened, or do we need a new issue?

    @aixtools
    Contributor

    Forgot to include the test failure message:

    ======================================================================
    FAIL: test_all (test.test_eintr.EINTRTests)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/data/prj/python/git/python3-3.8/Lib/test/test_eintr.py", line 18, in test_all
        script_helper.assert_python_ok("-u", tester)
      File "/data/prj/python/git/python3-3.8/Lib/test/support/script_helper.py", line 157, in assert_python_ok
        return _assert_python(True, *args, **env_vars)
      File "/data/prj/python/git/python3-3.8/Lib/test/support/script_helper.py", line 143, in _assert_python
        res.fail(cmd_line)
      File "/data/prj/python/git/python3-3.8/Lib/test/support/script_helper.py", line 70, in fail
        raise AssertionError("Process return code is %d\n"
    AssertionError: Process return code is 1
    command line: ['/data/prj/python/python3-3.8/python', '-X', 'faulthandler', '-I', '-u', '/data/prj/python/git/python3-3.8/Lib/test/eintrdata/eintr_tester.py']

    stdout:
    ---

    ---

    stderr:
    ---
    .E......sss.............
    ======================================================================
    ERROR: test_lockf (__main__.FNTLEINTRTest)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/data/prj/python/git/python3-3.8/Lib/test/eintrdata/eintr_tester.py", line 522, in test_lockf
        self._lock(fcntl.lockf, "lockf")
      File "/data/prj/python/git/python3-3.8/Lib/test/eintrdata/eintr_tester.py", line 507, in _lock
        lock_func(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    PermissionError: [Errno 13] Permission denied

    Ran 24 tests in 9.692s

    FAILED (errors=1, skipped=3)
    ---

    ----------------------------------------------------------------------

    Ran 1 test in 10.404s

    FAILED (failures=1)
    test test_eintr failed
    test_eintr failed

    == Tests result: FAILURE ==

    1 test failed:
    test_eintr

    Total duration: 10 sec 645 ms

    @aixtools
    Contributor

    The "improved" output after getting back to "latest" commit:

    == CPython 3.8.0a0 (heads/master-dirty:34ae04f74d, Dec 27 2018, 14:05:08) [C]
    == AIX-1-00C291F54C00-powerpc-32bit big-endian
    == cwd: /data/prj/python/python3-3.8/build/test_python_13566116
    == CPU count: 8
    == encodings: locale=ISO8859-1, FS=iso8859-1
    Run tests sequentially
    0:00:00 [1/1] test_eintr
    test_all (test.test_eintr.EINTRTests) ...
    --- run eintr_tester.py ---
    test_flock (__main__.FNTLEINTRTest) ... ok
    test_lockf (__main__.FNTLEINTRTest) ... ERROR
    test_read (__main__.OSEINTRTest) ... ok
    test_wait (__main__.OSEINTRTest) ... ok
    test_wait3 (__main__.OSEINTRTest) ... ok
    test_wait4 (__main__.OSEINTRTest) ... ok
    test_waitpid (__main__.OSEINTRTest) ... ok
    test_write (__main__.OSEINTRTest) ... ok
    test_devpoll (__main__.SelectEINTRTest) ... skipped 'need select.devpoll'
    test_epoll (__main__.SelectEINTRTest) ... skipped 'need select.epoll'
    test_kqueue (__main__.SelectEINTRTest) ... skipped 'need select.kqueue'
    test_poll (__main__.SelectEINTRTest) ... ok
    test_select (__main__.SelectEINTRTest) ... ok
    test_sigtimedwait (__main__.SignalEINTRTest) ... ok
    test_sigwaitinfo (__main__.SignalEINTRTest) ... ok
    test_accept (__main__.SocketEINTRTest) ... ok
    test_open (__main__.SocketEINTRTest) ... ok
    test_os_open (__main__.SocketEINTRTest) ... ok
    test_recv (__main__.SocketEINTRTest) ... ok
    test_recvmsg (__main__.SocketEINTRTest) ... ok
    test_send (__main__.SocketEINTRTest) ... ok
    test_sendall (__main__.SocketEINTRTest) ... ok
    test_sendmsg (__main__.SocketEINTRTest) ... ok
    test_sleep (__main__.TimeEINTRTest) ... ok

    ======================================================================
    ERROR: test_lockf (__main__.FNTLEINTRTest)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/data/prj/python/git/python3-3.8/Lib/test/eintrdata/eintr_tester.py", line 522, in test_lockf
        self._lock(fcntl.lockf, "lockf")
      File "/data/prj/python/git/python3-3.8/Lib/test/eintrdata/eintr_tester.py", line 507, in _lock
        lock_func(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    PermissionError: [Errno 13] Permission denied

    Ran 24 tests in 8.822s

    FAILED (errors=1, skipped=3)
    --- eintr_tester.py completed: exit code 1 ---
    FAIL

    ======================================================================
    FAIL: test_all (test.test_eintr.EINTRTests)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/data/prj/python/git/python3-3.8/Lib/test/test_eintr.py", line 31, in test_all
        self.fail("eintr_tester.py failed")
    AssertionError: eintr_tester.py failed

    Ran 1 test in 9.392s

    FAILED (failures=1)
    test test_eintr failed
    test_eintr failed

    == Tests result: FAILURE ==

    1 test failed:
    test_eintr

    Total duration: 9 sec 609 ms
    Tests result: FAILURE

    @aixtools
    Contributor

    I have been doing some reading and debugging.

    Question: does mode "wb" also imply open for reading? Both the FreeBSD and
    AIX man pages specify that a file needs to be open for reading for a shared
    lock to even be considered.

    Further, AIX talks about "enforced" and "advisory" locks, as well as
    "read" and "write" locks. In much older documentation I recall the names
    "simple" and "complex" locks. From memory, advisory (aka simple) locks
    tend to be exclusive in nature. Shared locks are only for reading; locks
    for writing are always exclusive. I'll have to dig into how "wait" is
    actually handled, and take care not to confuse "in memory" locks (for
    multi-threaded locking of variables) with file locking.

    Regards,

    Michael
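
On the access-mode part of that question: as far as I know, Python's "wb" opens the file write-only on POSIX systems, and fcntl()-style record locks need the descriptor open for reading to take a shared lock and open for writing to take an exclusive one. The access mode can be checked directly. The sketch below assumes a POSIX platform; `access_mode` and the file name are hypothetical.

```python
import fcntl
import os
import tempfile


def access_mode(fileobj):
    """Return the O_ACCMODE bits (O_RDONLY, O_WRONLY or O_RDWR) of an open file."""
    flags = fcntl.fcntl(fileobj.fileno(), fcntl.F_GETFL)
    return flags & os.O_ACCMODE


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "lockfile")  # hypothetical file name
    with open(path, "wb") as write_only:
        # "wb" is expected to map to O_WRONLY (no read access).
        print(access_mode(write_only) == os.O_WRONLY)
    with open(path, "r+b") as read_write:
        # "r+b" is expected to map to O_RDWR.
        print(access_mode(read_write) == os.O_RDWR)
```

If a test needs both shared and exclusive locks on the same descriptor, a mode such as "r+b" or "w+b" would be the safer choice under these assumptions.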

    @vstinner
    Member

    vstinner commented Jan 4, 2019

    Michael created bpo-35633: test_eintr: test_lockf() fails with "PermissionError: [Errno 13] Permission denied" on AIX.
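
For context, the PermissionError in the AIX output above is errno 13 (EACCES). POSIX allows a non-blocking fcntl() lock request to fail with either EAGAIN or EACCES when another process already holds the lock, so portable code usually treats both as "lock is busy". The following is a minimal sketch under that assumption. It is not the change made in GH-11424, and `try_exclusive_lock` and its `lock_func` parameter are hypothetical.

```python
import errno
import fcntl


def try_exclusive_lock(fileobj, lock_func=fcntl.lockf):
    """Try to take an exclusive lock without blocking.

    Returns True if the lock was acquired and False if the lock call
    reported that it is busy. Any other OSError is re-raised.
    """
    try:
        lock_func(fileobj, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as exc:
        # EAGAIN surfaces as BlockingIOError, EACCES as PermissionError;
        # both are subclasses of OSError, so one errno check covers both.
        if exc.errno in (errno.EAGAIN, errno.EACCES):
            return False
        raise
    return True
```

One caveat with this approach: EACCES is only interpreted as "busy" because it comes from a lock call; the same errno from open() would still mean a real permission problem.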

    @miss-islington
    Contributor

    New changeset b94d4be by Miss Islington (bot) (Michael Felt) in branch 'master':
    bpo-35633: test_lockf() fails with "PermissionError: [Errno 13] Permission denied" on AIX (GH-11424)
    b94d4be

    @miss-islington
    Contributor

    New changeset 7e618f3 by Miss Islington (bot) in branch '3.7':
    bpo-35633: test_lockf() fails with "PermissionError: [Errno 13] Permission denied" on AIX (GH-11424)
    7e618f3

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022