
multiprocessing: serialization must ensure that contexts are compatible (the same) #77377

Closed
arcivanov mannequin opened this issue Apr 1, 2018 · 14 comments
Labels
3.7 (EOL) end of life 3.8 only security fixes stdlib Python modules in the Lib dir topic-multiprocessing type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@arcivanov
Mannequin

arcivanov mannequin commented Apr 1, 2018

BPO 33196
Nosy @pitrou, @vstinner, @taleinat, @applio, @arcivanov, @augustogoulart
Files
  • test_lock_sigsegv.py
  • testing_on_fedora.png
  • coredump: coredump (Fedora 29)
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2018-04-01.05:58:11.085>
    labels = ['3.8', '3.7', 'library', 'type-crash']
    title = 'multiprocessing: serialization must ensure that contexts are compatible (the same)'
    updated_at = <Date 2018-11-14.17:28:52.507>
    user = 'https://github.com/arcivanov'

    bugs.python.org fields:

    activity = <Date 2018-11-14.17:28:52.507>
    actor = 'davin'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2018-04-01.05:58:11.085>
    creator = 'arcivanov'
    dependencies = []
    files = ['47510', '47931', '47933']
    hgrepos = []
    issue_num = 33196
    keywords = []
    message_count = 11.0
    messages = ['314762', '314792', '329491', '329719', '329845', '329892', '329893', '329898', '329908', '329909', '329913']
    nosy_count = 6.0
    nosy_names = ['pitrou', 'vstinner', 'taleinat', 'davin', 'arcivanov', 'augustogoulart']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = 'needs patch'
    status = 'open'
    superseder = None
    type = 'crash'
    url = 'https://bugs.python.org/issue33196'
    versions = ['Python 3.6', 'Python 3.7', 'Python 3.8']


    @arcivanov
    Mannequin Author

    arcivanov mannequin commented Apr 1, 2018

    While working on GH gevent/gevent#993 I encountered a stall trying to read from an mp.Queue passed to an mp.Process's target as an argument. Trying to print out the lock state in the child process, I encountered a SEGV in Lock's __repr__. I originally thought it was due to gevent/greenlet stack magic, but it wasn't.

    This happens when a fork-context Queue (the default) is used with a spawn-context Process (obvious stupidity on my part, but it still shouldn't crash).

    Python 3.6.4 from PyEnv
    Fedora 27

    $ python test_lock_sigsegv.py 
    Parent r_q: <Lock(owner=None)>, <Lock(owner=None)>, <BoundedSemaphore(value=2147483647, maxvalue=2147483647)>
    -11
    
    Program terminated with signal SIGSEGV, Segmentation fault.
    #0  __new_sem_getvalue (sem=0x7fc877f54000, sval=sval@entry=0x7fffb130db9c) at sem_getvalue.c:38
    38        *sval = atomic_load_relaxed (&isem->data) & SEM_VALUE_MASK;
    ...
    #0  __new_sem_getvalue (sem=0x7fc877f54000, sval=sval@entry=0x7fffb130db9c) at sem_getvalue.c:38
    #1  0x00007f1116aeb202 in semlock_getvalue (self=<optimized out>) at /tmp/python-build.20171219170845.6548/Python-3.6.4/Modules/_multiprocessing/semaphore.c:531
    

    At a minimum I think there should be a check when reducing arguments for a process created from an incompatible context, to prevent a SEGV.

    Test attached.
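
    The attached test_lock_sigsegv.py is not inlined here; the following is a minimal sketch of an equivalent reproducer, reconstructed from the tracebacks later in this thread (the exact attached script may differ):

    # Sketch of a reproducer: a Queue from the fork context (the default on
    # Linux) is passed to a Process created from the spawn context.
    import multiprocessing as mp

    def child(r_q):
        # Touching the queue's internal SemLock objects in the spawn'ed
        # child is what triggers the SEGV in Lock.__repr__.
        print("Child r_q: %r, %r, %r" % (r_q._rlock, r_q._wlock, r_q._sem), flush=True)

    if __name__ == "__main__":
        fork_ctx = mp.get_context("fork")
        spawn_ctx = mp.get_context("spawn")

        r_q = fork_ctx.Queue()
        print("Parent r_q: %r, %r, %r" % (r_q._rlock, r_q._wlock, r_q._sem), flush=True)

        p = spawn_ctx.Process(target=child, args=(r_q,))
        p.start()          # with the fix merged later in this issue, this raises instead
        p.join()
        print(p.exitcode)  # -11 (killed by SIGSEGV) on affected versions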

    @arcivanov arcivanov mannequin added stdlib Python modules in the Lib dir type-crash A hard crash of the interpreter, possibly with a core dump labels Apr 1, 2018
    @pitrou
    Member

    pitrou commented Apr 1, 2018

    Thanks for the report. Indeed I think it would be worth preventing this programmer error.

    @pitrou pitrou added 3.7 (EOL) end of life 3.8 only security fixes labels Apr 1, 2018
    @augustogoulart
    Mannequin

    augustogoulart mannequin commented Nov 9, 2018

    I couldn't reproduce the error on Debian 9 or OSX, although I tried tweaking the test script a little to force it. Arcadiy, did you try reproducing the same issue on a different platform? Has anyone reported something similar in recent gevent issues?

    @taleinat
    Contributor

    On Win10 I've also failed to reproduce the reported issue with the supplied script. I tried with Python versions 3.6.3, 3.7.0, and a recent build of the master branch (to be 3.8).

    Can someone try to reproduce this on Fedora?

    @augustogoulart
    Mannequin

    augustogoulart mannequin commented Nov 13, 2018

    I've tested on Fedora 29 server and also failed to reproduce the error.

    @arcivanov
    Mannequin Author

    arcivanov mannequin commented Nov 14, 2018

    @gus.goulart you have reproduced it. The screenshot showing -11 means the process dumped core. Because it's the child that dumps core, it's masked by abrt.

    Observe:

    $ python3 --version
    Python 3.7.1
    $ python3 ~/Downloads/test_lock_sigsegv.py 
    Parent r_q: <Lock(owner=None)>, <Lock(owner=None)>, <BoundedSemaphore(value=2147483647, maxvalue=2147483647)>
    -11
    $ abrt
    61bdd28 1x /usr/bin/python3.7 2018-11-14 04:18:06
    $ uname -a
    Linux myhost 4.18.17-300.fc29.x86_64 #1 SMP Mon Nov 5 17:56:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
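
    (For context: multiprocessing reports a child killed by signal N as exit code -N, so the -11 above means the child died of signal 11. A quick way to confirm which signal that is:)

    import signal
    print(signal.Signals(11).name)  # prints "SIGSEGV" on Linux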

    @arcivanov
    Mannequin Author

    arcivanov mannequin commented Nov 14, 2018

    @taleinat The above has been reproduced on Fedora 29.

    @vstinner
    Member

    At a minimum I think there should be a check when reducing arguments for a process created from an incompatible context, to prevent a SEGV.

    I'm not sure that I understand the bug. The reproducer script passes a multiprocessing.Queue to a child process, and then the child crashes when attempting to call multiprocessing.synchronize.Lock.__repr__().

    Does the child reuse a copy of the lock of the parent process? Or does the child create a new SemLock?

    I reproduced the bug on Fedora 26. I attached the child process in gdb. The crash occurs on sem_getvalue() in the child process.

    Program received signal SIGSEGV, Segmentation fault.
    0x00007f29a5156610 in sem_getvalue@@GLIBC_2.2.5 () from /lib64/libpthread.so.0
    (gdb) where
    #0 0x00007f29a5156610 in sem_getvalue@@GLIBC_2.2.5 () from /lib64/libpthread.so.0
    #1 0x00007f299c60e7bb in semlock_getvalue (self=0x7f299a95e2b0, _unused_ignored=0x0)
    at /home/haypo/prog/python/master/Modules/_multiprocessing/semaphore.c:541
    #2 0x0000000000434537 in _PyMethodDef_RawFastCallKeywords (method=0x7f299c8102e0 <semlock_methods+192>,
    self=<_multiprocessing.SemLock at remote 0x7f299a95e2b0>, args=0x7f299c5f47e8, nargs=0, kwnames=0x0) at Objects/call.c:629
    #3 0x0000000000607aff in _PyMethodDescr_FastCallKeywords (descrobj=<method_descriptor at remote 0x7f299ca42520>, args=0x7f299c5f47e0, nargs=1,
    kwnames=0x0) at Objects/descrobject.c:288
    #4 0x0000000000512f92 in call_function (pp_stack=0x7ffd3591f730, oparg=1, kwnames=0x0) at Python/ceval.c:4595
    (...)

    (gdb) py-bt
    Traceback (most recent call first):
      File "/home/haypo/prog/python/master/Lib/multiprocessing/synchronize.py", line 170, in __repr__
        elif self._semlock._get_value() == 1:
      File "/home/haypo/prog/python/master/test_lock_sigsegv.py", line 20, in child
        print("Child r_q: %r, %r, %r" % (r_q._rlock, r_q._wlock, r_q._sem), flush=True)
      File "/home/haypo/prog/python/master/Lib/multiprocessing/process.py", line 99, in run
        self._target(*self._args, **self._kwargs)
      File "/home/haypo/prog/python/master/Lib/multiprocessing/process.py", line 297, in _bootstrap
        self.run()
      File "/home/haypo/prog/python/master/Lib/multiprocessing/spawn.py", line 130, in _main
        return self._bootstrap()
      File "/home/haypo/prog/python/master/Lib/multiprocessing/spawn.py", line 629, in spawn_main
      File "<string>", line 1, in <module>

    @augustogoulart
    Mannequin

    augustogoulart mannequin commented Nov 14, 2018

    @vstinner, on Debian 9 I can see the problem as well, but I wasn't able to debug it with the level of detail you did. Could you please share the process you followed?

    What I found was:

    ./python -X dev test_lock_sigsegv.py
    Parent r_q: <Lock(owner=None)>, <Lock(owner=None)>, <BoundedSemaphore(value=2147483647, maxvalue=2147483647)>
    Fatal Python error: Segmentation fault

    Current thread 0x00007fab36124480 (most recent call first):
    File "/home/gus/Workspace/cpython/Lib/multiprocessing/synchronize.py", line 170 in __repr__
    File "/home/gus/Workspace/cpython/test_lock_sigsegv.py", line 17 in child
    File "/home/gus/Workspace/cpython/Lib/multiprocessing/process.py", line 99 in run
    File "/home/gus/Workspace/cpython/Lib/multiprocessing/process.py", line 297 in _bootstrap
    File "/home/gus/Workspace/cpython/Lib/multiprocessing/spawn.py", line 130 in _main
    File "/home/gus/Workspace/cpython/Lib/multiprocessing/spawn.py", line 117 in spawn_main
    File "<string>", line 1 in <module>
    -11

    Using GDB:

    (gdb) set follow-fork-mode child
    (gdb) run test_lock_sigsegv.py
    Starting program: /home/gus/Workspace/cpython/python test_lock_sigsegv.py
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
    Parent r_q: <Lock(owner=None)>, <Lock(owner=None)>, <BoundedSemaphore(value=2147483647, maxvalue=2147483647)>
    [New process 4941]
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
    process 4941 is executing new program: /home/gus/Workspace/cpython/python
    -11
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
    [Inferior 2 (process 4941) exited normally]
    (gdb) where
    No stack.
    (gdb) py-bt
    Unable to locate python frame
    (gdb)

    @arcivanov
    Mannequin Author

    arcivanov mannequin commented Nov 14, 2018

    @vstinner

    I'm not sure that I understand the bug.

    The bug is that if a user makes an error and passes a Queue from the 'fork' context to a child that is started using the 'spawn' context, the passed Queue is, for obvious reasons, broken.

    The 'print("Child r_q: %r, %r, %r" % (r_q._rlock, r_q._wlock, r_q._sem), flush=True)' is simply a demonstration of a broken state of the SemLock observed in the child.

    The expected fix would be to stop the mixed context use of MP objects on the API level (ValueError?) or at least prevent a segfault.
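
    (Illustration only, not the patch that was eventually merged: the fix referenced later in this issue validates the SemLock before serializing it. A hypothetical wrapper, sketching the kind of check being asked for here (CheckedLock is not a real multiprocessing API), could look like this:)

    import multiprocessing as mp
    from multiprocessing import context

    class CheckedLock:
        """Lock wrapper that remembers the start method it was created under."""

        def __init__(self, ctx, created_method):
            self._created_method = created_method  # e.g. "fork" or "spawn"
            self._lock = ctx.Lock()

        def __getstate__(self):
            # get_spawning_popen() is only set while multiprocessing is
            # pickling arguments for a child it is starting.
            popen = context.get_spawning_popen()
            spawning_method = getattr(popen, "method", None)
            if popen is not None and spawning_method != self._created_method:
                raise ValueError(
                    "lock created under %r cannot be passed to a %r process"
                    % (self._created_method, spawning_method))
            return self.__dict__

        def __setstate__(self, state):
            self.__dict__.update(state)

    (The merged fix performs the equivalent check at the SemLock level, so plain locks and queues get it without any wrapper.)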

    @vstinner vstinner changed the title SEGV in mp.synchronize.Lock.__repr__ in spawn'ed proc if ctx mismatched multiprocessing: serialization must ensure that contexts are compatible (the same) Nov 14, 2018
    @vstinner
    Member

    The bug is, if a user makes an error and passes a Queue from context 'fork' to a child that is spawned using 'spawn', the passed Queue is, for obvious reasons, broken.

    Ok. I rewrote the issue title.

    @albanD
    Contributor

    albanD commented Aug 1, 2023

    cc @vstinner I sent a PR for this, are you the right person to review it?

    @vstinner
    Member

    @albanD:

    cc @vstinner I sent a PR for this, are you the right person to review it?

    I'm not available to review your PR. Maybe @pitrou can review it.

    pitrou added a commit that referenced this issue Aug 23, 2023
    …cess before serializing it (#107275)
    
    Ensure multiprocessing SemLock is valid for spawn Process before serializing it.
    
    Creating a multiprocessing SemLock with a fork context, and then trying to pass it to a spawn-created Process, would segfault if not detected early.
    
    ---------
    
    Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
    Co-authored-by: Antoine Pitrou <pitrou@free.fr>
    miss-islington pushed a commit to miss-islington/cpython that referenced this issue Aug 23, 2023
    …ed Process before serializing it (pythonGH-107275)
    
    Ensure multiprocessing SemLock is valid for spawn Process before serializing it.
    
    Creating a multiprocessing SemLock with a fork context, and then trying to pass it to a spawn-created Process, would segfault if not detected early.
    
    ---------
    
    (cherry picked from commit 1700d34)
    
    Co-authored-by: albanD <desmaison.alban@gmail.com>
    Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
    Co-authored-by: Antoine Pitrou <pitrou@free.fr>
    miss-islington pushed a commit to miss-islington/cpython that referenced this issue Aug 23, 2023
    …ed Process before serializing it (pythonGH-107275)
    
    Ensure multiprocessing SemLock is valid for spawn Process before serializing it.
    
    Creating a multiprocessing SemLock with a fork context, and then trying to pass it to a spawn-created Process, would segfault if not detected early.
    
    ---------
    
    (cherry picked from commit 1700d34)
    
    Co-authored-by: albanD <desmaison.alban@gmail.com>
    Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
    Co-authored-by: Antoine Pitrou <pitrou@free.fr>
    pitrou added a commit that referenced this issue Aug 23, 2023
    …sed Process before serializing it (GH-107275) (#108378)
    
    gh-77377: Ensure multiprocessing SemLock is valid for spawn-based Process before serializing it (GH-107275)
    
    Ensure multiprocessing SemLock is valid for spawn Process before serializing it.
    
    Creating a multiprocessing SemLock with a fork context, and then trying to pass it to a spawn-created Process, would segfault if not detected early.
    
    ---------
    
    (cherry picked from commit 1700d34)
    
    Co-authored-by: albanD <desmaison.alban@gmail.com>
    Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
    Co-authored-by: Antoine Pitrou <pitrou@free.fr>
    Yhg1s pushed a commit that referenced this issue Aug 23, 2023
    …sed Process before serializing it (GH-107275) (#108377)
    
    gh-77377: Ensure multiprocessing SemLock is valid for spawn-based Process before serializing it (GH-107275)
    
    Ensure multiprocessing SemLock is valid for spawn Process before serializing it.
    
    Creating a multiprocessing SemLock with a fork context, and then trying to pass it to a spawn-created Process, would segfault if not detected early.
    
    ---------
    
    (cherry picked from commit 1700d34)
    
    Co-authored-by: albanD <desmaison.alban@gmail.com>
    Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
    Co-authored-by: Antoine Pitrou <pitrou@free.fr>
    @terryjreedy
    Member

    The patch apparently caused a regression for nested multiprocessing calls in 3.11.5. #108520

    @pitrou pitrou closed this as completed Sep 2, 2023