Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Debug pymalloc crash when using os.fork() [regression] #86706

Closed
CendioOssman mannequin opened this issue Dec 2, 2020 · 9 comments
Closed

Debug pymalloc crash when using os.fork() [regression] #86706

CendioOssman mannequin opened this issue Dec 2, 2020 · 9 comments
Labels
3.8 (EOL) end of life 3.9 only security fixes 3.10 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@CendioOssman
Copy link
Mannequin

CendioOssman mannequin commented Dec 2, 2020

BPO 42540
Nosy @vstinner, @ambv, @colesbury, @miss-islington, @stestagg, @CendioOssman
PRs
  • bpo-42540: reallocation of id_mutex should not force default allocator #29564
  • [3.10] bpo-42540: reallocation of id_mutex should not force default allocator (GH-29564) #29599
  • [3.9] bpo-42540: reallocation of id_mutex should not force default allocator (GH-29564) #29600
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2021-11-17.23:01:52.738>
    created_at = <Date 2020-12-02.14:22:16.372>
    labels = ['interpreter-core', '3.10', '3.8', '3.9', 'type-crash']
    title = 'Debug pymalloc crash when using os.fork() [regression]'
    updated_at = <Date 2021-11-18.12:42:27.933>
    user = 'https://github.com/CendioOssman'

    bugs.python.org fields:

    activity = <Date 2021-11-18.12:42:27.933>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2021-11-17.23:01:52.738>
    closer = 'lukasz.langa'
    components = ['Interpreter Core']
    creation = <Date 2020-12-02.14:22:16.372>
    creator = 'CendioOssman'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 42540
    keywords = ['patch']
    message_count = 9.0
    messages = ['382314', '382612', '382613', '382614', '406492', '406493', '406503', '406505', '406533']
    nosy_count = 6.0
    nosy_names = ['vstinner', 'lukasz.langa', 'colesbury', 'miss-islington', 'stestagg', 'CendioOssman']
    pr_nums = ['29564', '29599', '29600']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'crash'
    url = 'https://bugs.python.org/issue42540'
    versions = ['Python 3.8', 'Python 3.9', 'Python 3.10']

    @CendioOssman
    Copy link
    Mannequin Author

    CendioOssman mannequin commented Dec 2, 2020

    A python equivalent of the classical daemon() call started throwing an error from 3.8 when the debug hooks are active for pymalloc. If the tracing is also active it segfaults.

    This simple example triggers it:

    import os
    
    def daemon():
        pid = os.fork()
        if pid != 0:
            os._exit(0)
    
    daemon()

    The error given is:

    Debug memory block at address p=0xf013d0: API '1'
    0 bytes originally requested
    The 7 pad bytes at p-7 are not all FORBIDDENBYTE (0xfd):
    at p-7: 0x00 *** OUCH
    at p-6: 0x00 *** OUCH
    at p-5: 0x00 *** OUCH
    at p-4: 0x00 *** OUCH
    at p-3: 0x00 *** OUCH
    at p-2: 0x00 *** OUCH
    at p-1: 0x00 *** OUCH
    Because memory is corrupted at the start, the count of bytes requested
    may be bogus, and checking the trailing pad bytes may segfault.
    The 8 pad bytes at tail=0xf013d0 are not all FORBIDDENBYTE (0xfd):
    at tail+0: 0x01 *** OUCH
    at tail+1: 0x00 *** OUCH
    at tail+2: 0x00 *** OUCH
    at tail+3: 0x00 *** OUCH
    at tail+4: 0x00 *** OUCH
    at tail+5: 0x00 *** OUCH
    at tail+6: 0x00 *** OUCH
    at tail+7: 0x00 *** OUCH

    Enable tracemalloc to get the memory block allocation traceback

    Fatal Python error: bad ID: Allocated using API '1', verified using API 'r'
    Python runtime state: finalizing (tstate=0xf023b0)

    Tested on Fedora, Ubuntu and RHEL with the same behaviour everwhere. Everything up to 3.8 works fine. 3.8 and 3.9 both exhibit the issue.

    Since this is a very standard way of starting a daemon it should affect quite a few users. At the very least it makes it annoying to use development mode to catch other issues.

    @CendioOssman CendioOssman mannequin added 3.8 (EOL) end of life 3.9 only security fixes 3.10 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-crash A hard crash of the interpreter, possibly with a core dump labels Dec 2, 2020
    @stestagg
    Copy link
    Mannequin

    stestagg mannequin commented Dec 6, 2020

    Reproducible on master, but doesn't occur with a debug build.

    My configure:

    ./configure --prefix=$PREFIX --exec-prefix=$EPREFIX --cache-file=../config.cache --without-ensurepip > /dev/null
    PYTHONMALLOC=pymalloc_debug $EPREFIX/bin/python3 ../test.py

    Crash comes out of the child process, I modified example to invoke lldb for child:

    import os
    import time
    import subprocess
    
    def daemon():
        pid = os.fork()
        if pid != 0:
            subprocess.run(["lldb", '-p', str(pid)])
            time.sleep(10)
            os._exit(0)
        time.sleep(1)
    
    daemon()
    

    Attaching to the child gives this backtrace:

    (lldb) bt

    The call stack is trying to free an interpreter mutex:
        PyThread_free_lock(interp->id_mutex);

    @stestagg
    Copy link
    Mannequin

    stestagg mannequin commented Dec 6, 2020

    Something seems to be: breaking alignment, changing the allocator used, or trashing that memory.

    In my case, the address of the mutex is: 0x5603a3666630 (*1) below
    and the surrounding memory:

    0x5603a3666610: n\0\0\03\0\0\0\0\0\0\0(*2)\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd1\0\0\0\0\0\0\0
    0x5603a3666630: (*1)\x01\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd
    0x5603a3666650: \xdd\xdd\xdd\xdd\xdd\xdd\xdd\xddQ\0\0\0\0\0\0\0\x90ds\xa3\x03V\0\0\xf0\xc4\xfbq\a\x7f\0\0

    The fun thing is that the nbytes field lines up at (*2) which is full of DEADBYTEs, and thus tries to find the tail of the allocation at p + 15987178197214944733 (which happily segfaults)

    @stestagg
    Copy link
    Mannequin

    stestagg mannequin commented Dec 6, 2020

    So, I'm not an allocator/init/teardown expert, but it looks like: When you fork, PyRuntimeState creates a new mutex, explicitly using the default allocator (without the debug allocator active)..

    #ifdef HAVE_FORK
    /* This function is called from PyOS_AfterFork_Child to ensure that
       newly created child processes do not share locks with the parent. */
    PyStatus
    _PyRuntimeState_ReInitThreads(_PyRuntimeState *runtime)
    {
        // This was initially set in _PyRuntimeState_Init().
        runtime->main_thread = PyThread_get_thread_ident();
    /* Force default allocator, since _PyRuntimeState_Fini() must
       use the same allocator than this function. */
    PyMemAllocatorEx old_alloc;
    _PyMem_SetDefaultAllocator(PYMEM_DOMAIN_RAW, &old_alloc);
    
        int reinit_interp = _PyThread_at_fork_reinit(&runtime->interpreters.mutex);
        int reinit_main_id = _PyThread_at_fork_reinit(&runtime->interpreters.main->id_mutex);
        int reinit_xidregistry = _PyThread_at_fork_reinit(&runtime->xidregistry.mutex);
    
        PyMem_SetAllocator(PYMEM_DOMAIN_RAW, &old_alloc);
    if (reinit_interp < 0
        || reinit_main_id < 0
        || reinit_xidregistry < 0)
    {
        return _PyStatus_ERR("Failed to reinitialize runtime locks");
    
        }
        return _PyStatus_OK();
    }
    #endif

    But the PyInterpreterState_Delete function does not do this:

    if (interp->id_mutex != NULL) {
        PyThread_free_lock(interp->id_mutex);
    }

    Which causes it to try to use the debug allocator to free, and hence crash..

    @vstinner
    Copy link
    Member

    New changeset 736684b by Sam Gross in branch 'main':
    bpo-42540: reallocation of id_mutex should not force default allocator (GH-29564)
    736684b

    @miss-islington
    Copy link
    Contributor

    New changeset 1079b3e by Miss Islington (bot) in branch '3.10':
    bpo-42540: reallocation of id_mutex should not force default allocator (GH-29564)
    1079b3e

    @ambv
    Copy link
    Contributor

    ambv commented Nov 17, 2021

    New changeset 87787c8 by Sam Gross in branch '3.9':
    [3.9] bpo-42540: reallocation of id_mutex should not force default allocator (GH-29564) (GH-29600)
    87787c8

    @ambv
    Copy link
    Contributor

    ambv commented Nov 17, 2021

    Thanks, Sam! ✨ 🍰 ✨

    @ambv ambv closed this as completed Nov 17, 2021
    @vstinner
    Copy link
    Member

    Do you think that it would be worth it to dump the memory allocation when a Fatal Python error related to a memory error is triggered?

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.8 (EOL) end of life 3.9 only security fixes 3.10 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-crash A hard crash of the interpreter, possibly with a core dump
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants