Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ctypes change made clang fail to build #81321

Closed
encukou opened this issue Jun 3, 2019 · 18 comments
Closed

ctypes change made clang fail to build #81321

encukou opened this issue Jun 3, 2019 · 18 comments
Labels
3.8 (EOL) end of life 3.9 only security fixes topic-ctypes

Comments

@encukou
Copy link
Member

encukou commented Jun 3, 2019

BPO 37140
Nosy @vstinner, @encukou, @ambv, @zooba, @serge-sans-paille, @miss-islington, @paulmon
PRs
  • bpo-37140 ctypes change made clang fail to build #13796
  • bpo-37140: Fix StructUnionType_paramfunc() #15612
  • [3.8] bpo-37140: Fix StructUnionType_paramfunc() (GH-15612) #15613
  • Files
  • bug.py
  • point.c
  • point.py
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2019-08-30.14:56:33.927>
    created_at = <Date 2019-06-03.09:21:37.671>
    labels = ['ctypes', '3.8', '3.9']
    title = 'ctypes change made clang fail to build'
    updated_at = <Date 2019-08-30.14:56:33.926>
    user = 'https://github.com/encukou'

    bugs.python.org fields:

    activity = <Date 2019-08-30.14:56:33.926>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2019-08-30.14:56:33.927>
    closer = 'vstinner'
    components = ['ctypes']
    creation = <Date 2019-06-03.09:21:37.671>
    creator = 'petr.viktorin'
    dependencies = []
    files = ['48565', '48574', '48575']
    hgrepos = []
    issue_num = 37140
    keywords = ['patch', '3.8regression']
    message_count = 18.0
    messages = ['344394', '344461', '344493', '344498', '344579', '344581', '344583', '344589', '344625', '345394', '346451', '350667', '350862', '350863', '350864', '350874', '350876', '350884']
    nosy_count = 7.0
    nosy_names = ['vstinner', 'petr.viktorin', 'lukasz.langa', 'steve.dower', 'serge-sans-paille', 'miss-islington', 'Paul Monson']
    pr_nums = ['13796', '15612', '15613']
    priority = None
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue37140'
    versions = ['Python 3.8', 'Python 3.9']

    @encukou
    Copy link
    Member Author

    encukou commented Jun 3, 2019

    Hello,
    I haven't investigated this fully yet, and won't be able to find time today. I'm forwarding the report here in case someone familiar with the code wants to take a look.

    On Fedora, "clang" fails to build with Python 3.8, probably due this change (which was supposed to be Windows-only):
    32119e1#diff-998bfefaefe2ab83d5f523e18f158fa4R413

    According to serge_sans_paille: if self->b_ptr contains pointer, the memcpy creates sharing, and this is dangerous: if a __del__ happens to free the original pointer, we end up with a dangling reference in new_ptr. As far as I can tell, this is what happens in the clang bindings code.

    Fedora discussion with link to logs: https://bugzilla.redhat.com/show_bug.cgi?id=1715016

    @paulmon
    Copy link
    Mannequin

    paulmon mannequin commented Jun 3, 2019

    The change causing the build failure comes from this commit.
    ea8a0dc

    The commit comment says: "Fix copy of structures when passed by value."

    Steve, do you recall what this change fixed?

    @paulmon
    Copy link
    Mannequin

    paulmon mannequin commented Jun 3, 2019

    If I undo the changes to StructUnionType_paramfunc then test_pass_by_value (ctypes.test.test_structures.StructureTestCase) fails on x64 on Windows. Looking at the code I don't think this is specific to Windows.

    This is a test for fixing this issue: https://bugs.python.org/issue29565

    @paulmon
    Copy link
    Mannequin

    paulmon mannequin commented Jun 3, 2019

    Reading the related bugs more carefully I think the struct/union passing conventions are different on Windows x64 and Linux. I have a fix which works for Windows but preserves the prior code for Linux.

    @vstinner
    Copy link
    Member

    vstinner commented Jun 4, 2019

    On Fedora, "clang" fails to build with Python 3.8

    Would you mind to elaborate? It works for me on Fedora 30 with clang 8.0.0:

    $ ./configure CC=clang
    $ make
    $ ./python -m test -v test_ctypes
    ...
    Tests result: SUCCESS

    I tested the master branch of Python.

    @encukou
    Copy link
    Member Author

    encukou commented Jun 4, 2019

    The issue is with building clang using Python 3.8; not building Python 3.8 using clang :)

    @serge-sans-paille
    Copy link
    Mannequin

    serge-sans-paille mannequin commented Jun 4, 2019

    @vstinner: to reproduce the issue

    git clone https://github.com/llvm/llvm-project.git
    cd llvm-project
    mkdir _build
    cd _build
    cmake3 ../llvm -DLLVM_ENABLE_PROJECTS="clang;clang-tools-extra" -DBUILD_SHARED_LIBS=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=$PWD/install -DPYTHON_EXECUTABLE=$HOME/sources/cpython/_build/install/bin/python3
    make -j4
    

    @vstinner
    Copy link
    Member

    vstinner commented Jun 4, 2019

    It would be great to be able to write a reproducer and then convert it into a proper unit test.

    @paulmon
    Copy link
    Mannequin

    paulmon mannequin commented Jun 4, 2019

    I added a unittest to the PR that illustrates the problem. It doesn't pass yet.

    @paulmon
    Copy link
    Mannequin

    paulmon mannequin commented Jun 12, 2019

    I did some reading about parameter passing and it's still not clear to me whether https://bugs.python.org/issue37140 is a bug in CPython or whether the clang bindings were relying on incorrect parameter passing behavior to work.

    The change in #13796 restores the previous behavior where Windows and non-Windows builds pass structs differently.

    @zooba
    Copy link
    Member

    zooba commented Jun 24, 2019

    According to serge_sans_paille: if self->b_ptr contains pointer, the memcpy creates sharing, and this is dangerous: if a __del__ happens to free the original pointer, we end up with a dangling reference in new_ptr. As far as I can tell, this is what happens in the clang bindings code.

    We probably need a second parg->obj to keep self alive for as long as copied_self. Or pack it into a tuple.

    Having a repro test for this would be ideal, especially if we can make it happen (even crash) on all platforms. The double-free issue would seem to be real, and I don't want it to crash on Windows either.

    @vstinner
    Copy link
    Member

    Ok, I'm able to reproduce the crash using:

    git clone https://github.com/llvm/llvm-project.git
    cd llvm-project
    mkdir _build
    cd _build
    cmake3 ../llvm -DLLVM_ENABLE_PROJECTS="clang;clang-tools-extra" -DBUILD_SHARED_LIBS=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=$PWD/install -DPYTHON_EXECUTABLE=$HOME/sources/cpython/_build/install/bin/python3
    make -j4
    make check-clang-python

    The crash occurs in the test_access_specifiers() method defined in tests/cindex/test_access_specifiers.py. This test uses ctypes to call functions of the libclang.so dynamic library.

    I reduced the test case from 4350 lines of Python code to 131 lines of Python which only depends on the standard library and libclang.so.

    $ cd llvm-project/clang/bindings/python
    $ wc -l tests/cindex/test_access_specifiers.py tests/cindex/util.py clang/cindex.py 
        41 tests/cindex/test_access_specifiers.py
        90 tests/cindex/util.py
      4219 clang/cindex.py
      4350 total
    
    $ wc -l bug.py
    131 bug.py

    See attached bug.py script. Python 3.7 is fine, whereas Python 3.8 does crash:

    $ python3.7 bug.py 
    spelling
    spelling = None
    _CXString del: obj id=7f0904a6dcb0
    spelling = None ---
    
    
    $ python3.8 bug.py 
    spelling
    spelling = None
    _CXString del: obj id=7f4fa6180c40
    _CXString del: obj id=7f4fa6180ac0
    free(): double free detected in tcache 2
    Aborted (core dumped)

    Notice that _CXString.__del__() is only called once in Python 3.7, but called twice in Python 3.8.

    @vstinner
    Copy link
    Member

    It's a Python 3.8 regression, so I mark it as a release blocker.

    @vstinner
    Copy link
    Member

    I was surprised by the following test in StructUnionType_paramfunc():

    if ((size_t)self->b_size > sizeof(void*)) {
       ... copy the structure ...
    } else {
       ... pass by reference (?) ...
    }

    So I wrote a simple C library with the structure: typedef struct Point { int x; int y; } and functions modifying this structure, structured passed by copy or structure passed by reference.

    I'm surprised: ctypes works as expected :-)

    Try attached point.c and point.py:

    $ gcc -c point.c -fpic && gcc -shared -o libpoint.so point.o && python3 point.py
    p = <Point x=1 y=2>
    p = <Point x=1 y=2>
    p = <Point x=0 y=0>
    ok
    sizeof(Point) = 16 bytes

    Modify CoordType in point.c and point.py to test different sizes, on x86-64, I get:

    • c_byte: 2 bytes
    • c_short: 4 bytes
    • c_int: 8 bytes
    • c_long: 16 bytes

    @vstinner vstinner added the 3.9 only security fixes label Aug 30, 2019
    @vstinner
    Copy link
    Member

    Ok, it took me a while to understand the subtle ctypes internals. serge-sans-paille and me read the C code and run a debugger (gdb) on it to undertand how passing a structure by copy work in ctypes.

    The root function is StructUnionType_paramfunc(): it copies the structure if it's larger than sizeof(void*) bytes.

    StructUnionType_paramfunc() creates a new internal object which is stored into parg->obj. The only purpose of this object is to release the memory copy allocated by PyMem_Malloc().

    This issue is a problem with the internal object: if the struture has a finalizer, the finalizer is called twice. First, it is called on the internal object which is more and less a full copy of the structure. Second, it is called on the structure (once the last reference to the structure is removed).

    The code behaves as if the the finalizer is called twice. Even if it's two separated Python object, in fact the two objects contain the same structure values. For example, if the structure contains a pointer to memory block and the finalizer calls free(ptr): free(ptr) will be called twice with the same ptr value.

    This surprising behavior comes from this code:

            void *new_ptr = PyMem_Malloc(self->b_size);
            if (new_ptr == NULL)
                return NULL;
            memcpy(new_ptr, self->b_ptr, self->b_size);
            copied_self = (CDataObject *)PyCData_AtAddress(
                (PyObject *)Py_TYPE(self), new_ptr);
            copied_self->b_needsfree = 1;

    copied_self reuses the exact same type of the structure. If the structure has a finalizer defined in Python, it will be called.

    copied_self finalized is called at the "cleanup:" label of _ctypes_callproc():

        for (i = 0; i < argcount; ++i)
            Py_XDECREF(args[i].keep);

    --

    Trying to duplicate self isn't needed. All we need is to call PyMem_Free(ptr) at the _ctypes_callproc() exit.

    My PR 15612 introduces a new internal StructParam_Type type which has exactly one purpose: call PyMem_Free(ptr) in its deallocator. The type has no finalizer.

    @vstinner
    Copy link
    Member

    New changeset 96b4087 by Victor Stinner in branch 'master':
    bpo-37140: Fix StructUnionType_paramfunc() (GH-15612)
    96b4087

    @miss-islington
    Copy link
    Contributor

    New changeset 17f61ed by Miss Islington (bot) in branch '3.8':
    bpo-37140: Fix StructUnionType_paramfunc() (GH-15612)
    17f61ed

    @vstinner
    Copy link
    Member

    Ok, the regression is now fixed. I close the issue.

    Thanks Petr Viktorin for the bug report, Paul Monson for the fix proposed fix, and serge-sans-paille for your help on investigating this complex bug.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.8 (EOL) end of life 3.9 only security fixes topic-ctypes
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants