Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

clang expects memory aligned on 16 bytes, but pymalloc aligns to 8 bytes #80799

vstinner opened this issue Apr 12, 2019 · 17 comments
3.8 only security fixes build The build process and cross-build


Copy link

BPO 36618
Nosy @gpshead, @vstinner, @serge-sans-paille, @pablogsal
  • bpo-36618: Add -fmax-type-align=8 flag for clang #12809
  • bpo-36618: Don't add -fmax-type-align flag to old clang #12811
  • bpo-36618: Don't add -fmax-type-align=8 flag for clang #13320
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2019-05-14.17:31:20.752>
    created_at = <Date 2019-04-12.17:36:22.152>
    labels = ['build', '3.8']
    title = 'clang expects memory aligned on 16 bytes, but pymalloc aligns to 8 bytes'
    updated_at = <Date 2019-05-14.17:31:20.749>
    user = '' fields:

    activity = <Date 2019-05-14.17:31:20.749>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2019-05-14.17:31:20.752>
    closer = 'vstinner'
    components = ['Build']
    creation = <Date 2019-04-12.17:36:22.152>
    creator = 'vstinner'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 36618
    keywords = ['patch']
    message_count = 17.0
    messages = ['340087', '340100', '340106', '340108', '340116', '340126', '340128', '340129', '340130', '340131', '340261', '340279', '340280', '340281', '340322', '342490', '342491']
    nosy_count = 5.0
    nosy_names = ['gregory.p.smith', 'vstinner', 'fweimer', 'serge-sans-paille', 'pablogsal']
    pr_nums = ['12809', '12811', '13320']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = ''
    versions = ['Python 3.8']

    Copy link
    Member Author

    On x86-64, clang -O3 compiles the following function:

    PyCArgObject *
        PyCArgObject *p;
        p = PyObject_New(PyCArgObject, &PyCArg_Type);
        if (p == NULL)
            return NULL;
        p->pffi_type = NULL;
        p->tag = '\0';
        p->obj = NULL;
        memset(&p->value, 0, sizeof(p->value));
        return p;

    like that:

    0x00007fffe9c6acb0 <+0>: push rax
    0x00007fffe9c6acb1 <+1>: mov rdi,QWORD PTR [rip+0xe308] # 0x7fffe9c78fc0
    0x00007fffe9c6acb8 <+8>: call 0x7fffe9c5e8a0 <_PyObject_New@plt>
    0x00007fffe9c6acbd <+13>: test rax,rax
    0x00007fffe9c6acc0 <+16>: je 0x7fffe9c6acdf <PyCArgObject_new+47>
    0x00007fffe9c6acc2 <+18>: mov QWORD PTR [rax+0x20],0x0
    0x00007fffe9c6acca <+26>: mov BYTE PTR [rax+0x28],0x0
    0x00007fffe9c6acce <+30>: xorps xmm0,xmm0
    0x00007fffe9c6acd1 <+33>: movaps XMMWORD PTR [rax+0x30],xmm0
    0x00007fffe9c6acd5 <+37>: mov QWORD PTR [rax+0x40],0x0
    0x00007fffe9c6acdd <+45>: pop rcx
    0x00007fffe9c6acde <+46>: ret
    0x00007fffe9c6acdf <+47>: xor eax,eax
    0x00007fffe9c6ace1 <+49>: pop rcx
    0x00007fffe9c6ace2 <+50>: ret

    The problem is that movaps requires the memory address to be aligned on 16 bytes, whereas PyObject_New() uses pymalloc allocator (the requested size is 80 bytes, pymalloc supports allocations up to 512 bytes) and pymalloc only provides alignment on 8 bytes.

    If PyObject_New() returns an address not aligned on 16 bytes, PyCArgObject_new() crash immediately with a segmentation fault (SIGSEGV).

    CPython must be compiled using -fmax-type-align=8 to avoid such alignment crash. Using this compiler flag, clag emits expected machine code:

    0x00007fffe9caacb0 <+0>: push rax
    0x00007fffe9caacb1 <+1>: mov rdi,QWORD PTR [rip+0xe308] # 0x7fffe9cb8fc0
    0x00007fffe9caacb8 <+8>: call 0x7fffe9c9e8a0 <_PyObject_New@plt>
    0x00007fffe9caacbd <+13>: test rax,rax
    0x00007fffe9caacc0 <+16>: je 0x7fffe9caacdf <PyCArgObject_new+47>
    0x00007fffe9caacc2 <+18>: mov QWORD PTR [rax+0x20],0x0
    0x00007fffe9caacca <+26>: mov BYTE PTR [rax+0x28],0x0
    0x00007fffe9caacce <+30>: xorps xmm0,xmm0
    0x00007fffe9caacd1 <+33>: movups XMMWORD PTR [rax+0x30],xmm0
    0x00007fffe9caacd5 <+37>: mov QWORD PTR [rax+0x40],0x0
    0x00007fffe9caacdd <+45>: pop rcx
    0x00007fffe9caacde <+46>: ret
    0x00007fffe9caacdf <+47>: xor eax,eax
    0x00007fffe9caace1 <+49>: pop rcx
    0x00007fffe9caace2 <+50>: ret

    "movaps" instruction becomes "movups" instruction: "a" stands for "aligned" in movaps, whereas "u" stands for "unaligned" in movups.

    @vstinner vstinner added 3.8 only security fixes build The build process and cross-build labels Apr 12, 2019
    Copy link
    Member Author

    New changeset 23a683a by Victor Stinner in branch 'master':
    bpo-36618: Add -fmax-type-align=8 flag for clang (GH-12809)

    Copy link
    Member Author

    I merged a "workaround" in the master branch. Python 2.7 and 3.7 are also affected, but I prefer to wait to see if the change goes through buildbots.

    The real fix would be to modify pymalloc to use 16-byte alignement, but that's a more complex issue :-)

    Copy link
    Member Author

    Note to myself: "Sadly, the flag must be expected to CFLAGS and not just CFLAGS_NODIST, ..."

    It should be "Sadly, the flag must be *set* to CFLAGS and not just CFLAGS_NODIST, ..." :-( I should fix the NEWS entry.

    Copy link
    Member Author

    Oh, it seems like the change broke the FreeBSD 10 buildbot :-(

    checking for makedev... no
    checking for le64toh... no
    checking for mode_t... no
    checking for off_t... no
    checking for pid_t... no
    checking for size_t... no
    checking for uid_t in sys/types.h... yes
    checking for ssize_t... no
    checking for __uint128_t... no
    checking size of int... 0
    checking size of long... 0
    checking size of long long... 0
    checking size of void *... 0
    checking size of short... 0
    checking size of float... 0
    checking size of double... 0
    checking size of fpos_t... 0
    checking size of size_t... 0
    checking size of pid_t... 0
    checking size of uintptr_t... 0
    checking for long double... yes
    configure: error: in /usr/home/buildbot/python/3.x.koobs-freebsd10.nondebug/build': configure: error: cannot compute sizeof (long double) See config.log' for more details


    On the previous build, the output looked fine:

    checking size of int... 4
    checking size of long... 8
    checking size of long long... 8
    checking size of void *... 8
    checking size of short... 2
    checking size of float... 4
    checking size of double... 8
    checking size of fpos_t... 8
    checking size of size_t... 8
    checking size of pid_t... 4
    checking size of uintptr_t... 8
    checking for long double... yes
    checking size of long double... 16


    CC.version: FreeBSD clang version 3.4.1 (tags/RELEASE_34/dot1-final 208032) 20140512
    os.uname: posix.uname_result(sysname='FreeBSD', nodename='10-STABLE-amd64.elysium', release='10.4-STABLE', version='FreeBSD 10.4-STABLE #1 r337021: Wed Aug 1 15:12:48 AEST 2018 root@10-STABLE-amd64.elysium:/usr/obj/usr/src/sys/GENERIC', machine='amd64')
    platform.platform: FreeBSD-10.4-STABLE-amd64-64bit-ELF

    Copy link
    Member Author

    See also:

    • bpo-27987: obmalloc's 8-byte alignment causes undefined behavior
    • bpo-18835: Add PyMem_AlignedAlloc()
    • bpo-31912: PyMem_Malloc() should guarantee alignof(max_align_t)

    Copy link

    gpshead commented Apr 12, 2019

    Even if you check for -fmax-type-align compiler support at configure time, there is a potential problem:

    Nothing guarantees that extension modules are built by the same compiler that CPython is. If CPython used an old clang without support for that flag and the extension module compiled by that CPython via pip and, etc. uses a more recent version of clang - it wouldn't specify that flag and the extension module code could be broken.

    I suppose this issue of conditional compiler flags is nothing new. It should not block us from going forward with a workaround like your PRs for now.

    Copy link

    gpshead commented Apr 12, 2019

    I believe -fno-max-type-align is also an option.

    Copy link
    Member Author

    New changeset a304b13 by Victor Stinner in branch 'master':
    bpo-36618: Don't add -fmax-type-align flag to old clang (GH-12811)

    Copy link
    Member Author

    It should not block us from going forward with a workaround like your PRs for now.

    I pushed a fix quickly to unblock my PR 12796, but also because I was very scared by what I saw :-D

    I see my change as a "quick fix", but we really have to sit down to think about the "correct fix". Especially since we will have to do something for Python 2.7 and 3.7, and adding -fmax-type-align=8 to exported CFLAGS can cause *new* issues, as you explained.

    That's why I mentioned bpo-27987 and other issues, to try to see what has already been said, and try to find to identify and fix the root issue, rather than working around the issue with compiler flags.

    Copy link

    fweimer mannequin commented Apr 15, 2019

    The issue is related to the definition of PyCArgObject:

    typedef struct tagPyCArgObject PyCArgObject;
    struct tagPyCArgObject {
        ffi_type *pffi_type;
        char tag;
        union {
            char c;
            char b;
            short h;
            int i;
            long l;
            long long q;
            long double D;
            double d;
            float f;
            void *p;
        } value;
        PyObject *obj;
        Py_ssize_t size; /* for the 'V' tag */

    This object must be allocated with suitable alignment (which is 16 on many platforms), and the default Python allocator apparently provides 8-byte alignment only on 64-bit platforms. In short, using PyObject_New with PyCArgObject results in undefined behavior.

    This issue potentially affects all compilers, not just Clang.

    Copy link
    Member Author

    C++ has a __alignof__ function/macro/operator (not sure what is its kind) to get the alignment of a type.

    C11 has <stdalign.h> header which provides an alignof() function.

    GCC has __alignof__().

    I also found "_Alignof()" name.

    ... no sure which one is the most portable ...

    Copy link
    Member Author

    I also found this macro:

    #define ALIGNOF(type) offsetof (struct { char c; type member; }, member)

    Copy link
    Member Author

    More info on alignof():

    Copy link

    serge-sans-paille mannequin commented Apr 16, 2019

    @vstinner: once you have a portable version of alignof, you can deciding to *not* use the pool allocator if the required alignment is greater than 8B, or you could modify the pool allocator to take alignment information as an extra parameter?

    Copy link
    Member Author

    New changeset d97adfb by Victor Stinner in branch 'master':
    bpo-36618: Don't add -fmax-type-align=8 flag for clang (GH-13320)

    Copy link
    Member Author

    Python 3.8 now respects the x86-64 ABI:

    I reverted my workaround.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    3.8 only security fixes build The build process and cross-build
    None yet

    No branches or pull requests

    2 participants