Skip to content

Conversation

savannahostrowski
Copy link
Member

/Users/runner/work/cpython/cpython/Tools/jit/trampoline.c:13:13: warning: 'visibility' attribute ignored [-Wignored-attributes]
   13 |     typedef DECLARE_TARGET((*jit_func));
      |             ^
/Users/runner/work/cpython/cpython/Tools/jit/jit.h:11:49: note: expanded from macro 'DECLARE_TARGET'
   11 |     _Py_CODEUNIT *__attribute__((preserve_none, visibility("hidden"))) \
      |

This was introduced in #136528. The DECLARE_TARGET macro includes visibility("hidden"), which is valid for forward declarations but gets ignored when used in a typedef, so the compiler is yelling.

@savannahostrowski savannahostrowski changed the title Fix compiler warning from misusing DECLARE_TARGET in typedef JIT: Fix compiler warning from visibility attribute in typedef Oct 12, 2025
Copy link
Member

@chris-eibl chris-eibl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Basically the same as my PR #140286 I've just closed since @Fidget-Spinner made me aware of yours :)

Just a suggestion:

Comment on lines +13 to 14
typedef _Py_CODEUNIT *__attribute__((preserve_none)) (*jit_func)(_PyInterpreterFrame *, _PyStackRef *, PyThreadState *);
jit_func jitted = (jit_func)exec->jit_code;
Copy link
Member

@chris-eibl chris-eibl Oct 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
typedef _Py_CODEUNIT *__attribute__((preserve_none)) (*jit_func)(_PyInterpreterFrame *, _PyStackRef *, PyThreadState *);
jit_func jitted = (jit_func)exec->jit_code;
jit_func_preserve_none jitted = (jit_func_preserve_none)exec->jit_code;

I suggest to use

// To use preserve_none in JIT builds, we need to declare a separate function
// pointer with __attribute__((preserve_none)), since this attribute may not be
// supported by the compiler used to build the rest of the interpreter.
typedef jit_func __attribute__((preserve_none)) jit_func_preserve_none;

which builds on top of
typedef _Py_CODEUNIT *(*jit_func)(_PyInterpreterFrame *frame, _PyStackRef *stack_pointer, PyThreadState *tstate);

and IMHO is better than duplicating?

Copy link
Member

@chris-eibl chris-eibl Oct 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the other usage of jit_func_preserve_none in

#define TIER2_TO_TIER2(EXECUTOR) \
do { \
OPT_STAT_INC(traces_executed); \
_PyExecutorObject *_executor = (EXECUTOR); \
jit_func_preserve_none jitted = _executor->jit_code; \
__attribute__((musttail)) return jitted(frame, stack_pointer, tstate); \
} while (0)

suggests that it is even possible to omit the cast in

jit_func_preserve_none jitted = (jit_func_preserve_none)exec->jit_code;

and indeed the same code is generated for emit_trampoline in the stencils header 1.

Footnotes

  1. Tested only for x86_64-pc-windows-msvc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the back and forth, but having a closer look and playing on Godbolt suggests that the cast is better in case of -Wpedantic, since strictly speaking void *jit_code

typedef struct _PyExecutorObject {
PyObject_VAR_HEAD
const _PyUOpInstruction *trace;
_PyVMData vm_data; /* Used by the VM, but opaque to the optimizer */
uint32_t exit_count;
uint32_t code_size;
size_t jit_size;
void *jit_code;
_PyExitData exits[1];
} _PyExecutorObject;

is a data pointer. So maybe change to a cast in TIER2_TO_TIER2, too?

@chris-eibl
Copy link
Member

Sorry for being such a pain, this is now merely for my understanding and I hope asking this question here is ok:

When playing with the above Godbolt link, I more or less get the same nice output for jit_stencils-x86_64-unknown-linux-gnu.h in WSL:

void
emit_trampoline(
    unsigned char *code, unsigned char *data, _PyExecutorObject *executor,
    const _PyUOpInstruction *instruction, jit_state *state)
{
    // 
    // trampoline.o:  file format elf64-x86-64
    // 
    // Disassembly of section .text:
    // 
    // 0000000000000000 <_JIT_ENTRY>:
    // 0: 41 57                         pushq   %r15
    // 2: 41 56                         pushq   %r14
    // 4: 41 55                         pushq   %r13
    // 6: 41 54                         pushq   %r12
    // 8: 53                            pushq   %rbx
    // 9: 49 89 f4                      movq    %rsi, %r12
    // c: 49 89 d5                      movq    %rdx, %r13
    // f: 49 89 ce                      movq    %rcx, %r14
    // 12: ff 57 70                      callq   *0x70(%rdi)
    // 15: 5b                            popq    %rbx
    // 16: 41 5c                         popq    %r12
    // 18: 41 5d                         popq    %r13
    // 1a: 41 5e                         popq    %r14
    // 1c: 41 5f                         popq    %r15
    // 1e: c3                            retq
    const unsigned char code_body[31] = {
        0x41, 0x57, 0x41, 0x56, 0x41, 0x55, 0x41, 0x54,
        0x53, 0x49, 0x89, 0xf4, 0x49, 0x89, 0xd5, 0x49,
        0x89, 0xce, 0xff, 0x57, 0x70, 0x5b, 0x41, 0x5c,
        0x41, 0x5d, 0x41, 0x5e, 0x41, 0x5f, 0xc3,
    };
    memcpy(code, code_body, sizeof(code_body));
}

whereas on Windows x86_64 I get

void
emit_trampoline(
    unsigned char *code, unsigned char *data, _PyExecutorObject *executor,
    const _PyUOpInstruction *instruction, jit_state *state)
{
    // 
    // trampoline.o:       file format coff-x86-64
    // 
    // Disassembly of section .text:
    // 
    // 0000000000000000 <_JIT_ENTRY>:
    // 0: 41 57                         pushq   %r15
    // 2: 41 56                         pushq   %r14
    // 4: 41 55                         pushq   %r13
    // 6: 41 54                         pushq   %r12
    // 8: 56                            pushq   %rsi
    // 9: 57                            pushq   %rdi
    // a: 53                            pushq   %rbx
    // b: 48 81 ec a0 00 00 00          subq    $0xa0, %rsp
    // 12: 44 0f 29 bc 24 90 00 00 00    movaps  %xmm15, 0x90(%rsp)
    // 1b: 44 0f 29 b4 24 80 00 00 00    movaps  %xmm14, 0x80(%rsp)
    // 24: 44 0f 29 6c 24 70             movaps  %xmm13, 0x70(%rsp)
    // 2a: 44 0f 29 64 24 60             movaps  %xmm12, 0x60(%rsp)
    // 30: 44 0f 29 5c 24 50             movaps  %xmm11, 0x50(%rsp)
    // 36: 44 0f 29 54 24 40             movaps  %xmm10, 0x40(%rsp)
    // 3c: 44 0f 29 4c 24 30             movaps  %xmm9, 0x30(%rsp)
    // 42: 44 0f 29 44 24 20             movaps  %xmm8, 0x20(%rsp)
    // 48: 0f 29 7c 24 10                movaps  %xmm7, 0x10(%rsp)
    // 4d: 0f 29 34 24                   movaps  %xmm6, (%rsp)
    // 51: 49 89 d4                      movq    %rdx, %r12
    // 54: 4d 89 c5                      movq    %r8, %r13
    // 57: 4d 89 ce                      movq    %r9, %r14
    // 5a: ff 51 70                      callq   *0x70(%rcx)
    // 5d: 0f 28 34 24                   movaps  (%rsp), %xmm6
    // 61: 0f 28 7c 24 10                movaps  0x10(%rsp), %xmm7
    // 66: 44 0f 28 44 24 20             movaps  0x20(%rsp), %xmm8
    // 6c: 44 0f 28 4c 24 30             movaps  0x30(%rsp), %xmm9
    // 72: 44 0f 28 54 24 40             movaps  0x40(%rsp), %xmm10
    // 78: 44 0f 28 5c 24 50             movaps  0x50(%rsp), %xmm11
    // 7e: 44 0f 28 64 24 60             movaps  0x60(%rsp), %xmm12
    // 84: 44 0f 28 6c 24 70             movaps  0x70(%rsp), %xmm13
    // 8a: 44 0f 28 b4 24 80 00 00 00    movaps  0x80(%rsp), %xmm14
    // 93: 44 0f 28 bc 24 90 00 00 00    movaps  0x90(%rsp), %xmm15
    // 9c: 48 81 c4 a0 00 00 00          addq    $0xa0, %rsp
    // a3: 5b                            popq    %rbx
    // a4: 5f                            popq    %rdi
    // a5: 5e                            popq    %rsi
    // a6: 41 5c                         popq    %r12
    // a8: 41 5d                         popq    %r13
    // aa: 41 5e                         popq    %r14
    // ac: 41 5f                         popq    %r15
    // ae: c3                            retq
    const unsigned char code_body[175] = {
        0x41, 0x57, 0x41, 0x56, 0x41, 0x55, 0x41, 0x54,
        0x56, 0x57, 0x53, 0x48, 0x81, 0xec, 0xa0, 0x00,
        0x00, 0x00, 0x44, 0x0f, 0x29, 0xbc, 0x24, 0x90,
        0x00, 0x00, 0x00, 0x44, 0x0f, 0x29, 0xb4, 0x24,
        0x80, 0x00, 0x00, 0x00, 0x44, 0x0f, 0x29, 0x6c,
        0x24, 0x70, 0x44, 0x0f, 0x29, 0x64, 0x24, 0x60,
        0x44, 0x0f, 0x29, 0x5c, 0x24, 0x50, 0x44, 0x0f,
        0x29, 0x54, 0x24, 0x40, 0x44, 0x0f, 0x29, 0x4c,
        0x24, 0x30, 0x44, 0x0f, 0x29, 0x44, 0x24, 0x20,
        0x0f, 0x29, 0x7c, 0x24, 0x10, 0x0f, 0x29, 0x34,
        0x24, 0x49, 0x89, 0xd4, 0x4d, 0x89, 0xc5, 0x4d,
        0x89, 0xce, 0xff, 0x51, 0x70, 0x0f, 0x28, 0x34,
        0x24, 0x0f, 0x28, 0x7c, 0x24, 0x10, 0x44, 0x0f,
        0x28, 0x44, 0x24, 0x20, 0x44, 0x0f, 0x28, 0x4c,
        0x24, 0x30, 0x44, 0x0f, 0x28, 0x54, 0x24, 0x40,
        0x44, 0x0f, 0x28, 0x5c, 0x24, 0x50, 0x44, 0x0f,
        0x28, 0x64, 0x24, 0x60, 0x44, 0x0f, 0x28, 0x6c,
        0x24, 0x70, 0x44, 0x0f, 0x28, 0xb4, 0x24, 0x80,
        0x00, 0x00, 0x00, 0x44, 0x0f, 0x28, 0xbc, 0x24,
        0x90, 0x00, 0x00, 0x00, 0x48, 0x81, 0xc4, 0xa0,
        0x00, 0x00, 0x00, 0x5b, 0x5f, 0x5e, 0x41, 0x5c,
        0x41, 0x5d, 0x41, 0x5e, 0x41, 0x5f, 0xc3,
    };
    // 0: '\x00\x00\x00\x00'
    // 4: 00 00 00 00
    memcpy(code, code_body, sizeof(code_body));
}

which tells me, clang-cl is saving a ton more on the stack here.
Why that big difference? I wouldn't have assumed that from reading https://clang.llvm.org/docs/AttributeReference.html#preserve-none

On X86-64, only RSP and RBP are preserved by the callee. Registers R12, R13, R14, R15, RDI, RSI, RDX, RCX, R8, R9, R11, and RAX now can be used to pass function arguments. Floating-point registers (XMMs/YMMs) still follow the C calling convention.

@chris-eibl
Copy link
Member

Oh, I think I've found the answer myself: adding __attribute__((ms_abi)) to the _JIT_ENTRY function results in similar code as in the stencil, so it is just following MS ABI ...

@chris-eibl
Copy link
Member

Maybe also restore the comment

 // Note that this is *not* a tail call:

that got lost in #137961? Because it is utterly important that we use jit_func_preserve_none here and not jit_func, because this would result in a tail call.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants