Improve core::intrinsics::black_box output. #99899

Open
thomcc opened this issue Jul 29, 2022 · 3 comments
Labels
A-codegen Area: Code generation A-intrinsics Area: intrinsics C-enhancement Category: An issue proposing an enhancement or a PR with one. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@thomcc
Member

thomcc commented Jul 29, 2022

On Discord, the user kangalioo (unsure of GitHub name) shared a custom version of the black_box (#64102) function that they use to improve the asm output of black_box and reduce the overhead of using it. It does this by passing small values in registers instead of by pointer.

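// Warning, not sound (UB if T contains padding bytes). Do not use.
// Values of 1, 2, 4, 8, or 16 bytes are passed through the asm block in registers;
// anything else is returned unchanged. `reg_byte` is an x86 register class.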
pub fn black_box<T>(x: T) -> T {
    use std::mem::{transmute_copy as t, forget as f};
    use std::arch::asm;
    unsafe { match std::mem::size_of::<T>() {
        1 => { let mut y: u8 = t(&x); f(x); asm!("/*{y}*/", y = inout(reg_byte) y, options(nostack)); t(&y) }
        2 => { let mut y: u16 = t(&x); f(x); asm!("/*{y}*/", y = inout(reg) y, options(nostack)); t(&y) }
        4 => { let mut y: u32 = t(&x); f(x); asm!("/*{y}*/", y = inout(reg) y, options(nostack)); t(&y) }
        8 => { let mut y: u64 = t(&x); f(x); asm!("/*{y}*/", y = inout(reg) y, options(nostack)); t(&y) }
        16 => { let [mut y, mut z]: [u64; 2] = t(&x); f(x); asm!("/*{y}{z}*/", y = inout(reg) y, z = inout(reg) z, options(nostack)); t(&[y, z]) }
        _ => { x },
    } }
}
pub fn example() {
    black_box(black_box(2) + black_box(3));
    extern "C" { fn print(_: &str); }
    unsafe { print(black_box("hello world :)")); }
}

Which produces the following output:

example::example:
    mov     eax, 2
    mov     ecx, 3
    add     ecx, eax
    lea     rdi, [rip + .L__unnamed_1]
    mov     esi, 14
    jmp     qword ptr [rip + print@GOTPCREL]
.L__unnamed_1:
    .ascii  "hello world :)"

In comparison, the current black_box spills the value to the stack in basically all cases. The equivalent output with the current black_box is as follows (Godbolt for all of this is available at https://godbolt.org/z/a7evcEP6x):

example::example:
    sub     rsp, 24
    mov     dword ptr [rsp + 8], 2
    lea     rax, [rsp + 8]
    mov     ecx, dword ptr [rsp + 8]
    mov     dword ptr [rsp + 8], 3
    add     ecx, dword ptr [rsp + 8]
    mov     dword ptr [rsp + 8], ecx
    lea     rcx, [rip + .L__unnamed_1]
    mov     qword ptr [rsp + 8], rcx
    mov     qword ptr [rsp + 16], 14
    mov     rdi, qword ptr [rsp + 8]
    mov     rsi, qword ptr [rsp + 16]
    call    qword ptr [rip + print@GOTPCREL]
    add     rsp, 24
    ret

.L__unnamed_1:
    .ascii  "hello world :)"

I believe this is because we lower the intrinsic by passing a pointer to the value into an inline asm block, which forces the value to be spilled to the stack.
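As a rough illustration of the pointer-based approach described above, here is a minimal sketch written as library code. It is not the actual intrinsic lowering: the name `black_box_via_pointer` is made up for this example, it is specialized to `u32` for simplicity, and it assumes a target where `asm!` is available (such as x86_64 or aarch64). Because the asm block receives the address of `x`, the value has to live in memory, which is what forces the spill.

```rust
use std::arch::asm;

/// Hypothetical helper for illustration only; not the real intrinsic lowering.
pub fn black_box_via_pointer(mut x: u32) -> u32 {
    // Taking the address of `x` means it must be stored in memory (on the stack).
    let p: *mut u32 = &mut x;
    unsafe {
        // The template is only an assembly comment, so no instruction is emitted.
        // Without `nomem`, the compiler must assume the asm block may read or
        // write memory reachable through `p`, so `x` is reloaded afterwards.
        asm!("/* {0} */", in(reg) p, options(nostack));
    }
    x
}
```

The register-passing version in the issue avoids this round trip through the stack by handing the value itself to the asm block as an `inout` operand.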

I don't believe this can be fixed by libs changes, as we are just calling into an intrinsic and need to remain that way to support all targets (and cases like Miri). Additionally, the version posted on Discord has a soundness hole: it is UB if T contains padding bytes (and this can't be fixed at the moment, as passing MaybeUninit via registers isn't currently possible).
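To make the padding problem concrete, here is a small sketch with a hypothetical `Padded` type (not from the issue). With `#[repr(C)]` on typical targets it is 4 bytes in size but only 3 of those bytes belong to fields, so the 4-byte arm of the function above would `transmute_copy` it into a `u32` and read an uninitialized padding byte. The sketch only measures the layout; it does not perform the unsound read.

```rust
use std::mem::size_of;

/// Hypothetical type used only to illustrate padding.
#[repr(C)]
pub struct Padded {
    pub a: u8,  // 1 byte, followed by 1 padding byte so that `b` is 2-byte aligned
    pub b: u16, // 2 bytes
}

/// Number of padding bytes in `Padded`: total size minus the sum of field sizes.
pub fn padding_bytes() -> usize {
    size_of::<Padded>() - (size_of::<u8>() + size_of::<u16>())
}
```

A type like this has the same size as `u32`, so a match on `size_of::<T>()` alone cannot tell the two apart.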

However, because we just pass the argument to an intrinsic, it seems likely that the compiler can lower it more efficiently, which seems like a less error-prone way of handling this anyway.

Improving this output seems beneficial, since the whole point of this intrinsic is to have as close to 0 cost as possible while still providing an optimization barrier. I think the basic idea behind the black_box provided above is a reasonable starting point of what would be good, but it's obviously not a requirement that it's lowered in that manner.
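For context on why the per-call cost matters, here is a minimal sketch of how black_box is typically used in measurement code. It assumes a toolchain where `std::hint::black_box` is available (it was still unstable when this issue was opened); `sum_to` and `bench_once` are hypothetical names for this example.

```rust
use std::hint::black_box;

/// Hypothetical workload for illustration.
pub fn sum_to(n: u64) -> u64 {
    (0..n).sum()
}

/// Runs the workload once with its input and output hidden from the optimizer.
pub fn bench_once() -> u64 {
    // Hide the input to discourage constant-folding the sum at compile time.
    let n = black_box(1_000u64);
    // Hide the result to discourage removing the computation as dead code.
    black_box(sum_to(n))
}
```

Each call here wraps a very cheap operation, so any spill and reload added by black_box itself becomes part of what is measured.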

@thomcc
Member Author

thomcc commented Jul 29, 2022

This is probably not worth doing if the outcome of #64102 is not to stabilize it.

@eddyb
Member

eddyb commented Jul 29, 2022

Heh, looks like an expansion of #64102 (comment) and following comments (cc @m-ou-se).

@thomcc
Member Author

thomcc commented Jul 29, 2022

Thanks. I was pretty sure I had seen it before, but didn't feel like crawling the thread (and wasn't sure if it was there, in the RFC thread, or just in some random codebase).
