Description
Hi,
This code:
#![feature(bench_black_box)]
use core::hint::black_box;
#[inline(never)]
pub fn foo<T>(mut val: T) {
    loop {
        black_box(&mut val);
    }
}

pub fn bar() {
    foo([0u8; 1024]);
}
... compiles to:
define ... {
start:
%0 = alloca [1024 x i8]*, align 8
%1 = bitcast [1024 x i8]** %0 to i8*
/* ... */
}
... but "pinning" val to the stack:
#[inline(never)]
pub fn foo<T>(val: T) {
    let mut val = val;
    loop {
        black_box(&mut val);
    }
}
... causes rustc to emit a seemingly spurious alloca + memcpy:
define ... {
start:
%0 = alloca [1024 x i8]*, align 8
%val1 = alloca [1024 x i8], align 1
%1 = getelementptr inbounds [1024 x i8], [1024 x i8]* %val1, i64 0, i64 0
%2 = getelementptr inbounds [1024 x i8], [1024 x i8]* %val, i64 0, i64 0
call void @llvm.memcpy.p0i8.p0i8.i64(...)
%3 = bitcast [1024 x i8]** %0 to i8*
/* ... */
}
(Checked on the current nightly with --release.)
Emitting that alloca + memcpy (or LLVM not eliding them, for what it's worth) makes the function require twice as much stack space as it would otherwise need, given that the parameter already lives on the stack and doesn't escape it.
I think this is a missed optimization opportunity either in rustc (as in, this shouldn't have been emitted) or in LLVM (as in, this should've been elided by MemCpyOptimizer) 🙂
In the wild, I found this issue while writing an async executor for AVR: using futures::pin_mut!() made my executor require twice as much stack space, triggering stack overflows for seemingly innocuous, small futures. But that's just for context. I don't think this potentially missed optimization is specific to AVR, since it's already present in the LLVM IR itself.
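For anyone wondering how pin_mut!() ends up in the same shape as the second foo: as far as I can tell, the macro shadows its argument with a fresh `let mut` binding and then pins a reference to it. Below is a minimal, std-only sketch of that pattern. `poll_once` and `NoopWake` are hypothetical names made up for this illustration; they are not part of futures or of my executor.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical no-op waker; it exists only so the sketch can poll a future.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

// Polls `fut` once after pinning it to the stack by shadowing the by-value
// parameter, which is the same `let mut val = val;` shape as in `foo`.
pub fn poll_once<F: Future>(fut: F) -> Poll<F::Output> {
    let mut fut = fut; // the rebinding that appears to trigger the extra alloca + memcpy
    // SAFETY: the original binding is shadowed right away, so the value
    // cannot be moved again for as long as the pinned reference is alive.
    let fut = unsafe { Pin::new_unchecked(&mut fut) };

    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    fut.poll(&mut cx)
}
```

Calling `poll_once(async { 21 * 2 })` returns `Poll::Ready(42)`, since an async block without any await points completes on its first poll. I haven't inspected the IR for this exact sketch, so whether it reproduces the doubled stack usage is an assumption based on the foo example above.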