
Missed optimization opportunity during stack pinning #92643

@Patryk27


Hi,

This code:

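```rust
// `black_box` was still unstable when this was reported; it has been stable
// since Rust 1.66, where this feature attribute is no longer needed.
#![feature(bench_black_box)]

use core::hint::black_box;

#[inline(never)]
pub fn foo<T>(mut val: T) {
    loop {
        black_box(&mut val);
    }
}

pub fn bar() {
    foo([0u8; 1024]);
}
```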

... compiles to:

```llvm
define ... {
start:
  %0 = alloca [1024 x i8]*, align 8
  %1 = bitcast [1024 x i8]** %0 to i8*

  /* ... */
}
```

... but "pinning" `val` to the stack by moving it into a new local:

```rust
#[inline(never)]
pub fn foo<T>(val: T) {
    let mut val = val;

    loop {
        black_box(&mut val);
    }
}
```
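For anyone reproducing this, here is a minimal bounded sketch of the two variants side by side. The `iters` parameter is hypothetical (not in the original snippets) and only exists so the loops terminate, and each function returns `size_of_val(&val)` purely to give a checkable result. Both variants perform the same move of `val`, so any difference between them is a codegen question, not a semantic one.

```rust
use core::hint::black_box;

// Variant 1 (first snippet): the parameter itself is the mutable binding.
#[inline(never)]
pub fn foo_param<T>(mut val: T, iters: usize) -> usize {
    for _ in 0..iters {
        black_box(&mut val);
    }
    // Returned only so the function has an observable result.
    core::mem::size_of_val(&val)
}

// Variant 2 (second snippet): move the parameter into a fresh local first.
// Semantically identical; whether the argument's storage is reused or a
// second alloca + memcpy is emitted is up to rustc/LLVM.
#[inline(never)]
pub fn foo_rebind<T>(val: T, iters: usize) -> usize {
    let mut val = val;
    for _ in 0..iters {
        black_box(&mut val);
    }
    core::mem::size_of_val(&val)
}
```

One way to compare them is to inspect the optimized IR, e.g. via `cargo rustc --release -- --emit=llvm-ir`, and look for an extra `alloca [1024 x i8]` plus `memcpy` in `foo_rebind`. The bounded loop may itself change what LLVM does, and newer LLVM versions print opaque `ptr` types, so the output may not match the IR quoted here.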

... causes rustc to emit a seemingly spurious alloca + memcpy:

```llvm
define ... {
start:
  %0 = alloca [1024 x i8]*, align 8
  %val1 = alloca [1024 x i8], align 1
  %1 = getelementptr inbounds [1024 x i8], [1024 x i8]* %val1, i64 0, i64 0
  %2 = getelementptr inbounds [1024 x i8], [1024 x i8]* %val, i64 0, i64 0
  call void @llvm.memcpy.p0i8.p0i8.i64(...)
  %3 = bitcast [1024 x i8]** %0 to i8*
  /* ... */
}
```

(Checked on the current nightly with `--release`.)

Emitting that alloca + memcpy (or LLVM not eliding them, for what it's worth) makes the function require twice as much stack space as it would otherwise need, even though the parameter already lives on the stack and doesn't escape.

I think this is a missed optimization opportunity, either in rustc (the copy shouldn't be emitted in the first place) or in LLVM (it should've been elided by MemCpyOptimizer) 🙂
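Until this is optimized, one possible workaround (not proposed in the original report) is to let the caller own the value and pass `&mut T`, so the callee never takes `T` by value and has nothing to move into a second local. A minimal sketch, with hypothetical names `foo_by_ref` / `bar_by_ref` and a bounded `iters` loop so it terminates:

```rust
use core::hint::black_box;

// Hypothetical workaround: borrow the value instead of taking `T` by value.
// The 1024-byte buffer then lives only in the caller's frame; the callee
// just receives a pointer to it.
#[inline(never)]
pub fn foo_by_ref<T>(val: &mut T, iters: usize) {
    for _ in 0..iters {
        // Reborrow so `val` stays usable on the next iteration.
        black_box(&mut *val);
    }
}

pub fn bar_by_ref(iters: usize) -> [u8; 1024] {
    let mut buf = [0u8; 1024];
    foo_by_ref(&mut buf, iters);
    buf
}
```

This trades ownership for a borrow, so it only fits cases where the callee doesn't need to own `T`.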

In the wild, I ran into this while writing an async executor for AVR: using `futures::pin_mut!()` made my executor require twice as much stack, triggering stack overflows for seemingly innocuous, small futures. That's just for context, though; I don't think this missed optimization is specific to AVR, since it's already present in the LLVM IR.
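For context on why `pin_mut!` hits this: as far as I know, `futures::pin_mut!` expands to roughly `let mut x = x;` followed by shadowing `x` with `Pin::new_unchecked(&mut x)`, which is the same "move into a fresh local" pattern as the second `foo`. A simplified, hypothetical stand-in (`my_pin_mut!`, not the real macro) under that assumption:

```rust
use core::pin::Pin;

// Hypothetical, simplified stand-in for `futures::pin_mut!` (not the real macro).
macro_rules! my_pin_mut {
    ($x:ident) => {
        // Move the value into a new local: the `let mut val = val;` pattern
        // that produces the extra alloca + memcpy described above.
        let mut $x = $x;
        // Shadow the binding so the moved value is only reachable via `Pin`.
        // SAFETY: the local above can no longer be named, so it cannot be
        // moved again while pinned.
        #[allow(unused_mut)]
        let mut $x = unsafe { Pin::new_unchecked(&mut $x) };
    };
}

// Hypothetical helper: pins a buffer on the stack and sums its bytes.
fn sum_pinned<const N: usize>(buf: [u8; N]) -> u32 {
    my_pin_mut!(buf);
    // `buf` is now `Pin<&mut [u8; N]>`; reads go through `Deref`.
    buf.iter().map(|&b| u32::from(b)).sum()
}
```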


Labels:

- A-LLVM (Area: code generation parts specific to LLVM; both correctness bugs and optimization-related issues)
- C-optimization (Category: an issue highlighting optimization opportunities or PRs implementing such)
- I-slow (Issue: problems and improvements with respect to performance of generated code)
- T-compiler (relevant to the compiler team, which will review and decide on the PR/issue)
