Unobservable vec, and arrays writes always always emit for size >= 3 #96497

vimirage · 2022-04-28T01:39:53Z

While revealing that dead Vec<_> writes always emit, to someone else, I decided to look into this for arrays.

It seems that, for any [T; N], such that T is not a ZST or enum with a single possible value, and 3 <= N, then array writes are always emit. This applies to allowing emitting redundant memcpys, memclrs, memfills, and all related operations.

https://rust.godbolt.org/z/7vrT5xnxq

As for Vec<T>, as long as T isn't as described above, then unobservable writes and operations are usually emit.
https://rust.godbolt.org/z/jhM75Mj5E

The text was updated successfully, but these errors were encountered:

erikdesjardins · 2022-04-29T04:28:05Z

For [T; N], this is because we don't add lifetime markers to by-move arguments passed indirectly: https://rust.godbolt.org/z/vjd78n38a.

I believe we should be adding a lifetime.end marker at the end of the function.

(Note that byval also fixes this, but it should be avoided because it changes the ABI in a way that limits optimization potential, by requiring the argument to be in a specific stack position.)

For Vec<T>, LLVM should be able to see that the stores are dead because it knows that __rust_dealloc frees memory.

It appears to be failing here because we conditionally call __rust_dealloc, and LLVM doesn't have enough information to optimize out the condition, so __rust_dealloc doesn't actually postdominate the store.

For the simplified case of Box<[T]>, see https://rust.godbolt.org/z/vnPM5o1K4. Note that Box<[u32]> has an extra:

%6 = shl i64 %1, 2
%7 = icmp eq i64 %6, 0

This is doing something like x.len() * size_of::<u32>() == 0, to decide whether or not to call __rust_dealloc. This check shouldn't be necessary, because we just checked x.len() == 0 above that, and we know the multiplication never wraps (but LLVM doesn't).

The store is optimized out for Box<[u8]>, because size_of::<u8> == 1 and the two conditions are trivially equivalent.

Despite this, it's still not optimized out for Vec<u8>, because again __rust_dealloc doesn't postdominate the store. In this case it seems like it's because the bounds check is based on x.len() and the dealloc check is based on x.capacity(), and LLVM doesn't know that x.len() <= x.capacity().

vimirage · 2022-06-11T23:30:32Z

Is there any way that LLVM could be informed x.len() <= x.capacity()?
At a minimum, we could toss in an extra

unsafe {
 if !(x.len() <= x.capacity()) {
  std::hint::unreachable_unchecked();
 }
}

to Vec::{len, capacity} accessors? It could give LLVM the extra information necessary to optimize this code path? Otherwise, maybe an upstream issue could be made about introducing an attribute for marking members/variables/values/etc as always following a condition such as the above, so we could bind via a macro?
Any suggestions that could be valuable for improving this?

erikdesjardins · 2022-06-13T22:33:56Z

Yeah, that would likely work, or equivalently std::intrinsics::assume(x.len() <= x.capacity()). Might be worth opening a PR to run perf and make sure it's not too disruptive (assumes can interfere with other optimizations).

erikdesjardins · 2022-06-13T22:36:11Z

This is doing something like x.len() * size_of::<u32>() == 0, to decide whether or not to call __rust_dealloc. This check shouldn't be necessary, because we just checked x.len() == 0 above that, and we know the multiplication never wraps (but LLVM doesn't).

I'm working on this, should be straightforward to use unchecked multiplication in that size calculation

…rochenkov Use unchecked mul to compute slice sizes This allows LLVM to realize that `slice.len() > 0` iff `slice.len() * size_of::<T>() > 0`, allowing a branch on the latter to be folded into the former when dropping vecs and boxed slices, in some cases. Fixes (partially) rust-lang#96497

erikdesjardins · 2022-06-17T17:24:39Z

For the case of [T; N], from #98121 it seems that emitting lifetime.end is not viable.

A more viable option would be to add an attribute to upstream LLVM, like noreadonunwind/noreadonreturn discussed here: https://groups.google.com/g/llvm-dev/c/i0Z1FC51KVI.

However this is a bit less important now than when this issue was first opened, since MIR DSE is now able to remove dead stores in some cases, including the simple motivating case for [T; N]: https://rust.godbolt.org/z/MGe59WrxY.

This is still a problem when the argument is directly overwritten though (https://rust.godbolt.org/z/G3odYbq78), e.g.

pub fn example_f(mut x: String) {
    x = "foo".to_string(); // this still writes to memory in the caller's stack frame
}

vimirage · 2022-06-17T20:28:00Z

For the case of [T; N], from #98121 it seems that emitting lifetime.end is not viable.

A more viable option would be to add an attribute to upstream LLVM, like noreadonunwind/noreadonreturn discussed here: https://groups.google.com/g/llvm-dev/c/i0Z1FC51KVI.

Ah, is this the only thing still inhibiting optimization of more complicated cases/situations regarding dead code / dead stores? It does seem correct that the lifetime of the argument copy has ended within the function, yet I saw it mentioned in the PR that inlining caused complications with that?

This is still a problem when the argument is directly overwritten though (https://rust.godbolt.org/z/G3odYbq78), e.g.
pub fn example_f(mut x: String) {
    x = "foo".to_string(); // this still writes to memory in the caller's stack frame
}

Ouch, well.. this does impact arrays of Move-only types too! I'd certainly consider it to be in the same vein of issue, e.g.:

pub fn f(mut x: [Vec<u32>; 2]) {
    // three drops are emit: x[0], x[0], x[1], and an allocation for the new vector
    x[0] = vec![1, 2];
}

Hopefully the developers working on LLVM can eventually get a working implementation of what is described in that discussion, good luck.

vimirage · 2022-06-17T22:25:05Z

@rustbot label A-LLVM

erikdesjardins · 2024-02-24T19:22:04Z

#121298 adds writable / dead_on_unwind (discussed above as noreadonunwind) to sret arguments.

But in order to improve this issue (i.e. to remove the stores to %x here: https://rust.godbolt.org/z/zexzc8oqa), we would also require dead_on_return. (Or on the rustc side we need to copy indirectly passed by-value arguments to an alloca at the entry of the function, but that might introduce other optimization problems.)

erikdesjardins mentioned this issue Jun 14, 2022

Use unchecked mul to compute slice sizes #98078

Merged

erikdesjardins mentioned this issue Jun 15, 2022

Emit @llvm.lifetime.end for moved arguments passed indirectly #98121

Closed

rustbot added the A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. label Jun 17, 2022

workingjubilee added the A-codegen Area: Code generation label Aug 6, 2022

Nilstrieb added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Apr 5, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unobservable vec, and arrays writes always always emit for size >= 3 #96497

Unobservable vec, and arrays writes always always emit for size >= 3 #96497

vimirage commented Apr 28, 2022 •

edited

erikdesjardins commented Apr 29, 2022 •

edited

vimirage commented Jun 11, 2022 •

edited

erikdesjardins commented Jun 13, 2022

erikdesjardins commented Jun 13, 2022

erikdesjardins commented Jun 17, 2022

vimirage commented Jun 17, 2022

vimirage commented Jun 17, 2022

erikdesjardins commented Feb 24, 2024

Unobservable vec, and arrays writes always always emit for size >= 3 #96497

Unobservable vec, and arrays writes always always emit for size >= 3 #96497

Comments

vimirage commented Apr 28, 2022 • edited

erikdesjardins commented Apr 29, 2022 • edited

vimirage commented Jun 11, 2022 • edited

erikdesjardins commented Jun 13, 2022

erikdesjardins commented Jun 13, 2022

erikdesjardins commented Jun 17, 2022

vimirage commented Jun 17, 2022

vimirage commented Jun 17, 2022

erikdesjardins commented Feb 24, 2024

vimirage commented Apr 28, 2022 •

edited

erikdesjardins commented Apr 29, 2022 •

edited

vimirage commented Jun 11, 2022 •

edited