Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unobservable vec, and arrays writes always always emit for size >= 3 #96497

Open
vimirage opened this issue Apr 28, 2022 · 8 comments
Open

Unobservable vec, and arrays writes always always emit for size >= 3 #96497

vimirage opened this issue Apr 28, 2022 · 8 comments
Labels
A-codegen Area: Code generation A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@vimirage
Copy link
Contributor

vimirage commented Apr 28, 2022

While revealing that dead Vec<_> writes always emit, to someone else, I decided to look into this for arrays.

It seems that, for any [T; N], such that T is not a ZST or enum with a single possible value, and 3 <= N, then array writes are always emit. This applies to allowing emitting redundant memcpys, memclrs, memfills, and all related operations.

https://rust.godbolt.org/z/7vrT5xnxq

As for Vec<T>, as long as T isn't as described above, then unobservable writes and operations are usually emit.
https://rust.godbolt.org/z/jhM75Mj5E

@erikdesjardins
Copy link
Contributor

erikdesjardins commented Apr 29, 2022

For [T; N], this is because we don't add lifetime markers to by-move arguments passed indirectly: https://rust.godbolt.org/z/vjd78n38a.

I believe we should be adding a lifetime.end marker at the end of the function.

(Note that byval also fixes this, but it should be avoided because it changes the ABI in a way that limits optimization potential, by requiring the argument to be in a specific stack position.)


For Vec<T>, LLVM should be able to see that the stores are dead because it knows that __rust_dealloc frees memory.

It appears to be failing here because we conditionally call __rust_dealloc, and LLVM doesn't have enough information to optimize out the condition, so __rust_dealloc doesn't actually postdominate the store.

For the simplified case of Box<[T]>, see https://rust.godbolt.org/z/vnPM5o1K4. Note that Box<[u32]> has an extra:

%6 = shl i64 %1, 2
%7 = icmp eq i64 %6, 0

This is doing something like x.len() * size_of::<u32>() == 0, to decide whether or not to call __rust_dealloc. This check shouldn't be necessary, because we just checked x.len() == 0 above that, and we know the multiplication never wraps (but LLVM doesn't).

The store is optimized out for Box<[u8]>, because size_of::<u8> == 1 and the two conditions are trivially equivalent.

Despite this, it's still not optimized out for Vec<u8>, because again __rust_dealloc doesn't postdominate the store. In this case it seems like it's because the bounds check is based on x.len() and the dealloc check is based on x.capacity(), and LLVM doesn't know that x.len() <= x.capacity().

@vimirage
Copy link
Contributor Author

vimirage commented Jun 11, 2022

Is there any way that LLVM could be informed x.len() <= x.capacity()?
At a minimum, we could toss in an extra

unsafe {
 if !(x.len() <= x.capacity()) {
  std::hint::unreachable_unchecked();
 }
}

to Vec::{len, capacity} accessors? It could give LLVM the extra information necessary to optimize this code path? Otherwise, maybe an upstream issue could be made about introducing an attribute for marking members/variables/values/etc as always following a condition such as the above, so we could bind via a macro?
Any suggestions that could be valuable for improving this?

@erikdesjardins
Copy link
Contributor

Yeah, that would likely work, or equivalently std::intrinsics::assume(x.len() <= x.capacity()). Might be worth opening a PR to run perf and make sure it's not too disruptive (assumes can interfere with other optimizations).

@erikdesjardins
Copy link
Contributor

This is doing something like x.len() * size_of::<u32>() == 0, to decide whether or not to call __rust_dealloc. This check shouldn't be necessary, because we just checked x.len() == 0 above that, and we know the multiplication never wraps (but LLVM doesn't).

I'm working on this, should be straightforward to use unchecked multiplication in that size calculation

JohnTitor added a commit to JohnTitor/rust that referenced this issue Jun 14, 2022
…rochenkov

Use unchecked mul to compute slice sizes

This allows LLVM to realize that `slice.len() > 0` iff `slice.len() * size_of::<T>() > 0`, allowing a branch on the latter to be folded into the former when dropping vecs and boxed slices, in some cases.

Fixes (partially) rust-lang#96497
JohnTitor added a commit to JohnTitor/rust that referenced this issue Jun 15, 2022
…rochenkov

Use unchecked mul to compute slice sizes

This allows LLVM to realize that `slice.len() > 0` iff `slice.len() * size_of::<T>() > 0`, allowing a branch on the latter to be folded into the former when dropping vecs and boxed slices, in some cases.

Fixes (partially) rust-lang#96497
@erikdesjardins
Copy link
Contributor

For the case of [T; N], from #98121 it seems that emitting lifetime.end is not viable.

A more viable option would be to add an attribute to upstream LLVM, like noreadonunwind/noreadonreturn discussed here: https://groups.google.com/g/llvm-dev/c/i0Z1FC51KVI.

However this is a bit less important now than when this issue was first opened, since MIR DSE is now able to remove dead stores in some cases, including the simple motivating case for [T; N]: https://rust.godbolt.org/z/MGe59WrxY.

This is still a problem when the argument is directly overwritten though (https://rust.godbolt.org/z/G3odYbq78), e.g.

pub fn example_f(mut x: String) {
    x = "foo".to_string(); // this still writes to memory in the caller's stack frame
}

@vimirage
Copy link
Contributor Author

For the case of [T; N], from #98121 it seems that emitting lifetime.end is not viable.

A more viable option would be to add an attribute to upstream LLVM, like noreadonunwind/noreadonreturn discussed here: https://groups.google.com/g/llvm-dev/c/i0Z1FC51KVI.

Ah, is this the only thing still inhibiting optimization of more complicated cases/situations regarding dead code / dead stores? It does seem correct that the lifetime of the argument copy has ended within the function, yet I saw it mentioned in the PR that inlining caused complications with that?

This is still a problem when the argument is directly overwritten though (https://rust.godbolt.org/z/G3odYbq78), e.g.

pub fn example_f(mut x: String) {
    x = "foo".to_string(); // this still writes to memory in the caller's stack frame
}

Ouch, well.. this does impact arrays of Move-only types too! I'd certainly consider it to be in the same vein of issue, e.g.:

pub fn f(mut x: [Vec<u32>; 2]) {
    // three drops are emit: x[0], x[0], x[1], and an allocation for the new vector
    x[0] = vec![1, 2];
}

Hopefully the developers working on LLVM can eventually get a working implementation of what is described in that discussion, good luck.

@vimirage
Copy link
Contributor Author

@rustbot label A-LLVM

@rustbot rustbot added the A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. label Jun 17, 2022
@workingjubilee workingjubilee added the A-codegen Area: Code generation label Aug 6, 2022
@Nilstrieb Nilstrieb added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Apr 5, 2023
@erikdesjardins
Copy link
Contributor

#121298 adds writable / dead_on_unwind (discussed above as noreadonunwind) to sret arguments.

But in order to improve this issue (i.e. to remove the stores to %x here: https://rust.godbolt.org/z/zexzc8oqa), we would also require dead_on_return. (Or on the rustc side we need to copy indirectly passed by-value arguments to an alloca at the entry of the function, but that might introduce other optimization problems.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-codegen Area: Code generation A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

5 participants