Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suboptimal code generation for omittable bound checks #91109

Open
loskutov opened this issue Nov 21, 2021 · 6 comments
Open

Suboptimal code generation for omittable bound checks #91109

loskutov opened this issue Nov 21, 2021 · 6 comments
Labels
C-bug Category: This is a bug. I-slow Issue: Problems and improvements with respect to performance of generated code.

Comments

@loskutov
Copy link

Compiler Explorer link: https://godbolt.org/z/KMTfY18Wf

I tried this code:

pub fn zero(d: &mut [Vec<i32>]) {
    let n = d.len();
    for i in 0..n {
        assert!(d[i].len() == n);
        for j in 0..n {
            d[i][j] = 0;
        }
    }
}

With -C llvm-args=-enable-constraint-elimination, I expected the bound checks to be optimized out as they're performed manually by assert!. However, both assert and bound checks were present in the compiled code. The inner loop looks really dumb in terms of code generation:

.LBB0_5:
        cmp     rsi, rdx
        je      .LBB0_9
        mov     dword ptr [rcx + 4*rdx], 0
        add     rdx, 1
        cmp     rsi, rdx
        jne     .LBB0_5

rsi != rdx is the loop invariant, but it's verified on each iteration.

At first I suspected the issue to be entirely on the LLVM side, but bound checks are omitted in equivalent C++ code (which is also present at the Compiler Explorer link), and the whole inner loop is replaced with a single memset call.

Meta

rustc --version --verbose:

rustc 1.58.0-nightly (a77da2d45 2021-11-19)
binary: rustc
commit-hash: a77da2d454e6caa227a85b16410b95f93495e7e0
commit-date: 2021-11-19
host: x86_64-unknown-linux-gnu
release: 1.58.0-nightly
LLVM version: 13.0.0
@loskutov loskutov added the C-bug Category: This is a bug. label Nov 21, 2021
@the8472
Copy link
Member

the8472 commented Nov 21, 2021

To get memset as in the clang version you can replace

        for j in 0..n {
            d[i][j] = 0;
        }

with

d[i][0..n].fill(0);

@loskutov
Copy link
Author

The code example is pretty synthetic and doesn't make much sense on its own; however, similar effects can be observed in slightly more complicated programs, e.g. Floyd—Warshall algorithm.

My point is that the missed optimization is pretty important in real-life programs, as it's not uncommon to be unable to avoid array indexing, and bounds-checking overhead is often claimed to be negligible as the checks are likely to be optimized out. And of course I expect them to be optimized out when they are optimized out by clang in an equivalent C++ program.

@leonardo-m
Copy link

There are multiple ways to solve the problem, like:

pub fn zero(d: &mut [Vec<i32>]) {
    let n = d.len();
    for i in 0 .. n {
        let row = &mut d[i][.. n];
        for j in 0 .. n {
            row[j] = 0;
        }
    }
}

What's the best way to solve this problem depends on the specific situation.

@loskutov
Copy link
Author

True. However, the issue is about poor codegen for this particular piece of code, not about looking for a way to rewrite it.

@the8472 the8472 added the I-slow Issue: Problems and improvements with respect to performance of generated code. label Nov 21, 2021
@leonardo-m
Copy link

Your original code contains an assert, its purpose is to tell something to the optimizer. Often that helps, but you can't be sure the compiler will use the information given by the assert to its fullest. It's a "best effort" for the compiler. So you are waiting for the compiler to do what you want based on "best efforts".

@scottmcm
Copy link
Member

Thanks for including the vector::at C++ version!

I suggest that you prefer re-slicing to asserts for avoiding bounds checks. For example, this one works:

pub fn zero(d: &mut [Vec<i32>]) {
    let n = d.len();
    for i in 0..n {
        let di = &mut d[i][..n];
        for j in 0..n {
            di[j] = 0;
        }
    }
}

https://rust.godbolt.org/z/f19nasGPh

And that's a general strategy for improving bound-check-elimination. It's how the the new reverse implementation works, for example (#90821).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: This is a bug. I-slow Issue: Problems and improvements with respect to performance of generated code.
Projects
None yet
Development

No branches or pull requests

4 participants