Re-enable noalias annotations by default once LLVM no longer miscompiles them #54878

Open
bstrie opened this Issue Oct 6, 2018 · 9 comments

bstrie commented Oct 6, 2018

This issue tracks the undoing of the -Zmutable-noalias=no default introduced in #54639 on account of a bug in LLVM. cc @nagisa

(Deja vu?)

nagisa commented Oct 7, 2018

I’m still working on figuring out the underlying issue. The interesting ticket is #54462.

@nagisa nagisa self-assigned this Oct 7, 2018

Aaron1011 commented Oct 13, 2018

Using @nagisa's minimal reproduction:

Minimised test case with no unsafe code (make sure to compile with 1 codegen unit!):
fn linidx(row: usize, col: usize) -> usize {
    row * 1 + col * 3
}

fn swappy() -> [f32; 12] {
    let mut mat = [1.0f32, 5.0, 9.0, 2.0, 6.0, 10.0, 3.0, 7.0, 11.0, 4.0, 8.0, 12.0];

    for i in 0..2 {
        for j in i+1..3 {
            if mat[linidx(j, 3)] > mat[linidx(i, 3)] {
                    for k in 0..4 {
                            let (x, rest) = mat.split_at_mut(linidx(i, k) + 1);
                            let a = x.last_mut().unwrap();
                            let b = rest.get_mut(linidx(j, k) - linidx(i, k) - 1).unwrap();
                            ::std::mem::swap(a, b);
                    }
            }
        }
    }

    mat
}

fn main() {
    let mat = swappy();
    assert_eq!([9.0, 5.0, 1.0, 10.0, 6.0, 2.0, 11.0, 7.0, 3.0, 12.0, 8.0, 4.0], mat);
}
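As a cross-check of the expected value, here is a minimal sketch of the same row sort written with index-based `slice::swap` instead of `split_at_mut` plus `mem::swap`. The name `swappy_reference` is hypothetical and not part of the original reproduction; the sketch only documents what the assertion expects and makes no claim about how LLVM treats it.

```rust
// Column-major index into the 3x4 matrix, equivalent to the reproduction's
// `row * 1 + col * 3`.
fn linidx(row: usize, col: usize) -> usize {
    row + col * 3
}

// Hypothetical reference version: sorts rows by their last column in
// descending order, swapping elements by index so that user code never
// holds two `&mut f32` borrows at once.
fn swappy_reference() -> [f32; 12] {
    let mut mat = [1.0f32, 5.0, 9.0, 2.0, 6.0, 10.0, 3.0, 7.0, 11.0, 4.0, 8.0, 12.0];

    for i in 0..2 {
        for j in i + 1..3 {
            // Compare the last column (index 3) of rows j and i.
            if mat[linidx(j, 3)] > mat[linidx(i, 3)] {
                for k in 0..4 {
                    mat.swap(linidx(i, k), linidx(j, k));
                }
            }
        }
    }

    mat
}

fn main() {
    println!("{:?}", swappy_reference());
}
```

Read row by row, the matrix starts as [1 2 3 4], [5 6 7 8], [9 10 11 12] and ends with the rows in reverse order, which is the array the reproduction asserts.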

I was able to bisect LLVM's optimization passes to find the one causing the error.

Running this command results in a working executable (replace bug.rs with the name of the file you saved the reproduction in):

rustc -Z no-parallel-llvm -C codegen-units=1 -O -Z mutable-noalias=yes -C llvm-args=-opt-bisect-limit=2260 bug.rs

While running this command results in a broken executable (the assert_eq! fails):

rustc -Z no-parallel-llvm -C codegen-units=1 -O -Z mutable-noalias=yes -C llvm-args=-opt-bisect-limit=2261 bug.rs

LLVM bisect output

For this file, optimization 2261 corresponds to Global Value Numbering on function (_ZN3bug6swappy17hdcc51d0e284ea38bE)
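The search that lands on 2260 versus 2261 is an ordinary binary search over the bisect limit. The sketch below shows only that search logic. `first_bad_limit` and the stand-in predicate are hypothetical; in practice the predicate would compile with the given -opt-bisect-limit value, run the binary, and report whether the assertion fails. It assumes that once a limit yields a broken executable, every larger limit does too, and the upper bound of 10_000 is an arbitrary assumption.

```rust
// Finds the smallest limit in (good, bad] for which `is_broken` returns true.
// Assumes `is_broken(good)` is false, `is_broken(bad)` is true, and that the
// predicate never flips back to false as the limit grows.
fn first_bad_limit<F: Fn(u32) -> bool>(mut good: u32, mut bad: u32, is_broken: F) -> u32 {
    while bad - good > 1 {
        let mid = good + (bad - good) / 2;
        if is_broken(mid) {
            bad = mid;
        } else {
            good = mid;
        }
    }
    bad
}

fn main() {
    // Stand-in for "build with -opt-bisect-limit=<limit> and run the test".
    // The threshold mirrors the two commands above.
    let is_broken = |limit: u32| limit >= 2261;
    println!("{}", first_bad_limit(0, 10_000, is_broken));
}
```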

comex commented Oct 13, 2018

Bisecting LLVM revisions (using llvmlab bisect) narrows it down to r305936-r305938, presumably r305938:

[BasicAA] Use MayAlias instead of PartialAlias for fallback.

Note that this is a pretty old change, from June 2017.

Edit: Looking at the commit description, it seems likely that the bug existed prior to that, but was masked by BasicAA preventing later alias passes from running, which is what the commit fixed. The case involves checking aliasing between a pair of getelementptr instructions where the compiler knows they have the same base address but doesn't know the offsets.

Edit2: Also, passing -enable-scoped-noalias=false as an LLVM option prevents the miscompilation. (This is not surprising since that disables noalias handling altogether, but just in case it helps…)

nikic commented Oct 13, 2018

From a look at the pre-GVN IR, I feel like the root cause here might be in loop unrolling, depending on whether my understanding of how LLVM aliasing annotations work is correct.

Consider code like

int *a, *b;
for (int i = 0; i < 4; i++) {
    b[i & 1] = a[i & 1];
}

where a[i & 1] and b[i & 1] do not alias within a single iteration, but a and b in general may alias.

In LLVM IR this would go something like:

define void @test(i32* %addr1, i32* %addr2) {
start:
    br label %body

body:
    %i = phi i32 [ 0, %start ], [ %i2, %body ]
    %j = and i32 %i, 1
    %addr1i = getelementptr inbounds i32, i32* %addr1, i32 %j
    %addr2i = getelementptr inbounds i32, i32* %addr2, i32 %j

    %x = load i32, i32* %addr1i, !alias.scope !2
    store i32 %x, i32* %addr2i, !noalias !2

    %i2 = add i32 %i, 1
    %cmp = icmp slt i32 %i2, 4
    br i1 %cmp, label %body, label %end

end:
    ret void
}

!0 = !{!0}
!1 = !{!1, !0}
!2 = !{!1}

If we run this through -loop-unroll we get:

define void @test(i32* %addr1, i32* %addr2) {
start:
  br label %body

body:                                             ; preds = %start
  %x = load i32, i32* %addr1, !alias.scope !0
  store i32 %x, i32* %addr2, !noalias !0
  %addr1i.1 = getelementptr inbounds i32, i32* %addr1, i32 1
  %addr2i.1 = getelementptr inbounds i32, i32* %addr2, i32 1
  %x.1 = load i32, i32* %addr1i.1, !alias.scope !0
  store i32 %x.1, i32* %addr2i.1, !noalias !0
  %x.2 = load i32, i32* %addr1, !alias.scope !0
  store i32 %x.2, i32* %addr2, !noalias !0
  %addr1i.3 = getelementptr inbounds i32, i32* %addr1, i32 1
  %addr2i.3 = getelementptr inbounds i32, i32* %addr2, i32 1
  %x.3 = load i32, i32* %addr1i.3, !alias.scope !0
  store i32 %x.3, i32* %addr2i.3, !noalias !0
  ret void
}

!0 = !{!1}
!1 = distinct !{!1, !2}
!2 = distinct !{!2}

Note how all four copies of the loop use aliasing metadata over the same aliasing domain. Instead of being noalias within a single iteration, it's noalias across the whole function.

Finally, -scoped-noalias -gvn gives us:

define void @test(i32* %addr1, i32* %addr2) {
start:
  %x = load i32, i32* %addr1, !alias.scope !0
  store i32 %x, i32* %addr2, !noalias !0
  %addr1i.1 = getelementptr inbounds i32, i32* %addr1, i32 1
  %addr2i.1 = getelementptr inbounds i32, i32* %addr2, i32 1
  %x.1 = load i32, i32* %addr1i.1, !alias.scope !0
  store i32 %x.1, i32* %addr2i.1, !noalias !0
  store i32 %x, i32* %addr2, !noalias !0
  store i32 %x.1, i32* %addr2i.1, !noalias !0
  ret void
}

!0 = !{!1}
!1 = distinct !{!1, !2}
!2 = distinct !{!2}

And this will result in incorrect results if a = b + 1.

It's possible to reproduce this issue from C with the following code:

#include <stdio.h>

void copy(int * restrict to, int * restrict from) {
	*to = *from;
}

void test(int *a, int *b) {
	for (int i = 0; i < 4; i++) {
		copy(&b[i & 1], &a[i & 1]);
	}
}

int main() {
	int ary[] = {0, 1, 2};
	test(&ary[1], &ary[0]);
	printf("%d %d %d\n", ary[0], ary[1], ary[2]);
	return 0;
}

With Clang 6.0 this prints 2 2 2 at -O0 and 1 2 2 at -O3. I'm not sure if this code is legal under restrict semantics in C, but I think it should be legal under the stricter noalias semantics of LLVM.
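The two outputs can be replayed by hand. The following sketch models the two pointers as offsets into one array (a = 1, b = 0, matching test(&ary[1], &ary[0])) and runs the stores once in program order and once in the order left after the GVN step shown earlier. The function names are hypothetical, and this is a simulation of the transformed instruction sequence, not of LLVM itself.

```rust
// Faithful execution order: every load happens right before its store.
// `a` and `b` are offsets into one array, standing in for the two pointers.
fn run_in_order(ary: &mut [i32; 3], a: usize, b: usize) {
    for i in 0..4 {
        ary[b + (i & 1)] = ary[a + (i & 1)];
    }
}

// Order after the GVN step: the third and fourth loads are replaced by the
// values already loaded in the first two iterations.
fn run_after_gvn(ary: &mut [i32; 3], a: usize, b: usize) {
    let x = ary[a];
    ary[b] = x;
    let x1 = ary[a + 1];
    ary[b + 1] = x1;
    // `x` is stale when b + 1 == a, because the store to ary[b + 1]
    // above has overwritten ary[a] in the meantime.
    ary[b] = x;
    ary[b + 1] = x1;
}

fn main() {
    let mut expected = [0, 1, 2];
    run_in_order(&mut expected, 1, 0);

    let mut miscompiled = [0, 1, 2];
    run_after_gvn(&mut miscompiled, 1, 0);

    println!("{:?} vs {:?}", expected, miscompiled);
}
```

With these inputs the in-order version ends at 2 2 2 and the reordered version at 1 2 2, matching the -O0 and -O3 outputs quoted above.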

comex commented Oct 13, 2018

I reduced it to a simple C test case (compile at -O3 and -O0 and compare output):

#include <stdlib.h>
#include <stdio.h>
#include <assert.h>

__attribute__((always_inline))
static inline void copy(int *restrict a, int *restrict b) {
    assert(a != b);
    *b = *a;
    *a = 7;
}

__attribute__((noinline))
void floppy(int mat[static 2], size_t idxs[static 3]) {
    for (int i = 0; i < 3; i++) {
        copy(&mat[i%2], &mat[idxs[i]]);
    }
}

int main() {
    int mat[3] = {10, 20};
    size_t idxs[3] = {1, 0, 1};
    floppy(mat, idxs);
    printf("%d %d\n", mat[0], mat[1]);
}

Note that if you remove restrict, the C equivalent of noalias, behavior is correct. Yet even then, the assert(a != b) passes, proving that no UB can occur due to calling it with restrict.

What's happening is:

  1. copy() gets inlined, resulting in something like:
for (int i = 0; i < 3; i++) {
    mat[idxs[i]] = mat[i%2]; mat[i%2] = 7;
}
  2. LLVM unrolls the loop:
mat[idxs[0]] = mat[0]; mat[0] = 7; /* from copy(&mat[0%2], &mat[idxs[0]]) */
mat[idxs[1]] = mat[1]; mat[1] = 7; /* from copy(&mat[1%2], &mat[idxs[1]]) */
mat[idxs[2]] = mat[0]; mat[0] = 7; /* from copy(&mat[2%2], &mat[idxs[2]]) */
  3. LLVM thinks mat[0] cannot alias with mat[idxs[1]] or mat[1], ergo it cannot have been changed between mat[0] = 7; and mat[idxs[2]] = mat[0];, ergo it's safe for global value numbering to optimize the latter to mat[idxs[2]] = 7;.
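The three steps above can be replayed with a small simulation. It uses array indices in place of pointers; `floppy_in_order` and `floppy_after_gvn` are hypothetical names, and the second function simply hard-codes the unrolled sequence with the final load replaced by the constant 7, as step 3 describes.

```rust
// Index-based stand-in for `copy`: `a` and `b` are positions in `mat`.
fn copy(mat: &mut [i32; 3], a: usize, b: usize) {
    assert!(a != b); // mirrors assert(a != b) in the C version
    mat[b] = mat[a];
    mat[a] = 7;
}

// What the source program asks for.
fn floppy_in_order(mat: &mut [i32; 3], idxs: &[usize; 3]) {
    for i in 0..3 {
        copy(mat, i % 2, idxs[i]);
    }
}

// The unrolled loop after the value forwarding from step 3.
fn floppy_after_gvn(mat: &mut [i32; 3], idxs: &[usize; 3]) {
    // Iterations 0 and 1, unchanged.
    mat[idxs[0]] = mat[0];
    mat[0] = 7;
    mat[idxs[1]] = mat[1];
    mat[1] = 7;
    // Iteration 2 with the load of mat[0] replaced by the constant 7.
    mat[idxs[2]] = 7;
    mat[0] = 7;
}

fn main() {
    let idxs = [1usize, 0, 1];

    let mut expected = [10, 20, 0];
    floppy_in_order(&mut expected, &idxs);

    let mut miscompiled = [10, 20, 0];
    floppy_after_gvn(&mut miscompiled, &idxs);

    println!("{} {}", expected[0], expected[1]);
    println!("{} {}", miscompiled[0], miscompiled[1]);
}
```

The in-order run never trips the assertion and ends with 7 10; the forwarded version ends with 7 7, because the store through mat[idxs[1]] changed mat[0] to 10 before the third iteration read it.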

But mat[0] does alias with mat[idxs[1]], because idxs[1] == 0. And we did not promise it wouldn't, because on the second iteration when &mat[idxs[1]] is passed to copy, the other argument is &mat[1]. So why does LLVM think it can't?

Well, it has to do with the way copy is inlined. The noalias function attribute is turned into !alias.scope and !noalias metadata on the load and store instructions, like:

  %8 = load i32, i32* %0, align 4, !tbaa !8, !alias.scope !10, !noalias !13
  store i32 %8, i32* %7, align 4, !tbaa !8, !alias.scope !13, !noalias !10
  store i32 7, i32* %0, align 4, !tbaa !8, !alias.scope !10, !noalias !13

Normally, if a function is inlined multiple times, each copy gets its own unique IDs for alias.scope and noalias, indicating that each call represents its own 'inequality' relationship* between the pair of arguments marked noalias (restrict at C level), which may have different values for each call.

However, in this case, first the function is inlined into the loop, then the inlined code is duplicated when the loop is unrolled – and this duplication does not change the IDs. Because of this, LLVM thinks none of the a's can alias with any of the b's, which is false, because a from the first and third calls aliases with b from the second call (all pointing to &mat[0]).
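To make the ID problem concrete, here is a toy model, not LLVM's actual data structures. Each access carries one scope ID and one noalias ID, whereas real metadata uses sets of scopes grouped into domains. The model counts how many pairs the metadata declares disjoint even though they touch the same address, first with the IDs shared across all unrolled copies and then with a hypothetical fresh pair of IDs per copy. All type and function names are illustrative, and the ID values are arbitrary apart from reusing 10 and 13 from the IR above.

```rust
// Toy model of one memory access. `scope` and `noalias` stand in for the
// !alias.scope and !noalias metadata IDs (simplified to one ID each).
#[derive(Clone, Copy)]
struct Access {
    addr: usize,  // the element of `mat` actually touched at run time
    scope: u32,   // scope this access belongs to
    noalias: u32, // scope this access is declared not to alias
}

// True if the metadata alone lets the optimizer assume x and y are disjoint.
fn claims_no_alias(x: Access, y: Access) -> bool {
    x.noalias == y.scope || y.noalias == x.scope
}

// Builds the accesses through `a` and `b` for the three unrolled calls of
// copy(&mat[i % 2], &mat[idxs[i]]). `ids(i)` picks the scope IDs for call i.
fn unrolled(ids: impl Fn(u32) -> (u32, u32)) -> Vec<Access> {
    let idxs = [1usize, 0, 1];
    let mut out = Vec::new();
    for i in 0..3u32 {
        let (scope_a, scope_b) = ids(i);
        out.push(Access { addr: (i % 2) as usize, scope: scope_a, noalias: scope_b });
        out.push(Access { addr: idxs[i as usize], scope: scope_b, noalias: scope_a });
    }
    out
}

// Counts pairs that the metadata calls disjoint although the addresses match.
fn false_claims(accesses: &[Access]) -> usize {
    let mut n = 0;
    for (i, x) in accesses.iter().enumerate() {
        for y in &accesses[i + 1..] {
            if claims_no_alias(*x, *y) && x.addr == y.addr {
                n += 1;
            }
        }
    }
    n
}

fn main() {
    // Unrolling as described above: every copy reuses IDs 10 and 13.
    let shared = unrolled(|_| (10, 13));
    // Hypothetical alternative: each unrolled copy gets its own pair of IDs.
    let fresh = unrolled(|i| (10 + 2 * i, 11 + 2 * i));
    println!("shared: {}, fresh: {}", false_claims(&shared), false_claims(&fresh));
}
```

In this model the shared IDs produce four false disjointness claims, all between an `a` from one iteration and a `b` from another, while per-copy IDs produce none, since within a single call the two addresses always differ.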

Amazingly, GCC also miscompiles this, with different output. (clang and GCC at -O0 both output 7 10; clang at -O3 outputs 7 7; GCC at -O3 outputs 10 7.) Uh, I really hope I didn't screw something up and add UB after all, but I don't see how...

* It's a bit more complicated than that, but in this case, since copy does not use any pointer arithmetic and writes to both pointers, the inequality a != b is necessary and sufficient for a call not to be UB.

comex commented Oct 13, 2018

Heh, looks like I raced with @nikic to find the same explanation. Their test case is slightly nicer :)

nikic commented Oct 13, 2018

That's some really great timing ^^ We reached the same conclusion with nearly the same reduced test case at the same time :)

To fix this, probably something along the lines of https://github.com/llvm-mirror/llvm/blob/54d4881c352796b18bfe7314662a294754e3a752/lib/Transforms/Utils/InlineFunction.cpp#L801 also needs to be done in LoopUnrollPass.

nikic commented Oct 13, 2018

I've submitted an LLVM bug report for this issue at https://bugs.llvm.org/show_bug.cgi?id=39282.

comex commented Oct 14, 2018

And – just mentioning this for completeness – I submitted a bug report to GCC since it also miscompiled my C test case: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87609
