Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Miscompile with LoopVectorizer: Incorrect code with disjoint or #81872

Closed
annamthomas opened this issue Feb 15, 2024 · 2 comments · Fixed by #83821
Closed

Miscompile with LoopVectorizer: Incorrect code with disjoint or #81872

annamthomas opened this issue Feb 15, 2024 · 2 comments · Fixed by #83821

Comments

@annamthomas
Copy link
Contributor

This latent bug was exposed through d77067d (improvement of known bits through dominating conditions).

Given this source code snippet:

lArr = new long[N]; // initialized to 0. 
for (iv = 99; iv >= 90; --iv) {
    int tmp9 = (iv % 2); 
    if (tmp9 == 0) {
      int tmp7 = (iv + 1); 
      lArr[tmp7] = 1;
    }
}
print(lArr[99]);

This should mean lArr[99] is 1 after the loop (it is set when iv is 98). With known bits improved by dominating conditions, we know that we can convert: tmp7 = add iv, 1 into tmp7 = or disjoint iv, 1 (since iv is known divisible by 2 at that point).

When we vectorize this IR, we incorrectly vectorize the code:

Before vectorization:

bb15:                                             ; preds = %bb20, %bb8
  %iv = phi i64 [ 99, %bb8 ], [ %iv.next, %bb20 ]
  %and = and i64 %iv, 1
  %icmp17 = icmp eq i64 %and, 0
  br i1 %icmp17, label %bb18, label %bb20, !prof !21

bb18:                                             ; preds = %bb15
  %or = or disjoint i64 %iv, 1
  %getelementptr19 = getelementptr inbounds i64, ptr addrspace(1) %getelementptr, i64 %or
  store i64 1, ptr addrspace(1) %getelementptr19, align 8
  br label %bb20

bb20:                                             ; preds = %bb18, %bb15
  %iv.next = add nsw i64 %iv, -1
  %icmp22 = icmp eq i64 %iv.next, 90
  br i1 %icmp22, label %bb6, label %bb15, !prof !22

After vectorization:

vector.body:                                      ; preds = %vector.body, %vector.ph
  %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
  %vec.ind = phi <4 x i64> [ <i64 99, i64 98, i64 97, i64 96>, %vector.ph ], [ %vec.ind.next, %vector.body ]
  %offset.idx = sub i64 99, %index
  %0 = add i64 %offset.idx, 0
  %broadcast.splatinsert = insertelement <4 x i64> poison, i64 %index, i64 0
  %broadcast.splat = shufflevector <4 x i64> %broadcast.splatinsert, <4 x i64> poison, <4 x i32> zeroinitializer
  %vec.iv = add <4 x i64> %broadcast.splat, <i64 0, i64 1, i64 2, i64 3>
  %1 = icmp ule <4 x i64> %vec.iv, <i64 8, i64 8, i64 8, i64 8>
  %2 = and <4 x i64> %vec.ind, <i64 1, i64 1, i64 1, i64 1>
  %3 = icmp eq <4 x i64> %2, zeroinitializer
  %4 = select <4 x i1> %1, <4 x i1> %3, <4 x i1> zeroinitializer
  %5 = or i64 %0, 1
  %6 = getelementptr i64, ptr addrspace(1) %getelementptr, i64 %5
  %7 = getelementptr i64, ptr addrspace(1) %6, i32 0
  %8 = getelementptr i64, ptr addrspace(1) %7, i32 -3
  %reverse = shufflevector <4 x i1> %4, <4 x i1> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
  call void @llvm.masked.store.v4i64.p1(<4 x i64> <i64 1, i64 1, i64 1, i64 1>, ptr addrspace(1) %8, i32 8, <4 x i1> %reverse)
  %index.next = add i64 %index, 4
  %vec.ind.next = add <4 x i64> %vec.ind, <i64 -4, i64 -4, i64 -4, i64 -4>
  %9 = icmp eq i64 %index.next, 12
  br i1 %9, label %middle.block, label %vector.body, !prof !3, !llvm.loop !4

Complete snippet transformation here: https://godbolt.org/z/Kvq1zerTs

99 disjoint 1 =  99
The array is in first iteration becomes:
a[96, 97, 98, 99] <— 1,0,1,0

Which makes a[99] as 0.

Before vectorization, we only did the store if iv was divisible by 2.

@annamthomas
Copy link
Contributor Author

annamthomas commented Feb 15, 2024

@fhahn I'm not sure what the right fix here is. It looks like we should not vectorize this case. Note that if we convert the disjoint or to an add before vectorization, we would still vectorize it (and have the correct answer for a[99]) :
basically: a[97, 98, 99, 100] <- 1,0,1,0. Since it is a masked store, we do not access a[100] anyway.

@dtcxzyw dtcxzyw linked a pull request Feb 16, 2024 that will close this issue
annamthomas added a commit to annamthomas/llvm-project that referenced this issue Mar 11, 2024
annamthomas added a commit to annamthomas/llvm-project that referenced this issue Mar 11, 2024
This testcase was added to show miscompile in
llvm#81872
annamthomas added a commit to annamthomas/llvm-project that referenced this issue Mar 11, 2024
This testcase was added to show miscompile in
llvm#81872
annamthomas added a commit that referenced this issue Mar 11, 2024
This testcase was added to show miscompile in
#81872
fhahn added a commit that referenced this issue Mar 19, 2024
…83821)

Dropping disjoint from an OR may yield incorrect results, as some
analysis may have converted it to an Add implicitly (e.g. SCEV used for
dependence analysis). Instead, replace it with an equivalent Add.

This is possible as all users of the disjoint OR only access lanes where
the operands are disjoint or poison otherwise.

Note that replacing all disjoint ORs with ADDs instead of dropping the
flags is not strictly necessary. It is only needed for disjoint ORs that
SCEV treated as ADDs, but those are not tracked.

There are other places that may drop poison-generating flags; those
likely need similar treatment.

Fixes #81872


PR: #83821
chencha3 pushed a commit to chencha3/llvm-project that referenced this issue Mar 23, 2024
…lvm#83821)

Dropping disjoint from an OR may yield incorrect results, as some
analysis may have converted it to an Add implicitly (e.g. SCEV used for
dependence analysis). Instead, replace it with an equivalent Add.

This is possible as all users of the disjoint OR only access lanes where
the operands are disjoint or poison otherwise.

Note that replacing all disjoint ORs with ADDs instead of dropping the
flags is not strictly necessary. It is only needed for disjoint ORs that
SCEV treated as ADDs, but those are not tracked.

There are other places that may drop poison-generating flags; those
likely need similar treatment.

Fixes llvm#81872


PR: llvm#83821
@danilaml
Copy link
Collaborator

This issue is closed as fixed but the fix was reverted.

@danilaml danilaml reopened this Mar 27, 2024
@fhahn fhahn closed this as completed in 6ef8299 Mar 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants