Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect instcombine fold of vector ult -> sgt #45299

Open
nunoplopes opened this issue May 17, 2020 · 3 comments
Open

Incorrect instcombine fold of vector ult -> sgt #45299

nunoplopes opened this issue May 17, 2020 · 3 comments
Labels

Comments

@nunoplopes
Copy link
Member

nunoplopes commented May 17, 2020

Bugzilla Link 45954
Version trunk
OS All
CC @aqjune,@LebedevRI,@RKSimon,@regehr,@rotateright

Extended Description

Test: Transforms/InstCombine/canonicalize-clamp-like-pattern-between-zero-and-positive-threshold.ll
Summary: There's a off-by-one in the transformation below that makes the optimize code return the wrong value.
The original code can return either %x or %replacement_low, while the optimized code returns %replacement_high for %x[1] == 65536.

define <3 x i32> @​t20_ult_slt_vec_undef1(<3 x i32> %x, <3 x i32> %replacement_low, <3 x i32> %replacement_high) {
%t0 = icmp slt <3 x i32> %x, { 65536, 65537, 65536 }
%t1 = select <3 x i1> %t0, <3 x i32> %replacement_low, <3 x i32> %replacement_high
%t2 = icmp ult <3 x i32> %x, { 65536, undef, 65536 }
%r = select <3 x i1> %t2, <3 x i32> %x, <3 x i32> %t1
ret <3 x i32> %r
}
=>
define <3 x i32> @​t20_ult_slt_vec_undef1(<3 x i32> %x, <3 x i32> %replacement_low, <3 x i32> %replacement_high) {
%1 = icmp slt <3 x i32> %x, { 0, 0, 0 }
%2 = icmp sgt <3 x i32> %x, { 65535, 65535, 65535 }
%3 = select <3 x i1> %1, <3 x i32> %replacement_low, <3 x i32> %x
%r = select <3 x i1> %2, <3 x i32> %replacement_high, <3 x i32> %3
ret <3 x i32> %r
}
Transformation doesn't verify!
ERROR: Value mismatch

Example:
<3 x i32> %x = < #x0000ffff (65535), #x00010000 (65536), #x08000000 (134217728) >
<3 x i32> %replacement_low = < *, #x00000000 (0), * >
<3 x i32> %replacement_high = < *, #x00001000 (4096), #x0000ffff (65535) >

Source:
<3 x i1> %t0 = < #x1 (1), #x1 (1), #x0 (0) >
<3 x i32> %t1 = < *, #x00000000 (0), #x0000ffff (65535) >
<3 x i1> %t2 = < #x1 (1), undef, #x0 (0) >
<3 x i32> %r = < #x0000ffff (65535), #x00000000 (0) [based on undef value], #x0000ffff (65535) >

Target:
<3 x i1> %1 = < #x0 (0), #x0 (0), #x0 (0) >
<3 x i1> %2 = < #x0 (0), #x1 (1), #x1 (1) >
<3 x i32> %3 = < #x0000ffff (65535), #x00010000 (65536), #x08000000 (134217728) >
<3 x i32> %r = < #x0000ffff (65535), #x00001000 (4096), #x0000ffff (65535) >
Source value: < #x0000ffff (65535), #x00000000 (0), #x0000ffff (65535) >
Target value: < #x0000ffff (65535), #x00001000 (4096), #x0000ffff (65535) >

https://web.ist.utl.pt/nuno.lopes/alive2/index.php?hash=72296a443c683892&test=Transforms%2FInstCombine%2Fcanonicalize-clamp-like-pattern-between-zero-and-positive-threshold.ll

@LebedevRI
Copy link
Member

LebedevRI commented May 17, 2020

Hmm.
But then why @​t19_ult_slt_vec_undef2 is fine?
There undef's could be 65537 and 65536, and we'd have the same issue?

@LebedevRI
Copy link
Member

LebedevRI commented May 17, 2020

Hmm.
But then why @​t19_ult_slt_vec_undef2 is fine?
@​t21_ult_slt_vec_undef2 that is of course.

There undef's could be 65537 and 65536, and we'd have the same issue?

@aqjune
Copy link
Contributor

aqjune commented Sep 3, 2020

Hi,

It is because the transformation is fine if we can choose a value for undef in source that explains it.
We can choose the two undefs to be 65536 and 65536, and then the transformation is correct, so it is okay.

Another example:

// you can assume that y is poison, because undef can be chosen to be NaN.
y = fadd nnan x, undef
use(y)
=>
use(poison)

On the contrary, if the target has undef, it becomes tough.

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants