
At some point llvm re-added pavgw intrinsics #6302

Merged: 2 commits merged into master on Oct 10, 2021

Conversation

abadams (Member) commented Oct 8, 2021

This is a good thing, because these do not reliably trigger from the pattern in runtime/x86.ll

@abadams abadams requested a review from dsharletg October 8, 2021 21:30
dsharletg (Contributor) left a comment

Do we know what version of LLVM these came back in? I guess we can just wait and see what the build bots say.

@@ -30,28 +30,6 @@ define weak_odr <8 x i16> @psubuswx8(<8 x i16> %a0, <8 x i16> %a1) nounwind alwa
ret <8 x i16> %3
}

; Note that this is only used for LLVM 6.0+
dsharletg (Contributor):

Looks like there are also some things in x86_avx2.ll that should be deleted?

abadams (Member, Author):

x86_avx.ll had some. Deleted.

abadams (Member, Author) commented Oct 8, 2021

That was my plan. I'll see what the bots say.

LebedevRI (Contributor):

> This is a good thing, because these do not reliably trigger from the pattern in runtime/x86.ll

What do you mean by "reliably"? After inlining? Any particular problematic snippets?

abadams (Member, Author) commented Oct 10, 2021

When multiple averaging operations are combined together into a tree, I wasn't seeing as many pavgw instructions in the generated code as expected. This PR fixes that, and makes Halide generate many fewer instructions for averaging trees. In general, whenever LLVM removes an intrinsic and replaces it with pattern matching, instruction selection gets worse in any case where that pattern contains instructions that can be folded with surrounding ones.

This is also true of Halide in the places where we rely on pattern matching. You can emit the perfect Expr for a particular instruction, and it compiles to that instruction if that's the only thing the Func is doing, but in a more complex expression, as soon as the compiler sees a chance to do some CSE or simplification, things can go south. Sadly, pattern-based instruction selection is inherently brittle. I'll go dig up the LLVM IR that was causing the problem ...
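
For reference, a minimal sketch of the easy case described above, where a single rounding average is the whole Func and does select to pavgw. The file names, function name, and target string below are illustrative, not taken from this PR:

#include "Halide.h"
using namespace Halide;

int main() {
    ImageParam a(UInt(16), 1), b(UInt(16), 1);
    Var x("x");

    Func f("f");
    // rounding_halving_add(a, b) == (a + b + 1) >> 1, computed without overflow.
    // When this is the only thing the Func does, it selects to pavgw/vpavgw on x86.
    f(x) = Internal::rounding_halving_add(a(x), b(x));
    f.vectorize(x, 32);

    // Illustrative target; any x86 target with the relevant SIMD feature will do.
    Target t("x86-64-linux-avx512_skylake");
    f.compile_to_assembly("avg.s", {a, b}, "avg", t);
    return 0;
}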

abadams (Member, Author) commented Oct 10, 2021

Halide Expr:


Expr avg_u(Expr a, Expr b) {
    // should lower to pavgw on x86
    return Internal::rounding_halving_add(a, b);
}

Expr avg_d(Expr a, Expr b) {
    // lowers to (a & b) + ((a ^ b) >> 1) on x86
    return Internal::halving_add(a, b);
}

...
    Expr v4 = avg_d(v0, v1); 
    Expr v5 = avg_u(v0, v1); 
    Expr v6 = avg_u(v2, v3); 
    Expr v7 = avg_u(v4, v6); 
    Expr v8 = avg_d(v5, v7); 
    return v8;
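
For reference, the scalar semantics of these two ops, written as a plain C++ sketch rather than Halide code (the *_ref names are mine, not Halide's):

#include <cstdint>

// avg_u / rounding_halving_add: (a + b + 1) >> 1, i.e. the pavgw semantics.
// Doing the addition at 32 bits means it cannot overflow 16 bits.
uint16_t avg_u_ref(uint16_t a, uint16_t b) {
    return (uint16_t)((uint32_t(a) + uint32_t(b) + 1) >> 1);
}

// avg_d / halving_add: (a + b) >> 1. x86 has no truncating-average
// instruction, so it is lowered with the overflow-free identity
// (a & b) + ((a ^ b) >> 1), which is what the and/xor/lshr/add
// sequences in the IR below compute.
uint16_t avg_d_ref(uint16_t a, uint16_t b) {
    return (uint16_t)((uint32_t(a) + uint32_t(b)) >> 1);
}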

Halide Stmt:

  let t110 = p2[ramp((((k1133$0.s0.v0.v0 + t114)*32) - p2.min.0) + 2, 1, 32)]
  let t111 = p2[ramp((((k1133$0.s0.v0.v0 + t114)*32) - p2.min.0) + 3, 1, 32)]
  let t112 = k1133$0.s0.v0.v0 + t114
  let t113 = (t112*32) - p2.min.0
  k1133$0[ramp((t112*32) - k1133$0.min.0, 1, 32)] = (uint16x32)halving_add((uint16x32)rounding_halving_add(t110, t111), (uint16x32)rounding_halving_add((uint16x32)halving_add(t110, t111), (uint16x32)rounding_halving_add(p2[ramp(t113, 1, 32)], p2[ramp(t113 + 1, 1, 32)])))

LLVM IR before this PR. pavgw is lowered as i16((i32(a) + i32(b) + 1) >> 1):

 %indvars.iv = phi i64 [ 0, %"for k1133$0.s0.v0.v0.preheader" ], [ %indvars.iv.next, %"for k1133$0.s0.v0.v0" ]
  %"k1133$0.s0.v0.v0" = phi i32 [ 0, %"for k1133$0.s0.v0.v0.preheader" ], [ %91, %"for k1133$0.s0.v0.v0" ]
  %18 = add nsw i64 %indvars.iv, %14
  %19 = shl nsw i64 %18, 5
  %20 = sub nsw i64 %19, %15
  %21 = getelementptr inbounds i16, i16* %6, i64 %20
  %22 = getelementptr inbounds i16, i16* %21, i64 2
  %23 = bitcast i16* %22 to <32 x i16>*
  %t110 = load <32 x i16>, <32 x i16>* %23, align 2, !tbaa !25
  %24 = getelementptr inbounds i16, i16* %21, i64 3
  %25 = bitcast i16* %24 to <32 x i16>*
  %t111 = load <32 x i16>, <32 x i16>* %25, align 2, !tbaa !25
  %t112 = add nsw i32 %"k1133$0.s0.v0.v0", %12
  %26 = shl nsw i32 %t112, 5
  %t113 = sub nsw i32 %26, %8
  %27 = shufflevector <32 x i16> %t110, <32 x i16> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
  %28 = shufflevector <32 x i16> %t111, <32 x i16> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
  %29 = zext <16 x i16> %27 to <16 x i32>
  %30 = zext <16 x i16> %28 to <16 x i32>
  %31 = add nuw nsw <16 x i32> %29, <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
  %32 = add nuw nsw <16 x i32> %31, %30
  %33 = lshr <16 x i32> %32, <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
  %34 = trunc <16 x i32> %33 to <16 x i16>
  %35 = shufflevector <32 x i16> %t110, <32 x i16> undef, <16 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
  %36 = shufflevector <32 x i16> %t111, <32 x i16> undef, <16 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
  %37 = zext <16 x i16> %35 to <16 x i32>
  %38 = zext <16 x i16> %36 to <16 x i32>
  %39 = add nuw nsw <16 x i32> %37, <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
  %40 = add nuw nsw <16 x i32> %39, %38
  %41 = lshr <16 x i32> %40, <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
  %42 = trunc <16 x i32> %41 to <16 x i16>
  %43 = shufflevector <16 x i16> %34, <16 x i16> %42, <32 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
  %44 = and <32 x i16> %t111, %t110
  %45 = xor <32 x i16> %t111, %t110
  %46 = lshr <32 x i16> %45, <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>
  %47 = add <32 x i16> %46, %44
  %48 = sext i32 %t113 to i64
  %49 = getelementptr inbounds i16, i16* %6, i64 %48
  %50 = bitcast i16* %49 to <32 x i16>*
  %51 = load <32 x i16>, <32 x i16>* %50, align 2, !tbaa !25
  %52 = getelementptr inbounds i16, i16* %49, i64 1
  %53 = bitcast i16* %52 to <32 x i16>*
  %54 = load <32 x i16>, <32 x i16>* %53, align 2, !tbaa !25
  %55 = shufflevector <32 x i16> %51, <32 x i16> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
  %56 = shufflevector <32 x i16> %54, <32 x i16> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
  %57 = zext <16 x i16> %55 to <16 x i32>
  %58 = zext <16 x i16> %56 to <16 x i32>
  %59 = add nuw nsw <16 x i32> %57, <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
  %60 = add nuw nsw <16 x i32> %59, %58
  %61 = lshr <16 x i32> %60, <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
  %62 = shufflevector <32 x i16> %51, <32 x i16> undef, <16 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
  %63 = shufflevector <32 x i16> %54, <32 x i16> undef, <16 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
  %64 = zext <16 x i16> %62 to <16 x i32>
  %65 = zext <16 x i16> %63 to <16 x i32>
  %66 = add nuw nsw <16 x i32> %64, <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
  %67 = add nuw nsw <16 x i32> %66, %65
  %68 = lshr <16 x i32> %67, <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
  %69 = shufflevector <32 x i16> %47, <32 x i16> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
  %70 = zext <16 x i16> %69 to <16 x i32>
  %71 = and <16 x i32> %61, <i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535>
  %72 = add nuw nsw <16 x i32> %70, <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
  %73 = add nuw nsw <16 x i32> %72, %71
  %74 = lshr <16 x i32> %73, <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
  %75 = trunc <16 x i32> %74 to <16 x i16>
  %76 = shufflevector <32 x i16> %47, <32 x i16> undef, <16 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
  %77 = zext <16 x i16> %76 to <16 x i32>
  %78 = and <16 x i32> %68, <i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535>
  %79 = add nuw nsw <16 x i32> %77, <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
  %80 = add nuw nsw <16 x i32> %79, %78
  %81 = lshr <16 x i32> %80, <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
  %82 = trunc <16 x i32> %81 to <16 x i16>
  %83 = shufflevector <16 x i16> %75, <16 x i16> %82, <32 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
  %84 = and <32 x i16> %83, %43
  %85 = xor <32 x i16> %83, %43
  %86 = lshr <32 x i16> %85, <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>
  %87 = add <32 x i16> %86, %84
  %88 = sub nsw i64 %19, %16
  %89 = getelementptr inbounds i16, i16* %1, i64 %88
  %90 = bitcast i16* %89 to <32 x i16>*
  store <32 x i16> %87, <32 x i16>* %90, align 2, !tbaa !28
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %91 = add nuw nsw i32 %"k1133$0.s0.v0.v0", 1
  %.not = icmp eq i64 %indvars.iv.next, %17
  br i1 %.not, label %destructor_block, label %"for k1133$0.s0.v0.v0"

LLVM IR after this PR, with direct use of the intrinsics:

  %indvars.iv = phi i64 [ 0, %"for k1133$0.s0.v0.v0.preheader" ], [ %indvars.iv.next, %"for k1133$0.s0.v0.v0" ]
  %"k1133$0.s0.v0.v0" = phi i32 [ 0, %"for k1133$0.s0.v0.v0.preheader" ], [ %48, %"for k1133$0.s0.v0.v0" ]
  %18 = add nsw i64 %indvars.iv, %14
  %19 = shl nsw i64 %18, 5
  %20 = sub nsw i64 %19, %15
  %21 = getelementptr inbounds i16, i16* %6, i64 %20
  %22 = getelementptr inbounds i16, i16* %21, i64 2
  %23 = bitcast i16* %22 to <32 x i16>*
  %t110 = load <32 x i16>, <32 x i16>* %23, align 2, !tbaa !25
  %24 = getelementptr inbounds i16, i16* %21, i64 3
  %25 = bitcast i16* %24 to <32 x i16>*
  %t111 = load <32 x i16>, <32 x i16>* %25, align 2, !tbaa !25
  %t112 = add nsw i32 %"k1133$0.s0.v0.v0", %12
  %26 = shl nsw i32 %t112, 5
  %t113 = sub nsw i32 %26, %8
  %27 = tail call <32 x i16> @llvm.x86.avx512.pavg.w.512(<32 x i16> %t110, <32 x i16> %t111)
  %28 = and <32 x i16> %t111, %t110
  %29 = xor <32 x i16> %t111, %t110
  %30 = lshr <32 x i16> %29, <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>
  %31 = add <32 x i16> %30, %28
  %32 = sext i32 %t113 to i64
  %33 = getelementptr inbounds i16, i16* %6, i64 %32
  %34 = bitcast i16* %33 to <32 x i16>*
  %35 = load <32 x i16>, <32 x i16>* %34, align 2, !tbaa !25
  %36 = getelementptr inbounds i16, i16* %33, i64 1
  %37 = bitcast i16* %36 to <32 x i16>*
  %38 = load <32 x i16>, <32 x i16>* %37, align 2, !tbaa !25
  %39 = tail call <32 x i16> @llvm.x86.avx512.pavg.w.512(<32 x i16> %35, <32 x i16> %38)
  %40 = tail call <32 x i16> @llvm.x86.avx512.pavg.w.512(<32 x i16> %31, <32 x i16> %39)
  %41 = and <32 x i16> %40, %27
  %42 = xor <32 x i16> %40, %27
  %43 = lshr <32 x i16> %42, <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>
  %44 = add <32 x i16> %43, %41
  %45 = sub nsw i64 %19, %16
  %46 = getelementptr inbounds i16, i16* %1, i64 %45
  %47 = bitcast i16* %46 to <32 x i16>*
  store <32 x i16> %44, <32 x i16>* %47, align 2, !tbaa !28
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %48 = add nuw nsw i32 %"k1133$0.s0.v0.v0", 1
  %.not = icmp eq i64 %indvars.iv.next, %17
  br i1 %.not, label %destructor_block, label %"for k1133$0.s0.v0.v0"

Assembly before. God knows what has gone wrong here. I see two pavgw instructions, but also lots of 32-bit integer math.

	vmovdqu64	-2(%rdx,%rdi), %zmm2
	vmovdqu64	(%rdx,%rdi), %zmm3
	vmovdqu	(%rdx,%rdi), %ymm4
	vpavgw	-2(%rdx,%rdi), %ymm4, %ymm4
	vmovdqu	32(%rdx,%rdi), %ymm5
	vpavgw	30(%rdx,%rdi), %ymm5, %ymm5
	vinserti64x4	$1, %ymm5, %zmm4, %zmm4
	vpandq	%zmm2, %zmm3, %zmm5
	vpxorq	%zmm2, %zmm3, %zmm2
	vpsrlw	$1, %zmm2, %zmm2
	vpaddw	%zmm5, %zmm2, %zmm2
	vpmovzxwd	-2(%rsi,%rdi), %zmm3   
	vpmovzxwd	(%rsi,%rdi), %zmm5    
	vpaddd	%zmm5, %zmm3, %zmm3
	vpsubd	%zmm0, %zmm3, %zmm3
	vpmovzxwd	30(%rsi,%rdi), %zmm5   
	vpsrld	$1, %zmm3, %zmm3
	vpmovzxwd	32(%rsi,%rdi), %zmm6  
	vpaddd	%zmm6, %zmm5, %zmm5
	vpsubd	%zmm0, %zmm5, %zmm5
	vpmovzxwd	%ymm2, %zmm6           
	vpandq	%zmm1, %zmm3, %zmm3
	vpaddd	%zmm3, %zmm6, %zmm3
	vpsubd	%zmm0, %zmm3, %zmm3
	vpsrld	$1, %zmm3, %zmm3
	vpmovdw	%zmm3, %ymm3
	vpsrld	$1, %zmm5, %zmm5
	vextracti64x4	$1, %zmm2, %ymm2
	vpmovzxwd	%ymm2, %zmm2          
	vpandq	%zmm1, %zmm5, %zmm5
	vpaddd	%zmm5, %zmm2, %zmm2
	vpsubd	%zmm0, %zmm2, %zmm2
	vpsrld	$1, %zmm2, %zmm2
	vpmovdw	%zmm2, %ymm2
	vinserti64x4	$1, %ymm2, %zmm3, %zmm2
	vpandq	%zmm4, %zmm2, %zmm3
	vpxorq	%zmm4, %zmm2, %zmm2
	vpsrlw	$1, %zmm2, %zmm2
	vpaddw	%zmm3, %zmm2, %zmm2
	vmovdqu64	%zmm2, (%rcx,%rdi)

Assembly after using intrinsics directly:

        vmovdqu64	-2(%rdx,%rdi), %zmm0
	vmovdqu64	(%rdx,%rdi), %zmm1
	vpavgw	%zmm1, %zmm0, %zmm2
	vpandq	%zmm0, %zmm1, %zmm3
	vpxorq	%zmm0, %zmm1, %zmm0
	vpsrlw	$1, %zmm0, %zmm0
	vmovdqu64	-2(%rsi,%rdi), %zmm1
	vpaddw	%zmm3, %zmm0, %zmm0
	vpavgw	(%rsi,%rdi), %zmm1, %zmm1
	vpavgw	%zmm1, %zmm0, %zmm0
	vpandq	%zmm2, %zmm0, %zmm1
	vpxorq	%zmm2, %zmm0, %zmm0
	vpsrlw	$1, %zmm0, %zmm0
	vpaddw	%zmm1, %zmm0, %zmm0
	vmovdqu64	%zmm0, (%rcx,%rdi)

@abadams abadams merged commit 2a2c4b0 into master Oct 10, 2021
LebedevRI (Contributor) commented Oct 10, 2021

Thanks!

> Assembly before. God knows what has gone wrong here. I see two pavgw instructions, but also lots of 32-bit integer math.

(on godbolt: https://godbolt.org/z/T63WE4xcn)

abadams (Member, Author) commented Oct 11, 2021

Also being fixed upstream:
https://reviews.llvm.org/D111571

Thanks @LebedevRI !

I'll leave this PR as-is, so that we have two chances to catch pavgw. If we miss it in our pattern matching, hopefully LLVM will catch it for us.

LebedevRI added a commit to llvm/llvm-project that referenced this pull request Oct 12, 2021
…131)

As noted in halide/Halide#6302,
we hilariously fail to match PAVG if we even as much
as look at it the wrong way.

In this particular case, the problem stems from the fact that
`PAVG` root (def) is a `trunc`, and leafs (uses) are `zext`'s,
and InstCombine really loves to get rid of both of these,
for example replace them with a bit mask. So we may not have
said `zext`.

Instead of checking for that + type match,
i think we should rely on the actual active type,
as per the knownbits.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111571
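
In scalar terms (an illustrative C++ sketch, not code from either repository), the two equivalent forms the matcher has to cope with look roughly like this:

#include <cstdint>

// The shape the matcher originally looked for: explicit zexts feeding the
// widened add, and a trunc on the result.
uint16_t avg_with_zext(uint16_t a, uint16_t b) {
    uint32_t wa = a, wb = b;                 // "zext"
    return (uint16_t)((wa + wb + 1) >> 1);   // "trunc"
}

// The shape InstCombine can leave behind: the zext has been folded into a
// mask on an already-wide value. The operands still have only 16 active
// bits, which is what the knownbits-based matching relies on.
uint16_t avg_with_mask(uint32_t a32, uint32_t b32) {
    uint32_t wa = a32 & 0xFFFF, wb = b32 & 0xFFFF;   // "and" instead of "zext"
    return (uint16_t)((wa + wb + 1) >> 1);
}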
mem-frob pushed a commit to draperlaboratory/hope-llvm-project that referenced this pull request Oct 7, 2022