At some point llvm re-added pavgw intrinsics #6302

abadams · 2021-10-08T21:30:16Z

This is a good thing, because these do not reliably trigger from the
pattern in runtime/x86.ll

dsharletg

Do we know what version of LLVM these came back in? I guess we can just wait and see what the build bots say.

dsharletg · 2021-10-08T22:02:18Z

src/runtime/x86.ll

@@ -30,28 +30,6 @@ define weak_odr <8 x i16> @psubuswx8(<8 x i16> %a0, <8 x i16> %a1) nounwind alwa
  ret <8 x i16> %3
 }

-; Note that this is only used for LLVM 6.0+


Looks like there are also some things in x86_avx2.ll that should be deleted?

x86_avx.ll had some. Deleted.

abadams · 2021-10-08T22:04:50Z

That was my plan. I'll see what the bots say.

LebedevRI · 2021-10-10T19:29:00Z

> This is a good thing, because these do not reliably trigger from the pattern in runtime/x86.ll

What do you mean by "reliably"? After inlining? Any particular problematic snippets?

abadams · 2021-10-10T20:53:03Z

When multiple averaging operations are combined into a tree, I wasn't seeing as many pavgw instructions in the generated code as expected. This PR fixes that, and makes Halide generate far fewer ops for averaging trees. In general, whenever LLVM removes an intrinsic and replaces it with pattern matching, instruction selection gets worse if that pattern contains instructions that can be folded with surrounding ones.

This is also true of Halide in the places where we rely on pattern matching. You can emit the perfect Expr for a particular instruction, and it compiles to that instruction if that's the only thing the Func is doing, but in a more complex expression as soon as the compiler sees a chance to do some CSE or simplification, things can go south. Sadly, pattern-based instruction selection is inherently brittle. I'll go dig up the llvm IR that was causing the problem ...

abadams · 2021-10-10T21:05:43Z

Halide Expr:


Expr avg_u(Expr a, Expr b) {
    // should lower to pavgw on x86
    return Internal::rounding_halving_add(a, b);
}

Expr avg_d(Expr a, Expr b) {
    // lowers to (a & b) + ((a ^ b) >> 1) on x86
    return Internal::halving_add(a, b);
}

...
    Expr v4 = avg_d(v0, v1); 
    Expr v5 = avg_u(v0, v1); 
    Expr v6 = avg_u(v2, v3); 
    Expr v7 = avg_u(v4, v6); 
    Expr v8 = avg_d(v5, v7); 
    return v8;
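
For reference, here is a minimal scalar sketch of what the two helpers compute per 16-bit lane. It is not Halide code: the names `avg_u_ref`, `avg_d_ref`, `tree_ref` and `check_avg_d_ref` are hypothetical, and plain `uint16_t` scalars stand in for the vector Exprs. The rounding average uses the widening form that pavgw was being pattern-matched from, and the round-down average uses the and/xor/shift form visible in the IR further down.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar models of the helpers above (illustration only, not Halide API).

// Rounding average, one 16-bit lane of pavgw: widen, add, add 1, shift, narrow.
constexpr uint16_t avg_u_ref(uint16_t a, uint16_t b) {
    return static_cast<uint16_t>(
        (static_cast<uint32_t>(a) + static_cast<uint32_t>(b) + 1u) >> 1);
}

// Round-down average without widening: (a & b) + ((a ^ b) >> 1).
constexpr uint16_t avg_d_ref(uint16_t a, uint16_t b) {
    return static_cast<uint16_t>((a & b) + ((a ^ b) >> 1));
}

// The averaging tree v4..v8 from the snippet above, on scalars.
constexpr uint16_t tree_ref(uint16_t v0, uint16_t v1, uint16_t v2, uint16_t v3) {
    const uint16_t v4 = avg_d_ref(v0, v1);
    const uint16_t v5 = avg_u_ref(v0, v1);
    const uint16_t v6 = avg_u_ref(v2, v3);
    const uint16_t v7 = avg_u_ref(v4, v6);
    return avg_d_ref(v5, v7);
}

// Compile-time spot checks, including the extremes where a 16-bit add would wrap.
static_assert(avg_u_ref(1, 2) == 2, "rounding average rounds up");
static_assert(avg_d_ref(1, 2) == 1, "halving add rounds down");
static_assert(avg_u_ref(65535, 65535) == 65535, "no overflow in the widened form");
static_assert(avg_d_ref(65535, 65535) == 65535, "no overflow in the bit-trick form");

// Runtime spot check: the bit trick agrees with the widened (a + b) >> 1.
inline void check_avg_d_ref() {
    for (uint32_t a = 0; a <= 0xFFFFu; a += 257u) {
        for (uint32_t b = 0; b <= 0xFFFFu; b += 263u) {
            const uint16_t got = avg_d_ref(static_cast<uint16_t>(a),
                                           static_cast<uint16_t>(b));
            assert(static_cast<uint32_t>(got) == ((a + b) >> 1));
        }
    }
}
```

The round-down form never leaves 16 bits, which is presumably why it survives as plain and/xor/shift/add in both IR listings, while the rounding form depends on the backend recognizing the widen/narrow pattern.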

Halide Stmt:

  let t110 = p2[ramp((((k1133$0.s0.v0.v0 + t114)*32) - p2.min.0) + 2, 1, 32)]
  let t111 = p2[ramp((((k1133$0.s0.v0.v0 + t114)*32) - p2.min.0) + 3, 1, 32)]
  let t112 = k1133$0.s0.v0.v0 + t114
  let t113 = (t112*32) - p2.min.0
  k1133$0[ramp((t112*32) - k1133$0.min.0, 1, 32)] = (uint16x32)halving_add((uint16x32)rounding_halving_add(t110, t111), (uint16x32)rounding_halving_add((uint16x32)halving_add(t110, t111), (uint16x32)rounding_halving_add(p2[ramp(t113, 1, 32)], p2[ramp(t113 + 1, 1, 32)])))

LLVM IR before this PR, with pavgw lowered as i16((i32(a) + i32(b) + 1) >> 1):

  %indvars.iv = phi i64 [ 0, %"for k1133$0.s0.v0.v0.preheader" ], [ %indvars.iv.next, %"for k1133$0.s0.v0.v0" ]
  %"k1133$0.s0.v0.v0" = phi i32 [ 0, %"for k1133$0.s0.v0.v0.preheader" ], [ %91, %"for k1133$0.s0.v0.v0" ]
  %18 = add nsw i64 %indvars.iv, %14
  %19 = shl nsw i64 %18, 5
  %20 = sub nsw i64 %19, %15
  %21 = getelementptr inbounds i16, i16* %6, i64 %20
  %22 = getelementptr inbounds i16, i16* %21, i64 2
  %23 = bitcast i16* %22 to <32 x i16>*
  %t110 = load <32 x i16>, <32 x i16>* %23, align 2, !tbaa !25
  %24 = getelementptr inbounds i16, i16* %21, i64 3
  %25 = bitcast i16* %24 to <32 x i16>*
  %t111 = load <32 x i16>, <32 x i16>* %25, align 2, !tbaa !25
  %t112 = add nsw i32 %"k1133$0.s0.v0.v0", %12
  %26 = shl nsw i32 %t112, 5
  %t113 = sub nsw i32 %26, %8
  %27 = shufflevector <32 x i16> %t110, <32 x i16> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
  %28 = shufflevector <32 x i16> %t111, <32 x i16> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
  %29 = zext <16 x i16> %27 to <16 x i32>
  %30 = zext <16 x i16> %28 to <16 x i32>
  %31 = add nuw nsw <16 x i32> %29, <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
  %32 = add nuw nsw <16 x i32> %31, %30
  %33 = lshr <16 x i32> %32, <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
  %34 = trunc <16 x i32> %33 to <16 x i16>
  %35 = shufflevector <32 x i16> %t110, <32 x i16> undef, <16 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
  %36 = shufflevector <32 x i16> %t111, <32 x i16> undef, <16 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
  %37 = zext <16 x i16> %35 to <16 x i32>
  %38 = zext <16 x i16> %36 to <16 x i32>
  %39 = add nuw nsw <16 x i32> %37, <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
  %40 = add nuw nsw <16 x i32> %39, %38
  %41 = lshr <16 x i32> %40, <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
  %42 = trunc <16 x i32> %41 to <16 x i16>
  %43 = shufflevector <16 x i16> %34, <16 x i16> %42, <32 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
  %44 = and <32 x i16> %t111, %t110
  %45 = xor <32 x i16> %t111, %t110
  %46 = lshr <32 x i16> %45, <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>
  %47 = add <32 x i16> %46, %44
  %48 = sext i32 %t113 to i64
  %49 = getelementptr inbounds i16, i16* %6, i64 %48
  %50 = bitcast i16* %49 to <32 x i16>*
  %51 = load <32 x i16>, <32 x i16>* %50, align 2, !tbaa !25
  %52 = getelementptr inbounds i16, i16* %49, i64 1
  %53 = bitcast i16* %52 to <32 x i16>*
  %54 = load <32 x i16>, <32 x i16>* %53, align 2, !tbaa !25
  %55 = shufflevector <32 x i16> %51, <32 x i16> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
  %56 = shufflevector <32 x i16> %54, <32 x i16> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
  %57 = zext <16 x i16> %55 to <16 x i32>
  %58 = zext <16 x i16> %56 to <16 x i32>
  %59 = add nuw nsw <16 x i32> %57, <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
  %60 = add nuw nsw <16 x i32> %59, %58
  %61 = lshr <16 x i32> %60, <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
  %62 = shufflevector <32 x i16> %51, <32 x i16> undef, <16 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
  %63 = shufflevector <32 x i16> %54, <32 x i16> undef, <16 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
  %64 = zext <16 x i16> %62 to <16 x i32>
  %65 = zext <16 x i16> %63 to <16 x i32>
  %66 = add nuw nsw <16 x i32> %64, <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
  %67 = add nuw nsw <16 x i32> %66, %65
  %68 = lshr <16 x i32> %67, <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
  %69 = shufflevector <32 x i16> %47, <32 x i16> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
  %70 = zext <16 x i16> %69 to <16 x i32>
  %71 = and <16 x i32> %61, <i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535>
  %72 = add nuw nsw <16 x i32> %70, <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
  %73 = add nuw nsw <16 x i32> %72, %71
  %74 = lshr <16 x i32> %73, <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
  %75 = trunc <16 x i32> %74 to <16 x i16>
  %76 = shufflevector <32 x i16> %47, <32 x i16> undef, <16 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
  %77 = zext <16 x i16> %76 to <16 x i32>
  %78 = and <16 x i32> %68, <i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535>
  %79 = add nuw nsw <16 x i32> %77, <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
  %80 = add nuw nsw <16 x i32> %79, %78
  %81 = lshr <16 x i32> %80, <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
  %82 = trunc <16 x i32> %81 to <16 x i16>
  %83 = shufflevector <16 x i16> %75, <16 x i16> %82, <32 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
  %84 = and <32 x i16> %83, %43
  %85 = xor <32 x i16> %83, %43
  %86 = lshr <32 x i16> %85, <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>
  %87 = add <32 x i16> %86, %84
  %88 = sub nsw i64 %19, %16
  %89 = getelementptr inbounds i16, i16* %1, i64 %88
  %90 = bitcast i16* %89 to <32 x i16>*
  store <32 x i16> %87, <32 x i16>* %90, align 2, !tbaa !28
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %91 = add nuw nsw i32 %"k1133$0.s0.v0.v0", 1
  %.not = icmp eq i64 %indvars.iv.next, %17
  br i1 %.not, label %destructor_block, label %"for k1133$0.s0.v0.v0"

LLVM IR after this PR, with direct use of intrinsics:

  %indvars.iv = phi i64 [ 0, %"for k1133$0.s0.v0.v0.preheader" ], [ %indvars.iv.next, %"for k1133$0.s0.v0.v0" ]
  %"k1133$0.s0.v0.v0" = phi i32 [ 0, %"for k1133$0.s0.v0.v0.preheader" ], [ %48, %"for k1133$0.s0.v0.v0" ]
  %18 = add nsw i64 %indvars.iv, %14
  %19 = shl nsw i64 %18, 5
  %20 = sub nsw i64 %19, %15
  %21 = getelementptr inbounds i16, i16* %6, i64 %20
  %22 = getelementptr inbounds i16, i16* %21, i64 2
  %23 = bitcast i16* %22 to <32 x i16>*
  %t110 = load <32 x i16>, <32 x i16>* %23, align 2, !tbaa !25
  %24 = getelementptr inbounds i16, i16* %21, i64 3
  %25 = bitcast i16* %24 to <32 x i16>*
  %t111 = load <32 x i16>, <32 x i16>* %25, align 2, !tbaa !25
  %t112 = add nsw i32 %"k1133$0.s0.v0.v0", %12
  %26 = shl nsw i32 %t112, 5
  %t113 = sub nsw i32 %26, %8
  %27 = tail call <32 x i16> @llvm.x86.avx512.pavg.w.512(<32 x i16> %t110, <32 x i16> %t111)
  %28 = and <32 x i16> %t111, %t110
  %29 = xor <32 x i16> %t111, %t110
  %30 = lshr <32 x i16> %29, <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>
  %31 = add <32 x i16> %30, %28
  %32 = sext i32 %t113 to i64
  %33 = getelementptr inbounds i16, i16* %6, i64 %32
  %34 = bitcast i16* %33 to <32 x i16>*
  %35 = load <32 x i16>, <32 x i16>* %34, align 2, !tbaa !25
  %36 = getelementptr inbounds i16, i16* %33, i64 1
  %37 = bitcast i16* %36 to <32 x i16>*
  %38 = load <32 x i16>, <32 x i16>* %37, align 2, !tbaa !25
  %39 = tail call <32 x i16> @llvm.x86.avx512.pavg.w.512(<32 x i16> %35, <32 x i16> %38)
  %40 = tail call <32 x i16> @llvm.x86.avx512.pavg.w.512(<32 x i16> %31, <32 x i16> %39)
  %41 = and <32 x i16> %40, %27
  %42 = xor <32 x i16> %40, %27
  %43 = lshr <32 x i16> %42, <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>
  %44 = add <32 x i16> %43, %41
  %45 = sub nsw i64 %19, %16
  %46 = getelementptr inbounds i16, i16* %1, i64 %45
  %47 = bitcast i16* %46 to <32 x i16>*
  store <32 x i16> %44, <32 x i16>* %47, align 2, !tbaa !28
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %48 = add nuw nsw i32 %"k1133$0.s0.v0.v0", 1
  %.not = icmp eq i64 %indvars.iv.next, %17
  br i1 %.not, label %destructor_block, label %"for k1133$0.s0.v0.v0"

Assembly before. God knows what has gone wrong here. I see two pavgw instructions, but also lots of 32-bit integer math.

	vmovdqu64	-2(%rdx,%rdi), %zmm2
	vmovdqu64	(%rdx,%rdi), %zmm3
	vmovdqu	(%rdx,%rdi), %ymm4
	vpavgw	-2(%rdx,%rdi), %ymm4, %ymm4
	vmovdqu	32(%rdx,%rdi), %ymm5
	vpavgw	30(%rdx,%rdi), %ymm5, %ymm5
	vinserti64x4	$1, %ymm5, %zmm4, %zmm4
	vpandq	%zmm2, %zmm3, %zmm5
	vpxorq	%zmm2, %zmm3, %zmm2
	vpsrlw	$1, %zmm2, %zmm2
	vpaddw	%zmm5, %zmm2, %zmm2
	vpmovzxwd	-2(%rsi,%rdi), %zmm3   
	vpmovzxwd	(%rsi,%rdi), %zmm5    
	vpaddd	%zmm5, %zmm3, %zmm3
	vpsubd	%zmm0, %zmm3, %zmm3
	vpmovzxwd	30(%rsi,%rdi), %zmm5   
	vpsrld	$1, %zmm3, %zmm3
	vpmovzxwd	32(%rsi,%rdi), %zmm6  
	vpaddd	%zmm6, %zmm5, %zmm5
	vpsubd	%zmm0, %zmm5, %zmm5
	vpmovzxwd	%ymm2, %zmm6           
	vpandq	%zmm1, %zmm3, %zmm3
	vpaddd	%zmm3, %zmm6, %zmm3
	vpsubd	%zmm0, %zmm3, %zmm3
	vpsrld	$1, %zmm3, %zmm3
	vpmovdw	%zmm3, %ymm3
	vpsrld	$1, %zmm5, %zmm5
	vextracti64x4	$1, %zmm2, %ymm2
	vpmovzxwd	%ymm2, %zmm2          
	vpandq	%zmm1, %zmm5, %zmm5
	vpaddd	%zmm5, %zmm2, %zmm2
	vpsubd	%zmm0, %zmm2, %zmm2
	vpsrld	$1, %zmm2, %zmm2
	vpmovdw	%zmm2, %ymm2
	vinserti64x4	$1, %ymm2, %zmm3, %zmm2
	vpandq	%zmm4, %zmm2, %zmm3
	vpxorq	%zmm4, %zmm2, %zmm2
	vpsrlw	$1, %zmm2, %zmm2
	vpaddw	%zmm3, %zmm2, %zmm2
	vmovdqu64	%zmm2, (%rcx,%rdi)

Assembly after using intrinsics directly:

	vmovdqu64	-2(%rdx,%rdi), %zmm0
	vmovdqu64	(%rdx,%rdi), %zmm1
	vpavgw	%zmm1, %zmm0, %zmm2
	vpandq	%zmm0, %zmm1, %zmm3
	vpxorq	%zmm0, %zmm1, %zmm0
	vpsrlw	$1, %zmm0, %zmm0
	vmovdqu64	-2(%rsi,%rdi), %zmm1
	vpaddw	%zmm3, %zmm0, %zmm0
	vpavgw	(%rsi,%rdi), %zmm1, %zmm1
	vpavgw	%zmm1, %zmm0, %zmm0
	vpandq	%zmm2, %zmm0, %zmm1
	vpxorq	%zmm2, %zmm0, %zmm0
	vpsrlw	$1, %zmm0, %zmm0
	vpaddw	%zmm1, %zmm0, %zmm0
	vmovdqu64	%zmm0, (%rcx,%rdi)

LebedevRI · 2021-10-10T21:14:29Z

Thanks!

> Assembly before. God knows what has gone wrong here. I see two pavgw instructions, but also lots of 32-bit integer math.

(on godbolt: https://godbolt.org/z/T63WE4xcn)

abadams · 2021-10-11T19:09:49Z

Also being fixed upstream:
https://reviews.llvm.org/D111571

Thanks @LebedevRI !

I'll leave this PR as-is, so that we have two chances to catch pavgw. If we miss it in our pattern-matching, hopefully llvm will catch it for us.

…131)

As noted in halide/Halide#6302, we hilariously fail to match PAVG if we even as much as look at it the wrong way. In this particular case, the problem stems from the fact that the `PAVG` root (def) is a `trunc`, and the leaves (uses) are `zext`s, and InstCombine really loves to get rid of both of these, for example by replacing them with a bit mask. So we may not have said `zext`. Instead of checking for that + type match, I think we should rely on the actual active type, as per the knownbits.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111571
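
To illustrate the bit-mask point, here is a small scalar sketch with hypothetical names (`trunc_then_zext`, `mask_low16`), using plain C++ integers as stand-ins for the IR operations. It shows why a `trunc` to i16 followed by a `zext` back to i32 can legally be replaced by a single `and` with 65535. That replacement is what appears in the "before" IR above, where the inner average's `lshr` result feeds an `and ... 65535` instead of a `trunc`/`zext` pair.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar stand-ins for the IR operations (illustration only).

// trunc i32 -> i16 followed by zext i16 -> i32.
constexpr uint32_t trunc_then_zext(uint32_t x) {
    return static_cast<uint32_t>(static_cast<uint16_t>(x));
}

// The single `and` with 65535 that can replace the pair.
constexpr uint32_t mask_low16(uint32_t x) {
    return x & 0xFFFFu;
}

// Both forms keep the low 16 bits and clear the rest.
static_assert(trunc_then_zext(0x12345u) == mask_low16(0x12345u),
              "trunc+zext and the mask agree");
static_assert(mask_low16(0x12345u) == 0x2345u, "only the low 16 bits survive");
```

The two forms are equivalent in value but not in shape, so a matcher that insists on seeing a literal `zext` operand or `trunc` root no longer fires once the mask form has been chosen.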

At some point llvm re-added pavgw intrinsics

c8d98fd

abadams requested a review from dsharletg October 8, 2021 21:30

dsharletg reviewed Oct 8, 2021

Delete more dead code

99d3795

dsharletg approved these changes Oct 8, 2021

abadams merged commit 2a2c4b0 into master Oct 10, 2021
