Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missed SLP vectorization #49277

Closed
davidbolvansky opened this issue Apr 12, 2021 · 8 comments
Closed

Missed SLP vectorization #49277

davidbolvansky opened this issue Apr 12, 2021 · 8 comments
Labels
bugzilla Issues migrated from bugzilla confirmed Verified by a second party llvm:SLPVectorizer

Comments

@davidbolvansky
Copy link
Collaborator

davidbolvansky commented Apr 12, 2021

Bugzilla Link 49933
Version trunk
OS Windows NT
CC @alexey-bataev,@anton-afanasyev,@RKSimon,@rotateright

Extended Description

typedef unsigned char uint8_t;

static inline uint8_t bar( uint8_t x )
{
  return x&(~63) ? -x : x;
}

void foo( uint8_t *__restrict dst, uint8_t *__restrict src)
{   
	for( int x = 0; x < 8; x++ )
	    dst[x] = bar(src[x]);
}

ICC:

foo(unsigned char*, unsigned char*):
        vpmovzxbd ymm2, QWORD PTR [rsi]                         #5.25
        vpand     ymm0, ymm2, YMMWORD PTR .L_2il0floatpacket.0[rip] #5.12
        vpxor     ymm1, ymm1, ymm1                              #5.25
        vptestmd  k1, ymm0, ymm0                                #5.12
        vpsubd    ymm2{k1}, ymm1, ymm2                          #5.25
        vpmovdb   QWORD PTR [rdi], ymm2                         #13.6
        vzeroupper                                              #14.1
        ret    

LLVM does not vectorize it with avx/avx2/avx512 - cost model issue?

https://godbolt.org/z/Kheeec4cG

@davidbolvansky
Copy link
Collaborator Author

Ok, with typedef unsigned short uint8_t;

LLVM produces good codegen. So it looks like a cost model issue for (U)INT8.

@RKSimon
Copy link
Collaborator

RKSimon commented Apr 12, 2021

Not sure if its purely a cost model issue, but also to do with 8 x uint8_t being smaller than the 128-bit vector target minimum.

@davidbolvansky
Copy link
Collaborator Author

But according to llvm-mca, ICC's codegen is much better

Block RThroughput is 4, for LLVM is 9.8

@davidbolvansky
Copy link
Collaborator Author

davidbolvansky commented Apr 25, 2021

With -mllvm -slp-min-reg-size=64

We have this nice codegen

foo(unsigned char*, unsigned char*):                             # @foo(unsigned char*, unsigned char*)
        vmovq   xmm0, qword ptr [rsi]           # xmm0 = mem[0],zero
        vpcmpltub       k1, xmm0, xmmword ptr [rip + .LCPI0_0]
        vpxor   xmm1, xmm1, xmm1
        vpsubb  xmm1, xmm1, xmm0
        vmovdqu8        xmm1 {k1}, xmm0
        vmovq   qword ptr [rdi], xmm1
        ret

Block RThroughput: 1.8

@anton-afanasyev
Copy link
Contributor

anton-afanasyev commented Jul 23, 2021

The best way to fix this bug is to wait for this patch to land: https://reviews.llvm.org/D57059 ("non-power-of-2 vectors"). I've checked that it works better:

foo(unsigned char*, unsigned char*):
        vmovq   xmm0, qword ptr [rsi]           # xmm0 = mem[0],zero
        vpcmpltub       k1, xmm0, xmmword ptr [rip + .LCPI0_0]
        vpxor   xmm1, xmm1, xmm1
        vpsubb  xmm1, xmm1, xmm0
        vmovdqu8        xmm1 {k1}, xmm0
        mov     ax, 255
        kmovd   k1, eax
        vmovdqu8        xmmword ptr [rdi] {k1}, xmm1
        ret

Although this vectorized codegen is different compared to -slp-min-reg-size=64: Block RThroughput: 2.2

This difference comes from using @llvm.masked.store instead of store.

@anton-afanasyev
Copy link
Contributor

Added test to track issue: https://reviews.llvm.org/rGdd028c359e09

@davidbolvansky
Copy link
Collaborator Author

mentioned in issue llvm/llvm-bugzilla-archive#49934

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 11, 2021
@llvmbot llvmbot added the confirmed Verified by a second party label Jan 26, 2022
@RKSimon
Copy link
Collaborator

RKSimon commented May 14, 2022

Fixed in 9dc4ced

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla Issues migrated from bugzilla confirmed Verified by a second party llvm:SLPVectorizer
Projects
None yet
Development

No branches or pull requests

5 participants