Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[X86] Worse code generation from patch 24780e13e5be1501e34330148137a10fa9965166 #61923

Closed
xortator opened this issue Apr 4, 2023 · 4 comments
Assignees

Comments

@xortator
Copy link
Contributor

xortator commented Apr 4, 2023

We see worse code generation on memcmp-like function caused by

commit 24780e13e5be1501e34330148137a10fa9965166 (HEAD -> main)
Author: Simon Pilgrim <llvm-dev@redking.me.uk>
Date:   Sat Apr 1 15:38:38 2023 +0100

    [X86] MatchVectorAllEqualTest - add support for icmp(reduce_and(X),-1) allof reduction patterns

    Also, improve codegen in LowerVectorAllEqual for X == -1 cases to reduce over sized vector using a AND reduction

Asm on last release: https://godbolt.org/z/Ebcd7P3jP
Current asm: https://godbolt.org/z/nxv1vc16e

Loop block changed from

.LBB0_2:                                # %memcmp.loop
        vmovdqu (%rsi,%rcx), %xmm0
        vmovdqu 16(%rsi,%rcx), %xmm1
        vpsubb  16(%rdi,%rcx), %xmm1, %xmm1
        vpsubb  (%rdi,%rcx), %xmm0, %xmm0
        vpor    %xmm1, %xmm0, %xmm0
        vptest  %xmm0, %xmm0
        jne     .LBB0_4
        addq    $32, %rcx
        cmpq    %rax, %rcx
        jb      .LBB0_2

to

.LBB0_2:                                # %memcmp.loop
        vmovdqu (%rsi,%rcx), %xmm0
        vmovdqu 16(%rsi,%rcx), %xmm1
        vpcmpeqb        (%rdi,%rcx), %xmm0, %xmm0
        vpmovmskb       %xmm0, %edx
        vpcmpeqb        16(%rdi,%rcx), %xmm1, %xmm0
        vpmovmskb       %xmm0, %r8d
        shll    $16, %r8d
        orl     %edx, %r8d
        cmpl    $-1, %r8d
        jne     .LBB0_4
        addq    $32, %rcx
        cmpq    %rax, %rcx
        jb      .LBB0_2

I am not sure into how big performance regression this translates (build & benchmark runs are underway), but the assembly definitely looks worse in the loop block.

@xortator
Copy link
Contributor Author

xortator commented Apr 4, 2023

@RKSimon can you please take a look?

@xortator
Copy link
Contributor Author

xortator commented Apr 4, 2023

Reduced the test (links updated).

xortator added a commit that referenced this issue Apr 4, 2023
We see increased number of assembly instructions after patch
24780e1 for this test. See details
at #61923.
@RKSimon RKSimon self-assigned this Apr 4, 2023
@RKSimon
Copy link
Collaborator

RKSimon commented Apr 4, 2023

This will be fixed by https://reviews.llvm.org/D147452

@llvmbot
Copy link
Collaborator

llvmbot commented Apr 4, 2023

@llvm/issue-subscribers-backend-x86

gysit pushed a commit to nextsilicon/llvm-project that referenced this issue Apr 27, 2023
We see increased number of assembly instructions after patch
24780e1 for this test. See details
at llvm#61923.
gysit pushed a commit to nextsilicon/llvm-project that referenced this issue Apr 27, 2023
… bitcast(<X x i1> V)) canonicalization

This already exists in InstCombine but was missing from the late stage ExpandReductions pass

Fixes llvm#53419
Fixes llvm#61923

Differential Revision: https://reviews.llvm.org/D147452
DianQK pushed a commit to DianQK/llvm-project that referenced this issue Oct 10, 2023
We see increased number of assembly instructions after patch
24780e1 for this test. See details
at llvm#61923.
DianQK pushed a commit to DianQK/llvm-project that referenced this issue Oct 10, 2023
… bitcast(<X x i1> V)) canonicalization

This already exists in InstCombine but was missing from the late stage ExpandReductions pass

Fixes llvm#53419
Fixes llvm#61923

Differential Revision: https://reviews.llvm.org/D147452
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants